
The geometry of information retrieval PDF

163 Pages·2004·0.59 MB·English


The Geometry of Information Retrieval

Information retrieval, IR, is the science of extracting information from documents. It can be viewed in a number of ways: logical, probabilistic and vector space models are some of the most important. In this book, the author, one of the leading researchers in the area, shows how these three views can be combined in one mathematical framework, the very one used to formulate the general principles of quantum mechanics. Using this framework, van Rijsbergen presents a new theory for the foundations of IR, in particular a new theory of measurement. He shows how a document can be represented as a vector in Hilbert space, and the document’s relevance by an Hermitian operator. All the usual quantum-mechanical notions, such as uncertainty, superposition and observable, have their IR-theoretic analogues. But the approach is more than just analogy: the standard theorems can be applied to address problems in IR, such as pseudo-relevance feedback, relevance feedback and ostensive retrieval. The relation with quantum computing is also examined. To help keep the book self-contained, appendices with background material on physics and mathematics are included, and each chapter ends with some suggestions for further reading. This is an important book for all those working in IR, AI and natural language processing.
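As a rough numerical illustration of the representation described above, the sketch below treats a document as a unit vector in a small real vector space and a yes/no "relevance" observable as a projector, which is a Hermitian operator. The probability it computes, the inner product of the document vector with its own projection, is the standard quantum-mechanical rule for a pure state. The three-dimensional term space, the example vectors and every function name are assumptions made for illustration; none of them is taken from the book.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def projector(direction):
    """Return the matrix |b><b| that projects onto the ray spanned by `direction`."""
    b = normalize(direction)
    return [[bi * bj for bj in b] for bi in b]


def apply(matrix, v):
    """Multiply a square matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def inner(u, v):
    """Real inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_hermitian(matrix, tol=1e-12):
    """For real entries, Hermitian simply means symmetric."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


def probability(doc, proj):
    """Probability <d|P|d> that the normalized document lies in the subspace of P."""
    d = normalize(doc)
    return inner(d, apply(proj, d))


# Hypothetical document with weights on three index terms.
document = [3.0, 4.0, 0.0]
# Hypothetical "relevance" observable: projection onto the first term axis.
relevance = projector([1.0, 0.0, 0.0])

p = probability(document, relevance)
print(is_hermitian(relevance), round(p, 2))
```

Here the normalized document is (0.6, 0.8, 0), so the computed probability is about 0.36, the squared length of its projection onto the chosen axis.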
Keith van Rijsbergen’s research has, since 1969, been devoted to information retrieval, working on both theoretical and experimental aspects. His current research is concerned with the design of appropriate logics to model the flow of information and the application of Hilbert space theory to content-based IR. This is his third book on IR: his first is now regarded as the classic text in the area. In addition he has published over 100 research papers and is a regular speaker at major IR conferences. Keith is a Fellow of the IEE, BCS, ACM, and the Royal Society of Edinburgh. In 1993 he was appointed Editor-in-Chief of The Computer Journal, an appointment he held until 2000. He is an associate editor of Information Processing and Management, on the editorial board of Information Retrieval, and on the advisory board of the Journal of Web Semantics. He has served as a programme committee member and editorial board member of the major IR conferences and journals. He is a non-executive director of a start-up: Virtual Mirrors Ltd.

The Geometry of Information Retrieval

C. J. VAN RIJSBERGEN

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521838054

© C. J. van Rijsbergen 2004

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2004

ISBN-13 978-0-511-21675-6 eBook (NetLibrary)
ISBN-10 0-511-21675-0 eBook (NetLibrary)
ISBN-13 978-0-521-83805-4 hardback
ISBN-10 0-521-83805-3 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To make a start,
Out of particulars
And make them general, rolling
Up the sum, by defective means
Paterson: Book I
William Carlos Williams, 1992

for Nicola

Contents

Preface
Prologue
1 Introduction
2 On sets and kinds for IR
3 Vector and Hilbert spaces
4 Linear transformations, operators and matrices
5 Conditional logic in IR
6 The geometry of IR
Appendix I Linear algebra
Appendix II Quantum mechanics
Appendix III Probability
Bibliography
Author index
Index

Preface

This book begins and ends in information retrieval, but travels through a route constructed in an abstract way. In particular it goes through some of the most interesting and important models for information retrieval, a vector space model, a probabilistic model and a logical model, and shows how these three and possibly others can be described and represented in Hilbert space. The reasoning that occurs within each one of these models is formulated algebraically and can be shown to depend essentially on the geometry of the information space. The geometry can be seen as a ‘language’ for expressing the different models of information retrieval.

The approach taken is to structure these developments firmly in terms of the mathematics of Hilbert spaces and linear operators. This is of course the approach used in quantum mechanics. It is remarkable that the application of Hilbert space mathematics to information retrieval is very similar to its application to quantum mechanics. A document in IR can be represented as a vector in Hilbert space, and an observable such as ‘relevance’ or ‘aboutness’ can be represented by a Hermitian operator. However, this is emphatically not a book about quantum mechanics but about using the same language, the mathematical language of quantum mechanics, for the description of information retrieval. It turns out to be very convenient that quantum mechanics provides a ready-made interpretation of this language. It is as if in physics we have an example semantics for the language, and as such it will be used extensively to motivate a similar but different interpretation for IR. We introduce an appropriate logic and probability theory for information spaces guided by their introduction into quantum mechanics. Gleason’s Theorem, which specifies an algorithm for computing probabilities associated with subspaces in Hilbert space, is of critical importance in quantum mechanics and will turn out to be central for the same reasons in information retrieval. Whereas quantum theory is about a theory of measurement for natural systems, The Geometry of Information Retrieval is about
