
Foundations of Statistical Natural Language Processing

704 pages · 1999 · 11.276 MB


Foundations of Statistical Natural Language Processing

Christopher D. Manning
Hinrich Schütze

The MIT Press
Cambridge, Massachusetts
London, England

Second printing, 1999

© 1999 Massachusetts Institute of Technology
Second printing with corrections, 2000

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording or information storage and retrieval) without permission in writing from the publisher.

Typeset in 10/13 Lucida Bright by the authors using LaTeX2e.
Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Information

Manning, Christopher D.
Foundations of statistical natural language processing / Christopher D. Manning, Hinrich Schütze.
p. cm.
Includes bibliographical references (p. ) and index.
ISBN 0-262-13360-1
1. Computational linguistics--Statistical methods. I. Schütze, Hinrich. II. Title.
P98.5.S83M36 1999
410'.285--dc21
99-21137 CIP

Brief Contents

I   Preliminaries 1
  1  Introduction 3
  2  Mathematical Foundations 39
  3  Linguistic Essentials 81
  4  Corpus-Based Work 117
II  Words 149
  5  Collocations 151
  6  Statistical Inference: n-gram Models over Sparse Data 191
  7  Word Sense Disambiguation 229
  8  Lexical Acquisition 265
III Grammar 315
  9  Markov Models 317
  10 Part-of-Speech Tagging 341
  11 Probabilistic Context Free Grammars 381
  12 Probabilistic Parsing 407
IV  Applications and Techniques 461
  13 Statistical Alignment and Machine Translation 463
  14 Clustering 495
  15 Topics in Information Retrieval 529
  16 Text Categorization 575

Contents

List of Tables xv
List of Figures xxi
Table of Notations xxv
Preface xxix
Road Map xxxv

I Preliminaries 1

1 Introduction 3
  1.1 Rationalist and Empiricist Approaches to Language 4
  1.2 Scientific Content 7
    1.2.1 Questions that linguistics should answer 8
    1.2.2 Non-categorical phenomena in language 11
    1.2.3 Language and cognition as probabilistic phenomena 15
  1.3 The Ambiguity of Language: Why NLP Is Difficult 17
  1.4 Dirty Hands 19
    1.4.1 Lexical resources 19
    1.4.2 Word counts 20
    1.4.3 Zipf's laws 23
    1.4.4 Collocations 29
    1.4.5 Concordances 31
  1.5 Further Reading 34
  1.6 Exercises 35

2 Mathematical Foundations 39
  2.1 Elementary Probability Theory 40
    2.1.1 Probability spaces 40
    2.1.2 Conditional probability and independence 42
    2.1.3 Bayes' theorem 43
    2.1.4 Random variables 45
    2.1.5 Expectation and variance 46
    2.1.6 Notation 47
    2.1.7 Joint and conditional distributions 48
    2.1.8 Determining P 48
    2.1.9 Standard distributions 50
    2.1.10 Bayesian statistics 54
    2.1.11 Exercises 59
  2.2 Essential Information Theory 60
    2.2.1 Entropy 61
    2.2.2 Joint entropy and conditional entropy 63
    2.2.3 Mutual information 66
    2.2.4 The noisy channel model 68
    2.2.5 Relative entropy or Kullback-Leibler divergence 72
    2.2.6 The relation to language: Cross entropy 73
    2.2.7 The entropy of English 76
    2.2.8 Perplexity 78
    2.2.9 Exercises 78
  2.3 Further Reading 79

3 Linguistic Essentials 81
  3.1 Parts of Speech and Morphology 81
    3.1.1 Nouns and pronouns 83
    3.1.2 Words that accompany nouns: Determiners and adjectives 87
    3.1.3 Verbs 88
    3.1.4 Other parts of speech 91
  3.2 Phrase Structure 93
    3.2.1 Phrase structure grammars 96
    3.2.2 Dependency: Arguments and adjuncts 101
    3.2.3 X' theory 106
    3.2.4 Phrase structure ambiguity 107
  3.3 Semantics and Pragmatics 109
  3.4 Other Areas 112
  3.5 Further Reading 113
  3.6 Exercises 114

4 Corpus-Based Work 117
  4.1 Getting Set Up 118
    4.1.1 Computers 118
    4.1.2 Corpora 118
    4.1.3 Software 120
  4.2 Looking at Text 123
    4.2.1 Low-level formatting issues 123
    4.2.2 Tokenization: What is a word? 124
    4.2.3 Morphology 131
    4.2.4 Sentences 134
  4.3 Marked-up Data 136
    4.3.1 Markup schemes 137
    4.3.2 Grammatical tagging 139
  4.4 Further Reading 145
  4.5 Exercises 147

II Words 149

5 Collocations 151
  5.1 Frequency 153
  5.2 Mean and Variance 157
  5.3 Hypothesis Testing 162
    5.3.1 The t test 163
    5.3.2 Hypothesis testing of differences 166
    5.3.3 Pearson's chi-square test 169
    5.3.4 Likelihood ratios 172
  5.4 Mutual Information 178
  5.5 The Notion of Collocation 183
  5.6 Further Reading 187

6 Statistical Inference: n-gram Models over Sparse Data 191
  6.1 Bins: Forming Equivalence Classes 192
    6.1.1 Reliability vs. discrimination 192
    6.1.2 n-gram models 192
    6.1.3 Building n-gram models 195
  6.2 Statistical Estimators 196
    6.2.1 Maximum Likelihood Estimation (MLE) 197
    6.2.2 Laplace's law, Lidstone's law and the Jeffreys-Perks law 202
    6.2.3 Held out estimation 205
    6.2.4 Cross-validation (deleted estimation) 210
    6.2.5 Good-Turing estimation 212
    6.2.6 Briefly noted 216
  6.3 Combining Estimators 217
    6.3.1 Simple linear interpolation 218
    6.3.2 Katz's backing-off 219
    6.3.3 General linear interpolation 220
    6.3.4 Briefly noted 222
    6.3.5 Language models for Austen 223
  6.4 Conclusions 224
  6.5 Further Reading 225
  6.6 Exercises 225

7 Word Sense Disambiguation 229
  7.1 Methodological Preliminaries 232
    7.1.1 Supervised and unsupervised learning 232
    7.1.2 Pseudowords 233
    7.1.3 Upper and lower bounds on performance 233
  7.2 Supervised Disambiguation 235
    7.2.1 Bayesian classification 235
    7.2.2 An information-theoretic approach 239
  7.3 Dictionary-Based Disambiguation 241
    7.3.1 Disambiguation based on sense definitions 242
    7.3.2 Thesaurus-based disambiguation 244
    7.3.3 Disambiguation based on translations in a second-language corpus 247
    7.3.4 One sense per discourse, one sense per collocation 249
  7.4 Unsupervised Disambiguation 252
  7.5 What Is a Word Sense? 256
  7.6 Further Reading 260
  7.7 Exercises 262

8 Lexical Acquisition 265
  8.1 Evaluation Measures 267
  8.2 Verb Subcategorization 271
  8.3 Attachment Ambiguity 278
    8.3.1 Hindle and Rooth (1993) 280
    8.3.2 General remarks on PP attachment 284
  8.4 Selectional Preferences 288
  8.5 Semantic Similarity 294
    8.5.1 Vector space measures 296
    8.5.2 Probabilistic measures 303
  8.6 The Role of Lexical Acquisition in Statistical NLP 308
  8.7 Further Reading 312

III Grammar 315

9 Markov Models 317
  9.1 Markov Models 318
  9.2 Hidden Markov Models 320
    9.2.1 Why use HMMs? 322
    9.2.2 General form of an HMM 324
  9.3 The Three Fundamental Questions for HMMs 325
    9.3.1 Finding the probability of an observation 326
    9.3.2 Finding the best state sequence 331
    9.3.3 The third problem: Parameter estimation 333
  9.4 HMMs: Implementation, Properties, and Variants 336
    9.4.1 Implementation 336
    9.4.2 Variants 337
    9.4.3 Multiple input observations 338
    9.4.4 Initialization of parameter values 339
  9.5 Further Reading 339

10 Part-of-Speech Tagging 341
  10.1 The Information Sources in Tagging 343
  10.2 Markov Model Taggers 345
    10.2.1 The probabilistic model 345
    10.2.2 The Viterbi algorithm 349
    10.2.3 Variations 351
  10.3 Hidden Markov Model Taggers 356
    10.3.1 Applying HMMs to POS tagging 357
    10.3.2 The effect of initialization on HMM training 359
  10.4 Transformation-Based Learning of Tags 361
    10.4.1 Transformations 362
    10.4.2 The learning algorithm 364
    10.4.3 Relation to other models 365
    10.4.4 Automata 367
    10.4.5 Summary 369
  10.5 Other Methods, Other Languages 370
    10.5.1 Other approaches to tagging 370
    10.5.2 Languages other than English 371
  10.6 Tagging Accuracy and Uses of Taggers 371
    10.6.1 Tagging accuracy 371
    10.6.2 Applications of tagging 374
  10.7 Further Reading 377
  10.8 Exercises 379

11 Probabilistic Context Free Grammars 381
  11.1 Some Features of PCFGs 386
  11.2 Questions for PCFGs 388
  11.3 The Probability of a String 392
    11.3.1 Using inside probabilities 392
    11.3.2 Using outside probabilities 394
    11.3.3 Finding the most likely parse for a sentence 396
    11.3.4 Training a PCFG 398
  11.4 Problems with the Inside-Outside Algorithm 401
  11.5 Further Reading 402
  11.6 Exercises 404

12 Probabilistic Parsing 407
  12.1 Some Concepts 408
    12.1.1 Parsing for disambiguation 408
    12.1.2 Treebanks 412
    12.1.3 Parsing models vs. language models 414
    12.1.4 Weakening the independence assumptions of PCFGs 416
    12.1.5 Tree probabilities and derivational probabilities 421
    12.1.6 There's more than one way to do it 423
    12.1.7 Phrase structure grammars and dependency grammars 428
    12.1.8 Evaluation 431
    12.1.9 Equivalent models 437
    12.1.10 Building parsers: Search methods 439
    12.1.11 Use of the geometric mean 442
  12.2 Some Approaches 443
    12.2.1 Non-lexicalized treebank grammars 443
    12.2.2 Lexicalized models using derivational histories 448
    12.2.3 Dependency-based models 451
    12.2.4 Discussion 454
  12.3 Further Reading 456
  12.4 Exercises 458

IV Applications and Techniques 461

13 Statistical Alignment and Machine Translation 463
  13.1 Text Alignment 466
    13.1.1 Aligning sentences and paragraphs 467
    13.1.2 Length-based methods 471
    13.1.3 Offset alignment by signal processing techniques 475
    13.1.4 Lexical methods of sentence alignment 478
    13.1.5 Summary 484
    13.1.6 Exercises 484
  13.2 Word Alignment 484
  13.3 Statistical Machine Translation 486
  13.4 Further Reading 492

14 Clustering 495
  14.1 Hierarchical Clustering 500
    14.1.1 Single-link and complete-link clustering 503
    14.1.2 Group-average agglomerative clustering 507
    14.1.3 An application: Improving a language model 509
    14.1.4 Top-down clustering 512
  14.2 Non-Hierarchical Clustering 514
    14.2.1 K-means 515
    14.2.2 The EM algorithm 518
  14.3 Further Reading 527
