University of Colorado Boulder, CU Scholar
Computer Science Graduate Theses & Dissertations, Spring 2012

Recommended Citation: Choi, Jinho D., "Optimization of Natural Language Processing Components for Robustness and Scalability" (2012). Computer Science Graduate Theses & Dissertations. 46. https://scholar.colorado.edu/csci_gradetds/46

Optimization of Natural Language Processing Components for Robustness and Scalability

by Jinho D. Choi
B.A., Coe College, 2002
M.S.E., University of Pennsylvania, 2003

A thesis submitted to the Faculty of the Graduate School of the University of Colorado Boulder in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science, 2012.

This thesis entitled "Optimization of Natural Language Processing Components for Robustness and Scalability," written by Jinho D. Choi, has been approved for the Department of Computer Science by Martha Palmer, James Martin, Wayne Ward, Bhuvana Narasimhan, Joakim Nivre, and Nicolas Nicolov. The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Choi, Jinho D. (Ph.D., Computer Science)
Optimization of Natural Language Processing Components for Robustness and Scalability
Thesis directed by Dr.
Martha Palmer

This thesis focuses on the optimization of NLP components for robustness and scalability. Three kinds of NLP components are used for our experiments: a part-of-speech tagger, a dependency parser, and a semantic role labeler. For part-of-speech tagging, dynamic model selection is introduced. Our dynamic model selection approach builds two models, a domain-specific model and a generalized model, and selects one of them during decoding by comparing similarities between the lexical items used to build these models and the input sentence. As a result, it gives robust tagging accuracy across corpora and shows fast tagging speed. For dependency parsing, a new transition-based parsing algorithm and a bootstrapping technique are introduced. Our parsing algorithm learns both projective and non-projective transitions, so it can generate both projective and non-projective dependency trees, yet shows linear-time parsing speed on average. Our bootstrapping technique bootstraps parse information used as features for transition-based parsing, and shows significant improvement in parsing accuracy. For semantic role labeling, a conditional higher-order argument pruning algorithm is introduced. A higher-order pruning algorithm improves the coverage of argument candidates and improves the overall F1-score. The conditional higher-order pruning algorithm also noticeably reduces average labeling complexity with minimal reduction in F1-score. For all experiments, two sets of training data are used: one from the Wall Street Journal corpus and the other from the OntoNotes corpora. All components are evaluated on 9 different genres, which are grouped separately for in-genre and out-of-genre experiments. Our experiments show that our approach gives higher accuracies than other state-of-the-art NLP components and runs fast, taking about 3-4 milliseconds per sentence to process all three components. All components are publicly available as an open source project, called ClearNLP.
We believe that this project is beneficial for many NLP tasks that need to process large-scale heterogeneous data.

Dedication

This thesis is dedicated to my parents, Youngsang Choi and Kyungsook Lee, my American grandparents, Thomas and Elizabeth Janik, and my sister, Jeany Choi, who have loved, supported, and encouraged me ever since I was born. It is also dedicated to my prayer partner, Kyungshim Lee, and my future wife, Juyoung Choi, who have prayed for me constantly.

Acknowledgements

We gratefully acknowledge the support of the following grants. Any contents expressed in this material are those of the authors and do not necessarily reflect the views of any grant.

• The National Science Foundation Grants IIS-0325646, Domain Independent Semantic Parsing; CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation; CISE-CRI-0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu; CISE-IIS-RI-0910992, Richer Representations for Machine Translation.
• A grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
• A subcontract from the Mayo Clinic and Harvard Children's Hospital based on a grant from the ONC, 90TR0002/01.
• Strategic Health Advanced Research Project Area 4: Natural Language Processing.

Contents

1 Introduction
  1.1 Research questions and objectives
  1.2 Background
    1.2.1 Part-of-speech tagging
    1.2.2 Dependency parsing
    1.2.3 Semantic role labeling
  1.3 Overall framework
  1.4 Thesis statement

2 Dependency Conversion
  2.1 Introduction
    2.1.1 Motivation
    2.1.2 Background
    2.1.3 Overview
  2.2 Mapping empty categories
    2.2.1 Wh-movement
    2.2.2 Topicalization
    2.2.3 Right node raising
    2.2.4 Discontinuous constituent
  2.3 Finding dependency heads
    2.3.1 Head-finding rules
    2.3.2 Apposition
    2.3.3 Coordination
    2.3.4 Small clauses
  2.4 Assigning dependency labels
    2.4.1 Clear dependency labels
    2.4.2 Dependency label heuristics
    2.4.3 Comparison to the Stanford dependency approach
  2.5 Adding secondary dependencies
    2.5.1 GAP: gapping
    2.5.2 REF: referent
    2.5.3 RNR: right node raising
    2.5.4 XSUBJ: open clausal subject
  2.6 Adding function tags
    2.6.1 SEM: semantic function tags
    2.6.2 SYN: syntactic function tags

3 Experimental setup
  3.1 Corpora
  3.2 Machine learning

4 Part-of-speech Tagging
  4.1 Overview
  4.2 Background
  4.3 Dynamic model selection
    4.3.1 Training
    4.3.2 Decoding
  4.4 Tagging algorithm
  4.5 Features
  4.6 Related work
  4.7 Experiments
    4.7.1 Accuracy comparisons
    4.7.2 Speed comparisons

5 Dependency Parsing
  5.1 Overview
  5.2 Transition-based dependency parsing
    5.2.1 Transition decomposition
    5.2.2 Transition recomposition
    5.2.3 Parsing algorithm
  5.3 Bootstrapping
    5.3.1 Bootstrapping parse history
    5.3.2 Related work
  5.4 Post-processing
  5.5 Features
  5.6 Experiments
    5.6.1 Accuracy comparisons
    5.6.2 Speed comparisons

6 Semantic Role Labeling
  6.1 Overview
  6.2 Semantic roles in dependency structure
    6.2.1 Predicate argument structures in PropBank
    6.2.2 Syntactic and semantic dependencies
    6.2.3 Adding semantic roles to dependency trees
  6.3 Argument pruning
    6.3.1 First-order argument pruning
    6.3.2 Higher-order argument pruning
    6.3.3 Conditional higher-order argument pruning
  6.4 Features
    6.4.1 Feature templates
    6.4.2 Positional feature separation
  6.5 Experiments
    6.5.1 Accuracy comparisons
    6.5.2 Speed comparisons

7 Conclusion

Bibliography

Appendix

A Constituent Treebank Tags
  A.1 Function tags
  A.2 Part-of-speech tags
  A.3 Clause and phrase level tags

B Semantic Role Labels
  B.1 PropBank semantic role labels
  B.2 VerbNet thematic role labels

C Dependency Labels
  C.1 CoNLL dependency labels
  C.2 Stanford dependency labels
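The dynamic model selection idea summarized in the abstract, choosing between a domain-specific and a generalized tagging model at decoding time by comparing the input sentence's lexical items with the lexical items used to build each model, can be sketched as follows. This is a minimal illustration, not the thesis's actual method (Chapter 4 gives the real training and decoding procedures): the cosine measure over bag-of-words vocabularies, the `select_model` helper, and the `threshold` parameter are all hypothetical choices made here for clarity.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def select_model(sentence_tokens, domain_vocab: Counter,
                 general_vocab: Counter, threshold: float = 0.0) -> str:
    """Pick the model whose training vocabulary is lexically closer
    to the input sentence (hypothetical selection rule)."""
    sent = Counter(t.lower() for t in sentence_tokens)
    sim_domain = cosine_similarity(sent, domain_vocab)
    sim_general = cosine_similarity(sent, general_vocab)
    # Prefer the domain-specific model only when it is at least
    # `threshold` closer to the sentence than the generalized model.
    return "domain" if sim_domain >= sim_general + threshold else "general"

# Toy vocabularies standing in for the two models' training lexicons.
domain_vocab = Counter({"stock": 5, "market": 4, "shares": 3})
general_vocab = Counter({"the": 10, "cat": 2, "dog": 2})

print(select_model(["stock", "market", "rises"], domain_vocab, general_vocab))
print(select_model(["the", "dog", "runs"], domain_vocab, general_vocab))
```

Because selection happens once per sentence over precomputed vocabularies, a scheme like this adds only a small constant cost to decoding, which is consistent with the fast tagging speed reported in the abstract.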