Technical University of Crete
Electronic and Computer Engineering

“Using crowdsourcing for grammar induction with application to spoken dialogue systems”

by Elisavet Palogiannidi

Thesis committee
Thesis supervisor: Assistant Professor Polychronis Koutsakis
Committee member: Associate Professor Alexandros Potamianos
Committee member: Professor Euripides Petrakis

Chania, Crete
July 2013

Abstract

Spoken Dialogue Systems are becoming ever more common in daily life, supporting information access for the masses in a plethora of domains, such as flight information and restaurant guides. At the early stages of developing a Spoken Dialogue System there is a chicken-and-egg problem: good-quality user data cannot be obtained without a reasonably robust system. Thus, other ways of mining good-quality data for such systems are used during the development stage. These data are necessary during development because they can feed data-driven grammar induction algorithms, which introduce new rules into the grammars the system uses to understand what users need. This is important because the performance of a Spoken Dialogue System is influenced by the variety of rules that constitute its speech understanding grammar.

One novel way of acquiring data is a new method called Crowdsourcing. By definition, Crowdsourcing is “the act of taking a job traditionally performed by a designated agent and outsourcing it to an undefined, generally large group of people in the form of an open call”. In Crowdsourcing jobs, a large group of people answer tasks that are easy for humans but difficult for computers, in exchange for a small amount of money.

In this thesis the Crowdsourcing method is used to gather travel flight domain data suitable for the induction of new grammar rules. Using the Crowdsourcing method, we design both the User Interface and the content of the tasks.
The Crowdsourcing workers then have to complete the tasks. The major problem with this method is that many Crowdsourcing workers prefer to “cheat” in various ways, to save time and money. Thus, during the Crowdsourcing process we use various techniques in order to reach the best possible quality for the data we collect and to reject users who intend to “cheat”.

After the data acquisition, numerous metrics are implemented in an attempt to quantify data quality and find out whether we achieved our goal. Finally, we reach the conclusion that the data collected using the Crowdsourcing method can be used for grammar induction, but the performance achieved is lower than with data collected using the web harvesting method. We believe that if we succeed in rejecting “cheaters” in time, the performance of grammar induction will increase. Moreover, we introduce a methodology for designing Natural Language tasks based on an array of parameters. A further goal is an automatic sentence generator applied in the design process.

Keywords: Crowdsourcing, Spoken Dialogue Systems, Grammar Induction, Crowdsourcing task

Acknowledgments

Foremost, I would like to express my sincere appreciation and gratitude to my supervisor Alexandros Potamianos, for all I learned from him during this year. His continuous support and encouragement at all stages of this thesis, as well as his suggestions, ideas and comments, were invaluable. I would also like to thank my committee members Polychronis Koutsakis and Euripides Petrakis for reviewing my work. Next, I would like to thank all the people who contributed to the successful completion of this thesis: Giannis Klassinas for his help and guidance, Dr. Elias Iosif for his guidance, and the participants of the pilot evaluation for their time.

The completion of this thesis marks the end of an important part of my life, and I do not even know if I would be here today if I had not had some special people by my side.
So, I thank my best friends, whom I met in Chania, for the beautiful and carefree memories we shared and for being by my side in the difficult moments. My greatest thanks go to the people with whom I could talk about my problems and excitement, and share my worries and ideas: Konstantinos Maragos, Georgia Athanasopoulou and Christina Ioannou. Special thanks to Konstantinos for helping me think rationally. Last but not least, I have no words to thank my family, especially my parents, Eleftherios and Vasiliki, and my sister Anastasia, for believing in me and supporting my choices.

This work is dedicated to my grandparents, Ioannis and Elisavet Palogiannidi, who wanted so much to see me achieve this goal. Also, to my grandparents Ioannis and Anastasia Kontogianni who, unfortunately, cannot see me, but I know they would be full of pride.

“In God we trust. All others must bring data.”
W. Edwards Deming

Contents

List of Figures
List of Tables
List of abbreviations

1 Introduction
1.1 Motivation
1.2 Background
1.3 The task
1.4 Contribution
1.5 Thesis outline

2 Theoretical Background
2.1 Human Computer Interaction
2.2 Language modeling
2.3 Context-free Grammars
2.4 Natural Language Parsing
2.5 Summary

3 Previous Work
3.1 Spoken Dialogue Systems
3.1.1 Introduction
3.1.2 Characteristics of a SDS
3.1.3 Previous works on Spoken Dialogue Systems
3.1.4 Comparison with this thesis work
3.2 Crowdsourcing
3.2.1 Introduction
3.2.2 Crowdflower
3.2.3 Quality control
3.2.4 NLP tasks
3.3 Spoken Dialogue Systems & Crowdsourcing
3.3.1 Comparison with this thesis work
3.4 Grammar Induction
3.4.1 Introduction
3.4.2 Previous work in Grammar Induction
3.4.3 Comparison with this thesis work
3.5 Summary

4 Definition of Crowdsourcing tasks & Pilot study
4.1 Introduction to the design of Crowdsourcing tasks
4.2 Design of Crowdsourcing tasks
4.3 Pilot process
4.3.1 Method
4.3.2 Analysis of collected data
4.4 Conclusions
4.5 Summary

5 Design
5.1 User Interface
5.2 Design of the questions
5.2.1 Parameter definition
5.2.2 Sentences generation
5.3 Summary

6 Data Collection and Data Analysis
6.1 Collection process
6.2 Data analysis
6.2.1 Filters
6.2.2 Grammar Induction
6.2.3 Parser analysis
6.2.4 Perplexity
6.2.5 Design success metric
6.2.6 Meta-data analysis
6.3 Summary

7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work

A Initial Questions designed per task
B Distribution of the Parameters
C Crowdflower and UI

List of Figures

2.1 Model Human Processor
2.2 Venn diagram of languages in the Chomsky Hierarchy
2.3 Rules of the context-free grammar for the language L and context-free parse tree for “aabb”
2.4 Example of a part of an ABNF grammar
2.5 Hypothetical grammar and test text for a parser
2.6 Parse trees for the test sentences based on the hypothetical parser
3.1 Diagram of an SDS’s basic modules
3.2 Example of a finite-state automaton, from [37], that a DM can use
3.3 Auto-induced semantic classes system from [41]
3.4 Basic steps of the Grammar Induction algorithm
3.5 Example of agglomerative clustering
4.1 Abstract format of a Crowdsourcing task
4.2 The form of the tasks that were designed
5.1 Example of sentences at various levels
5.2 Example of sentences for various Noc values
6.1 F-measure for the initial corpus and the corpus after the flag filter, for various numbers of clusters
6.2 F-measure for domain corpora for various numbers of clusters
6.3 F-measure for task corpora for various numbers of clusters
6.4 F-measure for Qs corpora for various numbers of clusters
6.5 F-measure for Ql corpora for various numbers of clusters
6.6 F-measure for Noc corpora for various numbers of clusters
6.7 F-measure for Qp corpora for various numbers of clusters
B.1 Qs distribution for the “Answers” task
B.2 Ql distribution for the “Answers” task
B.3 Noc distribution for the “Answers” task
B.4 Qp distribution for the “Answers” task
B.5 Qs distribution for the “Prompts” task
B.6 Ql distribution for the “Prompts” task
B.7 Noc distribution for the “Prompts” task
B.8 Qp distribution for the “Prompts” task
B.9 Qs distribution for the “Paraphrasing” task
B.10 Ql distribution for the “Paraphrasing” task
B.11 Noc distribution for the “Paraphrasing” task
B.12 Qp distribution for the “Paraphrasing” task
B.13 Qpl distribution for the “Paraphrasing” task
B.14 Qep distribution for the “Paraphrasing” task
B.15 Qs distribution for the “Fill in” task
B.16 Ql distribution for the “Fill in” task
B.17 Noc distribution for the “Fill in” task
B.18 Qp distribution for the “Fill in” task
B.19 Qep distribution for the “Fill in” task
B.20 Df distribution for the “Free dialogues” task
C.1 “Answers” and “Prompts” task on pilot (a)
C.2 “Answers” and “Prompts” task on pilot (b)
C.3 “Answers” task on Crowdflower
C.4 “Prompts” task on Crowdflower
C.5 “Paraphrasing” task on pilot
C.6 “Paraphrasing” task on Crowdflower
C.7 “Free dialogues” task on pilot
C.8 “Free dialogues” task on Crowdflower
C.9 “Fill in” task on pilot
C.10 “Fill in” task on Crowdflower

List of Tables

3.1 Public information about crowdsourcing on Crowdflower
4.1 Examples of System Prompts and User Responses
4.2 Examples of grammar rules about date and departure city
4.3 Examples of Top-level prompts and corresponding responses
4.4 Corpora created after the pilot process
4.5 Ratio of reading instructions time to task completion time
4.6 Relevance metrics for the data collected from the pilot
4.7 Design success metric for the data collected from the pilot
4.8 Perplexity statistics for the various tasks using a unigram LM
4.9 Perplexity statistics for the various tasks using a bigram LM
5.1 Crowdsourcing tasks in decreasing freedom order
5.2 Tasks with their parameters
5.3 Assignment of parameter values to a freedom category
6.1 Corpora created from Crowdsourcing
6.2 Comparison of the corpus we collected and the corpora created after filtering
6.3 Parser success and percentages of garbage data and answers from flagged users
6.4 Grammar Induction performance metrics (using a subset of the grammar)
6.5 Grammar Induction performance metrics (using a subset of the grammar: only Date)
6.6 Parser analysis performance metrics using a subset of the grammar
6.7 Perplexity statistics for the various corpora, for a bigram LM
6.8 Correlation per task, based on perplexity, calculated between the data that we provide and the data provided by the contributors
6.9 Percentages of the answers that target collecting data belonging to each domain, per task
6.10 Data that belong to the Date and Depar-city domains and were provided as answers to questions targeted at the corresponding domains
6.11 Various statistics about contributors per task
6.12 Percentages of contributors that submitted a number of units per task within a specific range
A.1 Initial Questions for the “Answers” task
A.2 Initial Questions for the “Prompts” task
A.3 Initial Questions for the “Paraphrasing” task
A.4 Initial Questions for the “Fill in” task
A.5 Initial Questions for the “Free dialogues” task (a)
A.6 Initial Questions for the “Free dialogues” task (b)
C.1 Crowdsourcing tasks’ titles

List of abbreviations

SDS  Spoken Dialogue System
NLP  Natural Language Processing
SU   Speech Understanding
UI   User Interface
HIT  Human Intelligence Task
AMT  Amazon Mechanical Turk
MT   Machine Translation
LM   Language Model
OOD  Out of Domain
HCI  Human Computer Interaction
MHP  Model Human Processor