DEPARTMENT OF INFORMATION ENGINEERING AND COMPUTER SCIENCE
ICT International Doctoral School

CONTROLLING THE EFFECT OF CROWD NOISY ANNOTATIONS IN NLP TASKS

Azad Abad

Supervisor: Prof. Alessandro Moschitti

Università degli Studi di Trento
April 2017

Acknowledgment

In the beginning, I followed the same old story of doing my PhD with blind eyes, without thinking about why I needed a PhD and how this journey would change my life. During the first year of my PhD, the research activities did not go very smoothly, and I thought about quitting every weekend. I was completely frustrated and had almost abandoned my PhD when I got the opportunity to work with Prof. Alessandro Moschitti. I thank him very much, not only because he is a great supervisor but also because he is a good friend. I certainly could not have found my path without his support and mentoring over the last four years. (I started to like doing research because of you. Thank you very much, Alessandro, for those simple life lessons you used to share with me.)

The University of Trento provided me with all possible help and support under the management of Prof. Nicu Sebe. I would like to thank him, as well as the two great PhD office secretaries, Andrea and Francesca, for their wonderful support.

I would like to thank my friends and colleagues, especially Moin, Arindam, Evgeny, and Orkan, for enriching my life and giving me professional advice and critical comments. I would like to thank my family and my wife Saameh, who always encouraged me to stay and fight, giving me both support and freedom.

Now the journey is over, and what I can see behind me is not regret but the footsteps of wisdom and expertise. Ultimately, learning how to learn can make me stronger and a better person, able to serve my family and society in the future.

Abstract

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence and Linguistics that studies problems in the automatic generation and understanding of natural language.
It involves identifying and exploiting linguistic rules and variation, through code, to translate unstructured language data into structured information. Empirical methods in NLP employ machine learning techniques to automatically extract linguistic knowledge from large textual corpora instead of hard-coding the necessary knowledge. Such intelligent machines require the input data to be prepared in a way that lets the computer more easily find patterns and draw inferences. This is achieved by adding relevant metadata to a dataset: any metadata tag used to mark up elements of the dataset is called an annotation over the input. For the algorithms to learn efficiently and effectively, the annotation of the data must be accurate and relevant to the task the machine is being asked to perform. In other words, supervised machine learning methods cannot intrinsically handle inaccurate and noisy annotations, and the performance of the learners is highly correlated with the quality of the input data labels. Hence, the annotations have to be prepared by experts. However, collecting labels for a large dataset is impractical for a small group of qualified experts, or impossible when experts are unavailable. This is especially crucial for recent deep learning methods, whose algorithms are hungry for large amounts of supervised data. Crowdsourcing has emerged as a new paradigm for obtaining labels for training machine learning models inexpensively and at large scale. The rationale behind this concept is to harness the "wisdom of the crowd", where groups of people pool their abilities to exhibit collective intelligence. Although crowdsourcing is cheap and fast, collecting high-quality data from the non-expert crowd requires careful attention to task quality control.
The quality control process consists of selecting appropriately qualified workers, providing clear instructions or training that are understandable to non-experts, and sanitizing the results to reduce noise in the annotations or eliminate low-quality workers. This thesis is dedicated to controlling the effect of noisy crowd annotations used for training machine learning models in a variety of natural language processing tasks, namely relation extraction, question answering, and recognizing textual entailment.

The first part of the thesis deals with designing a benchmark for evaluating Distant Supervision (DS) for the relation extraction task. We propose a baseline that involves training a simple yet accurate one-vs-all SVM classifier. Moreover, we exploit an automatic feature extraction technique based on convolutional tree kernels and study several example-filtering techniques for improving the quality of the DS output.

In the second part, we focus on the problem of noisy crowd annotations in training models for two important NLP tasks, i.e., question answering and recognizing textual entailment. We propose two learning methods to handle the noisy labels: (i) taking into account the disagreement between crowd annotators, as well as their skills, to weight instances in the learning algorithms; and (ii) learning an automatic label selection model that jointly combines annotator characteristics and a syntactic structure representation of the task as features.

Finally, we observe that in fine-grained tasks like relation extraction, where the annotators need deeper expertise, training the crowd workers has more impact on the results than simply filtering out low-quality workers. Training crowd workers usually requires high-quality labeled data (namely, a gold standard) to provide instruction and feedback to the workers.
We conversely introduce a self-training strategy for crowd workers, in which the training examples are automatically selected via a classifier. Our study shows that even without using any gold standard, we can still train workers, which opens the door toward an inexpensive crowd-training procedure for different NLP tasks.

Thesis Committee

Themistoklis Palpanas, Professor, Paris Descartes University
Fabio Massimo Zanzotto, Associate Research Professor, Università degli Studi di Roma Tor Vergata
Paolo Rosso, Associate Research Professor, Universitat Politècnica de València

Keywords

Crowdsourcing; Distant Supervision; Relation Extraction; Question Answering; Recognizing Textual Entailment; Crowd Self-Training; Human Noisy Annotations; NLP; Kernel Methods

Contents

1 Introduction 1
  1.1 Motivations 1
  1.2 Thesis Contributions 6
  1.3 Structure of the Thesis 7
  1.4 List of Publications 9

2 Background Work and Concepts 11
  2.1 Supervised Methods 12
    2.1.1 Supervised Learning in NLP Tasks 13
      Classification 13
      Re-Ranking 14
    2.1.2 Support Vector Machines 15
  2.2 Relation Extraction 19
    2.2.1 Definition 19
    2.2.2 Related Work on Relation Extraction 20
      Supervised Methods 21
      Semi-Supervised Methods 22
      Self-Supervised Methods 22
    2.2.3 Distant Supervision 23
      Related Work on DS 24
      Datasets 26
  2.3 Question Answering 27
    2.3.1 Definition 27
    2.3.2 Related Work on QA 28
    2.3.3 Dataset 29
  2.4 Recognizing Textual Entailment 29
    2.4.1 Definition 29
    2.4.2 Related Work on RTE 30
    2.4.3 Dataset 31
  2.5 Evaluation Metrics 31
    2.5.1 Precision 32
    2.5.2 Recall 32
    2.5.3 F1 Measure 32