Irony and Sarcasm Detection in Twitter: The Role of Affective Content

PhD Candidate: Delia Irazú Hernández Farías

Thesis Advisors:
Paolo Rosso, Universitat Politècnica de València, Spain
Viviana Patti, Università degli Studi di Torino, Italy

Valencia, September 2017

Dipartimento di Informatica
PhD Programme in Computer Science (Dottorato di ricerca in Informatica), Cycle XXX
PhD Coordinator: Marco Grangetto
Scientific-disciplinary sector (Settore scientifico-disciplinare di afferenza): INF/01

Acknowledgments

I would like to express my most sincere gratitude to all of those who have made this work possible. Firstly, to my advisors, Paolo Rosso and Viviana Patti: without their help, it would not have been possible to conclude this thesis. Thanks a lot for all the time dedicated to our interesting and fascinating research topic #withoutsarcasm :D.

Paolo, thank you for all the opportunities you have given me for more than five years. Thanks a lot for encouraging me to be a better PhD student and also for all your advice and patience. I am very thankful for your help and support during these years. I just want to say this in all the languages I speak: thank you! grazie! gracias!

Viviana, thank you so much for all the support and help that you’ve given me. I really appreciate that you involved me in different projects. Thank you for inviting me to spend part of my PhD in a beautiful city such as Torino (giving me the opportunity to learn a new language: Italian). Sinceramente, Grazie mille!

I’m really thankful to the reviewers of this thesis, Rachel Giora, Horacio Saggion, and Pavel Braslavski: thanks for your valuable comments about my thesis.
Thank you very much to the members of the evaluation tribunal of this thesis: Horacio Saggion, Elisabetta Fersini, and Roberto Basili.

Thank you so much to Universitat Politècnica de València (UPV) and Università degli Studi di Torino (UniTo) for all the facilities and support provided to me, and also to the people in the Pattern Recognition and Human Language Technology (PRHLT) research center. Thanks to all the people from different countries and cultures who shared some time in the laboratory at UPV with me. A special mention goes to Maite: thank you so much for the time and experiences we shared during this period: moltes gràcies!

I also want to say GRAZIE to the people at UniTo, especially to Emilio Sulis, Cristina Bosco, and Mirko Lai (who learned to speak his own version of Spanish with me).

Thank you to all the people who have shared not only good moments (and also bad and stressful ones) but also their lives with me in Valencia and Torino. Thanks to my grandfather, aunts, cousins, and friends in Mexico for always having words of encouragement for me.

Last but not least, I would like to say thank you to the most important people in my life: my mom and my brother. Thank you for always being there supporting, helping, and encouraging me no matter the distance. Mami: Thank you so much for taking care of us and also for always having a smile even in rough times.

Delia Irazú Hernández Farías
València, July 2017.

Funding

This work has been funded by the National Council for Science and Technology (CONACyT - Mexico) under Grant No. 218109/313683. Part of the research was carried out in the framework of the SomEMBED TIN2015-71147-C2-1-P MINECO project.

Abstract

Investigating how people express themselves in social media has attracted the attention of several disciplines due to the great potential for research that it represents.
Social media platforms, like Twitter, offer a face-saving ability that allows users to express themselves employing figurative language devices such as irony to achieve different communication purposes. Ironic utterances on such platforms are generated by users who most of the time have only an intuitive definition of what irony is. Dealing with this kind of content represents a big challenge for computational linguistics. Irony is closely associated with the indirect expression of feelings, emotions and evaluations, intended as the writer’s attitude or stance towards a particular target entity involved in the ironic utterance. Thus, interest in detecting the presence of irony in social media texts has grown significantly in recent years, also because of its impact on natural language processing (NLP) areas related to sentiment analysis, where irony detection is important to avoid misinterpreting ironic statements as literal.

In this thesis, we introduce the problem of detecting irony in social media from a computational linguistics perspective. We propose to address this task by focusing, in particular, on the role of affective information for detecting the presence of this figurative language device.

Attempting to take advantage of the subjective intrinsic value enclosed in ironic expressions, we present a novel model, called emotIDM, for detecting irony relying on a wide range of affective features. For characterising an ironic utterance, we used an extensive set of resources covering different facets of affect, from sentiment to finer-grained emotions. We address irony detection by casting it as a binary classification problem.
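To make the binary classification framing concrete, the following minimal sketch maps each tweet to a small affective feature vector and trains a perceptron on it. It is an illustration only and not emotIDM itself: the lexicon `TOY_POLARITY`, the three features, the choice of a perceptron, and the four example tweets are all assumptions introduced here.

```python
import re

# Toy polarity lexicon: hypothetical entries, NOT one of the resources used in the thesis.
TOY_POLARITY = {
    "love": 1.0, "great": 1.0, "wonderful": 1.0,
    "hate": -1.0, "awful": -1.0, "stuck": -1.0,
}


def affective_features(tweet):
    """Map a tweet to [positive mass, negative mass, contrast flag]."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    scores = [TOY_POLARITY[t] for t in tokens if t in TOY_POLARITY]
    pos = sum(s for s in scores if s > 0)
    neg = -sum(s for s in scores if s < 0)
    # The contrast flag fires when positive and negative words co-occur.
    contrast = 1.0 if pos > 0 and neg > 0 else 0.0
    return [pos, neg, contrast]


def train_perceptron(samples, labels, epochs=10):
    """Train a plain perceptron; labels are 1 (ironic) or 0 (non-ironic)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            error = y - (1 if score > 0 else 0)
            weights = [w + error * v for w, v in zip(weights, x)]
            bias += error
    return weights, bias


def predict(model, tweet):
    """Return 1 if the model labels the tweet as ironic, else 0."""
    weights, bias = model
    x = affective_features(tweet)
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


# Invented training tweets, for illustration only.
TRAIN = [
    ("I love being stuck in traffic", 1),
    ("Great, another awful meeting", 1),
    ("I love this wonderful song", 0),
    ("I hate this awful weather", 0),
]
MODEL = train_perceptron([affective_features(t) for t, _ in TRAIN],
                         [label for _, label in TRAIN])
```

On this toy data the only feature that separates the two classes is the co-occurrence of positive and negative words; a realistic system would rely on much richer affective resources and a proper learning library.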
To evaluate our model, we collected a set of Twitter corpora used by scholars in previous research. These serve as benchmarks with a two-fold purpose: to compare the performance of our model against other state-of-the-art approaches, and to evaluate its robustness across several aspects related to the characteristics of the corpora, such as collection mode, size and degree of imbalance. Results show that emotIDM performs competitively across the experiments carried out, validating the effectiveness of the proposed approach. In most cases, our results outperform those of related work, confirming that affective information helps in distinguishing between ironic and non-ironic tweets.

Another objective of the thesis is to investigate the differences among tweets labeled with #irony and #sarcasm. Our aim is to contribute to a less investigated topic in computational linguistics, the separation between irony and sarcasm in social media, again with a special focus on affective features. We also studied a less explored hashtag that has been used by scholars for collecting samples of sarcastic intention: #not. We find data-driven arguments on the differences among tweets containing these hashtags, suggesting that they are used to refer to different figurative language devices. We identify promising features based on affect-related phenomena for discriminating among different kinds of figurative language devices, and our classification results outperform the state of the art. We also analyse the role of polarity reversal in tweets containing ironic hashtags, observing that the impact of this phenomenon varies. In tweets labeled with #sarcasm there is often a full reversal (from one polarity to its opposite, almost always from positive to negative), whereas in those tagged with #irony there is an attenuation of the polarity (mostly from negative to neutral).
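The distinction between a full reversal and an attenuation can be sketched as a comparison between a tweet's surface (literal) polarity and its intended polarity. The snippet below is a hedged illustration under our own assumptions: the function names, the three-way label set, and the rows in `TOY_TWEETS` are hypothetical and do not reproduce the data or the analysis procedure of the thesis.

```python
from collections import Counter


def polarity_shift(literal, intended):
    """Label the change between surface polarity and intended polarity.

    Both arguments are one of "positive", "negative" or "neutral".
    """
    if literal == intended:
        return "none"
    if {literal, intended} == {"positive", "negative"}:
        return "full reversal"
    if intended == "neutral":
        return "attenuation"
    return "other"


# Hypothetical (hashtag, literal polarity, intended polarity) rows,
# invented for illustration; they are not results from the thesis.
TOY_TWEETS = [
    ("#sarcasm", "positive", "negative"),
    ("#sarcasm", "positive", "negative"),
    ("#irony", "negative", "neutral"),
    ("#irony", "negative", "negative"),
]


def shift_counts(rows):
    """Count polarity-shift labels separately for each hashtag."""
    counts = {}
    for hashtag, literal, intended in rows:
        counts.setdefault(hashtag, Counter())[polarity_shift(literal, intended)] += 1
    return counts
```

Calling `shift_counts(TOY_TWEETS)` groups the shift labels by hashtag, which is the kind of per-hashtag tally one would inspect to compare #sarcasm with #irony.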
Detecting irony in user-generated content could have a broad range of applications. Undoubtedly, one of the areas that can benefit most from irony detection is sentiment analysis. We analyse the impact of irony and sarcasm on sentiment analysis, observing a drop in the performance of NLP systems developed for this task when irony is present. Therefore, we explored the possible use of our findings in irony detection for the development of an irony-aware sentiment analysis system, assuming that the identification of ironic content could help to improve the correct identification of sentiment polarity. To this end, we incorporated emotIDM into a pipeline for determining the polarity of a given Twitter message. We compared our results with the state of the art determined by the ‘SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter’ shared task, demonstrating the relevance of considering affective information together with features that signal the presence of irony when performing sentiment analysis of figurative language in this kind of social media text.

To summarize, we demonstrated the usefulness of exploiting different facets of affective information for dealing with the presence of irony in Twitter.
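The idea of an irony-aware sentiment pipeline can be sketched as a polarity scorer whose output is adjusted whenever an irony detector fires. The example below is a simplified sketch, not the pipeline built in the thesis: the lexicon `TOY_POLARITY`, the hashtag-based stand-in detector, and the sign-flip rule are all assumptions made for illustration, and a real system would plug a trained classifier in place of `hashtag_irony_detector`.

```python
import re

# Toy polarity lexicon: hypothetical entries, not a resource from the thesis.
TOY_POLARITY = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}

# Hashtags discussed in the abstract as markers of ironic intent.
IRONY_HASHTAGS = {"#irony", "#sarcasm", "#not"}


def lexicon_polarity(tweet):
    """Sum the toy polarity scores of the words in the tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return sum(TOY_POLARITY.get(t, 0.0) for t in tokens)


def hashtag_irony_detector(tweet):
    """Stand-in detector: flags tweets carrying an explicit ironic hashtag."""
    hashtags = re.findall(r"#\w+", tweet.lower())
    return any(tag in IRONY_HASHTAGS for tag in hashtags)


def irony_aware_polarity(tweet, is_ironic=hashtag_irony_detector):
    """Score the tweet, flipping the sign when the detector reports irony."""
    score = lexicon_polarity(tweet)
    return -score if is_ironic(tweet) else score
```

For instance, `irony_aware_polarity("I love Mondays #sarcasm")` yields a negative score while the same text without the hashtag yields a positive one. Flipping the sign is a deliberate simplification: since the polarity of #irony tweets tends to be attenuated rather than fully reversed, a less crude adjustment would treat the two cases differently.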