Using Twitter data to provide qualitative insights into pandemics and epidemics

By: Wasim Ahmed

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

The University of Sheffield
Faculty of Social Sciences
Information School

January 2018

Abstract

Background: One area of public health research specialises in examining public views and opinions surrounding infectious disease outbreaks. Although interviews and surveys are valid sources of this information, the views and opinions they capture are necessarily generated by the context rather than offered spontaneously. As such, social media has increasingly been viewed as a legitimate source of pragmatic, unfiltered public opinion.

Objectives: This research attempts to better understand how users converse about infectious disease outbreaks on the social media platform Twitter. The study was undertaken to address a gap in knowledge: previous empirical studies that have analysed infectious disease outbreaks on Twitter have focused on quantitative methods as the primary form of data analysis. After analysing individual cases on Ebola, Zika, and swine flu, the study performs an important comparison of the types of discussions taking place on Twitter and is the first empirical study to do so.

Methods: A number of pilot studies were initially designed and conducted in order to help inform the main study. The study then manually labels tweets on infectious disease outbreaks with the assistance of the qualitative analysis programme NVivo, and analyses them using the Health Belief Model, concepts from information theory, and a number of sociological principles. The data were purposively sampled according to when Google Trends data showed a heightened interest in the respective outbreaks, and a case study approach was utilised.
Results: A substantial number of themes were uncovered which were not reported in previous literature, demonstrating the potential of qualitative methodologies for extracting greater insight into public health opinions from Twitter data. The study noted several limitations of Twitter data for use in qualitative research. However, results demonstrated the potential of Twitter to identify discussions around infectious diseases that might not emerge in an interview and/or which might not be included in a survey.

Acknowledgements

There are a number of people who have contributed to and supported me through this research process, with particular thanks to:

My friends and family, who have been a source of endless support through the research process. Especially my nephews, who helped take my mind off the thesis when it was needed.

My supervisors Peter Bath, Laura Sbaffi, and Gianluca Demartini, for their continual support and guidance, without whom this project would not have been possible.

Colleagues and friends across the Information School and the University of Sheffield, and in particular tutors on the Researching Social Media module which I undertook in my first year.

The Faculty of Social Sciences at the University of Sheffield for providing a scholarship, and my father Iftikhar Ahmed for providing additional financial and moral support.

Those who kindly volunteered their time for the inter-coder reliability process, Dimitrinka Atanasova and Marc Bonne, and also Nisar Ahmed for the company during the long hours in the library.

Those who trusted me to talk at events and write articles on social media research.

All my A-Level teachers, who supported me when I was predicted grades of DEE (because of the area I lived in and my previous grades). I would go on to achieve results of AAB. In particular, I would like to thank Janet Pike, my Psychology A-Level teacher, for much inspiration during that time and for keeping in touch over the years.
REFEREED PUBLICATIONS ARISING FROM THIS WORK

Ahmed, W., Bath, P.A., Sbaffi, L., & Demartini, G. (2018). Moral Panic through the Lens of Twitter: An Analysis of Infectious Disease Outbreaks. ACM Digital Library. 9th International Conference on Social Media & Society. Copenhagen, Denmark, July 18-20, 2018.

Ahmed, W., Bath, P.A., Sbaffi, L., & Demartini, G. (2018). Measuring the Effect of Public Health Campaigns on Twitter: The Case of World Autism Awareness Day. Lecture Notes in Computer Science 2018. iConference Proceedings. Sheffield, United Kingdom, March 2018.

Ahmed, W., Bath, P.A., Sbaffi, L., & Demartini, G. (2018). Using Twitter for Insights into the 2009 Swine Flu and 2014 Ebola outbreaks. IDEALS Proceedings. iConference Proceedings. Sheffield, United Kingdom, March 2018.

Ahmed, W., Bath, P.A., & Demartini, G. (2017). Using Twitter as a data source: An overview of ethical challenges. Advances in Research Ethics and Integrity (Eds). Emerald Books.

Ahmed, W., Demartini, G., & Bath, P.A. (2017). Topics Discussed on Twitter at the Beginning of the 2014 Ebola Epidemic in United States. iConference proceedings 2017. Wuhan, China, March 2017.

Ahmed, W., & Bath, P.A. (2015). The Ebola epidemic on Twitter: challenges for health informatics. The Seventeenth International Symposium for Health Information Management proceedings (pp 289-289). York, 24 June 2015 – 26 June 2015.

Ahmed, W., & Bath, P.A. (2015). Comparison of Twitter APIs and tools for analysing Tweets related to the Ebola Virus Disease. iFutures proceedings. Sheffield, 07 July 2015.

KEY NON-PEER REVIEWED PUBLICATIONS

Ahmed, W. (2017). Review of the book The ethics of memory in a digital age: interrogating the right to be forgotten, by Ghezzi, A., Pereira, Â., & Vesnic-Alujevic, L. (Eds.). Information, Communication & Society. http://dx.doi.org/10.1080/1369118X.2017.1362456.

Ahmed, W.
(2017). Using Twitter as a data source: An overview of current social media research tools (Updated for 2017). LSE Impact of Social Sciences blog. http://blogs.lse.ac.uk/impactofsocialsciences/2017/05/08/using-twitter-as-a-data-source-an-overview-of-social-media-research-tools-updated-for-2017 8 May 2017 (Accessed 24/06/2017).

Ahmed, W. (2015). Challenges of using Twitter as a data source: An overview of current resources. LSE Impact of Social Sciences blog. http://blogs.lse.ac.uk/impactofsocialsciences/2015/09/28/challenges-of-using-twitter-as-a-data-source-resources/ 28 September 2015 (Accessed 10/10/2015).

Ahmed, W. (2015). Using Twitter as a data source: An overview of current social media research tools. LSE Impact of Social Sciences blog. http://blogs.lse.ac.uk/impactofsocialsciences/2015/07/10/social-media-research-tools-overview/ 10 July 2015 (Accessed 11/07/2015).

SELECTION OF INVITED TALKS

Ahmed, W. (2017). Keynote Talk – Gaining Powerful Insights into Social Media Listening. Boston University College of Communication. Making Social Media Data Matter. 20th Oct 2017.

Ahmed, W., & Bath, P.A. (2017). Ethical Challenges of Social Media Data: Insights from Academia and Industry. OAI10 – CERN – UNIGE Workshop on Innovations in Scholarly Communication, Geneva, Switzerland.

Ahmed, W. (2017). Communicating Science Through Social Media: Tools and Techniques. Society of Spanish Researchers in the UK. Sheffield, 13th May 2017.

Ahmed, W. (2017). An Introduction to Social Network Analysis (Methodology) with NodeXL. Leeds Beckett University, Leeds, 9th Jan 2017. (In association with the British Sociological Association).

Ahmed, W. (2017). Ethical Challenges of Using Social Media Data in Research. Bite Size Guide to Research in the 21st Century. ScHARR, Sheffield, 24th Jan 2017.

Ahmed, W. (2016). Insights from Social Media. Creative Entrepreneur, Media City (ITV, BBC HQs), Salford. 23 November 2016.

Ahmed, W. (2016). Introduction to NodeXL.
An Introduction to Tools for Social Media Research. London, 11th October 2016.

Ahmed, W. (2016). Social Media Analytics. Department for Work and Pensions (DWP), Social Media Research Seminars. London, 4th August 2016.

Ahmed, W. (2016). Workshop on Twitter Analytics at Contemporary Issues in Economy & Technology (CIET; workshop for Nestle Adriatic). Split, 16-18th June 2016.

SELECTION OF SOCIAL MEDIA WORKSHOPS AND CONFERENCE PRESENTATIONS

Ahmed, W. (2017). Theoretical and Practical Foundations of Social Media Research (Keynote Talk). Gaining Powerful Insights into Social Media Listening: A Summer School. Vodice, Croatia. 28th June 2017.

Ahmed, W. (2017). One Minute Madness Poster Session. OAI10 – CERN – UNIGE Workshop on Innovations in Scholarly Communication, Geneva, Switzerland.

Ahmed, W. (2017). Social Media: A Practical Approach. Researching Social Media: A Theoretical and Practical Approach. Sheffield. 30th May 2017.

Ahmed, W. (2017). Topics Discussed on Twitter at the Beginning of the 2014 Ebola Epidemic in United States. Information School Seminar Series. Sheffield, 7th March 2017.

Ahmed, W. (2017). Social Networks and Infectious Disease Outbreaks: The Case of Swine Flu and Ebola. PubhD, Sheffield, 1st Feb 2017.

Ahmed, W. (2017). Interview with Rony Robinson on the baring all feature. BBC Radio Sheffield, Sheffield, 19th Jan 2017.

Ahmed, W., Demartini, G., & Bath, P.A. (2016). Using Twitter as a data source: An overview of ethical challenges. Ethics and Social Media Conference. London, 21st March 2016.

Ahmed, W., & Bath, P.A. (2016). The Role of Social Media for Humanitarian Assistance and Disaster Management. Sheffield Institute for International Development (SIID) 7th Annual Postgraduate Conference. Sheffield. 7th-8th April 2016.

Ahmed, W. (2015). Introduction to software that can be used to capture and analyse Twitter data. Audio lecture on Master of Social Science, Western Sydney University, 27th August 2015.

Ahmed, W. (2015).
The Ebola epidemic on Twitter: challenges for health informatics. Information School, Alumni Reunion Event. 25 November 2015.

Table of Contents

Chapter 1 Introduction
1.1 Introduction
1.2 Background
1.2.1 Swine Flu
1.2.2 Ebola virus
1.2.3 Zika Virus
1.2.4 Social Media and Infectious Disease Outbreaks
1.3 Rationale for Study
1.4 Research Aim and Objectives
1.5 Research Questions
1.6 Outcomes of PhD
1.7 Personal motivation for this research
1.8 Thesis Structure
1.9 Summary
Chapter 2 Literature Review
2.1 Introduction
2.2 Identifying Literature
2.2.1 Sources used
2.2.2 Search Terms and Strategy
2.3 Structure of Literature Review
2.4 Information and Related Concepts
2.4.1 Definitions of Information Relevant to this Study
2.4.2 Information
2.4.3 Information Needs
2.4.4 Information Behaviour
2.4.5 Information Seeking
2.5 Social Cognition Models Relevant to this Study
2.5.1 Protection Motivation Theory
2.5.2 Self-efficacy Theory (SET)
2.5.3 Health Belief Model (HBM)
2.6 Sociological Concepts
2.6.1 Moral Panic
2.6.2 Outbreak Narrative
2.7 Internet as a Source of Health Information
2.7.1 Twitter
2.8 Health Related Research on Twitter
2.9 Infectious Diseases
2.10 Crisis Communication using Social Media Data
2.11 Studies using In-Depth Qualitative Methods to Analyse Tweets
2.12 Summary Table
2.13 Synthesis of Literature
2.14 Limitations of Existing Research
2.15 Lack of Studies and Knowledge Gaps
2.16 Summary

Chapter 3 Methodology
3.1 Introduction
3.2 Research Paradigms
3.3 Deductive and Inductive Reasoning
3.4 Research Strategies and Research Design
3.5 Overview of Research Dimensions
3.6 Overview of Data Sources Used in the Study
3.6.1 Twitter as Source of Data
3.7 Overview of Data Analysis Techniques
3.7.1 Content Analysis
3.7.2 Thematic Analysis
3.7.3 Social Network Analysis
3.7.4 Sentiment Analysis
3.7.5 Machine Learning
3.8 Quality of Research
3.8.1 Intercoder Reliability
3.8.2 Test-Retest Reliability
3.9 Application Programming Interfaces
3.10 Comparison of Twitter Application Programming Interfaces (APIs)
3.11 Pilot Study on World Autism Awareness Day
3.12 Pilot Study on Ebola Tweets
3.12.1 Selection of Pilot Study Data
3.12.2 Data Extraction Strategy
3.12.3 Data Inclusion and Exclusion Strategy
3.12.4 Coding Frames
3.12.5 Intercoder Reliability Process
3.12.6 Results of Tweet Coding
3.12.7 Discussion
3.12.8 Conclusions
3.13 Ethical, privacy and Copyright Issues
3.13.1 To what extent is user-generated content public?
3.13.2 Copyright on Tweets
3.13.3 Potential Participants
3.13.4 Informed Consent
3.13.5 Potential harm to participants and Data Confidentiality
3.13.6 Data Storage
3.13.7 Summary
3.14 Data Gathering and Filtering Strategies
3.14.1 Selecting Swine Flu Data
3.14.2 Selecting Ebola Data
3.14.3 Selecting Zika Data
3.14.4 Justification of Data Selection
3.14.5 Filtering Data
3.15 How Data Were Analysed
3.15.1 Thematic Analysis
3.16 Validity and Reliability
3.16.1 Intercoder Reliability
3.16.2 Test-retest reliability
3.17 Comparison of Cases
3.18 Summary