The Content Analysis Guidebook
Second Edition

To my family—Bob, Dorian, and Quinn, all of whom contributed to the 2nd edition in their own way

In memoriam—In memory of my colleague and friend Paul D. Skalski, PhD, whose contributions were many and whose spirit will never fade. His substantial abilities, enthusiasm, and support were essential to both editions of this book.

Kimberly A. Neuendorf
Cleveland State University

FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London, EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Copyright © 2017 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or used in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Names: Neuendorf, Kimberly A., author.
Title: The content analysis guidebook / Kimberly A. Neuendorf, Cleveland State University, USA.
Description: Second edition. | Los Angeles : SAGE, [2017] | Earlier edition: 2002. | Includes bibliographical references and index.
Identifiers: LCCN 2015044657 | ISBN 9781412979474 (pbk. : alk. paper)
Subjects: LCSH: Sociology—Research—Methodology. | Content analysis (Communication)
Classification: LCC HM529 .N47 2017 | DDC 301.01—dc23
LC record available at http://lccn.loc.gov/2015044657

This book is printed on acid-free paper.

Acquisitions Editor: Karen Omer
Editorial Assistant: Sarah Dillard
Production Editor: Libby Larson
Copy Editor: Amy Harris
Typesetter: C&M Digitals (P) Ltd.
Proofreader: Jennifer Grubba
Indexer: Jeanne R. Busemeyer
Cover Designer: Candice Harman
Marketing Manager: Amy Lammers

Resource 1
CATA—Computer-Aided Text Analysis Options

by Kimberly A. Neuendorf, Jeffery “Phoenix” Allen, Paul D. Skalski, and Julie A. Cajigas

This resource provides information about quantitative computer content analysis software. Virtually all computer-driven content analyses are computer-aided text analysis (CATA), despite years of promises of computer analyses of the static or moving image. Table R1.1 lists a variety of CATA programs and highlights key features of each. Additional information about each program is included in Part I, which follows. The origins of this list began with the work of Popping (1997), Evans (1996), Alexa and Zuell (1999), and a number of web site authors who have, over the years, compiled lists of quantitative text analysis software (e.g., Harald Klein, Matthias Romppel). Part II of this Resource focuses on one basic, useful freeware text analysis program, Yoshikoder.

Although Table R1.1 contains a sampling of some of the most interesting and most widely used programs currently available, it is not comprehensive. The companion web site to this book (The Content Analysis Guidebook Online, or CAGO, see Resource 2) presents a more complete list, including newer programs that emerge, older “orphaned” programs that have not been updated or lack support (including some featured in the first edition of this book), additional qualitative content analysis software, and programs that simply assist in the coding process for audio and video content, with links to appropriate web sites. We also recommend Matthias Romppel’s web site, Content-Analysis.de.

All of the programs featured in this resource are capable of analyzing English-language texts, while some can also accommodate additional languages; this is noted in Table R1.1.
Table R1.1 CATA Software Options

Program | Platform | Freeware | Demo | Cases | Word Descriptives | KWIC or Concordance | Dictionaries | Multiple Languages | Emergent Coding | Graphical Presentation
CATPAC II | PC (32 BIT ONLY) | NO | NO | M | YES | NO | NO | YES | YES | YES
Concordance 3.3 | PC (UP TO VISTA) | NO | YES | M | YES | YES | INT/CU | YES | NO | YES
Diction 7 | PC/MAC | NO | YES | M | YES | NO | INT/CU | NO | NO | NO
Hamlet II 3.0 | PC/LINUX/UNIX | NO | YES | M | YES | YES | CU | YES | NO | YES
LIWC2015 | PC/MAC | NO | Online demo | M | YES | NO | INT/CU | YES | NO | NO
MCCALite for Windows | PC | YES | NA | M | YES | YES | INT | NO | NO | YES
PCAD | PC | NO | NO | M | NO | NO | INT | NO | NO | NO
Profiler Plus | Online & PC | NO | YES* | S | NO | NO | INT/CU | YES | YES | YES
SALT 2012 | PC/MAC | NO | YES | S | YES | NO | INT | YES | NO | NO
SentiStrength | PC/MAC/JAVA | YES | NA | S | NO | NO | INT | YES | NO | NO
TextAnalyst | PC | NO | YES* | S | YES | NO | NO | NO | YES | YES
Text Analytics for Surveys 4.0.1 (IBM SPSS) | PC | NO | YES | S | NO | NO | CU | YES | YES | YES
TEXTPACK | PC | YES | NA | S | YES | YES | CU | YES | NO | NO
TextQuest 4.2 | PC/MAC | NO | YES | S | YES | YES | CU | YES | NO | NO
T-LAB Pro 9.1.3 | PC | NO | YES | S | YES | YES | NO | YES | YES | YES
WordSmith 6.0 | PC/MAC | NO | NO** | M | YES | YES | NO | YES | NO | NO
WordStat 7.0 (Runs with SimStat only) | PC/(MAC & LINUX WITH ADD-ONS) | NO | YES | M | YES | YES | INT/CU | YES | YES | YES
Yoshikoder | PC | YES | NA | M | NO | NO | INT/CU | NO | NO | NO

* With special caveats (e.g., noncommercial use only)
** Refundable

NOTES:
Platform = Computer system(s) required for the program
Freeware = Indicates whether the program is available for free
Demo = Indicates whether a preview or demonstration version of the program is available on a limited basis
Cases = Indicates the number of text cases (or files) that can be processed simultaneously (S = single, M = multiple); note that usually multiple cases must be presented separately in multiple files
Word Descriptives = Indicates whether some type of word descriptives are provided by the program, such as word frequency output, alphabetical word listings, and so forth
KWIC or Concordance = Indicates whether the program provides key word in context (KWIC) and/or some type of concordance output
Dictionaries = Indicates whether internal (“standard” or “built-in”) dictionaries are provided by the program, whether custom (user-created) dictionaries are accommodated by the program, or both (INT = internal, CU = custom)
Multiple Languages = Denotes whether at least one language other than English is accommodated by the program
Emergent Coding = Indicates whether the program allows for emergent coding—that is, some type of analysis that is not dependent on dictionary-based searches, but rather uses word counts and/or co-occurrences to create emergent patterns
Graphical Presentation = Indicates whether the program provides some type of graphical presentation of its output or findings

Part I. Computer-Aided Text Analysis (CATA) Programs

The annotated listing that follows provides a capsule description for each program itemized in Table R1.1. The listing contains (a) a brief description of the software, (b) examples of one or two good applications of the software that demonstrate the key features of the program, (c) the developer(s) of the software, and (d) recommended references, reporting either on the program itself or reporting on research for which the program was used. Further information about each program may be found at the CAGO web site. Given that our students have used most of these programs on assignments, some examples of their applications, including images of the program interfaces and sample outputs, may also be found at the CAGO site.

The Yoshikoder program receives special attention in Part II of this Resource for several reasons.
First, it performs all basic CATA functions, making it a good vehicle via which to learn the typical process and principal functions of computer text analysis. Second, the program provides options for the use of both standard, internal dictionaries and user-created, custom dictionaries. Third, the software is available for free online, generously provided by author Will Lowe. For beginners to computer text analysis, we recommend Yoshikoder as a tool for getting a feel for the techniques of CATA. And the program’s flexibility makes it a prime option for actual research applications as well.

Key CATA programs, listed in alphabetical order, are the following:

CATPAC II

Description. CATPAC II, part of the Galileo suite of programs, reads text files (.txt only) and performs analyses such as simple word counts, cluster analysis (with icicle plots), and interactive neural cluster analysis in order to produce a variety of outputs, ranging from simple descriptives (e.g., word and alphabetical frequencies) to graphical summaries of the main ideas in a text. CATPAC employs a “self-organizing artificial neural network” to identify the most frequently occurring words in a text and determine patterns of similarity based on co-occurrence within a moving window that runs across the text. A companion program in the Galileo suite, Thought View, can generate two- and three-dimensional concept maps based on the results of a CATPAC analysis. One notable and unique feature of Thought View allows users to view the results through color anaglyph glasses (the ones with red and cyan lenses) and experience MDS-style output in stereoscopic 3-D! Advancements in the Galileo world include Wölfpak, a variation of CATPAC coded in Unicode so that it can analyze any language, and Listiac, a facility for extracting commonality patterns across lists.

Application.
Li and Rao (2010) used CATPAC to compare how news about the 2008 earthquake in China was disseminated via mainstream media channels versus microblogging in terms of timeliness, quality of reports, and whether microblogging could replace traditional sources or only serve as a supplement to traditional sources. By using CATPAC’s facility for key word “include” files and entering synonyms for accuracy and completeness, they established that mainstream news had much higher “hit densities” for both concepts, although this tendency varied by time frame. Dr. Li shared through a correspondence that she found the program’s “hit ratio” feature helpful and that the application was fairly easy to learn to use through the free online tutorial available from the CATPAC developers.

Developer. Joseph Woelfel

References. Chung & Cho (2013); Li & Rao (2010); Newton, Buck, & Woelfel (1986); Salisbury (2001); Stepchenkova, Kirilenko, & Morrison (2009); Sung, Jang, & Frederick (2011); Wölfel et al. (2005)

Concordance 3.3

Description. Concordance 3.3 performs a variety of functions allowing for the in-depth analysis of a text. In addition to such typical CATA features as counting words and (as its name denotes) making concordances, the program allows users to turn concordances into linked HTML files for easy viewing and publishing online. Samples of these web concordances (e.g., of Coleridge’s poem “The Ancyent Marinere” and Blake’s “Songs of Innocence and Experience”) are viewable on the program web site. Concordance 3.3 also displays word lengths visually in chart form. It features an easy-to-use Windows interface and is described by the author as “the most powerful and flexible concordance program, with registered users in 70 countries.”

Application. Witherspoon and Stone (2013) used several CATA programs to decipher the sentiment evidenced in online client reviews of tax preparation professionals.
They used Concordance 3.3 to help customize the Diction program’s dictionaries, developing “domain specific, contextually unique word sets, for example, in the tax domain, which can be used to customize off-the-shelf content analysis software” (p. 101). The researchers compared the ability of LIWC 2007, Diction 6.0, and SentiStrength to identify client sentiments as opposed to how human coders evaluate the same texts. They concluded that human coding is superior to CATA sentiment analysis, but with customization, all of the off-the-shelf programs show better validity.

Developer. R. J. C. Watt

References. Coe & Reitzes (2010); Hu et al. (2009); Maxwell (2004, 2005); Myers, Zibrowski, & Lingard (2011); Witherspoon & Stone (2013)

Diction 7

Description. Originally designed for the analysis of political texts (see also Box 5.3), Diction 7 contains a series of internal dictionaries that search text documents (in various file types, such as *.txt, *.doc, *.pdf, *.odt, *.html, and others) for five main semantic features (activity, optimism, certainty, realism, and commonality) and 35 subfeatures (including tenacity, blame, ambivalence, motion, and communication). After a text is analyzed, Diction allows comparison of the results for each of its 60+ dictionary categories (31 internal and up to 30 custom) to a provided normal range of scores established by running more than 50,000 texts through the program. Users can compare their text to either a general normative profile of all 50,000+ texts or to any of seven specific subcategories of texts (business, daily life, entertainment, journalism, literature, politics, and scholarship) that can be further divided into 36 distinct types (e.g., corporate financial reports, email correspondence, music lyrics, newspaper editorials, novels and short stories, political debates, social science scholarship).
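As a rough illustration of the general dictionary-and-norms approach described above, the following Python sketch counts category words in a text and compares each rate with a normative range. It is not Diction's actual procedure: the two tiny word lists, the normative means and standard deviations, the per-100-words rate, and the one-standard-deviation cutoff are all hypothetical values chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries (illustration only; real dictionaries are far larger).
DICTIONARIES = {
    "optimism": {"hope", "good", "success", "praise"},
    "certainty": {"always", "never", "must", "all"},
}

# Hypothetical norms: (mean, standard deviation) of occurrences per 100 words.
NORMS = {
    "optimism": (2.0, 1.0),
    "certainty": (3.0, 1.5),
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text):
    """Return raw counts, per-100-word rates, and z-scores for each category."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    results = {}
    for category, words in DICTIONARIES.items():
        raw = sum(counts[word] for word in words)
        rate = 100.0 * raw / total if total else 0.0
        mean, sd = NORMS[category]
        z = (rate - mean) / sd
        results[category] = {
            "raw": raw,
            "per_100": rate,
            "z": z,
            # Assumed cutoff: within one standard deviation counts as "in range".
            "in_range": abs(z) <= 1.0,
        }
    return results


if __name__ == "__main__":
    for category, scores in score_text("We must always hope for success").items():
        print(category, scores)
```

Swapping in a different word set under a new key in `DICTIONARIES` (with a matching entry in `NORMS`) is the analogue of adding a custom dictionary in this sketch.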
In addition, Diction outputs raw frequencies (in alphabetical order), percentages, and standardized scores in spreadsheet form. Custom dictionaries can be created for additional analyses.

Applications. The first application is an example of researchers developing their own custom dictionaries and then applying them via Diction. The second application exemplifies the use of Diction’s standard/built-in dictionaries.

McKenny, Short, and Payne (2013) decided to measure organizational psychological capital, which is concerned with “positively oriented” psychological phenomena, such as optimism, resilience, hope, and self-efficacy/confidence. They developed and validated a deductive word list (k = 402) that provided the set of words that are representative of the theoretical construct, used Diction in order to derive a validated 2,902 inductive word list from shareholder letters (n = 4,350) from a group of S&P 500 companies (n = 664), then assessed the measure by creating and factor analyzing the dimensions from the custom dictionaries and applying the data analysis to a five-year period. McKenny et al. (2013) concluded that their work

    provides a potential framework for elevating the level of a construct using computer-aided text analysis. Using this framework, researchers will be able to develop and validate constructs at the organizational level based on individual-level constructs, then measure these constructs directly at the organizational level by selecting the appropriate text for analysis. (p. 169)

Bligh, Kohles, and Meindl (2004) chose to rely on Diction’s internal dictionaries in their effort to analyze messages from then-President Bush in relation to the 9/11 crisis. “To our knowledge, DICTION is the only software program that was explicitly designed to examine the linguistic elements of political leaders” (p. 564).
They found that, when compared to his precrisis speeches, Bush’s postcrisis speeches were significantly higher on the standard constructs of faith, patriotism, aggression, and collectives and significantly lower on ambivalence.

Developer. Roderick P. Hart

References. Abelman & Dalessandro (2009); Bligh & Hess (2007); Bligh, Kohles, & Meindl (2004); Forsythe (2004); Hart (1985, 2000a); Hart & Childers (2005); Hart & Jarvis (1997); McKenny et al. (2013); Schroedel et al. (2013); Short & Palmer (2008); Witherspoon & Stone (2013)

General Inquirer

Description. The oldest of the CATA programs described in this Resource, the General Inquirer (GI) was first a “mainframe” computer application in the 1960s. Over the years, a PC version has existed, and a couple of different online versions have been available through GI developer Philip Stone, PhD, of Harvard University. Since Dr. Stone’s passing in 2006, the GI has in essence become “orphaned.” However, some researchers who earlier obtained the PC version have continued to use it for their research. The PC version of GI allowed the user to upload custom dictionaries in addition to the standard, internal dictionaries that were a part of the GI for over 50 years. The General Inquirer coded and classified text using the Harvard IV-4 dictionary, which assesses such features as Osgood’s three semantic dimensions, language reflecting particular institutions, motivation-related words, cognitive orientation, and more. GI also coded for the Lasswell value dictionary, which includes measures of dimensions of power, respect, affection, well-being, and others. Also included were several categories reflecting positive/negative valence and social cognition, as well as “marker” categories developed primarily as a resource for disambiguation.

Application. Abrahams et al. (2012) used General Inquirer to analyze consumer comments text mined from online forums used by vehicle enthusiasts.
They concluded that sentiment analysis was insufficient for finding, categorizing, and prioritizing vehicle defects noted by consumers. Instead, they developed a set of linguistic markers (which they called “smoke words”) found in online discussion forums and social media of consumers, and the prevalence of these terms was generally more predictive of the presence of automotive safety and performance defect mentions in the posts than was sentiment (measured via the General Inquirer’s Harvard Dictionary metrics for positive and negative words).

Developers. Philip J. Stone and Vanja Buvac

References. Abrahams et al. (2012); Dowling & Kabanoff (1996); Kelly & Stone (1975); Stone et al. (1966); Yang & Lee (2004)

Hamlet II 3.0

Description. The main facility of Hamlet II is a “Joint Frequencies” procedure that searches a text file for words in a user-created, custom dictionary list, and computes matrices of raw and standardized joint frequencies with respect to a chosen unit of context or of joint occurrences within a given number of words. Hamlet II will analyze a single text to provide word counts and KWIC, compare word lists for two text files, and, most importantly, use co-occurrence data from the custom dictionary search list (“Vocabulary File”) to produce a fairly sophisticated series of multivariate analyses, including cluster analysis, MDS, and correspondence analyses. The graphical output generated by Hamlet II provides some unique options, making the results easy to interpret.

Application. Bistrova and Lace (2012) first used the TextStat application to ascertain 20 concepts that fit into five previously accepted categories derived from an analysis of business literature and peer-reviewed scientific papers (i.e., corporate governance, capital budgeting, social responsibility, innovations, shareholder return).
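The idea of joint frequencies within a unit of context, as described for Hamlet II above, can be sketched in a few lines of Python. This is a minimal illustration rather than Hamlet II's own implementation: the function names are hypothetical, sentences are assumed as the unit of context, and the Jaccard coefficient is used here as one common way to standardize a raw joint frequency.

```python
import re
from collections import Counter
from itertools import combinations


def joint_frequencies(text, vocabulary):
    """Count, per sentence, how often each vocabulary word and word pair occurs."""
    vocab = {word.lower() for word in vocabulary}
    # Assumed unit of context: a sentence, split on ., !, or ?
    units = [u for u in re.split(r"[.!?]+", text.lower()) if u.strip()]
    single = Counter()  # number of units containing each word
    joint = Counter()   # number of units containing both words of a pair
    for unit in units:
        present = sorted(vocab & set(re.findall(r"[a-z']+", unit)))
        single.update(present)
        joint.update(combinations(present, 2))
    return single, joint


def jaccard(single, joint, a, b):
    """Standardize a joint frequency: units with both words / units with either."""
    a, b = sorted((a, b))
    both = joint[(a, b)]
    either = single[a] + single[b] - both
    return both / either if either else 0.0


if __name__ == "__main__":
    sample = ("Governance drives return. Innovation drives return. "
              "Governance matters.")
    single, joint = joint_frequencies(sample, ["governance", "innovation", "return"])
    print(dict(joint))
    print(jaccard(single, joint, "governance", "return"))
```

A matrix of such standardized values for every pair of vocabulary words is the kind of input that cluster analysis or MDS would then operate on.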
Then, they used the joint frequencies (co-occurrence) analysis results from Hamlet II to erect “a hierarchy based on the concepts related to shareholder value in the long-term” (p. 7), producing a graphical representation of the main concept interrelationships.

Developers. A. P. Brier and B. Hopp

References. Bistrova & Lace (2012); Brier & Hopp (2005, 2011); Ciemleja, Lace, & Titko (2014); Juozeliuniene (2008)

LIWC2015

Description. LIWC (Linguistic Inquiry and Word Count; see also Box 5.4) was developed for researchers interested in the measurement of emotional, cognitive, social, or other psychological constructs from written or transcribed text. Using internal dictionaries, the program analyzes individual or multiple text samples along 82 language dimensions, including psychological constructs (e.g., affect and cognition), personal-concern categories (e.g., work, home, and leisure activities), and standard linguistic dimensions (e.g., percentages of pronouns and articles). Many of the dictionaries have been validated against human judgments and have fairly well-established psychometric properties. LIWC can also analyze numerous additional dimensions with custom dictionaries, which users indicate is an easy process. The program has been adopted by a large number of researchers in a wide range of disciplines.

Application. Carroll (2007) used LIWC for an examination of students’ writing patterns in order to evaluate the cognitive and linguistic growth as evidenced by their essay writing assignments over the course of a semester. In one of two analyses reported in the article, 42 students in a critical-thinking course were asked to write an essay on a “weird” topic of their choice. The first and final versions of this paper were analyzed via 17 LIWC dictionaries, finding that the two drafts had significant linguistic and cognitive differences.
For example, the final drafts had significantly longer sentences, more big words, fewer pronouns, less tentative language, and fewer insight words, all of which were interpreted by Carroll, in light of existing psychological theory