Analyzing Rater Agreement
Manifest Variable Methods

Alexander von Eye
Michigan State University

Eun Young Mun
University of Alabama at Birmingham

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey     London

Camera ready copy for this book was provided by the authors.

Copyright © 2005 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data

Eye, Alexander von.
Analyzing rater agreement: manifest variable methods / Alexander von Eye, Eun Young Mun.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-4967-X (alk. paper)
1. Multivariate analysis. 2. Acquiescence (Psychology)—Statistical methods. I. Mun, Eun Young. II. Title.
QA278.E94 2004
519.5'35—dc22
2004043344
CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Disclaimer: This eBook does not include the ancillary media that was packaged with the original printed version of the book.

Contents

Preface

1. Coefficients of Rater Agreement
   1.1 Cohen's κ (kappa)
       1.1.1 κ as a Summary Statement for the Entire Agreement Table
       1.1.2 Conditional κ
   1.2 Weighted κ
   1.3 Raw Agreement, Brennan and Prediger's κ_n, and a Comparison with Cohen's κ
   1.4 The Power of κ
   1.5 Kendall's W for Ordinal Data
   1.6 Measuring Agreement among Three or More Raters
   1.7 Many Raters or Many Comparison Objects
   1.8 Exercises

2. Log-Linear Models of Rater Agreement
   2.1 A Log-Linear Base Model
   2.2 A Family of Log-Linear Models for Rater Agreement
   2.3 Specific Log-Linear Models for Rater Agreement
       2.3.1 The Equal-Weight Agreement Model
       2.3.2 The Weight-by-Response-Category Agreement Model
       2.3.3 Models with Covariates
           2.3.3.1 Models for Rater Agreement with Categorical Covariates
           2.3.3.2 Models for Rater Agreement with Continuous Covariates
       2.3.4 Rater Agreement plus Linear-by-Linear Association for Ordinal Variables
       2.3.5 Differential Weight Agreement Model with Linear-by-Linear Interaction plus Covariates
   2.4 Extensions
       2.4.1 Modeling Agreement among More than Two Raters
           2.4.1.1 Estimation of Rater-Pair-Specific Parameters
           2.4.1.2 Agreement among Three Raters
       2.4.2 Rater-Specific Trends
       2.4.3 Generalized Coefficients κ
   2.5 Exercises

3. Exploring Rater Agreement
   3.1 Configural Frequency Analysis: A Tutorial
   3.2 CFA Base Models for Rater Agreement Data
       3.2.1 CFA of Rater Agreement Data Using the Main Effect Base Model
       3.2.2 Zero Order CFA of Agreement Tables
       3.2.3 CFA of Rater Agreement Data under Consideration of Linear-by-Linear Association for Ordinal Variables
       3.2.4 Using Categorical Covariates in CFA
   3.3 Fusing Explanatory and Exploratory Research: Groups of Types
   3.4 Exploring the Agreement among Three Raters
   3.5 What Else Is Going on in the Table: Blanking out Agreement Cells
       3.5.1 CFA of Disagreement Cells
       3.5.2 Testing Hypotheses about Disagreement
   3.6 Exercises
4. Correlation Structures
   4.1 Intraclass Correlation Coefficients
   4.2 Comparing Correlation Matrices Using LISREL
   4.3 Exercises

5. Computer Applications
   5.1 Using SPSS to Calculate Cohen's κ
   5.2 Using SYSTAT to Calculate Cohen's κ
   5.3 Programs for Weighted κ
       5.3.1 Using SAS to Calculate Cohen's κ and Weighted κ
       5.3.2 Other Programs for Weighted κ
   5.4 Using Lem to Model Rater Agreement
       5.4.1 Specifying the Equal Weight and the Weight-by-Response-Category Agreement Models
       5.4.2 Models with Covariates
           5.4.2.1 Models with Categorical Covariates
           5.4.2.2 Models with Continuous Covariates
       5.4.3 Linear-by-Linear Association Models of Rater Agreement
       5.4.4 Models of Agreement among More than Two Raters
       5.4.5 Models of Rater-Specific Trends
   5.5 Using Configural Frequency Analysis to Explore Patterns of Agreement
       5.5.1 First Order CFA (Main Effects Only)
       5.5.2 Zero Order CFA
       5.5.3 First Order CFA with One Continuous Covariate
       5.5.4 CFA of the Agreement in Two Groups; No Gender-Association Base Model
       5.5.5 CFA of the Agreement among Three Raters
   5.6 Correlation Structures: LISREL Analyses
   5.7 Calculating the Intraclass Correlation Coefficient

6. Summary and Outlook

References
Author Index
Subject Index

Preface

Agreement among raters is of great importance in many domains, both academic and nonacademic. In the Olympic Games, the medals and ranking in gymnastics, figure skating, synchronized swimming, and other disciplines are based on the ratings of several judges. Extreme judgements are often discarded from the pool of scores used for the ranking. In medicine, diagnoses are often provided by more than one doctor, to make sure the proposed treatment is optimal. In criminal trials, a group of jurors is used, and sentencing depends, among other things, on the complete agreement among the jurors. In observational studies, researchers increase reliability by discussing discrepant ratings. Restaurants receive Michelin stars only after several test-eaters agree on the chef's performance. There are many more examples.

We believe that this book will appeal to a broad range of students and researchers, in particular in the areas of psychology, biostatistics, medical research, education, anthropology, sociology, and many other areas in which ratings are provided by multiple sources. A large number of models are presented, and examples are provided from many of these fields and disciplines.

This text describes four approaches to the statistical analysis of rater agreement. The first approach, covered in chapter 1, involves calculating coefficients that allow one to summarize agreement in a single score. Five coefficients are reviewed that differ in (1) the scale level of the rating categories that they can analyze; (2) the assumptions made when specifying a chance model, that is, the model with which the observed agreement is compared; (3) whether or not significance tests exist; and (4) whether they allow one to place weights on rating categories.
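As a brief sketch of this first approach (written in generic notation, not the notation developed in chapter 1): suppose two raters assign the same I categories to the same set of objects, and let p_{ij} denote the proportion of objects placed in category i by the first rater and in category j by the second. Cohen's κ then compares the observed proportion of agreement with the proportion expected under the chance model of independent ratings,

\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \sum_{i=1}^{I} p_{ii}, \qquad p_e = \sum_{i=1}^{I} p_{i+}\,p_{+i},

where p_{i+} and p_{+i} are the row and column marginal proportions. A value of κ = 1 indicates perfect agreement, and κ = 0 indicates agreement at the level expected by chance.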
The second approach, presented in chapter 2, involves estimating log-linear models. These are typically more complex than coefficients of rater agreement and allow one to test specific hypotheses about the structure of a cross-classification of two or more raters' judgements. Often, such cross-classifications display characteristics, such as trends, that help interpret the joint frequency distribution of the raters' judgements. This text presents a family of log-linear models and discusses submodels, that is, special cases.
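As a rough sketch of the simplest member of such a family (one common form of the equal-weight agreement model listed under section 2.3.1, again written in generic notation): for two raters A and B who use the same I categories, the expected cell frequencies m_{ij} of their I × I cross-classification can be modeled as

\log m_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \delta\,\mathbb{1}(i = j),

where \mathbb{1}(i = j) equals 1 in the diagonal (agreement) cells and 0 elsewhere. The λ terms reproduce the two raters' marginal distributions, that is, the chance model, and a positive estimate of the single parameter δ indicates agreement beyond what the main effects alone predict.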
The third approach, in chapter 3, involves exploring cross-