Mokken Scale Analysis in Language Assessment PDF

166 Pages·2021·10.245 MB·

Preview Mokken Scale Analysis in Language Assessment

Mokken Scale Analysis collectively refers to a set of methods to examine the fit of data to two nonparametric Item Response Theory (IRT) models known as the Monotone Homogeneity Model (MHM) and the Double Monotonicity Model (DMM). As nonparametric IRT models, MHM and DMM are, compared to their parametric counterparts, easier to fit to the noisy data that social science researchers usually work with. Furthermore, the logic behind these models is a lot easier to grasp by researchers who do not have a strong background in algebra.

This book is an introductory treatment of the topic with examples from the field of language assessment and research. It describes the basics of MSA and includes step-by-step tutorials to help the readers run the analyses with the R package mokken. Furthermore, case studies are reported to illustrate the concepts introduced throughout the book. The book is comprehensive and reader-friendly and can be followed by most empirical researchers in the social sciences. It is suitable for all researchers and practitioners in the fields of behavioral and social sciences who are engaged in test and scale development. It is an easy-to-use manual that covers everything that you need to know to apply Mokken scaling confidently.

ISBN 978-3-8309-4446-1
www.waxmann.com

Purya Baghaei
Mokken Scale Analysis in Language Assessment
Waxmann 2021
Münster • New York

Bibliographic information published by die Deutsche Nationalbibliothek
Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the internet at http://dnb.dnb.de.

Print-ISBN 978-3-8309-4446-1
E-Book-ISBN 978-3-8309-9446-6

© Waxmann Verlag GmbH, 2021
Steinfurter Straße 555, 48159 Münster, Germany
Waxmann Publishing Co.
P. O. Box 1318, New York, NY 10028, U. S. A.
www.waxmann.com

Cover Design: Anne Breitenbach, Münster
Print: CPI Books GmbH, Leck
Printed on age resistant paper acid-free as per ISO 9706
Printed in Germany

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise without permission in writing from the copyright holder.

Contents

Preface

Chapter 1: Mokken Scale Analysis: Core Issues
1.1 Overview
1.2 Guttman Scaling
1.3 Mokken Scale Analysis
1.4 Comparison with the Rasch Model
1.5 Comparison with the Classical Test Theory
1.6 Mokken Scale Analysis for Polytomous Items
1.7 Reliability
1.8 Summary

Chapter 2: Mokken Scale Analysis: Advanced Issues
2.1 Overview
2.2 Scalability Coefficients
2.3 Automated Item Selection Procedure (AISP)
2.4 Monotonicity
2.5 Invariant Item Ordering (IIO)
2.6 Sample Size in MSA
2.7 Contribution of MSA to Test Validation
2.8 Criticism of Mokken Scale Analysis
2.9 Summary

Chapter 3: mokken Package Tutorial
3.1 Overview
3.2 Automated Item Selection Procedure (AISP)
3.3 Scalability Coefficients
3.4 Monotonicity
3.5 Invariant Item Ordering (IIO)
3.6 Nonintersection of ISRFs
3.7 Reliability
3.8 Identifying Outliers
3.9 Two-Level MSA

Chapter 4: Application of MSA to a Dichotomous Test
4.1 Overview
4.2 Analysis
4.3 Comparison with the Rasch Model

Chapter 5: Application of MSA to Polytomous Items
5.1 Overview
5.2 Data Source and Material
5.3 Analysis

Chapter 6: Application of MSA to a Partial Credit Test
6.1 Overview
6.2 Introduction
6.3 Seven-Point Scale (Sample 1)
6.4 Twenty-One-Point Scale (Sample 1)
6.5 Seven-Point Scale (Sample 2)
6.6 Twenty-One-Point Scale (Sample 2)
6.7 Discussion

Chapter 7: Application of MSA to Rater-Mediated Performance Assessment
7.1 Overview
7.2 MSA for Performance Assessment
7.3 Analysis of Essay Writing

Chapter 8: Application of MSA to Two-Level Data
8.1 Overview
8.2 Introduction
8.3 Two-Level Mokken Scale Analysis
8.4 Analyses and Results

Bibliography
About the Author

Preface

The purpose of this monograph is to introduce two related nonparametric item response theory (NIRT) models and a series of methods to test these models known as Mokken Scale Analysis (MSA). Although MSA is almost as old as some parametric IRT models, its application and spread in the social measurement contexts has been modest. Only recently, there has been a surge in the application of MSA in psychology and education. The availability of the user-friendly mokken package in the open source statistical programming software R has helped the spread of MSA over the past few years.

This book is suitable for all researchers who are involved in test development and questionnaire design. MSA is a cogent and accessible technique for applied researchers who need to develop and validate tests and questionnaires. The book can be used as supplementary material in measurement courses in MA and PhD programmes in the social sciences. In this exposition, I have done my best to explain the logic of MSA in a simple language for social scientists in general and language testing researchers in particular.
All the theoretical issues are covered in the first two chapters. Chapter 3 is a tutorial on the mokken package in R. I have provided the code and applied it to a polytomous dataset with outputs and interpretations. In Chapters 4 to 8, I have applied MSA to five different types of data. In Chapter 4, MSA is applied to a dichotomous listening comprehension test. In Chapter 5, as an application to Likert-type items, a foreign language reading anxiety questionnaire is analysed. In Chapter 6, a C-Test battery, which is a language test whose items are given partial credit, is analysed. In Chapter 7, the very recent and innovative application of the model to rater-mediated performance assessment is demonstrated. And finally, in Chapter 8, two-level scores are analysed. I believe these five types of data cover almost all the applications of MSA, although the model could in future be applied in more diverse measurement contexts. The purpose of Chapters 4 to 8 is to demonstrate how MSA analyses are employed and interpreted in applied settings. In some cases, the Rasch model and factor analysis are also applied in parallel to MSA for comparison.

I would like to acknowledge the intellectual contribution of several colleagues who read the manuscript and provided useful comments. Andries van der Ark from the University of Amsterdam, Rudy Ligtvoet from the University of Köln, Daniela Crișan from the University of Groningen, Roger Watson from Hull University, UK, Stefanie Wind from the University of Alabama, and Letty Koopman from the University of Amsterdam reviewed different chapters of the book and kindly offered their insights and recommendations. The nuances of the technicalities of MSA would have been very sloppy without their comments and corrections. A group of other colleagues and students proofread the manuscript before publication.
I am deeply indebted to Hamdollah Ravand from Vali-e-Asr University of Rafsanjan, Iran, Mohammad Afsharrard, Mona Tabatabaee-Yazdi, Roya Shoahosseini, and Farshad Effatpanah. Their comments and insights helped tremendously to improve the arguments presented in the book. Any errors that remain are, of course, mine.

Purya Baghaei
July, 2021
Mashhad, Iran

Chapter 1
Mokken Scale Analysis: Core Issues

1.1 Overview

Tests and questionnaires are prevalent in education and psychology. In fact, they are an indispensable part of the everyday practices of educators, teachers, clinicians, and graduate students and instructors in these fields. Clinical decisions and diagnoses as well as findings of research heavily depend on the quality of the measures used. Furthermore, in high-stakes selection examinations, where candidates are given access to employment or education on the basis of test results, the precision of the ordering of candidates is paramount. Therefore, validation research to ascertain that tests accurately reflect levels of the relevant construct is essential.

Mokken Scale Analysis (MSA), named after the Dutch mathematician and political scientist Robert J. Mokken, is a series of methods to evaluate the fit of data to Nonparametric Item Response Theory (NIRT) models. The building block of item response theory (IRT) models is the idea that educational and psychological constructs are latent, i.e., are not directly observable and measurable. These latent constructs can only become tangible by their manifestations through test items. Test takers’ responses to items in an educational test or a psychological questionnaire are manifestations of their locations on the latent continuum and an index of the degree to which they possess the construct of interest. However, the items or tasks in a test and examinees’ responses do not necessarily coincide with the construct of interest.
IRT models provide the right apparatus to examine if there is a relationship between the items developed by the test designer and the latent variable. IRT assumes that there is a latent variable on which persons and items have a location and, with this assumption, attempts to describe the structure in the observed variables (test items) (Sijtsma, 1998). The goal of IRT-based validation is to establish that the variation in the item responses and the construct the researcher wants to measure co-refer to the same entity (Baghaei & Tabatabaee-Yazdi, 2016; Borsboom, 2008).

The models under MSA are probabilistic latent trait models which are suitable for instrument validation and placing respondents and items on an ordinal scale. They can be applied to both dichotomous and polytomous items. Being nonparametric, MSA models have less restrictive assumptions compared to Parametric Item Response Theory (PIRT) models such as the Rasch model (Rasch, 1960/1980). However, this comes at a price. MSA creates ordinal scales for persons and items. In fact, unlike parametric IRT models, there is no procedure for item easiness and person ability estimation on an equal-interval metric scale. In MSA, person ability and item difficulty indices are similar to those computed in classical test theory (CTT) methods. That is, for persons the total raw scores are computed as ability indices and for items the proportions of correct answers are computed as indices of easiness. This limits the application of the model in equating and adaptive testing (Sijtsma & van der Ark, 2017). MSA models are suitable for validation purposes, scale construction, and checking the fundamental measurement properties of tests (Wind, 2017a). The advantage of NIRT models is that they produce better model-data fits compared to PIRT models (Sijtsma & van der Ark, 2017).
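The ordinal indices described above (total raw scores for persons, proportions of correct answers for items) can be illustrated with a minimal sketch. This is not taken from the book, whose tutorials use the R package mokken; Python is used here only for illustration, and the response matrix and all function names are hypothetical.

```python
# Hypothetical dichotomous data: rows are persons, columns are items,
# 1 = correct answer, 0 = incorrect answer. Not data from the book.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]


def person_total_scores(data):
    """Total raw score per person, used as an ordinal ability index."""
    return [sum(row) for row in data]


def item_proportions_correct(data):
    """Proportion of correct answers per item, used as an easiness index."""
    n_persons = len(data)
    n_items = len(data[0])
    return [sum(row[j] for row in data) / n_persons for j in range(n_items)]


def rank_order(values):
    """Indices sorted from highest to lowest value (ties keep input order)."""
    return sorted(range(len(values)), key=lambda i: -values[i])


totals = person_total_scores(responses)
easiness = item_proportions_correct(responses)

# Only the ordering is interpreted: neither index lies on an interval scale.
person_order = rank_order(totals)    # most able person first
item_order = rank_order(easiness)    # easiest item first

print("Total scores:", totals)
print("Item easiness:", easiness)
print("Persons, most to least able:", person_order)
print("Items, easiest to hardest:", item_order)
```

Both indices support only rank statements such as "person 3 scored higher than person 0" or "item 0 is easier than item 1"; the sketch makes no claim about the size of the differences between them.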
Models under MSA are considered nonparametric IRT models because the relationship between the latent trait and the probability of a correct response, i.e., the item response function (IRF), need not follow a predetermined specific shape. In the Rasch model, for example, IRFs should follow a logistic function which is S-shaped. In MSA, as long as the IRFs are non-decreasing, they meet the requirement of the model. Unlike PIRT models, in NIRT the relationship between the latent trait and the item responses is represented with item response functions, not item parameters.

The application of parametric IRT models implies “a relatively deep insight into the structure of the variable to be measured and the properties of the items by which it can be measured” (Mokken, 1971, p. 173). MSA is more appropriate “in contexts in which the underlying response processes are not well understood, such as affective variables” (Wind, 2017a, p. 50). MSA has mostly been
