
Test Scoring PDF

435 Pages·16.302 MB·English

Preview Test Scoring

TEST SCORING

Edited by
David Thissen, The University of North Carolina at Chapel Hill
Howard Wainer, Educational Testing Service

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey   London

This edition published in the Taylor & Francis e-Library, 2009.
To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.

Copyright © 2001 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without the prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data
Test scoring/edited by David Thissen, Howard Wainer.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-3766-3 (hardcover: alk. paper)
1. Examinations--Scoring. I. Thissen, David. II. Wainer, Howard.
LB3060.77.T47 2001
371.26–dc21 00-053525 CIP
ISBN 1-4106-0472-1 Master e-book ISBN

Contents

Preface

1 An Overview of Test Scoring
  David Thissen and Howard Wainer
    Who Do We Expect Will Use This Book?
    What Is a Test About?
    The Test as a Sample Representing Performance
    What Is This Book (and Each of Its Chapters) About?
    Psychometric Bases for Test Scoring
    What Is Being Measured?
    Score Combination
    Subscore Augmentation
    Score Reporting
    What Is Not Included?
    Item Analysis and Calibration
    The Bases for Final Decisions: The Uses of the Test Scores
    References

PART I: TRADITIONAL TEST THEORY AND ITEM RESPONSE THEORY

2 True Score Theory: The Traditional Method
  Howard Wainer and David Thissen
    True Score Theory
    Reliability for More Complex Situations
    Estimating True Scores
    Three Models for Error
    Summary
    References

3 Item Response Theory for Items Scored in Two Categories
  David Thissen and Maria Orlando
    Logistic Item Response Models
    The One-Parameter Logistic Model
    The Normal Ogive and Two-Parameter Logistic Models
    The Three-Parameter Normal Ogive and Logistic Models
    Scale Scores
    Estimates of Proficiency Based on Response Patterns
    Estimates of Proficiency Based on Summed Scores
    Additional Considerations
    References

4 Item Response Theory for Items Scored in More Than Two Categories
  David Thissen, Lauren Nelson, Kathleen Rosa, and Lori D. McLeod
    Logistic Response Models for Items With More Than Two Scoring Categories
    Samejima's (1969) Graded Model
    Bock's (1972) Nominal Model
    Item Parameter Estimation for Models With More Than Two Response Categories
    Scale Scores for Items With More Than Two Response Categories
    Estimates of Proficiency Based on Response Patterns
    Estimates of Proficiency Based on Summed Scores
    The Testlet Concept
    Conclusion
    References

PART II: FACTOR ANALYTIC THEORY

5 Factor Analysis for Items Scored in Two Categories
  Lori D. McLeod, Kimberly A. Swygert, and David Thissen
    Traditional Factor Analysis
    Item Factor Analysis
    A New Approach Using Item Response Theory: Full-Information Factor Analysis for Dichotomous Items
    Other Approaches to Dimensionality Assessment
    Conclusion
    References

6 Factor Analysis for Items or Testlets Scored in More Than Two Categories
  Kimberly A. Swygert, Lori D. McLeod, and David Thissen
    Structural Equation Models for Polytomous Items
    Estimation Procedures
    Assessment of Fit
    A New Approach Using Item Response Theory: Full-Information Factor Analysis for Polytomous Items
    Choosing the Right Analysis
    Conclusions
    References

PART III: SPECIAL PROBLEMS, SPECIAL SOLUTIONS (A SECTION OF APPLICATIONS)

7 Item Response Theory Applied to Combinations of Multiple-Choice and Constructed-Response Items--Scale Scores for Patterns of Summed Scores
  Kathleen Rosa, Kimberly A. Swygert, Lauren Nelson, and David Thissen
    Recapitulation: The Background in IRT for Scale Scores Based on Patterns of Summed Scores
    Scale Scores Based on Patterns of Summed Scores
    The Generalization of Scale Scores Based on Patterns of Summed Scores to Cases With More Than Two Summed Scores
    Conclusion
    References

8 Item Response Theory Applied to Combinations of Multiple-Choice and Constructed-Response Items--Approximation Methods for Scale Scores
  David Thissen, Lauren Nelson, and Kimberly A. Swygert
    A Linear Approximation for the Extension to Combinations of Scores
    The Generalization of Two or More Scores
    Potential Applications of Linear Approximations to IRT in Computerized Adaptive Tests
    Evaluation of the Pattern-of-Summed-Scores, and Gaussian Approximation, Estimates of Proficiency
    Conclusion
    References
    Technical Appendix: IRT for Patterns of Summed Scores, and a Gaussian Approximation

9 Augmented Scores--"Borrowing Strength" to Compute Scores Based on Small Numbers of Items
  Howard Wainer, Jack L. Vevea, Fabian Camacho, Bryce B. Reeve III, Kathleen Rosa, Lauren Nelson, Kimberly A. Swygert, and David Thissen
    Regressed Estimates: Statistical Augmentation of Meager Information
    A General Description of Empirical Bayes Theory
    An Observed Score Approach to Augmented Scores
    More Accurate Mathematics Scores on a Test Like the SAT
    Regressed Observed Subscores for a 1994 American Production and Inventory Control Society (APICS) Certification Examination
    Regressed Observed Subscores for the Performance Assessment Part of the North Carolina Test of Computer Skills
    An Approach to Augmented Scores That Uses Linear Combinations of IRT Scale Scores
    Empirical Bayes (Regressed) Estimates Based on IRT Scale Scores for Response Patterns
    Empirical Bayes (Regressed) Estimates Based on IRT Scale Scores for Summed Scores
    Discussion
    References

References
Author Index
Subject Index

Preface

At the threshold of the 21st century, educational and psychological tests and their scores hold a more prominent place in society than ever before. Tests are taking an increasingly important place in education and educational policy. Students take more tests, and the consequences associated with the scores are associated with higher stakes: A majority of U.S. states have, or are considering, statewide tests that are part of promotion or graduation decisions. Results obtained from the National Assessment of Educational Progress (NAEP) guide educational policy at the national level, and results from statewide assessments similarly support policy decisions at the state and local levels. Increasing emphasis on tests and their scores may also be observed in many other countries.

Although these educational uses of testing may be most visible in the media, other uses of psychological testing are on the increase as well.
For example, in recent years results obtained with psychological tests and questionnaires have become the primary outcome measures in many medical trials, as drugs are designed whose purpose is an improvement in the quality of life. As another example, certification tests are required for entry to an increasing number of occupations and careers. As a single illustration, the assessments that certify computer professionals have grown in just a few years from nonexistence to become some of the largest volume testing programs in the world.

With the increasing use of tests comes greater complexity in their scoring: Large-scale tests often require multiple test forms for which scores must be reported in a comparable way. Some computerized adaptive tests (CATs) may be the ultimate in tests with multiple forms; CATs may be de-
