A Computer Adaptive Measure of Reading Motivation

Marcia H. Davis
Johns Hopkins University
[email protected]

Wenhao Wang
University of Kansas

Neal M. Kingston
University of Kansas

Michael Hock
University of Kansas

Stephen M. Tonks
Northern Illinois University

Gail Tiemann
University of Kansas

To appear in Journal of Research in Reading:
Davis, M. H., Wang, W., Kingston, N. M., Hock, M., Tonks, S. M., & Tiemann, G. (2020). A computer adaptive measure of reading motivation. Journal of Research in Reading. https://doi.org/10.1111/1467-9817.12318

Journal of Research in Reading peer review process: Manuscripts submitted to this journal undergo editorial screening and peer review by anonymous reviewers.

Corresponding Author: Marcia H. Davis ([email protected])
2800 North Charles Street, Suite 420
Center for Social Organization of Schools, The Education Building
Johns Hopkins University, Baltimore, MD 21218

Authors’ Note

This article is based on data published in Kingston et al. (2018) and Davis et al. (2017). We received funding for this work from the Institute of Education Sciences, under Grant Number R305A110148. The present research was also supported by the Center for the Interdisciplinary Study of Language and Literacy at Northern Illinois University and a grant awarded to the fifth author by the Institute of Education Sciences (R305A150193). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Institute of Education Sciences. Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.
COMPUTER ADAPTED READING MOTIVATION 2

Abstract

Background: The importance of reading motivation has led to the development of a large number of self-report reading motivation measures; however, there is still a need for a usable measure of adolescent reading motivation that captures a large number of theoretically and empirically distinct constructs.

Methods: The current paper details the development and validation of a computer adapted measure of reading motivation, the Adaptive Reading Motivation Measure (ARMM), which assesses constructs of curiosity, involvement, interest, value, challenge, grades, recognition, competition, avoidance, self-efficacy, perceived difficulty, preference for autonomy, social motivation, prosocial goals, and antisocial goals for reading.

Results: Model fit indicated that hierarchical multidimensional models fit better than models without a hierarchical structure. The validation results indicate that females scored higher than males and younger students scored higher than older students on most ARMM scores when scores were derived using a higher-order model. In addition, these scores correlated significantly to reading behavior, engagement, and achievement and indicated high reliability.

Conclusions: The findings suggest that the ARMM would be a valid measure to assess a large number of reading motivation constructs in a short period of time within a classroom setting.

Keywords: Reading Motivation, Measurement, Adolescents, Validity, Computer Adaptive

What is already known about this topic
• Motivation to read is considered a critical contributor to reading achievement.
• Although there are a number of adolescent reading motivation scales, most of these scales only measure a few reading motivation constructs.
What this paper adds
• The paper describes the development of the ARMM, which measures 15 separate constructs as well as a general reading motivation construct and, due to the computer adaptive nature, is only 45 items long.
• Findings show that the ARMM scores were sensitive to gender and grade differences consistent with prior reading motivation research and correlated significantly to reading behavior, engagement, and achievement.

Implications for theory, policy or practice
• Being able to assess a large number of constructs could provide useful information to teachers implementing reading interventions and improving instruction.
• The ARMM was developed for fifth through twelfth grade, which would facilitate grade comparisons in research studies.

A Computer Adaptive Measure of Adolescent Reading Motivation

Motivation to read is considered a critical contributor to reading achievement (Retelsdorf et al., 2011; Schiefele et al., 2012). If students lack the motivation to engage in reading, reading improvement will be limited (Guthrie & Wigfield, 1999) and may actually decline (Baker & Wigfield, 1999; Unrau & Schlackman, 2006). In the end, it is motivation that activates the behavior to engage in reading, making motivation an important factor in efforts to improve literacy (Guthrie & Wigfield, 2000).

The importance of reading motivation has led to the development of a large number of self-report reading motivation measures, as described in a recent review (Davis et al., 2018). Although there are many reading motivation measures for elementary school students, measures for adolescents have only been developed recently, and many measure only a few reading motivation constructs (Davis et al., 2018). There is a need for a usable measure of adolescent reading motivation that captures a large number of theoretically and empirically distinct constructs.

Theoretical Perspectives

In the review of reading motivation scales Davis et al.
(2018) found that while quite a few scales of reading motivation were directed by one (De Naeghel et al., 2012) or even multiple theories of motivation (Wigfield & Guthrie, 1997), there were still a large number of scales that had no theory identified. To add to the confusion, although quite a few different theoretical perspectives have driven item development for these scales, items appear similar despite having different construct labels (Davis et al., 2018; Neugebauer & Fujimoto, 2018). Like Guthrie and Coddington (2009), we believe that focusing on only one theory of motivation may limit the scope and multidimensionality of a measure. Our understanding of reading motivation derives from several theories including self-determination theory (Ryan & Deci, 2000), achievement goal theory (Meece et al., 2006), expectancy-value theory (Wigfield & Eccles, 2000), social cognitive theory (Schunk, 2003), and interest development theory (Hidi & Renninger, 2006). We define reading motivation as “students’ goals, values, beliefs, and dispositions towards reading” (Guthrie et al., 2013, p. 10), which implies that reading motivation is multidimensional and is grounded in many different theories related to goals, values, and beliefs.

Adolescent Reading Motivation

Research has indicated that reading motivation declines over time (Schaffner et al., 2016). The decline in intrinsic reading motivation may be related to the typical reading practices of secondary schools, such as less reading instruction in content area classes, lack of choice in reading, poorer personal connections with teachers, less connection between reading and real-world interactions, and more complex texts compared to elementary school (Guthrie & Davis, 2003). Due to this decline, it could be argued that it is important for secondary teachers to monitor reading motivation in their classrooms.
However, assessing reading motivation through observation is difficult, especially in secondary schools where teachers see students for only a short time (Guthrie & Davis, 2003). Measuring engagement and motivation in reading can be difficult and time-consuming even for trained observers (Lutz et al., 2006; Neugebauer, 2016). A dynamic adolescent reading motivation measure could help teachers examine the nuances of reading motivation and determine interventions that could target specific constructs of reading motivation.

However, of the 16 measures reviewed by Davis et al. (2018) and two additional measures published after the review, only seven, including the Adaptive Reading Motivation Measure (ARMM), were written specifically for adolescent students. Of these seven adolescent reading scales, only three measure a wide range of motivational concepts; however, one of these three can only be used to measure middle school students' reading of non-fiction texts, which can limit its use. Also, only three of the seven scales were developed for both middle and high school students. Limiting a scale to only a few grades may make comparisons between grades or longitudinal studies over a series of grades more difficult. Further, only two of the scales measured extrinsic motivation. Although elementary studies indicate that extrinsic motivation may not correlate as highly with engagement and achievement as intrinsic motivation does (Wang & Guthrie, 2004), as intrinsic motivation decreases with age (Lepper et al., 2005; Schaffner et al., 2016), extrinsic motivation may play a larger role in motivating reluctant adolescent readers. Finally, social motivation can be highly important for adolescents (Moje et al., 2008); however, it is assessed by only two of the measures.

Computer Adaptive Testing

One way to include more constructs on a measure without increasing the number of items is by using computer adapted technology.
Computer adaptive measures use Item Response Theory (IRT) to select items for each respondent based on their previous answers, so that each respondent only has to answer a small subset of available items. In traditional measures, all respondents answer the same items, which makes them longer. Although the use of adaptive measures for questionnaire development is well established (e.g., Edelen & Reeve, 2007), there are no computer adaptive measures of reading motivation (Davis et al., 2018).

The Current Study

The goal of the current paper was to describe the development and large-sample validation of the Adaptive Reading Motivation Measure (ARMM), an adaptive adolescent reading motivation survey that assesses fifteen separate reading motivational constructs. The development process had multiple stages: an item-writing stage, a pilot test, a large field test, and a validation study. In this paper the development process is explained and validation findings are presented and discussed. The following questions in the validation study were addressed:

1. Which of four models, varying on degree of multidimensionality and number of hierarchical levels, fit the ARMM data the best?
2. Were the ARMM scores as measured by the computer adapted version reliable?
3. Were there differences between male and female students on the ARMM, and if so, did those differences align to previous research?
4. Were there differences between younger and older students on the ARMM, and if so, did those differences align to previous research?
5. Did the ARMM scores correlate with measures of reading behavior, engagement, and achievement? Were these correlations higher than correlations with math achievement?

Method

Participants

Development and Pilot Test

In the pilot test we administered items to 2,258 fifth- through twelfth-grade students from 32 schools in the Midwest and West Coast United States.
At the school level there was an average of 76.2% white students, 3.0% black students, 12.0% Hispanic students, and 3.7% Asian students across the participating schools. In addition, there was an average of 41.8% of students receiving free or reduced meal prices across the schools.

Field Test and Cognitive Interviews

Participating in the field test were 7,457 public school students recruited from different research and teaching networks in the United States (813 fifth grade, 1,428 sixth grade, 1,160 seventh grade, 1,090 eighth grade, 1,355 ninth grade, 563 tenth grade, 576 eleventh grade, 413 twelfth grade students, and 59 students who did not identify their grade). Self-identified gender included 3,030 males and 2,711 females; 1,716 students gave no response regarding gender. At the school level there was an average of 36.5% white students, 34.0% black students, 21.0% Hispanic students, and less than 1% other across the 209 participating schools. In addition, there was an average of 59.5% of students receiving free or reduced meal prices across the schools. At the same time as the field test, cognitive interviews were conducted with students from two elementary schools, one middle school, and one high school (28 girls, 25 boys).

Validation Study

Participating in the validation study were 1,905 students from 43 schools located in the Midwestern United States (720 fifth-grade, 1,046 sixth- to eighth-grade, and 139 high school students). Of these students, 1.8% were Black, 0.6% were Asian, 4.0% were Native American, 93.1% were White, and 0.5% were other. Each student took the reading behaviors, engagement, and ARMM scales. One participating district provided achievement data for 605 students in the fifth grade and 287 in sixth to eighth grade.
Measures

Item Development

A goal of the ARMM developers was to measure a wide range of reading motivation constructs; therefore, the team systematically reviewed reading motivation measures over the last 25 years and consulted with reading motivation experts in order to build a comprehensive list of constructs. In their review of past measures, the team found that some constructs, like self-efficacy and self-concept, were too similar at the individual item-level to warrant separate constructs, and therefore these constructs were collapsed into one scale for the ARMM. See the final construct list in Table 1 and six-point scale in Figure 1.

Seven middle school and six high school teachers with expertise in reading and language arts instruction were recruited to attend a summer item-writing workshop. ARMM project staff, including principal investigators and consultants, and an author of the Motivation for Reading Questionnaire, also attended. The workshop began with an overview of the hypothesized sub-constructs of adolescent reading motivation. The teachers received a document with a definition of each construct, sample items, and an overview of item-writing procedures. Working in pairs, members of the panel then wrote over 700 items. Given the size of the item pool, the ARMM staff created a text mining program to calculate the proportion of identical words in every possible pair of items and deleted the items that were too similar. Additionally, an experienced test-item editor revised the items for clarity and reading level.

Pilot Test. ARMM staff selected 600 items (40 for each of the 15 factors) for inclusion in the pilot study. A total of 10 different basic forms of items were specified, using a blocked design, with each individual form containing 20 unique items from each of three constructs, or 60 total items per form.
Across all the forms, each of the 15 constructs was presented twice; thus each construct was represented by 40 unique items, for a total of 600 items. Students who participated in the pilot were randomly assigned to an assessment form on their school computers.

Field Test

Classical test theory item statistics from the pilot test were used to select a final pool of 20 items per construct (300 items in total) for a model comparison study. The selected items had higher item-total correlations and non-extreme item mean scores in the pilot data. A total of 10 forms containing the 300 unique items (20 items for each of the 15 constructs) were administered to public school students on their school computers. In order to collect enough student responses to calibrate the IRT models, a sparse-matrix design was used to build the test forms. A sparse-matrix design is a calibration data-collection design in which items overlap across forms. The resulting ten forms were named Form A, Form B, Form C, … , Form J. Each form had 60 items, four items for each of the 15 constructs. All items appeared on two different forms, but each student only had to respond to the 60 items on one form. This overlap of items across test forms allows items to be calibrated on a common IRT metric and yields enough data without students taking all 300 items at one time.

Model Comparisons. A confirmatory IRT model comparison method was used to compare models generated based on different assumptions about the relationships among constructs. Four graded response models (Samejima, 1969), either unidimensional or multidimensional, were calibrated. In the first model, the unidimensional model, only one general reading motivation dimension is extracted from the data. In the second model, the multidimensional model, fifteen construct factors were allowed to correlate with each other and each item loads on one of these fifteen factors.
In the third model, the higher-order model, the fifteen construct factors correlated with the general factor directly and correlated with each other indirectly through their relationships with the general factor. Finally, the last model was a bi-factor model (Gibbons & Hedeker, 1992), which allows for a general factor as well as multiple secondary factors. However, unlike in the higher-order model, the fifteen construct factors in the bi-factor model do not correlate with the general factor. In the current paper, the fifteen construct factors in the bi-factor model also did not correlate with each other; the covariances among all sixteen latent factors were set to zero.
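
To make the item model underlying all four comparisons concrete, the following is a minimal illustrative sketch of the category probabilities of a unidimensional graded response model (Samejima, 1969) for a single six-point item. It is not the calibration software used in this study, and it shows only the unidimensional case. The function name `grm_category_probs` and the discrimination and threshold values are hypothetical and chosen purely for illustration.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (hypothetical value in the example)
    thresholds -- strictly increasing category thresholds b_1 < ... < b_{K-1};
                  five thresholds correspond to a six-point response scale
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1,
    # padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )

    # P(X = k) is the difference of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item parameters for a six-point item (not estimates from the ARMM).
probs = grm_category_probs(theta=0.5, a=1.4, thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0])
for category, p in enumerate(probs, start=1):
    print(f"category {category}: {p:.3f}")
```

In the multidimensional, higher-order, and bi-factor variants described above, the single trait `theta` would be replaced by a weighted combination of latent factors; the step from cumulative to category probabilities stays the same.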