EDUCATION POLICY ANALYSIS ARCHIVES
A peer-reviewed scholarly journal
Editor: Sherman Dorn, College of Education, University of South Florida

Volume 14 Number 2  January 20, 2006  ISSN 1068–2341

Successive Student Cohorts and Longitudinal Growth Models: An Investigation of Elementary School Mathematics Performance

Keith Zvoch
University of Nevada, Las Vegas

Joseph J. Stevens
University of Oregon

Citation: Zvoch, K., & Stevens, J. J. (2006). Successive student cohorts and longitudinal growth models: An investigation of elementary school mathematics performance. Education Policy Analysis Archives, 14(2). Retrieved [date] from http://epaa.asu.edu/epaa/v14n2/.

Abstract

Mathematics achievement data from three longitudinally matched student cohorts were analyzed with multilevel growth models to investigate the viability of using status- and growth-based indices of student achievement to examine the multi-year performance of schools. Elementary schools in a large southwestern school district were evaluated in terms of the mean achievement status and growth of students across cohorts as well as changes in the achievement status and growth of students between student cohorts. Results indicated that the cross- and between-cohort performance of schools differed depending on whether the mean achievement status or growth of students was considered. Results also indicated that the cross-cohort indicators of school performance were more reliably estimated than their between-cohort counterparts. Further examination of the performance indices revealed that cross-cohort achievement status estimates were closely related to student demographics, while between-cohort estimates were associated with cohort enrollment size and cohort initial performance status. Of the four school performance indices studied, only student growth in achievement (averaged across cohorts) provided a relatively reliable and unbiased indication of school performance. Implications for the No Child Left Behind school accountability framework are discussed.

Keywords: school accountability, longitudinal growth models, No Child Left Behind Act.

Readers are free to copy, display, and distribute this article, as long as the work is attributed to the author(s) and Education Policy Analysis Archives, it is distributed for non-commercial purposes only, and no alteration or transformation is made in the work. More details of this Creative Commons license are available at http://creativecommons.org/licenses/by-nc-nd/2.5/. All other uses must be approved by the author(s) or EPAA. EPAA is published jointly by the Colleges of Education at Arizona State University and the University of South Florida. Articles are indexed by H.W. Wilson & Co. Send commentary to Casey Cobb ([email protected]) and errata notes to Sherman Dorn ([email protected]).

Over the past several years, states have developed educational accountability systems as a means of improving achievement outcomes for students (see Fuhrman & Elmore, 2004; Ladd, 1996). Educational accountability systems have been built on an implicit theory of action that assumes a public airing of student achievement results, coupled with a structured program of rewards and sanctions, is required to motivate school personnel to respond constructively to evidence of substandard student outcomes (Forte-Fast & Hebbler, 2004; Fuhrman & Elmore, 2004; Marion et al., 2002).
For state policy makers, the substandard outcome most in need of redress by system stakeholders is student performance on standardized achievement tests. As reflected in the weighting of accountability outcomes, achievement test scores have been utilized as the key evidential component for determining the relative efficacy of schools in each state accountability system (Goertz & Duffy, 2001; Stevens, Parkes, & Estrada, 2000).

Although widespread, the use of standardized test data as the primary or sole means for evaluating school performance is not without controversy. Questions regarding measurement precision, alignment with instructional content, and fairness in use for special student populations make the reliance on achievement tests a concern for many (e.g., AERA, APA, & NCME, 1999; Baker & Linn, 2004; Barton, 2004; Linn, 2000; Popham, 1999). Nonetheless, with passage of the No Child Left Behind federal legislation (NCLB: No Child Left Behind Act, 2002), testing is now more widespread and carries higher stakes than ever before. Under NCLB, states must revise their accountability systems to include annual testing of students in grades 3 through 8 in mathematics and reading/language arts. Consequences for substandard performance have also become more uniform and more stringent. Schools now face the clear prospect of a probationary designation, staff restructuring, and/or state takeover if achievement standards are not met (NCLB, 2002).

The institutionalization of mandatory testing across content areas and grade levels, and the concomitant performance pressures that schools now face, place a special burden on the analytic methods used to measure school performance. For accountability systems to work fairly and effectively, school performance indices need to be reliable and valid (Baker & Linn, 2004; Forte-Fast & Hebbler, 2004; Marion et al., 2002). The challenge presented by the need for scientifically credible school performance data has led to investigation of the assessment approaches that have been used in state accountability systems.

State approaches to school assessment can be categorized into those that measure school performance as a function of student achievement at one point in time (i.e., status) and those that measure the change in student achievement across two or more occasions. Status approaches (e.g., percent proficient, mean achievement) have been most commonly used by states and have had wide appeal because of the relative ease with which these measures can be calculated and understood by system stakeholders. However, status measures tend to be problematic when used for evaluative or accountability purposes. As singular snapshots of student achievement, status measures capture the influence of student background and prior educational experience as well as current school contributions to student performance (Raudenbush, 2004; Raudenbush & Willms, 1995). This confounding of different sources of achievement performance presents a particular challenge under conditions commonly found in public school districts. Student assignment to schools is not random, but is instead influenced by social and economic selection processes. The non-random sorting of families into neighborhoods and students into schools tends to result in a differential accountability burden for those schools that happen to serve large numbers of disadvantaged students (Raudenbush, 2004).
Relative to their more advantaged counterparts, schools situated in impoverished contexts typically must produce a disproportionate increase in student achievement levels if state achievement standards are to be met and low-performance sanctions avoided.

Perhaps in partial recognition of the challenge that schools with disadvantaged intakes face when status-type measures are used to evaluate school performance, states have also utilized measures that index the change in student achievement between testing occasions. Measures of change in student achievement are seen as an alternative means by which schools, particularly those with challenging intakes, can demonstrate positive effects on students. Several states have measured change by comparing the grade-level performance of successive student cohorts (e.g., the mean performance of 3rd graders in 2004 is compared to the mean performance of 3rd graders in 2005: "quasi" change) in an attempt to mitigate school differences in student intake (Stevens, et al., 2000). However, measuring school effectiveness by the change in successive student cohort performance levels can also be problematic for evaluative and accountability purposes (Hill & DePascale, 2003). Recent investigations of the successive cohort approach demonstrate that estimates of year-to-year changes in the mean achievement of students tend to be affected in large part by sampling variation, measurement error, and unique, non-persistent factors (e.g., construction noise) that affect test scores on only one of the testing occasions (Kane & Staiger, 2002; Linn & Haug, 2002). As a result, the observed change in school mean performance across student cohorts may be due in large part to year-to-year fluctuation in student characteristics and testing conditions rather than actual changes in student performance (Carlson, 2002; Linn & Haug, 2002).

The observed difficulty of obtaining valid and precise estimates of school performance when school compositions differ non-randomly and/or when the mean performance of successive student cohorts is compared has led to interest in measuring the achievement progress of individual students as another alternative for evaluating school performance (Teddlie & Reynolds, 2000; Willms, 1992; Zvoch & Stevens, 2003). In this approach, the test scores of individual students are linked across time. Individual growth trajectories are then estimated by fitting a regression function to the time-series data obtained on each student. A measure of school performance follows from averaging the individual growth trajectories within each school (a minimal computational sketch of this idea appears below).

Tracking the achievement progress of individual students has certain advantages over the status and quasi-change models that states have used for school accountability purposes. Conceptually, longitudinal models of student achievement growth better represent the time-dependent process of academic learning (Bryk & Raudenbush, 1988; Seltzer, Choi, & Thum, 2003; Willett, 1988). Further, unlike status models, indices that capture the year-to-year changes in student achievement provide a degree of control over the stable background characteristics of students that otherwise complicate the evaluation of school effectiveness (Ballou, Sanders, & Wright, 2004; Sanders, Saxton, & Horn, 1997; Stevens, 2005).
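To make the trajectory-based approach concrete, the sketch below fits an ordinary least squares line to each student's grade 3 through 5 scores and averages the resulting slopes within schools. This is our illustration, not the authors' procedure: the records and names are hypothetical, and the study itself uses a three-level hierarchical linear model with empirical Bayes estimation rather than this simple two-stage computation.

```python
import numpy as np

# Hypothetical records: (school, student) -> [grade 3, 4, 5 scale scores]
scores = {
    ("school_A", "s1"): [602.0, 621.0, 640.0],
    ("school_A", "s2"): [615.0, 628.0, 645.0],
    ("school_B", "s3"): [630.0, 641.0, 650.0],
}

grade = np.array([0.0, 1.0, 2.0])  # grade coded as (Grade - 3)

school_slopes = {}
for (school, student), y in scores.items():
    # Per-student linear growth rate: OLS slope of score on coded grade
    slope, _intercept = np.polyfit(grade, np.asarray(y), deg=1)
    school_slopes.setdefault(school, []).append(slope)

# A growth-based school performance measure: the mean student growth rate
for school, slopes in sorted(school_slopes.items()):
    print(f"{school}: mean growth = {np.mean(slopes):.2f} scale-score units per year")
```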
In addition, school performance measures that follow from estimates of the achievement progress of individual students tend to be more reliable than school performance measures based on the changes in achievement status between successive student cohorts (e.g., Kane & Staiger, 2002). Indices of student achievement growth may thus offer an alternative for monitoring school performance that avoids some of the inherent difficulties associated with the achievement status and quasi-change approaches to school evaluation.

Despite the potential of using individual time-series data as a basis for measuring and evaluating school performance, states currently face a disincentive to incorporating indices of student achievement growth into their accountability systems. Under NCLB, states are required to utilize a status-type measure (i.e., the percentage of students "proficient" or above on one testing occasion) as the primary means for evaluating school performance. Secondarily, states are permitted to evaluate schools that fail to meet standard under the percent-proficient methodology by indexing the changes in proficiency between successive student cohorts (i.e., quasi-change). States can also choose to track the achievement progress of individual students as a third approach for evaluating school performance, but under the provisions of NCLB, this methodology can only serve to further identify schools in need of improvement (Olson, 2004). In other words, schools that meet standards by either the percent-proficient or the quasi-change approach can be identified as needing improvement if a growth target is not met, but demonstrating strong student growth is not sufficient to avoid a low-performance sanction if the school does not have an adequate percentage of students proficient by either of the two primary methodologies endorsed by NCLB.

The disincentive currently associated with using individual time-series data to measure and evaluate school performance has kept states from taking full advantage of the annual testing of students required under the NCLB legislation. At present, only a couple of states and a handful of school districts have examined school performance as a function of student achievement growth (e.g., Kiplinger, 2004; Sanders, et al., 1997; Webster & Mendro, 1997; Zvoch & Stevens, 2003). Even less common are examinations of the multi-year performance of schools using longitudinal data on successive student cohorts (see Bryk, Thum, Easton, & Luppescu, 1998; Bryk, Raudenbush, & Ponisciak, 2004; Ponisciak & Bryk, 2005, for examples). The limited application of longitudinal growth modeling methods to achievement data collected on students over time has left unanswered questions about the viability of using these techniques in state accountability systems. Although the studies conducted to date suggest that indices of student achievement growth tend to provide a less biased and potentially more stable estimate of school performance than some NCLB-endorsed alternatives, questions about the mechanics of implementation (e.g., cross-cohort or between-cohort analyses, estimation of unadjusted or value-added models) and the feasibility of use remain to be clarified (Bryk, et al., 2004; Flicek, 2004; Raudenbush, 2004). In response, the present study was designed to provide one example of how longitudinal growth models can be used to assess school performance across multiple student cohorts.
Of particular interest was whether estimates of cohort-to-cohort changes in the achievement growth of students provide a sound alternative for measuring school improvement. Note, however, that the intent of the current investigation was only to provide a preliminary and exploratory examination of the behavior and viability of certain growth-based approaches to measuring school performance. As such, school performance estimates were examined in relation to student intake characteristics rather than being adjusted for them.

The investigation was facilitated by the analysis of achievement data from three longitudinally matched elementary school student cohorts from a large school district in the southwestern United States. The following research questions were considered:

1) Does the cross-cohort performance of schools differ based on an examination of school mean achievement vs. an examination of school average rates of growth in achievement?
2) Are the cross-cohort school performance estimates related to selected school characteristics?
3) To what degree do estimates of the mean achievement status and achievement growth of schools change with each successive student cohort?
4) Are estimates of the cohort-to-cohort changes in school performance related to selected school characteristics?
5) How reliable, on average, are each of the school performance estimates?

Method

Participants

The multi-year performance of elementary schools was investigated by examining the mathematics achievement of students from three longitudinally matched cohorts. The school district that provided the test score data has 79 kindergarten-through-grade-5 elementary schools that serve over 30,000 students each year. The district serves a significant number of students from special populations. At the elementary school level, English Language Learners, students eligible for a free or reduced-price lunch, and students from ethnic minority groups constitute approximately 20%, 50%, and 55% of the student body, respectively.

Beginning in the 1999–2000 school year, all third, fourth, and fifth grade students were assessed annually on the TerraNova/CTBS5 Survey Plus, a norm-referenced achievement test (CTB/McGraw-Hill, 1997). Between 6,000 and 6,500 students in each grade were assessed each spring. Achievement data from the three most recent longitudinal cohorts were analyzed in the present study. Table 1 diagrams the data structure associated with the current investigation. As Table 1 shows, third-to-fifth grade longitudinal matches were available for students who entered the third grade in 1999–2000 (cohort 1), 2000–01 (cohort 2), and 2001–02 (cohort 3). Cohort 1 thus consisted of students who were third graders in 1999–2000, fourth graders in 2000–01, and fifth graders in 2001–02. The second and third cohorts consisted of the two following elementary school third-to-fifth grade student cohorts (i.e., cohort 2 from 2000–01 to 2002–03, and cohort 3 from 2001–02 to 2003–04).

Table 1
Cohort Data Structure

                         Year
Grade   1999–2000   2000–01   2001–02   2002–03   2003–04
  3        C1          C2        C3
  4                    C1        C2        C3
  5                              C1        C2        C3

Note. Cohort 1 (N = 3,325); Cohort 2 (N = 3,347); Cohort 3 (N = 3,322); School N = 79.

Within-cohort matches were accomplished by the following set of procedures. For each cohort, students who participated in accountability testing in all three study years were selected (N ~ 5,000).
To facilitate the study of school effects, students who attended the same elementary school in all three years were then identified; in each cohort, approximately 900 students transferred schools at least once during the respective three-year period studied. Next, students missing a mathematics score in any of the three study years (N ~ 100) were dropped from their cohorts. Finally, students who received one or more modified test administrations were eliminated from the working data files (N ~ 600). These exclusions resulted in the following within-cohort sample sizes: cohort 1 (N = 3,325), cohort 2 (N = 3,347), and cohort 3 (N = 3,322).

The three cohorts comprised relatively equal numbers of students from special populations. English Language Learners constituted 11–13% of each cohort, students from economically disadvantaged backgrounds constituted 45–46%, and the percentage of students from ethnic minority groups was also relatively constant at 54–55% across cohorts. Note, however, that the exclusion of students who did not participate in all three test administrations, students who transferred schools, and students who received at least one modified test administration lowered the percentage of students from special populations below district averages. Implications associated with the disproportionate exclusion of students from special populations are addressed in the discussion.

Measures

Outcome data analyzed in the current study were student scale scores on the mathematics subtest of the TerraNova/CTBS5 Survey Plus. The Survey Plus is a standardized, vertically equated, norm-referenced achievement test. All items are selected-response. According to the publisher, the mathematics subtest measures a student's ability to apply grade-appropriate mathematical concepts and procedures to a range of problem-solving situations. The publisher reports KR–20 reliability estimates of .87 in grade 3, .89 in grade 4, and .87 in grade 5 (CTB/McGraw-Hill, 1997). Other measures utilized in the study were the five-year school average (i.e., 1999–2000 to 2003–04) of the percentage of students eligible for a free or reduced-price lunch (M = .58, SD = .28) and cohort enrollment size, averaged across the three student cohorts by school (M = 42.27, SD = 18.81).

Analytic Procedures

Three-level longitudinal models were estimated using the Hierarchical Linear Modeling (HLM) program, version 6.0 (Raudenbush, Bryk, Cheong, & Congdon, 2004). Models were estimated from student and school records organized into three data files. The first (level-1) file contained student and school identifiers, mathematics scale scores for students in each of the three cohorts, and a field for grade level; this file contained 30,051 records (i.e., three records for each of 10,017 students). The level-2 data file contained student and school identifiers and a field designating cohort membership (N = 10,017). The level-3 data file contained only school identifiers (N = 79).

After preparing the data for analysis, an unconditional three-level model was first used to estimate a mathematics growth trajectory for each elementary school student, to partition the observed parameter variance into its within- and between-school components, and to estimate the average achievement score and average growth rate for each elementary school across the three student cohorts.
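Before turning to the model specification, the sample construction and file layout just described can be sketched in code. This is our illustration only, not the authors' actual procedure; all file and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical input: one row per student-year test record
# Columns: student_id, school_id, cohort, grade, math_score, modified_admin
records = pd.read_csv("test_records.csv")

# Keep students tested in all three study years (grades 3, 4, and 5)
tested_all = records.groupby("student_id").filter(
    lambda g: g["grade"].nunique() == 3)

# Keep students who attended the same elementary school in all three years
stable = tested_all.groupby("student_id").filter(
    lambda g: g["school_id"].nunique() == 1)

# Drop students missing a mathematics score or given a modified administration
valid = stable.groupby("student_id").filter(
    lambda g: g["math_score"].notna().all() and not g["modified_admin"].any())

# Level-1 file: three records per student (one per grade)
valid[["student_id", "school_id", "grade", "math_score"]].to_csv(
    "level1.csv", index=False)

# Level-2 file: one record per student, with cohort membership
valid.drop_duplicates("student_id")[
    ["student_id", "school_id", "cohort"]].to_csv("level2.csv", index=False)

# Level-3 file: one record per school
valid[["school_id"]].drop_duplicates().to_csv("level3.csv", index=False)
```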
The level-1 model was a longitudinal growth model that fitted a linear regression function to each individual student's grade 3, 4, and 5 achievement scores. Equation 1 specifies the level-1 model:

$Y_{tij} = \pi_{0ij} + \pi_{1ij}(\mathrm{Grade} - 3) + e_{tij}$   (1)

where $Y_{tij}$ is the outcome (i.e., mathematics achievement) at time $t$ for student $i$ in school $j$; $\pi_{0ij}$ is the initial status of student $ij$ (i.e., 3rd grade performance);¹ $\pi_{1ij}$ is the linear growth rate across grades 3–5 for student $ij$; and $e_{tij}$ is a residual term representing unexplained variation from the latent growth trajectory.

¹ By subtracting a value of 3 from Grade, initial status is defined as the expected achievement of student $i$ in school $j$ at the end of grade 3: $\pi_{0ij} + \pi_{1ij}(3 - 3) = \pi_{0ij}$.

Levels 2 and 3 of the HLM model estimate mean growth trajectories in terms of initial status and growth rate across all students (equations 2a and 2b) and across all schools (equations 3a and 3b):

$\pi_{0ij} = \beta_{00j} + r_{0ij}$   (2a)
$\pi_{1ij} = \beta_{10j} + r_{1ij}$   (2b)
$\beta_{00j} = \gamma_{000} + u_{00j}$   (3a)
$\beta_{10j} = \gamma_{100} + u_{10j}$   (3b)

In equations 2a and 2b, the initial achievement status and growth of students are conceived as a function of school average achievement ($\beta_{00j}$) or school average growth ($\beta_{10j}$) and corresponding residuals ($r_{0ij}$, $r_{1ij}$). Similarly, the initial status and growth of schools in equations 3a and 3b are conceived as a function of the grand mean achievement ($\gamma_{000}$) or the grand mean slope ($\gamma_{100}$) and corresponding residuals ($u_{00j}$, $u_{10j}$). Equations 3a and 3b were used to calculate the pooled estimates of school mean achievement (i.e., the mean performance of 3rd graders across the three cohorts) and school mean growth (i.e., the average 3rd-to-5th grade growth rate of students across the three cohorts).
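Substituting equations 2a–3b into equation 1 yields an equivalent single-equation (mixed model) form of the unconditional model. This restatement is ours; it introduces no new terms but may clarify how the levels combine:

$Y_{tij} = \gamma_{000} + \gamma_{100}(\mathrm{Grade} - 3) + u_{00j} + u_{10j}(\mathrm{Grade} - 3) + r_{0ij} + r_{1ij}(\mathrm{Grade} - 3) + e_{tij}$

The fixed part ($\gamma_{000}$, $\gamma_{100}$) describes the district-wide average trajectory, while the school-level ($u$) and student-level ($r$) random effects capture how individual schools and students deviate from that average.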
The second model estimated included a term to represent changes over time in the performance of successive cohorts. As with the unconditional model, student growth trajectories were estimated at level 1 (see equation 1), but in this model the achievement and growth of students were conceived to also vary at level 2 as a function of the temporal span from one cohort to another (coded 0 for the first cohort, 1 for the second cohort, and 2 for the third cohort). The linear cohort term represents the federal expectation, outlined in the NCLB legislation, that regular, annual progress in student proficiency be made from one cohort of students to the next.² Equations 4a and 4b specify the level-2 model:

$\pi_{0ij} = \beta_{00j} + \beta_{01j}(\mathrm{Cohort}) + r_{0ij}$   (4a)
$\pi_{1ij} = \beta_{10j} + \beta_{11j}(\mathrm{Cohort}) + r_{1ij}$   (4b)

² The expectation of regular annual progress most often assumes a linear increase in school performance over succeeding student cohorts. This assumption may not always hold; the performance of schools could, for example, change across student cohorts in a non-linear fashion. In the present study, the time trend was modeled with a linear function because the time series was relatively short (three data points). When the time series is of longer duration, it may be necessary to represent the data with a more complex function.

Using this coding scheme for cohort membership, the intercept status parameter, school average achievement ($\beta_{00j}$), becomes the expected mean performance of 3rd graders in cohort 1 (2000–02), whereas the intercept growth parameter, school mean growth ($\beta_{10j}$), becomes the expected growth in achievement across grades 3 to 5 for the first cohort (2000–02). In addition, the cohort term $\beta_{01j}$ can be interpreted as the expected change in the 3rd grade mean achievement of schools across the three cohorts, and the cohort term $\beta_{11j}$ can be interpreted as the expected change in school mean growth rates across cohorts.

At level 3, between-school variation in the initial achievement status and growth rate of schools, and the school-to-school differences in the cohort changes in achievement and growth, were first modeled in terms of the grand mean achievement ($\gamma_{000}$) or the grand mean slope ($\gamma_{100}$) of schools and corresponding residuals ($u_{00j}$, $u_{10j}$), and the grand mean achievement change ($\gamma_{010}$) or the grand mean growth change ($\gamma_{110}$) of schools across cohorts and corresponding residuals ($u_{01j}$, $u_{11j}$); see equations 5a through 5d. Note that estimation of the residual variances enables assessment of the degree to which schools vary in 3rd grade mean achievement in the first cohort (2000–02), $u_{00j}$; in the changes in 3rd grade mean achievement between the three cohorts, $u_{01j}$; in the achievement growth of elementary school students in the first cohort (2000–02), $u_{10j}$; and in the changes in the achievement growth of elementary school students between the three cohorts, $u_{11j}$. Equations 5a through 5d were used to calculate the within- and between-cohort school performance estimates:

$\beta_{00j} = \gamma_{000} + u_{00j}$   (5a)
$\beta_{01j} = \gamma_{010} + u_{01j}$   (5b)
$\beta_{10j} = \gamma_{100} + u_{10j}$   (5c)
$\beta_{11j} = \gamma_{110} + u_{11j}$   (5d)

Results

Mathematics Achievement across Cohorts

Table 2 presents the results of model 1, the pooled HLM model. The upper panel of Table 2 presents the fixed-effects estimates. The first estimate shown, the grand mean ($\gamma_{000}$), is the average 3rd grade mathematics scale score across all students; the second, the grand slope ($\gamma_{100}$), is the average yearly growth rate for those students. Across the three student cohorts, the average 3rd grade mathematics scale score was estimated as 616.97, and achievement was estimated to grow by 16.74 scale-score units per year across grades 3 to 5. The next panel of Table 2 presents estimates of the student-to-student and school-to-school variation in achievement and growth rates. Chi-square tests of the model's variance components indicated that students and schools differed significantly in achievement levels and in rates of achievement growth. The other estimates presented in the middle of Table 2 are the parameter reliabilities associated with each outcome measure. As can be seen in the table, most of the observed variability in the cross-cohort parameter estimates was true parameter variance (school mean achievement = .95, school mean growth = .84). The proportion of variation in student outcomes attributable to schools is presented in the bottom panel of Table 2. Twenty-one percent of the variation in student achievement level and 38% of the variation in student achievement growth was due to school-to-school differences.
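These proportions can be recovered from the variance components reported in Table 2. Judging from the reported values, each is the school-level share of the combined student- and school-level parameter variance, with the level-1 error term excluded:

$214.19 / (214.19 + 790.85) \approx .213$ for achievement status, and $10.82 / (10.82 + 17.95) \approx .376$ for achievement growth.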
To illustrate the school-to-school differences in mathematics achievement averaged across the three cohorts, empirical Bayes (EB) estimates of the 79 elementary schools' mean mathematics achievement and mean growth rates are presented in the scatterplot in Figure 1. The horizontal line in the interior of the figure represents the cross-cohort grand mean achievement in mathematics; the vertical line represents the cross-cohort grand mean growth in mathematics. The two grand mean reference lines classify schools into four quadrants of school performance. The upper right quadrant contains schools with above-average cross-cohort mean achievement in grade 3 and above-average cross-cohort growth from grades 3 to 5. The lower right quadrant contains schools with below-average cross-cohort mean scores but above-average growth. The two quadrants on the left side of the figure contain schools with below-average cross-cohort growth and either high or low mean achievement. The spread of points in Figure 1 demonstrates that schools with low mean scores were not always low-performing schools in terms of student growth in achievement. Similarly, above-average school mean achievement at grade 3 did not always translate into above-average growth across grades 3 to 5. Schools with low grade 3 mean scores had above- or below-average growth, as did schools with relatively high grade 3 mean scores. The lack of a consistent relationship between the mean achievement and growth of schools is reflected in the correlation between the model's level-3 residual terms ($\tau_{\beta}$ = -.16). In these data, knowing a school's initial achievement status offered little insight into the subsequent achievement progress of students.

Figure 1. Cross-cohort relationship between school mean achievement and school mean growth in mathematics. (Scatterplot not reproduced here.)

Table 2
Three-Level Cross-Cohort Model for Mathematics Achievement

Fixed Effects                         Coefficient          SE      t
School Mean Achievement, γ_000        616.97               1.69    365.81*
School Mean Growth, γ_100             16.74                0.40    41.50*

Random Effects                        Variance Component   df      χ²
Individual Achievement, r_0ij         790.85               9938    24535.66*
Individual Growth, r_1ij              17.95                9938    10826.72*
Level-1 Error, e_tij                  408.11
School Mean Achievement, u_00j        214.19               78      2087.91*
School Mean Growth, u_10j             10.82                78      542.96*

Reliability Estimates
School Mean Achievement               .95
School Mean Growth                    .84

Level-1 Coefficient                   Percentage of Variation Between Schools
Individual Achievement, π_0ij         21.3
Individual Growth, π_1ij              37.6

Note. Results based on data from 10,017 students distributed across 79 elementary schools.
* p < .001

To assess the degree to which the estimates of school mean achievement and school mean growth were associated with schools' social context (a measure of bias), correlations between the EB estimates of school performance and schools' percentage free lunch rates were calculated. Percentage free lunch was strongly related to the average performance level of schools, r(77) = -.81, p < .001. Schools with a larger percentage of students eligible for a free or reduced-price lunch had student achievement levels that were lower than schools with smaller rates of free or reduced-price lunch eligibility. However, knowing the percentage of the student body eligible for a free or reduced-price lunch provided little insight into the average rate at which students learned mathematics across the three cohorts. A systematic relationship between percent free lunch and school mean growth was not observed, r(77) = -.17, p > .05.
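The quadrant classification and bias check just described are straightforward to reproduce once EB estimates are in hand. The sketch below uses simulated values solely for illustration; the paper's actual estimates came from the HLM program, and no real school data are represented (the means and spreads only loosely echo Table 2).

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated EB estimates for 79 schools (illustrative values only)
eb_status = rng.normal(617.0, 14.6, size=79)   # school mean achievement
eb_growth = rng.normal(16.7, 3.3, size=79)     # school mean growth
free_lunch = rng.uniform(0.0, 1.0, size=79)    # free/reduced-price lunch rate

# Classify schools into the four performance quadrants of Figure 1
high_status = eb_status >= eb_status.mean()
high_growth = eb_growth >= eb_growth.mean()
quadrant = np.select(
    [high_status & high_growth,
     ~high_status & high_growth,
     high_status & ~high_growth],
    ["high status / high growth",
     "low status / high growth",
     "high status / low growth"],
    default="low status / low growth")

# Bias check: correlate each performance index with school poverty
r_status = np.corrcoef(free_lunch, eb_status)[0, 1]
r_growth = np.corrcoef(free_lunch, eb_growth)[0, 1]
print(f"r(mean achievement, free lunch) = {r_status:.2f}")
print(f"r(mean growth, free lunch) = {r_growth:.2f}")
```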
Mathematics Achievement by Cohort

Table 3 presents the results of the second model, which examined changes over time in the performance of successive cohorts. Estimates of the model's fixed effects are presented in the top panel of Table 3. The first estimate presented ($\gamma_{000}$) is the average 3rd grade mathematics scale score for the first student cohort (2000–02). The second estimate ($\gamma_{010}$) is the average cohort-to-cohort change in 3rd grade mean scale scores. These estimates indicate that the 3rd grade mean achievement of the first student cohort was 619.39 and that the 3rd grade mean achievement of schools decreased by 2.43 scale-score points, on average, with each successive student cohort. The next estimates presented are the average growth rate across grades 3 through 5 for the first student cohort ($\gamma_{100}$) and the average cohort-to-cohort change in longitudinal growth rates ($\gamma_{110}$). These estimates indicate that the first student cohort grew an average of 15.75 scale-score points per year and that the mean growth rate of schools across grades 3 through 5 increased by an average of 1.03 scale-score units with each successive cohort.

Variance estimates are presented next in Table 3. Chi-square tests demonstrated that, in the first cohort of students, students and schools differed significantly with respect to achievement levels and rates of growth. Further, these tests also indicated that schools differed with respect to the changes in successive cohort performance: statistically significant school-to-school variation was observed in the changes in 3rd grade mean achievement and in the grade 3 to 5 changes in achievement growth between cohorts. Parameter reliability estimates are presented