Describing Profiles of Instructional Practice: A New Approach to Analyzing Classroom Observation Data

Peter F. Halpin and Michael J. Kieffer
New York University

Paper published as: Halpin, P. F., & Kieffer, M. J. (2015). Describing profiles of instructional practice: A new approach to analyzing classroom observation data. Educational Researcher, 44, 263-277.

Acknowledgement: The research reported here was supported by funding from the Institute of Education Sciences (R305D140035), and the Spencer Foundation and William T. Grant Foundation (Grant No. 201300093). The opinions expressed are those of the authors and do not represent views of these organizations.

Abstract

The authors outline the application of latent class analysis (LCA) to classroom observational instruments. LCA offers diagnostic information about teachers' instructional strengths and weaknesses, along with estimates of measurement error for individual teachers, while remaining relatively straightforward to implement and interpret. The authors discuss how the methodology can support formative feedback to educators and facilitate research into the associations between instructional practices and student outcomes. The approach is illustrated with a secondary analysis of data from the Measures of Effective Teaching study, focusing on middle school literacy instruction.

Describing Profiles of Instructional Practice: A New Approach to Analyzing Classroom Observation Data

Recent research in teaching effectiveness has addressed the development and implementation of classroom observational instruments used to evaluate teachers (e.g., Grossman, Cohen, Ronfeldt, & Brown, 2014; Ho & Kane, 2013; Sporte, Stevens, Healey, Jiang, & Hart, 2013; White, 2014). In contrast to other methods currently used for teacher evaluation (e.g., Haertel, 2013; "Asking Students About Teaching," 2012), observational instruments are specifically intended to provide a detailed description of teachers' instructional practices. They thereby offer a strong basis for supporting formative feedback to educators (e.g., Allen, Pianta, Gregory, Mikami, & Lun, 2011). Additionally, observational instruments can facilitate research about the association between instructional practices and a wide range of important student outcomes, including but not limited to academic performance. For example, teachers' scores across a variety of instruments have been found to positively correlate with teachers' value-added (VA) measures, as well as students' self-reported school engagement and socio-emotional development (e.g., Kane & Staiger, 2012; also see Table 2 of this paper).

Despite the growing evidence that in-classroom observations can provide useful information about teaching practices, it remains less clear how that information should be summarized to support inferences about individual teachers. Researchers in teacher evaluation often summarize the observational instruments with a total score, thereby placing teachers along a single dimension of effectiveness (e.g., Ho & Kane, 2013).
While this approach may facilitate comparison among teachers, it is not in line with theory and evidence that effective teaching requires the skillful coordination of multiple practices (e.g., Darling-Hammond & Bransford, 2007; Snow, Griffin, & Burns, 2007), and that different teachers may demonstrate different patterns of strengths and weaknesses (e.g., Grossman, Loeb, Cohen, & Wyckoff, 2013). Psychometric research also supports the conclusion that many observational instruments measure multiple dimensions of instructional quality (e.g., Grossman et al., 2014; Lazarev & Newman, 2014; Savitsky & McCaffrey, 2014), suggesting that teachers' practices are not well described in terms of a single construct.

Another approach – one that has been recommended in professional development programs (e.g., Allen et al., 2011; Danielson, 2013) and appears to be commonly used in evaluation contexts – involves interpreting teachers' scores on the individual items that make up an instrument. A strong rationale for this approach is that the individual items are directly anchored to specific instructional practices, whereas total scores or subscale scores may be more difficult to use for feedback. However, this approach brings into question the reliability of the item scores. Unreliable scores can lead to inaccurate inferences about a teacher's strengths and weaknesses, which is especially problematic when those inferences are used to inform decisions in professional settings.

In this paper we address three central measurement challenges in the application of classroom observational instruments. The first challenge is to provide a measurement methodology that captures the item-level diagnostic information that the instruments are designed to provide to educators. The second challenge is to estimate the measurement error associated with the scores assigned to teachers. Current standards in assessment require that measurement error be reported whenever scores are assigned to individuals (Joint Committee on Educational and Psychological Assessment, 2014). Yet commercial vendors of observational instruments do not commonly report estimates of measurement error, either at the item level or for total scores. Without this information, it is easy for decision-makers to fall into the misconception that scores on the observational instruments are free of error, or that all teachers are measured with equal reliability, which can lead to inappropriate decisions rather than supporting the professionalism of teachers. The third challenge is to ensure that the methodology remains feasible to apply in the settings in which observational instruments are currently used.

To this end we propose the application of latent class analysis (LCA). In this paper we show how LCA can be used to obtain a small number of empirical profiles of instructional practice, a term that we adopt from Grossman et al. (2013) to refer to diagnostically useful patterns of practice. We explain how these profiles capture information about teachers' strengths and weaknesses, and illustrate how this information offers new possibilities for supporting formative feedback to educators and for informing research about the association between instructional practices and student outcomes. We also discuss how LCA provides an interpretable summary score and an estimate of measurement error for each teacher.
A main reason that we suggest the use of LCA instead of more complicated diagnostic models (see Embretson & Yang, 2013, for a review) is that teachers' scores and their measurement errors can be easily computed once the parameters of the measurement model have been estimated. For example, the scoring procedure can be applied to new observations using a simple spreadsheet macro.

In the following section we provide background on the three observational instruments that are the focus of the present study. We then provide a conceptual overview of LCA and its implementation in the current context. Next, we illustrate the use of LCA with a secondary analysis of data from the Measures of Effective Teaching (MET) study (Kane & Staiger, 2012). The example focuses on middle school English language arts (ELA) teachers, which is an important and under-researched domain (e.g., Carnegie Council on Advancing Adolescent Literacy, 2010), as well as a domain in which measurement of teacher effectiveness has been particularly challenging (e.g., McCaffrey, Sass, Lockwood, & Mihaly, 2009). In the example we also address test-retest reliability and criterion-related validity with students' reading achievement, engagement, and social-emotional development.

It is important to emphasize that the empirical research we report is intended only as an illustration of LCA in the context of teacher observations. While the example illustrates the diagnostic interpretation of LCA and how it quantifies measurement error, the results do not support definitive conclusions about any particular instrument or population of teachers. We contextualize our findings in terms of ongoing measurement research in this area. Similarly, the analyses do not test a particular theoretical model of teaching and learning. When making substantive interpretations, we draw on the conceptions of teaching and learning that guided the development of the instruments. In the discussion section, we address implications and limitations of this research. Some technical details and additional figures are provided in the Online Appendices.

Observation Instruments in English Language Arts

In this paper we focus on three observational instruments that have been widely used to observe ELA middle school teachers. Two of these, the Classroom Assessment Scoring System (CLASS; Pianta, Hamre, Hayes, Mintz, & LaParo, 2008) and the more recently developed Framework for Teaching (FFT; Danielson, 2013), are intended for use across grades and subjects. The third, the Protocol for Language Arts Teaching Observations (PLATO; Grossman et al., 2013), is intended specifically for ELA teachers in grades four through nine. To be consistent with the data analyses that follow, we note when shortened versions of the instruments were used in the MET study. These versions of the instruments are summarized in Table 1.

CLASS was initially developed as a research tool for addressing the quality of early-childhood classroom environments (Pianta et al., 2005). Its theoretical basis is in human development and ecological systems (e.g., Bronfenbrenner & Morris, 1998), focusing on the daily interactions that take place among teachers and students (Pianta & Hamre, 2009).
Multiple versions of the instrument are now used in classrooms ranging from preschool to high school, and in current implementations it is often coupled with a teacher training intervention referred to as My Teaching Partner (Allen et al., 2011). As used in the MET study, the instrument has four domains (see Table 1): emotional support, classroom organization, instructional support, and student engagement. The domains are measured using a total of twelve items, each of which is scored on a seven-point scale.

FFT (Danielson, 2013) is grounded in a constructivist view of teaching and learning. The developer emphasizes its use in a concerted professional development model (Danielson, 2011). The instrument has four domains: planning and preparation, professional responsibilities, classroom environment, and instruction. Only the latter two can be scored using classroom observations, and they are the focus of the present research. As shown in Table 1, both domains include four items, each of which is scored on a four-point scale. Notably, FFT has recently been adopted by numerous school districts, including New York City (New York City Department of Education, 2013) and Chicago (Sporte, Stevens, Healey, Jiang, & Hart, 2013).

PLATO was designed with two explicit premises in mind (Grossman et al., 2013): first, that quality teaching in ELA involves practices that are specific to the effective teaching of reading and writing; and second, that teaching quality is multidimensional, such that effective teachers "possess a range of characteristics and skills that contribute directly and indirectly to improved student outcomes" (Grossman et al., 2013, p. 447). The full instrument includes thirteen items, but the version adopted in the MET study (referred to as "PLATO prime") was shortened to six items (see Table 1). These six items are each scored on a four-point scale and are grouped into three broader domains: disciplinary demand, instructional scaffolding, and classroom environment (Grossman et al., 2014).

In addition to their demonstrated research potential, the use of observational instruments in professional settings is promising for several reasons. First, they provide educators with a common language for talking about teaching. Second, by moving the focus from year-end student outcomes to improvement of the process of teaching, the instruments can support the professionalism of teachers. Third, by providing teachers with information about their performance on a range of instructional practices, the instruments can frame learning to teach as a long-term, developmental process, rather than a task accomplished before or at the beginning of teachers' careers.

Previous Evidence of Reliability and Validity

Of the three instruments considered in this paper, CLASS has the most extensive psychometric research base (e.g., Hamre & Pianta, 2010; Mashburn, Meyer, Allen, & Pianta, 2009). The MET study made a major contribution by providing comparative evidence for FFT, CLASS, and PLATO (as well as other instruments), drawing on data from over 3,000 teachers of ELA and mathematics in grades four through nine across six school districts in the United States. In Table 2 we summarize the main findings reported by Kane and Staiger (2012). The reliability coefficients of FFT, CLASS, and PLATO ranged from .31 to .37 when the instruments were administered by trained raters using 15-25 minute video recordings of teachers' classroom practices. These coefficients were computed for the total scores and are interpretable in terms of the proportion of variance over multiple administrations of the instruments (i.e., test-retest reliability). By quadrupling the number of administrations, reliabilities in the range of .6 to .7 were expected (also see Ho & Kane, 2013).
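This expected gain from quadrupling the number of administrations is consistent with the Spearman-Brown prophecy formula; the calculation below is our reconstruction and is not shown in the MET report. For k parallel administrations with single-administration reliability \rho_1,

\[
\rho_k = \frac{k \, \rho_1}{1 + (k - 1)\rho_1},
\]

so with k = 4, \rho_1 = .31 yields \rho_4 = 1.24 / 1.93 \approx .64, and \rho_1 = .37 yields \rho_4 = 1.48 / 2.11 \approx .70, matching the reported range of .6 to .7.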
The correlations of the total scores with teachers' VA measures on ELA year-end state examinations were positive but relatively weak. For the SAT-9 open-ended reading examination, mean differences of .10 to .16 standard deviation units were observed between teachers in the top and bottom quartiles of the observational instruments. Similarly, mean differences on student engagement and social-emotional development were between .05 and .18 standard deviation units. In an analysis of the same dataset, Grossman et al. (2014) reported correlations between SAT-9 VA measures and each of the three PLATO prime subscales, with the highest correlation for classroom environment (r = .15), followed by disciplinary demand (r = .12) and instructional scaffolding (r = .04).

These results summarize the current state of the literature and indicate that there is an important role to be played by continued measurement research. In particular, the test-retest reliability and criterion-related validity of total scores leave room for improvement, and we are not aware of any research that has addressed the standard error of measurement of scores assigned to teachers, or the application of diagnostic measurement models. These topics are the focus of what follows.

Latent Class Analysis of In-Classroom Observation Data

Conceptual Overview

LCA involves latent (unobserved) variables that are indicated by measured (observed) items. In contrast to more common measurement models such as factor analysis and item response theory, the latent variable in LCA is categorical rather than continuous. Contemporary psychometric research has seen a return to models that involve categorical latent variables because of their diagnostic properties (see Embretson & Yang, 2013). In addition to its psychometric applications, LCA is often used as a method of model-based clustering. From this perspective, LCA is an individual-centered approach, used primarily to identify persons who cluster together based on similarities in their item scores. In contrast, factor analysis is often considered a variable-centered approach, in which the goal is to identify items that hang together with one another.

In the present application, each latent category represents a group of teachers with similar instructional practices, as measured by their scores on the items of the observational instruments. Because the observed variables are not correlated within the latent classes (see equation A2 in the appendix), the latent variable suffices to explain systematic differences in teachers' instructional practices. Therefore, the latent variable can be interpreted to distinguish what is unique about teachers' practices (i.e., signal) from the measurement error of the instrument (i.e., noise). Within each latent category, the most probable score on each item can be estimated (see equation A3 in the appendix). We interpret these scores as representing the profile of instructional practice of the teachers in that category. LCA has the advantage of allowing for statistical inferences about the specific practices that are important for differentiating among the profiles.
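The appendix equations cited above (A2 and A3) are not reproduced in this preview. As a sketch, the description corresponds to the standard LCA model: for item scores X = (X_1, \ldots, X_J) and a latent class variable C taking values c = 1, \ldots, K,

\[
P(X = x) = \sum_{c=1}^{K} \pi_c \prod_{j=1}^{J} P(X_j = x_j \mid C = c),
\]

where \pi_c is the proportion of teachers in class c, and the product over items expresses local independence (the observed variables are unrelated within classes). The profile for class c can then be summarized by the modal item scores, \tilde{x}_{jc} = \operatorname{arg\,max}_x \, P(X_j = x \mid C = c).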
The diagnostic value of LCA comes from interpreting the profiles in light of theory, research, and practice in teaching and learning, which we illustrate in the example. In combination with evidence about how the profiles are related to student outcomes, this can provide a strong basis for informing feedback to educators and future research.

LCA also supports inferences about the most likely profile for each individual teacher (see equation A4 in the appendix). This is used to assign teachers a "profile membership score," which replaces the use of a total score as a summary measure. As noted, a main reason that we recommend LCA over more complicated diagnostic models is that these scores and their measurement errors are straightforward to compute once the model parameters have been estimated.
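To make that last point concrete, the following is a minimal sketch of the scoring step, applying Bayes' rule to a new teacher's item scores under the LCA model described above. All parameter values, function names, and dimensions here are hypothetical illustrations, not estimates from the paper.

```python
import numpy as np

def profile_membership(item_scores, log_pi, log_item_probs):
    """Posterior probability of each latent profile given a teacher's item scores.

    log_pi:         length-K array, log of the estimated class proportions.
    log_item_probs: K x J x M array, log P(item j takes score m | class c).
    item_scores:    length-J array of observed item scores (0-indexed categories).

    Returns a length-K array of posterior probabilities, computed by Bayes'
    rule under the local-independence assumption of LCA.
    """
    K = log_item_probs.shape[0]
    # Log joint probability of the observed response pattern under each class.
    log_joint = log_pi.copy()
    for c in range(K):
        for j, m in enumerate(item_scores):
            log_joint[c] += log_item_probs[c, j, m]
    # Normalize on the log scale for numerical stability.
    log_joint -= log_joint.max()
    post = np.exp(log_joint)
    return post / post.sum()

# Hypothetical example: K = 2 profiles, J = 3 items, M = 4 score categories.
rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])                            # class proportions
item_probs = rng.dirichlet(np.ones(4), size=(2, 3))  # each row sums to 1
post = profile_membership(
    item_scores=np.array([3, 2, 3]),
    log_pi=np.log(pi),
    log_item_probs=np.log(item_probs),
)
print(post)            # profile membership scores for this teacher
print(1 - post.max())  # one plausible summary of classification uncertainty
```

The returned vector is the teacher's profile membership score; how strongly it concentrates on a single profile indicates how reliably that teacher is classified, which is one way the per-teacher measurement error described above could be reported.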
