Using Classroom Artifacts to Measure Instructional Practice in Middle School Science: A Two-State Field Test

CSE Technical Report 690

Hilda Borko, University of Colorado
Brian M. Stecher, RAND

July 2006

National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Center for the Study of Evaluation (CSE)
Graduate School of Education & Information Studies
University of California, Los Angeles
GSE&IS Bldg., Box 951522
Los Angeles, CA 90095-1522
(310) 206-1532

Project 1.1 Comparative Analysis of Current Assessment and Accountability Systems
Hilda Borko, University of Colorado, and Brian M. Stecher, RAND, Project Directors

Copyright © 2006 The Regents of the University of California

The work reported herein was supported under the Educational Research and Development Centers Program, PR/Award Number R305B960002, as administered by the Institute of Education Sciences (IES), U.S. Department of Education. The findings and opinions expressed in this report are those of the author(s) and do not necessarily reflect the positions or policies of the National Center for Education Research, the Institute of Education Sciences (IES), or the U.S. Department of Education.

USING CLASSROOM ARTIFACTS TO MEASURE INSTRUCTIONAL PRACTICE IN MIDDLE SCHOOL SCIENCE: A TWO-STATE FIELD TEST

Hilda Borko, CRESST/University of Colorado
Brian Stecher, CRESST/RAND

Abstract

This report presents findings from two investigations of the use of classroom artifacts to measure the presence of reform-oriented teaching practices in middle-school science classes. It complements previous research on the use of artifacts to describe reform-oriented teaching practices in mathematics. In both studies, ratings based on collections of artifacts assembled by teachers following directions in the "Scoop Notebook" are compared to judgments based on other sources of information, including direct classroom observations and transcripts of discourse recorded during classroom observations. For this purpose, we developed descriptions of 11 dimensions of reform-oriented science instruction, and procedures for rating each on a dimension-specific five-point scale.

Two investigations were conducted. In 2004, data were collected from 39 middle-school science teachers in two states. Each teacher completed a Scoop Notebook, each was observed by a single observer on two or three occasions, and eight of the teachers were also audio-taped, allowing us to create transcripts of classroom discourse. In 2005, 21 middle-school science teachers participated in a similar study, in which each teacher was observed by a pair of observers, but no audio-taping occurred. All data sources were rated independently on each of the 11 dimensions. In addition, independent ratings were made using combinations of data sources. The person who observed in a classroom also reviewed the Scoop Notebook and assigned a "gold standard" rating reflecting all the information available from the Notebook and the classroom observations. Combined ratings were also assigned based on the transcripts and notebooks, and based on the observations and transcripts.

The results of these field studies suggest that the Scoop Notebook is a reasonable tool for describing instructional practice in broad terms. For example, it could be useful for providing an indication of changes in instruction over time that occur as a result of program reform efforts.
There was a moderate degree of correspondence between judgments of classroom practice based on the Scoop Notebook and judgments based on direct classroom observation. Correspondence was particularly high for dimensions that did not exhibit great variation from one day to the next. Furthermore, judgments based on the Scoop Notebook corresponded moderately well to our "gold standard" ratings, which included all the information we had about practice.

Project Goals and Overview

Our long-term research program investigates the reliability and validity of using artifacts to measure reform-oriented instructional practices. We focus on instructional artifacts because of their potential strength for representing what teachers actually do in classrooms (rather than what they believe they do). We use a data collection tool called the "Scoop Notebook" to gather classroom artifacts and teacher reflections related to key features of classroom practice.

To date, we have studied the use of artifacts in two subject areas—middle school mathematics and science. We conducted pilot studies to provide initial information about the reliability, validity, and feasibility of artifact collections as measures of classroom practice in these subjects (Borko, Stecher, Alonzo, Moncure, & McClam, 2005), and a field study to validate the Scoop Notebook in middle school mathematics classrooms (Stecher, Borko, Kuffner, Wood, Arnold, et al., 2005). Our notebook and scoring procedures were revised on the basis of results from each of these studies.

In this report, we present the results of two related studies conducted to validate the Scoop Notebook as a measure of reform-oriented instructional practice in middle school science classrooms. The report first describes the notebook, the 11 dimensions of instructional practice it measures, and the associated scoring rubrics. Next, the methodology employed for the two studies is presented, including study design and data collection procedures. Results from both studies are integrated in the next section, which documents the reliability and validity of the Scoop Notebook for measuring reform-oriented practice in science. The analyses address two main research questions:

1. What is the reliability of raters' judgments of instructional practice based on the Scoop Notebook, transcripts of classroom discourse, and classroom observations?

2. What is the evidence to support conclusions about the validity of ratings based on the Scoop Notebook as a measure of reform-oriented instructional practice in science?

• To what extent do scores assigned on the basis of the Scoop Notebook agree with those assigned on the basis of transcripts or classroom observations, and with scores that use all available information about a classroom (i.e., "gold standard" scores based on observations and the Notebook)?

• Are the patterns of relationships among the 11 dimensions of reform-oriented instruction consistent across notebooks and classroom observations?
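To make the agreement analyses behind these questions concrete, the sketch below (in Python) computes two simple statistics often reported for ordinal rating scales: exact agreement and within-one-point agreement between two raters. It is illustrative only, not the project's analysis code, and the ratings shown are hypothetical.

# Illustrative sketch (not the project's analysis code): exact and
# within-one-point agreement between two raters on the 1-5 scale.
# The dimension names come from this report; all ratings are hypothetical.

DIMENSIONS = [
    "Grouping", "Structure of Lessons", "Use of Scientific Resources",
    "Hands-On", "Inquiry", "Cognitive Depth",
    "Scientific Discourse Community", "Explanation/Justification",
    "Assessment", "Connections/Applications", "Overall",
]

def agreement(rater_a, rater_b):
    """Return (exact, within_one) agreement rates for paired 1-5 ratings."""
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one

# Hypothetical notebook ratings from two raters, one score per dimension.
rater_1 = [3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 3]
rater_2 = [3, 3, 2, 4, 3, 4, 2, 2, 3, 3, 3]

exact, within_one = agreement(rater_1, rater_2)
print(f"Exact agreement: {exact:.2f}; within one point: {within_one:.2f}")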
Methods

Overview

We conducted two field studies in middle-school science classrooms in Colorado and California. The first study, conducted in 2003-04, investigated the reliability and validity of ratings of practice based on the Scoop Notebook and audiotape transcripts. The second study, conducted in the spring of 2004-05, focused on examining the reliability of ratings of practice based on classroom observations. Together, the studies provide complementary evidence regarding the validity of the Scoop Notebook as a tool for characterizing reform-oriented instructional practice in science.

Participants

For the 2003-04 field study, we contacted a diverse sample of middle schools in districts that had participated in a previous study of artifacts in mathematics classes (Stecher et al., 2005). In schools that agreed to participate in the new study, volunteers were sought through notices sent to all science teachers or announcements at meetings of the science department. Thirty-nine teachers participated in this study; 16 were from California, 23 from Colorado.

For the 2004-05 study, we contacted districts and schools in California and Colorado that had participated in 2003-04 and in previous studies, and we recruited teachers in a similar manner. Twenty-one science teachers participated in this study—11 in California, and 10 in Colorado. Three or four of these teachers had participated in the 2003-04 study. One of the teachers had participated in an earlier pilot study.

In both studies, participating teachers received an honorarium of $200-$250 for collecting artifacts, completing reflections, assembling Scoop Notebooks, and being observed.1

1 Different rates were negotiated with districts in Colorado and California based on local practice.

Data Collection

The Scoop Notebook. As described in previous papers (Borko et al., 2005; Stecher et al., 2005), we developed a tool for the collection of data related to classroom instructional practices using an analogy to the approach of scientists studying unfamiliar territory (e.g., the Earth's crust, the ocean floor). Just as scientists may scoop a sample of materials to take to their laboratories for analysis, we planned to "scoop" materials from classrooms for ex situ analysis. Through the use of this tool we hoped to structure the collection of data to obtain information on instructional practices similar to what could be obtained through classroom observations, without the time and expense of such methods. We asked teachers to collect materials produced as part of their regular instruction and then place the materials in a notebook. Because of the usefulness of the analogy, we called our artifact collection package the "Scoop Notebook."

When we described the Scoop Notebook to participating teachers, we framed the task in terms of the question: "What is it like to learn science in your classroom?" For the 2003-04 and 2004-05 studies we asked teachers to collect artifacts from one of their classes for a period equivalent to five normal periods of instruction, following guidelines in the Scoop Notebook. Because we were interested in all types of materials used to foster student learning, we asked teachers to "scoop" materials or artifacts that they and their students generated, as well as materials drawn from textbooks or other curricular resources. The "scooped" artifacts included: instructional materials such as lesson plans, overhead transparencies, and grading rubrics; student work with corresponding teacher reflections; photographs of the classroom; and teacher reflections based on guiding questions posed throughout the Scoop period.
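As a concrete illustration of how a collection like this might be organized for analysis, the following sketch defines a simple in-memory representation of scooped artifacts. The class and field names are hypothetical conveniences, not part of the Scoop Notebook materials.

# Illustrative sketch: a minimal in-memory model of a scooped artifact
# collection. All names here are hypothetical; the artifact types and the
# high/average/low quality labels follow the notebook directions above.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Artifact:
    day: int                       # which of the five scooped class sessions
    kind: str                      # e.g., "lesson plan", "student work", "photograph"
    description: str
    quality: Optional[str] = None  # "high", "average", or "low" for student work

@dataclass
class ScoopNotebook:
    teacher_id: str
    artifacts: list = field(default_factory=list)
    reflections: dict = field(default_factory=dict)  # e.g., {"pre": ..., "day 1": ...}

    def student_work(self):
        """All student-work artifacts, e.g., for checking that each
        assignment includes high, average, and low examples."""
        return [a for a in self.artifacts if a.kind == "student work"]

# Example use:
nb = ScoopNotebook(teacher_id="CO-07")
nb.artifacts.append(Artifact(day=1, kind="student work",
                             description="lab report", quality="high"))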
We packaged the Scoop Notebook as a three-ring binder, consisting of the following components:

• project overview
• directions for collecting a "Classroom Scoop"
• folders for assembling artifacts
• "sticky notes" for labeling artifacts
• calendar for describing "scooped" class sessions
• daily reminders and final checklist
• disposable camera
• photograph log
• consent forms
• pre-scoop, post-scoop, and daily reflection questions

Directions in the notebook asked teachers to collect three categories of artifacts: materials generated prior to class (e.g., handouts, scoring rubrics), materials generated during class (e.g., writing on the board or overheads, student work), and materials generated outside of class (e.g., student homework, projects). The teachers were encouraged to include any other instructional artifacts not specifically mentioned in the directions. For each instance of student-generated work, teachers were asked to collect examples of "high," "average," and "low" quality work. Because we were interested in teachers' judgments about the quality of student work, we requested that their selections be based on the quality of the work rather than the ability of the students, and we asked them to make an independent selection of student work for each assignment rather than tracking the same students throughout the artifact collection process.

In addition, the teachers were given disposable cameras and asked to take pictures of the classroom layout and equipment, transitory evidence of instruction (e.g., work written on the board during class), and materials that could not be included in the notebook (e.g., posters and 3-dimensional projects prepared by students). They also kept a photograph log in which they identified each picture taken with the camera. Each day teachers made an entry in the calendar, giving a brief description of the day's lesson.

Prior to the Scoop period they responded to pre-scoop reflection questions such as, "What about the context of your teaching situation is important for us to know in order to understand the lessons you will include in the Scoop?" During the Scoop, teachers answered daily reflection questions such as, "How well were your objectives/expectations for student learning met in today's lesson?" After the Scoop period, they answered post-scoop reflection questions such as, "How well does this collection of artifacts, photographs, and reflections capture what it is like to learn science in your classroom?" Appendix A provides a complete list of the three sets of reflection questions.

Additional Data Sources: Classroom Observations and Transcripts

In addition to collecting Scoop Notebooks from teachers, members of the research team observed each classroom for two to three days during the time in which the teacher collected artifacts in the Scoop Notebook. During these visits, observers wrote open-ended field notes describing the lessons they observed. During the 2003-04 study, observations were done individually, i.e., a single researcher observed each teacher for two or three days.2 In the 2004-05 study, observations were done in pairs, i.e., the same two researchers observed each teacher on two or three occasions. In the 2003-04 study, we also collected audiotapes of lessons in eleven classrooms to explore the feasibility of obtaining classroom discourse data as part of the artifact collection process, as well as to determine what additional information transcripts of classroom discourse provided.
The researchers who observed in these classrooms audio-taped the lessons by having teachers wear a simple wireless microphone. The audiotapes were transcribed to provide a record of classroom discourse.

2 In three or four instances in California, a teacher was observed by two different researchers during the three-day observation period.

Scoring the Notebooks, Observations, and Discourse

In order to evaluate the extent to which teachers emphasized reform-oriented instructional practice in their science classrooms, we developed 11 dimensions of practice, which were used as the basis for comparison of data collected through notebooks, classroom observations, and transcripts of audio-taped discourse. These dimensions were informed by documents such as the National Science Education Standards (National Research Council [NRC], 1996). In the following sections we describe these dimensions, the scoring guides and rating process, and the procedures followed for the training of raters and observers. Unless otherwise noted, the same dimensions, guides, and procedures were used in 2003-04 and 2004-05.

Dimensions of Reform-Oriented Practice

The term "reform-oriented science" describes an approach to science teaching that encompasses both content ("what is taught") and pedagogy ("how it is taught"). Reform-oriented science includes practices associated with the idea of "science as process," i.e., having students focus on skills such as observation, measurement, and experimentation. In addition, in a reform-oriented science classroom students learn how to ask and pursue questions, construct and test explanations, form arguments, and communicate their ideas with others (NRC, 1996). Guided by the vision of a science classroom portrayed in the National Science Education Standards, as well as elements of standards-based science instruction defined by a panel of experts convened by the Mosaic-II project (Stecher et al., 2005), we identified 11 dimensions of "reform-based" instruction in science. The initial versions of the dimensions were revised as a result of the pilot study (Borko et al., 2005), and they were modified slightly between the 2003-04 and 2004-05 studies to provide additional clarification.3

3 All changes for the 2004-05 study were minor, except the change to Cognitive Depth described below.

The final dimension descriptions are as follows:

1. Grouping. The extent to which the teacher organizes the series of lessons to use groups to work on scientific tasks that are directly related to the scientific goals of the lesson and to enable students to work together to accomplish these activities. (An active teacher role in facilitating groups is not necessary.)

2. Structure of Lessons. The extent to which the series of lessons is organized to be conceptually coherent, such that activities are related scientifically and build on one another in a logical manner.

3. Use of Scientific Resources. The extent to which a variety of scientific resources (e.g., computer software, internet resources, video materials, laboratory equipment and supplies, scientific tools, print materials) permeate the learning environment and are integral to the series of lessons. These resources could be handled by the teacher and/or the students, but the lesson is meant to engage all students. By variety we mean different types of resources OR variety within a type of scientific resource.

4. "Hands-On." The extent to which students participate in activities that allow them to physically engage with scientific phenomena by handling materials and scientific equipment.

5. Inquiry.
The extent to which the series of lessons engages students actively in posing scientifically oriented questions, designing investigations, collecting evidence, analyzing data, and answering questions based on evidence.

6. Cognitive Depth. Cognitive depth refers to a focus on the central ideas of the unit, generalization from specific instances to larger concepts, and connections and relationships among science concepts. There are two aspects of cognitive depth: the lesson design and teacher enactment. Thus, this dimension considers the extent to which the lesson design focuses on cognitive depth and the extent to which the teacher consistently promotes cognitive depth.4

7. Scientific Discourse Community. The extent to which the classroom social norms foster a sense of community in which students feel free to express their scientific ideas openly. The extent to which the teacher and students "talk science," and students are expected to communicate their scientific thinking clearly to their peers and teacher, both orally and in writing, using the language of science.

8. Explanation/Justification. The extent to which the teacher expects, and students provide, explanations/justifications either orally or on written assignments.

9. Assessment. The extent to which the series of lessons includes a variety of formal and informal assessment strategies that measure student understanding of important scientific ideas and furnish useful information to both teachers and students (e.g., to inform instructional decision-making).

10. Connections/Applications. The extent to which the series of lessons helps students: connect science to their own experience and the world around them; apply science to real-world contexts; or understand the role of science in society (e.g., how science can be used to inform social policy).

11. Overall. How well the series of lessons reflects a model of instruction consistent with the dimensions previously described. This dimension takes into account both the curriculum and the instructional practices.

4 In 2003-04 the last two sentences read: There are three aspects of cognitive depth: the lesson design, teacher enactment, and student performance. Thus, this dimension considers the extent to which lesson design focuses on cognitive depth; the extent to which the teacher consistently and effectively promotes cognitive depth; and the extent to which student performance demonstrates cognitive depth.

Scoring Guides and Rating Process

Each dimension in the Scoop Notebook is rated on a five-point scale from low (1) to high (5). To facilitate the rating process, we developed a scoring guide containing an overall description of each dimension and specific descriptions of the low, medium, and high anchor points. One or two classroom examples are provided for each of these anchor levels as well. The complete scoring guide used for rating the 2004-05 observations and notebooks is presented in Appendix B. Minor additions were made to the observation guides prior to rating the notebooks, so that they would contain examples of the types of evidence found in the notebooks.
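To illustrate how ratings on this scale might be compared across data sources, the sketch below computes, per dimension, the mean absolute difference between notebook-based scores and "gold standard" scores. It is a hypothetical illustration, not the study's analysis code; the teacher IDs and all scores are invented.

# Illustrative sketch: per-dimension mean absolute difference between
# notebook-based ratings and "gold standard" ratings on the 1-5 scale.
# Teacher IDs and all scores are hypothetical.
from statistics import mean

# scores[source][teacher_id] = eleven dimension ratings (1-5)
scores = {
    "notebook": {"t01": [3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 3],
                 "t02": [2, 3, 3, 4, 2, 3, 2, 2, 3, 2, 3]},
    "gold":     {"t01": [3, 4, 3, 3, 2, 4, 3, 3, 3, 3, 3],
                 "t02": [2, 3, 3, 5, 2, 3, 3, 2, 3, 2, 3]},
}

def mean_abs_diff(a, b, n_dims=11):
    """Mean absolute rating difference per dimension, averaged across teachers."""
    return [mean(abs(a[t][d] - b[t][d]) for t in a) for d in range(n_dims)]

for d, diff in enumerate(mean_abs_diff(scores["notebook"], scores["gold"]), 1):
    print(f"Dimension {d}: mean |notebook - gold standard| = {diff:.2f}")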