Student Risk Assessment for Identifying Needs and Evaluating Impacts

Raymond E. Morley and James R. Veale

Abstract: The purpose of this article is to present a basis for evaluating the effectiveness of programs and services for at-risk children and youth. An instrument designed to identify students from most at risk to least at risk is presented, along with examples of how to assess and evaluate program effectiveness utilizing the instrument. The primary intent of applying the process described is to assist personnel to leverage resources for maximum benefit.

The term "at risk" was originally defined in Iowa (Office of Educational Services for Children, Families and Communities, 1996) with the following results-oriented criteria: children and youth (a) not meeting goals within ongoing education programs, (b) not completing high school, and (c) not becoming a productive worker upon leaving high school. Multiple criteria were identified in each of these three categories to assist in identification. A given student could be at risk by one or more of the three categories. The specific criteria used to identify students as at risk were drawn from a wide array of state and national information regarding factors that contribute to student failure and lack of success in school. Multiple criteria for identification are indicated and suggested for use in each of the categories. These criteria are still being used in Iowa schools to identify students who need additional assistance to succeed and to leverage resources to help students maximize success. These same criteria, plus more, are used in the enclosed risk assessment instrument, intended to assist educators to identify at-risk children and youth, leverage resources, and assess the effectiveness of services provided. Multiple examples are provided to illustrate its utilization in the management and delivery of services and in assessing and evaluating their effectiveness.

Student Risk Assessment Instrument

An instrument is presented in this paper for identifying students who are least at risk to those who are most at risk. This instrument was developed from team processing of program effectiveness by school and community-based support services personnel in the School-Based Youth Services Program in Iowa (Veale, Morley, & Erickson, 2002). In order to plan how to work together and make a difference for children and youth, team members needed to determine whether services were effective with the most at-risk children. Broad-based group data was not enough to demonstrate whether, in fact, the services were impacting those students most at risk. This previously hidden information was needed to develop the necessary knowledge to change services to help the most at-risk children and youth. The Student Risk Assessment Instrument moved the teams to more profound levels of knowledge for planning and leveraging resources.

The development and implementation of the instrument occurred from 1990 to 2000, a 10-year period of model program development between schools and multiple community-based support service agencies and organizations. Partial support for development came through the FINE (First in the Nation in Education) Foundation (Veale, 1995).

The Student Risk Assessment Instrument serves as a tool to assist schools and school districts to determine the effectiveness of programs. Moreover, it allows observations of student performance on outcomes across risk levels, which can help with planning and modifying services, as well as resource management.

Thirty factors were identified by local community teams as significant reasons for students being at risk of not succeeding in school, dropping out of school, or not becoming a productive member of society. Seven factors were identified as "critical" for determining degree of risk, while the other 23 were considered important but "noncritical." A critical factor is one that may by itself force a student into school failure, dropping out, or lack of productivity upon leaving school.
The critical factors are (1) dropped out or expelled; (2) victim of physical, psychological, or sexual abuse, rape, or other violent crime; (3) pregnancy/teen parent; (4) homeless; (5) language/cultural barriers; (6) out-of-home placement; and (7) committed criminal acts. A noncritical factor is one which, combined with other such factors (altogether, four or more), may force a student into school failure, dropping out, or lack of productivity. Noncritical factors include repeated school failure, no extracurricular activities, chronic health condition, gang membership, and no identified career interests, inter alia. The Student Risk Assessment Instrument is presented in the Appendix.

The factors we came up with agreed closely with those established in the Phi Delta Kappa (PDK) "Study of Students At Risk" (Frymier, 1992a, 1992b). Although published a year before we developed our instrument, we were not aware of that study at the time. Since that study was based on data from more than 20,000 students, and all of the factors included in the resulting PDK template were associated with factors included in our instrument, we felt that this provided a degree of validity for the factors included in our instrument and their generalizability outside of Iowa.

Empirical data have provided further validation. For example, students classified as high risk were found to have higher dropout rates than those of medium or low risk. Since having previously dropped out of school is one of the factors contributing to risk, this result provides further evidence regarding the validity of risk assessment using this instrument. Reliability was assessed in a study where separate observers assessed the same students in a collaborative services program site in Iowa.

This instrument has been found to be useful in describing populations served, evaluating the impact of services in those populations, identifying student needs, establishing policy guidelines, and as a tool for leveraging resources for school improvement initiatives. The instrument has the following advantages:

• simple checklist format;
• three levels of risk assessment (low, medium, and high), allowing easy entry into the database and use in surveys via color-coding (e.g., for evaluating impact of services);
• validity based on comparisons with an instrument of established validity and empirical data;
• reliability based on indices of interobserver agreement and correlation;
• specifically targeted to students and families in collaborative services programs.
Classification by Level of Student Risk

The classification by level of student risk is based on the number and types of factors identified for a student. A student is classified as having

• low risk if no factors were indicated;
• medium risk if one to three noncritical factors were indicated (no critical factors);
• high risk if (a) one or more of the critical factors were indicated or (b) four or more of the noncritical factors were indicated.

It is intended that staff members identify these risk factors for each student upon intake and update these assessments whenever risk increases significantly and new information becomes available on students. (If no information is available on a student, he or she is classified as having unknown risk. This may occur, for example, when a student has just entered the school or program.) The rationale for the above rule was (a) to provide greater weight to the critical factors, (b) to incorporate a cumulative effect for the noncritical factors, and (c) to ensure practicality by keeping the rule simple to use.
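For readers who automate intake records, the rule above reduces to a few lines of code. The following is a minimal sketch in Python; the function and argument names are ours, for illustration, and are not part of the instrument:

    # Sketch of the classification rule stated above. A student's record is
    # reduced to counts of critical and noncritical factors indicated.
    def classify_risk(critical_count, noncritical_count, has_information=True):
        if not has_information:
            return "unknown"   # e.g., a student who has just entered the program
        if critical_count >= 1 or noncritical_count >= 4:
            return "high"      # any critical factor, or four or more noncritical
        if noncritical_count >= 1:
            return "medium"    # one to three noncritical factors, no critical
        return "low"           # no factors indicated

    assert classify_risk(0, 0) == "low"
    assert classify_risk(0, 3) == "medium"
    assert classify_risk(1, 0) == "high"
    assert classify_risk(0, 4) == "high"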
For purposes of evaluating the impact of services, we suggest that new information can increase—but should not decrease—the risk level of a student. This does not mean that the student cannot overcome these risk factors. The only situation where a student's risk could decrease would be when the original assessment was in error. For example, suppose that a student's attendance for the year was incorrectly recorded as 85 days missed, whereas the actual number of days was 8.5 days—a transcription error involving a misplaced decimal point. This should not be confused with the situation where a student no longer indicates the risk factor(s), e.g., a student whose attendance had been very poor but who is now attending regularly. The risk factor (poor attendance) is still there; it is just not presently being manifested. In contrast, changing a student's risk classification from high to low (or medium) would reduce one's ability to demonstrate program impact using standardized measures or informal assessments. Since the focus of a demonstration is often those who are most at risk, there would be fewer records on which to make such an evaluation. In effect, this would be throwing away data.

We consider the level of student risk to be a background characteristic, not an outcome. As such, the assessment of student risk can yield a specification, restriction, or qualification of program effectiveness. Risk is not itself a measure of program effectiveness (outcome) in this system.

Professional judgment must be utilized and trusted in the application of this instrument. Local school personnel are given the flexibility to make the decisions on risk classifications of children based on available data outside the instrument itself. Information from multiple resources will be necessary in order to apply the classification of students most effectively. For example, information from human services personnel may be necessary to verify homelessness. [Note: A spreadsheet template is available to monitor and calculate the level of student risk, as well as summary statistics on the risk factors for the student population. This template may be obtained by request, free of charge, from the authors.]

Assessment of Student Risk: What Do We Get From It?

The assessment of student risk yields the following benefits for students, schools, and programs:

• describing the population served—gives information on how many in a program are at high, medium, and low levels of risk;
• identifying student needs—provides a holistic, diagnostic picture of each individual student's needs (to personalize and fine-tune service delivery);
• evaluating impacts of services—determines the effectiveness of services for students at different levels of risk;
• establishing policy guidelines—determines the minimum number and type(s) of contacts for a student in a school year to increase the likelihood of positive outcomes (e.g., keeping the highest-risk students in school);
• improving schools—incorporates provisions for at-risk students, as identified by factors in the risk assessment instrument, into the comprehensive school improvement plan.

The first of these benefits provides an answer to the first part of the question that gave rise to the risk assessment instrument: "How do we know we are serving and impacting the most seriously at-risk students (in the community)?" We can determine the number of students participating in a program or initiative who are high, medium, and low risk. There may also be others in the community who are high risk and not participating in the support services program. If the instrument could be applied more generally to students, students not involved in services could also be assessed.

For example, a student's risk classification is included as a demographic variable ("risk factor") in the database EASY/2EASY used in the School-Based Youth Services Program (SBYSP) in Iowa (Veale, Morley, & Erickson, 2002). In the SBYSP in 1997-98, based on a total of 21,405 K-12 students served, we found that 21.8% were high risk, 22.0% medium risk, and 44.0% low risk (12.3% were of unknown risk). This may be presented in a pie chart, as in Figure 1. In this example, slightly more than one student in five is high risk, and about half of those of known risk are either high or medium risk. Since the SBYSP is open to all students, these figures for high and medium risk may be considered fairly high. These figures will vary over program sites and over time.

[Figure 1. Student Risk Profile for the SBYSP in 1997-98 (all 18 sites; N = 21,405). Pie chart: high risk 21.8%, medium risk 22.0%, low risk 44.0%, unknown 12.3%.]
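A profile such as Figure 1 is a simple tally of these classifications. A sketch, reusing the classify_risk function from the earlier sketch on hypothetical intake records (the records below are invented for illustration):

    from collections import Counter

    # Hypothetical records: (critical factors, noncritical factors, any info?).
    records = [(0, 0, True), (0, 2, True), (1, 1, True), (0, 5, True), (0, 0, False)]

    profile = Counter(classify_risk(c, n, info) for c, n, info in records)
    for level in ("high", "medium", "low", "unknown"):
        share = 100 * profile[level] / len(records)
        print(f"{level:>8}: {share:.1f}%")   # percentages as in Figure 1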
The second benefit of student risk assessment is that it provides a holistic, diagnostic picture of the student's needs. This can be used in customizing or personalizing services and fine-tuning delivery of services. The value of using the risk assessment instrument as a diagnostic tool to drive service delivery is demonstrated by the following set of circumstances (Veale, Morley, & Erickson, 2002):

A student is frequently absent, citing health problems as the reason. He is sent to the school nurse, and she learns that he has had frequent colds and other respiratory infections. The health symptoms are treated, but he continues to have health problems that cause him to be absent. The nurse becomes more concerned because she suspects that there may be other factors contributing to the student's health issues. She notes that the student does not appear to have warm clothing or a heavy winter coat. The nurse sets up a visit for the student with the school-based case manager, who completes a more thorough assessment of needs with the student. As the case manager is assessing the various risk factors, he or she learns that the student is homeless and that he and his family are often forced to sleep in their car. Both parents dropped out of school before graduating and work part-time at minimum wage. They have no benefits such as insurance, sick leave, or vacation time. They cannot leave work to take their child to a doctor or clinic where they may have to wait several hours to be served.

The risk associated with being homeless is far greater than that of poor attendance and/or "colds" and alerts the case manager that a different type and intensity of services will be required.

Student risk assessment provides an opportunity to look for plausible relationships among many different variables and to gauge the type and level of intervention that may be necessary. Investigating many different factors also makes it more likely that the cause of the barriers to success can be discovered and addressed, rather than focusing on an array of symptoms. In this case a cold would be a symptom of the student's more serious issue of homelessness.1

The third benefit—determining program impact for students at varying levels of risk—addresses the second aspect of the question that led to the development of the risk assessment tool. In outcomes evaluation, it is of interest to determine the degree to which performance on some outcome, for example absenteeism, is different for students at different risk levels. Such differences point to the importance of considering the social or cultural conditions (contexts) on which the impact of the initiative may be contingent (Pawson & Tilley, 1997). For example, if absenteeism is significantly reduced among the high-risk male student participants, this indicates that the initiative is contributing to improved attendance for male students most at risk. This result can lead one to question why the program isn't also successful with high-risk female students. Reflection and dialogue can result in changes in program focus or implementation that may yield significant improvement in attendance for all high-risk students.

Longitudinal analysis can add an important dimension to an evaluation. In the Caring Connection, a school-based collaborative services program in Marshalltown, Iowa, outcome data are added each year to the previous year's database. The premise here is that it is unrealistic to expect students to turn their academic lives around in one year. Multiyear data provide the opportunity to assess progress on outcomes over longer time intervals. For example, improvement in attendance is defined as missing no more than 10 days in the current school year after missing more than 10 days in the previous year. This definition may be applied to succeeding years to assess improvement over a longer time interval. In the Marshalltown program, among high-risk students missing more than 10 school days in 1997-98, 17.9% improved in the following year and 26.3% (a 47% increase) improved in the third year—over their attendance in the first year. Among students missing more than 10 days in 1997-98 who were medium risk, 23.3% improved in the following year and 37.7% (a 62% increase) improved in the third year, while among those who were low risk, 40% improved in the following year and 53.8% (a 35% increase) improved in the third year. This shows longer-term improvement among all risk categories, with somewhat greater percentage increases in improvements the third year (over those of the second year) among the medium- and high-risk students.2 Moreover, among those of medium or high risk missing more than 10 days in 1997-98, the proportion improving their attendance from more than 10 days missed in 1998-99 to 10 days or fewer missed in 1999-2000 exceeded the proportion whose attendance worsened during this period (P < .05, McNemar test). This result implies that the longer-term improvement (over the three-year period 1997-98 to 1999-2000) was significantly greater than the short-term improvement (over the two-year period 1997-98 to 1998-99) for these higher-risk students.
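The percentage increases quoted above follow the computation described in endnote 2 (third-year improvement rate divided by second-year rate, minus one, times 100), and the McNemar test compares only the discordant pairs (students who improved over the longer term versus those who worsened). A sketch that checks the arithmetic and implements the exact test; the discordant counts b and c passed at the end are hypothetical, since the article does not report them:

    from math import comb

    # Percentage increase in improvement, per endnote 2.
    for level, yr2, yr3 in [("high", 17.9, 26.3), ("medium", 23.3, 37.7),
                            ("low", 40.0, 53.8)]:
        pct = (yr3 / yr2 - 1) * 100
        print(f"{level}: {pct:.1f}% increase")   # rounds to the reported 47%, 62%, 35%

    def mcnemar_exact(b, c):
        """Two-sided exact McNemar test on the discordant pair counts
        (b: improved over the longer term only; c: worsened)."""
        n, k = b + c, min(b, c)
        p_one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * p_one_tail)

    print(mcnemar_exact(30, 14))   # hypothetical counts; yields P < .05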
The fourth benefit is related to the third—establishing policy guidelines to increase the likelihood of success among students. For example, in the School-Based Youth Services Program in Iowa, it was found that high-risk students with more than 25 contacts with the program had significantly lower dropout rates than those with fewer contacts—10.3% compared with 14.3% (P < .05). This was not true for those at medium or low risk (see Figure 2). Thus, in terms of lowering dropout rates, the program appears to be impacting high-risk students more than those at lower risk levels. Since high-risk students have the highest dropout rates, one of the program sites established a policy of encouraging high-risk participants to secure at least 25 service contacts with the program staff. Of course, the services must be appropriate to the specific needs of the student. The Student Risk Assessment Instrument provides the ability to fine-tune and personalize service delivery. Similar policy guidelines may be developed around other outcome areas or other types of programs.3

[Figure 2. Dropping Out and Magnitude of SBYSP Contact by Level of Student Risk, 1995-96. Bar chart of dropout rates (percent) for 0-25 versus >25 contacts: high risk, 14.3 versus 10.3; medium risk, 2.9 versus 2.6; low risk, 0.4 versus 0.0.]
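The significance statement above (10.3% versus 14.3%, P < .05) is the kind of claim a standard two-proportion z-test supports. A sketch with assumed group sizes, chosen only so that the two contact groups sum to the N = 2,374 high-risk students of Table 1; the article does not report the actual split behind Figure 2:

    from math import erf, sqrt

    def two_prop_z(x1, n1, x2, n2):
        """Two-sided two-proportion z-test with pooled standard error."""
        p1, p2, pool = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
        z = (p1 - p2) / sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # normal tail
        return z, p_value

    # Assumed sizes: 1,500 students with 0-25 contacts (14.3% dropped out)
    # and 874 with >25 contacts (10.3% dropped out).
    z, p = two_prop_z(round(0.143 * 1500), 1500, round(0.103 * 874), 874)
    print(f"z = {z:.2f}, P = {p:.3f}")   # P < .05 under these assumptions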
All of the above discussions of benefits apply to school improvement initiatives. Local school districts are required under existing standards, largely driven by the No Child Left Behind legislation of 2001, to evaluate the effectiveness of existing programs and services for at-risk children and youth. The expectation is that this will occur at all levels of education (elementary through high school). Effectiveness of programs can be measured by identifying whether the most at-risk children and youth are improving and succeeding in school. It is important to consider the totality of risk factors and how these are distributed over the various groups mandated in the No Child Left Behind legislation to identify specific needs and achieve success (Foster, 2004).

Effectiveness can also be demonstrated longitudinally by a reduction in the percentage of children and youth who are at high risk (or an increase in the percentage who are at low risk). As stated earlier, this strategy has not been utilized in past research using the Student Risk Assessment Instrument, but the possibilities remain open for application in local school districts. In order to accomplish this type of measurement, some attempt would have to be made to reclassify students at given time periods, such as (a) the grade levels for standardized testing, (b) September (the beginning of the school year) and May (the end of the school year), or (c) upon entry into school and upon exit or graduation.

Comprehensive school improvement plans identify evaluation strategies to assess student progress. Yearly progress reports are used to monitor the progress of students based on chosen procedures. The plans and progress reports can incorporate the above ideas to address at-risk children and youth, including services provided and evaluation of the effectiveness of those services. This data utilization would provide more precise assessment of progress with high-, medium-, and low-risk children and youth from a comprehensive point of view. Assessments could be conducted at the elementary, middle, and high school levels to evaluate effectiveness of services at each level, and resources could be leveraged accordingly. This system could also be applied at each grade level, if necessary, to identify program effectiveness and to leverage resources. In particular, federal and state funding sources identified in comprehensive school plans could be directed accordingly.

The case study below illustrates why we do not recommend erasing a risk factor, even though a student may no longer manifest the particular behavioral tendencies that define it. The fact that the student had those tendencies at one time means that he or she could return to them at some time in the future. We know, for example, that students who drop out are at increased risk of dropping out again. Moreover, although it may have receded, having the risk factor (e.g., poor attendance) at a previous time could make it more difficult to achieve outcomes during school or when the student gets out of school and into a work activity (e.g., showing up for work). The fact that, in some cases, students may overcome these risks and achieve success makes their story all the more impressive.

Case Study

The following case study, submitted by a local collaborative services program coordinator, provides an example of how the Student Risk Assessment Instrument can help in organizing the various risk factors that are impacting the lives of students. It illustrates how the use of the tool is really a process that evolves as knowledge of the student's risk factors increases.

Example: Case Study Illustrating the Use of Risk Assessment to Diagnose Student Needs and Fine-Tune Service Delivery

The risk assessment tool was initially used to determine if this particular student (we'll call him Bill) needed to be in a tutoring program. Four factors became apparent as we filled out the form:

• Experienced repeated school failure (Bill had failed several classes in the middle school);
• Poor attendance (his attendance had been sporadic for some time);
• No extracurricular school activities (he had not participated in any such activity);
• Economically disadvantaged (he was from a low-income family).

Four noncritical factors made him a high-risk student, and one for whom the tutoring program was appropriate. After being in the program for several weeks, it became apparent why he had been struggling in school. Bill opened up to me one day and told me about the physical abuse that he and his mother had been suffering at the hands of his father. Going back to the assessment tool helped us get a clearer picture of how at risk this young man was.
We now had to add the following to his list of risk factors:

• Recent crisis or life transition (his father moved back into the home after having been gone for a couple of years);
• Extreme mobility (the family had moved several times to get away from the father);
• Victim of physical . . . abuse (the boy was a victim of physical child abuse by his father);
• Experienced mental health problems (we referred him to mental health counseling).

After the Department of Human Services became involved, things began to change. Some of the other risk factors faded as Dad moved away. However, new ones cropped up. Bill became the father in the family, taking care of a very mentally ill and depressed mother and two little sisters. We would have to add family dysfunction to the list of factors as he took on the parental role,4 as well as substance abuse by a family member, as Mom was using (drugs). A new crisis appeared as Mom was placed into the Mental Health Institute. His sisters were removed, and for a while Bill was basically homeless, with a neighbor taking care of him informally. This situation was eventually resolved.

As time has gone on, new factors have arisen. Bill has become sexually active; he has had relationship problems over a girl; and he committed a delinquent act (driving without a license). While some factors may be corrected or fade over time, their effects never seem to entirely disappear. For example, Dad may leave, but the effects of the abuse continue to influence how Bill reacts to his environment. Attendance may no longer be a problem, but the effects of past poor attendance could influence his learning ability and future work attendance. Therefore, it is vital to never erase a risk factor but to look instead at their cumulative effect.

Strengths Indicated by Risk Factors Not Present

Of particular interest are the factors that Bill has not experienced, which can be seen as strengths:

• He has stayed in school (no small accomplishment) and so has not become a dropout.
• His grades blossomed once he was no longer the caretaker of the family.
• He is healthy and does not appear to be using drugs or alcohol.
• He has personal goals and motivation to improve.
• He has not been involved with the juvenile court system (a delinquent act only gave him a ticket).
• He has the ability and desire to work.
• He has solid career plans.

With the support he now has, coupled with these strengths, we have a lot of hope for this young man.

Validity of Risk Assessment

The validity of an assessment is the quality of accurately assessing the desired construct, trait, or behavior. In this case, the construct is a student's risk—of dropping out of school, not successfully completing a course of study, or not becoming a productive worker and citizen. Validity is often considered to be a characteristic of the instrument. Others consider validity to be a quality of the inferences or assessments based on a specific application of the instrument (McMillan, 2001). The latter is probably more accurate, but the language "validity of the instrument" is more common than that of "validity of the assessment." Moreover, validity is always a matter of degree.
When quantifiable, this quality is often measured by indices or coefficients on a scale of zero to one (or zero to 100%).

1. Content Validity: The Instrument Development Process—Content validity refers to the extent to which the assessment items represent a larger domain of interest. Although theoretically quantifiable, this type of validity is usually in the form of a qualitative judgment. The process used to develop the instrument can contribute to this type of validity. In this case, the instrument was developed through a brainstorming process, with input from local program coordinators who were thought to be most knowledgeable about the types of problems students have in their families, school, or personal lives. A review process was used to further develop, fine-tune, and validate the instrument. These processes resulted in the factors identified in the risk assessment instrument. The emphasis was on the practical utility of the instrument—both in terms of the checklist format and the simple rule for classification. The authors and teams involved believe that this process resulted in a practical instrument that can be used to create a context within which to evaluate the effectiveness of local programs and services in reaching all children and youth, in particular the most at risk (Pawson & Tilley, 1997).

2. Construct Validity: Agreement With Template Developed in Phi Delta Kappa Study—Another approach to assessment validation is construct validity—how an assessment is related to an underlying construct, trait, or behavior, in this case, student risk. Often, construct validity is established by studying how an assessment is related to other assessments of the underlying trait.
One such assessment is the "risk template" developed in a multiyear Phi Delta Kappa (PDK) study (Frymier, 1992b). A committee came up with 45 factors that previous research indicated contributed to putting children at risk. A protocol instrument was developed, and experienced professionals in 276 schools in 85 communities collected data on more than 21,000 students in grades 4, 7, and 10 across the United States and Canada. Teachers or counselors who knew the students best and had immediate access to their records provided the information. These data were subjected to a variety of statistical and item analyses, and the number of factors was eventually reduced to 24. These items were grouped into three categories: family, personal pain, and academic failure factors.

There is considerable agreement between the 24-factor risk template developed in the PDK study and the 30-factor risk assessment instrument. For example, all seven of the critical factors in the risk assessment instrument are associated with those included in the PDK risk template. In some cases, there is a near perfect match (e.g., "pregnancy/teen parent" compares with "student involved in a pregnancy . . ."); in others, the critical factor in our instrument relates to factors in the PDK template (e.g., "homeless" relates to "mother or father . . . unemployed" and "student does not live with real mother and real father . . ."). Their classification criteria are also similar to ours—evidence of a single factor in the personal pain component (PDK) or critical factors (risk assessment instrument) was considered sufficient to assess the student to be seriously at risk. Evidence of two or more family factors and one or more academic factors was also considered sufficient to assess the student as seriously at risk using the PDK instrument. This criterion is comparable to that of four or more noncritical factors for identifying a student as high risk in the risk assessment instrument.

A cross-correlation of factors indicates that all factors included in the PDK template are included in or associated with those in the Student Risk Assessment Instrument, which includes other factors considered critical by Iowa educators. The Student Risk Assessment Instrument includes 12 factors not identified in the PDK final template, which bring it into close conformance with existing school standards. One may interpret this to mean that the factors included in our instrument are slightly more comprehensive, in order to align with existing standards for evaluation. The additional components relate to career development/education, which is identified as part of the education program of all students nationally (Secretary's Commission on Achieving Necessary Skills (SCANS), 1991, 1992). In addition, social factors were included to address the importance of human growth and development, also identified in current research as intrinsic to total student development (Adelman & Taylor, 2001; Goleman, 1995). Moreover, factors related to or leading to criminality were also included in the enclosed instrument (Catalano, 1999). The PDK factors included criminal acts, but not other factors leading to criminal acts, such as gang membership and committing delinquent acts.

3. Construct Validity: Correlations With GPA, Absenteeism, and Staying in School—Another way to establish validity is by studying relationships between the assessments and other variables that are thought to be related (either positively or negatively) to the underlying construct, trait, or behavior. Three such variables are GPA, absenteeism, and (not) staying in school. Research indicates that at-risk students will tend to have lower GPAs, greater absenteeism, and reduced likelihood of staying in school (more likely to drop out). The first two are highly correlated with all other risk factors in the PDK study (Frymier, 1992a); the third includes being suspended or expelled from school, which is highly correlated with all other risk factors in the PDK study.

In Iowa's School-Based Youth Services Program in 1997-98, using the Student Risk Assessment Instrument and classification procedure presented earlier, the data on the three above-mentioned outcomes are presented in Table 1. Each relationship was in the anticipated direction—decreasing GPA, increasing absenteeism, and increasing dropout rate for increased level of risk.5 This provided additional evidence of the (construct) validity of the student risk assessment instrument.

Table 1
GPA, absenteeism (more than 10 days missed), and dropout rate by level of risk among students participating in the SBYSP in 1997-98.

Level of Risk    GPA                  More Than 10 Days Missed Per Year    Dropout Rate
Low              2.67 (N = 3,794)     27.0% (N = 8,428)                     0.4% (N = 5,156)
Medium           2.23 (N = 2,403)     38.4% (N = 3,644)                     2.8% (N = 1,934)
High             1.89 (N = 2,155)     52.9% (N = 3,061)                    13.2% (N = 2,374)
Interobserver Reliability: Agreement and Correlation Indices

The reliability of an assessment is the quality of consistently assessing the desired construct, trait, or behavior (student risk). Consistency can be defined internally or in terms of stability over time, forms, or observers. In this case, the most appropriate definition of reliability is consistency over observers. This is called interobserver (interrater, interscorer) reliability. Like validity, reliability is often measured by indices or coefficients on a scale of zero to one (zero to 100%).

In 2001, coordinators of the SUCCESS Program, a collaborative services school-based program in Des Moines, Iowa, agreed to participate in a study to assess the reliability of the assessments using the risk assessment instrument and classification procedure presented herein. The program case manager was asked to assess the risk levels of student participants in the program and, independently, have an individual from the school staff (counselor, teacher, etc.) assess the same students. Individuals who had knowledge of the students in question—their academic records, extracurricular involvement, and family situations—conducted the assessments.

Perhaps the simplest measure of reliability is the average proportion of matches, found by counting the number of factors on which the two observers agreed for each student, dividing by 30 (the total number of factors in the instrument), and averaging over the 108 students assessed. This yielded .835, or 83.5% matches on the factors indicated or not indicated. This may be broken down into separate proportions of matches for critical factors (.937 or 93.7%) and noncritical factors (.804 or 80.4%). These proportions may be the most appropriate measures of reliability for the diagnostic use of the instrument to customize and fine-tune service delivery.
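This matching computation is easy to make concrete. A sketch, with invented names and two hypothetical students' 30-item checklists (True meaning the factor is indicated):

    def average_proportion_matches(observer1, observer2):
        """Average, over students, of the share of the 30 factors on which
        the two observers' checklists agree (indicated or not indicated)."""
        shares = [sum(a == b for a, b in zip(s1, s2)) / len(s1)
                  for s1, s2 in zip(observer1, observer2)]
        return sum(shares) / len(shares)

    student1_obs1 = [True] + [False] * 29                 # factor 1 indicated
    student1_obs2 = [True, False, True] + [False] * 27    # factors 1 and 3
    student2_obs1 = [False] * 30
    student2_obs2 = [False] * 30
    print(average_proportion_matches([student1_obs1, student2_obs1],
                                     [student1_obs2, student2_obs2]))
    # (29/30 + 30/30) / 2, about .983; the study's value over 108 students was .835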
The above results do not utilize the method of classifying students as high, medium, and low risk. The results incorporating this classification system are summarized in Table 2. Although this was not a random sample, the marginal totals are fairly typical of the risk distribution for this site. Note that these row and column totals reflect a higher level of risk than in the overall program for an earlier time period (cf. Figure 1).

Table 2
Agreements between the case manager and school staff assessments of risk of students in the SUCCESS Program in 2001.

Case Manager          Level of Risk: School Staff Assessment (#2)
Assessment (#1)       Low     Medium     High     Total
Low                     1          1        0         2
Medium                  4          7       11        22
High                    2          9       73        84
Total                   7         17       84       108

The cells representing agreements between the case manager's assessment of the student's risk level and that of the school staff are those on the main diagonal of Table 2. The raw proportion of agreements is found by taking the total in these cells (81) and dividing by the total number of students assessed by both observers (108), yielding 0.75, or 75%. This value indicated a fairly high level of interobserver agreement (McMillan, 2001).

Some of the 81 agreements could be due to chance. To correct for this, Cohen's kappa is sometimes used as an agreement index (Cohen, 1960). Expected values (based on the assumption of statistical independence between the two observers) were computed and subtracted from the numerator and denominator of the raw percent of agreements, yielding a kappa of 0.309. Although not large, this value is statistically significant (P = .0004).

The value of kappa is much smaller than the raw proportion of agreements. Given the marginal totals in Table 2, a high level of agreement between the two observers can be expected by chance alone. With the marginal totals given in this table, the maximum raw proportion of agreements is found by first pairing the marginal totals (in Table 2, (2, 7), (22, 17), and (84, 84)), taking the smaller of each pair (2, 17, and 84), summing (103), and dividing by the total sample size (108). This yields a maximum raw proportion of agreements of .954. Then correct for chance agreements as before, yielding a maximum kappa of .872. Another possible index is the ratio of kappa to its maximum value, or "adjusted kappa"—0.309/0.872, or 0.354 (Traub, 1994). This doubly corrected agreement index has the advantage that it has a maximum value of one, which simplifies the interpretation.

Cohen's kappa counts only perfect agreements; that is, both observers assess the student at exactly the same level (low, medium, or high). This is a rather stringent criterion. For example, the 20 (= 9 + 11) students who were assessed as medium by one rater and high by the other were counted as disagreements in computing kappa. One might argue that some "partial credit" or weight should be given to such ratings. Weighted kappa, using the "quadratic difference" weighting method, accomplishes this by assigning a weight of one to the diagonal cells (perfect agreement), 0.75 to those that are just off the diagonal (near agreement: low on one, medium on the other, or medium on one, high on the other), and zero to the two remaining cells (clear disagreement: low on one and high on the other) (Agresti, 1990). The weighted kappa is 0.451—a somewhat larger value, reflecting the more liberal concept of agreement applied. It is also statistically significant (P = .0000). These indices of agreement utilizing the classification system may be the most appropriate for use of the instrument in evaluation.6
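All of the indices above can be reproduced directly from Table 2. A sketch in plain Python (the cell counts are those of Table 2; the quadratic-difference weights are 1 on the diagonal, 0.75 one step off, and 0 two steps off, as described above):

    # Rows: case manager (low, medium, high); columns: school staff.
    table = [[1, 1, 0],
             [4, 7, 11],
             [2, 9, 73]]
    n = sum(map(sum, table))                                  # 108 students
    rows = [sum(r) for r in table]                            # 2, 22, 84
    cols = [sum(c) for c in zip(*table)]                      # 7, 17, 84

    po = sum(table[i][i] for i in range(3)) / n               # raw agreement: .75
    pe = sum(rows[i] * cols[i] for i in range(3)) / n ** 2    # chance agreement
    kappa = (po - pe) / (1 - pe)                              # 0.309

    po_max = sum(min(r, c) for r, c in zip(rows, cols)) / n   # .954
    kappa_max = (po_max - pe) / (1 - pe)                      # 0.872
    adjusted_kappa = kappa / kappa_max                        # 0.354

    weights = [[1 - ((i - j) / 2) ** 2 for j in range(3)] for i in range(3)]
    po_w = sum(weights[i][j] * table[i][j]
               for i in range(3) for j in range(3)) / n
    pe_w = sum(weights[i][j] * rows[i] * cols[j]
               for i in range(3) for j in range(3)) / n ** 2
    weighted_kappa = (po_w - pe_w) / (1 - pe_w)               # 0.451

    print(round(kappa, 3), round(adjusted_kappa, 3), round(weighted_kappa, 3))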
The various indices of interobserver reliability provide evidence of the consistency of assessments across different observers or raters using the risk assessment instrument. This is considered the most critical type of reliability for such assessments.7

Summary

The student risk assessment instrument presented in the Appendix has been found to be practical, valid, and reliable. It can help educators to (a) describe the risk levels in student populations, (b) diagnose student risk issues and fine-tune service delivery, (c) evaluate impacts of programs and services, (d) establish policy guidelines for programs and services, and (e) assist with school improvement and accountability initiatives. We offer this discussion not for the purpose of justifying a means to classify at-risk children and youth, but rather to support its use in managing and delivering services and in determining the effectiveness of such services. We recommend it to all who are concerned with assisting at-risk youth in their education and development.

Endnotes

1 "Homeless" is one of the demographic characteristics tracked in EASY/2EASY, a system for monitoring services and tracking student outcomes in collaborative services programs (see Veale, Morley, & Erickson, 2002). Homelessness is a factor that indicates high risk (in particular, high correlation with dropping out of school) in our instrument and is associated with at least two factors in the PDK template (Frymier, 1992a, 1992b).

2 The numbers of students on which these percentages were based were as follows: high risk, 156 for 1998-99 and 133 for 1999-2000; medium risk, 86 for 1998-99 and 77 for 1999-2000; low risk, 15 for 1998-99 and 13 for 1999-2000. The slightly lower numbers for the 1999-2000 year reflect attrition due to dropouts, positive terminations (students successfully leaving the program), and/or missing data. Also, note the low numbers for the low-risk students. This was due to the fact that we are focusing on those needing improvement based on attendance (missing more than 10 days in 1997-98), which is less likely for low-risk students. Thus, the percentages for the low-risk group are less precise than those for medium or high risk. Finally, the percentage increases in improvement were computed by dividing the percentage improvement for the third year by the percentage improvement for the second year, subtracting 1, and multiplying by 100.

3 In 1997-98, data like those of Figure 2 were collected for the Caring Connection—the SBYSP site that established the aforementioned policy. The results were similar, with an even larger difference in dropout rates between the contact groups for high-risk students in this site. To the extent that keeping students in school (their not dropping out) and improved attendance are related, this policy may have contributed to the positive result regarding long-term improvement in attendance among high- (and medium-) risk students in this program. (The Caring Connection was one of the four original SBYSP sites and was cited by researcher Joy Dryfoos as an outstanding "safe passage" program for youth (Dryfoos, 1998).)
4 The family was dysfunctional before, but even more so now.

5 The risk assessments using the Student Risk Assessment Instrument were made as part of the intake process (when the student entered the program) and, as more information was made available, adjusted (upward) as needed. The outcomes data cited in the table were collected at the end of the school year. Thus, the data in Table 1 may be considered evidence of predictive validity—the ability of the assessment to predict behavior or performance. However, since these outcomes have associated factors in the risk assessment instrument and some program sites may have reclassified students based on evidence of these outcomes (as well as other information) during the school year, there is probably some degree of functional dependence between level of risk and the outcomes cited.

6 It may be argued that a risk factor score (equal to the number of risk factors indicated for the student) would have been a better indicator of the level of risk of the student. With this measure, a correlation coefficient between the scores for the two observers would be an appropriate interobserver reliability index. In the reliability study, this correlation coefficient was found to be 0.601, which is statistically significant (P = .0000). Other possibilities include breaking this into a critical score (interobserver correlation of 0.740) and a noncritical score (interobserver correlation of 0.587), as well as more sophisticated weighted scores (e.g., giving more weight to the critical factors). These were considered and rejected in favor of the simpler rule, which we felt had greater usability and practicality.

7 For example, test-retest reliability is considered inappropriate here, since a student's level of risk can change over time. Inconsistent measures over time may occur due to actual changes in a student's risk profile—not measurement error.

References

Adelman, H., & Taylor, L. (2001). Enhancing classroom approaches for addressing barriers to learning: Classroom-focused enabling. Los Angeles, CA: Center for Mental Health Services, UCLA.

Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.

Catalano, R. (1999). Positive youth development in the United States: Research findings on evaluations of positive youth development programs. U.S. Department of Health and Human Services, Office of the Assistant Secretary for Planning and Evaluation, National Institute for Child Health and Human Development.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.

Dryfoos, J. (1998). Safe passage: Making it through adolescence in a risky society. New York: Oxford University Press.

Foster, W. (2004). No Child Left Behind: Group at-risk composition and reading achievement. The Journal of At-Risk Issues, 10(1), 1-6.

Frymier, J. (1992a). Growing up is risky business, and schools are not to blame. Final report of Phi Delta Kappa study of students at risk (Vol. I). Bloomington, IN: Phi Delta Kappa.

Frymier, J. (1992b). Assessing and predicting risk among students in school. Final report of Phi Delta Kappa study of students at risk (Vol. II). Bloomington, IN: Phi Delta Kappa.

Goleman, D. (1995). Emotional intelligence: Why it can matter more than IQ. New York: Bantam Books.

McMillan, J. (2001). Essential assessment concepts for teachers and administrators. Thousand Oaks, CA: Corwin Press.

Office of Educational Services for Children, Families and Communities. (1996). Guidelines for serving at-risk students. Des Moines, IA: Iowa Department of Education.

Pawson, R., & Tilley, N. (1997). Realistic evaluation. Thousand Oaks, CA: Sage.

Secretary's Commission on Achieving Necessary Skills (SCANS). (1991). What work requires of schools: A SCANS report for America 2000. Washington, DC: U.S. Department of Labor.

Secretary's Commission on Achieving Necessary Skills (SCANS). (1992). Skills and tasks for jobs: A SCANS report for America 2000. Washington, DC: U.S. Department of Labor.

Traub, R. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.

Veale, J. (1995). Developing research tools and evaluating the impact of multi-service school programs (Comprehensive narrative). Prepared for the FINE Education Research Foundation, with assistance from Morley, R., Erickson, E., & Graber, G. Des Moines, IA.

Veale, J., Morley, R., & Erickson, C. (2002). Practical evaluation for collaborative services: Goals, processes, tools, and reporting systems for school-based programs. Thousand Oaks, CA: Corwin Press.
Acknowledgments

The authors would like to thank Sharon Baughman, Cyndy Erickson, Donna Hempy, and Todd Redalen, who assisted in the development of the instrument and contributed data or other materials included in this manuscript, and Margaret Jensen-Connet, Nancy Wells, and Des Moines school staff members, who provided data on the reliability of the instrument presented here. In addition, the authors would like to thank Dr. Tom Deeter, David Winans, and Abby and Heather Morley for proofreading and clarification. Finally, the authors acknowledge the FINE Education Research Foundation for support of the research necessary for the development of this instrument and its application in determining the effectiveness of programs and services.

Authors

Raymond E. Morley, Ed.D., is an adjunct faculty member at Drake University in Des Moines, Iowa, and a consultant in the Iowa Department of Education responsible for dropout prevention programs, services for high school dropouts, at-risk programs, education of homeless children and youth, and school-based youth services programs. He has published over 50 manuscripts, including books, pamphlets, state guidelines and legislation, curriculum guides, and journal articles. [email protected]

James R. Veale, Ph.D., is a statistical/research consultant and educator in Des Moines, Iowa. He has authored or co-authored journal articles in statistics, applied mathematics, measurement, training and development, education, and medical research. His research interests include the costs of dropping out of school, evaluation of at-risk and prevention programs, and diagnostic educational measurement. [email protected]