Initial Validation of the Usage Rating Profile-Assessment for Use Within German Language Schools

Amy M. Briesch, Northeastern University, USA*
Gino Casale, University of Cologne, Germany
Michael Grosche, University of Wuppertal, Germany
Robert J. Volpe, Northeastern University, USA
Thomas Hennemann, University of Cologne, Germany

Learning Disabilities: A Contemporary Journal 15(2), 193-207, 2017. Copyright © 2017 by LDW.

Modern attempts to explain why some assessment tools are readily adopted by school-based personnel whereas others are not have focused on the concept of usability. Usability encompasses not only the degree to which consumers find an assessment tool to be acceptable, but also the degree to which it is well understood, believed to be feasible, consistent with local norms, and supported within the larger school environment. The purpose of the current study was to conduct an initial validation of a German-language version of the Usage Rating Profile-Assessment (URP-A), a measure designed to assess the multiple influences on assessment usage. Participants included 101 1st- through 6th-grade teachers in Western Germany. Although findings from an exploratory factor analysis of URP-A items differed somewhat from results found for the original English-language measure, results of the current study suggest that the German URP-A may be used to reliably assess multiple dimensions of usability with fewer items.

Keywords: Assessment, Treatment Usage, Acceptability, Factor Analysis, Teacher Self-Report

*Please send correspondence to: Amy M. Briesch, PhD, Department of Applied Psychology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115, USA. Phone: 1-617-373-8291. Email: [email protected].

Introduction

Significance of Data-Informed Decision Making Processes in Preventing Learning Disabilities

The early and systematic identification of learning problems in students is a key element of proactive approaches for the prevention of learning disabilities (Fuchs, 2003). For instance, multi-tiered systems of support (MTSS), such as Response-to-Intervention (RtI), provide a conceptual framework for the early identification of students who struggle with the academic requirements in schools (Grosche & Volpe, 2013). Two of the driving assumptions behind a successful MTSS model are that (a) students receive instruction or intervention that is evidence-based, and (b) the intensity of prevention and intervention efforts provided to students is informed by the results of evidence-based assessment tools (e.g., screening and progress monitoring measures) (Eagle, Dowd-Eagle, Snyder, & Holtzman, 2015). Unfortunately, however, although our accumulated base of knowledge regarding both evidence-based programs and assessment has grown substantially in recent years with the development of comprehensive databases such as the What Works Clearinghouse (http://ies.ed.gov/ncee/wwc) and the National Center on Intensive Intervention (http://www.intensiveintervention.org), the technologies that we know to produce positive student outcomes are not necessarily being utilized in everyday school settings (Briesch, Chafouleas, Neugebauer, & Riley-Tillman, 2012). The problem is that the assumed effectiveness of the technology is all too often counterbalanced by the practical barriers that exist in the translation of research to practice.
Factors Hypothesized to Influence Applied Usage

Over the past four decades, researchers have sought to better understand those factors that may help explain why some intervention and assessment technologies are embraced by users whereas others are not. An important line of inquiry into this issue was initiated in the early 1980s by Kazdin (1980), who focused on the construct of treatment acceptability. Treatment acceptability was defined as the degree to which prospective users of a treatment believe it to be something that is appropriate, fair, and reasonable for the given problem, and the idea was that users are more likely to put into actual practice those treatments that they find to be acceptable (Kazdin, 1980). Within the field of education, much work was conducted throughout the 1980s in order to better understand the degree to which treatments ranging from math interventions (e.g., Logan & Skinner, 1998) to pharmacological treatments (e.g., Power, Hess, & Bennett, 1995) were believed to be acceptable to teachers, students, and parents, as well as which factors had the greatest influence on perceived acceptability. Results of this line of research suggested that the most acceptable treatments were those that were both effective (i.e. resulting in positive change with minimal side effects) and feasible (i.e. requiring minimal time and resources) for individuals to implement (Reimers, Wacker, & Koeppl, 1987).

Although acceptability continues to be considered an important determinant of actual usage, over time researchers have acknowledged the need to consider additional factors beyond acceptability alone. One reason for this expanded consideration is that some research has found low correlations between how acceptable a treatment is perceived to be and the degree to which it is actually implemented (e.g., Sterling-Turner & Watson, 2002; Mautone, DuPaul, Jitendra, Tresco, Junod, & Volpe, 2009). As a result, more modern ecological conceptualizations of treatment usage have come to incorporate factors believed to influence usage at multiple levels. At the primary level are the implementer-level factors that exist within an individual. Personal acceptability is one important factor at this level; however, an individual's understanding of what the treatment is and how it is intended to be used may also have a notable effect on actual implementation (Reimers et al., 1987; Witt, Noell, LaFleur, & Mortensen, 1997). At the next level are the intervention-level factors that relate to features of the intervention or assessment technology itself. For example, the extent to which a procedure requires extensive amounts of time or resources will influence the degree to which it is implemented (e.g., Perepletchikova & Kazdin, 2005). Relatedly, those procedures that result in significant disruption to regular classroom activities will tend to be viewed less favorably (Reimers et al., 1987; Witt, 1986). Finally, it is important to consider those broader environmental factors that may influence local usage. That is, even if an intervention or assessment technology is perceived positively by an individual implementer, there may be administrative or philosophical hurdles to implementation within the broader school context. These contextual considerations include the degree to which there is administrative and peer support for the practice, both philosophically and practically speaking (Broughton & Hester, 1993; Buston, Wight, Hart, & Scott, 2002).

Development of the Usage Rating Profile

With the evolution of multi-dimensional conceptualizations of treatment usage came the need for a measure capable of simultaneously assessing the multiple factors that are believed to influence intervention usage. The Usage Rating Profile-Intervention (URP-I; Chafouleas, Briesch, Riley-Tillman, & McCoach, 2009) was therefore developed to serve this purpose. The most recent work on the URP-I has supported a six-factor model of usage that considers (a) how acceptable (i.e. Acceptability) and feasible (i.e. Feasibility) a user perceives the intervention to be, (b) the degree to which the intervention is well understood (i.e. Understanding) and family support is needed for implementation (i.e. Home-School Collaboration), and (c) the degree to which both practical (i.e. System Support) and philosophical (i.e. System Climate) system-level supports are needed (Briesch et al., 2012).

Although the URP was originally designed and validated for use when considering school-based interventions, the tool has recently been extended to consider the use of assessment tools as well. Miller, Neugebauer, Chafouleas, Briesch, and Riley-Tillman (2012) adapted the item wording from the URP-I to reflect perceptions of assessment rather than intervention tools (e.g., "I would need additional resources to carry out this assessment" as opposed to "I would need additional resources to carry out this intervention"), thereby creating the Usage Rating Profile-Assessment (URP-A). These researchers then asked 283 public school teachers to complete the URP-A with regard to a teacher-completed behavioral assessment measure (i.e. Direct Behavior Rating). Results of this study suggested that the factor structure of the URP-A was consistent with the six-factor structure of the URP-I; however, the reliabilities of the System Support and System Climate scales were found to be lower than in the previous investigation.

The emerging literature base has suggested great promise for use of the URP within school settings; however, to date, research related to this tool has focused exclusively on its use in English-language contexts. In order to ensure comparability in cross-cultural research, it is important to make the tool available in other countries (Ziegler & Bensch, 2013). If the URP is to be used by researchers and practitioners in non-English-speaking countries, it is necessary not only to translate the measure into the local language but also to verify that the psychometric properties of the measure are similarly strong. The purpose of the current study was therefore to conduct a validation of a German-language version of the URP-A in order to determine whether the factor structure would be consistent (i.e. construct validity) and whether the German URP-A would allow for sufficiently reliable measurement of the hypothesized factors (i.e. internal consistency reliability).

Method

Participants

Participants included 101 1st- through 6th-grade teachers (94% female) in a suburban region of Western Germany, who responded to a call for participation in the study. In total, teachers were recruited from 13 elementary schools, four secondary schools, and one special education school. The age of the participating teachers ranged from 26 to 63 (M = 43.10, SD = 10.48), and they had an average of 15.39 years of teaching experience (SD = 9.31; Range = 2-39). A detailed summary of demographic information for these teacher participants is provided in Table 1. The ratings in this study were completed for a sample of 1010 students (39.6% female). The age of the students ranged between 5 and 14 years (M = 8.14, SD = 1.77), and the mean grade level was 2.94 (SD = 1.48).

Table 1. Teacher Demographics

                       N     Percentage
Gender
  Male                 6     6%
  Female               95    94%
Age
  26-35                28    28%
  36-45                34    34%
  46-55                20    20%
  56-65                19    19%
Years Experience
  1-10                 34    34%
  11-20                42    42%
  21-30                16    16%
  31-40                8     8%
  Unknown              1     1%
Training
  Primary              83    82%
  Secondary            5     5%
  Special Education    6     6%
  Other                6     6%

Procedure

Data were collected between September and November 2015 as part of a larger study designed to examine the psychometric characteristics of a novel multiple-gated screening measure intended to link screening assessment to the design of classroom-based intervention (i.e. the Integrated Teacher Report Form, ITRF; Volpe & Fabiano, 2013). As part of this larger study, all participants were asked to nominate five students in their classroom who struggled with problematic classroom behavior. The researchers then selected an additional five students who were not nominated by the classroom teacher to serve as typical comparison peers. All teacher participants were then asked to complete a packet of rating forms for each of the 10 identified students (i.e. 5 nominated, 5 not nominated). As an incentive for participation, all teachers were entered into a drawing to receive one of two 50€ gift cards for a teaching materials retailer.

All teacher participants received a packet consisting of (a) a demographic questionnaire, (b) an explanation of rating procedures, and (c) the German-language version of the ITRF (ITRF-G). In addition, each participant was asked to complete a second behavioral screening tool (see descriptions in the Measures section below), which was randomly assigned. Finally, teachers were asked to complete the translated version of the URP-A (see Measures below) with regard to each of the screening measures completed. Although all participants completed the URP-A in response to the ITRF-G and one additional screening measure, responses were selected for a single measure for the purposes of this study. In order to ensure some variability in perception of usability across respondents, data were purposively selected to reflect the range of screening assessment options. That is, the data set was divided into thirds, such that the number of responses based on each of the three screening measures was roughly equivalent (ITRF: n = 40, 39.6%; LSL: n = 27, 26.7%; SDQ: n = 34, 33.7%).
Measures

Integrated Teacher Report Form-German language version (ITRF-G). The ITRF uses a multiple-gated approach to proactively identify students who might benefit from additional behavioral supports in the classroom. Teachers are first asked to complete a brief 16-item version of the ITRF to rate the degree to which their students' behavior interferes with their own learning or the learning of others. Next, teachers complete a 43-item rating scale for the five students receiving the highest brief ITRF scores, which asks respondents to indicate the degree to which particular behaviors are of concern for the student using a 3-point scale (i.e. slight concern, moderate concern, strong concern). A total score is then calculated for each student so that individual students may be prioritized for follow-up intervention. Unlike many other behavioral screening tools, which focus on identifying underlying indicators of psychopathology, the items on the ITRF represent behaviors that have the potential to impair classroom functioning but which are believed to be malleable targets of classroom intervention.
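To make the gated logic concrete, the following is a minimal sketch of a two-gate screening pass. The data containers are hypothetical, and the 0-2 coding of the slight/moderate/strong concern ratings is an assumption made for illustration; the published ITRF's exact scoring rules and cut points are not reproduced here.

```python
def itrf_two_gate_screen(brief_totals, full_ratings, n_advance=5):
    """Illustrative two-gate screening pass (not the published scoring).

    brief_totals: dict mapping student id -> total on the brief 16-item
                  gate-one rating
    full_ratings: dict mapping student id -> list of 43 gate-two item
                  ratings, coded here as 0/1/2 for slight/moderate/strong
                  concern (coding assumed for illustration)
    """
    # Gate one: the students with the highest brief-ITRF totals advance
    # to the full 43-item rating.
    advanced = sorted(brief_totals, key=brief_totals.get,
                      reverse=True)[:n_advance]

    # Gate two: total each advanced student's item ratings and rank them
    # so that students can be prioritized for follow-up intervention.
    totals = {sid: sum(full_ratings[sid]) for sid in advanced}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: six students screened, five advanced, ranked by total.
brief = {"s1": 10, "s2": 4, "s3": 22, "s4": 15, "s5": 7, "s6": 18}
full = {sid: [1] * 43 for sid in brief}  # placeholder gate-two ratings
print(itrf_two_gate_screen(brief, full))
```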
Psychometric data in support of the ITRF are promising, with published research supporting the internal consistency, temporal stability, and concurrent validity (Daniels, Volpe, Briesch, & Fabiano, 2014), as well as the classification accuracy (Daniels, Volpe, Fabiano, & Briesch, 2016), of the measure. The ITRF-G was translated from the English-language ITRF according to the Kidscreen translation guidelines (see Authors, accepted with minor revisions, for a full description of the translation procedures). Initial studies of the ITRF-G focused on the psychometric properties of the short version in gate one and supported its internal consistency and, to some degree, its classification accuracy, as well as measurement invariance across samples from the US and Germany (Authors, under review; Authors, accepted with minor revisions).

Strengths and Difficulties Questionnaire-Teacher version (SDQ-T; Goodman, 1997). The SDQ-T is a behavioral screening measure that was designed to identify students aged 4-16 who are struggling with emotional and behavioral difficulties. Teachers are asked to rate all students in their classrooms across 25 items using a 3-point scale (i.e. not true, somewhat true, certainly true). Responses to these items are then used to generate five scale scores (i.e. Emotional Symptoms, Conduct Problems, Hyperactivity/Inattention, Peer Relationship Problems, Prosocial Behavior), as well as a Total Difficulties score.

To date, the SDQ has been translated into over 80 languages, and the German-language version was used within the current study. Psychometric studies of the German version conducted to date have supported both the construct validity and internal consistency of the measure (Bettge, Ravens-Sieberer, Wietzker, & Hölling, 2002; Saile, 2007). Furthermore, evidence of concurrent validity has been demonstrated through high correlations with selected scales of the Child Behavior Checklist-Teacher Report Form (CBCL-TRF; Becker, Woerner, Hasselhorn, Banaschewski, & Rothenberger, 2004).

Teacher Assessment Schedule for Social and Learning Behavior (LSL). The LSL (orig. Lehrereinschätzliste für Sozial- und Lernverhalten; Petermann & Petermann, 2013) is a 50-item screening measure designed to assess both the social and learning behaviors of students. Teachers are asked to indicate how frequently a student has exhibited a particular behavior using a 4-point scale (i.e. 0 = never, 3 = often). Scores are then summed in order to create ten 5-item subscales. Subscale scores falling below the 10th percentile suggest the presence of a significant behavioral problem, whereas scores falling between the 10th and 20th percentiles suggest that the student may be at risk for behavioral problems. Previous psychometric research has found strong evidence for the internal consistency of the measure (i.e. α = .82 to α = .95; Gienger, 2007). Analyses conducted within the current study utilized the overall composite score.

Usage Rating Profile-Assessment (URP-A; Chafouleas, Miller, Briesch, Neugebauer, & Riley-Tillman, 2012). The URP-A is a 28-item measure designed to assess individuals' perceptions of the usability of assessment procedures. Respondents are asked to indicate the degree to which they agree with a number of statements using a 6-point Likert scale (i.e. 1 = strong disagreement, 6 = strong agreement). Responses are then used to generate six scale scores: Acceptability, Understanding, Home-School Collaboration, Feasibility, System Climate, and System Support.
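As an illustration of how such scale scores are typically formed, here is a minimal scoring sketch. Three caveats: the item-to-subscale assignments shown are the four German URP-A groupings reported later in Table 3 (the original six-scale key is not reproduced in this article); whether subscale scores are item means or sums is assumed here (means preserve the 1-6 metric); and which items are negatively worded and require reverse-keying is not specified, so it is left as an explicit parameter.

```python
import numpy as np

def score_urp_a(responses, scale_key, reverse_keyed=frozenset()):
    """Form subscale scores from 1-6 agreement ratings (illustrative).

    responses:     dict mapping item number -> rating (1 = strong
                   disagreement ... 6 = strong agreement)
    scale_key:     dict mapping subscale name -> list of item numbers
    reverse_keyed: numbers of negatively worded items (e.g., "This
                   assessment is too complex..."), flipped before scoring
    """
    scores = {}
    for scale, items in scale_key.items():
        vals = [7 - responses[i] if i in reverse_keyed else responses[i]
                for i in items]
        scores[scale] = float(np.mean(vals))  # subscale mean, 1-6 metric
    return scores

# Subscale groupings reported for the German URP-A in Table 3 (below).
GERMAN_KEY = {
    "Feasibility":               [2, 3, 8, 18, 19, 26],
    "Home-School Collaboration": [5, 15, 27],
    "Understanding":             [4, 6, 24, 28],
    "Acceptability":             [1, 7, 9, 11, 12, 17, 21],
}

# Toy example: a respondent who rated every item 4.
print(score_urp_a({i: 4 for i in range(1, 29)}, GERMAN_KEY))
```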
As noted previously, the URP-A was created by re-wording existing items designed to assess the usability of intervention technologies so that they reflect the usability of assessment measures. Initial evidence in support of the URP-A supported both the six-factor structure and internal consistency of the measure; however, the reliabilities of the System Climate (α = .71) and System Support (α = .63) scales were found to be notably lower than those of the other four scales (Range = .80-.90) (Miller et al., 2012).

The URP-A was translated into the German language using a team-based, four-step procedure in order to ensure functional and operational equivalence of the questionnaire (Hambleton & Li, 2005). First, professional translators who were highly experienced in the translation of educational measurement tools constructed a preliminary version of the German-language URP-A. Second, based on this first forward translation, a research working group consisting of the authors of the current study developed a single version by harmonizing and revising items. Third, we conducted expert interviews with two primary school teachers in order to evaluate comprehensibility and acceptability. Fourth, we revised and modified the items based on the expert interviews, which resulted in the final German version of the URP-A.

Results

Exploratory Factor Analysis

An initial analysis of the dataset identified a total of 38 instances of missing data (1.3% of the total possible responses), which were restricted neither to particular items nor to particular respondents. Given that the data were considered to be missing at random, the decision was made to impute missing values using multiple imputation (Enders, 2001). A total of 10 datasets were generated, and the resultant values were combined in order to produce each imputed estimate. Next, the data were examined to ensure that they were appropriate for conducting a factor analysis. First, the correlation matrix was examined for signs of either multicollinearity (i.e. inter-item correlations above .80) or low communalities (i.e. items correlating above .30 with fewer than three other items). No items were found to be problematic with regard to either criterion. Second, the anti-image correlation matrix was examined to ensure that the measure of sampling adequacy (MSA) for all items was above .60 (Pett et al., 2003). The MSA represents the degree to which an item is correlated with the other items in the measure, and no problematic items were identified. Third, both the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (.82) and Bartlett's Test of Sphericity (χ²(378) = 2017.71, p < .001) suggested that (a) there were no problems with the size of the sample and (b) the matrix was factorable.

Exploratory factor analysis was conducted in SPSS 23.0 using principal axis factoring with an oblique rotation, given that the factors were expected to be correlated with one another. Decisions regarding the number of factors to extract were made by considering multiple criteria. Examination of the scree plot suggested an elbow in the data between the fifth and sixth factors. In addition, eigenvalues at or above 1.0 were identified for five factors, and the results of parallel analysis suggested a five-factor solution.
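The analyses above were run in SPSS 23.0. For readers who want to retrace the general workflow, here is a minimal sketch in Python using the factor_analyzer package. It assumes a complete respondents-by-items DataFrame (i.e., missing values already handled, e.g., via imputation), and it uses an oblimin rotation as one common oblique choice, since the specific oblique rotation employed in the study is not stated.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

def efa_workflow(df: pd.DataFrame, n_factors: int = 5, n_sims: int = 100):
    """Factorability checks plus principal axis factoring (illustrative)."""
    # Factorability: Bartlett's test of sphericity, overall KMO, and
    # per-item MSA (the study required every item's MSA to exceed .60).
    chi2, p = calculate_bartlett_sphericity(df)
    msa_per_item, kmo_overall = calculate_kmo(df)

    # Horn's parallel analysis: retain as many factors as there are
    # observed eigenvalues exceeding the mean eigenvalues of random
    # data with the same shape.
    rng = np.random.default_rng(0)
    rand = np.mean([
        np.linalg.eigvalsh(np.corrcoef(rng.normal(size=df.shape),
                                       rowvar=False))[::-1]
        for _ in range(n_sims)
    ], axis=0)
    obs = np.linalg.eigvalsh(df.corr().to_numpy())[::-1]
    n_parallel = int(np.sum(obs > rand))

    # Principal axis factoring with an oblimin (oblique) rotation.
    fa = FactorAnalyzer(n_factors=n_factors, method="principal",
                        rotation="oblimin")
    fa.fit(df)
    return {
        "bartlett_chi2_p": (chi2, p),
        "kmo_overall": kmo_overall,
        "min_item_msa": float(np.min(msa_per_item)),
        "factors_by_parallel_analysis": n_parallel,
        "pattern_coefficients": fa.loadings_,
        "communalities": fa.get_communalities(),
    }
```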
Given that extraction of six factors was found to result in a one-item factor, the decision was ultimately made to extract five factors. These five extracted factors accounted for 60.68% of the common variance in items.

Subsequent to factor extraction, indicators were considered in order to identify any potentially problematic items. First, the pattern coefficient matrix (Table 2) was reviewed to identify any items that either (a) loaded poorly on the primary factor (i.e. pattern coefficient < .45) or (b) demonstrated strong factor loadings on more than one factor. This resulted in the removal of Items 10, 20, 22, and 25. Second, the final item communalities were reviewed in order to identify any items for which the proportion of item variance accounted for by the extracted factors was substantially low. No additional items were deleted at this stage.

Table 2. Pattern Coefficients and Communalities for the Usage Rating Profile-Assessment

[The item-by-factor pattern coefficients and communalities are not legible in this copy. The 28 URP-A item statements were:]
I would need additional resources to carry out this assessment.
I would be able to allocate my time to implement this assessment.
The total time required to implement the assessment procedure would be manageable.
This assessment is too complex to carry out accurately.
These assessment procedures are consistent with the way things are done in my system.
The amount of time required for record keeping would be reasonable.
A positive home-school relationship is needed to use this assessment.
Parental collaboration is required in order to use this assessment.
Regular home-school communication is needed to implement these assessment procedures.
I understand how to use this assessment.
I am knowledgeable about the assessment procedures.
I would need consultative support to implement this assessment.
I understand the procedures of this assessment.
I would require additional professional development in order to implement this assessment.
This assessment is an effective choice for understanding a variety of problems.
The assessment is a fair way to evaluate the child's behavior problem.
I would not be interested in implementing this assessment.
I would have positive attitudes about implementing this assessment.
This is a good way to assess the child's behavior problem.
I would implement this assessment with a good deal of enthusiasm.
I would be committed to carrying out this assessment.
The assessment procedures easily fit within my current practices.
Use of this assessment would be consistent with the mission of my school.
Preparation of materials needed for this assessment would be minimal.
Material resources needed for this assessment are reasonable.
My administrator would be supportive of my use of this assessment.
Use of this assessment would not be disruptive to students.
My work environment is conducive to implementation of an assessment like this one.

Reliability Estimates

Reliability analyses were next conducted for each of the five extracted factors. First, the inter-item correlation matrices were examined in order to identify any items that were either minimally correlated with other items in the scale or that demonstrated notably high correlations with other items in the scale. Given the high correlation between Items 23 and 28 (r = .86) within Factor III, the decision was made to delete Item 28 from the final scale. Next, we looked to determine whether the deletion of any individual items would result in a significant improvement in scale reliability; however, no items were found to be problematic.

Acceptable levels of reliability were found for all subscales of the German URP-A (see Table 3). Subscale I (α = .88) was composed mostly of items from the Feasibility subscale of the URP-A, designed to assess the degree to which potential barriers to implementation of an assessment may be present (e.g., requires too much time, is too complex). However, two additional items were found to load on this factor, which were previously considered to measure System Support (i.e. "I would need additional resources to carry out this assessment") and System Climate ("These assessment procedures are consistent with the way things are done in my system"). The mean score for this subscale was 3.97, suggesting that teachers found the screening measures to be somewhat feasible to implement.

Subscale II (α = .88) was found to be consistent with previous psychometric evaluations of the URP-A. Within this Home-School Collaboration subscale, the …

Table 3. Reliability Estimates for the German URP-A

Subscale                     Items                     α     95% CI (α)   Subscale Mean
Feasibility                  2, 3, 8, 18, 19, 26       .88   .83, .91     3.98
Home-School Collaboration    5, 15, 27                 .88   .83, .91     5.29
Understanding                4, 6, 24, 28              .90   .86, .93     3.24
Acceptability                1, 7, 9, 11, 12, 17, 21   .90   .87, .93     3.97
Omitted Factor^a             13, 14, 16                .75   .66, .83     4.52

Note. ^a This factor was omitted from the final measure due to a lack of conceptual similarity among some items, as well as the presence of a potential wording artifact.
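The reliability estimates in Table 3 are internal consistency (coefficient alpha) values. For reference, here is a compact, generic sketch of coefficient alpha together with the "alpha if item deleted" check described above; this is the standard formula, not the authors' code, and the confidence-interval method behind Table 3 is not shown.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_if_item_deleted(items: np.ndarray) -> np.ndarray:
    """Alpha recomputed with each item dropped in turn, mirroring the
    check for items whose removal would improve scale reliability."""
    return np.array([cronbach_alpha(np.delete(items, j, axis=1))
                     for j in range(items.shape[1])])

# Toy example: a 3-item scale rated 1-6 by five respondents.
scale = np.array([[4, 5, 4], [2, 3, 2], [5, 6, 5], [3, 3, 4], [6, 5, 6]])
print(round(cronbach_alpha(scale), 2), alpha_if_item_deleted(scale).round(2))
```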
