College Board Report No. 95-3
ETS RR No. 96-20

Differences in Strategies Used to Solve Stem-Equivalent Constructed-Response and Multiple-Choice SAT®-Mathematics Items

IRVIN R. KATZ, DEBRA E. FRIEDMAN, RANDY ELLIOT BENNETT, AND ALIZA E. BERGER

College Entrance Examination Board, New York, 1996

Acknowledgments

This research was supported in part by the College Board and Educational Testing Service. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the College Board or ETS. We thank Ann Gallagher, Carol Jackson, Anil Kanjee, Margaret Redman, and Susan Wilson for their assistance with this project, and Brent Bridgeman, Drew Gitomer, Jacqueline Jones, and William Ward for their comments on earlier drafts of this report.

Irvin R. Katz is a Research Scientist and Debra E. Friedman is a Research Associate for the Division of Cognitive and Instructional Science at Educational Testing Service. Randy Elliot Bennett is a Principal Research Scientist for the Division. Aliza E. Berger is with the Department of Education at Ben Gurion University.

Researchers are encouraged to freely express their professional judgment. Therefore, points of view or opinions stated in College Board Reports do not necessarily represent official College Board position or policy.

The College Board is a national nonprofit association that champions educational excellence for all students through the ongoing collaboration of more than 3,000 member schools, colleges, universities, education systems, and organizations. The Board promotes, by means of responsive forums, research, programs, and policy development, universal access to high standards of learning, equity of opportunity, and sufficient financial support so that every student is prepared for success in college and work.

Additional copies of this report may be obtained from College Board Publications, Box 886, New York, New York 10101-0886. The price is $15.00.

Copyright © 1996 by College Entrance Examination Board. All rights reserved. College Board, SAT, and the acorn logo are registered trademarks of the College Entrance Examination Board. Printed in the United States of America.

Contents

Abstract ....... 1
Introduction ....... 1
Method ....... 2
  Subjects ....... 2
  Instruments ....... 3
  Procedures ....... 3
Results ....... 4
  Test-Level Analyses ....... 4
  Item-Level Analyses Based on Problem-Solving Strategies ....... 5
  Summary of Item-Level Effects ....... 7
  Plug-in Strategy ....... 8
Discussion and Conclusions ....... 10
References ....... 13
Endnotes ....... 14
Appendix: Item Stems and Multiple-Choice Options for Each Item Pair ....... 15

Figures

1. Possible explanation for format differences ....... 2
2. Example of isomorphic items ....... 3
3. Contents of the four test forms ....... 3
4. Difference in proportion correct between formats (MC-CR) separated by item pair and format-order ....... 5
5. Process model of the plug-in strategy ....... 8
6. Possible mental representation of the tickets item ....... 9

Tables

1. Mean SAT-M Scores by Ability Level ....... 3
2. Analysis of Variance Results for Test-Level Effects ....... 4
3. Means and Standard Deviations for Test-Level ANOVA ....... 4
4. Distribution of Strategies Used for Items Showing Larger Format-Related Differences in Difficulty ....... 6
5. Distribution of Strategies Used for Items Showing Smaller Format-Related Differences in Difficulty ....... 6
6. Format-by-Format-Order Interaction for Items Showing Larger Format-Related Differences in Difficulty ....... 7
Abstract

This study investigated the strategies subjects adopted to solve stem-equivalent SAT-Mathematics (SAT-M) word problems in constructed-response (CR) and multiple-choice (MC) formats. Parallel test forms of CR and MC items were administered to subjects representing a range of mathematical abilities. Format-related differences in difficulty were more prominent at the item level than for the test as a whole. At the item level, analyses of subjects' problem-solving processes appeared to explain difficulty differences as well as similarities.

Differences in difficulty derived more from test-development than from cognitive factors: on items in which large format effects were observed, the MC response options often did not include the erroneous answers initially generated by subjects. Thus, the MC options may have given unintended feedback when a subject's initial answer was not an option, or allowed a subject to choose the correct answer based on an estimate.

Similarities between formats occurred because subjects used similar methods to solve both CR and MC items. Surprisingly, when solving CR items, subjects often adopted strategies commonly associated with MC problem solving. For example, subjects appeared adept at estimating plausible answers to CR items and checking those answers against the demands of the item stem.

Although there may be good reasons for using constructed-response items in large-scale testing programs, multiple-choice questions of the sort studied here should provide measurement that is generally comparable to stem-equivalent constructed-response items.
Introduction

Researchers have frequently noted that some items are more difficult in the constructed-response format than in the multiple-choice format, while performance on other items appears to be unaffected by format (Traub, 1993). For example, Ward, Dupree, and Carlson (1987) classified reading comprehension items according to the cognitive demands the researchers assumed the items placed on examinees. Factor analyses did not support the notion that performance differences between formats were related to the cognitive demands of the items. Similarly, researchers examining the results of performance on computer science items were unable to document performance differences even when formats appeared to make very different cognitive demands (Bennett, Rock, & Wang, 1991). What causes such results remains unclear. One reason for failing to explain the presence and unexpected absence of format differences may be that such investigations have focused almost exclusively on the results of examinees' performance, neglecting the methods used to solve items, a potentially important source of format-related differences.

A process-oriented approach, as a complement to the traditional result-oriented approach, was taken by Martinez and Katz (1996) in their analysis of the problem-solving requirements of stem-equivalent, architecture figural-response items. The researchers identified three types of items distinguished by the general processes needed to solve them: (1) items that test for knowledge of the definition of architectural symbols (declarative); (2) items that require examinees to apply a standard procedure, often one learned in the classroom (learned procedure); and (3) items that require examinees to apply their knowledge in a novel way (discovered strategy). Psychometric and process analyses (using "think aloud" protocols) agreed that there were few format differences on items requiring the examinees to apply a learned procedure, whereas the puzzle-like discovered-strategy problems tapped different skills depending on format.

In the current study, we investigated the different strategies subjects adopted to solve items in stem-equivalent constructed-response (CR) and multiple-choice (MC) formats, in which the formats differed only in that the MC problems contained response options (Traub & MacRury, 1990). Although there are many other forms of CR items that differ more widely from MC (Bennett, 1993), we focused our analyses on stem-equivalent items in order to reduce the potential sources of differences in performance, thus making the task of identifying format-related differences more tractable.

How could an understanding of problem-solving processes shed light on format effects? We offer the following conjecture (also discussed by Martinez & Katz, 1996, and Traub, 1993): to the extent that the processes involved in solving the CR and MC versions of an item are the same, there should be no format effects (e.g., in terms of difficulty). Note that the converse, different problem-solving processes leading to different levels of difficulty, might not always occur even if our conjecture is correct. It is possible for different problem-solving processes to result coincidentally in similar levels of difficulty.

Because the only difference between stem-equivalent CR and MC items is that the latter contain response options, it has been suggested that whether an examinee uses the response options in solving an item determines whether format effects will occur (Traub, 1993). This claim implies a process-based explanation of format effects. As an example, consider Figure 1. When performance on the MC and CR counterparts of an item is equivalent, an examinee generates an answer in the "traditional" manner, perhaps by writing and solving equations, regardless of format. In the case of a CR item, the answer produced is simply written down; for MC items, the examinee matches the generated answer with the correct response option (Snow, 1980). When performance on MC and CR counterparts is not equivalent, examinees solve MC items by using the response options as aids in selecting the correct answer.

[FIGURE 1. Possible explanation for format differences. When CR and MC difficulty are equal, the examinee generates an answer in both formats, then writes it down (CR) or matches it to an option (MC); when difficulty differs, the examinee writes down a generated answer (CR) but reasons from the options to a selection (MC).]

To investigate the relationship between item format and problem-solving processes, we created parallel test forms of CR and MC items. The items presented in MC format on one form were presented as CR items on the other, and vice-versa. These forms were evenly distributed among students representing a range of mathematical ability. In keeping with past studies, our analyses first focused on format differences in terms of accuracy on the test as a whole. We then investigated the processes underlying any format differences (or lack thereof) via a comparative analysis of problem-solving strategies between formats.
Method

Subjects

A list of 672 high school students taking the June 1991 administration of the SAT and living in the greater Princeton area was obtained from ETS program files. Letters were mailed to these students describing the project and inviting each to either mail back a stamped, self-addressed postcard or telephone us directly. Of the 672 contacted, 208 students responded. Each student was telephoned and invited to come to ETS. Fifty-five of the 208 students took part in the study.

The subjects were grouped into three ability levels: low ability (n = 17), defined as having recent SAT-M scores of 375-450 (26th-42nd percentile); medium ability (n = 18), defined as scores of 475-550 (50th-71st percentile); and high ability (n = 20), defined as scores of 600-800 (82nd-99th percentile). The three ability levels represented the top three quartiles, narrowed somewhat to accentuate differences between groups. The subjects were selected so that at each ability level there were approximately equal numbers of males and females. Table 1 shows the mean SAT-M score for each ability group.

Table 1
Mean SAT-M Scores by Ability Level

                    Ability level
          Low     Medium    High     Overall
Male      413     520       653      533
          (8)     (9)       (9)      (26)
Female    418     513       655      538
          (9)     (9)       (11)     (29)
Overall   415     517       655      535
          (17)    (18)      (20)     (55)

Note: Values enclosed in parentheses represent n per cell.

Instruments

Ten multiple-choice (MC) items were selected from disclosed forms of the SAT-M. These items represented the general content categories of the test: three algebra items, five arithmetic items (including three "percent" questions and two other arithmetic items), and two geometry items. The items were selected subject to the following three constraints:

1. All the items could be converted into the constructed-response format by deleting their response options;
2. Isomorphic items could be created for all questions, in order to develop a new item that would require the same problem-solving strategy as the original but have a different enough cover story that subjects would not notice the similarity;
3. A range of difficulty could be represented. Six of the items chosen were of medium difficulty (equated delta = 11-13) and four were easier (equated delta < 10). The easier items were included to maintain the motivation of the lower-ability subjects.

Figure 2 shows an SAT-M item and its isomorph. Note that both items may be classified as "simultaneous equations" problems. Also, the items involve the same number of quantities, and these quantities are in the same qualitative relation to one another. The only differences are in the story used to describe the quantities and the provided values.

Tickets version (original):
If 70 tickets to a play were bought for a total of $50.00 and if tickets cost $1.00 for adults and $0.50 for children, how many children's tickets were bought?

Tokens version (isomorph):
Jenna won a total of 90 red tokens and yellow tokens while playing a board game. Each red token is worth 1 point and each yellow token is worth 4 points. If the total value of Jenna's red and yellow tokens is 120 points, how many yellow tokens does she have?

FIGURE 2. Example of isomorphic items.
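To make the claimed isomorphism concrete, here is a worked solution of both Figure 2 items. The derivation is ours, not the report's, but it follows directly from the two stems: letting a and c be the numbers of adult and children's tickets, and r and y the numbers of red and yellow tokens,

```latex
% Tickets item: counts and dollar values.
% Tokens item:  counts and point values.
\begin{aligned}
  a + c &= 70, & a + 0.5c &= 50 &\;\Rightarrow\; 0.5c &= 20 \;\Rightarrow\; c = 40\\
  r + y &= 90, & r + 4y   &= 120 &\;\Rightarrow\; 3y  &= 30 \;\Rightarrow\; y = 10
\end{aligned}
```

In the tickets item, subtracting the value equation from the count equation eliminates a; in the tokens item, subtracting the count equation from the value equation eliminates r. The same single elimination step solves both items, which is the sense in which they demand the same problem-solving strategy.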
We used item isomorphs to alleviate one of the more difficult problems encountered in studying differences between item formats: the contamination induced by asking subjects to solve the same item in two formats. Using isomorphs is a reasonable approach because there is considerable evidence suggesting that individuals fail to recognize equivalent problems, even if the problems differ only in the details of their cover stories (Gick & Holyoak, 1983).

The selection procedure resulted in a set of 40 items[2]: 10 original MC items, 10 isomorphic MC items, 10 CR versions of the original MC items, and 10 CR versions of the MC isomorphs (see Figures A-1 to A-10 in the Appendix). These items were compiled into two tests of 20 items each, with each test consisting of 10 MC items and the CR counterparts of their isomorphs. For each test, two counterbalanced orders of format presentation were created, resulting in four test forms (Figure 3). Approximately equal numbers of subjects from each ability group were assigned to take each test form.

                        Problem Set 1 (k=20)     Problem Set 2 (k=20)
Administration order A  Form 1A:                 Form 2A:
                        MC-original items,       MC-isomorph items,
                        CR-isomorph items        CR-original items
Administration order B  Form 1B:                 Form 2B:
                        CR-isomorph items,       CR-original items,
                        MC-original items        MC-isomorph items

FIGURE 3. Contents of the four test forms.
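The counterbalancing in Figure 3 can be summarized programmatically. The sketch below is ours, with hypothetical item identifiers; it only illustrates how two problem sets crossed with two administration orders yield the four forms.

```python
# Sketch of the Figure 3 design (illustrative only; item IDs are hypothetical).
# Each test pairs 10 MC items with the CR counterparts of their isomorphs,
# administered in two counterbalanced format orders.

mc_original = [f"orig-{i}" for i in range(1, 11)]   # 10 original items
mc_isomorph = [f"iso-{i}" for i in range(1, 11)]    # their 10 isomorphs

def as_cr(items):
    """CR versions are the same stems with the response options deleted."""
    return [f"CR:{item}" for item in items]

def as_mc(items):
    return [f"MC:{item}" for item in items]

forms = {
    # Problem Set 1: original items in MC format, isomorphs in CR format.
    "1A": as_mc(mc_original) + as_cr(mc_isomorph),  # MC block first
    "1B": as_cr(mc_isomorph) + as_mc(mc_original),  # CR block first
    # Problem Set 2: isomorphs in MC format, original items in CR format.
    "2A": as_mc(mc_isomorph) + as_cr(mc_original),
    "2B": as_cr(mc_original) + as_mc(mc_isomorph),
}

for name, items in forms.items():
    print(name, len(items), "items:", items[0], "...", items[-1])
```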
Procedures

The test forms were administered individually in sessions lasting 1.5-2 hours. The subjects worked alone in a room separate from the experimenter, although the experimenter was available to clarify the task, if necessary. When they arrived, the subjects were informed that they would be taking a test similar to the SAT-M, consisting of multiple-choice and constructed-response questions. Subjects were asked to work as quickly but as accurately as possible and to complete the problems one at a time, in order, and without going back. Approximately half of the subjects were told there would be a four-minute time limit on each problem; the other subjects were allowed to take as long as needed. This manipulation was introduced in the anticipation that some degree of speededness would accentuate format differences.

The subjects provided concurrent verbal protocols (cf. Ericsson & Simon, 1984) as they solved the items. Subjects were instructed to say aloud anything that they would normally "say" to themselves while solving a problem. Subjects' verbalizations were recorded on videotape. The videotape also recorded any notes or calculations made by the subjects.

Results

Test-Level Analyses

The first question to be addressed was, simply: does format affect accuracy for isomorphic MC and CR items? We ran a format (CR, MC) by ability (high, medium, low) by format-order (MC first, CR first) by timing (whether a time limit was given) repeated-measures ANOVA, with item format (CR, MC) as a within-subjects factor and with ability, format-order, and timing as between-subjects factors. The dependent measures for the ANOVA were the total number correct on each section (CR and MC) of the test. Note that because the number of subjects was small relative to the number of factors, the power of the statistical test was considerably limited.

Results from the ANOVA are presented in Table 2; the corresponding means and standard deviations are shown in Table 3.[3] Significant effects were found for format, ability, and the format-by-format-order interaction. The main effect of format was the smallest of the three significant sources of variation (F(1,43) = 6.53, p < .02) and might be partially explained by subjects correctly guessing on some MC items. The main effect of ability (F(2,43) = 24.86, p < .0001) was more substantial, but expected, with the lower-ability subjects performing worst, the high-ability subjects best, and the medium-ability subjects in between. The significant interaction between format and format-order (F(1,43) = 7.40, p < .01) stemmed primarily from the performance on the CR items of subjects who answered the CR items first. The subjects' performance on these items was worse than their performance on the MC items and worse than the performance of the MC-first subjects on items in either format. One explanation for this effect is that the subjects learned something while solving the MC items that helped them solve the CR counterparts. Finally, there was no significant ability-by-format or timing-by-format interaction.

Table 2
Analysis of Variance Results for Test-Level Effects

Source                          df      F
Between subjects
  Timing (T)                    1       2.78
  Ability (A)                   2       24.86***
  Order (O)                     1       2.17
  T x A                         2       .93
  T x O                         1       .15
  A x O                         2       .74
  T x A x O                     2       .39
  S within-group error          43      (2.72)
Within subjects
  Format (F)                    1       6.53*
  F x T                         1       .31
  F x A                         2       .50
  F x O                         1       7.40**
  F x T x A                     2       .22
  F x T x O                     1       .36
  F x A x O                     2       .32
  F x T x A x O                 2       .75
  F x S within-group error      43      (1.67)

Note: Values enclosed in parentheses represent mean square errors. S = subjects.
*p < .05. **p < .01. ***p < .0001.

Table 3
Means and Standard Deviations for Test-Level ANOVA

                       Format-order
               MC first        CR first        Overall
               MC     CR       MC     CR       MC     CR
Ability level
Low       M    6.3    6.2      6.0    4.6      6.2    5.5
          SD   1.7    1.9      1.2    1.3      1.4    1.8
Medium    M    7.4    7.6      8.2    6.4      7.9    6.8
          SD   1.0    1.3      1.7    1.5      1.6    1.5
High      M    8.4    8.6      8.9    8.1      8.7    8.4
          SD   1.2    1.4      1.4    1.6      1.3    1.5
Overall   M    7.4    7.5      7.8    6.5      7.6    7.0
          SD   1.6    1.8      1.9    2.0      1.7    2.0
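The format-by-format-order interaction can be read directly off the overall means in Table 3. The following sketch (our own arithmetic, not part of the report) computes the simple format effect within each order condition and their difference, the 2 x 2 interaction contrast:

```python
# Overall mean number correct (out of 10) from Table 3.
means = {
    ("MC first", "MC"): 7.4, ("MC first", "CR"): 7.5,
    ("CR first", "MC"): 7.8, ("CR first", "CR"): 6.5,
}

# Simple format effect (MC - CR) within each format-order group.
for order in ("MC first", "CR first"):
    diff = means[(order, "MC")] - means[(order, "CR")]
    print(f"{order}: MC - CR = {diff:+.1f}")
# -> MC first: MC - CR = -0.1   (formats roughly equivalent)
# -> CR first: MC - CR = +1.3   (CR much harder when taken first)

# Interaction contrast: how much the format effect changes across orders.
interaction = (
    (means[("MC first", "MC")] - means[("MC first", "CR")])
    - (means[("CR first", "MC")] - means[("CR first", "CR")])
)
print(f"interaction contrast = {interaction:+.1f}")  # -> -1.4
```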
Did all items contribute equally to the test-level effects? Previous research suggests that individual items may be more or less sensitive to response format (Bridgeman, 1992; Martinez & Katz, 1996). Figure 4 shows the difference between proportion correct on the MC versus CR versions of each item type. Each set of bars represents a different original-isomorph item pair; the items are ordered from greatest to least in terms of the size of an item pair's format-by-format-order interaction. Bars that extend above the zero point represent items for which the MC version was easier, while bars below the zero point indicate that the CR version of an item was easier. Solid bars represent scores for subjects answering the MC items first; shaded bars represent scores for the CR-first subjects.

[FIGURE 4. Difference in proportion correct between formats (MC-CR) separated by item pair and format-order.]

Even though the CR and MC items used in this study were very similar, there was a wide range of differences in difficulty. Items ranged from being much easier in the MC format, to having approximately equal difficulty regardless of format, to being slightly easier in the CR format. In addition, the degree of the format-by-format-order interaction varied across item types: for the first five item types shown in Figure 4, the CR versions were much more difficult when that format was administered first; for the last five item types, the relative difficulty of the items presented in the two formats seems independent of format-order. These results suggest that a test-level analysis may be missing important item-level differences.

What is the source of these differences in difficulty? We predicted that format-related differences in difficulty would appear when subjects use different problem-solving processes to solve CR and MC versions of the same item. In particular, whether subjects used the response options when solving MC items should determine whether format differences occur.

Item-Level Analyses Based on Problem-Solving Strategies

To address the issue of how the subjects used the MC options, it was necessary first to identify the different strategies (both correct and faulty) subjects used to solve items, which yielded a set of strategy categories unique to each pair of items. We then combined the subjects' problem-solving approaches into two groups: "traditional" strategies, which are commonly associated with CR problem solving (e.g., writing and solving algebraic equations), and "nontraditional" strategies, which involve estimation or reasoning from potentially correct answers and are commonly associated with MC problem solving. A third category, "unknown/other," indicated that a subject's problem-solving approach could not be identified or that a subject's incorrect approach was not a variant of one of the traditional or nontraditional strategies.

The strategy categories were initially identified by viewing the videotaped protocols of 12 randomly selected subjects. Because the problem-solving strategies were quite distinct, there was little difficulty, in analyzing the remaining subjects, in unambiguously assigning a particular subject's approach to one of the categories. One researcher classified all 55 subjects' responses, while another researcher classified 20 percent of the responses. Inter-rater agreement for this subsample was 93 percent, and conflicts were resolved through discussion. Occasionally a problem-solving approach not represented in the performance of the initial 12 subjects was encountered, and the categorization scheme was appropriately augmented.
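Percent agreement of the kind reported above is a simple computation. The sketch below is ours, with invented strategy labels standing in for a double-coded subsample:

```python
# Hypothetical double-coded strategy labels for the 20 percent subsample.
rater_1 = ["traditional", "plug-in", "estimation", "traditional", "unknown"]
rater_2 = ["traditional", "plug-in", "traditional", "traditional", "unknown"]

# Percent agreement: share of cases where the two raters assigned
# the same strategy category.
agreements = sum(a == b for a, b in zip(rater_1, rater_2))
percent_agreement = 100 * agreements / len(rater_1)
print(f"{percent_agreement:.0f}% agreement")  # -> 80% for this toy sample
```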
those options were numbers that were close to each For this analysis, the item pairs were divided into other (i.e., the distinctions required to select the correct those that showed relatively larger format-related dif option exceeded the subject's estimation ability). ferences in difficulty (hereafter, "format differences") For the items functioning similarly across formats, and those that showed smaller differences (median split the expectation was that subjects would not use strate on maximum MC-CR difference). The CR and MC ver gies normally associated with reliance on MC options sions of the shaded/sqtrian, baseball/students, and (i.e., nontraditional strategies), but instead would pri rain/tank item pairs (Figures A-1, A-3, and A-4, respec marily rely on traditional strategies to the same degree tively, in the Appendix) differed in difficulty more than for both formats. Table 5 shows the results. Whereas did other items. The sack/invest, tickets/tokens, and traditional strategies predominate and were used about price/swim item pairs (Figures A-2, A-5, and A-8, re equally across formats (MC: 61 percent; CR: 68 per spectively, in the Appendix) exhibited smaller format cent), nontraditional strategies were also used with ap differences. The latter three item pairs were chosen be proximately equal frequency on the MC and CR items cause they could be solved using either traditional or nontraditional strategies and because their lack of Table 5 format differences could not be attributed to ceiling ef Distribution of Strategies Used for Items Showing Smaller fects (the percentage correct for each item pair was less Format-Related Differences in Difficulty than .87). The remaining four item pairs did not meet these criteria and so were not included in the analysis. Strategy categories For the three item pairs that showed larger differ Traditional Nontraditional Unknown ences in difficulty, the expectation was that those differ Correctilncorrect Correctilncorrect Correct/Incorrect ences resulted because subjects used the response options MC 76 25 44 14 2 4 (i.e., used nontraditional strategies) to solve the MC ver CR 72 41 35 14 2 sions of the items, but used traditional strategies to solve the CR items. Table 4 shows the distribution of strategies by format for the items exhibiting format differences. As (MC: 35 percent; CR: 30 percent). This finding is con expected, nontraditional strategies were used more often sistent with the results based on item difficulty, but un by subjects when solving the MC items. Thirty percent of expected because such popular nontraditional strategies solutions to the MC items demonstrated nontraditional as "plug-in" and estimation are typically associated only strategies compared with 19 percent of solutions to the with MC items. (In the plug-in strategy, the subject gen CR items. Furthermore, when solving the CR items, sub erates potential answers by selecting response options jects were less successful (19 percent correct) in their use and checking those answers against the item stem.) Sim of nontraditional strategies compared to when they ilar distributions of traditional and nontraditional strate solved the MC items (53 percent). gies thus explain the similar functioning of these items This latter effect stemmed primarily from the across response formats. 
For the three item pairs that showed larger differences in difficulty, the expectation was that those differences resulted because subjects used the response options (i.e., used nontraditional strategies) to solve the MC versions of the items, but used traditional strategies to solve the CR items. Table 4 shows the distribution of strategies by format for the items exhibiting format differences. As expected, nontraditional strategies were used more often by subjects when solving the MC items: 30 percent of solutions to the MC items demonstrated nontraditional strategies, compared with 19 percent of solutions to the CR items. Furthermore, when solving the CR items, subjects were less successful (19 percent correct) in their use of nontraditional strategies compared to when they solved the MC items (53 percent).

Table 4
Distribution of Strategies Used for Items Showing Larger Format-Related Differences in Difficulty

                  Strategy categories
      Traditional         Nontraditional      Unknown
      Correct/Incorrect   Correct/Incorrect   Correct/Incorrect
MC    81      24          26      23          2       9
CR    75      43          6       25          0       16

This latter effect stemmed primarily from the shaded/sqtrian item (Figure A-1 in the Appendix), which many subjects solved through visual estimation based on the figure provided. We can speculate that estimation methods were more likely to work in the MC version simply because subjects could use an "estimate-and-choose-closest-option" strategy, which was unavailable for the CR items. Of course, incorrect estimations could have occurred even with MC options if those options were numbers that were close to each other (i.e., the distinctions required to select the correct option exceeded the subject's estimation ability).

For the items functioning similarly across formats, the expectation was that subjects would not use strategies normally associated with reliance on MC options (i.e., nontraditional strategies), but instead would primarily rely on traditional strategies to the same degree for both formats. Table 5 shows the results. Whereas traditional strategies predominated and were used about equally across formats (MC: 61 percent; CR: 68 percent), nontraditional strategies were also used with approximately equal frequency on the MC and CR items (MC: 35 percent; CR: 30 percent). This finding is consistent with the results based on item difficulty, but unexpected, because such popular nontraditional strategies as "plug-in" and estimation are typically associated only with MC items. (In the plug-in strategy, the subject generates potential answers by selecting response options and checking those answers against the item stem; both nontraditional strategies are sketched below, after Table 5.) Similar distributions of traditional and nontraditional strategies thus explain the similar functioning of these items across response formats.

Table 5
Distribution of Strategies Used for Items Showing Smaller Format-Related Differences in Difficulty

                  Strategy categories
      Traditional         Nontraditional      Unknown
      Correct/Incorrect   Correct/Incorrect   Correct/Incorrect
MC    76      25          44      14          2       4
CR    72      41          35      14          2       1
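Neither nontraditional strategy is described algorithmically in the report, but both are simple enough to sketch. The following is our illustration using the tokens item from Figure 2; the response options shown are hypothetical, since the actual options appear only in the Appendix.

```python
# Two "nontraditional" strategies, illustrated on the tokens item (Figure 2):
# Jenna has 90 tokens worth 120 points; red = 1 point, yellow = 4 points.
# How many yellow tokens? (Correct answer: 10.)

options = [5, 10, 15, 20, 30]  # hypothetical MC response options

def satisfies_stem(yellow):
    """Check a candidate answer against the constraints in the item stem."""
    red = 90 - yellow
    return red >= 0 and red * 1 + yellow * 4 == 120

# Plug-in strategy: try each response option against the stem until one works,
# rather than setting up and solving equations.
plug_in_answer = next(opt for opt in options if satisfies_stem(opt))
print("plug-in:", plug_in_answer)  # -> 10

# Estimate-and-choose-closest strategy: form a rough estimate and pick the
# nearest option (impossible in CR format, where no options are offered).
# Suppose the subject reasons "about 30 extra points, roughly 3 per yellow
# token" but estimates imprecisely:
rough_estimate = 8.5
closest = min(options, key=lambda opt: abs(opt - rough_estimate))
print("estimate-and-choose-closest:", closest)  # -> 10 (nearest to 8.5)
```

The second strategy succeeds here only because the hypothetical options are well separated; as the text notes, closely spaced options can defeat an examinee's estimation ability.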
Recall that all three of the item pairs with large format differences contributed to the format-by-format-order interaction in overall performance (Figure 4). In particular, the MC version of these items tended to be easier than the corresponding CR version when the CR version was presented first. The reason for the interaction can be seen by splitting Table 4 into the two format-order conditions (Table 6). The percentage correct for the MC items was similar irrespective of whether this format was presented first (62 percent) or second (70 percent). The interaction was focused on the CR items: the percentage of correct responses was low when these items were presented first (40 percent), but increased when the CR items were preceded by their MC counterparts (63 percent).

Table 6
Format-by-Format-Order Interaction for Items Showing Larger Format-Related Differences in Difficulty

                             Strategy categories
                 Traditional         Nontraditional      Unknown
Format order     Correct/Incorrect   Correct/Incorrect   Correct/Incorrect
MC-first   MC    35      11          12      14          1       5
           CR    41      18          5       7           0       7
CR-first   MC    46      13          14      9           1       4
           CR    34      25          1       18          0       9

At least a portion of this interaction may be attributed to feedback the MC options provided to subjects, feedback that may, in turn, have aided subjects in solving the CR counterpart items. That is, if a subject generated an answer that was not among the MC alternatives, the subject may have been cued to reexamine his or her problem-solving method or to try an alternative method. This feedback may have prodded the subject to correct his or her faulty problem-solving method and later to apply the correct procedure to counterpart items. Although a few of the most common errors were represented in the MC options, subjects often made arithmetic and other errors (e.g., estimation) that resulted in idiosyncratic answers not included among the MC options.

Unfortunately, even with videotaped protocols, it was difficult to determine whether subjects were considering the MC options. Thus, if a subject generated an answer and then continued problem solving until eventually selecting a result from among the options, it was difficult to tell whether the subject continued problem solving solely because the answer originally generated was not among the MC options. However, we can estimate the influence of feedback from the MC options by observing how many solutions to CR items were not among the alternatives in the MC versions of the items. For the CR-first subjects, 26 errors were made while solving MC items and 52 while solving CR items. Nine of the 52 incorrect responses represented a failure to provide a response to the item. Of the remaining 43 errors, approximately half (22) were not among the alternatives offered in the MC version.[4] If these 22 subjects had been given the MC version of the items, at least some of them might have ended up responding correctly, reducing the format-by-format-order interaction.

One of the item pairs (sack/invest) showing small format differences nevertheless contributed substantially to the overall format-by-format-order interaction. For the CR-first subjects, approximately the same number of errors were made on the MC (11) and CR (14) versions. Consistent with the results on other items, approximately half (6) of the erroneous responses to the CR version were not included among the MC version's options. Thus, had these subjects solved the MC version of the item, some of them might have answered it correctly.

Summary of Item-Level Effects

The problem-solving strategy analyses presented above resulted in two main findings. First, consistent with expectations, for the items showing format-related differences in difficulty, unequal use of nontraditional strategies between the two formats was observed. However, at least a portion of the format differences could be attributed to inadvertent feedback from the MC options. That is, the largest format differences occurred when (a) the MC options allowed use of an "estimate-and-choose-closest" (or "calculate-and-choose-closest") strategy, or (b) the MC options did not contain the erroneous answers that subjects generated. Thus, for MC items, subjects were cued that their initial answer was incorrect, feedback that they did not receive from the CR versions.

The second result was that nontraditional methods (e.g., estimation, plug-in) were used with equal frequency in solving MC and CR items when those items showed small differences in difficulty. This result augments the process explanation of format similarities implied by the literature (Figure 1). When there were no format differences in accuracy, it was not necessarily because subjects used traditional CR methods to solve MC items. Instead, similar rates of accuracy for CR and MC items indicated that subjects used the same processes (whether traditional or nontraditional) when solving items in both formats. Note that use of nontraditional strategies may indicate that an item is tapping constructs different from those tapped when subjects used a traditional approach.
