Journal of Experimental Psychology: Learning, Memory, and Cognition
1984, Vol. 10, No. 1, 104-114
Copyright 1984 by the American Psychological Association, Inc.

Choice, Similarity, and the Context Theory of Classification

Robert M. Nosofsky
Harvard University

Medin and Schaffer's (1978) context theory of classification learning is interpreted in terms of Luce's (1963) choice theory and in terms of theoretical results obtained in multidimensional scaling theory. En route to this interpretation, quantitative relationships that may exist between identification and classification performance are investigated. It is suggested that the same basic choice processes may operate in the two paradigms but that the similarity parameters that determine performance change systematically according to the structure of the choice paradigm. In particular, when subjects are able to attend selectively to the component dimensions that compose the stimuli, the similarity parameters may tend toward what is optimal for maximizing performance.

The work reported in this article was supported by Grants BNS 80-26656 from the National Science Foundation and MH 37208 from the National Institute of Mental Health to Harvard University. I would like to thank J. Busemeyer, R. D. Luce, and D. L. Medin for helpful comments and suggestions regarding earlier versions of this article. I especially wish to thank W. K. Estes and E. E. Smith for numerous useful suggestions and discussions, and for their encouragement. Requests for reprints should be sent to Robert M. Nosofsky, Department of Psychology and Social Relations, Harvard University, Cambridge, Massachusetts 02138.

Medin and Schaffer (1978) proposed a quantitative model, termed the context theory, to account for subjects' performance in classification paradigms. The context theory has provided good fits to data in numerous classification experiments (Medin, 1982; Medin, Altom, Edelson, & Freko, 1982; Medin, Dewey, & Murphy, 1983; Medin & Schaffer, 1978; Medin & Smith, 1981). The theory assumes, essentially, that subjects' classification of a given stimulus is determined on the basis of its similarity to the stored category exemplars. The purpose of this article is to provide an interpretation of the context theory in terms of previous work concerning choice and similarity. This interpretation places the context model within a well-established theoretical framework and allows one to ask new questions and extend the model in ways that were not previously obvious. In addition, it may help in attempts to gain a fuller understanding of quantitative relationships that may exist between subjects' identification and classification performance.

In the empirical work for which the context model has been verified thus far, the stimuli have varied along four binary-valued dimensions, and the subjects have been required to classify the stimuli as belonging in one of two categories. To simplify the discussion, this basic structure is assumed in the initial part of this article; in addition, the following notation is employed:

1. Let $X$ and $Y$ denote the two categories.
2. Let $x$ denote a stimulus in category $X$, and let $y$ denote a stimulus in category $Y$.
3. Let $x_j$ denote the value of stimulus $x$ on dimension $j$, and similarly for $y_j$.
4. Let $t$ denote a given test stimulus, and let $t_j$ denote the value of stimulus $t$ on dimension $j$.
5. Let $S(t, x)$ denote the similarity of stimulus $t$ to stimulus $x$.

As is clear from Medin and Schaffer's (1978, pp. 211-214) discussion, the context theory states that the probability of classifying test stimulus $t$ as a member of category $X$, $P(X \mid t)$, is given by

$$P(X \mid t) = \frac{\sum_{x \in X} S(t, x)}{\sum_{x \in X} S(t, x) + \sum_{y \in Y} S(t, y)} \tag{1}$$

The individual $S(t, x)$s are computed by the following multiplicative rule:

$$S(t, x) = \prod_{j=1}^{4} s_j \tag{2}$$
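Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the original experiments: the exemplars, the test stimulus, and the parameter values are hypothetical, and the function names are invented for the example. It uses the convention stated in the text immediately following Equation 2, namely that a matching dimension contributes 1 and a mismatching dimension $j$ contributes $p_j$.

```python
from math import prod

def similarity(t, x, p):
    """Equation 2: product over dimensions of 1 (match) or p[j] (mismatch)."""
    return prod(1.0 if tj == xj else pj for tj, xj, pj in zip(t, x, p))

def prob_category_x(t, category_x, category_y, p):
    """Equation 1: summed similarity to X exemplars over summed similarity to all exemplars."""
    sim_x = sum(similarity(t, x, p) for x in category_x)
    sim_y = sum(similarity(t, y, p) for y in category_y)
    return sim_x / (sim_x + sim_y)

# Hypothetical exemplars on four binary-valued dimensions (not from any cited study).
CATEGORY_X = [(1, 1, 1, 1), (1, 1, 1, 0)]
CATEGORY_Y = [(0, 0, 0, 0), (0, 0, 0, 1)]
P = [0.5, 0.5, 0.5, 0.5]  # hypothetical mismatch parameters p_j

if __name__ == "__main__":
    print(prob_category_x((1, 1, 1, 1), CATEGORY_X, CATEGORY_Y, P))
```

With these made-up values the summed similarity to category X is 1 + 0.5 and the summed similarity to category Y is 0.0625 + 0.125, so the ratio in Equation 1 is 1.5 / 1.6875.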
In Equation 2, $s_j = p_j$ ($0 \le p_j \le 1$) if $t_j \ne x_j$, and $s_j = 1$ if $t_j = x_j$. The individual $p_j$s are the parameters to be estimated in the model.

Response Rule

As is rather evident, the response-ratio rule (Equation 1) proposed by Medin and Schaffer (1978) bears a striking structural resemblance to Luce's (1963) choice model for stimulus identification. According to that theory, the probability of stimulus $i$ leading to response $j$ in an identification experiment, $P(R_j \mid S_i)$, is given by

$$P(R_j \mid S_i) = \frac{\beta_j \eta_{ij}}{\sum_k \beta_k \eta_{ik}} \tag{3}$$

where $0 \le \beta_j, \eta_{ij} \le 1$, $\sum_j \beta_j = 1$, $\eta_{ij} = \eta_{ji}$, and $\eta_{ii} = 1$. The $\beta_j$ parameters are interpreted as response-bias parameters, and the $\eta_{ij}$ parameters are interpreted as similarity measures on the stimuli $S_i$ and $S_j$. The index $k$ in the denominator of Equation 3 ranges over the set of stimuli that are eligible as responses in the experiment.

Medin and Schaffer's (1978) response rule may be viewed as a bias-free extension of Luce's (1963) choice theory applied at the category level, by simply defining $\eta(t, X) = \sum_{x \in X} \eta(t, x)$. A natural question that arises, however, is whether or not there is some theoretical account of the relationship $\eta(t, X) = \sum_{x \in X} \eta(t, x)$.

A simple account of this relationship may reside in the corresponding qualitative structures of the identification and classification paradigms. The one-to-one mapping of stimuli onto responses in identification is transformed into a many-to-one mapping of stimuli onto responses in classification. An obvious starting hypothesis for a quantitative model relating the two paradigms, originally proposed by Shepard, Hovland, and Jenkins (1961), and Shepard and Chang (1963), may be stated as follows: To predict classification performance from identification performance, one should simply cumulate over all stimulus-response cells in the identification matrix that would map onto a given stimulus-response cell in the classification matrix. Thus, any interstimulus confusion in the identification matrix which is a within-category confusion would result in a correct classification response. I will refer to this as the mapping hypothesis.

If the mapping hypothesis were correct, then the response rule proposed by Medin and Schaffer (1978) could be derived directly from Luce's (1963) choice theory applied at the stimulus identification level. The probability of making response $X$ given stimulus $i$ in the classification paradigm, $P(R_X \mid S_i)$, is found by summing over the probabilities that a stimulus in category $X$ is chosen in the identification paradigm:

$$\begin{aligned} P(R_X \mid S_i) &= \sum_{j \in X} P(R_j \mid S_i) \\ &= \sum_{j \in X} \frac{\beta_j \eta_{ij}}{\sum_k \beta_k \eta_{ik}} \\ &= \frac{\sum_{j \in X} \beta_j \eta_{ij}}{\sum_k \beta_k \eta_{ik}} \end{aligned} \tag{4}$$

In a bias-free experiment, this is the response rule proposed by Medin and Schaffer (1978).

The main problem with this theoretical account is that the mapping hypothesis was explored systematically by Shepard et al. (1961) and rejected on empirical grounds. These researchers demonstrated convincingly that one could not predict subjects' classification performance from their identification performance on the same set of stimuli. We are left, therefore, with the following paradox: Luce's (1963) choice theory and Medin and Schaffer's (1978) context theory provide good fits to data in identification and classification experiments, respectively. The mapping hypothesis provides an obvious theoretical link between the two theories, but it has been rejected empirically. I attempt to resolve this paradox later in this article. The argument to be advanced is that the similarity parameters that determine performance may change systematically as one goes from identification to classification instructions. First, however, a theoretical interpretation of Medin and Schaffer's multiplicative similarity rule (Equation 2) is provided.

Multiplicative Similarity Rule¹

¹ It was brought to my attention by D. L. Medin (personal communication, September 14, 1982) that Takane and Carroll (1982) have pointed out the same functional relationships that I note in this section. The work that I report here was carried out independently of these researchers' work.

The second aspect of the context theory to be considered is the multiplicative rule for computing stimulus similarity (Equation 2). This has been deemed by Smith and Medin (1981) as the crucial feature of the context theory that differentiates it from previous theories of classification learning. The main question to be addressed is whether the multiplicative rule for computing stimulus similarity can be meaningfully integrated with previous findings in the psychological literature regarding stimulus similarity. Although an intuitive rationale for the multiplicative rule is provided by Medin and Schaffer (1978) and Medin and Smith (1981), little mention is made of any independent empirical support for the rule in general, or of its relationship to previously proposed models regarding stimulus similarity.

One of the major assumptions made by researchers in the field of multidimensional scaling theory is that stimulus similarity is some monotonically decreasing function of psychological distance. Letting $D$ denote psychological distance between two stimuli and letting $f$ be some monotonically decreasing function, then using the previous notation we have

$$S(t, x) = f[D(t, x)] \tag{5}$$

that is, the similarity between test stimulus $t$ and stimulus $x$ is some monotonically decreasing function of the psychological distance between stimulus $t$ and stimulus $x$. For various measurement-theoretic reasons (Beals, Krantz, & Tversky, 1968), the distance metric between two stimuli is generally assumed to take the form of the Minkowski-r metric:

$$D(t, x) = \Big( \sum_{j=1}^{n} |t_j - x_j|^{r} \Big)^{1/r} \tag{6}$$

where $r \ge 1$, $n$ is the number of dimensions composing the given stimuli, and $t_j$ and $x_j$ are appropriately scaled psychological values. The particular value of $r$ which yields a best fit to the similarity data has been found to depend heavily on the type of dimensions composing a stimulus. In particular, the city-block metric ($r = 1$) is appropriate for stimuli having separable dimensions, and the Euclidean metric ($r = 2$), for those with integral dimensions.² Intuitively, separable dimensions are highly analyzable and remain psychologically distinct when in combination. In contrast, integral dimensions combine into relatively unanalyzable, integral wholes. Various other converging operations are used to distinguish between these two types of dimensions (Garner, 1974).

² The city-block and Euclidean metrics might best be considered as approximations to the best-fitting Minkowski-r metric for stimuli composed of separable and integral dimensions. Tversky and Gati (1982) argued that a large amount of similarity data for stimuli composed of separable dimensions is best fit by values of $r$ somewhat less than 1. (When $r$ is less than 1, of course, the Minkowski-r "metric" is no longer a distance metric at all, since the triangle inequality is not satisfied; this does not invalidate a spatial approach to modeling stimulus similarity, only the metric assumption.) For mathematical convenience I assume that $r = 1$ serves as a fair approximation to the best-fitting Minkowski-r for stimuli composed of separable dimensions. Of course, an improved theory may result by using values of $r$ less than 1.

The stimulus sets used by Medin and Schaffer (1978) in their initial experiments were composed of stimuli that varied in form, color, size, and number. On the basis of previous work these would clearly be considered separable dimensions. Thus, the psychological distance between test stimulus $t$ and stimulus $x$ is given by

$$D(t, x) = \sum_{j=1}^{n} d_j \tag{7}$$

where $d_j = |t_j - x_j|$, appropriately scaled. Note that because the dimensions were binary-valued, each $d_j$ is equal either to 0 (if $t_j = x_j$), or to $q_j$, $0 \le q_j < \infty$ (if $t_j \ne x_j$). Reviewing our previous equations, we have

$$S(t, x) = \prod_{j=1}^{n} s_j, \qquad 0 \le s_j \le 1$$

according to the multiplicative rule of Medin and Schaffer, and we have

$$S(t, x) = f\Big( \sum_{j=1}^{n} d_j \Big), \qquad 0 \le d_j < \infty$$

according to the basic assumptions of multidimensional scaling theory. The question I now raise is whether there is some function $f$ which produces the multiplicative rule posited by Medin and Schaffer (1978). The answer is that the multiplicative rule arises if and only if

$$f(x) = e^{-cx}, \qquad c > 0 \tag{8}$$

That is, we have

$$e^{-c \sum_{j=1}^{n} d_j} = \prod_{j=1}^{n} e^{-c d_j} \tag{9}$$

and because $0 \le e^{-c d_j} \le 1$, we simply set $s_j = e^{-c d_j}$. The case of $|t_j - x_j| = 0$ (identical values) maps onto $s_j = 1$, and $|t_j - x_j| = \infty$ onto $s_j = 0$. A proof that this functional relationship is unique may be found in Roberts (1979).

In summary, the basic hypothesis being proposed is that the multiplicative rule of computing stimulus similarity arises as a special case of psychological distance between stimuli conforming to the city-block metric, and of stimulus similarity being an exponential decay function of psychological distance.

The second part of this hypothesis receives some independent support from earlier work by Shepard (1957, 1958a, 1958b) regarding the relationship between stimulus generalization (measured in terms of confusion errors in absolute identification tasks) and psychological distance. He concluded that the relation was well described by an exponential decay function on the basis of both empirical data and more primitive underlying theoretical assumptions. Luce (1963) also incorporated the assumption that similarity is an exponential decay function of psychological distance in the development of his choice theory.

More recently, Getty, Swets, Swets, and Green (1979) utilized this assumption with a great deal of success in predicting subjects' confusion errors in an identification task from their similarity ratings of the same stimuli. First, they applied a multidimensional scaling procedure to the similarity judgments to construct a psychological space and to obtain the locations of the stimuli in that space. In predicting the subjects' confusion errors on the subsequent identification task, they then employed Luce's (1963) choice model with the assumption that stimulus similarity is an exponential decay function of psychological distance. The fit of the model to data was excellent.

Optimization of Similarity Relations

In summary, Medin and Schaffer's (1978) context theory arises as a logical consequence of integrating the mapping hypothesis of the identification-classification relationship with the following well-established models in the areas of choice and similarity: (a) Luce's (1963) choice model for stimulus identification, (b) an exponential decay function relating stimulus generalization to psychological distance, and (c) psychological distance relationships for stimuli composed of separable dimensions conforming (approximately) to the city-block metric. Furthermore, since the context model provides good fits to data in numerous experiments, the framework seems to hold together very well. The paradox in this formulation, however, is that the mapping hypothesis has been explored systematically by Shepard et al. (1961) and rejected on empirical grounds. To resolve this discrepancy it is worthwhile to examine the results of Shepard et al. in some detail.

Shepard et al. (1961) studied the relationship between identification and classification performance for sets of eight stimuli that varied along three binary-valued separable dimensions. In the classification tasks, each set was divided into two categories of four stimuli each. In general, given any eight unique stimuli, there are 70 distinct ways to partition them into two groups of four. However, the Shepard et al. (1961) investigation was simplified by the fact that, for stimuli varying along three binary-valued dimensions, the 70 partitions fall into six distinct types. The partitions within each type are structurally equivalent in the following sense: Any 2 partitions that are of a given type can be transformed into one another by a reassignment of dimension roles. (For example, a partition that is formed by placing all black stimuli in category X and all white stimuli in category Y is structurally equivalent to a partition that is formed by placing all large stimuli in category X and all small stimuli in category Y. Note that both of these partitions are structured according to a single stimulus dimension; in the first case the relevant dimension is color, whereas in the second case the relevant dimension is size.) The six basic classification types are illustrated schematically in Figure 1. So, for example, for the Type II partition (see Figure 1), the subject should respond X if the stimulus is triangular and black, or if it is square and white; otherwise, the subject should respond Y. Another example of a Type II partition, with interchanged dimension roles, would be the following: "Respond X if the stimulus is triangular and large, or if it is square and small," and so forth.

Figure 1. Schematic illustration of the six classification types investigated by Shepard et al. (1961). (Within each box the four stimuli on the left belong in one class and the four stimuli on the right in the other class. From "Learning and memorization of classifications" by R. N. Shepard, C. I. Hovland, and H. M. Jenkins, 1961, Psychological Monographs, 75(13, Whole No. 517), p. 3. Copyright 1961 by the American Psychological Association. Reprinted by permission.)

A graph-theoretic representation of the six classification types that provides further insight into their structure is presented in Shepard et al. (1961, p. 4). For present purposes, the reader should note that only one dimension is relevant for performing a Type I classification, and two dimensions are relevant for performing a Type II classification. The Type VI classification is an extreme case in which all three dimensions are equally relevant. Types III, IV, and V are intermediate in structural complexity between Types I and II and Type VI, with all three dimensions being relevant, but to varying extents. These three types may be described as single dimension plus exception classifications, with the precise nature of the exception varying among the types. For example, for the Type V classification, all black stimuli are assigned to category X, and all white stimuli to category Y, except that the small, white square is switched with the small, black square.

Shepard et al. (1961) ran a series of experiments in which subjects were required to learn each of the classification types illustrated in Figure 1. Prior to these classification tasks they ran an identification condition in which subjects were required to learn a unique response to each stimulus. In both the identification condition and the various classification conditions the procedure used was the standard paired-associate learning paradigm, in which a stimulus was presented, the subject made a response, and the correct response was then provided by the experimenter. Learning on each condition continued for 400 trials, or until a subject reached a criterion of 32 consecutive correct responses. The total number of errors made during the learning of each classification type was recorded. Shepard et al. then used what is essentially the mapping hypothesis to predict subjects' classification performance from their identification performance, and compared the predicted total number of errors for each classification type with the observed total number of errors.³ The results are presented in Figure 2. As is evident, the simple mapping hypothesis fails. First, the predicted number of errors exceeded the observed number of errors, with the magnitude of the discrepancies varying among the different types. Second, the observed amount of variation in number of errors among the types exceeded the predicted amount of variation. And third, the observed rank order of difficulty between Type II and Types III-V was reversed from what was predicted.

³ A precise specification of the method of prediction would go beyond the scope of the present article. In its essentials, however, it was an application of the mapping hypothesis; namely, one predicts classification errors from identification errors by cumulating over all stimulus confusions in the identification condition that would result in between-category confusions in the classification condition. The reader should note that although subjects participated in six classification conditions, the predictions are all made from a single identification condition. One cumulates over different cells in the identification matrix for each particular classification prediction.

Figure 2. Observed number of errors made during the learning of each type of classification, plotted against the number of errors predicted from the identification learning results. (From "Learning and memorization of classifications" by R. N. Shepard, C. I. Hovland, and H. M. Jenkins, 1961, Psychological Monographs, 75(13, Whole No. 517), p. 27. Copyright 1961 by the American Psychological Association. Reprinted by permission.)

There are a couple of plausible explanations for the discrepancies in the predicted and observed patterns of performance. A partial answer might be that there was a generalized increase in sensitivity in the classification tasks relative to the identification one. This might have occurred for various reasons, including the fact that all the classification tasks were conducted subsequent to the identification one. A more compelling explanation, however, suggested by Shepard et al. (1961), is that some process of selective attention was involved. To perform a Type I classification, for instance, a subject need attend only to one dimension (color, in the example in Figure 1). In an identification task, on the other hand, all three dimensions are relevant and the subject must divide attention. Note that in a Type VI classification, all the dimensions are equally relevant, and that the classification results are fairly well predicted from the identification results in this case.

Medin and Schaffer (1978) noted that selective attention might play a central role in determining the similarity parameter values in the context model. Therefore, if the focusing of attention is influenced by the structure of the categories, as suggested earlier, then it follows that the similarity parameters that determine performance may depend systematically on the structure of the categories as well. Indeed, Medin (1982) has begun to investigate aspects of category structure that might be relevant in this regard.

These lines of reasoning may be modeled within the current theoretical framework in a straightforward way. We have assumed that the similarity between stimuli $t$ and $x$ is given by

$$S(t, x) = e^{-(d_1 + d_2 + \cdots + d_n)} \tag{10}$$

where $d_j = q_j$ ($0 \le q_j < \infty$), if $t_j \ne x_j$; and $d_j = 0$, if $t_j = x_j$. An alternative way of writing this equation is

$$S(t, x) = e^{-D(w_1 + w_2 + \cdots + w_n)} \tag{11}$$

where $D = \sum_{j=1}^{n} q_j$; $w_j = q_j / D = W_j$, if $t_j \ne x_j$; and $w_j = 0$, if $t_j = x_j$. Note that $0 \le W_j \le 1$ and $\sum_j W_j = 1$. The $D$ parameter may be interpreted as total distance in psychological space, and is analogous to a sensitivity parameter, whereas the $W_j$ parameters are weights assigned to each dimension in computing overall distance. Assuming that the component dimensions are of roughly equal perceptual salience, then the $W_j$ parameters model the effect of attending selectively to the component dimensions.⁴

⁴ R. M. Shiffrin (personal communication, March 16, 1983) suggested that the $D$ parameter may be interpreted as consisting of two components, one associated with general sensitivity and the other with some form of resource sharing. According to this view, there may be some relationship between the value of $D$ and the distribution of weights. This is an interesting possibility which awaits future investigation.

Now, further assuming that the context-model response rule (Equation 1) does indeed govern subjects' classification probabilities, then it is possible to determine for any given value of $D$ the distribution of dimension weights that optimizes performance in a given classification paradigm. (By optimize, here, I mean maximize the average percentage correct.) And it is reasonable to hypothesize that, with learning, subjects will distribute attention among the component dimensions in a way that tends to optimize performance. This approach was used by Getty et al. (1979) and Getty, Swets, and Swets (1980) in predicting stimulus-response confusion matrices in identification experiments. They obtained some support for the hypothesis that, with learning, subjects' weights of component dimensions tended toward optimum. I now use the same approach in an attempt to model the identification-classification relationship observed by Shepard et al. (1961).

The context model's predictions of the average percentage of errors for each classification type investigated by Shepard et al. (1961) are plotted in Figure 3. These predictions assume two different combinations of $D$ (overall sensitivity) and $w$ (distribution of attention). I use the percentage of errors here, rather than the percentage correct, to facilitate comparison with Shepard et al.'s results in Figure 2. In case the method of prediction is not evident to the reader, a brief example is provided in the Appendix. The purpose of plotting these theoretical predictions against one another is to gain an abstract characterization of the identification-classification relationship observed by Shepard et al. To the extent that the relationships among the theoretical predictions mirror the empirical relationships in Figure 2, support for the present modeling account is obtained. The reader should note, however, that the present formulation is intended to serve only as an approximation to the qualitative pattern of results in Figure 2. The model used here is a static one, whereas Shepard et al.'s results emerged from a dynamic learning process. A complete quantitative account of those results would undoubtedly need to specify the way in which $D$ and $w$ change with experience.

Figure 3. Context model predictions of the average percentage of errors for each classification type for two different combinations of $D$ and $w$. (Horizontal axis: context model predicted average % errors, $D = 3$, equal weights.)

The predictions plotted against the horizontal axis in Figure 3 assume a uniform distribution of weights, that is, $w_1 = w_2 = w_3$, with $D = 3$. Assuming that subjects distribute attention equally among the component dimensions in the identification task, then these predictions are analogous to predicting classification performance from identification performance by directly applying the mapping hypothesis, except that the predictions are now being made by a mathematical model, rather than by actual cumulation over observed identification results. The predictions plotted against the vertical axis in Figure 3 assume an optimal distribution of weights for each classification type and assume that overall sensitivity has increased from $D = 3$ to $D = 5$. The increase in the value of $D$ represents the generalized increase in sensitivity that might have occurred in the classification conditions because they were conducted subsequent to the identification condition; this assumption is not crucial, however, for bringing out the relationships seen in Figure 3. The particular values of $D$ chosen in Figure 3 are arbitrary, and similar patterns of results emerge for a wide range of $D$ values. The distributions of weights that are optimal for each classification type are summarized in Table 1.⁵

Table 1
Distributions of Weights That Are Optimal for Each Classification Type

                        Optimal dimension weights
Classification type     1       2       3
I                       1.00    0.00    0.00
II                      0.50    0.50    0.00
III                     0.35    0.35    0.30
IV                      0.33    0.33    0.33
V                       0.46    0.27    0.27
VI                      0.33    0.33    0.33

Note. Dimension weights that optimize the average percentage correct for each classification type, subject to the constraint that $D = 5$. Dimensions 1, 2, and 3 correspond to color, shape, and size, respectively, as illustrated in Figure 1. The optimal dimension weights for Types III and V will vary depending on the value of $D$; the optimal dimension weights for Types I, II, IV, and VI are stable.

⁵ The weights ($w$) that are optimal for each classification type may be determined by the classical methods of constrained optimization; namely, one solves for the $w_j$ and $\lambda$ that maximize the Lagrangian function

$$L(w, \lambda) = \sum_{x \in X} P(X \mid x, D, w) + \sum_{y \in Y} P(Y \mid y, D, w) + \lambda \Big( 1 - \sum_j w_j \Big)$$

(where $0 \le w_j \le 1$) by using standard methods of the calculus (see Daellenbach & George, 1978). For some classification types, however, the computation is difficult. Therefore, one uses a computerized parameter-search routine to find the $w_j$s that maximize the average percentage correct for each classification type.

The representation in Figure 3 appears to reflect fairly well the qualitative pattern of results in Figure 2. First, the pattern of improvement for each classification type, relative to what is predicted by a direct application of the mapping hypothesis, agrees fairly systematically with the findings of Shepard et al. (1961): A good deal of improvement occurs for Type I and Type II classifications, less for Types III-V, and still less for Type VI.⁶ Second, the amount of variation in number of errors among the types is increased with optimal weights as compared with equal weights. Third, and perhaps most important, the predicted order of difficulty for Type II and Types III-V also parallels Shepard et al.'s results: Performance for Types III-V exceeds that for Type II with equal weights, but the order reverses with optimal weights. This reversal was a highly emphasized result in Shepard et al.'s theoretical discussion. The reader may note in addition that if Shepard et al.'s findings on the III-IV-V triad are taken to be perfectly reliable, then the present formulation makes a perfect prediction of the rank order of difficulty for the six types, both the order predicted from the identification results (equal weights) and the observed classification results (optimal weights).

⁶ Perhaps the major shortcoming of the representation in Figure 3 is that performance on the Type VI classification is predicted to be substantially better than what is predicted by a direct application of the mapping hypothesis (whereas Shepard et al., 1961, observed little improvement). Actually, if there were no increase in general sensitivity ($D$), then there would be no predicted improvement for the Type VI classification, because the optimal distribution of weights for this type is the uniform one. Unfortunately, there would also be little predicted improvement for the III-IV-V triad in this circumstance.

In summary, the interpretation of the context model that is offered here is capable of mirroring the qualitative pattern of results that Shepard et al. (1961) observed when they compared identification and classification performance. As stated previously, the context-model response rule arises as a logical consequence of integrating the mapping hypothesis with Luce's (1963) choice model for stimulus identification. According to the formulation developed here, however, the mapping relation that links identification and classification performance may not be a direct one, but a more abstract linkage. In particular, the similarity parameters that determine performance may change systematically according to the structure of the choice paradigm that is investigated. A good working hypothesis is that the similarity parameters tend toward what is optimal for maximizing performance.

The optimal pattern of dimension weights being posited here is an attempt to quantify the processes of selective attention and abstraction that may mediate identification and classification performance. Medin et al. (1983) have demonstrated, however, that the process of abstraction does not occur automatically, but depends crucially on experimental conditions. One experiment with particularly favorable conditions was reported by Medin et al. (1983, last-name-infinite condition). Subjects were required to classify photographs of women's faces into two categories. The experimenters coded the faces as varying along four binary-valued dimensions: color of hair (light or dark), color of shirt (light or dark), type of smile (open or closed), and length of hair (long or short). They informed the subjects that these were the relevant dimensions for performing the classification. (The faces varied freely on all irrelevant dimensions.) The important manipulation in this experiment was that subjects were never presented with the same face twice; rather, a different face was used each time to instantiate the logical coding corresponding to each abstract stimulus.

It was expected that this manipulation would enhance the processes of selective attention and abstraction in subjects' classification learning for two reasons. First, because subjects would never be retested on any particular exemplar, the use of a rote paired-associate learning strategy would be discouraged. Under these experimental conditions, the use of a rote learning strategy would benefit subsequent performance far less than a strategy in which subjects attempted to abstract the relevant category structure. A second reason, closely related to the first, is that the potential size of the exemplar population that defined the categories in this experiment was essentially infinite. Various researchers have found that the process of abstraction improves with increases in category size (e.g., Homa, Cross, Cornell, Goldman, & Schwartz, 1973).

The category structure used by Medin et al. (1983) is presented in Table 2. The estimated similarity parameters that achieved a best fit of the context model to the classification data were $p_1 = .23$, $p_2 = .72$, $p_3 = .25$, and $p_4 = .24$. These similarity parameters may be transformed to distances ($q_j$), according to the present formulation, by setting

$$q_j = -\ln p_j$$

Total distance in psychological space ($D$) is then given by

$$D = \sum_{j=1}^{4} q_j$$

and the weights corresponding to each dimension by

$$W_j = q_j / D$$

The weights that are obtained by transforming the similarity parameters in this manner are plotted in Figure 4 (solid curve). Also plotted in Figure 4 are the weights that are theoretically optimal according to the present formulation (dashed curve). That is, these are the weights that would have maximized subjects' average percent-correct scores in this experiment. The close correspondence between the best-fitting weights and the theoretically optimal weights suggests that the current framework is worthy of further investigation.

Summary

To summarize, in this article I attempted to relate Medin and Schaffer's (1978) context theory to a more general theoretical framework for the modeling of choice and similarity. The choice rule for classification was related to Luce's (1963) choice model for identification
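The transformation from similarity parameters to distances and weights described just before the Summary can be checked numerically. The sketch below uses the four best-fitting parameter values quoted in the text; the variable names are illustrative only, and the final step simply confirms that the exponential form $e^{-D W_j}$ returns the original $p_j$.

```python
from math import exp, log

# Best-fitting similarity parameters p_1..p_4 quoted in the text.
p = [0.23, 0.72, 0.25, 0.24]

# q_j = -ln p_j: distance contributed by a mismatch on dimension j.
q = [-log(pj) for pj in p]

# D = sum of the q_j: total distance in psychological space.
D = sum(q)

# W_j = q_j / D: weight given to dimension j.
W = [qj / D for qj in q]

# Consistency check: exp(-D * W_j) should give back p_j.
recovered = [exp(-D * Wj) for Wj in W]

if __name__ == "__main__":
    print("q =", [round(v, 3) for v in q])
    print("D =", round(D, 3))
    print("W =", [round(v, 3) for v in W])
    print("recovered p =", [round(v, 2) for v in recovered])
```

Because $p_2 = .72$ is much closer to 1 than the other three parameters, the second dimension receives by far the smallest weight under this transformation.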
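The search for optimal dimension weights summarized in Table 1 can be illustrated with a coarse grid search. This is a sketch under stated assumptions rather than the routine used in the article, whose Appendix is not reproduced here. It assumes that the eight stimuli are coded as binary triples, that each stimulus is also counted as a stored exemplar of its own category, that similarity follows Equation 11 with $D = 5$, and that weights are searched in steps of 0.05. The category codings for Types I and II and all function names are hypothetical choices made for this example.

```python
from itertools import product
from math import exp

# Eight stimuli on three binary-valued dimensions (hypothetical coding).
STIMULI = list(product((0, 1), repeat=3))

def average_percent_correct(in_x, weights, total_distance):
    """Average percentage correct under Equations 1 and 11 for one partition."""
    q = [total_distance * w for w in weights]  # q_j = D * W_j

    def sim(t, x):
        # Exponential decay of city-block distance over mismatching dimensions.
        return exp(-sum(qj for tj, xj, qj in zip(t, x, q) if tj != xj))

    total = 0.0
    for t in STIMULI:
        same = sum(sim(t, x) for x in STIMULI if in_x(x) == in_x(t))
        other = sum(sim(t, x) for x in STIMULI if in_x(x) != in_x(t))
        total += same / (same + other)
    return 100.0 * total / len(STIMULI)

def best_weights(in_x, total_distance, steps=20):
    """Grid search over weight triples that sum to 1 (step size 1/steps)."""
    best_score, best_w = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            score = average_percent_correct(in_x, w, total_distance)
            if best_score is None or score > best_score:
                best_score, best_w = score, w
    return best_w

def type_i(s):
    # Type I: category membership depends on the first dimension only.
    return s[0] == 0

def type_ii(s):
    # Type II: category X when the first two dimensions have equal values.
    return s[0] == s[1]

if __name__ == "__main__":
    print("Type I :", best_weights(type_i, 5.0))
    print("Type II:", best_weights(type_ii, 5.0))
```

Under these assumptions the search places all weight on the single relevant dimension for Type I and splits it evenly between the two relevant dimensions for Type II, which agrees with the first two rows of Table 1. The remaining types could be explored the same way by supplying their category codings, though a finer grid would be needed to approach the tabled values for Types III and V.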