An Integrated Theory of the Mind

John R. Anderson and Daniel Bothell
Psychology Department
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 268-2781
[email protected]
[email protected]

Michael D. Byrne
Psychology Department
Rice University
Houston, TX 77005
[email protected]

Christian Lebiere
Human Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, PA 15213
[email protected]

Friday, September 27, 2002

Abstract

There has been a proliferation of proposed mental modules in an attempt to account for different cognitive functions, but so far there has been no successful account of their integration. ACT-R (Anderson & Lebiere, 1998) has evolved into a theory that consists of multiple modules but also explains how they are integrated to produce coherent cognition. We discuss the perceptual-motor modules, the goal module, and the declarative memory module as examples of specialized systems in ACT-R. These modules are associated with distinct cortical regions. They place chunks in buffers that project to the basal ganglia, which implement ACT-R's procedural system, a production system that responds to patterns of information in the buffers. At any point in time a single production rule is selected to respond to the current pattern. This serial bottleneck in production-rule selection enables the coordination that results in an organized control of cognition. Subsymbolic processes serve to guide the selection of rules to fire as well as the internal operations of (some) modules, and much of learning involves the tuning of these subsymbolic processes. We describe empirical examples that demonstrate the predictions of ACT-R's modules and also examples that show how these modules result in strong predictions when they are brought together in models of complex tasks. These predictions require little parameter estimation and can be made for choice data, latency data, and brain imaging data.

Psychology, like other sciences, has seen an inexorable movement towards specialization. This is seen in the proliferation of specialty journals in the field, but also in the proliferation of special-topic articles in this journal, which is supposed to serve as the place where ideas from psychology meet. Specialization is a necessary response to complexity in a field. Along with this move to specialization in topics studied, there has been a parallel move toward viewing the mind as consisting of a set of specialized components. With varying degrees of consensus and controversy there have been claims for separate mechanisms for processing visual objects versus locations (Ungerleider & Mishkin, 1982), for procedural versus declarative knowledge (Squire, 1987), for language (Fodor, 1987), for arithmetic (Dehaene, Spelke, Pinel, Stanescu, & Tsivkin, 1999), for categorical knowledge (Warrington & Shallice, 1984), and for cheater detection (Cosmides & Tooby, 2000), to name just a few. While there are good reasons for at least some of the proposals for specialized cognitive modules1, there is something unsatisfactory about the result—an image of the mind as a disconnected set of mental specialties. One can ask "how is it all put back together?" An analogy here can be made to the study of the body. Modern biology and medicine have seen a successful movement towards specialization, responding to the fact that various body systems and parts are specialized for their function.
However, because the whole body is readily visible, the people who study the shoulder have a basic understanding of how their specialty relates to the specialty of those who study the hand, and the people who study the lung have a basic understanding of how their specialty relates to the specialty of those who study the heart. Can one say the same of the person who studies categorization and the person who studies on-line inference in sentence processing, or of the person who studies decision making and the person who studies motor control?

Newell (1990) argued for cognitive architectures that would explain how all the components of the mind worked to produce coherent cognition. In his book he described the Soar system, which was his best hypothesis about the architecture. We have been working on a cognitive architecture called ACT-R (e.g., Anderson & Lebiere, 1998), which is our hypothesis about such an architecture. It has recently undergone a major development into a version called ACT-R 5.0, and this form offers some important new insights into the integration of cognition. The goal of this paper is to describe this new version of the theory and draw out its implications for the integration of mind. Before describing ACT-R and the picture it provides of human cognition, it is worth elaborating more on why a unified theory is needed, and there is no better way to begin than with the words of Newell (1990):

A single system (mind) produces all aspects of behavior. It is one mind that minds them all. Even if the mind has parts, modules, components, or whatever, they all mesh together to produce behavior. Any bit of behavior has causal tendrils that extend back through large parts of the total cognitive system before grounding in the environmental situation of some earlier times. If a theory covers only one part or component, it flirts with trouble from the start. It goes without saying that there are dissociations, independencies, impenetrabilities, and modularities. These all help to break the web of each bit of behavior being shaped by an unlimited set of antecedents. So they are important to understand and help to make that theory simple enough to use. But they don't remove the necessity of a theory that provides the total picture and explains the role of the parts and why they exist (pp. 17-18).

Newell then goes on to enumerate many of the advantages that a unified theory has to offer; we will highlight one such advantage in the next subsection.

Integration and Application

The advantage we would like to emphasize is that unification enables the tackling of important applied problems. If cognitive psychologists try to find applications for the results of isolated research programs, they either find no takers or extreme misuse (consider, for instance, what has happened with research on left-right hemispheric asymmetries in education). Applications of psychology, such as education, require that one attend to the integration of cognition. Educational applications do not respect the traditional divisions in cognitive psychology. For instance, high-school mathematics involves reading and language processing (for processing of instruction, mathematical expressions, and word problems), spatial processing (for processing of graphs and diagrams), memory (for formulas and theorems), problem solving, reasoning, and skill acquisition. To bring all of these aspects together in a cognitive model one needs a theory of the cognitive architecture (Anderson, 2002).
Other domains of application are at least as demanding of integration. One of them is the development of cognitive agents (Freed, 2000). These applications involve assembling large numbers of individuals to interact; prominent among them are group training exercises. Another domain is multi-agent video games and other interactive entertainment systems. In many cases it is difficult to assemble the large number of individuals required to provide the desired experience for a given individual. The obvious solution is to provide simulated agents in a virtual environment, and it is often critical that these simulated agents behave realistically in terms of their cognitive properties. The demand is to have entities that can pass a limited Turing test.2 Another application area that requires an integrated treatment of human capabilities is human factors/human-computer interaction (see Byrne, 2003, for a review of cognitive architectures in this area). This field is concerned with behavior in complex tasks such as piloting commercial aircraft and using CAD systems. Such behavior involves the full spectrum of cognitive, perceptual, and motor capabilities.

Salvucci's Driving Example

Salvucci's (2001b) study of the effect of cell phone use on driving (see Figure 1) illustrates the use of cognitive models to test the consequences of artifacts and their interactions, and shows how integrated approaches and applied problems lead to a somewhat different and sterner measure of whether theory corresponds to data than is typically applied in psychology. Of course, there have been a number of empirical studies on this issue, and Salvucci subsequently conducted a number of these as a test of his model. However, he took as a challenge whether he could predict a priori the effects of cell phone use in a particular situation. If cognitive models are to be useful in this domain, they should truly predict results rather than being fit to data. He had already developed an ACT-R model of driving (Salvucci, Boer, & Liu, 2001), and for this task he developed a model of using one of a variety of cell phones. He put these two models together to get predictions of the effects of driving on cell phone use and of cell phone use on driving. Significantly, he did this without estimating any parameters to fit the data, because he had not yet collected any data. He was using established ACT-R parameters.3 It should also be emphasized that his model actually controls a driving simulator and actually dials a simulated cell phone. While his ACT-R model does not have eyes, it is possible to reconstruct what the eyes would see from the code that constructs the representation for a human driver in the simulator. Similarly, while the model does not have hands, it is possible to insert into the input stream the events that would have occurred had the wheel been turned or a button pressed. ACT-R has a theory of how perceptual attention encodes information from the screen and a theory of how manual actions are programmed.

While Salvucci has subsequently looked at more complex cell phone use, in this study he was interested in dialing the phone. He compared four ways of dialing: full manual, speed manual, full voice, and speed voice. Figure 2a shows the model's predictions for the effect of driving on the use of the various cell phone modes. Figure 2b shows the results that he obtained in a subsequent experiment. The correspondence between model and data is striking. Being able to predict behavior in the absence of parameter estimation is a significant test of a model.
In many applications, it is also a practical necessity. Of course, there is relatively little interest in the effect of driving on cell phone use; rather, the interest is in the converse. Salvucci collected a number of different measures of driving. Figure 3 shows the results for mean lateral deviation from the center of the lane. Comparing the predictions in Figure 3a with the data in Figure 3b yields a classic glass half-full, half-empty result: The model succeeds in identifying that only the full-manual condition will have a significant impact on this measure. Much research in psychology would be satisfied with predicting the relative order of conditions. However, the absolute predictions of the model are far off. The ACT-R model drives much better than real drivers do and would therefore lead to unrealistic expectations about their performance. This shows that ACT-R and Salvucci's model are works in progress, and indeed Salvucci has made progress since this report. However, the failings are as informative as the successes in terms of illustrating what a cognitive architecture must do. Note that Salvucci could have tried to re-estimate parameters to make his model fit—but the goal is to have predictions in advance of the data and parameter estimation.

What a Cognitive Architecture Must Be Able to Do

More generally, what properties must a cognitive architecture strive for if it is to deliver on its promise to provide an integrated conception of mind? The example above illustrates that one will not understand human thought if one treats it as abstracted from perception and action. As many have stressed (e.g., Greeno, 1989; Meyer & Kieras, 1997), human cognition is embodied, and it is important to understand the environment in which it occurs and people's perceptual and motor access to that environment. Applications, particularly those involving the development of cognitive agents, stress two other requirements. First, the worlds that these agents deal with do not offer a circumscribed set of interactions, as occurs in a typical experiment. They emphasize the need for robust behavior in the face of error, the unexpected, and the unknown. Achieving this robustness goal is not just something required for the development of simulated agents; it is something required of real humans and an aspect of cognition that laboratory psychology typically ignores. Second, Salvucci's application stresses the importance of a priori predictions. Rather than just predicting qualitative results, or estimating parameters to make quantitative predictions, the ideal model should predict absolute values without estimating parameters. Psychology has largely been content with qualitative predictions, but this does not compare favorably with other sciences. Requiring a priori predictions of actual values without any parameter estimation seems largely beyond the accomplishments of the field, but this fact should make us strive harder. Model fitting has been criticized (Roberts & Pashler, 2000) because of the belief that the parameter estimation process would allow any pattern of data to be fit. While this is not true, such criticisms would be best dealt with by simply eliminating parameter estimation. Perhaps the greatest challenge to the goal of a priori predictions is the observation that behavior in the same experiment will vary with factors such as the population, the instructions, and the state of the participants (motivated, with or without caffeine, etc.).
In other sciences, the results of experiments also vary with contextual factors (in chemistry, with the purity of the chemicals, how they are mixed, and the temperature), and the approach is to measure the critical factors and have a theory of how they affect the outcome, without estimating situation-specific parameters. Psychology should strive for the same.4 One major way to deal with such variability in results is to have a theory of learning that predicts how an individual's past experience and the current experience shape behavior. Learning is at center stage in applications such as producing cognitive agents. A major criticism of the simulated agents that inhabit current environments is that they do not learn and adjust their behavior with experience as humans do. Of course, learning is also at center stage for any application concerned with education.

The ACT-R 5.0 Architecture

ACT-R claims that cognition emerges as the consequence of an interaction between specific units of procedural knowledge and specific units of declarative knowledge. The units of declarative knowledge are called chunks and represent things remembered or perceived. For instance, a chunk may represent the fact that 2 + 3 = 5 or that Boston is the capital of Massachusetts. For driving, chunks may represent numerous types of knowledge such as situational awareness (e.g., "there is a car to my left"), navigational knowledge (e.g., "Broad St. intersects Main St."), or driver goals and intentions (e.g., "stop for gas at the next traffic light"). Procedural knowledge encodes the processes and skills necessary to achieve a given goal. The units of procedural knowledge are called productions: condition-action rules that "fire" when their conditions are satisfied and execute the specified actions. The conditions can depend on the current goal to be achieved, on the state of declarative knowledge (i.e., recall of a chunk), and/or on the current sensory input from the external environment. Similarly, the actions can alter the state of declarative memory, change goals, or initiate motor actions in the external environment. Below is an English statement of a production rule from the driving model of Salvucci et al. (2001):

IF my current goal is to encode a distant perceptual point for steering
   and there is a tangent point present (i.e., we are entering or in a curve)
THEN shift attention to this point and encode its position and distance.

The first test in the condition above is a test of the goal, and the second a test of the contents of the visual system. The action of this production requests that a visual representation be built.

Figure 4 illustrates the basic architecture of ACT-R 5.0. There is a set of modules devoted to tasks such as identifying objects in the visual field, controlling the hands, retrieving information from declarative memory, and keeping track of current goals and intentions. The central production system is not sensitive to most of the activity of these modules but rather can respond only to the information that is deposited in the buffers of these modules. For instance, people are not aware of all the information in the visual field but only of the object they are currently attending to. Similarly, people are not aware of all the information in long-term memory but only of the fact currently retrieved. Each module makes this information available as a chunk in a buffer.
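To make the chunk-production distinction concrete, here is a minimal sketch in Python rather than in ACT-R's own Lisp-based notation. The class names, buffer names, and slot names used here (Chunk, Production, "goal", "visual-location", "step", "kind", and so on) are illustrative assumptions for this sketch, not part of the ACT-R software.

class Chunk:
    """A unit of declarative knowledge, e.g., the fact that 2 + 3 = 5."""
    def __init__(self, chunk_type, **slots):
        self.chunk_type = chunk_type
        self.slots = slots            # named slots, e.g., addend1=2, addend2=3, total=5

class Production:
    """A condition-action rule that fires when its condition matches the buffers."""
    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition    # function: buffers -> bool
        self.action = action          # function: buffers -> None (updates buffers)

# The addition fact 2 + 3 = 5 as a chunk:
fact_2_plus_3 = Chunk("addition-fact", addend1=2, addend2=3, total=5)

# The steering rule stated in English above, written against hypothetical buffers:
def tangent_condition(buffers):
    goal = buffers.get("goal")
    loc = buffers.get("visual-location")
    return (goal is not None and goal.slots.get("step") == "encode-steering-point"
            and loc is not None and loc.slots.get("kind") == "tangent-point")

def tangent_action(buffers):
    # Ask the vision module to shift attention to the tangent point and
    # encode its position and distance.
    buffers["visual"] = Chunk("attend-request", where=buffers["visual-location"])

encode_tangent_point = Production("encode-tangent-point",
                                  tangent_condition, tangent_action)

The production mirrors the English rule: its condition tests the goal and the visual-location buffer, and its action posts a request to the vision module.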
As illustrated in Figure 4, the core production system can recognize patterns in these buffers and make changes to them, as, for instance, when it makes a request to perform an action in the manual buffer. In the terms of Fodor (1983), the information in these modules is largely encapsulated, and the modules communicate only through the information they make available in their buffers. The theory is not committed to exactly how many modules there are, but a number have been implemented as part of the core system. The buffers of these modules hold the chunks that the production system can respond to. Particularly important are the goal buffer, the retrieval buffer, two visual buffers, and a manual buffer.

The goal buffer, which we associate with the dorsolateral prefrontal cortex (DLPFC), keeps track of one's internal state in solving a problem. The retrieval buffer, in keeping with the HERA model (Nyberg, Cabeza, & Tulving, 1996) and other recent neuroscience models of memory (e.g., Buckner, Kelley, & Petersen, 1999; Wagner, Pare-Blagoev, Clark, & Poldrack, 2001), is associated with the ventrolateral prefrontal cortex (VLPFC) and holds information retrieved from long-term declarative memory.5 This distinction between DLPFC and VLPFC is in keeping with a number of neuroscience results (Petrides, 1994; Fletcher & Henson, 2001; Thompson-Schill et al., 1997; Braver et al., 2001; Cabeza et al., 2002). The other three modules/buffers are all based on Byrne and Anderson's (2001) ACT-R/PM, which in turn is based on Meyer and Kieras's (1997) EPIC. The manual buffer is responsible for control of the hands and is associated with the adjacent motor and somatosensory cortical areas devoted to controlling and monitoring hand movement. One of the visual buffers, associated with the dorsal "where" path of the visual system, keeps track of locations, while the other, associated with the ventral "what" system, keeps track of visual objects and their identity. The visual and manual systems are particularly important in many of the tasks that ACT-R has dealt with, such as a participant at a keyboard scanning a computer screen, typing, and using a mouse. There also are rudimentary vocal and aural systems.

Each of the buffers can hold a relatively small amount of information. Basically, the content of a buffer is a chunk. Chunks that were former contents of buffers are stored in declarative memory. In this way ACT-R can remember, for instance, objects it has attended to or solutions to goals it has achieved. The buffers are conceptually similar to Baddeley's (1986) working memory "slave" systems. While the central cognitive system can only sense the buffer contents, the contents of the chunks that appear in these buffers can be determined by rather elaborate systems within the modules. For instance, the chunks in the visual buffers represent the products of complex processes of the visual perception and attention systems. Similarly, the chunk in the retrieval buffer is determined by complex long-term memory retrieval processes, as we will describe.
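Continuing the purely hypothetical Python sketch above (again, none of these names come from the ACT-R distribution), the module-buffer relationship can be pictured as each module performing arbitrarily complex internal work while exposing at most one chunk to the production system:

class Module:
    """A specialized system (vision, declarative memory, motor, goal, ...).
    Its internal processing is encapsulated; only its buffer is visible."""
    def __init__(self, name):
        self.name = name
        self.buffer = None            # holds at most one Chunk at a time

    def expose(self, chunk):
        # However elaborate the internal computation that produced `chunk`
        # (visual attention, memory search, motor preparation, ...),
        # the production system sees only this single buffered result.
        self.buffer = chunk

modules = {name: Module(name) for name in
           ("goal", "retrieval", "visual-location", "visual", "manual")}

# The central production system matches patterns only against buffer contents:
buffers = {name: module.buffer for name, module in modules.items()}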
Bringing the Buffers Together

ACT-R 5.0 includes a theory of how these buffers interact to determine cognition. The basal ganglia and associated connections are thought to implement production rules in ACT-R. The cortical areas corresponding to these buffers project to the striatum, part of the basal ganglia, which we hypothesize performs a pattern-recognition function (in line with other proposals, e.g., Amos, 2000; Frank, Loughry, & O'Reilly, 2000; Houk & Wise, 1995; Wise, Murray, & Gerfen, 1996). This portion of the basal ganglia projects to a number of small regions known collectively as the pallidum. The projections to the pallidum are substantially inhibitory, and these regions in turn inhibit cells in the thalamus, which projects to select actions in the cortex. Graybiel and Kimura (1995) have suggested that this arrangement operates in a "winner-lose-all" manner, such that active striatal projections strongly inhibit only the pallidal neurons representing the selected action (which then no longer inhibit the thalamus from producing the action). This is a mechanism by which the winning production comes to dominate. According to Middleton and Strick (2000), at least five regions of the frontal cortex receive projections from the thalamus and are controlled by this basal ganglia loop. These regions play a major role in controlling behavior. Thus, the basal ganglia implement production rules in ACT-R, with the striatum serving a pattern-recognition function, the pallidum a conflict-resolution function, and the thalamus controlling the execution of production actions. Since production rules represent ACT-R's procedural memory, this also corresponds to proposals that the basal ganglia subserve procedural learning (Ashby & Waldron, 2000; Hikosaka, Nakahara, Rand, Sakai, Lu, Nakamura, Miyachi, & Doya, 1999; Saint-Cyr, Taylor, & Lang, 1988). An important function of the production rules is to update the buffers in the ACT-R architecture. The well-characterized organization of the brain into segregated cortico-striatal-thalamic loops is consistent with this hypothesized functional specialization.

Thus, the critical cycle in ACT-R is one in which the buffers hold representations determined by the external world and the internal modules, patterns in these buffers are recognized, a production fires, and the buffers are then updated for another cycle. The assumption in ACT-R is that this cycle takes about 50 msec to complete; this estimate of 50 msec as the minimum cycle time for cognition has emerged in a number of cognitive architectures, including Soar (Newell, 1990), CAPS (Just & Carpenter, 1992), and EPIC (Meyer & Kieras, 1997).

The architecture assumes a mixture of parallel and serial processing. Within each module there is a great deal of parallelism. For instance, the visual system is simultaneously processing the whole visual field, and the declarative system is executing a parallel search through many memories in response to a retrieval request. Also, the processes within different modules can go on in parallel and asynchronously. However, there are also two levels of serial bottleneck in the system. First, the content of any buffer is limited to a single declarative unit of knowledge, a chunk. Thus, only a single memory can be retrieved at a time, or a single object encoded from the visual field. Second, only a single production is selected to fire at each cycle. In this second respect, ACT-R 5.0 is like Pashler's (1998) central bottleneck theory and quite different, at least superficially, from the other prominent production system conceptions (CAPS, EPIC, and Soar). The end of the paper will return to the significance of these differences.
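The recognition-selection-update cycle just described can be summarized in a short sketch. The following continues the hypothetical Python fragments above (run_cycle, CYCLE_TIME, and the utility attribute are assumptions of this sketch, not ACT-R functions) and makes the two serial bottlenecks explicit: each buffer holds at most one chunk, and exactly one production fires per roughly 50-msec cycle.

CYCLE_TIME = 0.050   # seconds; the assumed ~50-msec minimum cycle time

def run_cycle(productions, buffers, clock):
    """One recognize-act cycle over the current buffer contents."""
    # Pattern recognition: find the productions whose conditions match the buffers.
    matching = [p for p in productions if p.condition(buffers)]
    if matching:
        # Conflict resolution: exactly one production is selected to fire.
        # (In ACT-R the choice is governed by subsymbolic utilities; a simple
        # maximum stands in for that process here.)
        winner = max(matching, key=lambda p: getattr(p, "utility", 0.0))
        winner.action(buffers)        # update buffers / issue requests to modules
    # The modules carry on their own, largely parallel work between cycles (not shown).
    return clock + CYCLE_TIME

Conflict resolution is reduced here to taking a maximum; in ACT-R proper it is governed by the subsymbolic utilities described later in the paper.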
Subsequent sections of the paper will describe the critical components of this model: the perceptual-motor system, the goal system, declarative memory, and the procedural system. However, now that there is a sketch of what ACT-R 5.0 is, we would like to close this section by noting the relationship between the ACT-R 5.0 system described in Figure 4 and earlier ACT systems.

Brief History of the Evolution of the ACT-R Theory

ACT systems have historical roots in the HAM theory (Anderson & Bower, 1973) of declarative memory. ACT was created by marrying this theory with a production system theory of procedural memory (Newell, 1973b). Significant earlier embodiments of the theory were ACT-E (Anderson, 1976) and ACT* (Anderson, 1983). By the time ACT* had been formulated, a distinction had been made between a symbolic and a subsymbolic level of the theory. The symbolic level consisted of the formal specification of the units of declarative memory (chunks) and the units of procedural memory (productions). The subsymbolic level consisted of the specification of continuously varying quantities that controlled access to chunks and productions. In the case of declarative memory these quantities have always been referred to as activations, which reflect the past patterns of usage of a chunk. In the case of procedural memory the subsymbolic quantities have had various names but are currently called utilities, which reflect the reinforcement history of the productions.

ACT-R (Anderson, 1993) emerged as a result of marrying ACT with the rational analysis of Anderson (1990), which claimed that cognition was adapted to the statistical structure of the environment. The insight was that one could use rational analysis to design the subsymbolic computations that controlled access to information. According to rational analysis, the subsymbolic components were optimized with respect to demands from the environment, given basic computational limitations. A somewhat incidental aspect of the initial formulation of ACT-R in 1993, called ACT-R 2.0, was that a running simulation of the system was distributed. Owing to the increased power of computers, the relatively new standardization of Common Lisp (in which ACT-R 2.0 was implemented), and the growth of understanding about how to achieve an effective and efficient simulation, this was the first widely available and functional version of the ACT architecture. A worldwide user community arose around that system. The system was no longer the private domain of our theoretical fancies; it had to serve a wide variety of needs and serve as the basis of communication among researchers. This had a major catalytic effect on the theory and served to drive out irrelevant and awkward assumptions.

ACT-R 4.0 (Anderson & Lebiere, 1998) emerged as a cleaned-up version of ACT-R 2.0. It includes an optional perceptual-motor component called ACT-R/PM (Byrne & Anderson, 1998). While there are disagreements in the community about aspects of ACT-R 4.0, it has become a standard for research and exchange. ACT-R 5.0 reflects a continued development of the theory in response to community experience. Over 100 models based on the ACT-R 4.0 system have been published by a large number of researchers. This productivity is a testimonial to the intellectual gain to be had from a well-working integrated system. Table 1 summarizes the research areas covered by these models; detailed information is available from the web site act-r.psy.cmu.edu.
A major commitment in the development of ACT-R 5.0 is that models developed in 4.0 still work, so that 5.0 constitutes cumulative progress.

Differences Between ACT-R 4.0 and 5.0

ACT-R 5.0 differs from ACT-R 4.0 in four principal ways. First, there have been some simplifications in the parameters and assumptions of the architecture. Some assumptions of ACT-R 4.0, while they seemed good ideas at the time, were not being exploited in the existing models and were sometimes being actively avoided. These were eliminated.6 Also, as more models were developed, it became apparent that there were some constraints on the parameter values that worked across models. These parameter constraints have moved us closer to the goal of parameter-free predictions and enabled an effort like Salvucci's, described earlier.

Second, the tentative brain mapping illustrated in Figure 4 was not part of ACT-R 4.0. However, it seemed that ACT-R could be mapped onto the segregated cortico-striatal-thalamic loops that had been proposed by a number of theorists quite outside of ACT-R. Given this mapping, it is now possible to deploy neuroimaging data to make novel, demanding tests of the ACT-R theory.

Third, a key insight that this mapping onto the brain brought with it was the module-buffer conception of cognition. This enabled a more thorough integration of the cognitive aspects of ACT-R with the perceptual-motor components of ACT-R/PM. This complete integration and consequent embodiment of cognition is a significant elaboration of ACT-R 5.0 over ACT-R 4.0. Within most of the ACT-R community, which was relatively content with ACT-R 4.0 and which is not concerned with data from cognitive neuroscience, it is this module-buffer conception that has received the most attention and praise.

Fourth, there now is a mechanism in ACT-R for learning new production rules, which has figured in a number of successful models of skill acquisition. While other versions of ACT have had production-rule learning mechanisms that worked in modeling circumscribed experimental tasks, these mechanisms failed to work in large-scale simulations of skill learning. As we will describe in the section on ACT-R's procedural system, the successful definition of a general production-rule learning mechanism also depended on the move to the buffer-based conception of production-rule execution.

These changes have not been without consequence for the theory. One of the consequences has been the treatment of declarative retrieval. Information retrieved from long-term declarative memory is an important part of the condition of production rules. For instance, in trying to simplify a fraction like 4/12, a critical test is whether there is a multiplication fact asserting that the numerator is a factor of the denominator (i.e., 4 x 3 = 12 in this case). In all previous versions of ACT this test was performed by a single production requesting a retrieval (e.g., 4 x ? = 12), then waiting to see if the retrieval request was successful, and, if it was, examining what chunk was retrieved. This was implemented by an awkward process in which all production processing was suspended until the retrieval effort ran to completion. In contrast, in ACT-R 5.0 one production can make a retrieval request in its action, and other productions can fire while the retrieval request is being processed. When the retrieval is complete, the result will appear in the retrieval buffer, where another production can harvest it.
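The contrast can be sketched as follows, again only for illustration (the function names and buffer layout are hypothetical, and the Chunk class is the one assumed in the earlier sketches, not ACT-R's API). One production's action merely posts a retrieval request, the declarative module works on it in parallel with further production firing, and a later production harvests whatever chunk eventually appears in the retrieval buffer:

def request_retrieval(buffers, pattern):
    # Action of one production: post the request (e.g., 4 x ? = 12) and return
    # immediately; production firing is not suspended while memory is searched.
    buffers["retrieval-request"] = pattern
    buffers["retrieval"] = None                  # nothing retrieved yet

def declarative_module_step(buffers, memory):
    # The declarative module, running in parallel with production firing,
    # eventually places a matching chunk (or a failure marker) in its buffer.
    pattern = buffers.get("retrieval-request")
    if pattern is not None:
        matches = [c for c in memory
                   if all(c.slots.get(k) == v for k, v in pattern.items())]
        buffers["retrieval"] = matches[0] if matches else "retrieval-failure"
        buffers["retrieval-request"] = None

# Usage with the fraction-simplification example from the text:
memory = [Chunk("multiplication-fact", multiplicand=4, multiplier=3, product=12)]
buffers = {}
request_retrieval(buffers, {"multiplicand": 4, "product": 12})
# ... other productions may fire here while the retrieval is pending ...
declarative_module_step(buffers, memory)
result = buffers["retrieval"]                    # harvested by a later production
print(result.slots["multiplier"])                # -> 3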