Providing Effective Access to Shared Resources: A COIN Approach

Stephane Airiau, Sandip Sen
Mathematical & Computer Sciences Dept
600 South College Avenue
Tulsa, OK 74104
{stephane,sandip}@ens.utulsa.edu

David H. Wolpert, Kagan Tumer
NASA Ames Research Center
MS 269-2
Moffett Field, CA 94035
{dhw,kagan}@email.arc.nasa.gov

ABSTRACT
Managers of systems of shared resources typically have many separate goals. Examples are efficient utilization of the resources among their users and ensuring that no user's satisfaction in the system falls below a preset minimal level. Since such goals will usually conflict with one another, either implicitly or explicitly the manager must determine the relative importance of the goals, encapsulating that into an overall utility function rating the possible behaviors of the entire system. Here we demonstrate a distributed, robust, and adaptive way to optimize that overall function. Our approach is to interpose adaptive agents between each user and the system, where each such agent works to maximize its own private utility function. In turn, each such agent's function should be both relatively easy for the agent to learn to optimize, and "aligned" with the overall utility function of the system manager - an overall function that is based on, but in general different from, the satisfaction functions of the individual users. To ensure this we enhance the Collective INtelligence (COIN) framework to incorporate user satisfaction functions in the overall utility function of the system manager, and accordingly in the associated private utility functions assigned to the users' agents. We present experimental evaluations of different COIN-based private utility functions and demonstrate that those COIN-based functions outperform some natural alternatives.
1. INTRODUCTION
One of the key problems confronting the designers of large-scale, distributed agent applications is control of access to shared resources. In order for the system to function well, access to these shared resources must be coordinated to eliminate potential problems like deadlock, starvation, livelock and other forms of resource contention. Without proper coordination, both individual user satisfaction and overall system performance can be significantly impaired.

As hand-coded, static procedures are likely to be brittle in complex environments, we are interested in flexible, adaptive, scalable approaches to the resource sharing problem [2]. One approach is to use distributed machine learning-based agents, each of which works exclusively to increase the satisfaction of its associated user. While this is appealing, individual learners optimizing the "satisfaction" utility functions of their users typically will not coordinate their activities and may even work at cross-purposes, thereby degrading the performance of the entire system. Well-known cases of this problem of global shared resources include the Tragedy of the Commons [4]. Such problems highlight the need for careful design of the utility functions the learning agents work to optimize. In general, this means having each agent use a utility function that differs from the satisfaction function of its associated user [11]. In addition to ensuring that the private utility functions of the agents do not induce them to work at cross-purposes though, it is also necessary that those functions can be readily optimized in an adaptive manner, despite high noise levels in the environment.

The field of mechanism design, originating in game theory and economics, appears to address this problem. However it suffers from restrictions that diminish its suitability for problems - like the ones considered here - involving bounded-rational, non-human, noisy agents, where the variables controlled by the agents and even the number of agents can be varied [15]. In contrast, the Collective INtelligence (COIN) framework is explicitly formulated to address such problems without such restrictions [16, 11, 14]. In particular, its mathematics derives private utility functions for the individual learning agents that are learnable, in addition to inducing the agents not to work at cross-purposes. Such functions are designed both so that the agents can perform well at maximizing them, and so that as they do this the agents also "unintentionally" improve the provided "world utility" function rating the behavior of the entire system.

In the past, COIN techniques have been used for world utility functions that are fully exogenous, in that they do not reflect any satisfaction functions of a set of users of the system nor how best to accommodate those satisfaction functions. While achieving far superior performance in the domains tested to date than do conventional approaches like greedy agents and team games [10, 16, 13], these tests do not assess how well COIN-based systems perform in situations like shared resource problems, in which the world utility is not fully exogenous. In this paper we start by reviewing the mathematics of collective intelligence. We then test the techniques recommended by that mathematics on shared resource problems, using a world utility function that reflects the satisfaction levels of the individual users of the system, and in particular trades off the concern that no individual's satisfaction be too low against the concern that the overall system behave efficiently. We demonstrate that in such domains COIN techniques also far outperform approaches like team games. We also validate some quantitative predictions of the COIN mathematics concerning how the world utility is affected by various modifications to the COIN techniques.

2. BACKGROUND ON COLLECTIVE INTELLIGENCE
The "COllective INtelligence" framework (COIN) concerns the design of large distributed collectives of self-interested agents where there exists a world utility function measuring the performance of the entire collective. The emphasis of the research is on the inverse problem: given a set of self-interested agents and a world utility function, how should a designer configure the system, and in particular set the private utility functions of the agents, so that when all agents try to optimize their own functions the entire collective performs well? In this section we introduce the mathematics of designing collectives. Because of space limitations, the concepts cannot be justified or elaborated in any detail; references to other papers with such details are provided.
2.1 Central Equation
Let C be an arbitrary space whose elements z give the joint move of all agents in the collective system. We wish to search for the z that maximizes the provided world utility function G(z). In addition to G we are concerned with private utility functions {g_η}, one such function for each agent η controlling z_η. We use the notation ^η to refer to all agents other than η.

We need a way to "calibrate" the utility functions so that the value assigned to a joint action z reflects the ranking of z relative to all the other possible joint actions in C. We denote as intelligence the result of this calibration process. The intelligence can be considered to be the percentile of states that would have resulted in a worse utility (if 99% of the available actions lead to a worse utility, the agent made a smart decision). In the COIN framework, the "intelligence for η at z with respect to U" is defined by:

    N_{\eta,U}(z) \equiv \int d\mu_{z_{\hat\eta}}(z') \, \Theta[U(z) - U(z')]                                (1)

where \Theta is the Heaviside function (defined to equal 1 when its argument is greater than or equal to 0, and 0 otherwise), and where the subscript on the (normalized) measure d\mu indicates it is restricted to z' sharing the same non-η components as z. There is no particular constraint on the measure \mu other than reflecting the type of system (whether C is countable or not and, if not, what coordinate system is being used).

As system designer, our uncertainty concerning the state of the system is reflected in a probability distribution P over C. Our ability to control the system consists of setting the value of some characteristic of the collective, which is denoted as its design coordinate. For instance, the design coordinate can be the setting of the private utility functions of the agents. For a value s of the design coordinate, our analysis revolves around the following central equation for P(G | s), which follows from Bayes' theorem:

    P(G \mid s) = \int d\vec{N}_G \, d\vec{N}_g \;
        P(G \mid \vec{N}_G, s) \;
        \underbrace{P(\vec{N}_G \mid \vec{N}_g, s)}_{\mathrm{factoredness}} \;
        \underbrace{P(\vec{N}_g \mid s)}_{\mathrm{learnability\ (explore\ vs.\ exploit)}}                    (2)

where N_g and N_G are the intelligence vectors of the agents with respect to the private utility g_η of each agent η and the world utility G, respectively.

Note that N_{η,g_η}(z) = 1 means that agent η is fully rational at z, in that its move maximizes the value of its utility, given the moves of the other agents. In other words, a point z where N_{η,g_η}(z) = 1 for all agents η is one that meets the definition of a game-theory Nash equilibrium. On the other hand, a z at which all components of N_G equal 1 is a maximum of G along all coordinates of z. So if we can get these two points to be identical, then if the agents do well enough at maximizing their private utilities we are assured we will be near an axis-maximizing point for G.

To formalize this, consider our decomposition of P(G | s):

learnability: if we can choose the design coordinate s so that the third conditional probability in the integrand is peaked around vectors N_g all of whose components are close to 1 (that is, the agents are able to learn their tasks), then we have likely induced large (private utility function) intelligences. Intuitively, this ensures that the private utility functions have high "signal-to-noise".

factoredness: by choosing s, if we can also have the second term be peaked about N_G equal to N_g (that is, the private utilities and the world utility are aligned), then N_G will also be large. It is in the second term that the requirement that the private utility functions be "aligned with G" arises. Note that our desired form for the second term in Equation 2 is assured if we have chosen private utilities such that N_g equals N_G exactly for all z. Such a system is said to be factored.

Finally, if the first term in the integrand is peaked about high G when N_G is large, then our choice of s will likely result in high G, as desired.
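To make the intelligence measure of Equation 1 concrete, here is a minimal sketch (our illustration, not code from the paper), assuming a finite move set and a uniform measure \mu; the function name and the `utility(joint_move)` callable are illustrative assumptions:

```python
from typing import Callable, Dict, Hashable, Iterable

JointMove = Dict[str, Hashable]

def intelligence(utility: Callable[[JointMove], float],
                 joint_move: JointMove,
                 agent: str,
                 possible_moves: Iterable[Hashable]) -> float:
    """Fraction of the agent's possible moves z' that do no better than its
    actual move, holding everyone else's moves fixed (Heaviside convention:
    ties count as 'no better').  This is Equation 1 with a uniform measure
    over a finite move set."""
    baseline = utility(joint_move)
    moves = list(possible_moves)
    no_better = 0
    for alt in moves:
        counterfactual = dict(joint_move)
        counterfactual[agent] = alt          # vary only this agent's move
        if utility(counterfactual) <= baseline:
            no_better += 1
    return no_better / len(moves)
```

Under this convention an agent whose current move is the best available gets intelligence 1, matching the Nash-equilibrium reading of N_{η,g_η}(z) = 1 above.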
On the other hand, unless one is careful, each agent will have a hard time discerning the effect of its behavior on its private utility when the system is large. Typically, each agent has a horrible "signal-to-noise" problem in such a situation. This will result in a poor third term in the central equation. For instance, consider a collective provided by the human economy. A team game in that example would mean that every citizen gets the national GDP as her/his reward signal, and tries to discern how best to act to maximize that reward signal. At the risk of understatement, this would provide the individual members of the economy with a difficult reinforcement learning task.

In this paper, we concentrate on the second and third terms, and show in the next section how to simultaneously set them to have the desired form. Hence, the research problem is to choose s so that the system both has good learnabilities for its agents, and is factored.

Mechanism design might, at first glance, appear to provide us techniques for solving this version of the inverse problem. However, while it can be viewed as addressing the second term, the issue of learnability is not studied in mechanism design. Rather, mechanism design is almost exclusively concerned with collectives that are at (a suitable refinement of) an exact Nash equilibrium [6]. That means that every agent is assumed to be performing as well as is theoretically possible, given the behavior of the rest of the system. In setting private utilities and the like on this basis, mechanism design completely ignores the issue of how to design the system so that each of the agents can achieve a good value of its private utility (given the behavior of the rest of the system). In particular, it ignores all statistical issues related to how well the agents can be expected to perform for various candidate private utilities.
Such issues become crucial as one moves to large systems, where each agent is implicitly confronted with a very high-dimensional reinforcement learning task. It is its ignoring of this issue that makes mechanism design scale poorly to large problems. In contrast, the COIN approach is precisely designed to address learnability issues as well as term 2.

2.2 Wonderful Life Utility and Aristocrat Utility
As an example of the foregoing, any "team game", in which every private utility function equals G, is factored. However, as illustrated above with the human economy example, team games often have very poor forms for term 3 in Equation 2, forms which get progressively worse as the size of the collective grows. This is because for such private utility functions each agent η will usually confront a very poor "signal-to-noise" ratio in trying to discern how its actions affect its utility g_η = G, since so many other agents' actions also affect G and therefore dilute η's effect on its own private utility function.

We now focus on algorithms based on private utility functions {g_η} that optimize the signal/noise ratio reflected in the third term of the central equation, subject to the requirement that the system be factored. We will introduce two utility functions satisfying these criteria, namely the Wonderful Life Utility (WLU) and the Aristocrat Utility (AU).

To understand how these algorithms work, say we are given an arbitrary function f(z_η) over agent η's moves, two such moves z_η^1 and z_η^2, a utility U, a value s of the design coordinate, and a move by all agents other than η, z_{^η}. Define the associated learnability by

    \Lambda_f(U; z_{\hat\eta}, s, z_\eta^1, z_\eta^2) =
        \frac{\bigl| E[U; z_{\hat\eta}, z_\eta^1] - E[U; z_{\hat\eta}, z_\eta^2] \bigr|}
             {\sqrt{ E[\, \mathrm{Var}(U; z_{\hat\eta}, z_\eta) \,] }}                                       (3)

The expectation values in the numerator are formed by averaging over the training set of the learning algorithm used by agent η, n_η. Those two averages are evaluated according to the two distributions P(U | n_η) P(n_η | z_{^η}, z_η^1) and P(U | n_η) P(n_η | z_{^η}, z_η^2), respectively. (That is the meaning of the semicolon notation.) Similarly, the variance being averaged in the denominator is over n_η according to the distribution P(U | n_η) P(n_η | z_{^η}, z_η).

The denominator in Equation 3 reflects how sensitive U(z) is to changing z_{^η}. In contrast, the numerator reflects how sensitive U(z) is to changing z_η. So the greater the learnability of a private utility function g_η, the more g_η(z) depends only on the move of agent η, i.e., the better the associated signal-to-noise ratio for η. Intuitively then, so long as it does not come at the expense of decreasing the signal, increasing the signal-to-noise ratio specified in the learnability will make it easier for η to achieve a large value of its intelligence. This can be established formally: if appropriately scaled, g'_η will result in better expected intelligence for agent η than will g_η whenever \Lambda_f(g'_η; z_{^η}, s, z_η^1, z_η^2) > \Lambda_f(g_η; z_{^η}, s, z_η^1, z_η^2) for all pairs of moves z_η^1, z_η^2 [14].

One can solve for the set of all private utilities that are factored with respect to a particular world utility. Unfortunately though, in general a collective cannot both be factored and have infinite learnability for all of its agents [14]. However, consider difference utilities, of the form

    U(z) = \beta \, [\, G(z) - \Gamma(z_{\hat\eta}) \,]                                                      (4)

Any difference utility is factored [14]. In addition, for all pairs z_η^1, z_η^2, under benign approximations, the difference utility maximizing \Lambda_f(U; z_{^η}, s, z_η^1, z_η^2) is found by choosing

    \Gamma(z_{\hat\eta}) = E[\, G(z) \mid z_{\hat\eta}, s \,]                                                (5)

up to an overall additive constant, where the expectation value is over z_η. We call the resultant difference utility the Aristocrat utility (AU), loosely reflecting the fact that it measures the difference between an agent's actual action and the average action. If each agent η uses an appropriately rescaled version of the associated AU as its private utility function, then we have ensured good form for both terms 2 and 3 in Equation 2.

Using AU in practice is sometimes difficult, due to the need to evaluate the expectation value. Fortunately there are other utility functions that, while being easier to evaluate than AU, are still both factored and possess superior learnability to the team game utility g_η = G. One such private utility function is the Wonderful Life Utility (WLU). The WLU for agent η is parameterized by a pre-fixed clamping parameter CL_η chosen from among η's possible moves:

    WLU_\eta(z) = G(z) - G(z_{\hat\eta}, CL_\eta)                                                            (6)

WLU is factored no matter what the choice of clamping parameter. Furthermore, while not matching the high learnability of AU, WLU usually has far better learnability than does a team game, and therefore (when appropriately scaled) results in better expected intelligence [11, 16, 14].
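The following sketch (our illustration, not code from the paper) contrasts the team game, WLU, and AU payoffs for one agent, given some world utility G over joint moves. The function names, the clamping value, and the use of a uniform average over the agent's move set as a stand-in for the expectation in Equation 5 are all assumptions of the example:

```python
from typing import Callable, Dict, Hashable, Iterable

JointMove = Dict[str, Hashable]
WorldUtility = Callable[[JointMove], float]

def team_game(G: WorldUtility, z: JointMove, agent: str) -> float:
    """Team game: every agent is simply paid the world utility.
    (The agent argument is kept only so the three functions share a signature.)"""
    return G(z)

def wonderful_life(G: WorldUtility, z: JointMove, agent: str,
                   clamp: Hashable) -> float:
    """WLU (Equation 6): world utility minus the world utility with the agent's
    move clamped to a fixed value (e.g. a 'null' move that sends no load)."""
    clamped = dict(z)
    clamped[agent] = clamp
    return G(z) - G(clamped)

def aristocrat(G: WorldUtility, z: JointMove, agent: str,
               possible_moves: Iterable[Hashable]) -> float:
    """AU (Equations 4-5): world utility minus its average over the agent's
    possible moves, with all other agents' moves held fixed."""
    moves = list(possible_moves)
    total = 0.0
    for alt in moves:
        counterfactual = dict(z)
        counterfactual[agent] = alt
        total += G(counterfactual)
    return G(z) - total / len(moves)
```

In both WLU and AU the subtracted term does not depend on the agent's own move, which is why these utilities remain factored while removing much of the noise contributed by the other agents.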
3. PROBLEM DEFINITION
Consider a set of users and a set of m shared resources/machines. Different users have different task loads, each task being of one of T types. Any machine can perform any task type, but different machines have different processing speeds for each task type. A given user sends all of its tasks of a particular type to a specific machine, but can send tasks of different types to different machines. So each user must decide, for each type j ∈ {1, ..., T}, to what machine to send all of its tasks of type j.

We consider a batch-mode scenario, in which every user submits all of its tasks, the machines complete their tasks, and the associated overall performance of the system is ascertained. Each machine works by grouping all the tasks sent to it by type, and performs all tasks of one type contiguously before returning them to the associated users and then switching to the next type. The order of processing for the different task types is fixed for any given machine. Each type is performed at a speed specific to the machine.

3.1 Satisfaction functions of users
User i has a personal "satisfaction" utility function H_i, which is an inverse function of the time that the user has to wait for all of his tasks to finish. The user may be less willing to wait for certain task types compared to others. The user can provide feedback or his model of satisfaction to the system manager.

We denote by CT_i^j the completion time of the tasks of type j submitted by user i to machine A_i^j. The functions defined below measure dissatisfaction rather than satisfaction, in that they are monotonically increasing functions of the delay. So the goal for agent i is to maximize H_i = -D_i, where D_i is defined as one of the following:

• the maximum delay for user i, D_i = max_{j ∈ {1..T}} CT_i^j;
• an importance-weighted combination of the completion times of all the tasks of the user, D_i = Σ_{j=1}^{T} a_j CT_i^j.

D = {D_i} is the set of the dissatisfactions of all users.

3.2 Definition of the World Utility Function
The world utility function measures the performance of the overall system. It is the responsibility of the system designer to set this function. He might decide to measure performance using only some characteristics of the use of the resources (e.g. the load distribution or the idle time). The system designer might also incorporate user preferences into the performance measure. In our simulations, we have focused on that aspect by defining the world utility function as a function of the dissatisfactions of all users. This assumes either that the users have provided a model of their preferences to the system designer or that the system designer has modeled them. In the former case, users may be allowed to update their model. We have experimented with the following world utilities:

• Avoid hurting any user: minimize G(D) = max_i D_i.
• Minimize a linear combination of the dissatisfactions of the individual users, where the weight associated with a user represents the importance of that particular user: G(D) = Σ_i w_i D_i.
• To have high average satisfaction without any user's satisfaction being low, minimize G(D) = β σ_D + Σ_i w_i D_i, σ_D being the variance of the dissatisfactions.
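As a concrete illustration of the batch model and these world utilities, here is a short sketch (ours, not from the paper); the data layout, parameter names, default β, and the use of the population variance as σ_D are assumptions of the example. The completion time CT_i^j is simply the time at which the chosen machine finishes the type-j block, with types processed contiguously in the machine's fixed order:

```python
import statistics
from typing import Dict, List, Tuple

def completion_times(assignment: Dict[Tuple[int, int], int],   # (user, type) -> machine
                     load: Dict[Tuple[int, int], float],        # (user, type) -> amount of work
                     speed: Dict[Tuple[int, int], float],       # (machine, type) -> work per unit time
                     type_order: Dict[int, List[int]]           # machine -> processing order of types
                     ) -> Dict[Tuple[int, int], float]:
    """CT[(user, type)]: when the machine chosen for that (user, type) finishes
    the whole block of that type, processing one type after another."""
    ct: Dict[Tuple[int, int], float] = {}
    for m in type_order:
        t = 0.0
        for task_type in type_order[m]:
            # total work of this type sent to machine m by all users
            work = sum(load[ut] for ut, mm in assignment.items()
                       if mm == m and ut[1] == task_type)
            t += work / speed[(m, task_type)]
            for ut, mm in assignment.items():
                if mm == m and ut[1] == task_type:
                    ct[ut] = t
    return ct

def dissatisfaction(ct: Dict[Tuple[int, int], float],
                    a: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """D_i = sum_j a_j * CT_i^j (importance-weighted completion times)."""
    d: Dict[int, float] = {}
    for (user, task_type), completion in ct.items():
        d[user] = d.get(user, 0.0) + a[(user, task_type)] * completion
    return d

def world_utilities(d: Dict[int, float], w: Dict[int, float], beta: float = 1.0):
    """The three candidate world utilities of Section 3.2 (all to be minimized)."""
    values = list(d.values())
    g_max = max(values)                                           # avoid hurting any user
    g_weighted = sum(w[i] * d[i] for i in d)                      # importance-weighted sum
    g_fair = beta * statistics.pvariance(values) + sum(values)    # sum plus spread penalty
    return g_max, g_weighted, g_fair
```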
3.3 Agent learning algorithms
Once the measure of dissatisfaction of each agent and the performance metric of the overall system have been defined, the remaining question is how to improve the performance. In the experiments, we want to answer the question: what is the function that each self-interested agent should improve? We refer to this function as the private utility of the agent. In other words, given the agents' satisfaction model and the performance metric of the system, we want to be able to tell the agents that the best way to improve the performance of the system (which includes their own preferences) is to optimize a certain private utility. If they decide to optimize other criteria, the performance of the system will not be as good as if they follow the recommendation.

We compared the performance ensuing from four private utility functions for the agents, the last two being those recommended by the COIN framework:

• a team game: each agent's utility equals G, i.e., by choosing its action, each agent is trying to optimize the performance of the overall system (the agent focuses only on the result of the team; it does not consider its own performance);
• a greedy game: each agent i's private utility is H_i (the agent focuses only on its own performance);
• AU: to compute AU for an agent i ∈ U, we need to evaluate G(i, j) for every machine j, where G(i, j) denotes the world utility when agent i sends its load to machine j while all other agents send their loads to the same machines they actually used. Thus, we need to re-evaluate the completion times of all machines m for the case where the other agents maintain their decisions while agent i sends its load to m. See [14] for a discussion of efficient evaluation of AU;
• WLU: to compute WLU for agent i, we chose to clamp the action of i to "null". Hence, WLU_i = G - G(i, 0). Thus we only need to evaluate what the completion time of the machine chosen by i would be if i did not send any load to that machine. See [14] for a discussion of efficient evaluation of WLU.

We had each agent use an extremely simple reinforcement learning algorithm in our experiments, so that performance more accurately reflects the quality of the agents' utility functions rather than the sophistication of the learning scheme. In this paper, each agent is "blind", knowing nothing about the environment in which it operates, and observing only the utility value it gets after an iteration of the batch process. The agents each used a modified version of a Boltzmann learning algorithm to map the set of such utility values to a probability distribution over the (finite) set of possible moves, a distribution that is then sampled to select the agent's move for the next iteration. Each agent tries to learn the estimated reward R_i corresponding to action i, and chooses action i with probability P_i = e^{-β R_i} / Σ_j e^{-β R_j}, where β is the inverse temperature. During round r, if action i is chosen, the estimated reward for action i is updated to R_i = Σ_{r' ∈ C_i} e^{-α (r - r')} R_{r'} / Σ_{r' ∈ C_i} e^{-α (r - r')}, where C_i denotes the set of rounds in which action i was chosen, and R_{r'} denotes the reward received at the end of round r'. The data aging parameter α models the importance of the choices made in the previous rounds.

In our modification, rather than have the expected utility value for each possible move of an agent be a uniform average of the utilities that arose in the past when it made that move, we exponentially discount moves made further into the past, to reflect the non-stationarity of the system.
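A minimal sketch of the learner just described (our reconstruction, not the authors' code): each agent keeps, per action, an exponentially aged average of past rewards and samples its next action from a Boltzmann distribution over those estimates, multiplying its temperature by a decay factor after every round. The class name, default parameter values, and the e^{-βR} sign convention (rewards treated as dissatisfaction-style costs, as written above) are assumptions:

```python
import math
import random
from typing import List, Tuple

class BoltzmannAgent:
    """Blind agent: it sees only its own reward after each batch round."""

    def __init__(self, n_actions: int, temperature: float = 10.0,
                 decay: float = 0.99, aging: float = 0.1):
        self.n_actions = n_actions
        self.temperature = temperature
        self.decay = decay        # temperature is multiplied by this each round
        self.aging = aging        # data-aging parameter (alpha)
        # history[a] = list of (round, reward) pairs observed for action a
        self.history: List[List[Tuple[int, float]]] = [[] for _ in range(n_actions)]
        self.round = 0

    def estimated_reward(self, action: int) -> float:
        """Exponentially discounted average of past rewards for this action."""
        pairs = self.history[action]
        if not pairs:
            return 0.0
        weights = [math.exp(-self.aging * (self.round - r)) for r, _ in pairs]
        return sum(w * rew for w, (_, rew) in zip(weights, pairs)) / sum(weights)

    def choose(self) -> int:
        """Sample an action with probability proportional to e^{-beta R}
        (lower estimated reward, i.e. lower dissatisfaction, is favoured)."""
        beta = 1.0 / self.temperature
        estimates = [self.estimated_reward(a) for a in range(self.n_actions)]
        lowest = min(estimates)
        # subtract the minimum before exponentiating for numerical stability
        weights = [math.exp(-beta * (r - lowest)) for r in estimates]
        return random.choices(range(self.n_actions), weights=weights)[0]

    def update(self, action: int, reward: float) -> None:
        """Record the reward for the chosen action and cool the temperature."""
        self.history[action].append((self.round, reward))
        self.round += 1
        self.temperature *= self.decay
```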
4. EXPERIMENTS
We experiment with a domain where users send different types of printing jobs to a shared pool of heterogeneous printers. The different types of printing jobs may represent different kinds of printing operations (for instance, printing in black & white or printing in color). Though each printer is able to process any job type, a given printer processes jobs of different types at different speeds. For each printer, we randomly choose the speed of processing each task type, and processing occurs in order of decreasing speed.

Each user has a load of printing jobs to perform. We assign one agent to handle all tasks of a given type for a given user, and its job is to select a printer to which these tasks will be sent. Each run starts with an initial sequence of iterations in which the agents make purely random moves, to generate data for the learning algorithms. We then successively "turn on" more and more of those algorithms, i.e., at each successive iteration a few more agents start to make their moves based on their learning algorithm and associated data rather than randomly. We start each agent with a high Boltzmann temperature, usually 10, and decrease it at each iteration by multiplying it by a decay factor. We refer to a temperature schedule as slow when the decay factor is large (for instance 99%), and as fast when the decay factor is small (for instance 70%). We use an aging parameter of 0.1 in forming expectation values.

In the experiments reported here, the dissatisfaction of a given user is D_i = Σ_j a_j CT_i^j, where the a_j are randomly chosen and remain fixed throughout the learning process. We used 10 users, 8 types and 6 machines.
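A rough sketch of this run protocol (our illustration; the helper names, the warm-up length, the number of agents activated per iteration, and the assumption that each agent exposes the choose/update/n_actions interface of the BoltzmannAgent sketch above are all illustrative, not the authors' implementation):

```python
import random
from typing import Callable, List, Sequence

def run(agents: Sequence,                        # objects with choose(), update(action, reward), n_actions
        private_rewards: Callable[[List[int]], List[float]],  # joint move -> one reward per agent
        n_iterations: int = 600,
        warmup: int = 50,
        activate_per_iteration: int = 2) -> None:
    """Batch-mode protocol: random warm-up, then agents are switched over to
    their learners a few at a time; temperature decay happens inside update()."""
    n_active = 0
    for it in range(n_iterations):
        if it >= warmup:
            n_active = min(len(agents), n_active + activate_per_iteration)
        joint_move = []
        for idx, agent in enumerate(agents):
            if idx < n_active:
                joint_move.append(agent.choose())                       # learned move
            else:
                joint_move.append(random.randrange(agent.n_actions))    # random move
        # e.g. team game, greedy, WLU or AU rewards computed from the world utility
        rewards = private_rewards(joint_move)
        for agent, action, reward in zip(agents, joint_move, rewards):
            agent.update(action, reward)
```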
4.1 Results
We first considered the case where the world utility function G is computed as the maximum of the dissatisfactions of the users, i.e., G = max_i D_i. The goal is to minimize the maximum dissatisfaction among the users. The decay factor used is 0.99. In Fig. 1 we plot the performance of the different agent utility functions averaged over 10 runs. The performances achieved by AU and WLU are similar, and dominate those of both the Team Game and Greedy agent utilities. We found that at the end of the learning process, the standard deviation in G for AU and WLU is small (respectively 2.1 and 1.08), whereas it is large for Greedy and Team Game (respectively 14.4 and 24). This corroborates the reasoning that the signal-to-noise ratio is too poor when we use Greedy or Team Game, and that therefore the learning agents are not able to determine the impact of their actions on their utility functions. In contrast, as discussed above, AU and WLU are designed to have good signal to noise, which enables them to reach better (and more consistent) performance levels.

Figure 1: Comparison between the different agent utilities when G = max_i D_i (temperature decay of 99%).

Note that with the greedy approach G worsens as each agent learns how to maximize its utility. This is an example of a tragedy of the commons: since the Greedy utility is not factored with respect to G, the associated Nash equilibrium - achieved as the agents learn more - has poor G.

In Fig. 2, G is instead the weighted sum of the dissatisfactions of the users, i.e., G = Σ_i w_i D_i. In Fig. 3 it is the sum of all the users' dissatisfactions plus the standard deviation of those dissatisfactions, i.e., G = β σ_D + Σ_i D_i (we have used β = 1). The decay factor used in these experiments is 0.99. In all cases, AU and WLU achieve comparable performance, and outperform the team game.

Figure 2: Comparison between the different rewards in the case where G = Σ_i w_i D_i.

Figure 3: Performances for G = σ_D + Σ_i D_i.

Although the asymptotic performance of the team game is not as good as that of AU or WLU, in both of these experiments the team game performs better than AU or WLU in the early phase of the learning. In addition, the convergence in time of G for AU and WLU is slower in these experiments. This is consistent with COIN theory, which says that simply changing to a more learnable utility function may actually lead to worse performance if the learning temperature is not changed simultaneously. Intuitively, to achieve their better signal-to-noise ratios, AU and WLU shrink both the signal and the noise, and therefore need to be rescaled upward. In the context of Boltzmann learning algorithms, this is equivalent to lowering their temperatures.

To investigate this issue, in a second set of experiments we used different decay factors and faster schedules. In Fig. 4 we present the asymptotic world utility value for Team Game, AU, and WLU for different decay factors and starting temperatures. In the experiments plotted we used G = Σ_i w_i D_i, but we observed the same phenomena with the other G's discussed above: the team game never outperforms AU or WLU. Moreover, if for each time-algorithm pair we use the schedule that is optimal for that pair, then there is no time t at which the performance of the team game is better than the performance of AU/WLU. Fig. 5 illustrates this, showing that with a fast schedule having a decay factor of 0.65, AU and WLU converge faster than the team game.

Figure 4: Different decay factors (G = Σ_i w_i D_i).

Figure 5: Performance for G = Σ_i w_i D_i with a fast temperature schedule (decay factor 0.65).
In addition to these advantages, we found that the performance of the team game is more sensitive to the change in temperature schedule, again in accord with COIN theory. Indeed, although convergence to asymptotia is faster for all agent utilities with the faster schedule (see Fig. 5), from Fig. 4 we see that both the average asymptotic performance and its standard deviation are substantially worse for the team game under the faster schedule. In contrast, AU and WLU do not suffer as much from the switch to a faster schedule.

We also investigated the scaling properties of these approaches as one increases the size of the system, and found that AU and WLU scale up much better than the team game approach, again in accord with COIN theory. Intuitively, the larger the system, the worse the signal-to-noise problems with utilities like team games, and therefore the greater the gain in using a utility like AU or WLU that corrects for it. In Fig. 6, we used a fixed number of types, 8, and a fixed ratio of 1 machine for 5 users. We increased the number of users from 5 to 25. For each data point, we considered a few scenarios (different machine configurations, different loads distributed to users, etc.) and performed 10 runs for each scenario. Each point represents the average over the scenarios. The use of AU and WLU yields comparable results, and the difference in performance between them and the Team Game increases with the number of agents.

Figure 6: Scaling properties for G = Σ_i D_i (problems of different sizes: 8 types, 1 machine for 5 users).

Since in general one cannot compute the best possible G value, we cannot make absolute performance claims. Nonetheless, both in terms of performance and scaling, use of WLU or AU is clearly preferable to a team game or greedy approach, for several different choices of G.

5. RELATED WORK
Multiagent system researchers have studied the balancing of load across shared resources with different decision-making procedures. These approaches include the following:

• studying the chaotic nature of resource loads when each agent uses a greedy selection procedure [5];
• the effect of limited knowledge on system stability [7, 9];
• social dilemmas that arise when agents try to greedily exploit shared resources [3, 11];
• using market mechanisms for optimizing quality of service [17];
• using reinforcement learning for adaptive load balancing [8];
• distinguishing easy vs. difficult resource allocation problems [12].
The typical assumption in most of this work is that each user has an atomic load to send to one of several equivalent resources. Our work addresses a more general scenario where a user has multiple task loads and resources are heterogeneous. So not only may the decisions of the users conflict, but decisions for different tasks taken by the same user can interfere with each other. Also, we explicitly handle the issue of user satisfaction metrics (something awkward to incorporate in load-balancing, for example) and variability in the world utility, and use the COIN procedure to ensure that the combination of individual user satisfaction metrics is optimized. The combination can include weights for different users and also, if necessary, reduce disparate levels of satisfaction in the outcome by minimizing the standard deviation of satisfaction. With the COIN procedure, all of this is done using very simple agents, in a highly parallelizable fashion, with no modeling of the underlying dynamics of the system.

6. DISCUSSION
The COIN framework has been applied to many domains, including domains requiring sequences of actions. In these cases, the world utility function was decided first, and then the local utility function of each agent was derived from it. This paper differs in two main ways from the COIN applications studied so far. First, agents have preferences represented by a dissatisfaction function; in previous work, the individuals in the system did not have any intrinsic preferences. Secondly, the world utility is based upon the preferences of the users and of a system administrator. The main contribution of the paper is to show that the users, in order to achieve satisficing performance of the overall system, are better off optimizing COIN-based private utilities. The results demonstrate that even in the case where the world utility function is not exogenous, the previously observed superiority of COIN techniques holds. In particular, the utilities AU and WLU outperform locally greedy approaches and, more importantly, the Team Game approach, in terms of average performance, variability in performance, and robustness against changes to the parameters of the agents' learning algorithm. Moreover, these improvements grow dramatically as the system size is increased, an extremely important consideration for future applications.

We are currently experimenting with aggregating several printing decisions under the jurisdiction of one agent. This would reduce the number of agents in the system, and possibly both speed up the convergence of the performance and improve it. In particular, we will investigate different kinds of aggregations, e.g., based on users, based on task types, crossing both, etc. There is also the interesting possibility of learning the best aggregation of decisions into individual agents.

7. REFERENCES
[1] R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017-1023. MIT Press, 1996.
[2] E. H. Durfee. Scaling up coordination strategies. IEEE Computer, 34(7):39-46, July 2001.
[3] N. S. Glance and T. Hogg. Dilemmas in computational societies. In First International Conference on Multiagent Systems, pages 117-124, Menlo Park, CA, 1995. AAAI Press/MIT Press.
[4] G. Hardin. The tragedy of the commons. Science, 162:1243-1248, 1968.
[5] J. O. Kephart, T. Hogg, and B. A. Huberman. Dynamics of computational ecosystems: Implications for DAI. In M. N. Huhns and L. Gasser, editors, Distributed Artificial Intelligence, volume 2 of Research Notes in Artificial Intelligence. Pitman, 1989.
[6] D. C. Parkes. Iterative combinatorial auctions: Theory and practice, 2001.
[7] S. K. Rustogi and M. P. Singh. Be patient and tolerate imprecision: How autonomous agents can coordinate effectively. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 512-517, 1999.
[8] A. Schaerf, Y. Shoham, and M. Tennenholtz. Adaptive load balancing: A study in multiagent learning. Journal of Artificial Intelligence Research, 2:475-500, 1995.
[9] S. Sen, N. Arora, and S. Roychowdhury. Using limited information to enhance group stability. International Journal of Human-Computer Studies, 48:69-82, 1998.
[10] K. Tumer, A. K. Agogino, and D. H. Wolpert. Learning sequences of actions in collectives of autonomous agents. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, pages 378-385, New York, NY, 2002. ACM Press.
[11] K. Tumer and D. H. Wolpert. Collective intelligence and Braess' paradox. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 104-109, Menlo Park, CA, 2000. AAAI Press.
[12] H. Van Dyke Parunak, S. Brueckner, J. Sauter, and R. Savit. Effort profiles in multi-agent resource allocation. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, pages 248-255, New York, NY, 2002. ACM Press.
[13] D. H. Wolpert, S. Kirshner, C. J. Merz, and K. Tumer. Adaptivity in agent-based routing for data networks. In Proceedings of the 4th International Conference on Autonomous Agents, pages 396-403, 2000.
[14] D. H. Wolpert and K. Tumer. Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2/3):265-279, 2001.
[15] D. H. Wolpert and K. Tumer. Beyond mechanism design. In H. G. et al., editor, Proceedings of the International Congress of Mathematicians. Qingdao Publishing, 2002.
[16] D. H. Wolpert, K. R. Wheeler, and K. Tumer. General principles of learning-based multi-agent systems. In Proceedings of the Third International Conference on Autonomous Agents, pages 77-83, New York, NY, 1999. ACM Press.
[17] H. Yamaki, Y. Yamauchi, and T. Ishida. Implementation issues on market-based QoS control. In Proceedings of the Third International Conference on Multi-Agent Systems, pages 357-364, 1998.
