Altruism may arise from individual selection

Angel Sánchez¹ and José A. Cuesta

Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain

arXiv:q-bio/0403023v3 [q-bio.PE] 12 Jan 2005

Abstract

The fact that humans cooperate with non-kin in large groups, or with people they will never meet again, is a long-standing evolutionary puzzle. Altruism, the capacity to perform costly acts that confer benefits on others, is at the core of cooperative behavior. Behavioral experiments show that humans have a predisposition to cooperate with others and to punish non-cooperators at personal cost (so-called strong reciprocity) which, according to standard evolutionary game arguments, cannot arise from selection acting on individuals. This has led to the suggestion of group and cultural selection as the only mechanisms that can explain the evolutionary origin of human altruism. We introduce an agent-based model inspired by the Ultimatum Game that allows us to go beyond the limitations of standard evolutionary game theory and show that individual selection can indeed give rise to strong reciprocity. Our results are consistent with the existence of neural correlates of fairness and in good agreement with observations on humans and monkeys.

Keywords: Strong reciprocity, Individual selection, Evolutionary theories, Behavioral evolution, Evolutionary game theory

¹ Corresponding author. Phone: +34-916249411. Fax: [email protected]

1 Introduction

Ever since Darwin first faced this problem (Darwin, 1871; Gould, 2002), the emergence of human cooperation has been a subject of intense debate within the framework of evolutionary theories. Cooperation has been linked to altruism, which can be defined as the capacity to perform costly acts that confer benefits on others (Fehr and Fischbacher, 2003).
Previous theoretical approaches to altruism have shown that in many instances altruistic behavior is not truly so, insofar as it yields benefits for the altruist in the future. This is the case when the recipients of the altruistic act are relatives, a situation well understood within kin selection theory (Hamilton, 1964). Altruism in the absence of kin relationships has also been explained in terms of repeated interaction leading to cooperation (Axelrod and Hamilton, 1981; Trivers, 1971), indirect benefit through reputation gains (Leimar and Hammerstein, 2001; Milinski et al., 2002; Nowak and Sigmund, 1998) or costly signalling theories (Gintis et al., 2001). However, recent behavioral experiments show that humans can perform altruistic acts when interactions are anonymous and one-shot, i.e., in conditions which exclude all the explanations proposed so far (Fehr et al., 2002; Fehr and Gächter, 2002; Fehr and Rockenbach, 2003; Henrich et al., 2001). Indeed, it has been observed that individuals are ready to punish non-cooperators (altruistic punishment) as well as to reward cooperative behavior (altruistic rewarding) even when doing so will not produce any benefit for the punisher or rewarder. This set of behaviors has been termed strong reciprocity (Fehr et al., 2002; Gintis, 2000) and, as such, it has been proposed as a schema for understanding altruism in humans (Fehr and Fischbacher, 2003; Gintis et al., 2003).

Substantial evidence in favor of the existence of strong reciprocity comes from experiments using the so-called Ultimatum Game (Güth et al., 1982), and from agent-based models (Bowles et al., 2003b; Bowles and Gintis, 2004; Boyd et al., 2003) [see (Fehr and Fischbacher, 2003; Gintis et al., 2003) for summaries]. In the Ultimatum Game, under conditions of anonymity, two players are shown a sum of money, say 100 €. One of the players, the “proposer”, is instructed to offer any amount, from 1 € to 100 €, to the other, the “responder”. The proposer can make only one offer, which the responder can accept or reject.
If the offer is accepted, the money is shared accordingly; if rejected, both players receive nothing. Since the game is played only once (no repeated interactions) and anonymously (no reputation gain), a self-interested responder will accept any amount of money offered. Therefore, self-interested proposers will offer the minimum possible amount, 1 €, which will be accepted. To be sure, this is a backward-induction way of reasoning, which leads to the conclusion that the subgame-perfect Nash equilibrium is the relevant one. However, the Ultimatum Game has many Nash equilibria, which can play a role in the results we report below (see, e.g., Samuelson, 1997, or Gintis, 2000, for complete game-theoretical discussions on this issue). We will come back to this question in Sec. 6. Nevertheless, in actual Ultimatum Game experiments with human subjects, average offers do not even approximate the self-interested prediction. Generally speaking, proposers offer respondents very substantial amounts (50% being a typical modal offer) and respondents frequently reject offers below 30%. Most of the experiments have been carried out with university students in western countries, showing a large degree of individual variability but a striking uniformity between groups in average behavior. A recent experiment (Güth et al., 2003) used newspaper readers in order to have a population with broader characteristics and background, finding qualitatively similar results. Interestingly, a large study in 15 small-scale societies (Henrich et al., 2001) found that, in all cases, respondents or proposers behave in a reciprocal manner. Furthermore, the behavioral variability across groups was much larger than previously observed: while mean offers in the case of university students are in the range 43%-48%, in the cross-cultural study they ranged from 26% to 58%.
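The rules of a single Ultimatum Game described above can be written down in a few lines. The following is only a minimal sketch: the function name `ultimatum_payoffs` and its argument names are illustrative choices, and the responder is assumed to act on a fixed acceptance threshold, which is how the model introduced later represents her.

```python
def ultimatum_payoffs(total, offer, threshold):
    """Return (proposer_payoff, responder_payoff) for one anonymous, one-shot game.

    total     -- sum of money at stake (e.g. 100)
    offer     -- amount the proposer offers, between 1 and total
    threshold -- smallest offer the responder accepts (illustrative assumption)
    """
    if not 1 <= offer <= total:
        raise ValueError("offer must lie between 1 and total")
    if offer >= threshold:
        # Offer accepted: the money is shared accordingly.
        return total - offer, offer
    # Offer rejected: both players receive nothing.
    return 0, 0
```

With this sketch, a self-interested responder corresponds to `threshold = 1`, so the minimum offer of 1 is accepted, whereas a responder who rejects offers below 30 leaves both players with nothing when offered less.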
The fact that indirect reciprocity is excluded by the anonymity condition and that interactions are one-shot allows one to interpret rejections in terms of strong reciprocity (Fehr et al., 2002; Gintis, 2000). This amounts to considering that these behaviors are truly altruistic, i.e., that they are costly for the individual performing them insofar as they do not result in direct or indirect benefit. As a consequence, we immediately face an evolutionary puzzle: the negative effects of altruistic acts must decrease the altruist's fitness as compared to that of the recipients of the benefit, ultimately leading to the extinction of altruists. Indeed, standard evolutionary game theory arguments applied to the Ultimatum Game lead to the expectation that in a mixed population, punishers (individuals who reject low offers) have less chance to survive than rational players (individuals who accept any offer) and eventually disappear (Page and Nowak 2000, 2002). Although much attention has been devoted to this issue by researchers in different aspects of evolutionary theory, the problem is still far from understood (Bowles et al., 2003a; Hammerstein, 2003; Vogel, 2004). To date, the only way out of this dilemma seems, following the original suggestion of Darwin (Darwin, 1871), to invoke group and cultural selection to compensate for the negative effects that reciprocity is assumed to have on individuals (Bowles et al., 2003b; Boyd et al., 2003; Hammerstein, 2003).

2 One parameter model

In order to assess the possible evolutionary origins of these behaviors, we introduce and analyze here a drastically simplified model. Imagine a population of N players of the Ultimatum Game with a fixed sum of money M per game. Random pairs of players are chosen, of which one is the proposer and the other is the respondent. In its simplest version, we will assume that players are capable of other-regarding behavior (empathy); consequently, in order to optimize their gain, proposers offer the minimum amount of money that they would accept. Every agent has her own, fixed acceptance threshold, 1 ≤ t_i ≤ M (t_i are always integer numbers for simplicity).
Agents have only one strategy: respondents reject any offer smaller than their own acceptance threshold, and accept offers otherwise. Although we believe that this is the way in which ‘empathic’ agents will behave, in order not to hinder other strategies a priori, we have also considered the possibility that agents have two independent acceptance and offer thresholds. As we will see below, this does not change our main results and conclusions. Money shared as a consequence of accepted offers accumulates to the capital of each of the involved players. As our main aim is to study selection acting on modified descendants, hereafter we interpret this capital as ‘fitness’ (here used in a loose, Darwinian sense, not in the more restrictive one of reproductive rate). After s games, the agent with the overall minimum fitness is removed (randomly picked if there are several) and a new agent is introduced by duplicating that with the maximum fitness, i.e., with the same threshold and the same fitness (again randomly picked if there are several). Mutation is introduced in the duplication process by allowing changes of ±1 in the acceptance threshold of the newly generated player with probability 1/3 each. Agents have no memory (i.e., interactions are one-shot) and no information about other agents (i.e., no reputation gains are possible).

Two remarks about our model are in order before proceeding any further. First, we need to clarify the motivation for our choice of simple, memoryless agents. It is likely that in early human societies some degree of repeated interaction and reputation effects was present, factors that we have excluded from our model.
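The rules of the one-parameter model listed above can be condensed into a short simulation. The code below is a minimal sketch under stated assumptions, not the program used for the figures: the function name `simulate`, its arguments, the default population size and number of games, the clamping of mutated thresholds to the interval [1, M], and the choice of the fittest agent before the least fit one is overwritten are all illustrative choices that the text does not fix.

```python
import random


def simulate(n_agents=100, money=100, s=1, n_games=10_000,
             init_threshold=1, seed=0):
    """Sketch of the one-parameter model; returns (thresholds, fitness)."""
    rng = random.Random(seed)
    thresholds = [init_threshold] * n_agents  # acceptance thresholds t_i
    fitness = [0] * n_agents                  # accumulated capital

    for game in range(1, n_games + 1):
        # A random pair plays: first index is the proposer, second the respondent.
        proposer, responder = rng.sample(range(n_agents), 2)
        offer = thresholds[proposer]          # empathic agents offer their own threshold
        if offer >= thresholds[responder]:    # accepted: money is shared
            fitness[proposer] += money - offer
            fitness[responder] += offer

        if game % s == 0:                     # selection event every s games
            lowest, highest = min(fitness), max(fitness)
            # Ties are broken at random, as in the model description.
            worst = rng.choice([i for i, f in enumerate(fitness) if f == lowest])
            best = rng.choice([i for i, f in enumerate(fitness) if f == highest])
            # Mutation: -1, 0 or +1, each with probability 1/3.
            child = thresholds[best] + rng.choice((-1, 0, 1))
            # Assumption: thresholds are clamped so that 1 <= t_i <= M.
            thresholds[worst] = min(max(child, 1), money)
            fitness[worst] = fitness[best]    # the copy inherits the parent's fitness

    return thresholds, fitness
```

A call such as `thresholds, fitness = simulate(n_agents=1000, s=1, n_games=1_000_000)` starts from a purely self-interested population (all t_i = 1); a histogram of the returned `thresholds` can then be compared qualitatively with the distributions discussed in the text.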
In this respect, we stress that what we are actually trying to show is that the behavior observed in the experiments quoted above (Fehr and Fischbacher, 2003) can arise by individual selection in the absence of precisely those two ingredients, repeated interactions and reputation. In other words, the existence of repeated interactions and reputation is not a necessary condition for the selection of altruistic-like behaviors at the individual level. In that case, actual circumstances of human evolution would reinforce the tendency to the appearance of altruism. The fact that similar results are found in Ultimatum Game experiments in a wide range of small-scale societies (Henrich et al., 2001) suggests that our conclusions will have to be kept in mind when dealing with early human behavior, as the relevance of these two influences is largely different in the studied societies. Second, we want to stress that our mutation rate, which we choose somewhat large to enhance the fluctuation effects (see related comments in Sec. 5 below), should not be understood from the genetic viewpoint, but rather from the phenotypical viewpoint. Indeed, the inheritance of an acceptance threshold like the one we are proposing may perfectly well also be affected by cultural transmission, and it is therefore subject to a large individual variability. Observations reported in the literature (Fehr and Fischbacher, 2003; Gintis et al., 2003) support this great variability. On the other hand, it has to be borne in mind that even if the mutation rate may seem large, mutations are small, with relative changes of the order of 1/100 in the acceptance threshold. We believe that such changes from parent to child are actually very likely, and hence our choice for the mutation rate.

3 Results

Figure 1 shows that strong reciprocity, in the form of altruistic punishment, can be selected for at the individual level in small populations ranging from N = 10 to N = 10000 agents when selection is strong (s = 1).
The initial distribution of thresholds rapidly leads to a peaked function, with the range of acceptance thresholds for the agents covering about 10% of the available ones. The position of the peak (understood as the mean acceptance threshold) fluctuates during the length of the simulation, never reaching a stationary value for the durations we have explored. The width of the peak fluctuates as well, but on a much smaller scale than the position. At certain instants the distribution exhibits two peaks (see distribution at 7.5 million games). This is the mechanism by which the position of the peak moves around the possible acceptance thresholds. Importantly, the typical evolution we are describing does not depend on the initial condition. In particular, a population consisting solely of self-interested agents, i.e., with all initial thresholds set to t_i = 1, evolves in the same fashion. The value M of the capital at stake in every game is not important either, and increasing M only leads to a higher resolution of the threshold distribution function.

The success of reciprocators does not depend on the selection rate (although the detailed dynamics does). Figure 2 shows the result of a simulation with 1000 agents in which the removal-duplication process takes place once every s = 10000 games. To show further that the initial conditions are irrelevant, for this plot we have chosen an initial population of self-interested agents. As we may see, the evolution is now much less noisy, and the distribution is narrower, becoming highly peaked and immobile after a transient. The value of s at which this regime appears increases with the population size. The final mean acceptance threshold at which simulations stabilize depends on the specific run, but it is very generally a value between 40 and 50. We thus see that the selection rate may be responsible for the particulars of the simulation outcome, but it is not a key factor for the emergence of strong reciprocity in our model.
We note, however, that taking very large values for s or, strictly speaking, considering the limit s/N → ∞, does lead to different results. See the next section for a detailed discussion.

4 Discussion

Among the results summarized above, the evolution of a population entirely formed by self-interested players into a diversified population with a large majority of altruists is the most relevant and surprising one. We will now argue that the underlying reason for this is the presence of fluctuations (or noise) in our model. For the sake of definiteness, let us consider the case s = 1 (agent replacement takes place after every game), although the discussion applies to larger (but finite) values of s as well. After one or more games, a mutation event will take place and a “weak altruistic punisher” (an agent with t_i = 2) will appear in the population, with a fitness inherited from its ancestor. For this new agent to be removed at the next iteration so that the population reverts to its uniform state, our model rules imply that this agent has to have the lowest fitness, that it is the only one with that value of fitness, and also that it does not play as a proposer in the next game (if playing as a responder the agent will earn nothing because of her threshold). In any other event this altruistic punisher will survive at least one cycle, in which an additional one can appear by mutation. Note also that in case a “weak altruistic punisher” is chosen to act as a proposer, she earns a large amount of fitness, which would allow her to survive for many death-birth cycles, and during those she could even accumulate more fitness in case she is selected to play again as proposer. It is important to realize that this does not imply any constraint on the number of times the emergent weak punisher is picked as respondent: in that case, and until a second punisher arises from mutation, the punisher acting as a respondent will simply earn nothing, while the selfish agent playing the role of proposer in that game would not earn the 99 fitness units she would earn if she met another selfish agent.
Therefore, the survival of the first punisher does not depend on (and it may actually be favored by) the number of times she acts as respondent, as one would expect in a realistic situation.

The above discussion is in fact an example, something like a worst-case scenario for the s = 1 case, and one can easily imagine other ways a newly created punisher may survive. Our intention is to illustrate the crucial fact that fluctuations (i.e., the fact that the recently appeared altruist is chosen to play or not, or that it is chosen to be removed if there is more than one agent with the lowest fitness, or that other, selfish agents are not selected to play in one or several intervals) allow for survival and growth of the population of altruists. It is interesting to note that in the dynamics in which all players play against every other once, i.e., in the replicator dynamics (see next paragraph for more on this), the average fitness earned by each type of agent can be computed analytically as a function of the frequency of the types in the population. From that result, it is easy to find the threshold value required for one type to have a fitness advantage over the other. In particular, it can be shown that if 3% of an initial t_i = 1 population turns to t_i = 2, the latter will outperform the originally self-interested agents. Note also that, in our model, it can be shown that the number of times a particular agent is chosen to play is a random variable given by a Poisson distribution of mean s/N (and of standard deviation √(s/N), which for s/N ≫ 1 becomes negligible with respect to the mean). Therefore, irrespective of their threshold, some agents will have played more than others and may have accumulated more capital, subsequently being less exposed to removal. This whole scenario is what we refer to as ‘dynamics governed by fluctuations.’

In the context of the above discussion, it is very illustrative to compare our results with previous studies of the Ultimatum Game by Page and Nowak (Page and Nowak 2000, 2002).
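The fitness-advantage threshold mentioned above for the round-robin (replicator-like) limit can be checked with a short calculation. The sketch below is an illustration under explicit assumptions: every agent meets every other agent once in each role, self-interaction is neglected (large population), agents are empathic (they offer their own threshold), and the helper name `mean_payoffs` is hypothetical. Exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def mean_payoffs(x, money=100, t_low=1, t_high=2):
    """Mean payoff per opponent (both roles) for the two types.

    x is the fraction of agents with threshold t_high; the remaining
    1 - x have threshold t_low. Returns (payoff_low, payoff_high).
    """
    if not t_low < t_high:
        raise ValueError("t_low must be smaller than t_high")
    x = Fraction(x)
    # Low type as proposer: offers t_low, accepted only by other low types.
    low = (1 - x) * (money - t_low)
    # Low type as responder: accepts both the t_low and the t_high offers.
    low += (1 - x) * t_low + x * t_high
    # High type as proposer: offers t_high, accepted by everyone.
    high = Fraction(money - t_high)
    # High type as responder: accepts only offers from other high types.
    high += x * t_high
    return low, high
```

Under these assumptions the two payoffs are M − (M − 2)x and M − 2 + 2x, so they coincide at x = 2/M, i.e., at 2% for M = 100. A 3% share of t_i = 2 agents therefore lies above the break-even point, in line with the statement in the text.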
The model introduced in those works has a dynamics completely different from ours: following standard evolutionary game theory, every player plays every other one in both roles (proposer and respondent), and afterwards players reproduce with probability proportional to their payoff (which is fitness in the reproductive sense). Simulations and adaptive dynamics equations then show that the population ends up composed of players with fair (50%) thresholds. This is different from our observations, in which we hardly ever reach an equilibrium (only for large s) and even then equilibria set up at values different from the fair share. The reason for this difference is that the Page-Nowak model dynamics describes the s/N → ∞ limit of our model, in which between death-reproduction events the time-averaged gain all players obtain is the mean payoff with high accuracy. We thus see that our model is more general because it has one free parameter, s, that allows selecting different regimes, whereas the Page-Nowak dynamics is only one limiting case. Those different regimes are what we have described as fluctuation dominated (when s/N is finite and not too large) and the regime analyzed by Page and Nowak (when s/N → ∞). This amounts to saying that by varying s we can study regimes far from the standard evolutionary game theory limit. As a result, we find a variability of outcomes for the acceptance threshold consistent with the observations in real human societies (Fehr and Fischbacher, 2003; Gintis et al., 2003; Henrich et al., 2001).

5 Two parameter model

To further confirm the differences between our approach and that of Page and Nowak, we have considered the same alternative as they did, namely to assign agents a new strategic variable, o_i, defined as the amount offered by player i when acting as proposer, and subject to the same mutation rules as the acceptance threshold, t_i.
Page and Nowak observed that, in their setup, this modification of the model led to fully rational players (i.e., in our model, t_i = o_i = 1), except for fluctuations due to mutations. In contrast, Figure 3 shows clearly that in our model the dynamics remains very complicated and equilibria are never reached within the duration of our simulations. Once again, this is due to the fact that the dynamics we propose does not remove the fluctuations of the payoff obtained by the players as the limit s/N → ∞ does. It is clear that many other choices for the dynamics are possible, aside from choosing different values for s. For instance, a certain percentage of the population could be replaced in reproduction events instead of just the least fit individual. Another possibility would be the selection of individuals to be replaced with probability given by their fitness. Nevertheless, our main point here is that our dynamics is far away from the replicator or adaptive ones, and the form we choose for the replacement is intended to make the fluctuation effects easier and faster to visualize. Our choice of a large mutation rate points in the same direction as well, i.e., it helps amplify the effect of fluctuations. In this respect, the question arises as to the influence of such a large mutation rate on our results. To exclude any dependence of our main conclusion, namely the appearance of altruistic punishers even in an initially selfish population, on the value of this quantity, we simulated the same model for smaller mutation rates. Figure 4 shows clearly that even for mutation rates as small as 1/3000 the population is taken over by the altruistic punishers, although at a correspondingly larger time. Of course, the larger the mutation rate, the wider the histogram of the population, and therefore, for the smallest values the acceptance threshold distribution is very sharply peaked around the mean value (see inset in Figure 3).
For even smaller rates (of the order of 10⁻⁴ or similar genetic mutation rates) the amount of time needed for altruistic individuals to become established is exceedingly large, and out of the scope of our computing capabilities. We believe that different rules for the dynamics would lead to qualitatively similar results insofar as they do not approach Page and Nowak's (which we could call deterministic) limit.

6 Conclusions

In this paper, we have shown that altruistic-like behavior, specifically, altruistic punishment, may arise by means of exclusive individual selection even in the absence of repeated interactions and reputation gains. Our conclusion is important insofar as it is generally believed that some kind of group selection is needed to understand the observed human behavior. The reason for that is that game theoretical arguments apparently show that altruists are at a disadvantage with respect to selfish individuals. In this respect, another relevant conclusion of the present work is that perspectives and approaches alternative to standard evolutionary game theory may be needed in order to understand paradoxical features such as the appearance of altruistic punishment. We begin by discussing this second conclusion and proceed to the first one afterwards.

As we have seen, in our model the effects of finite time between generations (more precisely, the effect of keeping s finite) and of stochasticity play a nontrivial role and sustain strong reciprocity (existence of players with t_i > 1) even if acceptance and offer obey independent rules. Regarding this, it is important to notice that the way fluctuations enter our model is directly through the evolutionary dynamics we propose. Other important effects of noise have been reported in the literature (Gale et al., 1995; Binmore and Samuelson, 1999), in which fluctuations are included in a replicator dynamics for the Ultimatum Game to account for imperfections in the learning process.
In our case, there is no learning at all (agents have no memory) and therefore the source of noise is the dynamics itself, i.e., the random differences between the number of games every player plays between selection events. Interestingly, randomness arising from finiteness of the population has also been shown to change the evolutionary stability of cooperation (Nowak et al., 2004). In a related context, it has been recently reported that spatial structure, previously regarded as beneficial for the evolution of cooperation on the basis of results on the evolutionary Prisoner's Dilemma, may in fact inhibit it (Hauert and Doebeli, 2004). Finally, let us also mention the recent results about the evolution of strong altruism in randomly formed groups when they exist for more than one generation (Fletcher and Zwick, 2004). All these unexpected and nontrivial results, along with our present report, suggest that general approaches, involving different, non-standard dynamics, beyond standard evolutionary game theory, and particularly computer simulations of agent models, may provide insights into the issue of how cooperation arises. Interestingly, it has been argued that empathy (or fairness), i.e., the fact that agents offer what they themselves are prepared to accept, does not arise evolutionarily on its own (Page and Nowak, 2002). While those results are not questioned, they have been obtained in the framework of adaptive dynamics. We believe, along the same line of reasoning we are presenting here, that the effect of fluctuations as described in the previous section may be enough to originate and sustain fairness in finite populations, which would in turn justify our model from the game theoretical viewpoint. In this regard, an interesting question arises when one considers the possibility of observing similar behavior in dilemma-type games (such as the Prisoner's Dilemma, see Axelrod and Hamilton, 1981). In that kind of game, the Nash equilibrium structure is much simpler than in the Ultimatum Game: usually, they have only one equilibrium.
It may then be that in those situations, departure from that equilibrium by individual selection alone without additional ingredients is much more difficult. In other words, the existence of numerous Nash equilibria in the Ultimatum Game may facilitate the creative role of the fluctuations in leading the population away from the self-interested type. It would be interesting to analyse the case of dilemma-type games in the light of our findings here. Work along these lines is in progress.

Evolutionary explanations of strong reciprocity have been advanced in terms of gene-culture coevolution (Bowles et al., 2003b; Bowles and Gintis, 2004; Boyd et al., 2003; Gintis, 2003; Hammerstein, 2003; Henrich and Boyd, 2001). The underlying rationale is that altruistic behavior leads to fitness disadvantages at the individual level. But why must strong reciprocators have lower fitness than other members of their group? While alternative compensating factors (e.g., sexual selection) have been suggested (Bowles et al., 2003a), our results show clearly that, in the context of the Ultimatum Game, altruistic punishment (Fehr and Gächter, 2002) may be established by individual selection alone. Our simulations are consistent with the large degree of variability among individuals (Fehr and Fischbacher, 2003; Gintis et al., 2003) and among societies (Henrich et al., 2001), and reproduce the fact that typical offers are much larger than self-interested ones, but lower than a fair share. While in our model agents have other-regarding behavior (empathy), i.e., agents offer the minimum they would accept if offered to them, this is not a requisite for the emergence of strong reciprocators, as the two-threshold simulations show. The population evolves by descent with modification and individual selection, as the model does not implement cultural (other than parent-to-child transmission) or group selection of any kind.
To be sure, we do not mean that these mechanisms are irrelevant for the appearance and shaping of altruism: what we are showing is that strong reciprocity (and hence altruism) may arise in their absence. Observations of strongly reciprocal behavior in capuchin monkeys (Brosnan and de Waal, 2003), where cultural transmission, if any, is weak, strengthen this conclusion. Further support for our thesis comes from reports of individual, pre-existent acceptance thresholds shown by neural activity measurements in (Sanfey et al., 2003). In this respect, neural mechanisms gratifying cooperation such as those demonstrated in (Rilling et al., 2002) may have evolved to reinforce behaviors selected for at the individual level, as we are suggesting. Of course, those results do