On the Common Ancestors of All Living Humans

Douglas L. T. Rohde
Massachusetts Institute of Technology
November 11, 2003

Abstract

Questions concerning the common ancestors of all present-day humans have received considerable attention of late in both the scientific and lay communities. Principally, this attention has focused on 'Mitochondrial Eve,' defined to be the woman who lies at the confluence of our maternal ancestry lines, and who is believed to have lived 100,000–200,000 years ago. More recent attention has been given to our common paternal ancestor, 'Y Chromosome Adam,' who may have lived 35,000–89,000 years ago. However, if we consider not just our all-female and all-male lines, but our ancestors along all parental lines, it turns out that everyone on earth may share a common ancestor who is remarkably recent.

This study introduces a large-scale, detailed computer model of recent human history which suggests that the common ancestor of everyone alive today very likely lived between 2,000 and 5,000 years ago. Furthermore, the model indicates that nearly everyone living a few thousand years prior to that time is either the ancestor of no one or of all living humans.

1 Introduction

Advances in genetics have sparked interest in our common ancestors, the individuals from which all present-day humans descend. Initial interest focused on the topic of 'Mitochondrial Eve,' who is defined to be the most recent female ancestor from whom all individuals descend along strictly maternal lines (Cann, Stoneking, & Wilson, 1987; Vigilant, Stoneking, Harpending, Hawkes, & Wilson, 1991). An approximate date for Mitochondrial Eve of 100,000 to 200,000 years ago has been estimated based on the successive mutations in mitochondrial DNA, which are passed down from mother to child. A similar analysis can be performed on the strictly paternal lines of succession, using the Y chromosome, which is passed down from father to son, to determine the approximate date of

years ago (Dorit, Akashi, & Gilbert, 1995), a range that was more recently narrowed to 35,000–89,000 years ago (Ke et al., 2001).

Nevertheless, an individual's strictly maternal and strictly paternal lines are just two of a vast number of possible paths back through his or her ancestors. What if we adopt a more common-sense notion of ancestry that includes ancestors reachable along any path of succession, using both mothers and fathers? It seems likely that our most recent common ancestor (MRCA) under this broader definition will be much more recent than either Mitochondrial Eve or Y Chromosome Adam. Unfortunately, the age of our MRCA cannot as easily be estimated on the basis of genetic information because the relevant genes are not passed from parent to child with only occasional mutations but are, rather, the product of recombination. As a result of recombination, a given gene may not pass from parent to child. In fact, an individual's DNA may retain none of the genes specific to a particular ancestor who lived many generations in the past. These additional complications make accurately dating the MRCA or reconstructing other details of population history on the basis of our genes extremely difficult, if not impossible (Hey & Machado, 2003).

However, alternative methods may be able to answer this question. Some researchers have produced estimates of the age of our MRCA by means of the theoretical analysis of mathematical models. Building on work by Kämmerle (1991) and Möhle (1994), Chang (1999) analyzed a model that assumes a fixed-size population, with discrete and non-overlapping generations, and random mating. That is, each child is the product of two parents, randomly selected from all members of the previous generation. Chang showed that, in this model, the mixing of genes occurs quite rapidly. In fact, the number of generations back to the MRCA is expected to be about $\log_2$ of the population size. With a population of 6 billion people, this model predicts that the MRCA is likely to occur in just over 32 generations, or 800–975 years.
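As a quick arithmetic check of the figure quoted from Chang's random-mating model, the sketch below evaluates $\log_2$ of the population size and converts generations to years. The function name is illustrative, and the generation lengths of 25 and 30 years are assumptions chosen to bracket the quoted 800–975 year range; they are not stated at this point in the text.

```python
import math


def generations_to_mrca(population_size: int) -> float:
    """Approximate generations back to the MRCA under random mating: log2(N)."""
    return math.log2(population_size)


if __name__ == "__main__":
    world_population = 6_000_000_000
    generations = generations_to_mrca(world_population)
    # Assumed generation lengths (years); hypothetical values for illustration.
    for years_per_generation in (25, 30):
        years = generations * years_per_generation
        print(f"{generations:.2f} generations x {years_per_generation} years "
              f"= about {years:.0f} years")
```

Because the estimate grows only logarithmically, even large changes in the assumed population size shift the result by just a few generations.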
* Work in progress. Do not cite.

'Y Chromosome Adam.' This date was originally estimated very loosely to fall between 27,000 and 270,000

This suggests that our all-paths MRCA may be exceptionally recent.

But Chang was well aware of the limitations of the simple model he analyzed. "What are the significance of these results? An application to the world population of humans would be an obvious misuse... An important source of inapplicability of the model to this situation is the obvious non-random nature of mating in the history of mankind." (p. 1005) There are many factors that limit the randomness of human mating. First of all, clearly, are sex differences, but Chang did address this, noting that adding distinct sexes to the model would not cause a substantial change in the estimate. Another factor is marriage. Once a couple has one child, they are likely to remain together as they produce more children. Moreover, broader sociological and geographic factors may have still more profound effects. In short, although we are becoming increasingly panmictic, human groups have tended towards a high rate of endogamy, finding mates almost exclusively within the local population and social class, only occasionally transcending barriers of geography, language, race, and culture.

It seems likely that these restrictions on the randomness of human mating may dramatically decrease the rate of ancestral mixing in the model. As a result, the true date of the MRCA could be thousands or tens of thousands of years ago, rather than just hundreds. Thus, an obvious next step is to test this possibility by expanding the model to include some or all of these constraints. Unfortunately, conducting a theoretical analysis of a more complicated mathematical model would be very difficult. An alternative approach is to implement a computer simulation. The principal advantage of a computer simulation is that it can be arbitrarily complex. However, even given the speed of today's computers, efficiently simulating the ancestral history of a population whose size is even close to the scale of humanity is non-trivial. Furthermore, because a non-random model will necessarily involve numerous parameters that cannot be adequately constrained by available data, the simulation must typically be run many times to explore the consequences of various parameter settings.

This study involves the implementation and analysis of several large-scale computer models of recent human history. The models simulate individual human lives, including life span, birth rate, choice of mates, and migration, and the data they produce is analyzed to obtain more accurate estimates of the date of our most recent common ancestor. Given what seem to be reasonable parameter choices, the final, most detailed model presented here predicts that our most recent common ancestor probably lived between 2000 and 5000 years ago and that nearly everyone alive more than a few thousand years before that is the ancestor of either no one or of everyone alive today.

1.1 Modeling human genealogy

Mathematical models of human genealogy must be quite simple if their analysis is to be possible. Most, like Chang's, are some variant of a Wright-Fisher model, with discrete generations and parents selected at random from the preceding generation (Nordborg, 2001). Because computer simulations are tested empirically rather than through theoretical analysis, they are not subject to such constraints.

However, there are practical limits to the complexity of a computer simulation. One is the matter of computational efficiency. A model cannot be so complex that running it requires an unreasonable amount of time or space. A more significant limitation, from a scientific perspective, must be placed on the number of free parameters in the model. Ideally, for the results of a model to be reliable, any free parameters should be constrained by historical data, such as statistics on actual birth or migration rates. However, much of the relevant data for the current models concerns events occurring thousands of years ago and cannot be obtained with any accuracy. In this case, the parameters must be varied within the range of plausible values to obtain bounds on the model's predictions. A model with too many free parameters, especially ones unconstrained by empirical data, will have reduced power and will be difficult to study. Therefore, a good model must be complex enough to include relevant factors, but not overly burdened by irrelevant ones.

This study explores a progression of three models. The first extends Chang's results to a world consisting of five more or less panmictic islands, or continents, with only occasional migrants between any pair of continents. The second model, discussed in Section 3, arranges the islands in a graph that roughly reflects the topology of the major world continents. The final model, discussed in Section 4, is a more detailed simulation of the actual world, with migration routes and dates based on historical data or prehistoric estimates.

2 Model A: Fully-connected continents

The first model, A, is quite abstract but incorporates several levels of detail beyond those found in most Wright-Fisher models. The model typically starts between 5000 and 20000 BC and runs to the present day, which is taken to be the year 2000 AD. As the model runs, it simulates important details in the lives of individual people, known as sims, including their life spans, possible migrations, choice of mate, and production of offspring. As the model runs, it records this information in a series of large computer files. A second program, discussed in Section 2.2, traces ancestral lines through this data to find the common ancestors.

2.1 Details of the model

2.1.1 Life span

The present models do not assume discrete, uniform generations. Each sim is born in a certain year and has a particular life span. The maximum age of any sim was set to 100, as it seems highly unlikely that anyone would live, let alone father children, beyond that age. The age of sexual maturity was taken to be 16 years for both men and women. Anyone who would have died before that age could not have produced offspring and is thus not a factor for the purposes of this study. Therefore, only the lives of those destined to at least reach adulthood were simulated. As a result, the population sizes discussed throughout this paper are effectively somewhat larger than stated because they do not include any children.

Otherwise, the probability that an individual dies at age s, conditional on not having died before age s, is assumed to follow a discrete Gompertz-Makeham form (Pletcher, 1999):

$$p(s) = \alpha + (1 - \alpha)\, e^{(s - 100)/\beta}$$

In this equation, $\beta$ is the death rate. A higher death rate results in shorter life spans on average, although the result is not linear. The $\alpha$ parameter is the accident rate, which can be adjusted to reflect the probability that an individual of any age dies of unnatural causes. With an accident rate of 0.01 and a death rate of 10.5, this formula quite closely models the life span data for the U.S. between 1900 and 1930 (U.S. National Office of Vital Statistics, 1956). To account for historically shorter life spans due to poor nutrition, medicine, and so forth, the death rate, $\beta$, was raised to 12.5 for the purposes of the model. This produces an average life span of 51.8 for those who reach maturity.

The percentage of males born into the population was set at 50%. It is true that the actual percentage of males and females reaching adulthood may differ somewhat due to infanticide coupled with the fact that a slightly higher percentage of newborns are male than female (Davis, Gottlieb, & Stampnitzky, 1998). But this probably does not have much bearing on the results of the model. And while it is true that women tend to live longer, the life span of women past child-bearing age is also not relevant to the outcome of the model. Therefore, for simplicity, the life spans of males and females were generated using the same distribution.

2.1.2 Migration

The models are organized into three structural levels: continents, countries, and towns. The continents represent physically separated land masses that are likely to have very low rates of inter-migration. Europe and Asia are contiguous, with no substantial geographical barrier to migration, so they will be considered a single continent, along with Africa, North America, and South America. Indonesia, Australia, and Oceania are, taken together, somewhat more difficult to model, as there is clearly substantial internal structure. For the purposes of the first two models, they will be considered a single continent, but will be dealt with more appropriately in the third model.

The models' continents are divided into countries, arranged in a grid. These reflect major tribal, ethnic, or language groups, with both geographic and cultural barriers to intermarriage. Countries are, in turn, divided into towns. These do not necessarily represent towns per se, but the relevant social unit from within which most people find mates. Thus, a town may actually reflect a clan, a rural county, or even a particular social class within a larger group. The towns within each country are assumed to be in relatively frequent contact with one another and are not in any particular geographic arrangement.

Not all humans confine themselves to a single location throughout their lives and a critical factor in the model is the rate at which people migrate to different places in the world. Although it seems likely that many people, and perhaps the vast majority historically, live out their lives close to where they were born, various forms of migration lead to the gradual spread of ancestral lineages over long distances. When men and women from different groups marry, one of them, often the wife but sometimes the husband, moves to the other's community. Merchants, soldiers, and bureaucrats, who are typically male, sometimes travel widely, potentially fathering children far from their place of birth. And, occasionally, large groups of people have conquered or colonized new areas.

In terms of realism, it would certainly be desirable to distinguish between these and other specific types of migration in the model. However, doing so would introduce many new parameters, for which we are unlikely to find sufficient data. Therefore, the model uses a simplified migration system, in which each person can move only once in his or her life. Each sim is born in the town in which his or her parents, or at least mother, lives, but then has a chance to migrate to a different continent, country, or town prior to adulthood. Henceforth, that person can produce children only with other inhabitants of his or her new town, provided it contains potential mates.

As is the case in human mating patterns (Fix, 1979), the rate of exogamy decreases substantially with larger group size in the models. Adams and Kasakoff (1976) found that, across a variety of human societies, there was a recognizable threshold in group size at around a 20% exogamy rate, although the sizes of these groups differed as a function of population density. This "natural" group size is taken here to be that of the town. The ChangeTownProb parameter controls the percentage of sims who

parental age. In most of the simulations, the majority of the sims using a port are born locally, in its source country, while a proportion of port users, governed by the NonLocalPortProb parameter, are drawn from random countries within the continent, including the source country. These long-distance migrants might, for example, be merchants. A NonLocalPortProb of 0 means that all migrants are born locally. A value of 100% confers no special advantage to the source country.

Migrants using a port arrive in a random town within the destination country. However, they then have the usual small chance of migrating to a new country within that continent. A sim can use at most one port in his or her lifetime.
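The discrete Gompertz-Makeham mortality form of Section 2.1.1 can be sketched as follows. This is a minimal illustration rather than the simulation's actual code: the function names are hypothetical, and it assumes the death probability is applied once per year of age, starting at the age of maturity (16) and ending at the maximum age (100), where the formula gives a probability of 1.

```python
import math

MAX_AGE = 100      # maximum age of any sim (Section 2.1.1)
MATURITY_AGE = 16  # age of sexual maturity (Section 2.1.1)


def death_probability(age: int, alpha: float = 0.01, beta: float = 12.5) -> float:
    """p(s) = alpha + (1 - alpha) * exp((s - 100) / beta).

    alpha is the accident rate and beta the death rate, as named in the text.
    """
    return alpha + (1.0 - alpha) * math.exp((age - MAX_AGE) / beta)


def mean_life_span(alpha: float = 0.01, beta: float = 12.5) -> float:
    """Expected age at death for a sim that has reached maturity.

    Assumption (not spelled out in the text): the hazard is applied yearly
    from MATURITY_AGE through MAX_AGE.
    """
    alive = 1.0      # probability of still being alive at the start of `age`
    expected = 0.0
    for age in range(MATURITY_AGE, MAX_AGE + 1):
        p = death_probability(age, alpha, beta)
        expected += age * alive * p   # contribution of dying at this age
        alive *= 1.0 - p
    return expected


if __name__ == "__main__":
    print(f"beta = 12.5: mean life span {mean_life_span(beta=12.5):.1f}")
    print(f"beta = 10.5: mean life span {mean_life_span(beta=10.5):.1f}")
```

Under these assumptions the mean for a death rate of 12.5 should land close to the average life span of 51.8 quoted in the text, and lowering the death rate to 10.5 lengthens life spans, as described.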
leave the town of their birth for another town within the same country. In the current models it ranges from 20% down to 1%.

There is a much lower chance that a sim will leave his or her home country for another country on the same continent. The probability that this occurs is governed in the model by the ChangeCountryProb parameter, which ranges from 0.1% to 0.001% (1 in 100,000). The countries within a continent are arranged in a grid and locality also plays a role in inter-country migration. In Model A, shown in Figure 1, all continents contain 60 countries in a 6 by 10 rectangle. In the first two models, inter-country migration involves a two-tiered system. The majority of the sims leaving a country travel to a neighboring country (including diagonal neighbors). The remainder travel to a randomly chosen country within the continent, including the neighboring countries. The fraction of sims who choose randomly is governed by the NonLocalCountryProb parameter. A value of 0 means that all inter-country travel is to neighboring countries. A value of 100% means that travel to all countries in the continent is equally likely.

Intercontinental migration takes place through ports. Ports lead from a source country in one continent to a destination country in another. The rate of migration through a port can be regulated and monitored. It is expressed in terms of migrants per generation, where a generation is taken to be 30 years, as that is the approximate average

Model A, depicted in Figure 1, consists of five continents arranged in a fully-connected graph. The five continents are each composed of 60 countries, with 80 towns per country, for a total of 24,000 towns. Each continent is connected to every other one, with ports lying at the corners. This is not meant to be an accurate depiction of the world by any means, but is an extension of the panmictic model to a simple form of structured model similar to those that have been studied previously (Nordborg, 2001). Our primary goal in studying such a model is to gain a better understanding of its sensitivity to the various parameters and to provide a baseline against which to compare models with more realistic geography.

Figure 1: Model A: A fully connected structured model. Each continent consists of 60 countries with 80 towns per country.

2.1.3 Mating

Along with migration, the rate of ancestral mixing is also dependent on how mates are chosen and on the age distribution of the parents when children are born. In this respect, the model was implemented from the perspective of the mother. The program first determines the years in which the mother will give birth, and then a father is chosen for each child. The assumption is made that women give birth between the ages of 16 and 40, inclusive, with an equal probability of producing a child in each of these years. Of course, some women may produce many children and others will produce none, and some may die before the age of 40. After taking this latter factor into account, we can control population growth by adjusting the average number of children (who reach adulthood) per woman. A value of 2.0 children per woman results in a stable population size.

Once it has been determined that a woman will give birth in a certain year, the father is chosen. If possible, the father is always selected from the town in which the mother lives. It sometimes happens, especially early in the simulation when populations are low or when a new area is first colonized, that there are no suitable fathers living in the same town as a woman who is to have a child. In this case, fathers are sought in the other towns within the same country.

The father of a woman's first child is chosen at random from the men who are at least as old as the woman. The prohibition against younger husbands was primarily for computational reasons, but it seems to be a fairly reasonable, if not entirely valid, assumption. There is an additional bias such that men are twice as likely to be chosen if they are not already married, in the sense that they have already produced a child with another woman. After the first child, there is an 80% chance that the father of the previous child will also father the next one, thus simulating marriage. There is a fundamental asymmetry in the sexes, in that a woman can only be "married" to one man, although a man could be married to more than one wife, or at least fathering children by more than one woman. But there is a bias towards monogamous relationships. Also note that women cannot bear children past the age of 40,

Figure 2: Distributions of the number of children per woman or man. Only children who reach adulthood are counted. Model C produces a slightly different distribution for the men.
[Bar chart: x-axis "Number of Children" (0 to 14); y-axis "Percentage of Women or Men" (0% to 35%); series: Women, Men (Models A and B), Men (Model C).]

Table 1: World population estimates, in millions.

    Year        Population
    20000 BC    2
    15000 BC    3
    10000 BC    4
    5000 BC     5
    2000 BC     25
    1000 BC     50

all of those lines die out, which rarely occurs beyond the first few generations. In the model empirically, we find that 33.31% of females and 45.42% of males actually become extinct. These values are just slightly lower than we would predict based on the distributions in Figure 2. Thus, females have the reproductive advantage in terms of the likelihood their genes will survive for all future generations. In many cultures, sons are much preferred over daughters. However, unless one's sons are powerful enough to procure multiple wives, it is actually more advantageous, for the purpose of creating an enduring lineage, to produce daughters.

2.1.4 Population Growth

Because life span was not manipulated, the growth rate of the population was controlled by adjusting the average number of children per woman. As Chang's results suggest, the size of the population may be an important determiner of the date of the MRCA. Given fixed sizes, a larger population will tend to have a less recent MRCA. But a larger population will also tend to have a greater number of migrants, thus potentially leading to a more recent MRCA. The net effect of population size is, therefore, difficult to predict, but will be tested empirically.
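The father-selection rules of Section 2.1.3 can be sketched as follows. This is an illustrative reading, not the author's implementation: the `Man` record and the function names are hypothetical, and the sketch assumes that when the previous father is not retained (the remaining 20% of cases), a new father is drawn with the same weighting used for a first child, which the text does not state.

```python
import random
from dataclasses import dataclass


@dataclass
class Man:
    name: str
    age: int
    has_children: bool  # "married" in the paper's sense


def father_weights(mother_age: int, men: list) -> list:
    """Relative selection weights for candidate fathers in the mother's town.

    Men younger than the mother are excluded (weight 0); men who are not
    already married are twice as likely to be chosen as married men.
    """
    return [0.0 if m.age < mother_age else (1.0 if m.has_children else 2.0)
            for m in men]


def choose_father(mother_age, men, previous_father=None, rng=random):
    """Pick a father for the next child, or None if the town has no candidate."""
    # After the first child, the previous father is kept with probability 0.8.
    if previous_father is not None and rng.random() < 0.8:
        return previous_father
    weights = father_weights(mother_age, men)
    if sum(weights) == 0:
        return None  # caller would then search other towns in the same country
    return rng.choices(men, weights=weights, k=1)[0]


if __name__ == "__main__":
    town = [Man("a", 30, False), Man("b", 34, True), Man("c", 20, False)]
    first = choose_father(25, town, rng=random.Random(0))
    print("father of first child:", first.name)
```

Returning `None` when no candidate qualifies leaves the fallback search to the caller, mirroring the two-stage lookup (own town first, then the rest of the country) described above.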
while men can father children throughout their adult lives.

Figure 2 shows the distribution of children per woman and man in the models. The distribution for women is essentially binomial, with 19% producing no children and only 2.8% producing more than 5 children. The middle bar shows the distribution for men in the first two models. It has greater variance than that for the women. Nearly 36% of men produce no adult children, while 8.6% produce more than 5 children. Thus, there is a higher percentage of men than women that produce no children or many children, but relatively fewer men who produce a moderate number of children.

For lack of a better term, we will refer to sims who have no living descendants as extinct. In other words, these are sims whose lineage has died out. Clearly, anyone who produces no children is extinct upon their death. But even those who produce some children may become extinct if

Ideally, the model should be capable of simulating a full-size world population. However, due to the available disk space for recording the necessary data, the models were limited to a maximum population of 60 million sims at any one time. A natural population growth was simulated up to a point and then the population was capped. Table 1 shows the worldwide populations used in the first two models. These data are based on the estimates of McEvedy and Jones (1978), with the most ancient numbers extrapolated. Population size was regulated by adjusting the birth rate to achieve geometric growth between the given targets. In most of the Model A and B simulations, the maximum population was 25 million, reached in the year 2000 BC, and then maintained thereafter. Other simulations continued to a maximum level of 50 million.

The migration rates between towns and countries are expressed as a percentage of births. Therefore, as the population increases, the total number of migrants increases proportionally. In Models A and B, the rate of the ports was fixed to achieve a particular number of migrants per generation once the population had reached maximum size. However, prior to that point, proportionately fewer migrants would be using the ports.

2.1.5 Initialization

There is one remaining aspect of the model to be described, which is the method of initialization. Although some simulations were started in the year 20000 BC, others were started as recently as 5000 BC if a more recent start date would not interfere with the results. In order to get things going, we need some initial sims. A simple approach might be to create all of the initial sims in the same year. However, in that case, their children would form a baby boom and it would take some time for the age distribution within the population to stabilize. Unless that stable age distribution is known in advance, there will always be some instability introduced by the creation of the initial people.

Therefore, the simulation actually begins 100 years before the desired start date. An initial set of sims is generated, each in a random town and each born at a random time within a 40-year window. The model is then run as usual, with the initial sims starting to produce offspring. Although the population does not have a natural age profile initially, as there are no old people, it quickly settles into a near-normal distribution within the first 100 years. The population will roughly double during these first 100 years as fewer people die of old age than are born. Thus, the size of the initial population is adjusted to achieve the desired level at the end of the 100-year period.

2.2 Finding common ancestors

A simulation with a maximum population of 50 million sims will involve a total of approximately 1.2 billion sims over its course. As the model runs, it generates files containing the vital statistics of each sim, including his or her parents, sex, birth and death years, and place of birth, typically totaling about 60 gigabytes of compressed data per trial. Although running the simulation is relatively easy, analyzing this genealogical data to identify the common ancestors presents a significant computational problem.

Let us refer to all of the sims alive in the year 2000, when the simulations end, as living sims. A true common ancestor (CA) is someone who is an ancestor of all living sims. A straightforward search for common ancestors would start with the living sims and work backwards in time, tracking, for every other sim, which of the living sims are his or her descendants. These descendants are the union of all descendants of his or her children. Tracking the descendants would be fairly simple, except that it requires memory proportional to the square of the number of living sims. With a maximum population of 50 million, this would involve the computation and storage of over 300 terabytes of information.

Therefore, finding the common ancestors is not tractable using a straightforward approach. However, a method was developed to zero in on the common ancestors using an initial approximation followed by a series of refinements. This process begins by tracking the ancestry not of all living sims, but of a small, randomly selected subset of them. Depending on the available computer memory, there are typically between 192 and 512 of these individuals, who are known as tracers. By working backwards through the records, the ancestry of these tracers is determined. This is done by computing, for every other sim, a bit vector in which the ith bit is turned on if that sim is an ancestor of the ith tracer. Aside from the fact that the ith tracer automatically has the ith bit turned on, a parent's bit vector will be the bit-wise disjunction of his or her children's vectors. These bit vectors still present a heavy memory burden, but can be handled more efficiently by storing only the unique vectors.

If a sim is not an ancestor of every one of the tracers, that sim could not possibly be a common ancestor (CA). However, if a sim is a common ancestor of all of the tracers, there is a high probability that the sim is an ancestor of a large proportion of the living sims. Such ancestors are referred to as potential common ancestors (PCAs). Unfortunately, it is generally the case that the most recent PCAs that are found in this first backward phase are not actually true CAs. Therefore, this superset of the CAs must be refined.

The next step is to start with a set of the most recent PCAs and trace their lineage forward through time. This is done in much the same way that descendancy was traced in the backward phase: a sim's ancestors are the disjunction of his or her parents' ancestors. In this case, we eventually determine which of the most recent PCAs is an ancestor of each of the living sims. If one of the PCAs was an ancestor of all of the living sims, then we are guaranteed to have found the true MRCA. Otherwise, a new set of tracers is chosen and a second backward pass is performed to refine the set of PCAs.

Selecting the new set of tracers randomly would help a little bit, but not much. A more effective approach is to try to find the sims who are difficult to reach, meaning that they descend from the fewest number of the PCAs. We also need to find a diverse set of tracers. If they are all difficult to reach because they live in the same place, the use of more than one as a tracer would be redundant. In order to satisfy these constraints, the tracers are selected in order, with the next tracer chosen being the living sim with the highest score, defined as follows:

$$\mathrm{score}_i = \sum_{p \in P} 2^{-\left(x_{p,i} + \sum_{t \in T} x_{p,t}\right)}$$

In this equation, $i$ is the sim being considered as a possible tracer. $P$ is the set of PCAs whose descendants were tracked. The indicator variable $x_{p,i}$ is 1 if sim $i$ is not a descendant of PCA $p$, and 0 otherwise. $T$ is the set of tracers that have been selected thus far. This method essentially balances the number of new tracers that are not descended from each of the PCAs, thus increasing the diversity of the new tracers.

Once these tracers have been chosen, their ancestors are found as in the first step. In this case, sims are only identified as PCAs if they are ancestors of all of the new tracers and all of the original tracers. For this purpose, the prior PCA-status of every sim is stored using a compressed run-length encoding. The most recent PCAs are once again selected and their lineages traced forward through time. It is usually the case that one of these new PCAs is actually a CA, which means we have found the true MRCA. Occasionally, an additional set of difficult tracers is required, with one more backward and forward phase.

Working backwards in time from the date of the MRCA, the proportion of CAs in the population increases

consistent, with the MRCA and ACA dates having a standard deviation less than 10% of the mean, and often under 2%. The variance is larger for the ACA point and for the simulations with earlier dates.

The first simulation, referred to as A1, used fairly liberal parameters, as shown in the top row of Figure 3. The maximum population was 25 million, reached in the year 4000 BP. The ChangeTownProb was 20%, meaning that 80% of the sims marry within their birth town. The ChangeCountryProb was set to 0.1%, so about 1 in 1000 sims leave their home country, which may seem somewhat high. But to put this in perspective, with a population of 25 million, there are about 48,000 people born in each country per generation. So, on average, a ChangeCountryProb of 0.1% will result in 48 people leaving a country every 30 years, which certainly does not seem excessive.

More liberal is the fact that, in simulation A1, there are no locality constraints on inter-country migration or the use of ports. Migrants have an equal chance of traveling to any country within the continent and can use a port from anywhere within the continent. The PortRate was set to 10 migrants per generation, in each direction, which is about 1 migrant every three years.

The bars on the right half of Figure 3 depict the common ancestry timelines.
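The bit-vector bookkeeping of the backward phase described in Section 2.2 can be sketched as follows. This is a toy illustration under stated assumptions, not the author's program: the function names and the in-memory `parents` dictionary are hypothetical, sims are assumed to be listed in birth order so that iterating in reverse visits children before their parents, and Python integers stand in for the bit vectors. The last helper reproduces the "over 300 terabytes" estimate for the naive approach, assuming one bit per pair of living sims and decimal terabytes.

```python
def ancestry_bitvectors(parents: dict, tracers: list) -> dict:
    """Return, for every sim, a bitmask whose i-th bit is set if the sim is
    tracer i or an ancestor of tracer i.

    `parents` maps each sim to a tuple of its parents and must list sims in
    birth order (parents before children).
    """
    vectors = {sim: 0 for sim in parents}
    for i, tracer in enumerate(tracers):
        vectors[tracer] |= 1 << i            # tracer i has bit i turned on
    for sim in reversed(list(parents)):      # youngest sims first
        for parent in parents[sim]:
            # A parent's vector is the bit-wise OR of its children's vectors.
            vectors[parent] |= vectors[sim]
    return vectors


def potential_common_ancestors(vectors: dict, num_tracers: int) -> list:
    """Sims whose vector has every tracer bit set (the PCAs)."""
    full = (1 << num_tracers) - 1
    return [sim for sim, v in vectors.items() if v == full]


def naive_matrix_terabytes(living_sims: int) -> float:
    """Storage for one bit per (living sim, living sim) pair, in terabytes."""
    return living_sims * living_sims / 8 / 1e12


if __name__ == "__main__":
    genealogy = {
        "adam": (), "eve": (), "x": (),
        "c1": ("adam", "eve"), "c2": ("adam", "eve"),
        "t1": ("c1", "x"), "t2": ("c2", "x"),
    }
    vecs = ancestry_bitvectors(genealogy, ["t1", "t2"])
    print(potential_common_ancestors(vecs, 2))
    print(naive_matrix_terabytes(50_000_000), "TB for the naive approach")
```

With a few hundred tracers each vector fits in a handful of machine words, which is what makes the tracer subset tractable where the full pairwise table is not.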
gradually until, eventually, everyone is either a CA of all of the living sims or is the ancestor of none of them, and is therefore extinct. Thus, a point will be reached at which 100% of the non-extinct sims are CAs. This will be referred to as the all common ancestors, or ACA, point. Although this successive refinement approach does find the true MRCA, it does not necessarily find the true ACA point, only the point at which everyone is a potential CA. However, the ACA point that appears in the same backward phase that the MRCA is found is nearly always the correct one, or quite close to it. This can be verified with additional refinement steps, which generally lead to no further change.

2.3 Results

A number of simulations were conducted with Model A under various parameter settings. Of principal interest is the date of appearance, working backwards, of the most recent common ancestor (MRCA date) and the date at which all of the non-extinct sims are common ancestors (ACA date). These dates will be measured in years before present (BP), where the present is taken to be 2000 AD. When we refer to the MRCA time, it is the length of time between the present and when the MRCA was last living. Therefore, a longer MRCA time means that the MRCA lived less recently.

For each simulation, at least three trials were performed and the results averaged. In general, the trials were quite

Figure 3: Results of the Model A simulations with various parameter settings. The timelines are in years before present, with the present located on the right. In the white region there are not yet any common ancestors. The border between the white and gray regions is the MRCA point, when the MRCA died. The border between the gray and black regions is the ACA point.
[Parameter settings recovered from the figure; the common ancestry timeline bars (legend: All Common Ancestors, Some Common Ancestors, No Common Ancestors; x-axis "Years Before Present", 12K to 0) are not reproduced.]

    Simulation  Max Population  ChangeTownProb  ChangeCountryProb  NonLocalCountryProb  PortRate  NonLocalPortProb
                (mil)
    A1          25              20%             0.1%               100%                 10        100%
    A1b         12.5            20%             0.1%               100%                 10        100%
    A1c         50              20%             0.1%               100%                 10        100%
    A2          25              20%             0.1%               100%                 10        5%
    A3          25              20%             0.1%               100%                 1         100%
    A4          25              20%             0.1%               5%                   10        100%
    A5          25              2%              0.1%               100%                 10        100%
    A6          25              20%             0.01%              100%                 10        100%
    A7          25              2%              0.01%              5%                   1         5%
    A8          25              2%              0.01%              5%                   1         100%
    A9          25              2%              0.01%              5%                   10        5%
    A10         25              2%              0.01%              100%                 1         5%
    A11         25              20%             0.01%              5%                   1         5%
    A12         25              2%              0.1%               5%                   1         5%

The dates are in years before present, with the present located on the right. In the white region, there are not yet any common ancestors. Moving backwards in time, the MRCA is found at the border between the white and gray bars, in 1720 BP in this case. In the gray region there is an increasing number of common ancestors until we reach the ACA point, at 2880 BP. In the black region, all of the sims are either extinct or common ancestors of the living sims. Thus, in this case, there is a fairly rapid transition between the appearance of the first CA and the point at which everyone alive today shares the same set of ancestors.

The actual rate of this transition for one of the A1 simulations is shown in the red line in Figure 4. The small red marker at the bottom right of the figure denotes where the MRCA occurred, at 1800 BP. From that point on, the percentage of CAs in the population grows slowly at first, reaching 1% in 1940 BP, and then very rapidly, reaching 50% in 2160 BP and 99% in 2400 BP. Then there is a relatively long period during which most, but not all, of the sims are either CAs or extinct. It takes another 570 years to reach the true ACA point, denoted by the red marker at the top of the figure.

It is likely that the notion of a relatively recent ACA point may lead to some confusion. If we consider only ancestors who lived prior to the ACA point, a Japanese and a Norwegian today share the exact same set of ancestors. At first glance this seems patently ridiculous. Certainly the Japanese and Norwegian have quite different genotypes due to very different ancestry. The confusing fact is that
both of these statements are true. Although the Japanese and Norwegian have the same set of ancient ancestors, they did not receive an equal hereditary contribution from each of those ancestors. The Japanese owes a small proportion of his genetic makeup to people living in northern Europe several thousand years ago, and a large proportion to people living in and around Japan, while the opposite is true of the Norwegian. Thus, their ancestry does differ considerably, but only in distribution. This point will be examined further in Section 5.3.

[Plot not reproduced; vertical axis: Percentage of Common Ancestors, 0% to 100%; horizontal axis: Years Before Present, 18000 to 0; curves: Simulation A1, Simulation B7, Simulation C1.]

Figure 4: Percentage of non-extinct sims who are common ancestors of everyone living in the year 2000 for three representative trials. The vertical markers show the dates at which the curves reach 0% and 100%.

Simulations A1b and A1c manipulate the maximum population size, keeping the final port rate the same in terms of migrants per generation. Halving the population (A1b) causes almost no change, while doubling it (A1c) causes a very slight, possibly non-significant, increase in MRCA and ACA time. As we'll see in simulation B7, a larger population can actually lead to more recent dates under different conditions. The larger population has little direct effect on the ancestry coalescence time because it occurs after the MRCA lived. The most important determiner of the rate of spread of a lineage is not the absolute number of people with the lineage but the percentage of people. As the population uniformly grows or shrinks over time, this percentage is not affected. The only point at which the total population plays a role is at the time of the original ancestor. At that point, the percentage of the population represented by that ancestor is indeed a function of the population size. Thus, a larger population at the time that an ancestor lived would result in a longer delay for that person to become a CA. But a population that grows uniformly once the ancestor has died does not result in a similar delay. Larger populations do, however, tend to have more migrants across the most difficult barriers, resulting in a faster spread of lineage. In the case of simulations A1–A1c, these effects are either minimal or counteracting.

Simulations A2–A6 were similar to A1, but each manipulated a single parameter, applying a more conservative value to test the sensitivity of the model to that parameter. Simulation A2 lowered the NonLocalPortProb from 100% to 5%, so most users of a port must be born in the source country. The effect of this change is quite small. Relative to simulation A1, there was an 11.4% increase in the MRCA time and a 6.2% increase in the ACA time. The percent changes in the MRCA date and ACA date are shown in the left-most bars in Figures 5 and 6, respectively.

[Bar chart not reproduced; vertical axis: Percent Change in MRCA Date due to Parameter Change, 0% to 100%; horizontal axis (Parameter): NonLocalPortProb, PortRate, NonLocalCountryProb, ChangeTownProb, ChangeCountryProb; legend: A1 -> A2-A6, A8-12 -> A7, B1 -> B2-B6, B8-12 -> B7.]

Figure 5: The percent change in the MRCA date resulting from various parameter changes for Models A and B. The red bars indicate the percent change from Simulation A1 to either simulation A2, A3, A4, A5, or A6, depending on the variable in question. The blue bars indicate the percent change from the less conservative simulations A8–A12 to the more conservative A7.

[Bar chart not reproduced; vertical axis: Percent Change in ACA Date due to Parameter Change, 0% to 100%; horizontal axis and legend as in Figure 5.]

Figure 6: The percent change in the ACA date resulting from various parameter changes for Models A and B.

Simulation A3 lowered the PortRate from 10 sims per generation to just one per generation. This resulted in a 17.2% increase in the MRCA time and a 12.9% increase in the ACA time. Thus, the model is not tremendously sensitive to migration rate by itself.
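To make the A2 and A3 results concrete, the following Python sketch converts the reported percent increases into approximate dates and converts a PortRate into an average interval between migrants. It assumes the A1 baseline dates of 1720 BP (MRCA) and 2880 BP (ACA) quoted with Figure 3 and the 30-year generation used in the migration arithmetic for A1; the function and variable names are illustrative and are not part of the simulation code.

```python
# Assumed A1 baseline dates in years BP (rounded averages quoted with Figure 3).
A1_MRCA_BP = 1720.0
A1_ACA_BP = 2880.0

# Assumed generation length in years, as used in the A1 migration arithmetic.
GENERATION_YEARS = 30.0

# Reported percent increases over A1: (MRCA time, ACA time).
REPORTED_INCREASES = {"A2": (11.4, 6.2), "A3": (17.2, 12.9)}


def implied_date_bp(baseline_bp, percent_increase):
    """Date in years BP implied by a percent increase over the baseline."""
    return baseline_bp * (1.0 + percent_increase / 100.0)


def years_between_migrants(port_rate):
    """Average years between migrants for a PortRate given per generation."""
    return GENERATION_YEARS / port_rate


# Approximate (MRCA, ACA) dates for each single-parameter simulation.
implied = {
    name: (
        round(implied_date_bp(A1_MRCA_BP, mrca_pct)),
        round(implied_date_bp(A1_ACA_BP, aca_pct)),
    )
    for name, (mrca_pct, aca_pct) in REPORTED_INCREASES.items()
}

for name, (mrca_bp, aca_bp) in implied.items():
    print(f"{name}: MRCA about {mrca_bp} BP, ACA about {aca_bp} BP")

print("PortRate 10:", years_between_migrants(10), "years between migrants")
print("PortRate 1:", years_between_migrants(1), "years between migrants")
```

Under these assumptions A2 lands near 1916 BP and 3059 BP, A3 near 2016 BP and 3252 BP, and the A3 port rate corresponds to one migrant in each direction every 30 years rather than every three.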
Once a lineage has spread throughout most or all of a continent, it only takes a single non-extinct migrant to spread that lineage to another continent, and even very low migration rates may result in only short-term delays.

Simulation A4 lowered the NonLocalCountryProb from 100% to 5%, causing most migration between countries to be local. This reduces the rate of admixture within continents, resulting in a 24.4% increase in MRCA time and an 18.1% increase in ACA time. Simulation A5 reduced the admixture rate within countries by lowering the ChangeTownProb parameter from 20% to 2%. This has a similar effect on the overall dates, also increasing the MRCA time by 24.4% and increasing the ACA time by 19.8%. Finally, simulation A6 reduced the ChangeCountryProb from 0.1% (1 in 1,000) to 0.01% (1 in 10,000). This has the greatest effect of the single-parameter manipulations, increasing the MRCA and ACA times by 30.5% and 28.0%, respectively.

If these five parameter changes have independent effects, we might expect the net effect of combining all of them to be either the sum of their independent additive effects or the product of their multiplicative effects. If the effects were additive, it would result in predicted MRCA and ACA dates of 3570 BP and 5355 BP, respectively. If the effects were multiplicative, the predicted dates would be 4533 BP and 6241 BP, respectively. The actual dates of the combined parameter changes from Simulation A7 are 4910 BP and 9790 BP. Thus, the effects of the parameters appear to be greater than their independent additive or multiplicative combination and we might conclude that there is interaction between them. This is particularly true for the ACA date, which experiences a greater change (240% relative to simulation A1) than does the MRCA date (186%). However, we do not yet know the nature of this interaction.

Simulations A8–A12 start with the same parameter values as A7, but change each of the variables back to its less-conservative setting. The point is to test the sensitivity of the model to each parameter in this new part of the space. The sensitivity is measured as the percent change in MRCA or ACA from the less conservative simulation, A8 for example, to the more conservative A7. If a parameter is acting independently and its effects are multiplicative, we should expect to see the same percentage change in the blue bars in Figures 5 and 6 as we saw in the red bars. If the blue bars are higher, it indicates that the model is more sensitive to the parameter when the other parameters are more conservative, suggesting that the parameters are interacting.

In terms of the MRCA, the PortRate and the ChangeTownProb appear to be acting independently of the other parameters. However, the NonLocalCountryProb, the ChangeCountryProb, and to some extent the NonLocalPortProb have greater effects in simulations A7–A12. This indicates that these parameters are interacting, probably with one another. These parameters all affect the rate at which lineage can spread long distances across continents. Lineage can spread fairly rapidly if either the ChangeCountryProb is high, meaning there are more inter-country migrants, or the NonLocalCountryProb is high, meaning that there may be only a few migrants but they are more likely to travel long distances. When there are few migrants and they also tend to move short distances, there is a much greater resulting effect on the MRCA date.

Interestingly, the same does not hold true for the ACA date, shown in Figure 6. In fact, all of the parameters seem to interact in determining it. As a result, the ACA date becomes increasingly sensitive and the ratio between it and the MRCA date increases when all of the parameters are assigned more conservative values.

3 Model B: Coarse real-world topology

The fully-connected world of Model A was an interesting forum to experiment with the parameters of the model because of its resemblance to more traditional structured coalescence models. However, it clearly bears little resemblance to the real world. Model B, therefore, takes a small step towards a more realistic model of the world, using the map shown in Figure 7. The continents are intended to resemble, clockwise from the lower left, Africa, Eurasia, North America, South America, and Australia/Oceania. The continents are internally the same as in Model A, except that Eurasia is twice as wide as the others. There are only four bidirectional ports in this model, with South America connected to North America and the other continents connected to Eurasia. We are interested primarily in the effect this structure will have on the spread of lineages

Figure 7: Model B. A highly simplified world map.
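The additive and multiplicative predictions discussed for simulation A7 can be approximately reproduced from the rounded percentages reported for A2 through A6. The Python sketch below assumes the A1 baseline dates of 1720 BP (MRCA) and 2880 BP (ACA) quoted with Figure 3; the function names are illustrative. Because the inputs are rounded, the outputs should agree with the quoted 3570/5355 BP and 4533/6241 BP only to within about one percent.

```python
# Percent increases over simulation A1 reported in the text for A2..A6.
MRCA_INCREASES = [11.4, 17.2, 24.4, 24.4, 30.5]
ACA_INCREASES = [6.2, 12.9, 18.1, 19.8, 28.0]

# Assumed A1 baseline dates in years BP (rounded averages quoted with Figure 3).
A1_MRCA_BP = 1720.0
A1_ACA_BP = 2880.0

# Dates reported for the combined-parameter simulation A7, in years BP.
A7_MRCA_BP = 4910.0
A7_ACA_BP = 9790.0


def additive_prediction(baseline_bp, percent_increases):
    """Scale the baseline by one plus the sum of the fractional increases."""
    return baseline_bp * (1.0 + sum(percent_increases) / 100.0)


def multiplicative_prediction(baseline_bp, percent_increases):
    """Scale the baseline by the product of the individual growth factors."""
    factor = 1.0
    for pct in percent_increases:
        factor *= 1.0 + pct / 100.0
    return baseline_bp * factor


predictions = {
    "MRCA": (
        additive_prediction(A1_MRCA_BP, MRCA_INCREASES),
        multiplicative_prediction(A1_MRCA_BP, MRCA_INCREASES),
        A7_MRCA_BP,
    ),
    "ACA": (
        additive_prediction(A1_ACA_BP, ACA_INCREASES),
        multiplicative_prediction(A1_ACA_BP, ACA_INCREASES),
        A7_ACA_BP,
    ),
}

for label, (additive, multiplicative, actual) in predictions.items():
    print(
        f"{label}: additive {additive:.0f} BP, "
        f"multiplicative {multiplicative:.0f} BP, A7 actual {actual:.0f} BP"
    )
```

Both predictions fall short of the A7 dates of 4910 BP and 9790 BP, which is the gap the text attributes to interaction among the parameters.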