Warding off the Dangers of Data Corruption with Amulet Nedyalko Borisov Shivnath Babu∗ DukeUniversity DukeUniversity [email protected] [email protected] Nagapramod Mandagere Sandeep Uttamchandani IBMAlmadenResearch IBMAlmadenResearch [email protected] [email protected] ABSTRACT orfirmwareaswellasmistakesbyhumanadministratorsaremore worrisome. Bugs in the hundreds of thousands of lines of disk Occasional corruption of stored data is an unfortunate byproduct firmware code have caused corruption due to misdirected writes, ofthecomplexityofmodernsystems. Hardwareerrors, software partialwrites,andlostwrites[14]. Bugsinstoragesoftware[11], bugs,andmistakesbyhumanadministratorscancorruptimportant OSdevicedrivers,andhigher-levellayerslikeloadbalancers[21] sourcesofdata.Thedominantpracticetodealwithdatacorruption and database software [6] have caused corruption and data loss. todayinvolvesadministratorswritingadhocscriptsthatrundata- Recenttrendsmakedatacorruptionmorelikelytooccurthanever: integritytestsattheapplication,database,file-system,andstorage • Productionuseoffairlynewdatamanagementsystems:Abug levels. Thismanualapproachistedious,error-prone,andprovides intheCouchDBNoSQLsystemcauseddatalossbecausewrites no understanding of the potential system unavailability and data werenotbeingcommittedtodisk[6]. Arecentbugtriggered lossifacorruptionweretooccur.WeintroducetheAmuletsystem byastoragesoftwareupdatecaused0.02%ofGmailusersto that addresses the problem of verifying the correctness of stored losetheiremaildata(whichhadtoberestoredfromtape)[11]. data proactively and continuously. To our knowledge, Amulet is • Use of large numbers of commodity “white-box” systems in thefirstsystemthat:(i)givesadministratorsadeclarativelanguage datacentersinsteadofmoreexpensiveservers.Thelowerprice tospecifytheirobjectivesregardingthedetectionandrepairofdata comesfromtheuseoflessreliablehardwarecomponentsthat corruption; (ii)containsoptimizationandexecutionalgorithmsto aremorepronetocorruptionandfailures[3,22]. ensurethattheadministrator’sobjectivesaremetrobustlyandwith • Moresoftwarelayersduetovirtualizationandcloudservices: leastcost,e.g.,usingpay-as-youcloudresources;and(iii)provides CustomersoftheAmazonSimpleStorageService(S3)haveex- timelynotificationwhencorruptionisdetected,allowingproactive perienceddatacorruptionwherethedatatheygotbackonreads repairofcorruptionbeforeitimpactsusersandapplications. We wasdifferentfromthedatatheyhadstoredoriginally[21]. describe the implementation and a comprehensive evaluation of Datalosscanhaveseriousconsequences.Ittookonlyoneunfortu- Amuletforadatabasesoftwarestackdeployedonaninfrastructure- nateinstanceoffile-systemcorruption(whichspreadtodataback- as-a-servicecloudprovider. ups), and the consequent loss of data stored by users, to put the CategoriesandSubjectDescriptors social-bookmarkingsiteMa.gnolia.comoutofbusiness[15]. Most systems have a first line of defense to corruption in the H.2.0[DatbaseManagement]:DataProtection formofdetectionandrepairmechanisms.Storingchecksums,both GeneralTerms atthesoftwareandhardwarelevels,isacommonmechanismused Management,Reliability,Verification todetectcorruption[18].Storingredundantdata—e.g.,intheform oferrorcorrectingcodes(ECC)orreplicas—aswellasduplication 1. INTRODUCTION ofwork—e.g., writingtotwoseparatehosts—lowersthechances ofdatalossduetocorruptionfrombitflips,partialwrites,andlost Datacorruption—wherebitsofdatainpersistentstoragediffer writes. Despitethesemechanisms,recentliterature[14]aswellas fromwhattheyaresupposedtobe—isanuglyrealitythatdatabase plentyofanecdotalevidenceshowthatproblemsduetocorruption and storage administrators have to deal with occasionally; often happen,andmorefrequentlythanexpected[3,22]. Aparticularly when they are least prepared [6, 11, 14, 17, 25, 30]. Hardware dangerousscenariothattheauthorsaswellasothers(e.g.,[15,17]) problems such as errors in magnetic media (bit rot), erratic disk- havecomeacrossinvolvesthepropagationofcorruptionfromthe armmovementsorpowersupplies, andbitflipsinCPUorRAM productionsystemtocriticalbackups. duetoalphaparticlescancausedatacorruption. Bugsinsoftware Thus, systems have developed a second line of defense in the ∗SupportedbyNSFgrants0964560and0917062 formofdata-integritytests(hereafter,tests). Atest: (a)performs checksinordertodetectspecifictypesofdatacorruption, and/or (b)repairsspecifictypesofdatacorruption. Table1listspopular testsfordetectingandrepairingcorruptionthatoccurindifferent Permissiontomakedigitalorhardcopiesofallorpartofthisworkfor systemlayersTestshavethefollowingcharacteristics: personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare • Tests perform more sophisticated detection and repair of cor- notmadeordistributedforprofitorcommercialadvantageandthatcopies ruptionthanispossibleautomaticallyduringregularsystemop- bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to erationthroughmechanismslikechecksumsandRAID[25]. republish,topostonserversortoredistributetolists,requirespriorspecific • Barringfewexceptions,testshavebeendevelopedtoberunof- permissionand/orafee. flinewhenthesystemisnotservingaworkload. Ifaworkload SIGMOD’11,June12–16,2011,Athens,Greece. Copyright2011ACM978-1-4503-0661-4/11/06...$10.00. Test System Description Does Runs Repair Online ApplicationLevel Eseutil, Exchange Useschecksumsandstructurerules(knowledgeoflogicalschemaandphysicalproperties)tocheckthemailboxdatabasefor Yes No Isinteg Server errorsinmessages,folders,orattachments;checkswhetheralldatapagesarecorrectandmatchtheirchecksums[9] Tripwire Anyfile Createsadatabasewiththehashandattributesofthecontentofallfilesinthesystem.Checksformismatchbetweenthecurrent No No stateofthefilesandtheinformationstoredinthedatabase[26] par,par2 Anyfile Usesanerrorcorrectingcodetocreateparitydata.Checkswhetherthegivenfile’scontentmatchesitsstoredparitydata[20] Yes No DatabaseLevel myisamchk MySQL Useschecksumsandstructurerules(flags,tablemetadata)tocheckwhetherdatapagesandrecordsarecorrectandindexes Yes No pointtocorrectrecords.Thissuitecontains5distinctteststhatdoincreasinglyrigorousandtime-consumingchecks[16,25] DBVerify Oracle Useschecksumsandstructurerules(head&tailinfo)tocheckthedatabasecontentanddatapages.Cancheckbackups[19] No No db2dart DB2 Useschecksumsandstructurerules(pageheaders,propertiesofrecords)tocheckthedatabase,datapages,andrecords[7] Yes No Checkdb, SQL Usesstructurerules(pageheaders,propertiesofindexes)tocheckwhetherdatapagesarestoredcorrectlyandindexentries Yes No Checktable Server pointtothecorrectrecords[24] File-SystemLevel xfs_check, XFS Usesthefile-system’sjournal(logofoperationsdone)andverifiestheintegrityoftheinodehierarchyandthatthecontentof Yes No xfs_repair theinodesisinsyncwiththestoredfile-systemdata.Checkssuperblock,free-space,andinodemaps[28] fsck ext3, Usesthejournal(inext4)andfilestructures(inext3)tocheckthesuperblock,filepathnames,datablockconnectivity,andthe Yes No ext4 fileandinodereferencecounts[10] ScanDisk, FAT*, Usesfilestructurestocheckthediskheaders,filecontent,fileattributes,lostdiskspace,andfilecrosslinks[5] Yes No Chkdsk NTFS zpoolscrub ZFS Usesbuilt-inblockchecksumsandblockreplicationmechanismsintheZFSfile-systemtocheckfilecontentforerrors[29,30] Yes Yes StorageLevel Scrubbing Disks Useschecksumsstoredatthelevelofdiskmediablockstoverifythateachblock’scontentmatchesitschecksum[23] No No Scrubbing RAID UsesstoredparityinformationorreplicasattheRAIDlevel(insteadofperdisk)toverifythecontentofeachdatablock[14] Yes Yes Table1:Dataintegrityteststodetectandpossiblyrepairdatacorruptionatdifferentlevelsofthedatabasesoftwarestack DescriptionofExampleObjectivesinEnglish Optimization 1 Ifthemyisamchktestdetectscorruptioninthelineitemtableinmy Angel Parser Test Models Snapshot Manager MySQLOLTPDBMS,thenIwanttohaveimmediateaccesstoan Program Snapshot Store 2 o(Aldseerccuorrirtuypvtuiolnne-frraebeilviteyrspiaotnchofwtahseatpabplleietdhianttihseleesxst4thfianle-1syhsotuermotlhda.t Plans ObjePcatrisveedsOptimizerModels SCnoallpeschtoort SCnoallpeschtoort myproductionDBMSisusing. Iamafraidthatthepatchmayinad- vertentlycausedatacorruption.)Runthefsckfile-systemtestatleast Approved Snapshot onceeveryhour.Notifymeimmediatelyofanycorruptiondetected. Plans Descriptors Production Host 3 MyproductionDBMSrunsonanAmazonEC2m1.largehost.Ihave Orchestration Application/ Web Server thesameobjectivesasin1,butIamwillingtospendupto12dollars Snapshot Listener Database System perdayforadditionalresourcesontheAmazoncloudtomeetthese Snapshot objectives. Howrecentofacorruption-freeversionofthedatacanI Descriptors File-systems . . . FS1 FS2 FSn haveimmediateaccesstoifacorruptionweretobedetected? Plan Manager 4 Myobjectivesareacombinationof1and2,butIwantthetimein- Progress Snapshot V1 V2 . . . Vn tervalstobe30minutesinsteadof60. Iamcostconscious. What Updates Descriptors Host for Volumes minimumnumberofm1.smallEC2hostsshouldIrenttoruntests? Host Manager Testing Cloud Provider Access Resource Table 2: Examples of objectives that an administrator may Details Requests Host for haveregardingtimelydetectionandrepairofdatacorruption Cloud Manager Testing Cloud API changes the data concurrently with a test execution, the test may detect (and worse, fix) spurious corruptions. The work- Figure1:OverviewofAmulet’soptimizationandorchestration loadcouldalsoreturnincorrectresultsbecauseofmodifications madebythetest. Asoneexample,itisrecommendedthatthe different sources of data. All these data sources have to be kept file-systembeunmountedwhilerunningthefscktest. corruption-free to guarantee correct behavior, good performance, • Mostofthetestsareveryresource-intensive. andavailabilityofapplicationsrunningonthesoftwarestack. Thedatabaselevelhasdataintablesaswellasplentyofmeta- Becauseoftheabovecharacteristicsoftests,databaseandstorage data such as indexes, materialized views, and information in the administratorsoftenstrugglewithquestionsonwhenandwhereto databasecatalog. Databasesstoretheirdataandmetadataasfiles runtests.Iftheadministratorisnotproactiveinrunningtests,then, and directories in a file-system or directly as blocks on volumes. whencorruptionstrikeseventually,highsystemdowntimeanddata Thefile-systemlevelhasfilescontainingdatastoredbythedatabase loss(andpossibly,lossoftheadministrator’sjob)willresult. level, as well as metadata such as the directory structure, inodes Administratorsusuallyhavespecificobjectivesinmindforproac- (indexesstoringfile-to-blockmappings)andjournals(logofoper- tivedetectionandrepairofdatacorruption.Table2givesexamples ationsdone). Afile-system,inturn,storesitsowndataandmeta- ofsuchobjectives. Toourknowledge,nosystemtodayhelpsad- dataonavolume.Avolumeprovidesaninterfacetoreadandwrite ministratorsspecifyobjectivesliketheseeasily,andautomatesthe blocksofdata. Beneaththisinterface,thevolumemaybeaphysi- nontrivialtaskofrunningteststomeettheseobjectives.Theresult calblockdevice(e.g.,aharddiskorsolidstatedrive)oralogical isusuallyaconvolutedmixofadhocscriptsandtestingpractices entity(e.g.,representingstorageonanetworkedserveroracombi- withnobodyhavingaclearideaofthedowntimeanddatalossa nationofpartitionsfrommultipleharddisks). potentialcorruptioncancause. ProactiveTestingforDataCorruption:Testsareruntoverifythe 1.1 Amulet: ChallengesandOverview correctnessofdata.Table1listscommonly-usedtestsateachlevel Atypicaldatabasesoftwarestackthatproductionsystemsuseis ofthedatabasesoftwarestack. Forexample,MySQL’smyisamchk showninFigure1. Differentlevelsofthesoftwarestackmaintain suitecontainsfivedifferenttestsinvokedthroughdistinctinvoca- S Snapshot i Snapshot Received Tested Snapshot Test execution i Host1 S ,T S ,T S ,T S ,T S ,T S ,T ... 0 M 2 M 4 M 6 M 8 M 10 M Time 0 20 40 60 80 100 120 140 160 180 (min) S S S S S S S S S S S S S 0 1 2 3 4 5 6 7 8 9 10 11 12 Figure2: Actualexecutiontimeline,inminutes,ofatestingplaninAmuletforExample1fromTable2. Eachboxdenotesarunof themyisamchk(T )testontherespectivesnapshotS .Thehorizontalwidthofeachboxcorrespondstothetestexecutiontime. M i tion options: fast, check-only-changed, check (default), medium- sibly repair data corruption anywhere in the software stack. To check,andextended-check.Thesetestsapplychecksofincreasing useAmulet,asshowninFigure1,auserorapplicationsubmitsa sophisticationandthoroughnesstoverifythecorrectnessoftables declarativeAngelprogramthatreferencesoneormorevolumeson and indexes in the database. The checks include verifying page- theproductionsystem. ForeachvolumeV,theprogramspecifies: levelandrecord-levelchecksumsaswellasverifyingthateachin- (a)theteststoberunondatacontainedinV,and(b)theobjectives dexentrypointstoavalidrecordinthecorrespondingtable, and tobemet.ForvolumeV,Amulet’sOptimizerwillgenerateaneffi- viceversa. Thefsckandxfs_checktestsverifythecorrectnessof cientexecutionstrategy—calledatestingplan—usinganoptimiza- metadata and data in the ext3 and XFS file-systems respectively. tionalgorithmthatmaximizesorminimizesoneobjectivesubject Forexample,theyensureconsistencybetweenthefile-systemjour- tosatisfyingallotherobjectives. Amulet’sOrchestratorwillexe- nalandthedatablocks,andverifythatallthedatablockpointers cutethetestingplanautomaticallyandcontinuouslybyprovision- intheinodesarecorrect. ingtestinghostsandschedulingtestsonaresourceprovider. ThefirstchallengethatAmuletfacesishowtorunatestauto- Figure2showsanactualexecutiontimelineofatestingplanP matically. MostofthetestsinTable1cannotberunconcurrently foranAngelprogramcorrespondingtoExample1fromTable2. withtheregularworkloadontheproductionsystembecauseofper- PlanP usesonetestinghostthatrunsthemyisamchktestonsnap- formanceandcorrectnessproblems.Thetestscanconsumesignif- shotstakenfromtheproductionhost. Onesnapshotistestedevery icantCPUorI/Oresources. Thetestsmayalsohavetolocklarge 30minutes,andeachtesttakesaround20minutestocomplete.As amountsofdata,makingresponsetimesfortheproductionwork- wewillseeinSection7,thisplanminimizesexecutioncostwhile loadslowandunpredictable.Amuletaddressestheseproblemsus- meetingtheobjectiveofcontinuouslymaintainingatestedrecov- ingthefollowingthree-stepapproachtoruntestsautomatically: erypointforapast1-hourwindow.Thistestingplan,whilesimple, 1. Create snapshot: A snapshot is a persistent copy of a point- illustratesanumberofchallengesfacingAmulet. in-time version of the data needed for a test. Snapshots can Characterizingthetestingplanspace: Atestingplanhasmultiple betakenatthedatabase,file-system,orvolumelevels. Inthis aspects. First, thereisaprovisioningaspectthatdetermineshow paper, we focus on volume-level snapshots because they cap- manytestinghostsareusedtomeetthespecifiedobjectives. Sec- turethedataneededforanytestinthesoftwarestack. While ond,thereisaschedulingaspectthatdeterminestherateatwhich thetimetocreatethefirstsnapshotwilldependonthevolume snapshotsaretestedandhowtestrunsarescheduledontheprovi- size, later snapshots only need to copy the changes since the sionedhosts. Third,thereisasustainabilityaspectthatdetermines lastsnapshot(similartoincrementalbackups).1 whether the plan will continuously meet the specified objectives 2. Runtests:Asnapshotisloadedontooneormoretestinghosts astimeprogresses. Section5givesaformalcharacterizationofa wheretestsarerun.AsshowninFigure1,testinghostsaredif- testingplaninAmulet,therebydefiningthespaceoftestingplans. ferentfromtheproductionhosttoavoidperformanceproblems. Developingacostmodelfortests:Tofindwhetheraplanenumer- 3. Applychanges: Iftestsdetectandrepaircorruptioninasnap- atedfromthetestingplanspacewillmeettheobjectivesspecified shot,thentheadministratorcanchoosetoapplythesechanges inanAngelprogram,theOptimizerneedsmodelstoestimatethe orloadtherepairedsnapshotontheproductionsystem. executiontimesoftestsscheduledbytheplan.Anovelcomponent Amulet’sDeclarativeLanguage:Makingiteasyandintuitivefor ofAmuletisalibraryofmodelstoestimatetestexecutiontimes. administrators to declare objectives like those in Table 2 poses a ThelibrarycurrentlycoverstestsfortheMySQLdatabaseandthe nontriviallanguagedesignproblem. Adissectionoftheexamples ext3andXFSfile-systems;discussedfurtherinSection3. inTable2revealstheimportantfeaturesthatareneeded: Findingagoodtestingplan:ForeachvolumereferencedinanAn- • Specificationofoneormoretestst,andassociatingtwiththe gelprogram,theOptimizerhastofindagoodplanfromahugeplan dataandtypeofresourcesonwhichitshouldberun. space.Weproposeanovelalgorithmforthisoptimizationproblem • Specification of tested recovery points to be maintained over thatconsidersallthreeaspectsoftestingplans:provisioningtesting a recent window of time. These points enable quick system hosts, schedulingtestsonsnapshotsandhosts, andensuringplan recoveryincaseseriouscorruptionisdetected. sustainabilityovertime. Whileouralgorithmisnotguaranteedto • Ability to declare different objectives such as the minimum findtheoptimalplan,weshowempirically—basedoncomparisons numberoftestrunspertimewindoworacostbudgetforpro- withanexhaustivesearchalgorithm—thatouralgorithmisveryef- visioningpay-as-you-goresourcestoruntests. ficientandfindstheoptimalplanmostofthetime. • Abilitytocombinemultipleobjectivesaswellastospecifyan optimizationobjective. Amulet’s Orchestration Phase: After submitting an Angel pro- gram, theadministratorcanviewthetestingplansgenerated, and Angel,Amulet’sdeclarativelanguage,isdesignedtosupportsuch when satisfied, submit the plans to Amulet’s Orchestrator for ex- features. The semantics and simplified syntax of Angel are de- ecution. TheOrchestratorexecutestestingplanscontinuouslyby scribedinSection4andTable4. working in conjunction with a Snapshot Manager and a resource Amulet’sOptimizationPhase: Amuletcanrunacomprehensive provider,bothofwhichareexternaltoAmulet.TheSnapshotMan- suiteoftests,includingnewuser-definedones,todetectandpos- ager notifies the Orchestrator when a new snapshot of a volume on the production system isavailable for testing. The Orchestra- 1Production deployments that need near-real-time disaster recov- tor allocates testing hosts from the resource provider which, cur- erytakesnapshotsregularlyandstorethemoncloudstorage[27]. rently, can be any infrastructure-as-a-service cloud provider. A 2.2 DealingwithDataCorruption major challenge faced by the Orchestrator is in dealing with un- ThetechniquescategorizedasthefirstlineofdefenseinSection predictableeventsarisingduringplanexecution: 1 check for data correctness during reads and writes in the pro- • Repairs: Itisimpossibletopredictwhenacorruptionwillbe ductionworkload; usuallybasedonadditionalstoredinformation detectedandarepairactionneedstobetaken. likeparitybitsandchecksums. Thesetechniquesarenotsufficient • Stragglerhosts: Ahostusedtoruntestsonthecloudmaybe- topreventordetectcorruptioncausedbycomplexissuessuchas comeslowtemporarily,causingthetestexecutionscheduleto lostandmisdirectedwritesduetobugsinthesoftwarestack[3]. lagbehindtheoptimizer-plannedschedule. Forexample, theauthorsof[14]showtheinabilityoftechniques • Wrongestimates:Lagsinthetestingschedulecanalsobecaused likeparity-basedRAIDtoavoiddatacorruption. Theauthorsalso byinaccurateestimatesoftestexecutiontimesfromthemodels. showhowcommontechniquesusedinRAIDcanspreaddatacor- RatherthancomplicatingtheOptimizerormakingunrealisticas- ruptionacrossmultipledisksandcausedataloss. Thefirstlineof sumptions, Amulet’s solution is to reserve a cost budget in each defense adds performance overheads during workload execution. testing plan that the Orchestrator can use to provision additional Enterprise systems have historically preferred performance over hosts on demand to deal with unpredictable events; discussed in the(wronglyassumed)rarechanceofdatacorruption. Amuletad- Section6. Thenoveleffectisthatatestingplanhasastatically- dressestheseproblemsbyenablingcomplexandresource-intensive plannedcomponentgeneratedbytheOptimizeraswellasanadap- tests like those in Table 1 to be run in a timely fashion. To our tivecomponentmanagedbytheOrchestrator.Section7willpresent knowledge,Amuletisthefirstsystemofitskindthatworksacross comprehensive experimental results from a prototype of Amulet differentpoint-in-timecopiesofdatatodetectandrepairdatacor- runningontheAmazoncloud. ruptionefficientlyintheend-to-endsoftwarestack. Modelingtheperformanceoftestsorimprovingtheirefficiency 2. RELATEDWORK hasreceivedlittleattention.Forexample,mosttestsstillusesingle- threadedexecutionandcannotexploitmulticoreCPUs. Currently, A number of recent empirical studies show that corruption of everytesttisanopaqueexecutionscripttoAmuletapartfromthe critical data is a reality and occurs much more commonly than model used to estimate t’s execution time. With more visibility assumedpreviously. Itisperhapssurprisingthatthedatabasere- intotests,Amuletcandoabetterjobofoptimizingt’sexecution. searchcommunityhaspaidlittleattentiontothisproblem. A promising development in this regard is the writing of tests in 2.1 TheDangersofDataCorruption declarativelanguageslikeSQLasdonein[12]. The authors of [3] analyzed corruption instances recorded in Amulet’s goal of early detection and repair of data corruption more than 10,000 production and development storage systems. formsacrucialpartofdisasterrecoveryplanning. Theauthorsof Their main focus was on studying silent data corruption which [27]arguethatcloudcomputingplatformsarewellsuitedforoffer- is corruption undetected by the disk drive or by any other hard- ingdisasterrecoveryasaservicedueto(a)thecloud’spay-as-you- ware component. Among corruption instances logged over a 41- go pricing model that can lower costs, and (b) the cloud’s use of monthperiodamongatotalof1.53milliondiskdrivesofvarious elasticvirtualplatforms. Amuletisaproof-of-conceptsystemfor types,theauthorsfoundmorethan400,000instancesofchecksum thisargumentappliedtotheproblemofdatacorruption. mismatches. The study also showed that cheaper nearline SATA The concept of declarative system management is gaining cur- disks(andtheiradapters)developchecksummismatchesanorder rency. Chef, Puppet, and Microsoft SQL Server’s policy-driven of magnitude more often than the more expensive and carefully- managerarenowpopulartoolsthattakedeclarativespecifications engineeredSCSIdisks. However,corruptioncanalsooccurinthe as input, and then configure and maintain systems automatically latterwhichareenterprise-classdrives. [13]. However, unlike Amulet, these tools do not support objec- Theauthorsof[22]analyzedmemoryerrorscollectedoverape- tivesthatarespecifieddeclarativelyandoptimizedautomatically. riodof 2.5years inthemajority ofservers usedbyGoogle. The 3. MODELINGOFTESTS authorsfoundthattherateofdatacorruptioninDRAMisordersof magnitudehigherthanpreviouslyreported,withmorethan8%of For each test t, Amulet’s Optimizer needs a model to estimate dualin-linememorymodules(DIMMs)affectedbyerrorsperyear. t’srun-timebehavior—e.g.,executiontime,usageofCPU,mem- Memoryerrors canbe classifiedinto softerrors, thatcorrupt bits ory,andI/Oresources—whentisrunongivendataandsystemre- randomlybutdonotleavephysicaldamage; andharderrors,that sources.Wedividetheinputparametersforatestmodelintothree corrupt bits in a repeatable manner because of a physical defect. categories: (i) data-dependent attributes, (ii) resource-dependent Memoryerrorsfoundinthestudyweredominatedbyharderrors, attributes,and(iii)attributestocapturetransienteffects.Inthissec- ratherthansofterrorsasassumedpreviously. tion,wediscussattributesinthetestmodelandtheirimpactforthe Injectingfaultsintothedatabasesoftwarestackprovideinsights myisamchk,fsck,andxfs_checktestsfromTable1. Ourfocusis intosystembehavioranddatalossunderdifferenttypesofcorrup- onmodelsforestimatingtestexecutiontime.Notethatwegenerate tion.Arecentstudyusedfaultinjectionsintoapopularopen-source separatemodelsforthesametestwheninvokedwithsignificantly DBMS(MySQL)toshowthatcertaintypesofdatacorruptioncan differentoptions. Forexample,fsckhasseparatemodelsbasedon harm the system, e.g., causing system crashes, data loss, and in- whether it is invoked to check file-system metadata compared to correct results [25]. The authors also point out that concurrency data. Thefsckmetadatatestinvolvesverifyingthesuperblock,in- controlandrecoveryfeaturesofdatabasesystemsarenotdesigned odes,andfreeblocklist,whilethedatatestdoesafullscantofind todetectorrepaircorrupteddataormetadataresultingfromhard- allbadblocks. ware,software,orhumanerrors.Asimilarstudyhasbeendonefor 3.1 Data-dependentAttributes theZFSfile-systemthat, comparedtopopularLinuxfile-systems like ext3 and XFS, has novel features like end-to-end checksums Data-dependentattributeshaveafirst-orderimpactontestexecu- forcorruptiondetection[30]. TheauthorsshowthatwhileZFSis tiontimes. Fortunately,thisimpactcanbecapturedfullybasedon veryresilienttodisk-levelcorruption,memory-levelcorruptioncan propertiesthatcorrespondtothesizeofthedataandareeasilymea- leadtocrashesandincorrectresults. surable.Differentdata-dependentattributesarerelevantdepending pliteon i eTmi()nm 11223344505050505 pmliteon i eTmi()nm 111110246846800000000 File size =1 KB File size = 64 KB pmli teoni eTmi()nm1122334455605050505050 pmliteon i eTmi()nm 1023456789 oCm 00 20 40 60 80 100 120 140 160 180 200 oC 2000 1 2 3 4 5 6 7 8 9 10 11 oC 059 18 27 36 45 54 63 72 81 90 oC 010 1 2 3 4 5 6 7 8 9 File-system Size (in GB) Used Inodes (in millions) Data size (GB) Number of Indexes (a) Varyingthefile-systemsizefor(b) Varyingthenumberofusedin-(c) Varyingtheinputdatasizefor(d) Varying number of indexes fscktestonext3 odesforxfs_checkonXFS myisamchkmediumtest formyisamchkmediumtest Figure3:Effectofdata-dependentattributesonthecompletiontimeofthefsck,xfs_check,andmyisamchktests Small Medium Large 40 concurrent sequential leopCmi tonTi (emmi)n 11122258140369 Check Mediu m-check Extende d-check opCmliteon i eTmi()nm 2222333332468024681 16 256 512 1024 2048 4096 opCmli teoni eTmi()nm 1111122024680246802 Instance Type Key buffer size (MB) Test set 1 Test set 2 (a) Effect of Amazon EC2 instance type on(b) Effect of MySQL’s key_buffer_size pa-(c) Concurrentvssequentialexecutionofmy- myisamchktestsfora14GBdatabase rameterformyisamchkextendedtest isamchktestsw/diff.invocationoptions Figure4:Effectofresource-dependentattributesandconcurrenttestexecutionontheexecutiontimeofmyisamchktests onwhetherthetestisattheapplication, database, file-system, or Figure 4(a) shows the execution times of myisamchk tests for storagelevels. Forfsckandxfs_check,weexploredawiderange different Amazon EC2 host types for a total database size of 14 ofattributestocapturethefile-systemdata:sizeofthefile-system, GB.Weobservea100%speedupforthemyisamchkextendedtest total number of inodes, number of u sed inodes, av erage file size, betweena smallhost(1.7GBmemorywith1EC2computeunit) files per directory, and size of inodes and data blocks. Based on andalargehost(7.5GBmemorywith4EC2computeunits). In acomprehensiveempiricalanalysis,wefoundthattherearethree general,weobservedthatthemorethoroughmyisamchktestsare attributeswiththemostimpact:totalinodes,usedinodes,andaver- bothcompute-andmemory-intensive. Executiontimesoffsckdid agefilesize. Totalinodesrepresentsthetotalavailablecapacityof notvarysignificantlyacrossthedifferentAmazonEC2hosttypes. thefile-system,whileusedinodesareindicativeoftheusedspace. fsck is I/O-intensive and does not exploit the increased CPU or The block check version of fsck on the ext3 file-system veri- memoryresourceswhengoingfromthesmalltothelargehosts. fies the correctness of the used portion of the file-system as well Software-levelresourceslikecachescanalsoimpacttestexecu- as that of the unused inodes and free blocks. Thus, as shown in tion times. For example, the myisamchk tests use a cache called Figure3(a),theexecutiontimeofthefsckdatatestdependsonthe thekeybuffer.Thesizeofthiscache,givenbythekey_buffer_size file-systemsize. Incontrast, theoptimizedxfs_checktestforthe input parameter, affects the execution time of certain myisamchk XFS file-system verifies only the used portion of the file-system. tests. Figure4(b)showsthevariationintheexecutiontimeofthe Thus, theexecutiontimeofxfs_checkdependsonthenumberof myisamchkextendedtestforinputdataofsize12GB. usedinodesasshowninFigure3(b),andisindependentofthetotal file-systemsize. 3.3 TransientEffects For the myisamchk tests, the total input data size to the test is Testexecutiontimescanvaryduetotransienteffectslikewarm theprimarydata-dependentattribute. Figure3(c)showsthelinear versuscoldcachesorresourcecontentionwhentestsareruncon- effect of the input data size on test execution time. In addition currently. AmajoradvantageAmuletenjoysinthiscontextisthat to the data size, the number of indexes present on table columns Amulet’s Orchestrator can ensure that tests are run in configura- affectstheexecutiontimeofcertainmyisamchktests. Onceagain, tionssimilartotheonesusedduringtestmodeling. Furthermore, theeffectislinearasshowninFigure3(d). aswediscussnext,Amulet’sOptimizerprioritizespredictablebe- 3.2 Resource-dependentAttributes havior(i.e.,robusttestingplans)overpotentialoptimalitysincethe Theexecutiontimeofatestisexpectedtovarywiththehardware formerismoreimportantinproactivetesting. andsoftwareresourcesallocatedtorunthetest. Thetrendamong WarmVs.coldcaches:Datacachingforthememory/diskinterface infrastructure-as-a-servicecloudprovidersistoprovidehardware withinahostdoesnotbenefittestssuchasfsckandxfs_checkthat resources such as CPU, memory, and storage in terms of a small accessdatadirectlyattheblocklevel. However,asmentionedbe- number of discrete choices. For example, the hardware resource fore, suchcachingbenefitshigher-leveltestssuchasmyisamchk. spaceisdiscretizedintomicro,small,medium,andlargenodecon- Datacachingatthehost/networkinterfacewhileusingnetworked figurationsontheAmazoncloud. Suchdiscretizationvastlysim- storagebenefitsmosttests. Wehaveobservedupto2xdifferences plifiesAmulet’staskofmodelingtestexecutiontimesforvarying intestexecutiontimesforwarmVs. coldcachesinthiscontext. resourceproperties. Thetestmodelshavebeenenhancedtoaccountfortheseeffects. Name Description Name SimplifiedSpecificationSyntax O1-On SSO,RPO,SIO,TCO,orCOobjectivesspecifiedpervol- Test Test(Data:data,Host:host,execscripts,...) umeinanAngelprogram(Table4givesasummary) Inputdatafortest Data(Volume:V,type,properties,...) Oopt OptimizationobjectiveforavolumeinanAngelprogram Hosttoruntest Host(type,setupscripts,...) t, s, h, τ, Usedtodenoterespectivelytests,volumesnapshots,hosts, Volume Volume(Host: production host where V is lo- x,andd timeintervals,numericconstants,andcurrencyvalues cated,pathonhost,volumeid,properties,...) τrpo ThetimeintervalinanRPO(seeTable4) Repairaction Repair(Test:t,t’sreturncode,execscripts,...) dPco TTehsetimngaxpilmanumforcoasvtobluudmgeetininaanCAOng(esleperToagbralem4) SSO: Safe Snap- Listoftests{t1,t2,...,tk},forvolumeV shotObjective PW PlanP’swindow.TheplanrepeatseveryPW timeunits RPO: Recovery Recovery_point≤τ,forvolumeV PI TimeintervalbetweensuccessivesnapshotsinplanP PointObjective PM Test-to-snapshotmappinginplanP SIO:SnapshotIn- Snapshot_intervalOpτ,forvolumeV PS ScheduleoftestexecutioninplanP tervalObjective PR P’scostbudgetreservedtohandleunpredictableevents TCO:TestCount Test_count(t)Opx,intimeintervalτ,forvol- ExecTime(t) Executiontimeoftestt Objective umeV Time(si) pTliamnePwrheelnatiivthetsonathpeshsotatrstio,f1th≤epila≤nwPPiWInd,oiwsaPvWailablein CtivOe: Cost Objec- Cfoorsvtol≤umde,Vintimeintervalτ,withreservationx%, start(h) Timewhenhosthisfirstusedintheplanwindow Oopt: Optimiza- Maximize (when Op is ≥ in an SIO or TCO), end(h) Timewhenhosthwillfinishitslastscheduledtestforthe tionObjective Minimize(whenOpis≤inanRPO,SIO,orCO) planwindow(end(h)canbe>PW) Notification SQLtriggersoneventtablesinlogdatabase cost(PS), CostincurredfortheplanschedulePSorahosthforone cost(h) planwindow Table 4: Summary of important Angel statements. Op ∈ {≤ ,≥}. t,τ,x,anddareconstantsofrespectivetypestest,time Table3:Notationusedinthepaper interval,numeric,andcurrency Concurrent execution of tests on the same host: Concurrent exe- rpo cutionmakestestexecutiontimesdifficulttoestimate. Figure4(c) illustratesthiscomplexity. Testsets1and2arerunsoftwomy- ExecTime(t) isamchktestsontwodifferenttables. Thetestsineachsetleadto differentoutcomeswhenruninconcurrentversussequentialfash- Time (si-1) Time (si) Time (si+1) ion. Thetestsinset2causememorythrashingwhenrunconcur- Figure5:IllustrationofRPO rently.Sincethenumberofdistincttestsisnotlarge,2wecantrain 4.1 Objectives models to estimate execution times for tests run concurrently in An Angel program can specify one of five types of objectives. pairs. As such, if Amulet’s Optimizer does not have a model to Theseobjectivescanbeusedindependentlyorcombinedtogether estimatetherunningtimeofconcurrenttests,itwillsimplychoose tospecifyavarietyofrequirementsforrunningtests. Eachobjec- nottoconsidertestingplanswithconcurrenttests. Thegoalofthe tiveOreferencesauniquevolume.TheOptimizerwillpartitionthe Optimizeristofindarobusttestingplan,wherearobustplanisone objectivesinanAngelprogrambasedonthevolumesreferenced, whoseperformanceisalmostnevermuchworsethanpromised. andgenerateonetestingplanpervolume. 4. ANGELDECLARATIVELANGUAGE SafeSnapshotObjective(SSO):AnSSOforavolumeV specifies the full list of tests t ,...,t to be run on a given snapshot s for RecallfromSection1.1thatanAngelprogramspecifiestestsand 1 k V before s can be labeled corruption-free. If none of the tests theobjectivestobemetwhilerunningthetests. Wenowdiscuss t ,...,t findcorruptionins,thensislabeledcorruption-free.Ifa themainstatementsinAngel.Table4providesasummary. 1 k testt detectsacorruptionins,thensislabeledcorruptedandany i Tests: Angel’sTeststatementdefinesatesttbyspecifyingthe repairactionassociatedwitht willberunons. i commandtoruntaswellasreferencestot’sinputdata(specified RecoveryPointObjective(RPO):Whencorruptionisdetectedin byaDatastatement)andthetypeofhostonwhichtorunt(spec- thedatainavolumeV,itisusefultohaveimmediateaccesstoa ifiedbyaHoststatement). Angel’sRepairstatementenablesa testedrecoverypointforV fromtherecentpast.Arecoverypointis repairactiontobeassociatedwithatesttforinvocationiftdetects acorruption-freesnapshotofV fromwhichthedatabasesoftware acorruption(asindicatedbyaspecificreturncodefromt). stackcanbebroughtbackonlinequickly. Recoverypointsarean Data: Angel’sDataandrelatedstatementsdefinetheinputdata essentialpartofdisasterrecoveryplanningstrategies[27]. foratest, includingthevolumethatthedatabelongsto, thedata Figure5showstwosuccessivesnapshotss ands foravol- i−1 i type(fromasetofsupportedtypes),andthedataproperties. The umeonwhichatesttisrun. Ifacorruptionweretobedetected properties, which are specific to the type of data, form inputs to whentrunsons ,thentheadministratorwantss tobearecov- i i−1 themodelsthattheOptimizerusestoestimatetestexecutiontimes. erypoint. AnRPOexpressesthisrequirement. RPOspecifiesthe TheOptimizerdoessemanticcheckstoensurethattheAngelpro- maximum (sliding) time interval τ into the past within which rpo gramspecifiesvaluesforallinputparametersrequiredbythemodel arecoverypointmustexistforavolumeifcorruptionweretobe for each test in the program. While helper tools are available to detectedonasnapshot. Amuletshouldensurethatthesumoftest extractthesevaluesfromthecorrespondinginputdata,theadmin- t’sexpectedrunningtimeandthesnapshotintervalbetweens i−1 istratorcanalsospecifyvaluesbasedontheirdomainknowledge. ands isnotgreaterthanτ (asillustratedinFigure5). Using i rpo Hosts: The primary use of Angel’s Host statement is to define notationfromTable3,theRPOmandates: ahosttype(fromasetofsupportedtypes)foratesttsothatthe ExecTime(t)+Time(s )−Time(s )≤τ (1) i i−1 rpo Orchestratorwillalwaysruntontestinghostsofthattype. An RPO together with an SSO with a list of tests t ,...,t is a 1 k 2Fromourexperience,mostadministratorsprefertousestandard powerful combination to express recovery points. Now Amulet teststhatcomewitheachsystem,ratherthanwritingnewtests. shouldensurethatalloft ,...,t willfinishons intime≤τ 1 k i rpo thenselectsonetestingplanpervolume. Theselectionofthetest- FindtheLeast-CostPlanP=(cid:104)PW,PI,PM,PS,PR(cid:105)thatSatisfiesthen ingplanistreatedasthefollowingoptimizationproblem: ObjectivesO1-On(withoutOopt)foraVolumeV inanAngelProgram 1. UseFig.7topickthesnapshotintervalPIforO1-On; Testing-plan Selection Problem for Volume V: Given n objec- 2. SelecttheplanwindowPW,andscaleO1-OntoPW (Section5.3); tivesO1-On (eachoftypeSSO,RPO,SIO,TCO,orCO)andan 3. UseFig.8topickthetest-to-snapshotmappingPMforO1-On,PW,PI; optimizationobjectiveOopt(oftypeMaximizeorMinimizeon 4. PicktestexecutionschedulePSforPW,PI,PM (detailsgivenin[4]); oneofO1-On)foravolumeV, findthetestingplan(ifany)that 5. PRisavailablefromtheCOinO1-On,oradefaultisused; meetsallofO -O whilegivingthebest(maximumorminimum, 1 n Figure6:Findingtheleast-costplanforgivenobjectivesO -O 1 n asappropriate)valueforO . opt 5.1 TestingPlansinAmuletforaVolumeV − (Time(s ) − Time(s )). If any of these tests were to detect i i−1 Formally,atestingplanP containsfivecomponents: acorruptionins ,thens willserveasarecoverypointthatis i i−1 1. Snapshot interval P is the uniformly-spaced minimum time notmorethanτ intothepast. Atthispoint, theadministrator I rpo interval between consecutive snapshots that the plan needs to ortheSnapshotManagercandecidehowtoproceedregardingthe testtomeetalltheobjectivesspecified. productionsystem: restartthesystemwithsnapshots , ignore i−1 2. WindowP isatimeintervalsuchthattheplanrepeatsevery thecorruptionfornow,etc.Amuletdoesnotinterferehere. P timeuWnits.Theplanprocesses PW snapshotsperwindow. SnapshotIntervalObjective(SIO):ForavolumeV,anSIOhas 3. TeWst-to-snapshotmappingP specifiPeIs,foreachsnapshotsin M theform: Snapshot_intervalOpτ,fortimeintervalτ and theplanwindow,thesetofteststhatneedtoberunons. Op in {≥,≤}. Snapshot_interval refers to the expected 4. TestexecutionscheduleP specifiesthenumberandrespective S time interval between two successive snapshots of V. Note that types of testing hosts to use, and when to run each test from AmuletdoesnotcontroltheSnapshotManagerwhichcollectssnap- P onthesehosts. M shotsfromtheproductionsystem. SIOsexpressthefeasiblesnap- 5. Reserved cost budget P is the part of the plan’s total cost R shotintervalsthatAmuletshouldconsiderduringplanselection. budgetthatisreservedfortheOrchestratortodealwithunpre- Test Count Objective (TCO): For a volume V, a TCO has the dictableeventsthatcanariseduringplanexecution. form: Test_count(t) Op x, in a time interval τ for a test t. The core of Amulet’s Optimizer is a cost-optimal planning algo- Here,Test_countspecifiesthenumberofuniquesnapshotsof rithmthatcanfindtheminimum-costplan(ifvalidplansexist)to V intheintervalτ onwhichAmuletshouldruntestt. Thetypical meetagivensetofobjectives.WewillbegininSections5.3–5.5by use of TCOs are to express requirements of the form: “Run t at describingthestagesinwhichthecost-optimalplanningalgorithm least four times every day.” A plan chosen by the Optimizer for works. AsillustratedinFigure6,eachstageselectsoneofthefive thisTCOwilldotheintuitivethingofspacingoutthefourtestruns componentsoftheminimum-costplan,goingintheorderP ,P , I W uniformlyinthespecifiedtimeintervalof1day. P , P , and P . Section 6 will describe how the Orchestrator M S R CostObjective(CO):ACOforavolumeV hastheform: Cost usesthereservedcostbudgetP . R ≤d,inatimeintervalτ,with(optional)reservationx. Here,dis IftheAngelprogram’soptimizationobjectiveisnotcostmini- a cost measure for the resource provider from which the Orches- mization,thenAmuletusesahigher-levelplanningalgorithm.This trator will allocate resources to run tests. The CO applies to the algorithm, discussed in Section 5.6, repeatedly invokes the cost- entiretestingplanthattheOptimizergeneratesforV.ACOcanbe optimal planning algorithm with a series of increasingly stricter specifiedonlyifthepricingmodelthattheresourceprovideruses objectivesuntilnovalidplancanbefound. Theoptimalplancan tochargeforresourceusageisinputtoAmulet.Section7describes beidentifiedatthatpoint. thepricingmodelthatweuseinourevaluation. 5.2 SelectingtheSnapshotInterval TheCOalsospecifiesafractionoftheoverallcostbudgetthat isreservedfortheOrchestratortorespondtothreetypesofunpre- TheOptimizer’sgoalinthisstageistopickthemaximumvalue dictable events that can occur during the execution of the testing thatPI canhavewhilemeetingalltheRPO,SIO,andTCOobjec- plan: repairs,stragglerhosts,andinaccurateestimatesoftestexe- tivesspecified.MaximizingPItranslatesintominimizingthenum- cutiontimesbythemodelsusedintheOptimizer. Section6will berofsnapshotsthatneedtobeprocessed. Consequently,thecost discusshowtheOrchestratorusesthereservedcostbudgettopro- oftheplanisminimized—whichisourgoal—sincemoresnapshots visionhostsondemandtohandletheseunpredictableevents. By meanhighertestexecutionandhostrequirements. default,thereservationissettothecostofusingonetestinghost. Figure7showsthestepsinvolvedinthisstage. Thealgorithm Optimization Objective (O ): An appropriate Maximize or goesthroughtheobjectivesonebyone,whilemaintaininganupper Minimize optimization objoepcttive (Oopt) can be specified along (PImax)andlower(PImin)boundonfeasiblevaluesofPI.Finally, thelargestfeasiblevalueofP ,ifany,isselected. withanRPO,SIO,TCO,orCOforavolumeV inanAngelpro- I gram. Maximize can be used when Op in the objective is ≥, 5.3 SelectingthePlanWindow andMinimizecanbeusedwhenOpis≤. Specifically,applying RecallfromSection4andTable4thattheobjectivesRPO,TCO, MaximizetoanSIOorTCOoftheformvalue≥constasksto andCOforavolumeV inanAngelprogramspecifytimeintervals. makeconstashighaspossible. ApplyingMinimizetoanRPO, TheplanwindowP servesasamechanismfortheOptimizerto SIO,orCOoftheformvalue≤constaskstomakeconstaslow W considertheintervalsinallobjectivesinauniformfashion. P is aspossible. OneandonlyoneO isallowedpervolume. The W opt pickedastheleastmultipleofP (P =n×P ,n∈N)suchthat defaultisMinimizeCOifnoO isspecifiedintheprogram. I W I opt P isgreaterthanorequaltothemaximumamong: (a)thetime W intervalsinRPO,TCO,andCOobjectives,and(b)ExecTime(t)for 5. OPTIMIZER eachtesttspecifiedinanSSOorTCOobjective. PickingP > W Inthissection,wewilldiscussthealgorithmusedbyAmulet’s ExecTime(t)foralltests(i.e.,Case(b)above)isneededtoensure Optimizer. GivenanAngelprogram,theOptimizerfirstpartitions thesustainabilityofschedulesaswewillexplaininSection5.5. theobjectivesintheprogrambasedonthevolumesreferenced,and OnceP hasbeendetermined,thecorrespondingparametersin W SelectingthePlanSnapshotIntervalPIinaTestingPlan SelectingtheTest-to-SnapshotMappingPM inaTestingPlan Inputs:ObjectivesO1-On(withsyntaxfromTable4) Inputs:ScaledobjectivesO1-On,PlanWindowPW,SnapshotIntervalPI 1. PImin=SnapshotintervalfromtheSnapshotManager; 1. for(everytesttreferencedinanSSOorTCOinO1,...,On){ 342... PifIm(τOra1px,o=.=.τ∞.;,;OPInmτacroxpno=ta=τirn∞2psoR;;P/*Ot:eRste≥co2vsenarpys_hoptosiinntτr≤poτto){meetRPO*/} 234... forCif(eO(vOUerNiysTOSmtSbijOnec:=tLivi0se;tOofCitnOesOUtsN1{,Tt.mt1.,.a.,x.O.=n,t)PkP{W}I);{} 6578.... forif(e(vfOoerrPiy(sTImOSeSbsatjOxetc:=itLnivMitse1tiOn,o[.fPi.tnIe.ms,Otatsk1x{,),t.τ1.r,.p.,o.O.-n,Et)xk{e}c)T{ime(t)];}/*Equation1*/ 567... if(fOorCi(sTOTeUCstNOtT:imtnTeitn1s,t=._.Mc.,aotxuk[nC)tO(U/t*)Nt≥Thmtaxsi,ntion,rPtuPiWmnIeo]n;in}atellrvPPaWlIPWsna)pshots*/ 9. if(OisTCO:Test_count(t)≥x,intimeintervalτ) 8. COUNTmin=Max[COUNTmin,x]; t t 10. PImax=Min[PImax,τrpo-ExecTime(t), xτ]; /*Equation1*/ 9. if(OisTCO:Test_count(t)≤x,intimeintervalPW) 11. if(OisSIO:Snapshot_interval≥τ) 10. COUNTmax=Min[COUNTmax,x]; t t 12. Pmin=Max[Pmin,τ]; 11. } I I 13. if(OisSIO:Snapshot_interval≤τ) 12. PM =∅; 14. PImax=Min[PImax,τ]; 13. for(everytesttreferencedinanSSOorTCOinO1,...,On){ 15. } 14. if(COUNTmt ax<COUNTmt in)return“NofeasiblePM exists”; 16. if(PImax<PImin)return“NofeasiblePIexistsforO1-On”; 15. else{ 17. elsesetPI =PImax; 16. MaptestttoCOUNTmt insnapshotsspreaduniformlyacrossthe Figure7:SelectionofPI (notationusedfromTables3and4) PPWI snapshotsintheplanwindow.AddthemappingstoPM;} 17. } Figure8:SelectionofP (notationusedfromTables3and4) allTCOandCOobjectivesarescaledproportionatelytoP . For M W example,aCOthatspecifiesacostbudget(d )ofU.S.$10in1 co hour, will be scaled to a cost budget of U.S. $15 for PW = 1.5 tothestartoftheplanwindow)whenahosthisfirstusedtorun hours. NotethatthetimeintervalinanRPO(τrpo)isindependent atestinthewindow. end(h)denotesthecorrespondingtimewhen ofPW,andshouldnotbescaled. host h will finish its last scheduled test for the window. (end(h) 5.4 SelectingtheTest-to-SnapshotMapping canbegreaterthanPW.)Forthescheduletobesustainableacross multiplesuccessivewindows,weneed: Forthe PW snapshotsinaplanwindow,thisstagedecideswhich testsneedtPoIberunonwhichsnapshots. Figure8showsthesteps PW +start(h)>end(h) (2) involved.ForeachtesttspecifiedinanSSOorTCO,thealgorithm This condition ensures that by the time host h is needed to run inFigure8maintainsupper(COUNTmt ax)andlower(COUNTmt in) testsforaplanwindow, alltestsscheduledonhfortheprevious boundsonhowmanysnapshotstshouldberunon.Testtismapped planwindowwillhavecompleted.Infact,testsscheduledonhfor atuniformly-spacedintervalstotheminimumnumberofsnapshots allpastwindowswillhavecompletedbecauseourtechniquefrom thattneedstoberunon. NotethatthetestsinanSSOshouldbe Section5.3toselectthewindowsizeP ensuresthatnotestrun runonall PPWI snapshots(Lines4-6inFigure8). willspanmorethantwoconsecutivewinWdows. The second way to schedule test t is to run t on a host h 5.5 SelectingtheScheduleofTestExecution ij ij k afteralltestscurrentlyscheduledonh complete. Apartfromthe k Afterthetest-to-snapshotmappingPM hasbeengenerated,the standardchecksforRPOviolationandsustainability,thealgorithm Optimizerselectsthetestexecutionscheduleaswellasthemini- alsocheckswhethert canbestartedovers onh beforethenext ij i k mumnumberofhostsneededforrunningthesetests. Thisstageis snapshots arrives. Theaimhereistoachieveabalancedtest i+1 byfarthemostcomplexoneintheOptimizer. NotethattheOp- executionworkload(totheextentpossible)throughoutthewindow. timizerisonlyidentifyingagoodschedule. Theschedulewillbe If it is not feasible to schedule t on a testing host that is al- ij executed—includingactualallocationoftestinghostsandrunning readyallocatedintheplan,thenthethirdwayistoschedulet on ij ofthetestsontheresourceprovider—onlyaftertheselectedtesting anewtestinghostaddedtotheplan.Thisstepwillusetheresource planissubmittedtotheOrchestrator. provider’spricingmodeltoensurethattheadditionofanewtest- Thecompletedetailsofthealgorithmusedinthisstagearegiven inghostwillnotovershootthecostbudgetspecifiedintheAngel intheonlinetechnicalreport[4].Thisgreedyalgorithmgoesthrough program. Notethattheallocationofanewtestinghosttorunt ij thesnapshotssi inoneplanwindowinorderfromi=1toi=PPWI , will not violate schedule sustainability because ExecTime(tij) ≤ aswellasthetestst thathavebeenmappedtos . Ahosth is P (fromSection5.3). ij i k W identifiedtorunt ons inoneofthreewaysasdiscussednext. ij i 5.6 HandlingNon-costOptimizationObjectives Thefirstwayisbymeansoftestgrouping,wheret willberun ij concurrentlywithanothertestorgroupoftestsonahostthathas Sofarwefocusedonfindingatestingplanthatminimizescost already been allocated to the plan. Recall that Amulet strives to while meeting all the given objectives. Amulet’s Optimizer can generate a robust testing plan, i.e., a plan whose chances of per- handle non-cost optimization objectives as well, and does so by forming worse than estimated is low [2]. If there is no model to repeatedly invoking the cost-optimal planning algorithm with in- estimate how the concurrent execution of a set of tests will per- creasinglystricterobjectivesuntilnovalidplancanbefound. The form, then the Optimizer will take the low-risk route of avoiding completedetailsoftwoalgorithmsthatwehavedevelopedfornon- suchexecutions. costoptimizationaregivenintheonlinetechnicalreport[4].These While selecting the test execution schedule, the algorithm also algorithmsdifferinhowthestricterobjectivesaregenerated: one addressestheimportantissueofschedulesustainabilitywhichen- algorithmdoessoinlinearincrementswhilethesecondalgorithm sures that the plan generated for one window can be run contin- usesabinary-searchtechniquetoimproveefficiency. uouslyforanynumberofwindowsthatcomeoneaftertheother. Considertheobjectiveofminimizingtheintervalτ inanRPO. rpo UsingnotationfromTable3,letstart(h)denotethetime(relative (Example3fromTable2hasthisoptimizationobjective.)Itemerges uleofexecutionfromtheOptimizer-estimatedschedule,andwith Snapshot Collector 1 Snapshot Collector 2 Snapshot Collector n repairactionsthatneedtoberunwhencorruptionisdetected. Orchestrator Snapshot Snapshot Snapshot DealingwithLags:Thisprocessinvolvestwosteps:(i)identifying Receiver Receiver Receiver stragglerhostsonwhichthelagisobserved;and(ii)allocatingone ormorehelperhostsforeachstragglerhostsubjecttothereserved Snapshot-to-Plan Mapper costbudgetP earmarkedfortheOrchestratortodealwithunpre- R dictableevents.Atestinghosthismarkedasastragglerwhentwo Plan Manager Plan Manager Plan Manager conditionshold: 1. The actual execution time of tests for a snapshot s on h has Host Host Host Host Host Host Manager Manager Manager Manager Manager Manager overshotthecorrespondingestimatedtimebymorethananal- lowedslack.(Theslackisusedtopreventoverreaction.)Strag- Cloud Manager glerhostsareprioritizedbasedontheageofs. 2. Thenextsnapshots(cid:48) onwhichhosthisscheduledtoruntests hasbecomeavailable. Host Host Host Eachhelperhosth(cid:48) forastragglerhosthwilltakeashareofh’s Host Host Cloud Provider Host workloadadaptively. Thehelperhosth(cid:48) willbeterminatedif,on completingtheexecutionoftestsonasnapshot,itisfoundthatthe correspondingtestinghosthisnolongerastraggler. Figure9:Amulet’sOrchestrator HandlingRepairs:Thisprocessalsoinvolvestwosteps: from Equation 1 that the way to achieve lower values of τrpo is 1. The first test t that detects a corruption on a snapshot s, and by reducing the snapshot interval PI = Time(si) − Time(si−1). hasanassociatedrepairaction, willcausearepairhosttobe (ExecTime(t)cannotbechangedforthehosttypespecifiedbythe allocatedtoruntherepair. Repairsforanyfutureteststhatde- Angel program to run test t.) Given a current valid plan P, this tect corruption on s will be run on the same host in order to rationalecanbeusedtocheckwhetherloweringthesnapshotin- generate a single fully-repaired snapshot. Note that applying tervalinP toaddoneormoresnapshotstotheplanwindowwill repairsoffersmuchlessscopeforparallelexecutioncompared stillgiveavalidtestingplanPnew. Thisprocessofaddingstricter torunningteststodetectdifferenttypesofcorruption. objectivescontinuesuntilnovalidplancanbefound;atthatpoint, 2. Oncetherepairscomplete,asnapshotistakentopreservethe theminimumfeasibleτrpoisknown,namely,itistheRPOinterval repairedversionofs,andtherepairhostisterminated. inthevalidplanfoundfortheprevioussetofobjectives. When a new helper or repair host is needed, the Plan Manager 6. ORCHESTRATOR checkswhetherithasenoughremainingbudgetfromPR toallo- cate a new host. If not, the Plan Manager will repeat the check Recall from Section 1 and Figure 1 that testing plans are sub- at frequent intervals. In the worst case—e.g., if estimates of test mitted to Amulet’s Orchestrator for execution. The Orchestrator executiontimesfrommodelsweresignificantlylowerthanactual willexecuteeachsubmittedplancontinuouslybyworkingincon- executiontimes—anRPOorTCOobjectivewilleventuallygetvi- junction with the Snapshot Manager and resource provider. Fig- olatedbeforeahostcanbeallocated.Inthatcase,theOrchestrator ure 9 shows the multi-threaded design of the Orchestrator which willterminatetheplanandsendanotificationtotheadministrator. hasthreeconcurrentexecutionpaths—snapshotmanagement,host management,andplanmanagement—thatwewilldiscussnext. 7. EXPERIMENTALEVALUATION SnapshotManagement:TheexternalSnapshotManager(seeFig- 7.1 ExperimentalMethodologyandSetup ure1)informstheOrchestratorabouttheavailabilityofanewsnap- shotsbysendingadescriptorfors.Snapshotsarenevercopiedto Methodology: We have implemented Amulet with all the com- theOrchestrator. SincesisatthelevelofavolumeV, thePlan ponentsandalgorithmsasdescribedintheprevioussections. We ManagerinchargeofthetestingplanforV isnotified.Inturn,the nowpresentacomprehensiveevaluationofAmuletwhenrunusing HostManagersresponsibleforhostsallocatedtothisplanfromthe theAmazonWebServicesplatform[1]astheinfrastructure-as-a- externalresourceprovidergetnotified. Recallthattheplangener- servicecloudprovider.Section7.2considerstheend-to-endexecu- ated by the Optimizer was based on PW uniformly-spaced snap- tion,withbothoptimizationandorchestration,ofAngelprograms PI shots per plan window. The Plan and Host Managers determine inAmulet. Foreaseofexposition,weconsiderfourscenariosthat whichsnapshotinthewindow,ifany,sshouldbemappedto. aresimpleintermsofAmulet’sfunctionality,butillustratethechal- HostManagement: AHostManagerisresponsibleforusingand lengesthatAmulethastodealwith. Amulet’spowerwillbecome monitoring a host allocated to a testing plan from the resource more clear in Section 7.3 where we consider both the efficiency provider. Amulet’s implementation supports any infrastructure- (howfast?) andeffectiveness(howgoodistheselectedplan?) of as-a-service cloud provider (e.g., Amazon Web Services, Joyent, theOptimizerwhilegeneratingtestingplansforhugeplanspaces. Rackspace)astheresourceproviderbyusinganappropriateCloud Database software stack and tests: For the production system, Manager(Figure9). TheHostManagerusestheAPIprovidedby wechooseapopulardatabasesoftwarestackcomposedofMySQL theCloudManagertoallocate,establishconnectionswith,andter- asthedatabasesystem,XFSorext3asthefile-system,andAma- minatehostsaswellastoloadsnapshotsontoallocatedhosts. zon’s Elastic Block Storage (EBS) volumes for persistent storage PlanManagement: APlanManagerisresponsibleforshepherd- (weused50GBvolumes)[8].Foreachlayerofthestack,wechose ing the execution of a testing plan P through one or more plan arepresentativetestfromTable1:myisamchkforMySQLdatabase windowsuntilP isterminated.ThePlanManager’sroleisstraight- integritychecking,andfsckandxfs_checkforfile-systemintegrity forwardattheconceptuallevelifP behavesastheOptimizeres- checking. Execution-time estimation models for these tests were timated when P was generated. The challenge is when the Plan trainedandvalidatedontheAmazoncloud(seeSection3). Manager has to deal with unpredictable lags in the actual sched- Snapshots and storage: We implemented a Snapshot Manager thatautomatesperiodicvolume-levelsnapshots(currently,theonly ResourceUsed Pay-as-you-goPricingMethod HostsoftheSmalltype(seeFigure4(a))cost$0.085perhour. typesupportedbytheAmazoncloud). WhentheXFSfile-system Testinghosts Medium/Largetypescost$0.17/$0.34perhourrespectively isused,theSnapshotManagerfreezestheMySQLdatabaseaswell Storage $0.10permonthper1GBofpersistentstorageused astheXFSfile-system(allcachesareflushedtothedisk)beforea I/Otostorage $0.10per1millionblockI/Orequeststopersistentstorage volume-level snapshot is taken [8]. This process finishes within Snapshotaccess $0.05per1000storeorloadrequestsforsnapshots seconds.Fortheext3file-system,onlythedatabaseisfrozensince Table5:Pricingmodelusedinourevaluation ext3doesnotsupportthefreezefeatureofXFS. • SincenoCOisspecified,P takesthedefaultvaluewhichin Amazonprovidestwopersistentstorageservices: SimpleStor- R thiscaseisthecostofoneSmallhost(seeSection4.1) ageService(S3)andElasticBlockStorage(EBS).EBSprovides much faster data access rates than S3, but has lower redundancy. OrchestrationTimeline:Figure2showstheactualexecutiontime- AmazonsupportssnapshotsofEBSvolumeswiththecaveatthat lineofplanP whenitissubmittedtoandrunbytheOrchestrator thesesnapshotsarestoredinS3. Specifically,whentheSnapshot ontheAmazoncloud.Themeaningofeachimportantsymbolused Managerinitiatesasnapshot,AmazoncopiestheEBSvolumedata inthefigureisdescribedatthetop.TheexecutionofP isshownfor toS3. AllbutthefirstsnapshotrequesttothesameEBSvolume threeplanwindows,i.e.,atotalof3hours. WhenP issubmitted, willcopytoS3onlythechangeddatasincethelastsnapshot. theOrchestratorstartsbyrequestingtheneededtestinghostfrom Amazon does not provide direct access to data in a snapshot the cloud provider. When the host is available, the Orchestrator s. Instead, Amulet can create an EBS volume from s, and at- startstheplanexecution(0minutesinthetimelineinFigure2). tach this volume to a testing host h that needs to run tests on s. TheOrchestratoriscontinuouslyexecutingtheschedulePSgiven This process copies data in a background fashion from the snap- by the Optimizer for each plan window. As part of this process, shotstoredinS3toh—prioritizingblockread/writerequestsfrom theOrchestrator(actuallythePlanandHostManagersfromSec- h—makingthevolumeaccessibleinhbeforethedatamovement tion 6) has to map the snapshots si, 1 ≤ i ≤ PPWI , identified by iscomplete. Snapshotcreationandrestoretimesdependonband- theOptimizerintheplanwindowtoactualsnapshotsS collected j width constraints and the amount of data that needs to be copied bytheSnapshotManager. NoticefromFigure2thattheSnapshot fromS3toEBSorEBStoS3. Inourexperiments, weobserved Manager is submitting snapshot descriptors every 15 minutes on anaveragebandwidthof20MB/sinbothdirections. Thisprocess average,whilethesnapshotintervalP intheplanis30minutes. I canbemademuchfasterbyremovingtheintermediatecopytoS3, AtTime(s ),1≤i≤ PW,intheplanwindow,theOrchestrator whichispartofourfuturework. checkswhethierasnapshoPtIS isavailablefromtheSnapshotMan- Plan costs: Recall from Section 4.1 that a pricing model for the ager;ifso,Swillbetested.Otherwise,theOrchestratorwaitsfora resourceproviderhastobeinputtoAmuletinordertospecifycost slackintervaltoseewhetheranewsnapshotissubmitted.Ifnonew objectivesinAngelprograms. Forevaluationpurposes,weuseda snapshotarrives,thenthelastsubmittedsnapshotwillbetested. If pricingmodelmotivatedbyhowresourceusageischargedinapay- thissnapshothasalreadybeentested,thenanerrornotificationis as-you-gofashionontheAmazoncloud[1]. Table5outlinesthis generated.TheHostManagerusesPS tofindoutwhetherithasto pricingmodelintermsofhowthefourmaintypesofresourcesused loadasubmittedsnapshot(andifso,whichtestsitneedstorunon inatestingplanarecharged.GivenatestingplanP,Amulet’sOp- thesnapshotandhow).NoticefromFigure2thateveryothersnap- timizerwillusethepricingmodeltofindhowmuchP’suseofeach shotsubmittedistested. TheboxeswithnotationSjTM inFigure resourcewillcostinoneplanwindow;andaddalltheper-resource 2denotetheactualrunoftestTM onsnapshotSjsubmittedbythe coststoestimateP’stotalcostperplanwindow. Thetotalnumber SnapshotManager. Thewidthofeachboxistheactualexecution ofblockI/Orequeststopersistentstorageiscomputedbasedonthe timeofthetest. Thesetimesareinthe18-22minutesrangewhich totalinputdatasizeforeachtestandthefile-system’sblocksize. matchestheestimatedexecutiontimeof20minutes. Thisstrategywaschosenbasedonourempiricalobservations.En- 7.2.2 Case2: MultipleTestsandMultipleObjectives hancingeachtestmodeltoestimatethenumberofI/Orequeststhat Angel program: We now consider a more complex Angel pro- thetestwillmakeispartofourfuturework. gramwithmoretestsaswellasmoreandstricterobjectives. The 7.2 End-to-endProcessingofAngelPrograms program specifies an RPO, an SSO, and an SIO for a single vol- ume. The time interval τ in the RPO is reduced to 30 min- rpo 7.2.1 Case1: MaintainingaTestedRecoveryPoint utesfrombefore. TheSSOspecifiesthreetests: twomyisamchk Angel program: We first submit to Amulet the Angel program mediumtestsrespectivelyona2.4GBlineitemtableanda1GB correspondingtoExample1fromTable2. Theprogramspecifies orderstable(withnoindexoneithertable),andanfsckmetadata two objectives, an RPO and an SSO, for a single volume. The test. All tests have to be run on Small hosts. The SIO specifies timeintervalτ intheRPOis60minutes. TheSSOspecifiesa Snapshot_interval ≤ 10 minutes. The Snapshot Manager rpo singletest:amyisamchkmediumtest(denotedT )onadatabase sends snapshot descriptors announcing new snapshots to Amulet M tableofsizeapproximately10GBwithnoindexes. Thetesthas every10minutesonaverage. toberunonhostsoftypeSmall(m1.smallontheAmazoncloud). TestingPlanfromtheOptimizer:Theminimum-costplanPgen- TheSnapshotManagersendssnapshotdescriptorsannouncingnew eratedbytheOptimizerismorecomplexthanbefore: snapshotstoAmuletevery15minutesonaverage. • P =10minutes I TestingPlanfromtheOptimizer:ThemodelforestimatingTM’s • PW =30minutes(PPWI =3snapshotss1,s2,s3perwindow) executiontimereturnsanestimateof20minuteswheninvokedby • PM hasallthreetestsmappedtoallthreesnapshotsinPW theOptimizerforthissetting.Theminimum-costplanP generated • PS assignsonehosttorunthefscktest(estimatedtorunin6 bytheOptimizer’salgorithminFigure6forthissettinghas: minutes). Thetwomyisamchktestsarerunconcurrentlyona • P =30minutes secondSmallhost,withestimatedtimesof9and7minutes. I • PW =60minutes(PPWI =2snapshotss1ands2perwindow) • PRtakesthedefaultvalueasinCase1 • P consistsoftestT mappedtobothsnapshotsinP Orchestration Timeline: Figure 10 shows the actual execution M M W • P assignsoneSmallhosttorunT ons ands timeline of the above plan P. Note that P = 10 minutes causes S M 1 2 I
Description: