Inferring Energy Bounds Statically by Evolutionary Analysis of Basic Blocks † † ∗ † U. Liqat Z. Bankovic´ P. Lopez-Garcia [email protected] [email protected] [email protected] † ‡ M.V. Hermenegildo [email protected] 6 ABSTRACT velopedforhardware,farmoreenergysavingsremaintobetapped 1 byimprovingthesoftwarethatrunsonthesedevices. Inaddition, 0 Wearecurrentlywitnessinganincreasingnumberofenergy-bound therearemanycriticalembeddedapplications(e.g.,sensor-based) 2 devices,includinginsomecasesmissioncriticalsystems,forwhich for which, beyond optimizing energy consumption, it is actually thereisaneedtooptimizetheirenergyconsumptionandverifythat n crucialtoguaranteethatexecutionwillcompletewithinaspecified theywillperformtheirfunctionwithintheavailableenergybudget. a energybudget,e.g.,beforetheavailablesystemenergyrunsout. In this work we propose a novel parametric approach to estimat- J Inthisworkwefocusonthestaticestimationoftheenergycon- ing tight energy bounds (both upper and lower) that are practical 2 sumedbyprogramexecutions(i.e.,atcompiletime,beforeactually forenergyverificationandoptimizationapplicationsinembedded 1 runningthem),asabasisforenergyoptimizationandverification. systems. Our approach consists in dividing a program into basic Suchestimationsaregivenasfunctionsoninputdatasizes,since (“branchless”) blocks, establishing the maximal (resp. minimal) ] datasizestypicallyinfluencetheenergyconsumedbyaprogram, C energy consumption for each block using an evolutionary algo- butarenotknownatcompiletime. Thisapproachallowsabstract- rithm,andcombiningtheobtainedvaluesaccordingtotheprogram D ingawaysuchsizesandinferringenergyconsumptioninawaythat controlflow,usingstaticanalysis,toproduceenergyboundfunc- . isparametriconthem. s tions. Suchfunctionsdependoninputdatasizes,andreturnupper Differenttypesofresourceusageestimationsarepossible,such c orlowerboundsontheenergyconsumptionoftheprogramforany [ given set of input values of those sizes, without running the pro- as, e.g., probabilistic, average, or safe bounds. However, not all gram. TheapproachhasbeentestedonXMOSchips,butisgen- typesofestimationsarevalidorusefulforagivenapplication.For 1 example,inordertoverify/certifyenergybudgets,safeupper-and eralenoughtobeappliedtoanymicroprocessorandprogramming v lower-bounds on energy consumption are required [16, 15]. Un- 0 language. Ourexperimentalresultsshowthattheboundsobtained fortunately,currentapproachesthatguaranteethattheboundsare 0 byourprototypetoolcanbetightwhileremainingonthesafeside alwayssafetendtocompromisetheirtightnessseriously,inferring 8 ofbudgetsinpractice. overlyconservativebounds,whicharenotusefulinpractice. With 2 thissafety/tightnesstrade-offinmind,ourgoalisthedevelopment 0 Keywords ofananalysisthatinferstightboundsthatareonthesafesidein . 1 EnergyConsumptionAnalysis,EnergyModeling,EmbeddedSys- mostcases,inordertobepracticalforverificationapplications,as 0 tems,StaticAnalysis,EvolutionaryAlgorithms. wellasforenergyoptimization. 6 Ofthesmallnumberofstaticenergyanalysesproposedtodate, 1 onlyafew[20,13,12]useresourceanalysisframeworksthatare 1. INTRODUCTION v: aimed at inferring safe upper and lower bounds on the resources i Wearewitnessinganever-increasingperformanceandubiquity usedbyprogramexecutions.Acrucialcomponentinorderforsuch X ofbattery-and/orharvestedenergy-powereddevices.Animportant frameworkstoinferhardware-dependentresources,and,inpartic- r trendinthiscontextisthesocalledInternetofThingsparadigm. ular, energy, isalow-levelresourceusagemodel, suchas, e.g., a a It is estimated that by the year 2020, about 50 billion small au- model of the energy consumption of individual instructions. Ex- tonomousdevices,embeddedinallkindofobjects,inourclothes, amplesofsuchmodelsare[11],attheJavabytecodelevel,or[10], orstucktoourbodieswilloperateandintercommunicatecontinu- attheassemblylevel. ouslyforlongperiodsoftime,suchasyears. Suchdevicesrelyon Clearly,theaccuracyoftheboundsinferredbyanalysisdepends small batteries or energy harvested from the environment, which onthenatureandaccuracyofthelow-levelmodels.Unfortunately, impliesthattheirenergyconsumptionshouldbeverylow. modelssuchas[11,10]provideaverageenergyconsumptionval- Although there have been improvements in battery and energy uesorfunctions,whicharenotreallysuitableforupper-orlower- harvestingtechnology,theyaloneareoftennotenoughtoachieve bounds analysis. Furthermore, trying to obtain instruction-level therequiredlevelofenergyconsumptiontofullysupportInternet modelsthatprovidestrictsafeenergyboundswouldresultinvery ofThingsandotherenergy-boundapplications. Thus,bettertech- conservative bounds. Although when fed with such models the niques for optimizing the energy consumption of embedded sys- staticanalysiswouldinferhigh-levelenergyconsumptionfunctions temsareneeded.Whilemanyenergy-savingfeatureshavebeende- providing strictly safe bounds, these bounds would not be useful ∗ in general because of their large inaccuracy. For this reason, the SpanishCouncilforScientificResearch(CSIC). † analysesin[20,13,12]usedinsteadthealreadymentionedinstruc- IMDEASoftwareInstitute,Madrid,Spain. tion level average energy models [11, 10]. However, this meant ‡ UniversidadPolitécnicadeMadrid(UPM). thattheenergyfunctionsinferredforthewholeprogramwerenot formallythatsuchestimationsarealwayssafe, theyarequiteac- strictbounds,butratherapproximationsoftheactualbounds,and curateinthesensethattheinferredenergyboundsareclosetothe could possibly be below or above. This trade-off between safety actual bounds, and that in practice they will also be safe/strict in andaccuracyisamajorchallengeinenergyanalysis. Inthispaper mostcases. Wearguethatouranalysisprovidesagoodpractical weaddressthischallengebyprovidingatechniqueforthegenera- compromisefortheverification/certificationofenergybudgets. tionoflower-levelenergymodelswhichareusefulandeffectivein Insummary,themaincontributionsofthispaperare: practiceforverification-typeapplications. Themainsourceofinaccuracyincurrentinstruction-levelenergy • Anovelapproachthatcombinesdynamicandstaticanalysis modelsisinter-instructiondependence(includingalsodatadepen- techniquesforinferringtheenergyconsumptionofprogram dence),whichisnotcapturedinmostmodels. Ontheotherhand, executions.ThedynamicpartisbasedonEAs,andproduces theconcretesequencesofinstructionsthatappearinprogramsex- low-levelenergymodels. hibitworstcasesthatarenotaspessimisticasconsideringtheworst caseforeachoftheindividualinterveninginstructions. Basedon • Theproposalofanewabstractionlevelatwhichtoperform this,wedecidedtousebranchlessblocksofassemblyinstructions theenergymodelingofprogramcomponentsusingdynamic asthemodelingunitinsteadofindividualinstructions. Wedivide techniques: basic (branchless) blocks of assembly instruc- the(assembly)programintosuchbasicblocks,eachastraight-line tions. codesequencewithexactlyoneentrytotheblock(thefirstinstruc- tion) and one exit from the block (the last instruction). We then • A method based on EAs to dynamically (i.e., by profiling) measure the energy consumption of these basic blocks, and de- obtain practical upper and lower bounds on the energy of termineamaximal(resp. minimal)energyconsumptionforeach suchbasicblocks,withagoodsafety/accuracybalance. block. Inthiswaytheinter-instructiondatadependencediscussed aboveandotherfactorsareaccountedfor. Theenergyvaluesob- • The use of a static analysis that takes care of the program tainedforeachblockarefedtoourstaticresourceanalysis,which control flow, in order to determine how many times blocks combinesthemaccordingtotheprogramcontrolflowandproduces areexecuted,whichcombinedwiththeinformationprovided theenergyboundfunctions. bytheblockmodels,infersfunctionsthatgivetheenergyof In order to find the maximum and minimum energy consump- aprogramanditsproceduresasfunctionsofinputdatasizes. tion of each basic block we use an evolutionary algorithm (EA). We vary the input values and take energy measurements directly • Anexperimentalstudythatsupportsourclaims. fromthehardwareforeachinputcombination. Thisway,wetake advantage of the fast search space exploration provided by EAs. Intherestofthepaper,Section2explainshowtheinformation EAshavealsobeenusedforestimatingtheworstcaseenergycon- inferredbyourapproachcanbeusedfortheenergyconsumption sumptionofwholeprograms[22],duetotheirfastexplorationof verification application. Section 3 explains our technique for en- the search space. However, if there are data-dependent branches ergymodelingofprogrambasicblocks.Section4showshowthese in the programs, which is often the case, applying this approach models are used by the static analysis to infer upper- and lower- to whole programs (or program segments that contain branches) boundsontheenergyconsumedbyprogramsasfunctionsoftheir quicklylosesaccuracy,sincedifferentinputcombinationscantrig- inputdatasizes.Section5reportsonanexperimentalevaluationof ger different sets of instructions [22]. In contrast, our approach ourapproach. RelatedworkisdiscussedinSection6,andfinally combines EAs and static analysis techniques in order to get the Section7summarisesourconclusions. bestofbothworlds. Wetakeoutthetreatmentofdata-dependent branchesfromtheEA,sothatthesamesequenceofinstructionsis alwaysexecutedineachbasicblock. Theworst(resp. best)case 2. ENERGYCONSUMPTIONVERIFICA- energyofthebasicblocksisestimatedbytheEAwithhigheraccu- TION/CERTIFICATION racysince,nothavinganybranches,themostimportantdeficiency oftheEAisavoided. Theprogramcontrolflowdependenciesare Thelower(El)andtheupperbound(Eu)inferredbyouranalysis takencareinsteadbythestaticanalysis. canbeusedforenergyconsumptionverificationandcertification. Inourexperimentswefocusforconcretenessontheenergyanal- We refer the reader to [14, 15] for a detailed description on how ysisofprogramswritteninXC[28],runningontheXMOSXS1-L staticanalysisinformationcanbeusedforgeneralresourceusage architecture. However,ourapproachisgeneralenoughbeapplied verificationwithintheCiaoPPsystem, andto[16]forhowitcan to the analysis of other programming languages (and associated bespecializedforverifyingenergyconsumptionspecificationsof lowerlevelprogramrepresentations)andarchitecturesaswell.XC embeddedprograms. is a high-level C-based programming language that includes ex- Hereweonlygivesomeintuitiveideas. Assumethataprogram tensionsforconcurrency,communication,input/outputoperations, specificationexpressesenergybudgetEb, e.g., definedbytheca- andreal-timebehavior. Ourexperimentalsetupinfersenergycon- pacityofthebattery,wecanconcludethefollowing: sumption information by processing the ISA (Instruction Set Ar- chitecture)codecompiledfromXC[28], andreflectsituptothe 1. Eu ≤Eb =⇒ the given program can be safely executed source code level. Such information is provided in the form of withintheexistingenergybudget. functionsoninputdatasizes,andisexpressedbymeansofasser- tions[7]. 2. El ≤Eb≤Eu =⇒ itmightbepossibletoexecutethepro- Intheseexperiments,theenergyestimationsproducedbyourap- gram,butwecannotclaimitforcertain. proachwerealwayssafe,inthesensethattheyover-approximated theactualbounds(i.e., theinferredupperboundswereabovethe 3. E <E =⇒ ititnotpossibletoexecutetheprogram(the b l actualupperboundsandtheinferredlowerboundsbelowtheac- systemwillrunoutofbatteriesbeforeprogramexecutionis tual lower bounds). This suggests that, even if we cannot assure completed. 3. ENERGYMODELINGOFBLOCKS as: Asmentionedbefore,thefirststepofourenergyboundsanalysis n gen(b)= (cid:83){v|v∈ref(k)∧∀(j<k).v∈/def(j)} istodetermineupperandlowerboundsontheenergyconsumption k=1 ofeachbasic(“branchless”)programblock. Weperformthemod- eling at this level rather than at the instruction level in order to where ref(n) and def(n) denote the variables referred to and de- caterforinter-instructiondependencies.Inordertodeterminesuch fined/updatedatanodeninblockbrespectively. boundsfirstallthebasicblocksoftheprogramareidentified,and ForthebasicblocksinFigure1inListing1,thesetofinputargu- thentheenergyconsumptionofeachoftheseblocksisprofiledfor mentsaregen(B1)={r0},gen(B21)={sp[0x1]},gen(B22)={sp[0x1],r0} differentinputdatausinganEA.Thesestepsareexplainedinthe andgen(B3)=0/. followingsections. 3.2 EA for Estimating the Energy of a Basic 3.1 Generating the Basic Blocks to be Mod- Block eled InthefollowingwedetailthemostimportantaspectsoftheEA Abasicblockoveraninter-proceduralcontrolflowgraph(CFG) usedforestimatingthemaximal(i.e.,worstcase)andminimal(i.e., isamaximalsequenceofdistinctinstructions,S1throughSn,such bestcase)energyconsumptionofabasicblock. Theonlydiffer- thatallinstructionsS ,1<k<nhaveexactlyonein-edgeandone encebetweenthetwoalgorithmsisthewayweinterprettheobjec- k out-edge (excluding call/return edges), S has one out-edge, and tivefunction:inthefirstcasewewanttomaximizeit,whileinthe 1 Sn hasonein-edge. Abasicblockthereforehasexactlyoneentry secondwewanttominimizeit. pointatS1andoneexitpointatSn. Individual. The search space dimensions are the different input Inordertodivideaprogramintosuchbasicblocks(forwhichan variablestotheblocks.Ourgoalistofindthecombinationofinput upperboundontheenergyconsumptionoftheprogramwillbede- valueswhichmaximizes(minimizes)theenergyofeachblock.The terminedusingtheEA),theprogramisfirstcompiledtothelower setofinputvariablestoablockisinferredusingadataflowanalysis representation,ISAinourcase.AdataflowanalysisoftheISArep- (explained in the next section). Thus, an individual is simply an resentationyieldsaninter-proceduralcontrolflowgraph(CFG).A arrayofinputvaluesgivenintheorderoftheirappearanceinthe finalcontrolflowanalysisiscarriedouttoinferbasicblocksfrom block.Theinputvaluesinanindividualarecodedasintegers,since theCFG.Thesebasicblocksarefurthermodifiedsothattheycan theyrepresent32-bitvaluesstoredindifferenthardwareregisters. berunandmeasuredindependentlybytheEA.Modificationsfor The majority of individuals are initialised with random 32-bit eachbasicblockinclude: numbers.However,wealsoincludecornercasestotheinitialpop- ulationthatareknowntocausehigh(low)energyconsumptionfor 1. Abasicblockwithkfunctioncallinstructionsisdividedinto particular instructions. For example all 1s for high energy con- k+1basicblockswithoutthefunctioncallinstructions. sumption, or all 0s for low energy consumption as operands to a multiplyISAinstruction.ThisspeedsuptheEAalgorithminfind- 2. AnumberofspecialISAinstructions(e.g.,return,call)are inginputstosomebasicblocksthatmaximize/minimizetheirover- omittedfromtheblock.Thecostofsuchinstructionsismea- allenergyconsumption. suredseparatelyandaddedtothecostoftheblock. Crossover.Thecrossoveroperationisimplementedasaneven-odd crossover,sinceitprovidesmorevariabilitythanastandardn-point 3. Memoryread/writeinstructionsareabstractedtoafixedmem- crossover. Inthiscrossoverthefirstchildiscreatedbytakingthe ory region available to each basic block in order to avoid firstelementandeveryotheroneafteritfromoneoftheparents, memoryviolations. e.g., the mother. The second element and every other one come fromtheotherparent,i.e.,thefather.Thesecondchildiscreatedin An example of the modification 1 above is shown in Figure 1, theoppositeway: thefirstelementandeveryotheroneafteritare Listing 1, which is an ISA representation of a recursive factorial takenfromthefather,whilethesecondandeveryotheronecome programwheretheinstructionsaregroupedtogetherinto3basic fromthemother.TheprocessisdepictedinFigure2,whereP1and blocks B1, B2, and B3. Consider basic block B2. Since it has a P2aretheparents,andC1andC2aretheirchildrencreatedbythe (recursive)functioncalltofactataddress12,itisdividedfurther crossoveroperation. intotwoblocksinListing2,suchthattheinstructionsbeforeand afterthefunctioncallformtwoblocksB2 andB2 ,respectively. Mutation. Forthepurposeofthisworkwehavecreatedacustom 1 2 Theenergyconsumptionofthesetwoblocksismaximized(mini- mutationoperator.Sincetheenergyconsumptionindigitalcircuits mized)byprovidingvaluestotheinputargumentstotheblock(see ismainlytheresultofbitflipping,webelievethatthemostoptimal below)usingtheEA.TheenergyconsumptionofB2canthenbe waytoexplorethesearchspaceisbyperformingsomebitflipping characterizedas: in the mutation operation. This is implemented in the following way.Foreachgene(i.e.,inputvaluetothebasicblock): B2A=B2A +B2A +blA e 1e 2e e 1. Wecreatearandom32-bitinteger,i.e.,arandommask. whereB2A, B2A, andblA denotetheenergyconsumptionofthe 2. ThenweperformtheXORoperationofthatintegerandthe 1e 2e e B21,B22 blocksandtheblISAinstruction,withapproximationA correspondinggene. Thisway,weperformrandomflipping (whereA=upperorA=lower). ofthebitsofeachgene,sinceweonlyflipthebitsofthegene For each modified basic block, a set of input arguments is in- atpositionswherethevalueoftherandommaskis1. ferred.Thissetisusedforanindividualrepresentationtodrivethe EA algorithm to maximize the energy consumption of the block. The process is depicted in Figure 3, where the input values are Fortheentryblock,theinputargumentsarederivedfromthesig- givenasbinarynumbers. natureofthefunction.Thesetgen(B)characterizesthesetofvari- Objectivefunction. Theobjectivefunctionthatwewanttomaxi- ablesreadwithoutbeingpreviouslydefinedinblockB.Itisdefined mize(minimize)istheenergyofabasicblock,whichismeasured Listing2:Modifiedbasicblocks. Listing1:Basicblocksofafactorialfunction. <fact>: <fact>: 01: entsp 0x2 01: entsp 0x2 02: stw r0, sp[0x1] 02: stw r0, sp[0x1] 03: ldw r1, sp[0x1] B1 03: ldw r1, sp[0x1] 04: ldc r0, 0x0 B1 04: ldc r0, 0x0 05: lss r0, r0, r1 05: lss r0, r0, r1 06: bf r0, <08_NEW> 06: bf r0, <08> 08_NEW: 07: bu <010> 07: bu <010> 10: ldw r0, sp[0x1] blockbeforecall 1101:: lsduwb rr00,, rs0p,[00x1x1] B21 11: sub r0, r0, 0x1 B2 12: bl <fact> 12: bl <fact> 13: ldw r1, sp[0x1] 14: mul r0, r1, r0 13: ldw r1, sp[0x1] 15: retsp 0x2 blockaftercall 14: mul r0, r1, r0 B2 15: retsp 0x2 2 B3 08: mkmsk r0, 0x1 08: mkmsk r0, 0x1 09: retsp 0x2 09: retsp 0x2 B3 Figure1:Example:Basicblockmodifications. Figure3:Mutation. by executing it on an Instruction Set Simulation (ISS) to collect cyclecounts. TheXMOSXS1architectureusedinourexperimentsdoesnot havethesepipelineeffectsbydesign,sinceexactlyoneinstruction per thread is executed in a 4-stage pipeline (more details in Sec- tion5.1). Figure2:Exampleofeven-oddcrossover. 4. ENERGYCONSUMPTIONOFTHEPRO- GRAM directlyfromthechip. Theconcretesettingoftheexperimentwill Oncetheenergymodelsofeachbasicblockoftheprogramare beexplainedinthefollowingsection. known,theenergyconsumptionofthewholeprogramisbounded Ingeneral,pipelineeffectssuchasstalls(toresolvepipelinehaz- byastaticanalyserthattakesintoaccountthecontrolflowofthe ards), which depend on the state of the processor at the start of program and infers safe upper/lower bounds on its energy con- the execution of a basic block, can affect the upper/lower bound sumption. Wehaveimplementedsuchanalyserbyspecialisingthe estimated on the energy consumption of such block. In our ap- genericresourceanalysisframeworkprovidedbyCiaoPP[24]for proach intra-block pipeline effects are accounted for, since, the programswrittenintheXCprogramminglanguage[28]andrun- dependencesamongtheinstructionswithinablockarepreserved. ningontheXMOSXS1-Larchitecture. Wehavealsowrittenthe However,theinter-blockpipelineeffectsneedtobeaccountedfor. necessarycode(i.e.,assertions[7])tofeedsuchanalyserwiththe Thesecanbemodeledinaconservativewaybyassumingamax- block-levelupper/lowerboundenergymodelobtainedbyusingthe imum stall penalty for the upper bound estimation of each block techniqueexplainedinSection3. (e.g., by adding a stall penalty, say three cycles, to the execution Thegeneric resourceanalyserensures thattheinferred bounds timeoftheblock).Similarly,forthelowerboundestimationazero arestrict/safeifitisfedwithenergymodelsprovidingsafebounds. stallpenaltycanbeused.Toapproximatethiseffect,in[3],theau- Asmentionedintheintroduction,in[13]weperformedaprevious thorscharacterizeeachblockthroughpairwiseexecutionswithall instantiationofsuchgenericanalyserbyusingtheinstruction-level ofitspossiblepredecessors. Eachbasicblockpairischaracterized energy model described in [10], which provided average energy values. As a result, the analysis inferred an upper-bound energy tobeactivewithinthepipelineatthesametime. Thisrestriction functionforthewholeprogramthatwasanapproximationofthe avoidspipelinehazards. actualupperbound,andcouldpossiblybebelowit. Inordertosupporttheprocessofmeasuringpower,thefollowing Theanalysisisgeneralenoughtobeappliedtootherprogram- hasbeenimplemented: minglanguagesandarchitectures(see[13,12]fordetails)provided • AnextensiontotheXMOStoolchainthatallowspowermea- that energy models for each architecture exist. It enables a pro- surements to be recorded and/or displayed in real time. In grammertosymbolicallyboundtheenergyconsumptionofapro- essence,asmallshuntresistorhasbeenaddedinserieswith gramPoninputdatax¯withoutactuallyrunningP(x¯). Itisbased the voltage supply. By measuring the voltage drop on the onsettingupasystemofrecursivecostequationsoveraprogram shunt,wecancalculatethecurrentI,whichisalsothecur- P that capture its cost (energy consumption) as a function of the rent of the voltage supply, since the shunt is connected in sizesofitsinputargumentsx¯. Thetransformation-basedanalysis series. In this way, we estimate the power consumption as frameworkof[13,12]transformstheassembly(orLLVMIR)rep- resentationoftheprogramintoanintermediatesemanticprogram Vsup·I,whereVsupisthevoltageofthepowersupply. representation (HC IR), that the analysis operates on, which is a • AvariantoftheXTAG-2debugadapter(calledXTAG3)that seriesofconnectedcodeblocks,representedasHornClauses.The enablespowertobemeasured[31]. Basically,ithasanex- analyserdealswiththisHCIRalwaysinthesameway, indepen- tra connector that carries analog signals necessary to esti- dentof whereitoriginatesfrom, setting upcostequationsfor all matethepowerconsumption,asexplainedabove. Themea- codeblocks(predicates). surementsregardingthesesignalsaretransportedtothehost ConsidertheexampleinListing1. Therecursivecostequations computeroverUSBusingthexSCOPEinterface[32].Inad- aresetupoverthefunctionfactthatcharacterizetheenergycon- dition,aprotocolthatenablespowermeasurementsandap- sumptionofthewholefunctionusingtheapproximationAofeach plicationprobingtobeperformedsimultaneously, anddata blockinferredbytheEA: tobetransportedsimultaneouslyovertheUSBconnectionto factA(R0) = B1A+fact_auxA(0≤R0,R0) thehostcomputer,hasbeendesigned. e e e (cid:26) B2A+factA(R0−1) ifBistrue ThetoolthatcollectsdatafromtheXTAGisxgdb,thedebugger fact_auxA(B,R0)= e e e B3A ifBisfalse thatispartoftheXMOStoolchain. xgdbconnectstotheXTAG e over a USB interface (using libusb), and reads both ordinary xS- Thecostofthe factfunctioniscapturedbytheequation facteA(R0) COPEtrafficandvoltage/currentmeasurements.Thecollecteddata underanapproximationA(e.g.,upper/lower)whichinturndepends is normally stored in an XML file, or instead, xgdb can pipe the onB1Ae (i.e.,theenergyconsumptionofblockB1)andtheequation datadirectlyintoananalysisprogramthatcanonlyrecorddatathat fact_auxeA,whichrepresentsthebranchingoriginatedfromthelast isrelevant(betweenstartandend)andonlycomputetherelevant instructionofblockB1. ItcapturesthecostofblocksB2andB3 metrics(maximumcurrent,totalenergy,etc.). basedontheconditionontheinputsizeR0. 5.2 ResultsandDiscussion Ifweassume(forsimplicityofexposition)thateachbasicblock hasunitarycostintermsofenergyconsumption,i.e., Bie=1for Theaimoftheexperimentalevaluationistoperformafirstcom- alli,weobtaintheenergyconsumedbyfactasafunctionofits parisonofactualhardwareenergymeasurementsagainsttheupper- inputdatasize(R0): facte(R0)=R0+1. andlower-boundsonenergyconsumption obtained byevaluating Thefunctionsinferredbythestaticanalysisarearithmeticfunc- thefunctionsinferredbyourproposedapproach(whichdependon tions (polynomial, exponential, logarithmic, etc.) that depend on inputdatasizes),foreachprogramconsideredandfordifferentin- inputdatasizes(naturalnumbers). put data sizes. The actual energy consumption of the programs, foreachvalueofinputdatasizes,ismeasuredwiththeevaluation 5. EXPERIMENTALEVALUATION platform, i.e., thesameusedtobuildtheupper-andlower-bound modelsoftheblocksofeachprogram. In this Section we report on an experimental evaluation of our approachtoinferringbothupperandlowerboundsontheenergy Program Upper/LowerBounds(nJ)×103 vs.HW consumedbyprogramexecutions,givenasfunctionsoninputdata ub=5.1N+4.2 7% sizes. Theexperimentshavebeenperformedwithprogramswrit- fact(N) lb=4.1N+3.8 -11.7% teninXCrunningontheXMOSXS1-Larchitecture. However,as ub=5.2lucas(N)1+6 fib(N)−6.6 8.71% alreadysaid,ourapproachisgeneralenoughbeappliedtotheanal- fibonacci(N) lb=4.5lucas(N)+5 fib(N)−4.2 -4.69% ysis of other programming languages (and associated lower level ub=3.7N+13.3 8% programrepresentations)andarchitecturesaswell. reverse(N) lb=2.95N+12 -8.8% 5.1 EvaluationPlatform ub=5N+6.9 8.7% findMax(N) lb=3.3N+5.6 -9.1% WeuseahardwareandsoftwareplatformcreatedbyXMOSthat ub=6N+26.4 8.9% enablesustomeasuretheenergy[19],time,andpoweruseddur- fir(N) lb=4.8N+22.9 -9.7% ingprogramexecutionsonrealhardware. Thedevelopedboardis adual-tileboardthatcontainsanXS1-A16-128-FB217processor. biquad(N) ub=29.6N+10 9.8% Theboardisfedwitha3.3Vpowersupply,andsupportsvoltage lb=23.5N+9 -11.9% scaling, although both tiles have to run at the same voltage sup- ply. It also supports frequency scaling, where the tiles can have Table1:Upperandlowerboundsaccuracy. different frequencies. The XMOS XS1 [17] is a cache-less, pre- dictablearchitecturebydesignandmanagesthreads onthehard- 1Themathematicalfunctionlucas(n)satisfiestherecurrencerela- ware. Thethreadsareexecutedinaround-robinfashion, usinga tionlucas(n)=lucas(n−1)+lucas(n−2)withlucas(1)=1and 4-stagepipelinewhichonlypermitsasingleinstructionperthread lucas(2)=3. AnumberofselectedbenchmarksareshowninTable1thatare ·105 1.4 eitheriterativeorrecursive.TheUpper/lowerBoundscolumnde- upper pictstheenergyestimationfunctions(oninputdatasizes)forupper actual-upper andlowerbounds. Thecolumnvs. HWshowstheaverageover- 1.2 actual-lower andunder-approximationsoftheestimationversustheactualmea- lower surementsonthehardware. 1 Thefirsttwobenchmarksaresmallarithmeticbenchmarks. The thirdbenchmarkreverse(N)reverseselementsofaninputarrayof nJ) sizeN.Thelistofbenchmarksalsoincludestwofilterbenchmarks, gy( 0.8 namelybiquadand fir(FiniteImpulseResponse). Bothprograms ner E attenuateoramplifyspecificfrequencyrangesofagiveninputsig- 0.6 nal.The fir(N)benchmarkcomputestheinner-productoftwovec- tors: avectorofinputsamples, andavectorofcoefficients. The 0.4 morecoefficients,thehigherthefidelity,andthelowerthefrequen- ciesthatcanbefiltered.Thebiquad(N)benchmarkisanequaliser, 0.2 i.e., it takes a signal and attenuates/amplifies different frequency bands. ItusesacascadeofBiquadfilterswhereeachfilteratten- 5 10 15 20 25 uatesoramplifiesonespecificfrequencyrange. Theenergycon- N sumeddependsonthenumberofbanksN,typicallybetween3and 30foranaudioequaliser. Ahighernumberofbanksenablesade- signertocreatemoreprecisefrequencyresponsecurves. Asimple Figure 5: findMax example upper/lower bounds vs actual- findMaxbenchmark(findingthemaximumnumberinanarray)is lower and upper energy consumption measurement based on alsoincludedinthelist. Thisisaprogramwheredata-dependent data. branchingcanbringsignificantvariationsoftheworst(best)case energyconsumption. Notethatunlikethefirstthreebenchmarks, to the data sensitive branching. A call to findMax with a sorted fir,biquad,and findMaxallhavedata-dependentbranches. arrayinascendingorderwilldiscoveranewmaxelementineach iterationandhenceupdatethecurrentmaxvariableresultinginan ·104 actualworstcaseofthealgorithm.Incontrast,ifthearrayissorted upper indescendingorderthenthealgorithmwillfindthemaxelementin actual thefirstiterationandtherestoftheiterationswillneverupdatethe 6 lower currentmaxvariable,resultingintheactualbestcase. Figure5depictstheupperandlowerboundsinferredbythestatic analysisaswellastheactualworstandbestcasemeasurementsof findMax(firstwithascendingorderandthenwithdescendingor- nJ) derarraydata).Theupperandlowerboundsinferredarecompared gy( 4 against the actual worst and best case measurements. The upper ner boundover-approximatesby8.7%whereasthelowerboundunder- E approximatesby9.1%.Notethatitisnotalwaystrivialtofinddata thatexhibitprogramworstandbestcasebehaviors. InTable2,differentexecutionsofthe findMaxbenchmarkare 2 shown for particular input sizes N but using a random data input arrayinonecaseandactualworst/bestcasearrayinputdatainthe othercase. ColumnNshowsthesizeoftheinputarray. Column 2 4 6 8 10 12 CostAppindicatesthetypeofapproximationoftheautomatically inferred cost functions which estimate energy consumption (de- N pendingoninputdatasizeN):upperbound(U)andlowerbound (L).Suchenergyfunctionsforthe findMaxbenchmarkareshown Figure4: factupper/lowerboundsvs.actualmeasurement. inTable1. Inordertoassesstheaccuracyofthecostfunctionswe haveevaluatedthemforparticularsetsofinputdatacorresponding Figure4depictstheupper/lowerboundinferredbytheanalysis todifferentinput(array)sizes(N), yieldingdifferentenergycon- vs.theactualmeasurementonthehardwareforthefactorialpro- sumption estimations. We have then compared such estimations gram. The actual program consumption is measured for several (columnEst)withtheobservedenergyconsumptionsofthehard- values of N, the input value, resulting in the middle curve. The waremeasurements(columnObs). ColumnDshowstherelative other two curves are the result of plotting the upper- and lower- harmonic difference between the estimated and the observed en- boundenergyfunctionsfordifferentsizes(inthecaseofaninteger, ergyconsumption,givenbytheformula: itssizeisitsvalue). (Est−Obs)×( 1 + 1 ) The upper bound values inferred by the static analysis and the rel_harmonic_diff(Est,Obs)= Est Obs EA over-approximate the actual hardware measurements by 7%, 2 whereasthelower-boundvaluesunder-approximatetheactualmea- Theinaccuraciesintheenergyestimationsofourtechniquecome surementsby11.7%–seeTable1. mainlyfromtwosources: theenergymodel,whichassignsanen- The findMaxbenchmark,which,asmentionedbefore,hassig- ergyvaluetoeachbasicblockasdescribedinSection3, andthe nificant data dependent branching, is shown in Figure 5. Unlike StaticResourceAnalysis(SRA),describedinSection4,whiches- fact,theupperandlower-boundsin findMaxaremoredistantdue timatesthenumberoftimesthatthebasicblocksareexecutedde- pendingontheinputdatasizes. (e.g., returnblocks, loopheaderblocks, etc.). Thismakesusbe- Inordertoinvestigatethesource(s)ofinaccuracies,wehavealso lieve that our approach could be used in practice in an iterative introducedColumnProf. Itshowstheresultofestimatingtheen- developmentprocess,wherethedevelopergetsfeedbackfromour ergy consumption using the energy model and assuming that the tool and modifies the program in order to reduce its energy con- SRAwasperfectandestimatedtheexactnumberoftimesthatthe sumption.ThefirsttimetheEAisrunwouldtakethehighesttime, basicblockswereexecuted. Thisobviouslyrepresentsthecasein sinceitwouldhavetodeterminetheenergyconsumptionofallthe whichalllossofaccuracymustbeattributedtotheenergymodel. programblocks. Afterafocusedmodificationoftheprogramthat The values in Column Prof have been obtained by profiling ac- onlyaffectsasmallnumberofblocks,mostoftheresultsfromthe tualexecutionsoftheprogramwithparticularinputdata,wherethe previousruncouldbereused,sothattheEAwouldrunmuchfaster profilerhasbeeninstrumentedtorecordthenumberoftimeseach duringthisdevelopmentprocess. Inotherwords,theEAprocess- basicblockisexecuted. Theenergyconsumptionoftheprogram ingcaneasilybemadeincremental. isthenobtainedbymultiplyingsuchnumbersbytheenergyvalues The static analysis, on the other hand, is quite efficient, with providedbytheenergymodelforeachbasicblock,andaddingall analysistimesofabout4to5secondsonaverage,despitethenaive ofthem. ColumnPrDshowstherelativeharmonicdifferencebe- implementationoftheinterfacewithexternalrecurrenceequation tweenProfandtheobservedenergyconsumptionObs,whichrep- solvers,whichcanbeimprovedsignificantly. resentstheinaccuracyduetotheenergymodelingofbasicblocks usingtheEA. 6. RELATEDWORK Inthecaseofrandomdata,boththeSRAandtheenergymod- Staticdataflowanalysisoftheenergyconsumedbyprogramex- elingcontributetotheinaccuracyoftheenergyestimationforthe ecutions has received relatively little attention until recently. An whole program. In contrast, in the second case two sets of array analysisofJavabytecodeprogramsforinferringupper-boundson dataareused: onethatmakes findMaxexhibititsworstcasebe- energyconsumptionasfunctionsoninputdatasizeswasproposed havior andanother that makesit exhibit thebest. These are then in [20], where the Jimple (a typed three-address code) represen- comparedagainsttheupper-andlower-boundestimations. Since tation of Java bytecode was transformed into Horn Clauses, and ColumnsEstandProfshowthesamevaluesinthiscase,itmeans a simple energy model at the Java bytecode level [11] was used. thattherewasnoinaccuracyduetotheSRA,andthattheoverall However the energy model used average estimations of the Java inaccuracyisduetotheover-andunder-approximationintheEA opcodesandanopcodecostverificationfoundtheestimationtobe tomodelenergyconsumptionofeachbasicblock. Inotherwords, between -5% and 10%. Furthermore, this work did not compare theanalysisofthe findMaxprogramprovidesaccurateboundsfor the results with actual, measured energy consumption. A similar eachdatasizeN. approachwasproposedin[13]fortheanalysisofaC-basedpro- Cost Energy(nJ)×103 gramminglanguage.Itperformedatransformationoftheassembly N D% PrD% codegeneratedbythecompilationofthesourceprogramintoHorn App Est Prof Obs Clauses,whichwerethenanalyzedbyusingtheaccurateassembly Randomarraydata levelenergymodelspresentedin[10].Theexperiments,performed L 22.3 24.9 -20.1 -9.2 5 27.3 for a number of small numerical programs, showed for the first U 31.9 30.2 15.6 10 timethatenergyboundfunctionsinferredstaticallyfromlow-level L 55.9 61.8 -17 -11 15 69.1 modelcouldbeinferredthatprovidedenergyconsumptionestima- U 82.1 75.1 21 8.3 tionswerereasonablyaccuratewithrespecttoactualexecutionsfor L 89.4 99.6 -17.6 -10.7 25 110.9 anyinputdatasize. U 132.2 120.8 21.7 8.5 Similarlytotheworkpresentedhere,theapproachesmentioned Actualworst-andbest-casearraydata above used instantiations for energy consumption of general re- 5 L 22.3 22.3 25.2 -12.2 -12.2 source analyzers, namely [21] in [20] and [13], and [24] in [12] U 31.9 31.9 29.4 8.1 8.1 and this paper. Such resource analyzers are based on setting up 15 L 55.9 55.9 62.6 -11.3 -11.3 andsolvingrecurrenceequations, anapproachproposedbyWeg- U 82.1 82.1 75.5 8.3 8.3 breit[29]thathasbeendevelopedsignificantlyinsubsequentwork[23, L 89.4 89.4 100.2 -11.4 -11.4 4, 5, 26, 21, 1, 24]. Other approaches to static analysis based 25 U 132.2 132.2 121.5 8.4 8.4 onthetransformationoftheanalyzedcodeintoanother(interme- diate) representation have been proposed for analyzing low-level Table2: findMax: Sourceofinaccuraciesinprediction: static languages [6] and Java (by means of a transformation into Java analysisvs.energymodeling. bytecode)[2]. In[2],costrelationsareinferreddirectlyforthese bytecode programs, whereas in [20] the bytecode is first trans- Regarding the time taken by the EA, it can vary depending on formedintoHornClauses. Thegeneralresourceanalyzerin[21] theparametersitisinitializedwith,aswellastheinitialpopulation. wasalsoinstantiatedin[18]fortheestimationofexecutiontimes ThispopulationisdifferenteverytimetheEAisinitiated, except oflogicprogramsrunningonabytecode-basedabstractmachine. forafixednumberofindividualsthatrepresentcornercases.Inthe Theapproachusedtimingmodelsatthebytecodeinstructionlevel, experiments,theEAisrunforuptoamaximumof20generations, foreachparticularplatform,andprogram-specificmappingstolift andisstoppedwhenthefitnessvaluedoesnotimproveforfourcon- suchmodelsuptotheHornClauselevel,atwhichtheanalysiswas secutivegenerations. Inalltheexperimentsthebiquadbenchmark performed. tookthemosttime(amaximumtimeof230minutes)formaximiz- Otherworkhastakenasitsstartingpointtechniquesreferredto ingtheenergyconsumption. Incontrast,thefactbenchmarktook generallyas“WCET”(worstcaseexecutiontimeanalyses),which the least time (a maximum time of 121 minutes). The times re- have been applied, usually for imperative languages, in different mainedwithinthe150-200minutesrangeonaverage.Timespeed- applicationdomains(seee.g.,[30]anditsreferences).Thesetech- upswerealsoachievedbyreusingtheEAresultsforsequencesof niquesgenerallyrequiretheprogrammertoboundthenumberof instructionsthatwerealreadyprocessedinapreviousbenchmark iterations of loops, and then apply an Implicit Path Enumeration techniquetoidentifythepathofmaximalconsumptioninthecon- energy consumption bounds of the whole program. We also car- trolflowgraphoftheresultingloop-lessprogram. Thisapproach riedoutanexperimentalevaluationtovalidatetheupperandlower hasinspiredsomeworstcaseenergyanalyses,suchastheonepre- boundsonasetofbenchmarks. Theresultssupportourhypothe- sentedin[9].Itdistinguishesinstruction-specific(notproportional sis thatthe boundsinferred inthis wayare indeed safeand quite totime,buttodata)frompipeline-specific(roughlyproportionalto accurate,andthetechniquepractical. time)energyconsumption. Theapproachalsotakesintoaccount complexissuessuchasbranchpredictionandcachemisses. How- 8. ACKNOWLEDGMENTS ever,theyrelyontheusertoidentifytheinputwhichwilltriggerthe ThisresearchhasreceivedfundingfromtheEuropeanUnion7th maximalenergyconsumption.In[27]thesameapproachisapplied Framework Program agreement no 318337, ENTRA, Spanish and further refined for estimating hard (i.e., over-approximated) MINECO TIN’12-39391 StrongSoft project, and the Madrid energy bounds. The main novelty of this work consists in intro- M141047003 N-GREENS program. We also thank Henk Muller, ducingrelativeenergymodels(implementedattheLLVMlevelin PrincipalTechnologist,XMOS,forhishelpwiththemeasurement this case), where the energy of each instruction is given in rela- boards,evaluationplatform,benchmarks,andoverallsupport. tiontoeachother(e.g.,ifweassumethatalltheinstructionshave relativeenergy1, thismeansthattheyallhavethesameabsolute 9. REFERENCES energy),whichdoesnotdependonthespecifichardware,butcan [1] E.Albert,P.Arenas,S.Genaim,andG.Puebla.Closed-Form beappliedforalltheplatformswhereamappingbetweenLLVM UpperBoundsinStaticCostAnalysis.JournalofAutomated andlow-levelassemblyinstructionsexists. Ontheotherhand, in Reasoning,46(2):161–203,February2011. thesituationswhentheenergyboundsarenothard(i.e.,theappli- cation allows their violation) they use a genetic algorithm to ob- [2] E.Albert,P.Arenas,S.Genaim,G.Puebla,and tainanunder-approximationoftheenergybounds. However,this D.Zanardini.CostAnalysisofJavaBytecode.InR.D. approach loses accuracy when there are data dependent branches Nicola,editor,16thEuropeanSymposiumonProgramming, presentintheprogram,sincedifferentinputscanleadtotheexecu- ESOP’07,volume4421ofLectureNotesinComputer tionofdifferentsetofinstructions. Science,pages157–172.Springer,March2007. Asimilarapproachisusedin[22]tofindtheworst-caseenergy [3] S.Chakravarty,Z.Zhao,andA.Gerstlauer.Automated, consumptionoftwobenchmarksusingageneticalgorithm.Incon- RetargetableBack-annotationforHostCompiled trasttoourapproach,theevolutionaryalgorithmisappliedtowhole PerformanceandPowerModeling.InProceedingsofthe programs, and these do not have any data-dependent branching. NinthIEEE/ACM/IFIPInternationalConferenceon Theauthorsfurtherintroduceprobabilitydistributionsforthetran- Hardware/SoftwareCodesignandSystemSynthesis, sitioncostsamongpairsofindependentinstructions,whichcanbe CODES+ISSS’13,pages36:1–36:10,USA,2013.IEEE thenbeconvolvedtogiveaprobabilitydistributionoftheenergy Press. forasequenceofinstructions. [4] S.K.Debray,N.-W.Lin,andM.Hermenegildo.Task In contrast to the work presented here and in [18], all these GranularityAnalysisinLogicPrograms.InProc.ofthe1990 WCET-stylemethods(eitherforexecutiontimeorenergy)donot ACMConf.onProgrammingLanguageDesignand infer cost functions on input data sizes but rather absolute maxi- Implementation,pages174–188.ACMPress,June1990. mumvalues,and,asmentionedbefore,theygenerallyrequirethe [5] S.K.Debray,P.López-García,M.Hermenegildo,andN.-W. manual annotation of all loops to express an upper bound on the Lin.LowerBoundCostEstimationforLogicPrograms.In numberofiterations,whichcanbetedious(orimpossible)andef- 1997InternationalLogicProgrammingSymposium,pages fectivelyreducesthecasetothatofprogramswithnoloops. 291–305.MITPress,Cambridge,MA,October1997. AnotheralternativeapproachtoWCET-stylemethodswaspre- [6] K.S.HenriksenandJ.P.Gallagher.Abstractinterpretation sented in [8]. It is based on the idea of amortization, which al- ofPICprogramsthroughlogicprogramming.InSixthIEEE lows inferringmore accurate yet safe upperbounds by averaging InternationalWorkshoponSourceCodeAnalysisand theworstexecutiontimeofoperationsovertime. Itwasappliedto Manipulation(SCAM2006),pages184–196.IEEE afunctionallanguage,buttheapproachisinprinciplegenerallyap- ComputerSociety,2006. plicable. Atiminganalysisbasedongame-theoreticlearningwas [7] M.V.Hermenegildo,F.Bueno,M.Carro,P.López,E.Mera, presented in [25]. The approach combines static analysis to find J.Morales,andG.Puebla.AnOverviewofCiaoandits a set of basic paths which are then tested. Its main advantage is DesignPhilosophy.TheoryandPracticeofLogic thatitcaninferdistributionsontime,notonlyaveragevalues. In Programming,12(1–2):219–252,January2012. principle,bothapproachescouldbeadaptedtoinferenergyusage. [8] C.Herrmann,A.Bonenfant,K.Hammond,S.Jost,H.-W. Loidl,andR.Pointon.AutomaticAmortisedWorst-Case 7. CONCLUSIONS ExecutionTimeAnalysis.In7thInternationalWorkshopon We have proposed an approach for inferring parametric upper Worst-CaseExecutionTimeAnalysis(WCET’07),volume6 andlowerboundsontheenergyconsumptionofaprogramusinga ofOASIcs.SchlossDagstuhl–Leibniz-Zentrumfuer combinationofstaticanddynamictechniques. Thedynamictech- Informatik,2007. nique, based on an evolutionary algorithm, is used to determine [9] R.Jayaseelan,T.Mitra,andX.Li.Estimatingthe the maximal / minimal energy consumption of each basic block. Worst-CaseEnergyConsumptionofEmbeddedSoftware.In Suchblockscontainmultipleinstructions,whichallowsthisphase IEEEReal-TimeandEmbeddedTechnologyand totakeintoaccountinter-instructiondependencies. Sincesuchba- ApplicationsSymposium(RTAS2006),pages81–90.IEEE sic blocks are branchless, the evolutionary algorithm approach is ComputerSociety,2006. morepracticalandefficientandthetechniqueinfersenergyvalues [10] S.KerrisonandK.Eder.EnergyModelingofSoftwarefora thatareaccuratesincenocontrolflow-relatedvariationsoccur. A HardwareMultithreadedEmbeddedMicroprocessor.ACM staticanalysisisthenusedtocombinetheenergyvaluesobtained TransactionsonEmbeddedComputingSystems,14(3):1–25, fortheblocksaccordingtotheprogramcontrolflow,andproduce April2015. [11] S.LafondandJ.Lilius.Energyconsumptionanalysisfortwo [23] M.Rosendahl.AutomaticComplexityAnalysis.In4thACM embeddedJavavirtualmachines.J.Syst.Archit., ConferenceonFunctionalProgrammingLanguagesand 53(5-6):328–337,2007. ComputerArchitecture(FPCA’89),pages144–156.ACM [12] U.Liqat,K.Georgiou,S.Kerrison,P.Lopez-Garcia,M.V. Press,1989. Hermenegildo,J.P.Gallagher,andK.Eder.InferringEnergy [24] A.Serrano,P.Lopez-Garcia,andM.Hermenegildo. ConsumptionatDifferentSoftwareLevels:ISAvs.LLVM ResourceUsageAnalysisofLogicProgramsviaAbstract IR.InProc.oftheFoundationalandPracticalAspectsof InterpretationUsingSizedTypes.TheoryandPracticeof ResourceAnalysis,LNCS.Springer,2015.Toappear. LogicProgramming,30thInt’l.ConferenceonLogic [13] U.Liqat,S.Kerrison,A.Serrano,K.Georgiou, Programming(ICLP’14)SpecialIssue,14(4-5):739–754, P.Lopez-Garcia,N.Grech,M.Hermenegildo,andK.Eder. 2014. EnergyConsumptionAnalysisofProgramsbasedonXMOS [25] S.A.SeshiaandJ.Kotker.Gametime:Atoolkitfortiming ISA-levelModels.InLogic-BasedProgramSynthesisand analysisofsoftware.InP.A.AbdullaandK.R.M.Leino, Transformation,23rdInternationalSymposium,LOPSTR editors,TACAS,volume6605ofLectureNotesinComputer 2013,RevisedSelectedPapers,volume8901ofLecture Science,pages388–392.Springer,2011. NotesinComputerScience,pages72–90.Springer,2014. [26] P.VasconcelosandK.Hammond.InferringCostEquations [14] P.López-García,L.Darmawan,andF.Bueno.AFramework forRecursive,PolymorphicandHigher-OrderFunctional forVerificationandDebuggingofResourceUsage Programs.In15thInternationalWorkshopon Properties.InM.HermenegildoandT.Schaub,editors, ImplementationofFunctionalLanguages(IFL’03),Revised TechnicalCommunicationsofthe26thInt’l.Conferenceon Papers,volume3145ofLectureNotesinComputerScience, LogicProgramming(ICLP’10),volume7ofLeibniz pages86–101.Springer-Verlag,Sep2005. InternationalProceedingsinInformatics(LIPIcs),pages [27] P.Wagemann,T.Distler,T.Honig,H.Janker,R.Kapitza, 104–113,Dagstuhl,Germany,July2010.Schloss andW.Schroder-Preikschat.Worst-caseenergyconsumption Dagstuhl–Leibniz-ZentrumfuerInformatik. analysisforenergy-constrainedembeddedsystems.In [15] P.Lopez-Garcia,L.Darmawan,F.Bueno,and Real-TimeSystems(ECRTS),201527thEuromicro M.Hermenegildo.Interval-BasedResourceUsage Conferenceon,pages105–114,July2015. Verification:FormalizationandPrototype.InR.P.na, [28] D.Watt.ProgrammingXConXMOSDevices.XMOS M.Eekelen,andO.Shkaravska,editors,Foundationaland Limited,2009. PracticalAspectsofResourceAnalysis.SecondIternational [29] B.Wegbreit.Mechanicalprogramanalysis.Commun.ACM, WorkshopFOPARA2011,RevisedSelectedPapers,volume 18(9):528–539,1975. 7177ofLectureNotesinComputerScience,pages54–71. [30] R.Wilhelm,J.Engblom,A.Ermedahl,N.Holsti,S.Thesing, Springer-Verlag,2012. D.Whalley,G.Bernat,C.Ferdinand,R.Heckmann, [16] P.Lopez-Garcia,R.Haemmerlé,M.Klemen,U.Liqat,and T.Mitra,F.Mueller,I.Puaut,P.Puschner,J.Staschulat,and M.V.Hermenegildo.TowardsEnergyConsumption P.Stenström.Theworst-caseexecution-timeproblem- VerificationviaStaticAnalysis.InWorkshoponHigh Overviewofmethodsandsurveyoftools.ACMTrans. PerformanceEnergyEfficientEmbeddedSystems(HIP3ES EmbeddedComput.Syst.,7(3),2008. 2015),arXiv:1501.03064,2015. [31] XMOS.TheXTAG-2HardwareManual,September2009. [17] D.May.TheXMOSXS1architecture.availableonline: Availableonlineat: http://www.xmos.com/published/xmos-xs1-architecture, https://www.xmos.com/download/private/XTAG-2- 2013. Hardware-Manual [18] E.Mera,P.López-García,M.Carro,andM.Hermenegildo. [32] XMOS.UsexTIMEcomposerandxSCOPEtotracedatain TowardsExecutionTimeEstimationinAbstract real-time,2013.Availableonlineat: Machine-BasedLanguages.In10thInt’l.ACMSIGPLAN https://www.xmos.com/download/public/Trace-data-with- SymposiumonPrinciplesandPracticeofDeclarative XScope(X9923H).pdf. Programming(PPDP’08),pages174–184.ACMPress,July 2008. [19] H.Muller,editor.MetricsandCaseStudies.ENTRAProject: Whole-SystemsEnergyTransparency(FETproject318337), November2013.Deliverable6.1,http://entraproject.eu. [20] J.Navas,M.Méndez-Lojo,andM.Hermenegildo.Safe Upper-boundsInferenceofEnergyConsumptionforJava BytecodeApplications.InTheSixthNASALangleyFormal MethodsWorkshop(LFM08),pages29–32,April2008. ExtendedAbstract. [21] J.Navas,E.Mera,P.López-García,andM.Hermenegildo. User-DefinableResourceBoundsAnalysisforLogic Programs.InInternationalConferenceonLogic Programming(ICLP’07),LectureNotesinComputer Science,pages348–363.Springer,2007. [22] J.Pallister,S.Kerrison,J.Morse,andK.Eder.Data dependentenergymodelingforworstcaseenergy consumptionanalysis.arXivpreprintarXiv:1505.03374, 2015.