Increasing GP Computing Power via Volunteer Computing Daniel Lombran˜a Gonz´alez1, Francisco Ferna´ndez de Vega1, L. Trujillo2, G. Olague2, F. Cha´vez de la O1, M. C´ardenas3, L. Araujo4, P. Castillo5, and K. Sharman6 8 0 1 University of [email protected], [email protected], [email protected] 0 2 CICESE [email protected], [email protected] 2 3 Ceta-Ciemat [email protected] n 4 UNED [email protected] a 5 University of Granada [email protected] J 6 University Polit´ecnica of Valencia, [email protected] 8 ] Abstract. ThispaperdescribeshowitispossibletoincreaseGPCom- C puting Power via Volunteer Computing (VC) using the BOINC frame- D work. Two experiments using well-known GP tools -Lil-gp & ECJ- are . performed in order to demonstrate the benefit of using VC in terms of s computing power and speed up. Finally we present an extension of the c [ model where any GP tool or framework can be used inside BOINC re- gardless of its programming language, complexity or required operating 1 system. v 0 1 1 Introduction 2 1 Real world optimization problems are usually complex and their resolution is . 1 CPUtime consuming when EAs areemployed.This is due to the big amountof 0 individualswhichareevaluatedandalsoduetothenumberofiterationsrequired 8 to find a solution. 0 In order to alleviate this problem, EAs and GP have benefited from parallel : v models.Twomainapproachestoparallelizehavebeendescribed:(see[1]):Fine- i X grain which uses a master-slave architecture and Coarse-grain also known as islandmodel. Given the stochastic nature ofEAs andGP andthe largenumber r a of runs, frequently required for obtaining results, parameter sweep models have recently been applied in combination with high throughput computer systems. GRIDcomputingisnowadaysoneofthemostemergingtechnologiesandhas been recently demonstrated as a powerful tool to deal with time-consuming ap- plications, see [2]. The GRID allows super computers, clusters or desktop PCs, which aredistributed overnetworks,to be harnessedby means of a specialsoft- ware called middleware. The middleware exports and handles all the computer resources with the goal of providing a standard layer where scientists can run their experiments. The middleware can be focused on handling two type of resources: desktop PCsorsupercomputers/clusters.Anexampleofsupercomputer-clustermiddle- ware is gLite (see [3]). A successful attempt of using EAs and GRID computing is presented by N. Mealab et al. see [4]. However, this kind of middleware is usually associated with expensive hardware which is a drawback for scientists from developing regions or countries. There exists other middleware systems which focus on cheap desktop PCs. ThistypeofmiddlewareisaimedatbuildingupDesktopGridComputing(DGC) systems. The complexity and deployment of this technology is much smaller than alternative GRID one (such as gLite). There are several DGC systems such as: Xtremweb [5] a research project which presents a Global Computing platform using a large base of volunteer PCs, Condor [6] a middleware which implements a scheduling system with desktop PCs and BOINC [7] a multi- platform middleware that uses workstation CPU idle periods to run jobs. WhendealingwithDGC,theusersarereallyimportant.DGCreliesonthem because the idea behind DGC is to allow users to collaborate with a scientific researchprojectbydonatingdesktopcomputingpower.Tothebestofourknowl- edge, the pioneer on engaging users to collaboratewith a scientific researchwas the project SETI@HOME [8]. This project has been able to create a super vir- tual computer of 259.729 TeraFLOPS thanks to the collaboration of 706,586 users7. Thus, DGC is also known as Volunteer Grid Computing (VGC) due to the users that unselfishly collaborate with a scientific research.Therefore, VGC computing power can be used for free if users see the interest of the project. From all the above presented VGC middleware technologies BOINC is the mostusedone.Furthermore,BOINCisalreadywidelyusedindifferentresearch fields such as: High Energy Physics [9], Astrophysics [10], Climate Prediction [11], Epidemiology [12], etc. A noveltechnology has also been recently described which allows to harness computing resources by only browsing a given and special web page [13,14,15]. Yet, we don’t consider it here because it allows to avoid users to notice the background work of the web-browser, thus computing power is “stolen” from user PCs. Users will thus be annoyed if they discovered that someone has been usingtheirresourceswithouttheirpermission.Ourideaisjusttheopposite,not only to inform users about the projectthat we are running but also invite them to join and collaborate. VGCisalsoagoodcomputingplatformforrunningParameterSweepexper- iments in genetic and evolutionary computation. M.E. Samples describes Com- mander, a new generic parameter sweep framework (see [16]). However, firstly, is not aimedat harnessingvolunteer resources,secondly, has not been so widely spread and tested with real scientific research problems, and thirdly it hasn’t involvedthe huge number of clients like BOINC (2,364,170computer clients), a consolidated VGC. Any of the above parallel described models -fine and coarse grain- can run within this approach. We simply have to bear in mind that any of the whole parallelmodelswillrunonasinglemachine.Thepowercomesfromthemultiple and simultaneously runs of the same experiment with different parameters or identical runs for statistical analysis. 7 Data obtained from theweb http://boincstats.com Therefore,whatwe proposeis touse a VGC BOINCmodel andGP inorder to: – HarnessalargenumberofBOINCresources(nowadaysBOINChas2,364,170 computers in total which provides in average 668,541.2GigaFLOPS8). – Improve the speed up of GP sequential executions thanks to the parallel environment which VGC provides. WehavechosenBOINCasourVGCmiddlewarebecauseBOINCisthemost used one, therefore allowing a great computing power. The remainder of the paper includes a description of the BOINC model in Section2;wepresentthe newVGC andGPmodelinSection3;Section4shows the experiments and results and an extension to the model. We conclude in Section 5. 2 The BOINC model As described above, BOINC is a middleware that harness the commodity com- puter resourcesfor a givenproject.BOINC has two mainkey features:it’s Mul- tiplatform andOpen source.BOINC uses amaster-slavemodelwherethe server is in charge of: – Hosting the scientific project experiments. One project is composed by a bi- nary (the algorithm)andsome input files.The binaryis classifiedaccording to the targetplatform(Ms. Windows,GNU/Linux, MacOSX)and architec- ture (x86 32-64 bits and sparc). – Creation and distribution of jobs. In BOINC’s terminology a job is called “workunit”(WU).AWUdescribeshowtheexperimentmustberunbythe clients(thenameofthebinary,theinput/outputfilesandthecommandline arguments). – Assimilation and validation of results. When the clients finish the compu- tations, they upload the results to the server. At this point the server runs two processes: a validation and assimilator program. The validator goal is to verify if the results are corrupted or not. If everything is correct the re- sults are validated and prepared for the next program: the assimilator. The assimilator is in charge of parsing the output files of the project to perform tasks like: compute some statistics, store results inside other database, etc. Theclientisquite simple.TheBOINCclientconnectstothe serverandasks for work (WU). The client downloads the necessary files and starts the compu- tations. Once the results are obtained, the client uploads them to the server. Additionally, during the whole process (all the steps: download WU, process it, uploadthe results)theclientcontactstheservertokeepa“heartbeat”function. Theheartbeatisusedtotakesomedecisionsandgeneratesomestatisticaldata. As BOINC relies on users, BOINC resources are not reliable because: 8 Data obtained from http://boincstats.comunder BOINCCombined stats – Users turn on and off its machines without knowing if they are interrupt- ing a BOINC execution. Therefore, the research application must have a checkpoint facility. – Users could try to cheat.BOINChasaredundancyfeaturethatcircumvents this problem. The BOINC administrator can define which is the minimum required quorum to validate a solution. – Users could tryhacking the BOINCserver.Ifoneusercouldhacktheserver, hecouldbeabletolaunchnewWUswhichcanhaveviruses.Inordertoavoid this problem, BOINC uses digital signatures to sign binary applications. Therefore, only signed applications can be distributed over the clients. AswecanobserveBOINC’sstructureissimpleandprovidesthemainneeded features that any VGC system requires. The next sub-section explains how we can use BOINC with a scientific researchproject. 2.1 How to use BOINC with a Scientific Project Ascientificprojectthatwantstouse BOINChastosetupaGNU/Linux server (Apache, MySQL, PHP) and then take into account the following key points: – Programming Language. BOINC uses C++ and Fortran. Thus, if the sci- entific project is not coded in C++ or FORTRAN the project has to be ported to C++ or Fortran in order to link its source code with the BOINC framework. – Operating System. BOINC has clients for GNU/Linux, Ms. Windows and MacOSX. However,the scientific project has to be adapted to all of them if we want to harness as much as possible available resources. BOINC uses a generic framework which builds binaries which are OS dependant. Therefore, there are two ways to support BOINC: – Method 1. To port the code. This method is the most used. Basically, a researcher has to adapt its application source code to support BOINC. The changes could be easy if the tool is coded in C/C++ or Fortran. In other cases the researchwill have to rewrite the whole code. – Method 2. The Wrapper. The BOINC team provides a tool called wrap- per which enables to run statically linked software inside BOINC without needingtomodifyorporttheapplicationsourcecode.Basicallythewrapper embodies the applicationinsuchawaythatfor the BOINC clientdoesonly exists one application: the wrapper. Inconclusion,anapplicationwhich is notcoded inC/C++/Fortranwilluse the second method. However, if the application is coded in C/C++/Fortran some minor changes will be needed to support BOINC (Method 1). 3 VGC and GP problems Our proposal is to use VGC for running GP experiments via BOINC technol- ogy. We present two examples that show how any GP preferred tool could be effectively used within the BOINC framework. The examples include adapting Lil-gp to BOINC (Method 1) and using the wrapper for ECJ (Method 2). 3.1 Lil-gp and BOINC LilgpisawellknownCGPsystem(see[17]).AsLil-gpiscodedinC,portingthe framework to BOINC is not difficult (Method 1). The main porting changes were done in all the Input/Output routines that access files. Under BOINC the I/O routines are treated with specific I/O functions due to the parallel nature of the model. Once the changes were done Lil-gp was ready to be compiled and linked with the BOINC libraries. In summary, a Lil-gp-BOINC project is composed by the following items: – Binary. TheLil-GPcompiledproblem(symbolic linearregression,evenpar- ity 5, etc.) using the adapted Lil-gp-BOINC framework. – Input files. Lil-GP uses as input file the GP parameter file (probability of crossover,mutation, etc.). – WU. The WU for this project specifies which are the input files needed by the Binary, the output files which are going to be generated by the Binary andfinallyifitisnecessarythe commandline argumentsthatcanbe passed to the Binary. In this project, it’s necessary to specify the input file using a command line argument. The results that we described below show the effectiveness of the approach. Nevertheless,researchersfrequently don’t have the time to manage a porting or the toolis written in a languagedifferent to C or Fortran.In this cases the goal is to use the tool as it is. Next sub-section deals with this case. 3.2 ECJ and BOINC ECJ is a modern JAVA framework for Evolutionary Computation (EC) [18]. ThisframeworkcanrundifferentkindsofECtechniqueslike:geneticalgorithms, evolutionary strategies, genetic programming, etc. As ECJ is not coded in C++ or Fortran there are two ways of supporting BOINC: 1. Port the framework. This solution is quite difficult due to the framework is written in JAVA. As is written in a different programming language, the port step implies a complete rewriting of all the framework using C++ or Fortran. 2. Use the Wrapper. This is a simple solution consisting of to not modify any source code line to run the framework inside BOINC. Thereforeweemploythewrappersolution.Howeverthissolutionimpliesthat all the clients should have installed a JAVA virtual machine, because without JAVA it is impossible to run any ECJ experiment. So, the adopted solution to support JAVA inside all the clients, was to pack also the JAVA virtual machine with ECJ. In summary, the ECJ-BOINC project is composed by the following items: – Binary. The binary is the wrapper. The Wrappers uses a file called job.xml whichspecifiestheunmodified binary thatmustbelaunchedbyBOINC,the unmodified binary input/output files and the command line arguments. – Input files • ECJ and JAVA. ECJ and JAVA are packaged in different compressed files. These files will be unpacked when the clients have downloaded them. • Unmodified Binary. The unmodified binary is a script file which is in charge ofunpacking allthe input files (ECJ and JAVA virtualmachine) and start the execution of ECJ. Additionally it’s also in charge of han- dlingtheinternalECJcheckpointinginordertorestart,whennecessary, from the last saved and stable stage. – WU.TheWUforthisprojectspecifieswhicharetheinputfilesneededbythe Binary and the output files which are going to be generated by the Binary (in this case by the unmodified binary). In summary the workflowof ECJ-BOINC,once the clients have downloaded all the necessary files, will be the next: 1. The Wrapper launches the starter script. 2. The script: (a) Unpacks ECJ and JAVA compressed files. (b) Checks if an ECJ checkpoint file had been created and in that case it will launch ECJ with the checkpoint file, else it will launch ECJ in the normal way. (c) Copies the ECJ output file to the solution file and exits. 3. The wrapper waits until the solution file is created and then notifies to the BOINC coreclient that anexecution has ended and that the result files can be uploaded to the BOINC server. Any other statically linked tool could also follow this scheme to be run on BOINC clients. The next section explains how Lil-gp-BOINC and ECJ-BOINC were employed to run different experiments using a geographically distributed pool of clients. 4 Experiments & Results The goal of all the experiments presented before is to show that VGC is a useful computing platform for running GP problems. We are not interested in checking the quality of obtained results, and therefore we have employed the standard implementation of benchmark problems provided by the tools Lil-gp andECJ.We focus onmeasurethe performanceimprovementthatit ispossible to achieve when a BOINC model is used compared with the traditional and sequential mode of running only one machine. Usually speed up is measure by the standard equation 1: T A= seq (1) T B where: – A is the acceleration. – T is the consumed time by the sequential mode. seq – T is the consumed time by the BOINC mode. B Nevertheless, given the special features of VGC, we also measure the available computing power (CP)by using the method describedby Andersonand Fedack in [19]: CP =X ∗X ∗X ∗X ∗X ∗X ∗X ∗X ∗X arrival life ncpus flops eff onfrac active redundancy share (2) In all the experiments, X is equal to 1 because we didn’t use the redundancy redundancy facility providedby BOINC. X is also equal to 1 because none share of the clients shared its resources with other BOINC projects. X and arrival X areimportantvariablesduetothey measurethehostchurn(thevolunteer life computingproject’spoolofhostsisdynamic).Therestofthevariablesmeasure specific hardware features [19]. 4.1 Lil-gp-BOINC Thisfirstexperimentpresentstheproofofconceptandwassetuponacontrolled environment, a laboratory, using Lil-gp, Method 1 see Section 2.1. In order to measure the performance improvement we chose the Artificial Ant on Santa Fe Trail problem (see [20]). Werun25executionsoftheexperimentwithdifferentpopulationsizes(1000 and2000individuals)andgenerations(1000and2000).Twopoolsofclients,one with 5 and another with 10 machines, were used for running the Lil-gp-BOINC model. As said above,given the aim of this researchwe don’t present the quality of obtained results (which are the same as the sequential execution). We focus on the performance, computing power and speed up. Table 1 shows the consumed timebyLil-gp-BOINC,standardLil-gpandtheaccelerationwhichwasobtained using 5 and 10 client machines with the BOINC model. From the above results we can conclude that as more clients are used better performance is obtained. For instance, when we are using 10 clients we have achieved an acceleration of 5 while with 5 clients we only get an acceleration of 3usingthesamenumberofgenerationsandindividuals.Itisimportanttopoint out that this performance grows as more clients collaborate with the project. Table 1. Execution time for Lil-gp and Lil-gp-BOINC (a) Using 5 Clients (b) Using 10 Clients Tseq TB Acc. Tseq TB Acc. 1000 Gen, 1000 Ind.4250s1548s2.7455 1000 Gen, 1000 Ind.4250s1033s4.1142 1000 Gen, 2000 Ind. 650s 395s 1.6456 2000 Gen, 1000 Ind 9200s1623s5.6685 2000 Gen, 1000 Ind 9200s2356s3.9049 As this experiment is a proof-of-concept the measure variables, speed up and computing power, have not taken into account real volunteer users. Therefore, wedonotshowtheavailablecomputingpowerobtainedinavolunteercomputing scenario (see equation 2). In order to continue the evaluation of the model we decided to face a more complex GP tool besides a more computing demanding GP problem. 4.2 ECJ-BOINC This second experiment employs a modern and complex JAVA GP framework (ECJ). For this tool we decided to use the wrapper (Method 2) in order to sup- port BOINC. We chose the GP benchmark Boolean Multiplexer function (see [20]).Ingeneral,theinputtotheBooleanMultiplexerfunctionconsistsofk“ad- dress”bitsaiand2k“data”bitsdiwhichhastheformak−1···a1a0d2k−1···d1d0 withalengthequaltok+2k.Thesearchspaceforthisfunctionisequalto2k+2k. This problem has been run in several geographically distributed labora- tory clients belonging to the University of Extremadura (Ca´ceres, Badajoz and M´erida),see Fig.1(a).Fig.1(b)showsthe number ofclients per citywhichtake part in the experiment. It’s important to point out that in following tables, T B measuresalltheemployedtime(clientconnection,WUdownload,CPUtime,re- sultsupload,etc.):sincethefirstclientregistersandcollaboratewiththeproject until the last connection to the server from any client. 828iterationsofthe11multiplexerfunction(k =3)wereinitiallyperformed using 45 computers. The experiment used the same Koza parameters (4000 in- dividuals and 50 generations) for more details see [21]. From the 828 iterations 449 iterations found the perfect solution (although this is not the goal of our research)tothe11multiplexerproblem.Someiterationsgaveanerrorduetothe initial and default restrictionof the JAVA heap size9. 119.18seconds in average wereneededinordertofindthe perfectsolutionwhile134.75secondsinaverage is the needed time to run one execution. From the 45 available computers, only 27produced828results.Theachievedspeedupwas0.29whichmeansadeterio- rationintheperformance.Thereasonistheeasinessoftheproblem,only134.75 seconds in average,and we have to take into accountthat T measuresalso the B host churn (see 2). Taking into account that the project was running for 5.35 days,X is measure only from the first connection to the last communication life 9 The heap size was later modified toavoid this problem. Fig.1. Distributed Infrastructure (a) InterconnectedCities (b) Clients perCity ofhoststhathadnocommunicatedinatleastoneday.Thus,theachievedCPis equalto 80GigaFLOPS.This CPwas obtainedbecause the projectwasrunning only a few days and all the hosts were active during all the computation time. Tab. 2(a) shows a summary. We increased the complexity of the problem with k = 4 (compare the new searchspace 21048576 with the previousone 22048). Our interestis not in solving theproblem,butinsettingupatimeconsumingexperimentfortestingtheVGC model. We tried first to deploy 42 runs of an experiment using 50 generations with a population of 1000 individuals. The rest of the GP parameters are the same as Koza parameters for the 11 multiplexer see [21]. VolunteercomputersfromotherUniversitiesorinstitutionssuchas:CICAin Sevilla,UniversityofExtremadura(Ca´ceres,Badajoz,M´erida),Granada,Valen- cia,UNEDinMadrid,andCeta-CiematinTrujillocollaboratedwiththeproject. Thus, the computing resources are more heterogeneous and reallistic now. Us- ing this infrastructure we performed 42 runs of the experiment. A performance improvement was achieved as this problem needs in average 31079.28 seconds to run one execution. 41 machines were used to solve this problem. From 41 computers, 7 produced the 42 runs due to some machines were turned off for hours, others still computing, etc. (typical VGC behavior). Thus, the obtained accelerationwas1.95(seeTab.2).Theobtainedspeedupwasnicealthoughnot impressive but it was obtained for free with a quite small number of volunteers involved. In average one iteration employs 30944.53 seconds more than the 11 multiplexer.X wasalsomeasureas inthe 11multiplexer problemdue to the life projectwasrunningonlyfewdays.So,theachievedcomputingpowerisequalto 23 GigaFLOPS and is smaller because we are employing only 41 hosts and be- causetheprojectwasrunning7.75days.However,bearinmindthatBOINChas a pool of 2,364,170 available computers which could collaborate with a project in the future instead of only 42 which we have employed here, not all of them simultaneously available. Tab.2(b)showsasummaryoftherelevantdata.ThebestfoundFitnesswas Raw=180224.0Adjusted=5.54862e−06 Hits=868352.0. Table 2. Execution time for ECJ and ECJ-BOINC Tseq TB Acc. CP 11 bits, 828 runs, 50 Gen, 4000 Ind. 134078s 462259s 0.29 80 GFLOPS 20 bits, 42 runs, 50 Gen, 1000 Ind. 1305330s 669759s 1.95 23 GFLOPS Finally,we performedanotherexperiment with a complex andnot statically linked GP tool which makes impossible to employ Method 1 or 2. Additionally, this experiment faces a real life and time-consuming Computer Vision problem (instead of a benchmark problem) that has already been solved in a sequential fashion (Interest Point detectors,see [22]). This GP framework uses the Matlab environment and severalimage tool-boxes,which implies a much more complex system, being therefore more difficult to deploy it overa BOINC infrastructure. Hence, our proposal is to use a Virtualization layer (see [23]) inside BOINC by creatinganimageofaworkingscientificsystem(hardware,OSandtheresearch tool). Thanks to this new virtualization layer (based on VMware [23]) any GP systemorframework-independentlyfromitscomplexity,programminglanguage and operating system- can be run on any BOINC client (Linux, Windows or MacOSX, for further details see [24]). For this experiment we set up 10 Ms Windows volunteer computers. The virtual image was build using a GNU/Linux x86 operating system. Thus, a GNU/Linux scientific environment runs directly inside Ms Windows thanks to the Virtual-BOINCapproach.The 10 Windows PCsproduced12solutionsdur- ing 48 hours. The consumed time by each solution was in average of 18 hours. Thetotaltime consumedforproducing12solutionsbyasequentialrunwas215 hours. Therefore, thanks to this new model the obtained speed up was of 4.48 and a CP of 25.67 GFLOPS, (see Tab. 3). Table 3. Execution time for IP and IP-Virtual-BOINC Tseq TB Acc. CP 75 Gen, 75 Ind.215h48h 4.48 25.67 GFLOPS Insummary,theBOINCmodelimprovessignificantlytheperformanceswhen CPU-intensive, time-consuming, real-life problems are solved by means of GP. Moreover, as more computers collaborate with one project more computing power and speed up is achieved with a free cost.