Peer-to-Peer Distributed Shared Memory?

Gabriel Antoniu, Luc Bougé, Mathieu Jan

[Research Report] RR-4924, INRIA, 2003. HAL Id: inria-00071655, https://hal.inria.fr/inria-00071655 (submitted on 23 May 2006).

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
Theme 1 — Networks and systems. Projet Paris.
Research report no. 4924 — September 2003 — 14 pages

Abstract: In this paper, we show that, although P2P systems and DSM systems have been designed in rather different contexts, both can serve as major sources of inspiration for the design of a hybrid system, with intermediate hypotheses and properties. We propose the concept of a data sharing service for grid computing, as a compromise between DSM systems and P2P systems. The main contribution of such a service is to decouple data management from grid computation, by providing location transparency as well as data persistence in a dynamic environment. To illustrate the proposed concept and validate its feasibility, we have implemented a software prototype: the JUXMEM platform.
Key-words: distributed shared memory, peer-to-peer, data sharing, grid, JUXMEM

Unité de recherche INRIA Rennes — IRISA, Campus universitaire de Beaulieu, 35042 Rennes Cedex (France)

1 A peer-to-peer DSM?

Peer-to-peer (P2P) computing [18, 16] has recently attracted growing interest within the distributed computing community. This is mainly due to the success of file sharing systems like Napster [36], Gnutella [19] or KaZaA [35], which have proven the adequacy of the peer-to-peer approach for data sharing on highly dynamic, large-scale configurations (millions of nodes). Several research projects on P2P systems are currently in progress; however, most of them have focused on devising smart mechanisms for efficient sharing of immutable, read-only data at a very large scale. Only a few systems (e.g., OceanStore [14], Ivy [17]) have started to address the issue of sharing mutable data, but the current solutions have proven efficient only in special cases: they generally assume few writers or few data modifications.
On the other hand, in today's scientific applications, data can generally be read, but also modified by multiple sites. To handle the consistency of replicated data, a lot of models and protocols have been proposed within the context of DSM systems (Distributed Shared Memory [21]). However, let us note that these systems have been designed by assuming a static, small-scale architecture (typically, a cluster of PCs).

In this paper, we show that, although P2P systems and DSM systems have been designed in rather different contexts, both can serve as major sources of inspiration for the design of a hybrid system, with intermediate hypotheses and properties.

2 Why combine DSM systems and P2P systems?

2.1 Peer-to-peer systems: high scalability on highly dynamic configurations

The peer-to-peer model has been made popular by systems allowing music files to be shared among millions of intermittently connected users (Napster, Gnutella, etc.). The underlying model of these systems is simple and complementary to the client-server model: the relations between machines are symmetrical; each node can be a client in one transaction and a server in another. It has thus been proven that such a model scales very well without any need for a centralized storage server for the shared files: each client node is equally a server and can provide files to the other nodes. As an example, within the KaZaA network, 4,500,000 simultaneously connected users share 900,000,000 files containing 9 petabytes of data. This high scalability has drawn the attention of the distributed systems community, since it shows a way to make an important step forward in this field. Traditional distributed systems based on the client-server model have often shown limited scalability, generally due to bottlenecks generated by the use of centralized servers. By removing these bottlenecks, the peer-to-peer model not only enhances the system's scalability, but also improves its fault tolerance and availability despite the high node volatility. The system's activity is no longer dependent on the availability of a single server.
These important properties explain why the peer-to-peer model has attracted the interest of the scientific distributed systems community. Within this context, the research efforts mainly focused on devising efficient peer-to-peer localization and routing schemes [20, 25, 27, 29, 23, 12, 7], based on the use of distributed hash tables (DHT). These schemes have been illustrated by systems like Chord [27], Tapestry [29] and Pastry [25], which serve as basic layers for higher-level data management systems, such as CFS [6], OceanStore and PAST [26], respectively.

On the other hand, we can note that these systems focus on sharing immutable files: the shared data are read-only and can be replicated at ease, without any limit on the number of copies. It is easy to understand that, if the data were modifiable, a mechanism would be necessary to handle the consistency of the data copies. But, by guaranteeing consistency, the system would set an undesired limit to scalability: the more data copies there are, the higher the consistency overhead. Therefore, most peer-to-peer systems make a compromise: they favor scalability by sacrificing data mutability.

Recently, some mechanisms for sharing mutable data in a peer-to-peer environment have been proposed by systems like OceanStore, Ivy and P-Grid [8]. In OceanStore, for each piece of data, only a small set of primary replicas, called the inner ring, agrees on, serializes and applies updates. Updates are then multicast down a dissemination tree to all other cached copies of the data, called secondary replicas. However, OceanStore uses a versioning mechanism which has not proven to be efficient at large scales, as published measurements [24] on the performance of updates assume a single writer per data block. The Ivy system has one main limitation: applications have to repair conflicting writes, thus the number of writers per data item is equally very limited. P-Grid proposes a flooding-based algorithm for updating data, but assumes no conflicting writes. Besides, no experimental results have been published so far for this system.
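To make the key-based routing at the heart of such DHT schemes concrete, here is a toy sketch in Python. It is an illustration of the general idea only, not the actual Chord protocol: real implementations maintain per-node finger tables so that the responsible node is reached in O(log N) hops rather than by a global sorted scan.

```python
import hashlib

def ring_id(name: str, bits: int = 16) -> int:
    """Hash a node name or data key onto a 2^bits identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def successor(node_ids, key_id):
    """The node responsible for key_id is the first node whose
    identifier follows it on the ring (wrapping around at the top)."""
    candidates = sorted(node_ids)
    for nid in candidates:
        if nid >= key_id:
            return nid
    return candidates[0]  # wrap around the ring

nodes = [ring_id(f"node-{i}") for i in range(8)]
owner = successor(nodes, ring_id("shared-block-42"))
```

Because the hash spreads keys quasi-uniformly, each node becomes responsible for roughly an equal fraction of the keys, and a node can join or leave by exchanging only the keys adjacent to it on the ring, which is what makes these schemes tolerant to high node volatility.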
We can clearly conclude that handling consistency is a serious problem for peer-to-peer systems: the preliminary solutions tentatively proposed as of today have the significant drawback of limiting the system's scalability, which is the main property that makes peer-to-peer systems interesting.

2.2 Distributed Shared Memory: consistency and transparency

The problem of sharing mutable data in distributed environments has been intensively studied during the past fifteen years within the context of Distributed Shared Memory (DSM) systems [15, 21, 2]. These systems provide transparent data sharing, via a unique address space accessible to physically distributed machines. As in the case of peer-to-peer systems, reading data on multiple nodes may result in data replication. But the DSM nodes can also modify the data, and this results in triggering some consistency action (e.g., invalidation or update), according to some consistency protocol which implements a given semantics stated by some consistency model. A large variety of DSM consistency models and protocols [21, 10, 5, 4, 11, 30] have been defined [22], in order to provide different compromises between the strength of the consistency guarantees and the efficiency of the consistency actions. These efforts have been carried out within the context of research on high performance parallel computing, often with the goal of providing maximum transparency at a minimum cost.

A central feature of DSM systems is transparency. First, these systems provide transparent access to data: all nodes can read and write any shared data in a uniform way, should the data be local or remote. The DSM system internally checks for data locality and takes the appropriate action in order to satisfy the access. Second, DSM systems also provide transparent localization: if the program accesses remote data, it is the responsibility of the DSM system to localize, transfer or replicate it locally, according to the corresponding consistency protocol.

However, existing DSM systems have generally shown satisfactory efficiency (i.e.
near-linear speedups) only on small-scale configurations (in practice, up to a few tens of nodes [22]). This is often due to the intrinsic lack of scalability of the algorithms used to handle data consistency. These algorithms have often been designed by assuming a small number of copies per shared data item. For instance, Multiple-Reader-Single-Writer algorithms [15] clearly cannot perform well at large scale, since any write operation on some data results in an expensive invalidation of all existing data copies. In the same way, Home-Based, Multiple-Writer algorithms [30] also rely on having the home node centralize and merge data modifications from all writers. On the other hand, an overwhelming majority of protocols assume a static configuration where nodes do not disconnect nor fail: the unique writer of a given data item is not supposed to go down, nor is the home node in a home-based DSM. Only a few DSM systems have integrated some mechanisms for fault tolerance [28, 13]. However, node failures are supposed to be infrequent and are considered as exceptional behavior. This is to be contrasted with the basic hypotheses of peer-to-peer systems, in which nodes are assumed to join and leave the network at any time, as a regular behavior. Therefore, we can conclude that getting DSM highly scalable and adaptive to dynamic configurations is a real challenge, since it conflicts with the founding properties of traditional DSM systems.
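The scalability limit of Multiple-Reader-Single-Writer protocols is easy to quantify with a toy model (illustrative Python, not any particular DSM implementation): the cost of one write grows linearly with the number of replicas created by earlier reads.

```python
class MRSWPage:
    """Toy Multiple-Reader-Single-Writer coherence for one shared page."""

    def __init__(self):
        self.holders = set()          # nodes currently caching a copy
        self.invalidations_sent = 0   # protocol traffic counter

    def read(self, node):
        # A read miss replicates the page on the reading node.
        self.holders.add(node)

    def write(self, node):
        # Before the write proceeds, every other copy must be
        # invalidated: with N readers this costs N messages,
        # which is what breaks down at large scale.
        self.invalidations_sent += len(self.holders - {node})
        self.holders = {node}

page = MRSWPage()
for n in range(100):      # 100 nodes read the page...
    page.read(n)
page.write(0)             # ...so a single write invalidates 99 remote copies
```

On a P2P-scale configuration with millions of potential readers, this per-write invalidation storm is exactly the overhead that the read-only restriction of most peer-to-peer systems avoids.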
2.3 Hybrid approach: a data sharing service for scientific computing

Although P2P systems and DSM systems have been designed in rather different contexts, we think both can serve as major sources of inspiration for the design of a hybrid data sharing system. If DSM systems can usually handle configurations of tens or hundreds of nodes, corresponding to cluster computing, peer-to-peer systems generally target configurations of millions of nodes, corresponding to the scale of the Internet. The hybrid data sharing system we propose targets configurations of thousands to tens of thousands of nodes, which corresponds precisely to the scale of grid computing [9].

                       DSM             Grid data service         P2P
Scale                  10^1 - 10^2     10^3 - 10^4               10^5 - 10^6
Topology               Flat            Hierarchical              Flat
Resource control
and trust degree       High            Medium                    Null
Dynamicity             Null            Medium                    High
Resource               Homogeneous     Rather heterogeneous      Heterogeneous
homogeneity            (clusters)      (clusters of clusters)    (Internet)
Data type              Mutable         Mutable                   Immutable
Application
complexity             Complex         Complex                   Simple
Typical                Scientific      Scientific computation    File sharing
applications           computation     and data storage          and storage

Table 1: A grid data sharing service as a compromise between DSM and P2P systems.

Therefore, we think the adequate approach for the design of such a system is not to build a peer-to-peer DSM, nor a shared-memory peer-to-peer system, but rather a data sharing service for grid computing. Such a service has to address the problem of managing mutable data on dynamic, large-scale configurations. The approach we propose benefits both from DSM systems (transparent access to data, consistency protocols) and from P2P systems (scalability, support for resource volatility and dynamicity).

These two classes of systems have been designed and studied in very different contexts.
In DSM systems, the nodes are generally under the control of a single administration, and the resources are trusted. In contrast, P2P systems aggregate resources located at the edge of the Internet, with no trust guarantee and loose control. Moreover, these numerous resources are essentially heterogeneous in terms of processors, operating systems and network links, as opposed to DSM systems, where nodes are generally homogeneous. Finally, DSM systems are typically used to support complex numerical simulation applications, where data are accessed in parallel by multiple nodes. In contrast, P2P systems generally serve as a support for storing and sharing immutable files. These antagonistic features are summarized in the first and third columns of Table 1.

A data sharing service targets physical architectures with intermediate features between those of DSM and P2P systems. It addresses scales on the order of thousands or tens of thousands of nodes, organized as a federation of clusters, say tens or hundreds of hundred-node clusters. At a global level, the resources are thus rather heterogeneous, while they can probably be considered as homogeneous within the individual clusters. The control degree and the trust degree are also intermediate, since the clusters may belong to different administrations, which set up agreements on the sharing protocol. Finally, the service targets numerical applications like heavy simulations, made by coupling individual codes. These simulations process large amounts of data, with significant requirements in terms of data storage and sharing. These intermediate features are illustrated in the second column of Table 1.

The main contribution of such a service is to decouple data management from grid computation, by providing location transparency as well as data persistence in a dynamic environment. As explained in the scenarios described below, such a service can prove helpful for heavy numerical simulations, based on code coupling, with significant requirements in terms of data storage and sharing.
3 A data sharing service for the grid: sample scenarios

Persistence. Since grid applications can handle large masses of data, data transfer among sites can be costly, in terms of both latency and bandwidth. In order to limit these data exchanges, the data sharing service has to rely on strategies able to 1) reuse previously produced data; 2) trigger "smart" pre-fetching actions to anticipate future accesses; and 3) provide useful information on data location to the task scheduler, in order to optimize the global execution cost.

Let us consider the following scenario, which illustrates the first point mentioned above. A client submits a computation f1(A, B) to the grid infrastructure. The execution is scheduled on server S. To run this computation, the client needs to submit A and B (which may be large matrices) to S. At the end of the computation, the result C is transferred from S to the client. Let us now assume that the client has to execute a second computation f2(C, B) on the same server. To do this, the client would have to resubmit C and B to S. To avoid such unnecessary transfers, the data sharing service has to provide persistence, by allowing the matrices already present on the storage infrastructure to be reused. The client should be able to characterize the persistence guarantees for any data stored by the service.
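This scenario suggests a client-side interface in which data are published once and then referenced by a location-transparent identifier. The sketch below is purely illustrative: the names (`publish`, `attach`) and the single-process in-memory store are assumptions made for the example, not JUXMEM's actual API; a real service would replicate and persist the data across the grid.

```python
class DataSharingService:
    """Toy stand-in for a grid data sharing service (single process,
    in-memory). Hypothetical interface, for illustration only."""

    def __init__(self):
        self._store = {}
        self._counter = 0

    def publish(self, data) -> str:
        """Store a piece of data once and return its identifier."""
        self._counter += 1
        data_id = f"data-{self._counter}"
        self._store[data_id] = data
        return data_id

    def attach(self, data_id):
        """Retrieve data by identifier, wherever it is actually stored."""
        return self._store[data_id]

# The persistence scenario above, rephrased with identifiers:
svc = DataSharingService()
id_A = svc.publish("matrix A")   # submitted once by the client
id_B = svc.publish("matrix B")
# Server S runs f1 on svc.attach(id_A) and svc.attach(id_B), then publishes C:
id_C = svc.publish("result C")
# The second computation f2(C, B) reuses id_C and id_B: nothing is resubmitted.
```

The point of the sketch is the decoupling: the client exchanges small identifiers with the scheduler and the servers, while the large matrices move only inside the storage infrastructure, where they remain available for reuse.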