
Large Scale Graph Processing Using Apache Giraph PDF

243 Pages·2017·7.45 MB·English
by Sherif Sakr, Faisal Moeen Orakzai, Ibrahim Abdelaziz, Zuhair Khayyat
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Large Scale Graph Processing Using Apache Giraph

Large-Scale Graph Processing Using Apache Giraph

Item Type: Book
Authors: Sakr, Sherif; Orakzai, Faisal Moeen; Abdelaziz, Ibrahim; Khayyat, Zuhair
Citation: Sakr, S., Orakzai, F. M., Abdelaziz, I., & Khayyat, Z. (2016). Large-Scale Graph Processing Using Apache Giraph. doi:10.1007/978-3-319-47431-1
Eprint version: Post-print
DOI: 10.1007/978-3-319-47431-1
Publisher: Springer International Publishing AG
Rights: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-47431-1
Download date: 26/03/2019 09:39:33
Link to Item: http://hdl.handle.net/10754/623047

Sherif Sakr, Faisal Moeen, Ibrahim Abdelaziz, Zuhair Khayyat

Large Scale Graph Processing Using Apache Giraph

Monday 20th March, 2017

To my wife, Radwa, my daughter, Jana, and my son, Shehab for their love, encouragement, and support.
Sherif Sakr

To my father, Moeen Orakzai, my mother, Hikmat Moeen, my wife, Sana and my son Ibrahim for their support.
Faisal Moeen

To my wife and my lovely daughters for their unconditional love and support.
Ibrahim Abdelaziz

To my father, Yarub, my mother, Nadia, my lovely wife, Manal, and my two handsome boys, Mazin and Ziyad, for their love and support.
Zuhair Khayyat

Foreword

The present decade has been dubbed the Digital Universe Decade because the digital universe, all data available in digital form, is growing at an exponential rate during this decade. Lots of this data is graph data. Examples include data related to social network connections and data diffusion, computer networks, telecommunication networks, the Web, and knowledge bases. Yet another example close to my heart is road-network data: in step with the rapid increases in available vehicle trajectory data, this data grows rapidly in size, resolution, and sophistication.

Graph analytics enables the extraction of valuable information from graph data and enables valuable services on top of graph data.
For example, in social media, graph analytics can be used to detect communication patterns that might be of interest to national defense. In the medical and biological domains, graph analytics may be used for analyzing relationships in the contexts of proteins, DNA, cells, chemical pathways, and organs in order to determine how they are affected by lifestyle choices and medications. In transportation, graph analytics can enable personalized and stochastic routing that takes into account time-varying travel times and greenhouse gas emissions. These are but some uses of graph data analytics; new ones emerge at an accelerated pace.

Practical graph analytics uses a combination of graph-theoretic, statistical, and data management techniques to model, store, query, and analyze graph data. The processing of graph data embodies computationally hard tasks that have only been exacerbated by the rapid and increasing growth in available graph-related data. In 2010, Google introduced Pregel, a distributed system for large-scale graph processing. Inspired by the Bulk Synchronous Parallel model, Pregel features an intuitive, vertex-centric organization of computations that lets its users "think like a vertex." Since its introduction, Pregel has spurred substantial interest in large-scale graph data processing, and a number of Pregel-like systems have emerged.

Apache Giraph is an open-source system for Pregel-like, large-scale graph data processing. It has a global and growing user community and is thus an increasingly popular system for managing and analyzing graph data. This book provides step-by-step guidance to data management professionals, students, and researchers who are looking to understand and use Apache Giraph. It guides the reader through the details of installing and configuring the system. It offers a detailed description of the programming model and related aspects. And it offers a step-by-step coverage of the implementation of several popular, as well as advanced, graph analytics algorithms, covering also related optimization details.
In a nutshell, this book is a timely and valuable resource for data management professionals, students, and researchers interested in using Apache Giraph for large-scale graph processing and graph analytics.

- Christian S. Jensen
Aalborg, Denmark, July 2016

Preface

We are generating more data than ever. The ubiquity of the Internet has dramatically changed the size, speed and nature of the generated data. Almost every human has become a data generator and every business has become a digital business. As a result, we are witnessing a data explosion. In recent years, several technologies have contributed to this data explosion, including mobile computing, Web 2.0, social media, social networks, cloud computing and Software-as-a-Service (SaaS). In the future, it is expected that the Internet of Things will further amplify this challenge. In particular, many things will be able to connect to the Internet, and thus there will be lots of data passed from users to devices, to servers, and back. Hence, in addition to the billions of people who are currently using the Internet and producing a lot of data daily, watches, cars, fridges, toasters, and many other devices will be online and continuously generating data as well. It is quite likely that in the near future, our toasters will be able to recommend types of bread based on information suggested by our friends on social networks.

With the recent emerging wave of technologies and applications, the world has become more connected than ever. A graph is a popular, neat data structure that models data as an arbitrary set of objects (vertices) connected by various kinds of relationships (edges). With the tremendous increase in the size of graph-structured data, large-scale graph processing systems have come into high demand and have attracted a lot of interest. This book is intended to take you on a journey with Apache Giraph, a popular distributed graph processing platform, which is designed to bring the power of big data processing to graph data that would be too large to fit on a single machine.
We describe the fundamental abstractions of the system and its programming models, and describe various techniques for using the system to process graph data at scale. The book is designed as a self-study, step-by-step guide for any reader with an interest in large-scale graph processing. All the source code presented in the book is available for download from the associated GitHub repository of the book.

Organization of the book

Chapter 1 starts with a general background of the big data phenomenon. We then provide an introduction to the big graph problem, its applications, and how it differs from the traditional challenges of the big data problem, and motivate the need for domain-specific systems that are designed to tackle the large-scale graph processing problem. We then introduce the Apache Giraph system, its abstraction, programming model and design architecture, so that we set the stage for readers and provide them with the fundamental information required to smoothly follow the other chapters of the book.

Chapter 2 takes on Giraph as a platform. Keeping in view that Giraph uses Hadoop as its underlying execution engine, we explain how to set up Hadoop in different modes, how to monitor it, and how to run Giraph on top of it using its binaries or source code. We then move on to explaining how to use Giraph. We start by running an example job in different Hadoop modes and then approach more advanced topics like monitoring the Giraph application life cycle and monitoring Giraph jobs using different methods. Giraph is a very flexible platform and its behaviour can be tuned in many ways. We explain the different methods of configuring Giraph and end the chapter by giving a detailed description of setting up a Giraph project in the Eclipse and IntelliJ IDEs.

Chapter 3 provides an introduction to Giraph programming.
We introduce the basic Giraph graph model and explain how to write a Giraph program using the vertex similarity algorithm as a use case. We explain three different ways of writing the driver program and their pros and cons. For loading data, Giraph comes packaged with numerous input formats for reading data in different formats. We describe each of the formats with examples and end the chapter with a description of Giraph output formats.

Chapter 4 discusses the implementation of some popular graph algorithms including PageRank, connected components, shortest paths and triangle closing. For each of these algorithms, we give an introductory description and show some of its possible applications. Then, using a sample data graph, we show how the algorithm works. Finally, we describe the implementation details of the algorithm in Giraph.

Chapter 5 shines a light on advanced Giraph programming. We start by discussing common Giraph algorithmic optimizations and how those optimizations may improve the performance and flexibility of the algorithms implemented in Chapter 4. We explain different graph optimizations to enable users to implement complex graph algorithms. Then, we discuss a set of tunable Giraph configurations that control Giraph’s utilization of the underlying resources. We also discuss how to change Giraph’s default partitioning algorithm and how to write a custom graph input and output format. We then talk about common Giraph runtime errors and finalize the chapter with information on Giraph’s failure recovery.

Recently, several systems have been introduced to tackle the challenge of large-scale graph processing. In Chapter 6, we highlight two of these systems, GraphX and GraphLab. We describe their program abstractions and their programming models. We also highlight the main commonalities and differences between these systems and Apache Giraph.

Target Audience

We hope this book serves as a useful reference for students, researchers and practitioners in the domain of large-scale graph processing.
To Students: We hope that the book provides you with an enjoyable introduction to the field of large-scale graph processing. We have attempted to properly describe the state of the art and present the technical challenges in depth. The book will provide you with a comprehensive introduction and hands-on experience in tackling large-scale graph processing problems using the Apache Giraph system.

To Researchers: The material of this book will provide you with thorough coverage of the emerging and ongoing advancements in big graph processing systems. You can also use this book as a starting point to tackle your next research challenge in the domain of large-scale graph processing.

To Practitioners: You will find this book a very useful step-by-step guide with several code examples, with source code available in the GitHub repository of the book, and programming optimization techniques, so that you can immediately put the knowledge gained from this book into practice thanks to the open-source availability of the Apache Giraph system.

Sherif Sakr
Faisal Moeen
Ibrahim Abdelaziz
Zuhair Khayyat
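
The foreword describes Pregel's vertex-centric, Bulk Synchronous Parallel style of computation, in which users "think like a vertex." As a rough illustration only, the following single-machine Python sketch labels every vertex with the smallest vertex id in its connected component (connected components being one of the algorithms listed for Chapter 4). It is not code from the book and does not use Giraph's API; the function name `run_supersteps`, its parameters, and the message-passing layout are all assumptions made for this example.

```python
from collections import defaultdict


def run_supersteps(edges, max_supersteps=50):
    """Label each vertex with the smallest vertex id in its component.

    `edges` is an iterable of undirected (u, v) pairs. All names here are
    hypothetical and illustrative; this is not Giraph's or Pregel's API.
    """
    # Each vertex only knows its own neighbours.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Vertex state: smallest id seen so far, initially the vertex's own id.
    label = {v: v for v in neighbours}

    # Superstep 0: every vertex sends its label to all of its neighbours.
    inbox = defaultdict(list)
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    supersteps = 1
    # Later supersteps: only vertices that received messages do any work.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
            # Otherwise the vertex sends nothing, i.e. it stays inactive.
        # Barrier: messages sent now are only delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return label, supersteps


if __name__ == "__main__":
    labels, steps = run_supersteps([(1, 2), (2, 3), (4, 5)])
    print(labels, steps)
```

Each vertex reads only its own state and the messages delivered at the previous barrier, which is the property that lets systems of this kind partition vertices across many machines. The computation stops once no vertex sends a message.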

Description:
In transportation, graph analytics can enable personalized ... with Apache Giraph, a popular distributed graph processing platform, which is designed to ... 1 A camel is an even-toed ungulate within the genus Camelus, bearing.