Learning Storm PDF

255 Pages·2014·10.164 MB·English

Preview Learning Storm

Learning Storm

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2014
Production reference: 1200814

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-132-8
www.packtpub.com
Cover image by Pratyush Mohanta (<[email protected]>)

Credits

Authors: Ankit Jain, Anand Nalya
Reviewers: Vinoth Kannan, Sonal Raj, Danijel Schiavuzzi
Commissioning Editor: Usha Iyer
Acquisition Editor: Llewellyn Rozario
Content Development Editor: Sankalp Pawar
Technical Editors: Menza Mathew, Siddhi Rane
Copy Editors: Sarang Chari, Mradula Hegde
Project Coordinator: Harshal Ved
Proofreaders: Simran Bhogal, Ameesha Green, Paul Hindle
Indexers: Hemangini Bari, Tejal Soni, Priya Subramani
Graphics: Abhinash Sahu
Production Coordinator: Saiprasad Kadam
Cover Work: Saiprasad Kadam

Table of Contents

About the Authors
About the Reviewers
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
Chapter 1. Setting Up Storm on a Single Machine
  Features of Storm
  Storm components
  Nimbus
  Supervisor nodes
  The ZooKeeper cluster
  The Storm data model
  Definition of a Storm topology
  Operation modes
  Setting up your development environment
  Developing a sample topology
  Setting up ZooKeeper
  Setting up Storm on a single development machine
  Deploying the sample topology on a single-node cluster
  Summary
Chapter 2. Setting Up a Storm Cluster
  Setting up a distributed Storm cluster
  Deploying a topology on a remote Storm cluster
  Deploying the sample topology on the remote cluster
  Configuring the parallelism of a topology
  The worker process
  The executor
  Tasks
  Configuring parallelism at the code level
  Distributing worker processes, executors, and tasks in the sample topology
  Rebalancing the parallelism of a topology
  Rebalancing the parallelism of the sample topology
  Stream grouping
  Shuffle grouping
  Fields grouping
  All grouping
  Global grouping
  Direct grouping
  Local or shuffle grouping
  Custom grouping
  Guaranteed message processing
  Summary
Chapter 3. Monitoring the Storm Cluster
  Starting to use the Storm UI
  Monitoring a topology using the Storm UI
  Cluster statistics using the Nimbus thrift client
  Fetching information with the Nimbus thrift client
  Summary
Chapter 4. Storm and Kafka Integration
  The Kafka architecture
  The producer
  Replication
  Consumers
  Brokers
  Data retention
  Setting up Kafka
  Setting up a single-node Kafka cluster
  Setting up a three-node Kafka cluster
  A sample Kafka producer
  Integrating Kafka with Storm
  Summary
Chapter 5. Exploring High-level Abstraction in Storm with Trident
  Introducing Trident
  Understanding Trident's data model
  Writing Trident functions, filters, and projections
  Trident functions
  Trident filters
  Trident projections
  Trident repartitioning operations
  The shuffle operation
  The partitionBy operation
  The global operation
  The broadcast operation
  The batchGlobal operation
  The partition operation
  Trident aggregators
  The partition aggregate
  The aggregate
  The persistent aggregate
  Aggregator chaining
  Utilizing the groupBy operation
  A non-transactional topology
  A sample Trident topology
  Maintaining the topology state with Trident
  A transactional topology
  The opaque transactional topology
  Distributed RPC
  When to use Trident
  Summary
Chapter 6. Integration of Storm with Batch Processing Tools
  Exploring Apache Hadoop
  Understanding HDFS
  Understanding YARN
  Installing Apache Hadoop
  Setting up password-less SSH
  Getting the Hadoop bundle and setting up environment variables
  Setting up HDFS
  Setting up YARN
  Integration of Storm with Hadoop
  Setting up Storm-YARN
  Deploying Storm-Starter topologies on Storm-YARN
  Summary
Chapter 7. Integrating Storm with JMX, Ganglia, HBase, and Redis
  Monitoring the Storm cluster using JMX
  Monitoring the Storm cluster using Ganglia
  Integrating Storm with HBase
  Integrating Storm with Redis
  Summary
Chapter 8. Log Processing with Storm
  Server log-processing elements
  Producing the Apache log in Kafka
  Splitting the server log line
  Identifying the country, the operating system type, and the browser type from the log file
  Extracting the searched keyword
  Persisting the process data
  Defining a topology and the Kafka spout
  Deploying a topology
  MySQL queries
  Calculating the page hits from each country
  Calculating the count for each browser
  Calculating the count for each operating system
  Summary
Chapter 9. Machine Learning
  Exploring machine learning
  Using Trident-ML
  The use case – clustering synthetic control data
  Producing a training dataset into Kafka
  Building a Trident topology to build the clustering model
  Summary
Index

About the Authors

Ankit Jain holds a Bachelor's degree in Computer Science Engineering.
He has 4 years of experience in designing and architecting solutions for the Big Data domain and has been involved with several complex engagements. His technical strengths include Hadoop, Storm, S4, HBase, Hive, Sqoop, Flume, ElasticSearch, Machine Learning, Kafka, Spring, Java, and J2EE. He is currently employed with Impetus Infotech Pvt. Ltd.

He also shares his thoughts on his personal blog at http://ankitasblogger.blogspot.in/. You can follow him on Twitter at @mynameisanky. He spends most of his time reading books and playing with different technologies. When not at work, he spends time with his family and friends watching movies and playing games.

I would like to thank my family and colleagues for always being there for me. Special thanks to the Packt Publishing team; without you guys, this work would not have been possible.

Anand Nalya is a full stack engineer with over 8 years of extensive experience in designing, developing, deploying, and benchmarking Big Data and web-scale applications for both start-ups and enterprises. He focuses on reducing the complexity in getting things done with brevity in code.

He blogs about Big Data, web applications, and technology in general at http://anandnalya.com/. You can also follow him on Twitter at @anandnalya. When not working on projects, he can be found stargazing or reading.

I would like to thank my wife, Nidhi, for putting up with so many of my side projects and my family members who are always there for me. Special thanks to my colleagues who helped me validate the writing, and finally, the reviewers and editors at Packt Publishing, without whom this work would not have been possible.

About the Reviewers

Vinoth Kannan is a solution architect at WidasConcepts, Germany, that focuses on creating robust, highly scalable, real-time systems for storage, search, and analytics. He now works in Germany after his professional stints in France, Italy, and India.
