www.it-ebooks.info

Learning Storm

Create real-time stream processing applications with Apache Storm

Ankit Jain
Anand Nalya

BIRMINGHAM - MUMBAI

Learning Storm

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2014

Production reference: 1200814

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78398-132-8

www.packtpub.com

Cover image by Pratyush Mohanta ([email protected])

Credits

Authors: Ankit Jain, Anand Nalya
Reviewers: Vinoth Kannan, Sonal Raj, Danijel Schiavuzzi
Commissioning Editor: Usha Iyer
Acquisition Editor: Llewellyn Rozario
Content Development Editor: Sankalp Pawar
Technical Editors: Menza Mathew, Siddhi Rane
Copy Editors: Sarang Chari, Mradula Hegde
Project Coordinator: Harshal Ved
Proofreaders: Simran Bhogal, Ameesha Green, Paul Hindle
Indexers: Hemangini Bari, Tejal Soni, Priya Subramani
Graphics: Abhinash Sahu
Production Coordinator: Saiprasad Kadam
Cover Work: Saiprasad Kadam

About the Authors

Ankit Jain holds a Bachelor's degree in Computer Science Engineering.
He has 4 years of experience in designing and architecting solutions for the Big Data domain and has been involved with several complex engagements. His technical strengths include Hadoop, Storm, S4, HBase, Hive, Sqoop, Flume, ElasticSearch, Machine Learning, Kafka, Spring, Java, and J2EE. He is currently employed with Impetus Infotech Pvt. Ltd.

He also shares his thoughts on his personal blog at http://ankitasblogger.blogspot.in/. You can follow him on Twitter at @mynameisanky. He spends most of his time reading books and playing with different technologies. When not at work, he spends time with his family and friends watching movies and playing games.

I would like to thank my family and colleagues for always being there for me. Special thanks to the Packt Publishing team; without you guys, this work would not have been possible.

Anand Nalya is a full stack engineer with over 8 years of extensive experience in designing, developing, deploying, and benchmarking Big Data and web-scale applications for both start-ups and enterprises. He focuses on reducing the complexity in getting things done with brevity in code.

He blogs about Big Data, web applications, and technology in general at http://anandnalya.com/. You can also follow him on Twitter at @anandnalya. When not working on projects, he can be found stargazing or reading.

I would like to thank my wife, Nidhi, for putting up with so many of my side projects and my family members who are always there for me. Special thanks to my colleagues who helped me validate the writing, and finally, the reviewers and editors at Packt Publishing, without whom this work would not have been possible.

About the Reviewers

Vinoth Kannan is a solution architect at WidasConcepts, Germany, who focuses on creating robust, highly scalable, real-time systems for storage, search, and analytics. He now works in Germany after his professional stints in France, Italy, and India.
Currently, he works extensively with open source frameworks based on Storm, Hadoop, and NoSQL databases. He has helped design and develop complex, real-time Big Data systems for some of the largest financial institutions and e-commerce companies.

He also co-organizes the Big Data User group in Karlsruhe and Stuttgart in Germany, and is a regular speaker at user group meets and international conferences on Big Data. He holds a double Master's degree in Communication Systems Engineering from Politecnico di Torino, Italy, and Grenoble Institute of Technology, France.

This is for my wonderful parents and my beloved wife, Sudha.

Sonal Raj is a Pythonista, technology enthusiast, and an entrepreneur. He is an engineer with dreams. He has been a research fellow at SERC, IISc, Bangalore, and he has pursued projects on distributed computing and real-time operations. He has spoken at PyCon India on Storm and Neo4J and has published articles and research papers in leading magazines and international journals. Presently, he works at Sigmoid Analytics, where he is actively involved in the development of machine-learning frameworks and Big Data solutions.

I am grateful to Ankit and Anand for patiently listening to my critiques, and I'd like to thank the open source community for keeping their passion alive and contributing to remarkable projects such as Storm. A special thank you to my parents, without whom I never would have grown to love learning as much as I do.

Danijel Schiavuzzi is a software engineer and technology enthusiast with a passionate interest in systems programming and distributed systems. Currently, he works at Infobip, where he finds new usages for Storm and other Big Data technologies in the telecom domain on a daily basis. He has a strong focus on real-time data analytics, log processing, and external systems monitoring and alerting. He is passionate about open source, having contributed a few minor patches to Storm itself.
In his spare time, he enjoys reading a book, following space exploration and scientific and technological news, tinkering with various gadgets, listening to and occasionally playing music, discovering old art movie masterpieces, and cycling around beautiful natural scenery.

I would like to thank the Apache Storm community for developing such a great technology and making distributed computing more fun.

www.PacktPub.com

Support files, eBooks, discount offers, and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read, and search across Packt's entire library of books.

Why subscribe?

• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Table of Contents

Preface

Chapter 1: Setting Up Storm on a Single Machine
  Features of Storm
  Storm components
  Nimbus
  Supervisor nodes
  The ZooKeeper cluster
  The Storm data model
  Definition of a Storm topology
  Operation modes
  Setting up your development environment
  Installing Java SDK 6
  Installing Maven
  Installing Git – distributed version control
  Installing the STS IDE
  Developing a sample topology
  Setting up ZooKeeper
  Setting up Storm on a single development machine
  Deploying the sample topology on a single-node cluster
  Summary

Chapter 2: Setting Up a Storm Cluster
  Setting up a ZooKeeper cluster
  Setting up a distributed Storm cluster
  Deploying a topology on a remote Storm cluster
  Deploying the sample topology on the remote cluster
  Configuring the parallelism of a topology
  The worker process
  The executor