www.it-ebooks.info

Cassandra Design Patterns

Understand and apply Cassandra design and usage patterns, and solve real-world business or technical problems

Sanjay Sharma

BIRMINGHAM - MUMBAI

Cassandra Design Patterns

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2014

Production Reference: 1200114

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78328-880-9

www.packtpub.com

Cover Image by Aniket Sawant

Credits

Author
  Sanjay Sharma

Reviewers
  William Berg
  Mark Kerzner

Acquisition Editors
  Pramila Balan
  Sam Wood

Commissioning Editor
  Sharvari Tawde

Technical Editors
  Mario D'Souza
  Dennis John
  Gaurav Thingalaya
  Pankaj Kadam

Copy Editors
  Tanvi Gaitonde
  Dipti Kapadia
  Kirti Pai
  Stuti Srivastava

Project Coordinator
  Akash Poojary

Proofreader
  Simran Bhogal

Indexer
  Hemangini Bari

Graphics
  Abhinash Sahu

Production Coordinator
  Nilesh R. Mohite

Cover Work
  Nilesh R. Mohite

About the Author

Sanjay Sharma has been an architect of enterprise-grade solutions in the software industry for around 15 years, and has been using Big Data and Cloud technologies over the past four to five years to solve complex business problems. He has extensive experience with key technologies, including Cassandra, Hadoop, Hive, MongoDB, MPP DW, and Java/J2EE/SOA, which allowed him to pioneer the LinkedIn group, Hadoop India. Over the years, he has also played a pivotal role in many industries, including healthcare, finance, CRM, manufacturing, and banking/insurance. Sanjay is highly venerated for his technological insight and is regularly invited to speak at Big Data, Cloud, and Agile events. He is also an active contributor to open source.

I would like to thank my employer, Impetus and iLabs, and its R&D department, which invests in cutting-edge technologies. This has allowed me to become a pioneer in mastering Cassandra- and Hadoop-like technologies early on. But, most importantly, I want to acknowledge my family, my beautiful wife and son, who have always supported and encouraged me in all my endeavors in life.

About the Reviewers

William Berg is a software developer for OpenMarket. He helps maintain the Apache Cassandra cluster, which forms part of their internal, distributed file storage solution.

Mark Kerzner holds degrees in Law, Math, and Computer Science. He has been designing software for many years and Hadoop-based systems since 2008. He is President of SHMsoft, a provider of Hadoop applications for various verticals, and a co-founder of the Hadoop Illuminated training and consulting firm, as well as the co-author of Hadoop Illuminated (Hadoop Illuminated LLC). He has also authored and co-authored other books and patents.
I would like to acknowledge the help of my colleagues, in particular Sujee Maniyam, and last but not least, my multitalented family.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read, and search across Packt's entire library of books.

Why Subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print and bookmark content
• On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface

Chapter 1: An Overview of Architecture and Data Modeling in Cassandra
  Understanding the background of Cassandra's architecture
  Amazon Dynamo
  Google BigTable
  Understanding the background of Cassandra modeling
  An overview of architecture and modeling
  A summary of the features in Cassandra
  Summary

Chapter 2: An Overview of Case and Design Patterns
  Understanding the 3V Model
  High availability
  Columns on the fly!
  Count and count!
  Streaming analytics!
  Needle in a haystack!
  Graph problems!
  Analytics
  Blob store
  Design patterns
  Summary

Chapter 3: 3V Patterns
  Pattern name – Web scale store
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Consequences
  Pattern name – Ultra fast data sink
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Consequences
    Related patterns
  Pattern name – Flexi schema
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Consequences
    Related patterns
  Summary

Chapter 4: Core Cassandra Patterns
  Pattern name – Highly available store
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Example
  Pattern name – Time series analytics
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Example
  Pattern name – Atomic distributed counter service
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Example
  Summary

Chapter 5: Search and Analytics Applied Use Case Patterns
  Pattern name – Streaming/CEP analytics
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
  Pattern name – Needle in a haystack/search engine
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
  Pattern name – Graph problems
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
  Pattern name – Advanced analytics
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
  Summary

Chapter 6: Patterns and Anti-patterns
  Pattern name – Content/Document store
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Example
    Caution
  Pattern name – Object/Entity store
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Caution
  Pattern name – CAP the ACID
    Problem/Intent
    Context/Applicability
    Forces/Motivations
    Solution
    Caution
  Pattern name – Materialized view
    Problem/Intent