UNIVERSITY OF OSLO Department of Informatics Automatic scaling of Cassandra clusters Master thesis Tor Andreas Baakind April 30, 2013 Automatic scaling of Cassandra clusters TorAndreasBaakind April30,2013 ii Abstract Thepurposeofthisthesisistocreateanautomaticscalingimplementation forCassandraclusters. Theautomaticscalershouldneverlowertheoverall performance of the cluster in a way that results in a bad user experience. It should also be able to successfully scale up and down nodes, and the cluster should continue as if nothing happened. Last but not least, it is desirable that the automatic scaler performs equally, or better than, the personwhoisinchargeofadministratingthedatabase. In this thesis we have developed an early version of an autoscaler that may run alongside a Cassandra instance. The implementation is split into two separate implementations: a master-, and an agent-implementation. Themasterwillbedeployedtothesameserverastheapplicationusingthe cluster,eventhoughthisisnotrequired. Theagentimplementationwillbe deployed to, and run alongside, all nodes that are a part of the cluster. The agent will monitor the node’s resource usage, and send messages backtothemasteriftheusageincreasesabove,ordecreasesbelowcertain thresholds. Weperformedasetoftestcasestoprovethattheimplementationworks asintended. Thetestcasesrecordedthenodesresource-usagetodetermine theimpactourimplementationmakestotheoverallperformance. iii iv Acknowledgments I would like to thank my supervisors, Ketil Velle and Dag Langmyhr, for their guidance and valuable feedback. This thesis would not have been completed without them. I would also like to thank Jørgen Sørensen for proofreadingthethesis. I also want to thank my girlfriend Anniken, for being so supportive and understanding throughout the thesis work. And finally, I would like tothankmyparentsforbelievinginme. v vi Contents I Introduction 1 1 Introduction 3 1.1 Problemdefinition . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 II Background 7 2 Motivation 9 3 NoSQL 13 3.1 Databasetransactions . . . . . . . . . . . . . . . . . . . . . . . 13 3.1.1 TheACIDsacrifice . . . . . . . . . . . . . . . . . . . . 14 3.2 Brewer’sCAPtheorem . . . . . . . . . . . . . . . . . . . . . . 16 3.3 NoSQLdatastores . . . . . . . . . . . . . . . . . . . . . . . . 18 3.3.1 Extensiblerecordstores . . . . . . . . . . . . . . . . . 18 3.3.2 Key-valuestores . . . . . . . . . . . . . . . . . . . . . 20 3.3.3 Documentstores . . . . . . . . . . . . . . . . . . . . . 21 3.3.4 Graphdatabases . . . . . . . . . . . . . . . . . . . . . 22 3.3.5 Otherknowndatamodelsthatarenotcovered . . . . 22 3.3.6 NoSQLadvantagesanddisadvantages . . . . . . . . 23 4 Cassandra 27 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.2 Datamodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.3 Nodeandclusterconfiguration . . . . . . . . . . . . . . . . . 30 4.4 Thegossipprotocol . . . . . . . . . . . . . . . . . . . . . . . . 31 4.4.1 Hintedhandoffs . . . . . . . . . . . . . . . . . . . . . 33 4.5 Merkletree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.6 Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.6.1 Single-threadedstages . . . . . . . . . . . . . . . . . . 34 4.6.2 Multi-threadedstages . . . . . . . . . . . . . . . . . . 35 4.7 NodeTool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 vii 5 Relatedwork 39 5.1 Netflix’sPriam. . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1 AmazonWebServices . . . . . . . . . . . . . . . . . . 39 5.1.2 Netflix’smotivation . . . . . . . . . . . . . . . . . . . 39 5.1.3 WhywedidnotchoosePriam . . . . . . . . . . . . . 40 5.2 Hector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.2 WhywedidnotuseHector . . . . . . . . . . . . . . . 42 III Theproject 43 6 Introduction 45 6.1 NamingtheimplementationHecuba . . . . . . . . . . . . . . 45 7 Goalsandmethodology 47 7.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2.1 Kanban . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2.2 Storypoints . . . . . . . . . . . . . . . . . . . . . . . . 49 7.2.3 Velocitytrack . . . . . . . . . . . . . . . . . . . . . . . 49 7.3 Sourcecontrol . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 8 Failedattempts 53 8.1 IncludeHecubaintoCassandra’ssourcecode . . . . . . . . . 53 8.2 ImplementedasanextensiontoexistingJava-projects . . . . 54 8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 9 Hecubadesign 57 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2 Loadbalanceissues . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2.1 Tokenrange . . . . . . . . . . . . . . . . . . . . . . . . 59 9.2.2 Token-generation . . . . . . . . . . . . . . . . . . . . . 60 9.2.3 LoadbalancedifferencesbetweenHecubaandPriam 60 9.3 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . 60 9.4 Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 10 Hecubaimplementation 65 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 10.2 Codeseparation . . . . . . . . . . . . . . . . . . . . . . . . . . 65 10.2.1 Autoscale . . . . . . . . . . . . . . . . . . . . . . . . . 66 10.2.2 Autoscale-common . . . . . . . . . . . . . . . . . . . . 69 10.2.3 Autoscale-agent . . . . . . . . . . . . . . . . . . . . . . 70 10.3 Toolsandframeworks . . . . . . . . . . . . . . . . . . . . . . 73 10.3.1 Maven . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 10.3.2 SigarAPI . . . . . . . . . . . . . . . . . . . . . . . . . . 73 10.4 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 viii
Description: