Apache Kafka Guide

Important Notice

© 2010-2021 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder. If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License Version 2.0, including any notices, is included herein. A copy of the Apache License Version 2.0 can also be found here: https://opensource.org/licenses/Apache-2.0

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks copyrights, or other intellectual property. For information about patents covering Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA 94306
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Release Information
Version: CDH 6.0.x
Date: August 2, 2021

Table of Contents

Apache Kafka Guide
    Ideal Publish-Subscribe System
    Kafka Architecture
        Topics
        Brokers
        Records
        Partitions
        Record Order and Assignment
        Logs and Log Segments
        Kafka Brokers and ZooKeeper
Kafka Setup
    Hardware Requirements
        Brokers
        ZooKeeper
    Kafka Performance Considerations
    Operating System Requirements
        SUSE Linux Enterprise Server (SLES)
        Kernel Limits
Kafka in Cloudera Manager
Kafka Clients
    Commands for Client Interactions
    Kafka Producers
    Kafka Consumers
        Subscribing to a topic
        Groups and Fetching
        Protocol between Consumer and Broker
        Rebalancing Partitions
        Consumer Configuration Properties
        Retries
    Kafka Clients and ZooKeeper
Kafka Brokers
    Single Cluster Scenarios
        Leader Positions
        In-Sync Replicas
    Topic Configuration
        Topic Creation
        Topic Properties
    Partition Management
        Partition Reassignment
        Adding Partitions
        Choosing the Number of Partitions
        Controller
Kafka Integration
    Kafka Security
        Client-Broker Security with TLS
        Using Kafka’s Inter-Broker Security
        Enabling Kerberos Authentication
        Enabling Encryption at Rest
        Topic Authorization with Kerberos and Sentry
    Managing Multiple Kafka Versions
        Kafka Feature Support in Cloudera Manager and CDH
        Client/Broker Compatibility Across Kafka Versions
        Upgrading your Kafka Cluster
    Managing Topics across Multiple Kafka Clusters
        Consumer/Producer Compatibility
        Topic Differences between Clusters
        Optimize Mirror Maker Producer Location
        Destination Cluster Configuration
        Kerberos and Mirror Maker
        Setting up Mirror Maker in Cloudera Manager
    Setting up an End-to-End Data Streaming Pipeline
        Data Streaming Pipeline
        Ingest Using Kafka with Apache Flume
        Using Kafka with Apache Spark Streaming for Stream Processing
    Developing Kafka Clients
        Simple Client Examples
        Moving Kafka Clients to Production
    Kafka Metrics
        Metrics Categories
        Viewing Metrics
        Building Cloudera Manager Charts with Kafka Metrics
Kafka Administration
    Kafka Administration Basics
        Broker Log Management
        Record Management
        Broker Garbage Log Collection and Log Rotation
        Adding Users as Kafka Administrators
    Migrating Brokers in a Cluster
        Using rsync to Copy Files from One Broker to Another
    Setting User Limits for Kafka
    Quotas
        Setting Quotas
    Kafka Administration Using Command Line Tools
        Unsupported Command Line Tools
        Notes on Kafka CLI Administration
        kafka-topics
        kafka-configs
        kafka-console-consumer
        kafka-console-producer
        kafka-consumer-groups
        kafka-reassign-partitions
        kafka-log-dirs
        kafka-*-perf-test
        Enabling DEBUG or TRACE in command line scripts
        Understanding the kafka-run-class Bash Script
    JBOD
        JBOD Setup and Migration
        JBOD Operational Procedures
Kafka Performance Tuning
    Tuning Brokers
    Tuning Producers
    Tuning Consumers
    Mirror Maker Performance
    Kafka Tuning: Handling Large Messages
    Kafka Cluster Sizing
        Cluster Sizing - Network and Disk Message Throughput
        Choosing the Number of Partitions for a Topic
    Kafka Performance Broker Configuration
        JVM and Garbage Collection
        Network and I/O Threads
        ISR Management
        Log Cleaner
    Kafka Performance: System-Level Broker Tuning
        File Descriptor Limits
        Filesystems
        Virtual Memory Handling
        Networking Parameters
        Configuring JMX Ephemeral Ports
    Kafka-ZooKeeper Performance Tuning
Kafka Reference
    Metrics Reference
    Useful Shell Command Reference
        Hardware Information
        Disk Space
        I/O Activity and Utilization
        File Descriptor Usage
        Network Ports, States, and Connections
        Process Information
        Kernel Configuration
Kafka Public APIs
Kafka Frequently Asked Questions
    Basics
    Use Cases
    References
Appendix: Apache License, Version 2.0

Apache Kafka Guide

Apache Kafka is a streaming message platform. It is designed to be high performance, highly available, and redundant. Examples of applications that can use such a platform include:
• Internet of Things. TVs, refrigerators, washing machines, dryers, thermostats, and personal health monitors can all send telemetry data back to a server through the Internet.
• Sensor Networks. Areas (farms, amusement parks, forests) and complex devices (engines) can be designed with an array of sensors to track data or current status.
• Positional Data. Delivery trucks or massively multiplayer online games can send location data to a central platform.
• Other Real-Time Data. Satellites and medical sensors can send information to a central area for processing.
Ideal Publish-Subscribe System

The ideal publish-subscribe system is straightforward: Publisher A’s messages must make their way to Subscriber A, Publisher B’s messages must make their way to Subscriber B, and so on.

Figure 1: Ideal Publish-Subscribe System

An ideal system has the following benefits:
• Unlimited Lookback. A new Subscriber A1 can read Publisher A’s stream at any point in time.
• Message Retention. No messages are lost.
• Unlimited Storage. The publish-subscribe system has unlimited storage of messages.
• No Downtime. The publish-subscribe system is never down.
• Unlimited Scaling. The publish-subscribe system can handle any number of publishers and/or subscribers with constant message delivery latency.

Now let's see how Kafka’s implementation relates to this ideal system.

Kafka Architecture

As is the case with all real-world systems, Kafka's architecture deviates from the ideal publish-subscribe system. Some of the key differences are:
• Messaging is implemented on top of a replicated, distributed commit log.
• The client has more functionality and, therefore, more responsibility.
• Messaging is optimized for batches instead of individual messages.
• Messages are retained even after they are consumed; they can be consumed again.

The results of these design decisions are:
• Extreme horizontal scalability
• Very high throughput
• High availability
• But also different semantics and message delivery guarantees

The next few sections provide an overview of some of the more important parts, while later sections describe design specifics and operations in greater detail.

Topics

In the ideal system presented above, messages from one publisher would somehow find their way to each subscriber. Kafka implements the concept of a topic. A topic allows easy matching between publishers and subscribers.

Figure 2: Topics in a Publish-Subscribe System

A topic is a queue of messages written by one or more producers and read by one or more consumers. A topic is identified by its name. This name is part of a global namespace of that Kafka cluster.

Specific to Kafka:
• Publishers are called producers.
• Subscribers are called consumers.
As each producer or consumer connects to the publish-subscribe system, it can read from or write to a specific topic.

Brokers

Kafka is a distributed system that implements the basic features of the ideal publish-subscribe system described above. Each host in the Kafka cluster runs a server called a broker that stores messages sent to the topics and serves consumer requests.

Figure 3: Brokers in a Publish-Subscribe System

Kafka is designed to run on multiple hosts, with one broker per host. If a host goes offline, Kafka does its best to ensure that the other hosts continue running. This solves part of the “No Downtime” and “Unlimited Scaling” goals from the ideal publish-subscribe system.

Kafka brokers all talk to ZooKeeper for distributed coordination, which provides additional help for the "Unlimited Scaling" goal from the ideal system.

Topics are replicated across brokers. Replication is an important part of the “No Downtime,” “Unlimited Scaling,” and “Message Retention” goals.

There is one broker that is responsible for coordinating the cluster. That broker is called the controller.

As mentioned earlier, an ideal topic behaves as a queue of messages. In reality, having a single queue has scaling issues. Kafka implements partitions for adding robustness to topics.

Records

In Kafka, a publish-subscribe message is called a record. A record consists of a key/value pair and metadata including a timestamp. The key is not required, but can be used to identify messages from the same data source. Kafka stores keys and values as arrays of bytes. It does not otherwise care about the format.

The metadata of each record can include headers. Headers may store application-specific metadata as key-value pairs. In the context of the header, keys are strings and values are byte arrays.

For specific details of the record format, see the Record definition in the Apache Kafka documentation.

Partitions

Instead of all records handled by the system being stored in a single log, Kafka divides records into partitions. A partition can be thought of as a subset of all the records for a topic. Partitions help with the ideal of “Unlimited Scaling”.

Records in the same partition are stored in order of arrival.
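The record and partition descriptions above can be pictured with a small in-memory model. This is only an illustrative sketch using the Python standard library, not the Kafka client API: the names Record, Partition, append, and read_from are hypothetical, and the integer returned by append stands in for a record's position in the log (which Kafka calls an offset).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import time


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record (not the client library's class)."""
    value: bytes                                  # payload, stored as opaque bytes
    key: Optional[bytes] = None                   # optional; also opaque bytes
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: Tuple[Tuple[str, bytes], ...] = ()   # header keys are strings, values are bytes


class Partition:
    """Append-only log: records keep their order of arrival."""

    def __init__(self) -> None:
        self._log: List[Record] = []

    def append(self, record: Record) -> int:
        """Store the record at the end of the log and return its position."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, position: int) -> List[Record]:
        """Return every record at or after the given position, in arrival order."""
        return self._log[position:]


partition = Partition()
first = partition.append(Record(value=b"21.5", key=b"sensor-1"))
second = partition.append(
    Record(value=b"21.7", key=b"sensor-1", headers=(("unit", b"celsius"),))
)
third = partition.append(Record(value=b"ping"))  # the key is not required

# Records are retained after being read, so the log can be replayed from the start.
replayed = [record.value for record in partition.read_from(0)]
```

Because the log is append-only and reads do not remove anything, replaying from position 0 returns the records in the order they arrived, which mirrors the retention behavior described in the architecture overview.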
When a topic is created, it is configured with two properties:

partition count
    The number of partitions that records for this topic will be spread among.
replication factor
    The number of copies of a partition that are maintained to ensure consumers always have access to the queue of records for a given topic.

Each partition of a topic has one leader replica, which this guide calls the leader partition. If the replication factor is greater than one, there will be additional follower partitions. (For replication factor M, there will be M-1 follower partitions for each leader partition.)

Any Kafka client (a producer or consumer) communicates only with the leader partition for data. All other partitions exist for redundancy and failover. Follower partitions are responsible for copying new records from their leader partitions. Ideally, the follower partitions have an exact copy of the contents of the leader. Such partitions are called in-sync replicas (ISR).

With N brokers and topic replication factor M:
• If M < N, each broker will have a subset of all the partitions
• If M = N, each broker will have a complete copy of the partitions

In the following illustration, there are N = 2 brokers and a replication factor of M = 2. Each producer may generate records that are assigned across multiple partitions.

Figure 4: Records in a Topic are Stored in Partitions, Partitions are Replicated across Brokers

Partitions are the key to keeping good record throughput. Choosing the correct number of partitions and partition replications for a topic:
• Spreads leader partitions evenly on brokers throughout the cluster.
• Makes partitions within the same topic roughly the same size.
• Balances the load on brokers.

Record Order and Assignment

By default, Kafka assigns records to partitions round-robin. There is no guarantee that records sent to multiple partitions will retain the order in which they were produced. Within a single consumer, your program will only have record ordering within the records belonging to the same partition. This tends to be sufficient for many use cases, but does add some complexity to the stream processing logic.

Tip: Kafka guarantees that records in the same partition will be in the same order in all replicas of that partition.
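The ordering behavior described above can be illustrated with a short simulation. The following is a minimal sketch in plain Python, assuming only the round-robin case; RoundRobinAssigner and produce are hypothetical names, not part of any Kafka client, and the sketch does not model records that carry a key.

```python
from collections import defaultdict
from itertools import count
from typing import Dict, List


class RoundRobinAssigner:
    """Hypothetical helper: hands out partition numbers in rotation."""

    def __init__(self, partition_count: int) -> None:
        if partition_count < 1:
            raise ValueError("partition_count must be at least 1")
        self._partition_count = partition_count
        self._counter = count()

    def next_partition(self) -> int:
        return next(self._counter) % self._partition_count


def produce(values: List[str], partition_count: int) -> Dict[int, List[str]]:
    """Spread values across partitions; each partition keeps arrival order."""
    assigner = RoundRobinAssigner(partition_count)
    partitions: Dict[int, List[str]] = defaultdict(list)
    for value in values:
        partitions[assigner.next_partition()].append(value)
    return dict(partitions)


produced = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
partitions = produce(produced, partition_count=3)

# A consumer that drains one partition after another sees every record,
# but not in the order in which the records were produced.
consumed = [value for p in sorted(partitions) for value in partitions[p]]

# Within any single partition, production order is preserved.
ordered_within_partition = all(
    values == sorted(values, key=produced.index) for values in partitions.values()
)
```

With three partitions, records r0, r3, and r6 land in the same partition and stay in that relative order, while the overall sequence seen by a consumer reading partition by partition differs from the production sequence. An application that needs a total order across all records therefore has to restore it itself, for example by sorting on a field carried in the record.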