Apache Impala Guide ImportantNotice ©2010-2021Cloudera,Inc.Allrightsreserved. Cloudera,theClouderalogo,andanyotherproductor servicenamesorsloganscontainedinthisdocumentaretrademarksofClouderaand itssuppliersorlicensors,andmaynotbecopied,imitatedorused,inwholeorinpart, withoutthepriorwrittenpermissionofClouderaortheapplicabletrademarkholder.If thisdocumentationincludescode,includingbutnotlimitedto,codeexamples,Cloudera makesthisavailabletoyouunderthetermsoftheApacheLicense,Version2.0,including anyrequirednotices.AcopyoftheApacheLicenseVersion2.0,includinganynotices, isincludedherein.AcopyoftheApacheLicenseVersion2.0canalsobefoundhere: https://opensource.org/licenses/Apache-2.0 HadoopandtheHadoopelephantlogoaretrademarksoftheApacheSoftware Foundation.Allothertrademarks,registeredtrademarks,productnamesandcompany namesorlogosmentionedinthisdocumentarethepropertyoftheirrespectiveowners. Referencetoanyproducts,services,processesorotherinformation,bytradename, trademark,manufacturer,supplierorotherwisedoesnotconstituteorimply endorsement,sponsorshiporrecommendationthereofbyus. Complyingwithallapplicablecopyrightlawsistheresponsibilityoftheuser.Without limitingtherightsundercopyright,nopartofthisdocumentmaybereproduced,stored inorintroducedintoaretrievalsystem,ortransmittedinanyformorbyanymeans (electronic,mechanical,photocopying,recording,orotherwise),orforanypurpose, withouttheexpresswrittenpermissionofCloudera. Clouderamayhavepatents,patentapplications,trademarks,copyrights,orother intellectualpropertyrightscoveringsubjectmatterinthisdocument.Exceptasexpressly providedinanywrittenlicenseagreementfromCloudera,thefurnishingofthisdocument doesnotgiveyouanylicensetothesepatents,trademarkscopyrights,orother intellectualproperty.ForinformationaboutpatentscoveringClouderaproducts,see http://tiny.cloudera.com/patents. Theinformationinthisdocumentissubjecttochangewithoutnotice.Clouderashall notbeliableforanydamagesresultingfromtechnicalerrorsoromissionswhichmay bepresentinthisdocument,orfromuseofthisdocument. Cloudera,Inc. 395PageMillRoad PaloAlto,CA94306 [email protected] US:1-888-789-1488 Intl:1-650-362-0488 www.cloudera.com ReleaseInformation Version: CDH6.3.x Date:September30,2021 Table of Contents Introducing Apache Impala....................................................................................16 Impala Benefits..................................................................................................................................................16 HowImpalaWorkswithCDH.............................................................................................................................16 Primary Impala Features....................................................................................................................................17 Impala Concepts and Architecture..........................................................................18 Components of the Impala Server.....................................................................................................................18 TheImpalaDaemon.............................................................................................................................................................18 TheImpalaStatestore..........................................................................................................................................................18 TheImpalaCatalogService..................................................................................................................................................19 DevelopingImpalaApplications.........................................................................................................................20 OverviewoftheImpalaSQLDialect.....................................................................................................................................20 OverviewofImpalaProgrammingInterfaces......................................................................................................................21 HowImpalaFitsIntotheHadoopEcosystem.....................................................................................................21 HowImpalaWorkswithHive...............................................................................................................................................21 OverviewofImpalaMetadataandtheMetastore..............................................................................................................22 HowImpalaUsesHDFS........................................................................................................................................................22 HowImpalaUsesHBase......................................................................................................................................................22 Planning for Impala Deployment............................................................................23 Impala Requirements.........................................................................................................................................23 ProductCompatibilityMatrix...............................................................................................................................................23 SupportedOperatingSystems..............................................................................................................................................23 HiveMetastoreandRelatedConfiguration.........................................................................................................................23 JavaDependencies...............................................................................................................................................................23 Networking Configuration Requirements............................................................................................................................24 Hardware Requirements......................................................................................................................................................24 UserAccountRequirements.................................................................................................................................................25 Guidelines for Designing Impala Schemas.........................................................................................................25 Setting Up Apache Impala Using the Command Line...............................................27 WhatisIncludedinanImpalaInstallation.........................................................................................................27 InstallingImpalafromtheCommandLine.........................................................................................................27 ModifyingImpalaStartupOptions.....................................................................................................................29 ConfiguringImpalaStartupOptionsthroughtheCommandLine.......................................................................................29 CheckingtheValuesofImpalaConfigurationOptions........................................................................................................32 StartupOptionsforimpaladDaemon..................................................................................................................................32 StartupOptionsforstatestoredDaemon.............................................................................................................................32 StartupOptionsforcatalogdDaemon.................................................................................................................................32 Starting Impala...................................................................................................................................................32 StartingImpalafromtheCommandLine.............................................................................................................................33 Installing Impala with Cloudera Manager..........................................................................................................33 InstallingImpalafromtheCommandLine.........................................................................................................34 Managing Impala...................................................................................................36 Post-Installation Configuration for Impala.........................................................................................................36 Impala Upgrade Considerations..............................................................................38 ConvertingLegacyUDFsDuringUpgradetoCDH5.12orHigher.......................................................................38 HandlingLargeRowsDuringUpgradetoCDH5.13/Impala2.10orHigher......................................................38 ChangeImpalacatalogdHeapwhenUpgradingfromCDH5.6orLower...........................................................39 ListofReservedWordsUpdatedinCDH6.0/Impala3.0..................................................................................40 DecimalV2UsedbyDefaultinCDH6.0/Impala3.0.........................................................................................40 BehaviorofColumnAliasesChangedinCDH6.0/Impala3.0...........................................................................40 DefaultPARQUET_ARRAY_RESOLUTIONChangedinCDH6.0/Impala3.0.......................................................40 EnableClusteringHintforInserts.......................................................................................................................40 DeprecatedQueryOptionsRemovedinCDH6.0/Impala3.0..........................................................................40 refresh_after_connectImpalaShellOptionRemovedinCDH6.0/Impala3.0.................................................41 ReturnTypeChangedforEXTRACTandDATE_PARTFunctionsinCDH6.0/Impala3.0....................................41 ImpalaRoleswithSELECTorINSERTPrivilegeReceiveREFRESHPrivilegeDuringtheUpgradetoCDH5.16 / CDH 6.1.......................................................................................................................................................41 PortChangeforSHUTDOWNCommand............................................................................................................41 ChangeinClientConnectionTimeout................................................................................................................41 Default Setting Changes.....................................................................................................................................42 Impala Tutorials.....................................................................................................43 TutorialsforGettingStarted...............................................................................................................................43 ExploreaNewImpalaInstance............................................................................................................................................43 LoadCSVDatafromLocalFiles............................................................................................................................................48 PointanImpalaTableatExistingDataFiles........................................................................................................................49 DescribetheImpalaTable....................................................................................................................................................51 QuerytheImpalaTable........................................................................................................................................................51 DataLoadingandQueryingExamples.................................................................................................................................52 Advanced Tutorials.............................................................................................................................................54 AttachinganExternalPartitionedTabletoanHDFSDirectoryStructure............................................................................54 SwitchingBackandForthBetweenImpalaandHive...........................................................................................................56 CrossJoinsandCartesianProductswiththeCROSSJOINOperator....................................................................................57 DealingwithParquetFileswithUnknownSchema............................................................................................58 DownloadtheDataFilesintoHDFS.....................................................................................................................................59 CreateDatabaseandTables................................................................................................................................................59 ExaminePhysicalandLogicalSchema.................................................................................................................................60 Analyze Data........................................................................................................................................................................61 Impala Administration...........................................................................................70 SettingTimeoutPeriodsforDaemons,Queries,andSessions...........................................................................70 IncreasingtheStatestoreTimeout.......................................................................................................................................70 SettingtheIdleQueryandIdleSessionTimeoutsforimpalad.............................................................................................70 SettingTimeoutandRetriesforThriftConnectionstotheBackendClient..........................................................................71 CancellingaQuery...............................................................................................................................................................72 UsingImpalathroughaProxyforHighAvailability............................................................................................72 OverviewofProxyUsageandLoadBalancingforImpala...................................................................................................72 Choosing the Load-Balancing Algorithm.............................................................................................................................73 SpecialProxyConsiderationsforClustersUsingKerberos...................................................................................................74 SpecialProxyConsiderationsforTLS/SSLEnabledClusters..................................................................................................75 ExampleofConfiguringHAProxyLoadBalancerforImpala................................................................................................75 ManagingDiskSpaceforImpalaData................................................................................................................77 Auditing Impala Operations...............................................................................................................................78 DurabilityandPerformanceConsiderationsforImpalaAuditing........................................................................................79 FormatoftheAuditLogFiles...............................................................................................................................................79 WhichOperationsAreAudited...........................................................................................................................................80 ReviewingtheAuditLogs.....................................................................................................................................................80 ViewingLineageInformationforImpalaData....................................................................................................80 Impala Security......................................................................................................82 Security Guidelines for Impala...........................................................................................................................82 SecuringImpalaDataandLogFiles....................................................................................................................83 InstallationConsiderationsforImpalaSecurity..................................................................................................84 SecuringtheHiveMetastoreDatabase..............................................................................................................84 SecuringtheImpalaWebUserInterface............................................................................................................84 Configuring TLS/SSL for Impala..........................................................................................................................85 UsingClouderaManager.....................................................................................................................................................85 ConfiguringTLS/SSLCommunicationfortheImpalaShell...................................................................................................86 UsingTLS/SSLwithBusinessIntelligenceTools....................................................................................................................86 SpecifyingTLS/SSLMinimumAllowedVersionandCiphers.................................................................................................87 EnablingSentryAuthorizationforImpala..........................................................................................................87 TheSentryPrivilegeModel..................................................................................................................................................87 StartingtheimpaladDaemonwithSentryAuthorizationEnabled......................................................................................90 EnablingSentryforImpalainClouderaManager................................................................................................................91 UsingImpalawiththeSentryService(CDH5.1orhigheronly)...........................................................................................91 SettingUpSchemaObjectsforaSecureImpalaDeployment..............................................................................................94 TheDEFAULTDatabaseinaSecureDeployment.................................................................................................................94 ImpalaAuthentication........................................................................................................................................94 EnablingKerberosAuthenticationforImpala......................................................................................................................94 EnablingLDAPAuthenticationforImpala............................................................................................................................96 UsingMultipleAuthenticationMethodswithImpala..........................................................................................................98 ConfiguringImpalaDelegationforHueandBITools...........................................................................................................99 Impala SQL Language Reference...........................................................................101 Comments........................................................................................................................................................101 Data Types........................................................................................................................................................101 ARRAYComplexType(CDH5.5orhigheronly)..................................................................................................................102 BIGINTDataType...............................................................................................................................................................105 BOOLEAN Data Type..........................................................................................................................................................106 CHARDataType(CDH5.2orhigheronly)..........................................................................................................................107 DECIMALDataType(CDH6.0/Impala3.0orhigheronly)...............................................................................................109 DOUBLEDataType.............................................................................................................................................................114 FLOAT Data Type................................................................................................................................................................116 INT Data Type....................................................................................................................................................................117 MAPComplexType(CDH5.5orhigheronly).....................................................................................................................118 REALDataType..................................................................................................................................................................121 SMALLINTDataType..........................................................................................................................................................122 STRINGDataType..............................................................................................................................................................123 STRUCTComplexType(CDH5.5orhigheronly)................................................................................................................124 TIMESTAMPDataType.......................................................................................................................................................130 TINYINTDataType.............................................................................................................................................................136 VARCHARDataType(CDH5.2orhigheronly)...................................................................................................................137 ComplexTypes(CDH5.5orhigheronly)............................................................................................................................139 Literals..............................................................................................................................................................167 Numeric Literals.................................................................................................................................................................167 String Literals.....................................................................................................................................................................168 Boolean Literals.................................................................................................................................................................169 TimestampLiterals.............................................................................................................................................................169 NULL...................................................................................................................................................................................170 SQL Operators..................................................................................................................................................171 Arithmetic Operators.........................................................................................................................................................171 BETWEEN Operator...........................................................................................................................................................174 ComparisonOperators.......................................................................................................................................................175 EXISTS Operator.................................................................................................................................................................176 ILIKE Operator....................................................................................................................................................................179 IN Operator........................................................................................................................................................................180 IREGEXP Operator..............................................................................................................................................................182 ISDISTINCTFROMOperator..............................................................................................................................................183 IS NULL Operator...............................................................................................................................................................185 IS TRUE Operator...............................................................................................................................................................186 LIKE Operator.....................................................................................................................................................................186 LogicalOperators...............................................................................................................................................................187 REGEXP Operator...............................................................................................................................................................189 RLIKEOperator...................................................................................................................................................................191 ImpalaSchemaObjectsandObjectNames.....................................................................................................191 OverviewofImpalaAliases................................................................................................................................................192 OverviewofImpalaDatabases..........................................................................................................................................193 Overview of Impala Functions...........................................................................................................................................194 OverviewofImpalaIdentifiers...........................................................................................................................................195 OverviewofImpalaTables.................................................................................................................................................196 Overview of Impala Views.................................................................................................................................................199 Impala SQL Statements....................................................................................................................................202 DDL Statements.................................................................................................................................................................203 DML Statements................................................................................................................................................................204 ALTER DATABASE Statement..............................................................................................................................................204 ALTER TABLE Statement.....................................................................................................................................................205 ALTER VIEW Statement......................................................................................................................................................218 COMMENTStatement........................................................................................................................................................219 COMPUTESTATSStatement...............................................................................................................................................219 CREATE DATABASE Statement............................................................................................................................................226 CREATE FUNCTION Statement...........................................................................................................................................228 CREATEROLEStatement(CDH5.2orhigheronly).............................................................................................................234 CREATETABLEStatement...................................................................................................................................................234 CREATEVIEWStatement....................................................................................................................................................248 DELETEStatement(CDH5.10orhigheronly)....................................................................................................................249 DESCRIBE Statement..........................................................................................................................................................251 DROPDATABASEStatement...............................................................................................................................................262 DROP FUNCTION Statement..............................................................................................................................................263 DROPROLEStatement(CDH5.2orhigheronly)................................................................................................................265 DROP STATS Statement......................................................................................................................................................265 DROPTABLEStatement......................................................................................................................................................268 DROPVIEWStatement.......................................................................................................................................................270 EXPLAIN Statement............................................................................................................................................................271 GRANTStatement(CDH5.2orhigheronly).......................................................................................................................273 INSERT Statement..............................................................................................................................................................277 INVALIDATE METADATA Statement....................................................................................................................................286 LOAD DATA Statement.......................................................................................................................................................288 REFRESH Statement...........................................................................................................................................................291 REFRESHAUTHORIZATIONStatement................................................................................................................................293 REFRESH FUNCTIONS Statement.......................................................................................................................................293 REVOKEStatement(CDH5.2orhigheronly).....................................................................................................................293 SELECT Statement..............................................................................................................................................................295 SETStatement....................................................................................................................................................................321 SHOW Statement...............................................................................................................................................................363 SHUTDOWNStatement......................................................................................................................................................379 TRUNCATETABLEStatement(CDH5.5orhigheronly)......................................................................................................381 UPDATEStatement(CDH5.10orhigheronly)...................................................................................................................383 UPSERTStatement(CDH5.10orhigheronly)....................................................................................................................384 USE Statement...................................................................................................................................................................385 VALUES Statement.............................................................................................................................................................386 OptimizerHintsinImpala..................................................................................................................................................387 Impala Built-In Functions.................................................................................................................................391 ImpalaMathematicalFunctions........................................................................................................................................397 Impala Bit Functions..........................................................................................................................................................414 ImpalaTypeConversionFunctions.....................................................................................................................................423 ImpalaDateandTimeFunctions.......................................................................................................................................424 Impala Conditional Functions............................................................................................................................................457 Impala String Functions.....................................................................................................................................................462 ImpalaMiscellaneousFunctions........................................................................................................................................477 ImpalaAggregateFunctions..............................................................................................................................................479 ImpalaAnalyticFunctions..................................................................................................................................................506 User-Defined Functions (UDFs)........................................................................................................................525 UDFConcepts.....................................................................................................................................................................525 RuntimeEnvironmentforUDFs..........................................................................................................................................528 InstallingtheUDFDevelopmentPackage..........................................................................................................................528 WritingUser-DefinedFunctions(UDFs).............................................................................................................................529 WritingUser-DefinedAggregateFunctions(UDAFs).........................................................................................................532 BuildingandDeployingUDFs.............................................................................................................................................533 PerformanceConsiderationsforUDFs...............................................................................................................................535 ExamplesofCreatingandUsingUDFs...............................................................................................................................535 SecurityConsiderationsforUser-DefinedFunctions..........................................................................................................540 LimitationsandRestrictionsforImpalaUDFs....................................................................................................................540 ConvertingLegacyUDFsDuringUpgradetoCDH5.12orHigher......................................................................................541 SQLDifferencesBetweenImpalaandHive......................................................................................................541 HiveQLFeaturesnotAvailableinImpala...........................................................................................................................541 SemanticDifferencesBetweenImpalaandHiveQLFeatures............................................................................................542 PortingSQLfromOtherDatabaseSystemstoImpala......................................................................................543 PortingDDLandDMLStatements.....................................................................................................................................543 PortingDataTypesfromOtherDatabaseSystems............................................................................................................543 SQLStatementstoRemoveorAdapt.................................................................................................................................546 SQLConstructstoDouble-check........................................................................................................................................547 NextPortingStepsafterVerifyingSyntaxandSemantics..................................................................................................548 Resource Management........................................................................................549 AdmissionControlandQueryQueuing............................................................................................................549 ConcurrentQueriesandAdmissionControl.......................................................................................................................549 MemoryLimitsandAdmissionControl..............................................................................................................................550 HowImpalaAdmissionControlRelatestoOtherResourceManagementTools................................................................552 HowImpalaSchedulesandEnforcesLimitsonConcurrentQueries..................................................................................552 HowAdmissionControlworkswithImpalaClients(JDBC,ODBC,HiveServer2)................................................................553 SQLandSchemaConsiderationsforAdmissionControl....................................................................................................553 GuidelinesforUsingAdmissionControl.............................................................................................................................554 ConfiguringResourcePoolsandAdmissionControl........................................................................................554 CreatingStaticServicePools..............................................................................................................................................555 UsingAdmissionControl....................................................................................................................................................555 SettingPer-queryMemoryLimits......................................................................................................................................558 ConfiguringAdmissionControlinCommandLineInterface...............................................................................................559 Configuring Cluster-wide Admission Control.....................................................................................................................560 Admission Control Sample Scenario................................................................................................................562 Tuning Impala for Performance............................................................................565 ImpalaPerformanceGuidelinesandBestPractices.........................................................................................565 PerformanceConsiderationsforJoinQueries..................................................................................................568 HowJoinsAreProcessedwhenStatisticsAreUnavailable................................................................................................569 OverridingJoinReorderingwithSTRAIGHT_JOIN..............................................................................................................569 ExamplesofJoinOrderOptimization.................................................................................................................................570 TableandColumnStatistics..............................................................................................................................575 OverviewofTableStatistics...............................................................................................................................................575 OverviewofColumnStatistics............................................................................................................................................576 HowTableandColumnStatisticsWorkforPartitionedTables..........................................................................................577 GeneratingTableandColumnStatistics............................................................................................................................578 DetectingMissingStatistics...............................................................................................................................................582 ManuallySettingTableandColumnStatisticswithALTERTABLE......................................................................................584 ExamplesofUsingTableandColumnStatisticswithImpala.............................................................................................585 Benchmarking Impala Queries.........................................................................................................................588 ControllingImpalaResourceUsage..................................................................................................................588 RuntimeFilteringforImpalaQueries(CDH5.7orhigheronly).......................................................................588 BackgroundInformationforRuntimeFiltering..................................................................................................................589 RuntimeFilteringInternals.................................................................................................................................................590 FileFormatConsiderationsforRuntimeFiltering..............................................................................................................590 WaitIntervalsforRuntimeFilters......................................................................................................................................590 QueryOptionsforRuntimeFiltering..................................................................................................................................591 RuntimeFilteringandQueryPlans....................................................................................................................................591 ExamplesofQueriesthatBenefitfromRuntimeFiltering..................................................................................................592 TuningandTroubleshootingQueriesthatUseRuntimeFiltering......................................................................................593 LimitationsandRestrictionsforRuntimeFiltering.............................................................................................................593 UsingHDFSCachingwithImpala(CDH5.3orhigheronly)..............................................................................593 OverviewofHDFSCachingforImpala...............................................................................................................................594 SettingUpHDFSCachingforImpala..................................................................................................................................594 EnablingHDFSCachingforImpalaTablesandPartitions..................................................................................................595 LoadingandRemovingDatawithHDFSCachingEnabled.................................................................................................596 AdministrationforHDFSCachingwithImpala...................................................................................................................597 PerformanceConsiderationsforHDFSCachingwithImpala.............................................................................................598 DetectingandCorrectingHDFSBlockSkewConditions...................................................................................599 DataCacheforRemoteReads..........................................................................................................................600 TestingImpalaPerformance.............................................................................................................................601 UnderstandingImpalaQueryPerformance-EXPLAINPlansandQueryProfiles.............................................602 UsingtheEXPLAINPlanforPerformanceTuning...............................................................................................................602 UsingtheSUMMARYReportforPerformanceTuning.......................................................................................................603 UsingtheQueryProfileforPerformanceTuning...............................................................................................................604 Scalability Considerations for Impala....................................................................605 ImpactofManyTablesorPartitionsonImpalaCatalogPerformanceandMemoryUsage.............................605 ScalabilityConsideration for Large Clusters.....................................................................................................605 ScalabilityConsiderationsfortheImpalaStatestore........................................................................................606 EffectofBufferPoolonMemoryUsage(CDH5.13andhigher).......................................................................607 SQLOperationsthatSpilltoDisk......................................................................................................................607 LimitsonQuerySizeandComplexity...............................................................................................................611 ScalabilityConsiderationsforImpalaI/O.........................................................................................................611 ScalabilityConsiderationsforTableLayout......................................................................................................611 Kerberos-Related NetworkOverheadfor Large Clusters.................................................................................611 AvoidingCPUHotspotsforHDFSCachedData................................................................................................612 ScalabilityConsiderationsforFileHandleCaching...........................................................................................612 ScalingLimitsandGuidelines...........................................................................................................................613 Deployment Limits.............................................................................................................................................................613 Data Storage Limits...........................................................................................................................................................614 Schema Design Limits........................................................................................................................................................614 Security Limits....................................................................................................................................................................614 QueryLimits-CompileTime..............................................................................................................................................614 QueryLimits-RuntimeTime..............................................................................................................................................614 HowtoConfigureImpalawithDedicatedCoordinators..................................................................................614 DeterminingtheOptimalNumberofDedicatedCoordinators..........................................................................................615 DeployingDedicatedCoordinatorsandExecutorsinClouderaManager..........................................................................618 DeployingDedicatedCoordinatorsandExecutorsfromCommandLine...........................................................................619 Metadata Management...................................................................................................................................620 On-demand Metadata.......................................................................................................................................................620 AutomaticInvalidationofMetadataCache.......................................................................................................................620 AutomaticInvalidation/RefreshofMetadata....................................................................................................................621 Partitioning for Impala Tables...............................................................................625
Description: