Data Mining for Business Analytics: Concepts, Techniques, and Applications in R PDF

577 Pages·2017·25.254 MB·English

DATA MINING FOR BUSINESS ANALYTICS
Concepts, Techniques, and Applications in R

Galit Shmueli
Peter C. Bruce
Inbal Yahav
Nitin R. Patel
Kenneth C. Lichtendahl, Jr.

This edition first published 2018
© 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr. to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data applied for
Hardback: 9781118879368
Cover Design: Wiley
Cover Image: © Achim Mittler, Frankfurt am Main/Gettyimages
Set in 11.5/14.5pt Bembo Std by Aptara Inc., New Delhi, India
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

The beginning of wisdom is this: Get wisdom, and whatever else you get, get insight.
– Proverbs 4:7

Contents

Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the R Edition
Acknowledgments

PART I PRELIMINARIES

CHAPTER 1 Introduction
1.1 What Is Business Analytics?
1.2 What Is Data Mining?
1.3 Data Mining and Related Terms
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
  Order of Topics

CHAPTER 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
  Classification
  Prediction
  Association Rules and Recommendation Systems
  Predictive Analytics
  Data Reduction and Dimension Reduction
  Data Exploration and Visualization
  Supervised and Unsupervised Learning
2.3 The Steps in Data Mining
2.4 Preliminary Steps
  Organization of Datasets
  Predicting Home Values in the West Roxbury Neighborhood
  Loading and Looking at the Data in R
  Sampling from a Database
  Oversampling Rare Events in Classification Tasks
  Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
  Overfitting
  Creation and Use of Data Partitions
2.6 Building a Predictive Model
  Modeling Process
2.7 Using R for Data Mining on a Local Machine
2.8 Automating Data Mining Solutions
  Data Mining Software: The State of the Market (by Herb Edelstein)
Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

CHAPTER 3 Data Visualization
3.1 Uses of Data Visualization
  Base R or ggplot?
3.2 Data Examples
  Example 1: Boston Housing Data
  Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
  Distribution Plots: Boxplots and Histograms
  Heatmaps: Visualizing Correlations and Missing Values
3.4 Multidimensional Visualization
  Adding Variables: Color, Size, Shape, Multiple Panels, and Animation
  Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
  Reference: Trend Lines and Labels
  Scaling up to Large Datasets
  Multivariate Plot: Parallel Coordinates Plot
  Interactive Visualization
3.5 Specialized Visualizations
  Visualizing Networked Data
  Visualizing Hierarchical Data: Treemaps
  Visualizing Geographical Data: Map Charts
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
  Prediction
  Classification
  Time Series Forecasting
  Unsupervised Learning
Problems

CHAPTER 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
