
Machine Learning: A Practical Approach on the Statistical Learning Theory (PDF)

373 pages · 2018 · 6.83 MB · English

Preview: Machine Learning: A Practical Approach on the Statistical Learning Theory

Rodrigo Fernandes de Mello · Moacir Antonelli Ponti

Machine Learning: A Practical Approach on the Statistical Learning Theory

Rodrigo Fernandes de Mello, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
Moacir Antonelli Ponti, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil

ISBN 978-3-319-94988-8
ISBN 978-3-319-94989-5 (eBook)
https://doi.org/10.1007/978-3-319-94989-5
Library of Congress Control Number: 2018947414

© Springer International Publishing AG, part of Springer Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

"To my lovely wife Claudia for her unconditional support along all these great years, my mother and my father for all background, and specially to every person who has dedicated time, effort, and love to study."
– Rodrigo Fernandes de Mello

"The dedication of this book is split five ways: to Paula, love of all past times and book; to my mother Clara who always believed that I could do anything; to you (the reader) the theory and practice I wanted someone had taught me 15 years ago; to caffeine for making it happen; and for the most beautiful of the goddesses."
– Moacir Antonelli Ponti

Foreword

Although Machine Learning is currently a hot topic, with several research breakthroughs and innovative applications, the area lacks literature that presents, in a practical and accessible way, the theoretical issues that form its foundation. Statistical learning theory plays an important role in many research breakthroughs in machine learning. There is a classical and excellent literature in the area; however, most of it assumes strong mathematical and statistical knowledge. This book, written by Rodrigo Fernandes de Mello and Moacir Antonelli Ponti, both with much experience in the area, provides an innovative and clear approach to presenting the main concepts of statistical learning theory. The authors were very competent in choosing the topics to be covered and in explaining them in an easy and didactic way. By providing elegant algorithms that implement several important concepts of statistical learning theory, the book motivates the reader and demystifies the impression that this is a dry and difficult subject.
Using the R statistical software, the examples of source code are easy to follow and give the reader deep insights. This book will be a very important source of knowledge for those interested in learning and pursuing a career in machine learning. The authors present concepts, methods, and algorithms in a simple and intuitive way. The textbook can be a relevant tool for master's and PhD students interested in understanding and exploring the theoretical aspects of machine learning.

Albert Bifet, LTCI, Télécom ParisTech, Paris, France
André Carlos P. de L. F. de Carvalho, University of São Paulo, São Paulo, Brazil
João Gama, University of Porto, Porto, Portugal

Acknowledgments

This book was only possible due to the direct and indirect cooperation of several friends, colleagues, and students throughout the last 7 years, after we decided to propose and teach the course on Statistical Learning Theory to postgraduate students at the Institute of Mathematics and Computer Science at the University of São Paulo, Brazil.

While studying several books and articles on theoretical aspects of machine learning, linear algebra, linear and convex optimization, and support vector machines, we decided to offer a first version of this postgraduate course, including and detailing all subjects necessary for any student to picture the theoretical foundation of supervised learning and implement its most relevant algorithm, i.e., the support vector machine. Several course amendments were made based on questions from students and further discussions with friends and colleagues, including Tiago Santana de Nazaré, who motivated us to develop the theoretical formulation for the distance-weighted nearest neighbors; Felipe Simões Lage Gomes Duarte and Gabriel de Barros Paranhos da Costa, who helped us improve slides and pay attention to additional material included later on; Thiago Bianchi, for the discussions; and Victor Hugo Cândido de Oliveira and Daniel Moreira Cestari, for helping with slide improvements as well as theoretical discussions. We thank Carlos Henrique Grossi Ferreira for supporting the discussions and helping us prove the general-purpose shattering coefficient.

We also thank our students and collaborators who supported us during this work: Martha Dais Ferreira, Débora Cristina Corrêa, Lucas de Carvalho Pagliosa, Yule Vaz, Ricardo Araújo Rios, Cássio Martini Martins Pereira, Rosane Maria Maffei Vallim, André Mantini Eberle, Eduardo Alves Ferreira, Vinícius de Freitas Reis, José Augusto Andrade Filho, Matheus Lorenzo dos Santos, Fausto Guzzo da Costa, Marcelo Keese Albertini, Paulo Henrique Ribeiro Gabriel, Renato Porfirio Ishii, Evgueni Dodonov (in memoriam), Anderson Caio Santos Silva, Fernando Pereira dos Santos, Luiz Fernando Coletta, Camila Tatiana Picon, Isadora Rossi, Mateus Riva, Leonardo Sampaio Ferraz Ribeiro, Gabriela Thumé, and Fábio Rodrigues Jorge.

After four consecutive years of teaching this course on statistical learning theory, we now share this textbook to bring those concepts and practical aspects to the whole community. We expect this content to help students and researchers understand and open new paths for the machine learning area.

Contents

1 A Brief Review on Machine Learning ..... 1
  1.1 Machine Learning Definition ..... 1
  1.2 Main Types of Learning ..... 4
  1.3 Supervised Learning ..... 5
  1.4 How a Supervised Algorithm Learns? ..... 19
  1.5 Illustrating the Supervised Learning ..... 28
    1.5.1 The Perceptron ..... 28
    1.5.2 Multilayer Perceptron ..... 52
  1.6 Concluding Remarks ..... 72
  1.7 List of Exercises ..... 73
  References ..... 73

2 Statistical Learning Theory ..... 75
  2.1 Motivation ..... 75
  2.2 Basic Concepts ..... 76
    2.2.1 Probability Densities and Joint Probabilities ..... 77
    2.2.2 Identically and Independently Distributed Data ..... 82
    2.2.3 Statistical Learning Theory Assumptions ..... 89
    2.2.4 Expected Risk and Generalization ..... 90
    2.2.5 Bounds for Generalization: A Practical Example ..... 92
    2.2.6 Bayes Risk and Universal Consistency ..... 97
    2.2.7 Consistency, Overfitting and Underfitting ..... 98
    2.2.8 Bias of Classification Algorithms ..... 101
  2.3 Empirical Risk Minimization Principle ..... 102
    2.3.1 Consistency and the ERM Principle ..... 104
    2.3.2 Restriction of the Space of Admissible Functions ..... 105
    2.3.3 Ensuring Uniform Convergence in Practice ..... 108
  2.4 Symmetrization Lemma and the Shattering Coefficient ..... 110
    2.4.1 Shattering Coefficient as a Capacity Measure ..... 111
    2.4.2 Making the ERM Principle Consistent for Infinite Functions ..... 113
  2.5 Generalization Bounds ..... 115
  2.6 The Vapnik-Chervonenkis Dimension ..... 118
    2.6.1 Margin Bounds ..... 121
  2.7 Computing the Shattering Coefficient ..... 122
  2.8 Concluding Remarks ..... 126
  2.9 List of Exercises ..... 127
  References ..... 127

3 Assessing Supervised Learning Algorithms ..... 129
  3.1 Practical Aspects of the Statistical Learning Theory ..... 129
  3.2 Distance-Weighted Nearest Neighbors ..... 129
  3.3 Using the Chernoff Bound ..... 138
  3.4 Using the Generalization Bound ..... 146
  3.5 Using the SVM Generalization Bound ..... 149
  3.6 Empirical Study of the Biases of Classification Algorithms ..... 157
  3.7 Concluding Remarks ..... 160
  3.8 List of Exercises ..... 160
  References ..... 161

4 Introduction to Support Vector Machines ..... 163
  4.1 About this Chapter ..... 163
  4.2 Linear Algebra ..... 163
    4.2.1 Basis ..... 163
    4.2.2 Linear Transformation ..... 165
    4.2.3 Inverses of Linear Transformations ..... 168
    4.2.4 Dot Products ..... 170
    4.2.5 Change of Basis and Orthonormal Basis ..... 172
    4.2.6 Eigenvalues and Eigenvectors ..... 174
  4.3 Using Basic Algebra to Build a Classification Algorithm ..... 179
  4.4 Hyperplane-Based Classification: An Intuitive View ..... 190
  4.5 Hyperplane-Based Classification: An Algebraic View ..... 198
    4.5.1 Lagrange Multipliers ..... 202
    4.5.2 Karush-Kuhn-Tucker Conditions ..... 206
  4.6 Formulating the Hard-Margin SVM Optimization Problem ..... 211
  4.7 Formulating the Soft-Margin SVM Optimization Problem ..... 219
  4.8 Concluding Remarks ..... 225
  4.9 List of Exercises ..... 225
  References ..... 226

5 In Search for the Optimization Algorithm ..... 227
  5.1 Motivation ..... 227
  5.2 Introducing Optimization Problems ..... 227
  5.3 Main Types of Optimization Problems ..... 228
  5.4 Linear Optimization Problems ..... 234
    5.4.1 Solving Through Graphing ..... 234
    5.4.2 Primal and Dual Forms of Linear Problems ..... 241
    5.4.3 Using an Algorithmic Approach to Solve Linear Problems ..... 253
    5.4.4 On the KKT Conditions for Linear Problems ..... 263
