UseR! Paulo Cortez Modern Optimization with R Use R! SeriesEditors: RobertGentleman KurtHornik GiovanniParmigiani Moreinformationaboutthisseriesathttp://www.springer.com/series/6991 Use R! Albert:BayesianComputationwithR(2nded.2009) Bivand/Pebesma/Gómez-Rubio:AppliedSpatialDataAnalysiswithR(2nded. 2013) Cook/Swayne:InteractiveandDynamicGraphicsforDataAnalysis: WithRandGGobi Hahne/Huber/Gentleman/Falcon:BioconductorCaseStudies Paradis:AnalysisofPhylogeneticsandEvolutionwithR(2nded.2012) Pfaff:AnalysisofIntegratedandCointegratedTimeSerieswithR(2nded.2008) Sarkar:Lattice:MultivariateDataVisualizationwithR Spector:DataManipulationwithR Paulo Cortez Modern Optimization with R 123 PauloCortez DepartmentofInformationSystems UniversityofMinho Guimarães,Portugal ISSN2197-5736 ISSN2197-5744(electronic) ISBN978-3-319-08262-2 ISBN978-3-319-08263-9(eBook) DOI10.1007/978-3-319-08263-9 SpringerChamHeidelbergNewYorkDordrechtLondon LibraryofCongressControlNumber:2014945630 MathematicsSubjectClassification(2010):68T20,60-04,62M10,62M45,65C05,68T05,97R40 ©SpringerInternationalPublishingSwitzerland2014 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped.Exemptedfromthislegalreservationarebriefexcerptsinconnection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’slocation,initscurrentversion,andpermissionforusemustalwaysbeobtainedfromSpringer. PermissionsforusemaybeobtainedthroughRightsLinkattheCopyrightClearanceCenter.Violations areliabletoprosecutionundertherespectiveCopyrightLaw. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. While the advice and information in this book are believed to be true and accurate at the date of publication,neithertheauthorsnortheeditorsnorthepublishercanacceptanylegalresponsibilityfor anyerrorsoromissionsthatmaybemade.Thepublishermakesnowarranty,expressorimplied,with respecttothematerialcontainedherein. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) Preface Currently,weareintheInformationAge,wheremoreorganizationalandindividual activities and processes are based on Information Technology. We are also in a fast changing world. Due to several factors, such as globalization, technological improvements, and more recently the 2008 financial crisis, both organizations and individuals are pressured for improving their efficiency, reducing costs, and making better-informed decisions. This is where optimization methods, supported bycomputationaltools,canplayakeyrole. Optimizationisaboutminimizingormaximizingagoal(orgoals)anditisuseful inseveraldomains,includingAgriculture,Banking,Control,Engineering,Finance, Marketing, Production, and Science. Examples of real-world applications include theoptimization ofconstruction works,financial portfolios,marketingcampaigns, andwatermanagementinagriculture,justtonameafew. Modern optimization, also known as metaheuristics, is related with general purpose solversbasedoncomputational methods thatusefewdomainknowledge, iteratively improving an initial solution (or population of solutions) to optimize a problem.Modernoptimizationisparticularlyusefulforsolvingcomplexproblems for which no specialized optimization algorithm has been developed, such as problems with discontinuities, dynamic changes, multiple objectives, or hard and softrestrictions,whicharemoredifficulttobehandledbyclassicalmethods. Althoughmodernoptimizationoftenincorporatesrandomprocesseswithintheir search engines, the overall optimization procedure tends to be much better than purerandom(MonteCarlo)search.Severalofthesemethodsarenaturallyinspired. Examplesofpopularmodernmethodsthatarediscussedinthisbookaresimulated annealing,tabusearch,geneticalgorithms,geneticprogramming,NSGAII(multi- objectiveoptimization),differentialevolution,andparticleswarmoptimization. R is a free, open source, and multiple platform tool (e.g., Windows, Linux, MacOS) that was specifically developed for statistical analysis. Currently, there is an increasing interest in using R to perform an intelligent data analysis. While it is difficult to know the real number of R users (e.g., it may range from 250,000 to 2 million), several estimates show a clear growth in the R popularity. In effect, v vi Preface theRcommunityisveryactiveandnewpackagesarebeingcontinuouslycreated, with more than 5800 packages available, thus enhancing the tool capabilities. In particular,severalofthesepackagesimplementmodernoptimizationmethods. Thereareseveralbooksthatdiscusseithermodernoptimizationmethodsorthe R tool. However, within the author’s knowledge, there is no book that integrates both subjects and under a practical point of view, with several application R code examples that can be easily tested by the readers. Hence, the goal of this book is to gather in a single document (self-contained) the most relevant concepts related with modern optimization methods, showing how such concepts and methods can beaddressedusingtheRtool. Thisbookisaddressedforseveraltargetaudiencegroups.GiventhattheRtool isfree,thisbookcanbeeasilyadoptedinseveralbachelorormasterlevelcoursesin areassuchas“OperationsResearch”,“DecisionSupport”,“BusinessIntelligence”, “Soft Computing”, or “Evolutionary Computation”. Thus, this book should be appealing for bachelor’s or master’s students in Computer Science, Information Technology, or related areas (e.g., Engineering or Science). The book should also beofinterestfortwotypesofpractitioners:Rusersinterestedinapplyingmodern optimizationmethodsandnonRexpertdataanalystsoroptimizationpractitioners whowanttotesttheRcapabilitiesforoptimizingreal-worldtasks. HowtoReadThisBook Thisbookisorganizedasfollows: Chapter1introducesthemotivationformodernoptimizationmethodsandwhy theRtoolshouldbeusedtoexploresuchmethods.Also,thischapterdiscusses key modern optimization topics, namely the representation of a solution, the evaluation function, constraints, and an overall view of modern optimization methods. This chapter ends with the description of the optimization tasks that areusedfortutorialpurposesinthenextchapters. Chapter2presentsbasicconceptsabouttheRtool.Thischapterisparticularly addressedfornonRexperts,includingthenecessaryknowledgethatisrequired tounderstandandtestthebookexamples.Rexpertsshouldskipthischapter. Chapter 3 is about how blind search can be implemented in R. This chapter detailsinparticularthreeapproaches:pureblind,grid,andMonteCarlosearch. Chapter 4 introduces local search methods, namely hill climbing, simulated annealing,andtabusearch.Then,anexamplecomparisonbetweenseverallocal searchmethodsisshown. Chapter 5 presents population-based search methods, namely genetic and evo- lutionary algorithms, differential evolution, particle swarm optimization and Preface vii estimation of distribution algorithm. Then, two additional examples are dis- cussed, showing a comparison between population-based methods and how to handleconstraints. Chapter 6 is dedicated to multi-objective optimization. This chapter first presents three demonstrative multi-objective tasks and then discusses three multi-objective optimization approaches: weighted-formula, lexicographic, and Pareto(e.g.,NSGA-IIalgorithm). Chapter 7 presents three real-world applications of previously discussed meth- ods, namely traveling salesman problem, time series forecasting, and wine qualityclassification. Eachchapterstartswithanintroduction,followedbyseveralchaptertopicrelated sectionsandendswithanRcommandsummaryandexercisesections.Throughout the book, several examples of R code are shown. The code was run using a 64 bit R (version 3.0.2) under on a MacOS laptop. Nevertheless, these examples should be easily reproduced by the readers on other systems, possibly resulting in slight numerical(32bitversion)orgraphicaldifferencesfordeterministicexamples.Also, giventhatalargeportionofthediscussedmethodsarestochastic,itisnaturalthat differentexecutionsofthesamecodeandunderthesamesystemwillleadtosmall differencesintheresults. It is particularly recommended that students should execute the R code and try to solve the proposed exercises. Examples of solutions are presented at the end of thisbook.Allthesecodefilesanddataexamplesareavailableat:http://www3.dsi. uminho.pt/pcortez/mor. Production Severalcontentsofthisbookweretaughtbytheauthorinthelast5yearsindistinct course units of master and doctoral programs. At the master’s level, it included the courses “Adaptive Business Intelligence” (Masters in Information Technology, UniversityofMinho,Portugal)and“BusinessIntelligence”(MastersinInformation Systems Management, Lisbon University Institute, Portugal). The doctoral course was “Adaptive Business Intelligence” (Doctoral Program in Computer Science, Universities of Minho, Aveiro and Porto, Portugal). Also, some material was lectured at a tutorial given in the European Simulation and Modelling Conference (ESM2011),heldatGuimarães. This book was written in LATEX, using the texstudio editor (http://texstudio. sourceforge.net) and its US English spell checker. Most figures were made in R, whilesomeofthefiguresweredesignedusingxfig(http://www.xfig.org),anopen sourcevectorgraphicaltool. Guimarães,Portugal PauloCortez Contents 1 Introduction .................................................................. 1 1.1 Motivation.............................................................. 1 1.2 WhyR?................................................................. 2 1.3 RepresentationofaSolution.......................................... 3 1.4 EvaluationFunction ................................................... 3 1.5 Constraints............................................................. 4 1.6 OptimizationMethods................................................. 5 1.7 DemonstrativeProblems .............................................. 7 2 RBasics ....................................................................... 11 2.1 Introduction............................................................ 11 2.2 BasicObjectsandFunctions.......................................... 13 2.3 ControllingExecutionandWritingFunctions ....................... 20 2.4 ImportingandExportingData ........................................ 24 2.5 AdditionalFeatures.................................................... 26 2.6 CommandSummary................................................... 27 2.7 Exercises............................................................... 29 3 BlindSearch.................................................................. 31 3.1 Introduction............................................................ 31 3.2 FullBlindSearch ...................................................... 32 3.3 GridSearch ............................................................ 36 3.4 MonteCarloSearch ................................................... 40 3.5 CommandSummary................................................... 42 3.6 Exercises............................................................... 43 4 LocalSearch.................................................................. 45 4.1 Introduction............................................................ 45 4.2 HillClimbing .......................................................... 45 4.3 SimulatedAnnealing .................................................. 50 4.4 TabuSearch............................................................ 53 4.5 ComparisonofLocalSearchMethods................................ 57 ix
Description: