Outlier Ensembles: An Introduction
282 pages · 2017 · 4.38 MB · English
Charu C. Aggarwal · Saket Sathe

Outlier Ensembles: An Introduction

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA

Saket Sathe
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA

ISBN 978-3-319-54764-0
ISBN 978-3-319-54765-7 (eBook)
DOI 10.1007/978-3-319-54765-7
Library of Congress Control Number: 2017934615

© Springer International Publishing AG 2017
This Springer imprint is published by Springer Nature.
The registered company is Springer International Publishing AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

Talent wins games, but teamwork and intelligence wins championships.
Michael Jordan

Ensemble analysis is a widely used class of meta-algorithms for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to the clustering and classification problems, ensemble analysis has been studied in a limited way in the context of outlier detection. It is only in recent years that the problem of outlier ensembles has been recognized formally, and some techniques for these problems have been proposed. However, the material in the field has not yet been formally organized, and it is expected that this book will play a significant role in creating a structured exposition of the topic. This book discusses a variety of methods for outlier ensembles and organizes them by the specific principles with which accuracy improvements are achieved. In addition, a discussion is provided on the techniques with which such methods can be made more effective. A formal classification of these methods is provided, and the circumstances in which they work well are discussed. A discussion is also provided on how outlier ensembles relate (both theoretically and practically) to the ensemble techniques used commonly for other data mining problems such as classification.
The similarities and (subtle) differences in the ensemble techniques for the classification and outlier detection problems are discussed. These subtle differences do impact the design of ensemble algorithms for the latter problem.

Ensemble techniques are increasingly finding their way into the curricula of many data mining courses, and this book can be used for such courses. Many illustrative examples and exercises are provided in order to facilitate classroom teaching. Familiarity is assumed with the outlier detection problem and also with the generic problem of ensemble analysis in classification, because many of the ensemble methods discussed in this book are adaptations of their counterparts in the classification domain. Some techniques discussed in this book, such as wagging, randomized feature weighting, and geometric subsampling, provide new insights that are not available elsewhere. We have also provided an analysis of the performance of various types of base detectors and their relative effectiveness. This book combines a textbook-style discussion of older topics with new insights. Therefore, we believe that the book will also be of interest to researchers and practitioners seeking to leverage ensemble methods in optimal algorithmic design.

Yorktown Heights, NY, USA
January 2017
Charu C. Aggarwal
Saket Sathe

Contents

1 An Introduction to Outlier Ensembles
  1.1 Introduction
    1.1.1 Motivations for Ensemble Methods in Outlier Analysis
    1.1.2 Common Settings for Existing Ensemble Methods
    1.1.3 Types of Ensemble Methods
    1.1.4 Overview of Outlier Ensemble Design
  1.2 Categorization by Component Independence
    1.2.1 Sequential Ensembles
    1.2.2 Independent Ensembles
  1.3 Categorization by Constituent Components
    1.3.1 Model-Centered Ensembles
    1.3.2 Data-Centered Ensembles
    1.3.3 Discussion of Categorization Schemes
  1.4 Categorization by Theoretical Approach
    1.4.1 Variance Reduction in Outlier Ensembles
    1.4.2 Bias Reduction in Outlier Ensembles
  1.5 Defining Combination Functions
    1.5.1 Normalization Issues
    1.5.2 Combining Scores from Different Models
  1.6 Research Overview and Book Organization
    1.6.1 Overview of Book
  1.7 Conclusions and Discussion
  References

2 Theory of Outlier Ensembles
  2.1 Introduction
  2.2 The Bias-Variance Trade-Off for Outlier Detection
    2.2.1 Relationship of Ensemble Analysis to Bias-Variance Trade-Off
    2.2.2 Out-of-Sample Issues
    2.2.3 Understanding How Ensemble Analysis Works
    2.2.4 Data-Centric View Versus Model-Centric View
  2.3 Examples and Applications of the Bias-Variance Trade-Off
    2.3.1 Bagging and Subsampling
    2.3.2 Feature Bagging
    2.3.3 Boosting
  2.4 Experimental Illustration of Bias-Variance Theory
    2.4.1 Understanding the Effects of Ensembles on Data-Centric Bias and Variance
    2.4.2 Experimental Examples of Bias-Variance Decomposition
  2.5 Conclusions
  References

3 Variance Reduction in Outlier Ensembles
  3.1 Introduction
  3.2 Motivations for Basic Variance Reduction Framework
  3.3 Variance Reduction Is Not a Panacea
    3.3.1 When Does Data-Centric Variance Reduction Help?
    3.3.2 When Does Model-Centric Variance Reduction Help?
    3.3.3 The Subtle Differences Between AUCs and MSEs
  3.4 Variance Reduction Methods
    3.4.1 Feature Bagging (FB) for High-Dimensional Outlier Detection
    3.4.2 Rotated Bagging (RB)
    3.4.3 Projected Clustering and Subspace Histograms
    3.4.4 The Point-Wise Bagging and Subsampling Class of Methods
    3.4.5 Wagging (WAG)
    3.4.6 Data-Centric and Model-Centric Perturbation
    3.4.7 Parameter-Centric Ensembles
    3.4.8 Explicit Randomization of Base Models
  3.5 Some New Techniques for Variance Reduction
    3.5.1 Geometric Subsampling (GS)
    3.5.2 Randomized Feature Weighting (RFW)
  3.6 Forcing Stability by Reducing Impact of Abnormal Detector Executions
    3.6.1 Performance Analysis of Trimmed Combination Methods
    3.6.2 Discussion of Commonly Used Combination Methods
  3.7 Performance Analysis of Methods
    3.7.1 Data Set Descriptions
    3.7.2 Comparison of Variance Reduction Methods
  3.8 Conclusions
  References

4 Bias Reduction in Outlier Ensembles: The Guessing Game
  4.1 Introduction
  4.2 Bias Reduction in Classification and Outlier Detection
    4.2.1 Boosting
    4.2.2 Training Data Pruning
    4.2.3 Model Pruning
    4.2.4 Model Weighting
    4.2.5 Differences Between Classification and Outlier Detection
  4.3 Training Data Pruning
    4.3.1 Deterministic Pruning
    4.3.2 Fixed Bias Sampling
    4.3.3 Variable Bias Sampling
  4.4 Model Pruning
    4.4.1 Implicit Model Pruning in Subspace Outlier Detection
    4.4.2 Revisiting Pruning by Trimming
    4.4.3 Model Weighting
  4.5 Supervised Bias Reduction with Unsupervised Feature Engineering
  4.6 Bias Reduction by Human Intervention
  4.7 Conclusions
  References

5 Model Combination Methods for Outlier Ensembles
  5.1 Introduction
  5.2 Impact of Outlier Evaluation Measures
  5.3 Score Normalization Issues
  5.4 Model Combination for Variance Reduction
  5.5 Model Combination for Bias Reduction
    5.5.1 A Simple Example
    5.5.2 Sequential Combination Methods
  5.6 Combining Bias and Variance Reduction
    5.6.1 Factorized Consensus
  5.7 Using Mild Supervision in Model Combination
  5.8 Conclusions and Summary
  References

6 Which Outlier Detection Algorithm Should I Use?
  6.1 Introduction
  6.2 A Review of Classical Distance-Based Detectors
    6.2.1 Exact k-Nearest Neighbor Detector
    6.2.2 Average k-Nearest Neighbor Detector
    6.2.3 An Analysis of Bagged and Subsampled 1-Nearest Neighbor Detectors
    6.2.4 Harmonic k-Nearest Neighbor Detector
    6.2.5 Local Outlier Factor (LOF)
  6.3 A Review of Clustering, Histograms, and Density-Based Methods
    6.3.1 Histogram and Clustering Methods
    6.3.2 Kernel Density Methods
  6.4 A Review of Dependency-Oriented Detectors
    6.4.1 Soft PCA: The Mahalanobis Method
    6.4.2 Kernel Mahalanobis Method
    6.4.3 Decomposing Unsupervised Learning into Supervised Learning Problems
    6.4.4 High-Dimensional Outliers Based on Group-Wise Dependencies
  6.5 The Hidden Wildcard of Algorithm Parameters
    6.5.1 Variable Subsampling and the Tyranny of Parameter Choice
  6.6 TRINITY: A Blend of Heterogeneous Base Detectors
  6.7 Analysis of Performance
    6.7.1 Data Set Descriptions
    6.7.2 Specific Details of Setting
    6.7.3 Summary of Findings
    6.7.4 The Great Equalizing Power of Ensembles
    6.7.5 The Argument for a Heterogeneous Combination
    6.7.6 Discussion
  6.8 Conclusions
  References

Index

Chapter 1: An Introduction to Outlier Ensembles

Music is not math. It's science. You keep mixing the stuff up until it blows up on you, or it becomes this incredible potion.
Bruno Mars

1.1 Introduction

The outlier analysis problem has been widely studied by the database, data mining, machine learning, and statistics communities. Numerous algorithms have been proposed for this problem in recent years [6, 9, 11, 14, 35, 36, 39, 40, 53, 55]. A detailed survey [19] classifies the different types of algorithms for anomaly detection, and a book on the subject [3] provides a comprehensive treatment of various algorithms. This chapter and the remainder of this book assume a familiarity with the basic algorithms that are commonly used for outlier detection. The book will also assume a familiarity with the common ensemble methods used in clustering and classification. Detailed discussions of classification ensemble methods may be found in [71].
The modeling process in some data mining problems like outlier detection is often an inherently subjective process, where the objective function or model defined for a particular problem depends on an analyst's understanding of the generative behavior of the data. For example, a nearest-neighbor algorithm for outlier detection might provide very different results from a one-class support vector machine. These differences are caused by differences in the underlying assumptions. Clearly, such assumptions are very subjective, and a specific algorithm may often model the underlying generative process in a limited way. In such cases, effective results can be obtained on the parts of the data that are modeled well, whereas the results on other parts of the data may not be very accurate. Similarly, a given model may behave well on one data set but not on other data sets. In many cases, the model may be extremely sensitive to the choice of parameters used in a particular outlier detection model. For example, a particular choice of the number of nearest neighbors may work well for an exact k-nearest neighbor outlier detector on a given data set, but it might not work well on another data set. Furthermore, it is often difficult to tune parameters or select the most appropriate algorithm in a problem like outlier detection because of the absence of ground truth. All these issues make the evaluation of the quality of outlier detection algorithms more challenging. In the absence of ground truth, this can cause uncertainty in the analyst about the true effectiveness of her algorithm.
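As a minimal illustration of the exact k-nearest neighbor detector mentioned above, the sketch below scores each point by its Euclidean distance to its k-th nearest neighbor. The function name and the toy data set are invented for this example; it is not the book's reference implementation.

```python
import math

def knn_outlier_scores(points, k):
    """Exact k-nearest-neighbor outlier detector: each point is scored
    by the distance to its k-th nearest neighbor. Larger scores
    indicate stronger outliers."""
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores

# A tight cluster plus one isolated point; the isolated point receives
# by far the highest score for k = 2. Changing k changes every score,
# which is exactly the parameter sensitivity discussed in the text.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(data, k=2)
```

Because the scores depend on k, a value that separates outliers cleanly on one data set may blur them on another, which is one motivation for ensembling over parameter choices.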
Ensemble analysis is a method that is commonly used in data mining problems to reduce the dependence of the model on the specific data set or data locality. This greatly increases the robustness of the data mining process. The ensemble technique is used very commonly in problems such as clustering and classification. Ensemble analysis can include any approach that combines the results of either dependent or independent executions of data mining algorithms. For example, the boosting technique in classification, in which the different executions of the classification algorithm are clearly dependent on one another, can also be considered an ensemble approach. The idea here is that the final result is an ensemble score derived from the results of different models, no matter how each of these models is derived.

The problem of ensemble analysis has been widely studied in the context of many data mining problems such as clustering and classification [58, 63, 71], although the approaches are different in supervised and unsupervised problems. In fact, each of these areas of meta-algorithm analysis is considered an active and vibrant subfield in its own right. To provide a specific example, the seminal paper [26] on boosting in classification has several thousand citations, and many different variants of the basic boosting approach have been proposed in the literature. The common methods used for ensemble analysis in clustering and classification are as follows:

• In clustering, the areas of alternative clustering, multiview clustering, and ensemble clustering are closely related subtopics of ensemble analysis. The idea in each of these variants is that the clustering process is inherently subjective, and a single clustering may not reflect the complete insights about how the data may cluster. Therefore, it is useful to examine the different and alternative clusters [13, 49, 50, 65] and combine the results. Alternative clustering is also sometimes referred to as multiview clustering. The goal here is to determine clusterings that are significantly different from one another in order to obtain different insights. In some cases, the exploration of the clusters is performed visually [1, 32] in order to obtain the best results.
A notable cluster ensemble, referred to as the extremely-randomized clustering forest (ERC-Forest) [47], is closely related to an outlier ensemble method known as isolation forests [43].
• In the context of the classification problem, a variety of ensemble-based methods have been proposed, such as bagging [15], boosting [26], stacking [22, 64, 68], random forests [16, 33], model averaging [23], and the bucket-of-models approach [70]. Ensemble analysis is considered particularly important in noisy and streaming scenarios, in which the quality of the results from individual classifiers is not considered robust because of limitations in data quality or processing time. Some recent methods
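The combination idea described earlier, in which the final result is an ensemble score derived from the outputs of different base models, can be sketched as follows. This is a minimal illustration under assumptions: the function names and toy detector scores are invented for the example, and z-score standardization is only one of several normalization options discussed later in the book.

```python
import statistics

def standardize(scores):
    """Convert raw detector scores to z-scores so that detectors with
    very different output scales can be combined on an equal footing."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores) or 1.0  # guard against zero spread
    return [(s - mu) / sigma for s in scores]

def ensemble_average(score_lists):
    """Combine several base detectors by averaging their standardized
    scores point-wise, yielding one ensemble score per data point."""
    normalized = [standardize(s) for s in score_lists]
    return [statistics.mean(col) for col in zip(*normalized)]

# Two hypothetical base detectors scoring the same five points on very
# different scales; both agree the last point is the strongest outlier,
# and the averaged ensemble score preserves that consensus.
detector_a = [1.0, 1.2, 0.9, 1.1, 9.0]
detector_b = [10.0, 12.0, 11.0, 9.0, 80.0]
combined = ensemble_average([detector_a, detector_b])
```

Averaging standardized scores is a variance-reduction style of combination; without the standardization step, the detector with the larger raw scale would dominate the ensemble.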
