
Machine Learning for the Quantified Self: on the art of learning from sensory data PDF

239 Pages·2018·10.953 MB·English

Preview Machine Learning for the Quantified Self: on the art of learning from sensory data

Cognitive Systems Monographs 35

Mark Hoogendoorn
Burkhardt Funk

Machine Learning for the Quantified Self
On the Art of Learning from Sensory Data

Cognitive Systems Monographs
Volume 35

Series editors
Rüdiger Dillmann, University of Karlsruhe, Karlsruhe, Germany
Yoshihiko Nakamura, Tokyo University, Tokyo, Japan
Stefan Schaal, University of Southern California, Los Angeles, USA
David Vernon, University of Skövde, Skövde, Sweden

About this Series
The Cognitive Systems Monographs (COSMOS) publish new developments and advances in the fields of cognitive systems research, rapidly and informally but with a high quality. The intent is to bridge cognitive brain science and biology with engineering disciplines. It covers all the technical contents, applications, and multidisciplinary aspects of cognitive systems, such as Bionics, System Analysis, System Modelling, System Design, Human Motion, Understanding, Human Activity Understanding, Man-Machine Interaction, Smart and Cognitive Environments, Human and Computer Vision, Neuroinformatics, Humanoids, Biologically motivated systems and artefacts, Autonomous Systems, Linguistics, Sports Engineering, Computational Intelligence, Biosignal Processing, or Cognitive Materials as well as the methodologies behind them. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops.

Advisory Board
Heinrich H. Bülthoff, MPI for Biological Cybernetics, Tübingen, Germany
Masayuki Inaba, The University of Tokyo, Japan
J.A. Scott Kelso, Florida Atlantic University, Boca Raton, FL, USA
Oussama Khatib, Stanford University, CA, USA
Yasuo Kuniyoshi, The University of Tokyo, Japan
Hiroshi G. Okuno, Kyoto University, Japan
Helge Ritter, University of Bielefeld, Germany
Giulio Sandini, University of Genova, Italy
Bruno Siciliano, University of Naples, Italy
Mark Steedman, University of Edinburgh, Scotland
Atsuo Takanishi, Waseda University, Tokyo, Japan

More information about this series at http://www.springer.com/series/8354

Mark Hoogendoorn
Department of Computer Science
Vrije Universiteit Amsterdam
Amsterdam
The Netherlands

Burkhardt Funk
Institut für Wirtschaftsinformatik
Leuphana Universität Lüneburg
Lüneburg, Niedersachsen
Germany

ISSN 1867-4925
ISSN 1867-4933 (electronic)
Cognitive Systems Monographs
ISBN 978-3-319-66307-4
ISBN 978-3-319-66308-1 (eBook)
https://doi.org/10.1007/978-3-319-66308-1
Library of Congress Control Number: 2017949497
© Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Live as if you were to die tomorrow. Learn as if you were to live forever.
Mahatma Gandhi

Foreword

Sensors are all around us, and increasingly on us. We carry smartphones and watches, which have the potential to gather enormous quantities of data. These data are often noisy, interrupted, and increasingly high dimensional. A challenge in data science is how to put this veritable firehose of noisy data to use and extract useful summaries and predictions.

In this timely monograph, Mark Hoogendoorn and Burkhardt Funk face up to the challenge. Their choice of material shows good mastery of the various subfields of machine learning, which they bring to bear on these data. They cover a wide array of techniques for supervised and unsupervised learning, both for cross-sectional and time series data. Ending each chapter with a useful set of thinking and computing problems adds a helpful touch. I am sure this book will be welcomed by a broad audience, and I hope it is a big success.

Trevor Hastie
Stanford University, Stanford, CA, USA
June 2017

Preface

Self-tracking has become part of a modern lifestyle; wearables and smartphones support self-tracking in an easy fashion and change our behavior such as in the health sphere. The amount of data generated by these devices is so overwhelming that it is difficult to get useful insight from it. Luckily, in the domain of artificial intelligence, techniques exist that can help out here: machine learning approaches are well suited to assist and enable one to analyze this type of data. While there are ample books that explain machine learning techniques, self-tracking data comes with its own difficulties that require dedicated techniques such as learning over time and across users. In this book, we will explain the complete loop to effectively use self-tracking data for machine learning; from cleaning the data, the identification of features, finding clusters in the data, algorithms to create predictions of values for the present and future, to learning how to provide feedback to users based on their tracking data. All concepts we explain are drawn from state-of-the-art scientific literature. To illustrate all approaches, we use a case study of a rich self-tracking dataset obtained from the crowdsignals platform. While the book is focused on the self-tracking data, the techniques explained are more widely applicable to sensory data in general, making it useful for a wider audience.

Who should read this book? The book is intended for students, scholars, and practitioners with an interest in analyzing sensory data and user-generated content to build their own algorithms and applications. We will explain the basics of the suitable algorithms, and the underlying mathematics will be explained as far as it is beneficial for the application of the methods. The focus of the book is on the application side. We provide implementation in both Python and R of nearly all algorithms we explain throughout the book and make the code available for all the case studies we present in the book as well.
Additional material is available on the website of the book (ml4qs.org):
• Code examples are available in Python and R
• Datasets used in the book and additional sources to be explored by readers
• Up-to-date list of scientific papers and text books related to the book’s theme

We have been researchers in this field for over ten years and would like to thank everybody who formed the body of knowledge that has become the basis for this book. First of all, we would like to thank the people at crowdsignals.io for providing us with the dataset that is used throughout the book, Evan Welbourne in particular. Furthermore, we want to thank the colleagues who contributed to the book: Dennis Becker, Ward van Breda, Vincent Bremer, Gusz Eiben, Eoin Grau, Evert Haasdijk, Ali el Hassouni, Floris den Hengst, and Bart Kamphorst. We also want to thank all the graduate students that participated in the Machine Learning for the Quantified Self course at the Vrije Universiteit Amsterdam in June 2017 and provided feedback on a preliminary version of the book that was used as reader during the course. Mark would like to thank (in the order of appearance in his academic career) Maria Gini, Catholijn Jonker, Jan Treur, Gusz Eiben, and Peter Szolovits for being such great sources of inspiration.

And of course, the writing of this book would not have been possible without our loving family and friends. Mark would specifically like to thank his parents for their continuous support and his friends for helping him in getting the proper relaxation in the busy book-writing period. Burkhardt is very grateful to his family, especially his wife Karen Funk and his two daughters, for allowing him to often work late and to spend almost half a year at the University of Virginia and Stanford University during his sabbatical.

Mark Hoogendoorn, Amsterdam, The Netherlands
Burkhardt Funk, Lüneburg, Germany
August 2017

Contents

1 Introduction
  1.1 The Quantified Self
  1.2 The Goal of this Book
  1.3 Basic Terminology
    1.3.1 Data Terminology
    1.3.2 Machine Learning Terminology
  1.4 Basic Mathematical Notation
  1.5 Overview of the Book

Part I Sensory Data and Features

2 Basics of Sensory Data
  2.1 Crowdsignals Dataset
  2.2 Converting the Raw Data to an Aggregated Data Format
  2.3 Exploring the Dataset
  2.4 Machine Learning Tasks
  2.5 Exercises
    2.5.1 Pen and Paper
    2.5.2 Coding

3 Handling Noise and Missing Values in Sensory Data
  3.1 Detecting Outliers
    3.1.1 Distribution-Based Models
    3.1.2 Distance-Based Models
  3.2 Imputation of Missing Values
  3.3 A Combined Approach: The Kalman Filter
  3.4 Transformation
    3.4.1 Lowpass Filter
    3.4.2 Principal Component Analysis
