
Semi-supervised Learning for Real-world Object Recognition using Adversarial Autoencoders



DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Semi-supervised Learning for Real-world Object Recognition using Adversarial Autoencoders

SUDHANSHU MITTAL

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master in Computer Science
Date: December 22, 2017
Supervisor: Prof. Thomas Brox (University of Freiburg), Prof. Wolfram Burgard (University of Freiburg), Prof. Atsuto Maki (KTH)
Examiner: Prof. Danica Kragic

Abstract

For many real-world applications, labeled data can be costly to obtain. Semi-supervised learning methods make use of abundantly available unlabeled data along with a few labeled samples. Most of the latest work on semi-supervised learning for image classification reports performance on standard machine learning datasets like MNIST, SVHN, etc. In this work, we propose a convolutional adversarial autoencoder architecture for real-world data. We demonstrate the application of this architecture for semi-supervised object recognition. We show that our approach can learn from limited labeled data and outperform a fully supervised CNN baseline method by about 4% on real-world datasets. We also achieve competitive performance on the MNIST dataset compared to state-of-the-art semi-supervised learning techniques. To spur research in this direction, we compiled two real-world datasets: the Internet (WIS) dataset and the Real-world (RW) dataset, each consisting of more than 20K labeled samples of small household objects belonging to ten classes. We also show a possible application of this method for online learning in robotics.

Sammanfattning (Swedish abstract, in English translation)

In most real-world applications, labeled data can be costly to obtain. Semi-supervised learning methods usually make extensive use of unlabeled data, supported by a small amount of labeled data. Much of the latest work on semi-supervised learning methods for image classification reports performance on standard machine learning datasets such as MNIST, SVHN, and so on. In this work, we propose a convolutional adversarial autoencoder architecture for real-world data. We demonstrate the application of this architecture to semi-supervised object recognition and show that our approach can learn from a limited amount of labeled data. We thereby outperform the fully supervised CNN baseline method by about 4% on real-world datasets. We also achieve competitive performance on the MNIST dataset compared with modern semi-supervised learning methods. To stimulate research in this direction, we compiled two real-world datasets, the Internet (WIS) and Real-world (RW) datasets, each consisting of more than 20,000 labeled samples of small household objects belonging to ten classes. We also show a possible application of this method to online learning in robotics.

Acknowledgement

I would like to thank my supervisors at the University of Freiburg, Prof. Thomas Brox and Prof. Wolfram Burgard, for giving me this opportunity to pursue my master's thesis at their lab. I greatly appreciate their constant support, feedback and guidance throughout the thesis work. I would like to thank my supervisor at KTH, Prof. Atsuto Maki, for supporting this collaboration in all respects and for his meticulous feedback on scientific writing. I would like to thank Prof. Danica Kragic Jensfelt for examining the thesis and organizing the public presentation at KTH. I owe a great debt of gratitude to Andreas Eitel and Maxim Tatarchenko for being great mentors, and for countless discussions, motivation and guidance.

I had the privilege of discussing with and learning from many exceptional researchers at AIS. Special thanks to Gabriel Oliveira, Ayush Dewan, Tayyab Naseer, Marcel Binz and Noha Radwan for numerous interesting discussions. Many thanks to Andreas Eitel, Michael Keser and Philipp Jund for their technical support. I would like to thank Andreas Eitel and Prof.
Wolfram Burgard for offering me a student job at AIS, which supported me financially throughout my stay in Germany. I thank Anna Hellberg Gustafsson from KTH for providing me with the Erasmus+ scholarship for my stay in Germany.

I thank Andreas Eitel, Maxim Tatarchenko and Florian Kraemer for proofreading the thesis report. This work would not have been possible without the support of everyone at the AIS group. Special thanks to Marcus Lundin, Gabriela Zarzar Gandler and Sebastian Zarzar Gandler for helping me write the Swedish version of the abstract. I thank everyone who helped me to collect the dataset: Tobias Paxian, Andreas Eitel, V. K. Mittal, Shashi Kabdal, Himanshu Mittal, Shruti Kabdal, Shuchi Kabdal, Hannah Rosa Nesswetter, David Czudnochowski, Anand Narayan, Sophie Ninnemann, Gabriela Zarzar Gandler, Jingwei Zhang, Oier Mees, Rendani Mbuvha, Ronak Shah, Vishakha Patel, Andy Wachaja and Federico Boniardi.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Ethics, Societal Aspects and Sustainability
  1.2 Contributions
  1.3 Overview of the Thesis
2 Background
  2.1 Artificial Neural Networks
    2.1.1 Convolutional Neural Networks
  2.2 Deep Generative Models
    2.2.1 Autoencoders
    2.2.2 Generative Adversarial Network
3 Related Work
  3.1 Deep Generative Models
    3.1.1 VAE-based Methods
    3.1.2 GAN-based Methods
    3.1.3 Hybrid Methods
    3.1.4 Real-world Applications
4 Methodology
  4.1 Adversarial Autoencoders
    4.1.1 Motivation
    4.1.2 Basic AAE Architecture
    4.1.3 Learning Latent Distributions
    4.1.4 Semi-supervised AAE
    4.1.5 Convolutional Semi-supervised AAE Architecture
5 Experiments and Results
  5.1 Datasets
    5.1.1 MNIST Dataset
    5.1.2 Internet Dataset
    5.1.3 Real-world Dataset
    5.1.4 Preprocessing
  5.2 Learning of the Latent Distribution
  5.3 Semi-supervised Classification
    5.3.1 Implementation Details
    5.3.2 Object Recognition Results
  5.4 Online Learning with AAE
  5.5 Discussions
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
A Datasets
  A.1 Dataset Filtering
  A.2 Real-world Dataset: Video Streams
B Architecture Details
  B.1 Semi-supervised Convolutional AAE
    B.1.1 Adversarial Network: Discriminator
    B.1.2 Autoencoder Network
    B.1.3 Classification/Adversarial Network: Generator

Chapter 1
Introduction

1.1 Motivation

The idea behind semi-supervised learning for object recognition comes from the learning ability of human beings. A human child can learn about objects like animals, toys, etc. from only a few examples. For example, once a child is shown what a cat looks like, it can thereafter recognize new types of cats in the world.
Human beings do not require thousands of labeled examples to learn the visual appearance of an object, and they become better at recognition with subsequent exposure to other variants of that object.

Image classification is one of the important tasks in the field of computer vision. This task is highly relevant for various applications like autonomous driving, service robotics, remote sensing and medical diagnosis. Most of the latest image classification methods, like Deep Residual Networks [16], require a large collection of manually labeled images to perform well. Collecting labeled samples can be difficult and very expensive for specific real-world applications.

One way to tackle this challenge is by leveraging information from unlabeled data in an unsupervised or semi-supervised manner. Although image classification in a completely unsupervised manner is not yet practical for complex distributions like natural images, recent methods based on neural networks have shown promising results for semi-supervised learning. In semi-supervised learning methods, we can make use of unlabeled data for training, typically a small amount of labeled data together with a large amount of unlabeled data. Semi-supervised methods make use of unlabeled data to better capture the shape of the underlying data distribution and generalize better to new samples. In fields like medical science and robotics, it is much easier to obtain unlabeled data than labeled data. For example, in robotics, a mobile robot can autonomously interact with the environment and collect unlabeled data in abundance without any human supervision. Therefore, semi-supervised learning is very well suited to fields like robotics.

Several methods have been studied in the literature for semi-supervised learning. In this work, we focus on techniques based on generative models. Building scalable generative models to capture rich distributions such as audio, images or video is one of the important challenges in machine learning.
Until recently, deep generative models, such as Restricted Boltzmann Machines, Deep Belief Networks and Deep Boltzmann Machines, were trained primarily by sampling algorithms. In these sampling-based approaches, the methods become more imprecise as training progresses. This happens because samples from the procedures are unable to mix between modes fast enough. In recent years, several deep generative models, namely Variational Autoencoder (VAE) and Generative Adversarial Network (GAN), have been developed that can be trained via direct back-propagation and avoid the difficulties that come with sampling-based training.

Figure 1.1: Examples for each class from the Real-world (RW) dataset: banana, bottle, bowl, calculator, can, cup, orange, scissors, soccer-ball and watering-can.

In this work, we explore how well the latest methods based on deep generative models can be used to recognize objects using semi-supervised learning methods. We scale one such method, called Adversarial Autoencoders (AAE), for object recognition on real-world image datasets. Figure 1.1 gives a glimpse of our real-world object dataset. AAE is a hybrid approach which uses ideas from Variational Autoencoder (VAE) and Generative Adversarial Network (GAN). AAE is a probabilistic autoencoder that uses an adversarial framework for variational inference. In a probabilistic autoencoder, the encoder approximates a posterior distribution, and the decoder is used to stochastically reconstruct the input data from the latent variables; the resulting model captures the distribution over images. Latent variables are variables that are not directly observed but are instead inferred, using a mathematical model, from other observed variables.

Online learning is a related task which is highly relevant for robotics. For example, in service robotics, every time a new mobile robot is set up in a new environment, it needs to adapt to the environment and learn the objects in that environment for an interactive application. The
The traditionalwayistoannotatealltheobjectsmanuallytorecognizeand interact with them. Additionally, the variety of objects also changes dynamicallyinanygivenenvironment. Toreducetheseexpenses,we can deploy a robot with a semi-supervised learning approach. The robot’slearningmodelcanbeinitiallytrainedwithonlyafewlabeled instanceoftheobjects,andthentherobotcanadaptitsmodeltoincrease theclassificationperformanceovertimebycollectingmoreunlabeled data. In this work, we also show how this semi-supervised learning method may be used for online learning on real-world data. Since our real-world data is similar to the data captured by the robots, this methodcanbereadilyappliedtorobotics. 1.1.1 Ethics, Societal Aspects and Sustainability Thecontributionsofthisthesis workareverytechnicalconcerningthe usageofdeepgenerativemodelsforsemi-supervisedobjectrecognition, althoughtherearemanypossibleapplicationsofobjectrecognitionin general for example autonomous driving, medical diagnosis, service robotics,etc. Some applications of semi-supervised classification can be highly relevantforthesociety,forexample,cancertumordetectioninmagnetic resonancespectroscopicimages. Sinceweallknowthatcancerisafatal disease and more than 10 million people are diagnosed with cancer everyyearworldwide,itisoneofthemainchallengesthatoursociety
