ebook img

Decision Making Under Uncertainty and Reinforcement Learning: Theory and Algorithms PDF

251 Pages·2022·3.212 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Decision Making Under Uncertainty and Reinforcement Learning: Theory and Algorithms

Intelligent Systems Reference Library 223 Christos Dimitrakakis Ronald Ortner Decision Making Under Uncertainty and Reinforcement Learning Theory and Algorithms Intelligent Systems Reference Library Volume 223 SeriesEditors JanuszKacprzyk,PolishAcademyofSciences,Warsaw,Poland LakhmiC.Jain,KESInternational,Shoreham-by-Sea,UK The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and wellstructuredform.Theseriesincludesreferenceworks,handbooks,compendia, textbooks,well-structuredmonographs,dictionaries,andencyclopedias.Itcontains wellintegratedknowledgeandcurrentinformationinthefieldofIntelligentSystems. Theseriescoversthetheory,applications,anddesignmethodsofIntelligentSystems. Virtuallyalldisciplinessuchasengineering,computerscience,avionics,business, e-commerce,environment,healthcare,physicsandlifescienceareincluded.Thelist oftopicsspansalltheareasofmodernintelligentsystemssuchas:Ambientintelli- gence,Computationalintelligence,Socialintelligence,Computationalneuroscience, Artificiallife,Virtualsociety,Cognitivesystems,DNAandimmunity-basedsystems, e-Learningandteaching,Human-centredcomputingandMachineethics,Intelligent control,Intelligentdataanalysis,Knowledge-basedparadigms,Knowledgemanage- ment, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adap- tive systems, Soft computing including Neural systems, Fuzzy systems, Evolu- tionarycomputingandtheFusionoftheseparadigms,PerceptionandVision,Web intelligenceandMultimedia. IndexedbySCOPUS,DBLP,zbMATH,SCImago. AllbookspublishedintheseriesaresubmittedforconsiderationinWebofScience. · Christos Dimitrakakis Ronald Ortner Decision Making Under Uncertainty and Reinforcement Learning Theory and Algorithms ChristosDimitrakakis RonaldOrtner Informatique DepartmentMathematikund UniversitédeNeuchâtel Informationstechnologie Neuchâtel,Switzerland MontanuniversitätLeoben Leoben,Austria ISSN 1868-4394 ISSN 1868-4408 (electronic) IntelligentSystemsReferenceLibrary ISBN 978-3-031-07612-1 ISBN 978-3-031-07614-5 (eBook) https://doi.org/10.1007/978-3-031-07614-5 ©TheEditor(s)(ifapplicable)andTheAuthor(s),underexclusivelicensetoSpringerNature SwitzerlandAG2022 Thisworkissubjecttocopyright.AllrightsaresolelyandexclusivelylicensedbythePublisher,whether thewholeorpartofthematerialisconcerned,specificallytherightsoftranslation,reprinting,reuse ofillustrations,recitation,broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,and transmissionorinformationstorageandretrieval,electronicadaptation,computersoftware,orbysimilar ordissimilarmethodologynowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthors,andtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressedorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. ThisSpringerimprintispublishedbytheregisteredcompanySpringerNatureSwitzerlandAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface Thepurposeofthisbookistocollectthefundamentalresultsfordecisionmaking underuncertaintyinoneplace.Inparticular,theaimistogiveaunifiedaccountof algorithmsandtheoryforsequentialdecisionmakingproblems,includingreinforce- ment learning. Starting from elementary statistical decision theory, we progress to the reinforcement learning problem and various solution methods. The end of the bookfocusesonthecurrentstateoftheartinmodelsandapproximationalgorithms. Theproblemofdecisionmakingunderuncertaintycanbebrokendownintotwo parts. First, how do we learn about the world? This involves both the problem of modeling our initial uncertainty about the world and that of drawing conclusions fromevidenceandourinitialbelief.Secondly,givenwhatwecurrentlyknowabout theworld,howshouldwedecidewhattodo,takingintoaccountfutureeventsand observationsthatmaychangeourconclusions? Typically,thiswillinvolvecreatinglong-termplanscoveringpossiblefutureeven- tualities.Thatis,whenplanningunderuncertainty,wealsoneedtotakeintoaccount whatpossiblefutureknowledgecouldbegeneratedwhenimplementingourplans. Intuitively, executing plans which involve trying out new things should give more information, but it is hard to tell whether this information will be beneficial. The choicebetweendoingsomethingwhichisalreadyknowntoproducegoodresultsand experimentwithsomethingnewisknownastheexploration–exploitationdilemma, anditisattherootoftheinteractionbetweenlearningandplanning. PartIofthebook,Chaps.1–4,focusesondecisionmakingunderuncertaintyin non-sequential settings. This includes scenarios such as hypothesis testing, where thedecisionmakermustchooseasingleactiongiventheavailableevidence.Most of the development is given through the prism of Bayesian inference and decision theory,wherethedecisionmakerhasasubjectivebelief(expressedasaprobability distribution)overwhatistrue.PartIIofthebook,Chaps.5–8,introducessequential problemsandtheformalismofMarkovdecisionprocesses.Theremainingchapters aredevotedtotheproblemofreinforcementlearning,whichisoneofthemostgeneral sequentialdecisionproblemsunderuncertainty.Finally,wehaveaddedanumberof v vi Preface theoretical and practical exercises that will hopefully aid the reader to understand thematerial. Neuchâtel,Switzerland ChristosDimitrakakis Leoben,Austria RonaldOrtner Acknowledgements Many thanks go to all the students of the Decision making under uncertainty and Advanced topics in reinforcement learning and decision making classes over the years for bearing with early drafts of this book. A big thank you goes to Nikolaos Tziortziotis,whosecodeisusedinsomeoftheexamplesinthebook.Finally,thanks toAristideTossouandHannesErikssonforproof-readingvariouschapters.Finally, alotofthecodedexamplesinthebookwererunusingtheparallelpackagebyTange [1]. Reference 1.Tange,O.:Gnuparallel-thecommand-linepowertool.USENIXMag.36(1),42–47(2011) vii Contents 1 Introduction .................................................. 1 1.1 UncertaintyandProbability ................................ 1 1.2 TheExploration–ExploitationTrade-Off ..................... 2 1.3 DecisionTheoryandReinforcementLearning ................ 3 References .................................................... 5 2 SubjectiveProbabilityandUtility ............................... 7 2.1 SubjectiveProbability ..................................... 7 2.1.1 RelativeLikelihood ............................... 7 2.1.2 SubjectiveProbabilityAssumptions ................. 8 2.1.3 AssigningUniqueProbabilities* .................... 10 2.1.4 ConditionalLikelihoods ........................... 11 2.1.5 ProbabilityElicitation ............................. 12 2.2 UpdatingBeliefs:Bayes’Theorem .......................... 13 2.3 UtilityTheory ........................................... 14 2.3.1 RewardsandPreferences .......................... 14 2.3.2 PreferencesAmongDistributions ................... 15 2.3.3 Utility ........................................... 16 2.3.4 MeasuringUtility* ................................ 19 2.3.5 ConvexandConcaveUtilityFunctions ............... 20 2.4 Exercises ................................................ 21 Reference ..................................................... 24 3 DecisionProblems ............................................. 25 3.1 Introduction ............................................. 25 3.2 RewardsthatDependontheOutcomeofanExperiment ....... 25 3.2.1 FormalisationoftheProblemSetting ................ 26 3.2.2 DecisionDiagrams ................................ 29 3.2.3 StatisticalEstimation* ............................. 30 ix x Contents 3.3 BayesDecisions ......................................... 31 3.3.1 ConvexityoftheBayes-OptimalUtility* ............. 32 3.4 StatisticalandStrategicDecisionMaking .................... 35 3.4.1 AlternativeNotionsofOptimality ................... 36 3.4.2 SolvingMinimaxProblems* ....................... 38 3.4.3 Two-PlayerGames ................................ 40 3.5 DecisionProblemswithObservations ....................... 43 3.5.1 MaximizingUtilityWhenMakingObservations ....... 45 3.5.2 BayesDecisionRules ............................. 46 3.5.3 DecisionProblemsinClassification ................. 48 3.5.4 CalculatingPosteriors ............................. 50 3.6 Summary ............................................... 51 3.7 Exercises ................................................ 52 3.7.1 ProblemswithNoObservations ..................... 52 3.7.2 ProblemswithObservations ........................ 53 3.7.3 AnInsuranceProblem ............................. 54 3.7.4 MedicalDiagnosis ................................ 56 References .................................................... 57 4 Estimation .................................................... 59 4.1 Introduction ............................................. 59 4.2 SufficientStatistics ....................................... 60 4.2.1 SufficientStatistics ................................ 60 4.2.2 ExponentialFamilies .............................. 62 4.3 ConjugatePriors ......................................... 63 4.3.1 Bernoulli-BetaConjugatePair ...................... 64 4.3.2 ConjugatesfortheNormalDistribution .............. 68 4.3.3 ConjugatesforMultivariateDistributions ............. 73 4.4 CredibleIntervals ........................................ 77 4.5 ConcentrationInequalities ................................. 80 4.5.1 Chernoff-HoeffdingBounds ........................ 82 4.6 ApproximateBayesianApproaches ......................... 84 4.6.1 MonteCarloInference ............................. 84 4.6.2 ApproximateBayesianComputation ................. 85 4.6.3 AnalyticApproximationsofthePosterior ............ 86 4.6.4 Maximum Likelihood and Empirical Bayes Methods ......................................... 87 References .................................................... 88 5 SequentialSampling ........................................... 89 5.1 GainsFromSequentialSampling ........................... 89 5.1.1 AnExample:SamplingwithCosts .................. 90 5.2 OptimalSequentialSamplingProcedures .................... 94 5.2.1 Multi-stageProblems .............................. 96 5.2.2 BackwardsInductionforBoundedProcedures ........ 97

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.