Probabilistic Graphical Models Adaptive Computation and Machine Learning Thomas Dietterich, Editor Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors Bioinformatics: TheMachineLearningApproach, Pierre Baldi and Søren Brunak ReinforcementLearning: AnIntroduction, Richard S. Sutton and Andrew G. Barto GraphicalModelsforMachineLearningandDigitalCommunication, Brendan J. Frey LearninginGraphicalModels, Michael I. Jordan Causation,Prediction,andSearch, 2nd ed., Peter Spirtes, Clark Glymour, and Richard Scheines PrinciplesofDataMining, David Hand, Heikki Mannila, and Padhraic Smyth Bioinformatics: TheMachineLearningApproach, 2nd ed., Pierre Baldi and Søren Brunak LearningKernelClassifiers: TheoryandAlgorithms, Ralf Herbrich LearningwithKernels: SupportVectorMachines,Regularization,Optimization,andBeyond, Bern- hard Schölkopf and Alexander J. Smola IntroductiontoMachineLearning, Ethem Alpaydin GaussianProcessesforMachineLearning, Carl Edward Rasmussen and Christopher K. I. Williams Semi-SupervisedLearning, Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien, eds. TheMinimumDescriptionLengthPrinciple, Peter D. Grünwald IntroductiontoStatisticalRelationalLearning, Lise Getoor and Ben Taskar, eds. ProbabilisticGraphicalModels: PrinciplesandTechniques, Daphne Koller and Nir Friedman Probabilistic Graphical Models Principles and Techniques Daphne Koller Nir Friedman The MIT Press Cambridge, Massachusetts London, England ©2009 Massachusetts Institute of Technology All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. For information about special quantity discounts, please email [email protected] This book was set by the authors in LATEX2. Printed and bound in the United States of America. Library of Congress Cataloging-in-Publication Data Koller, Daphne. Probabilistic Graphical Models: Principles and Techniques / Daphne Koller and Nir Friedman. p. cm. – (Adaptive computation and machine learning) Includes bibliographical references and index. ISBN 978-0-262-01319-2 (hardcover : alk. paper) 1. Graphical modeling (Statistics) 2. Bayesian statistical decision theory—Graphic methods. I. Koller, Daphne. II. Friedman, Nir. QA279.5.K65 2010 519.5’420285–dc22 2009008615 10 9 8 7 6 5 To our families my parents Dov and Ditza my husband Dan my daughters Natalie and Maya D.K. my parents Noga and Gad my wife Yael my children Roy and Lior N.F. As far as the laws of mathematics refer to reality, they are not certain, as far as they are certain,theydonotrefertoreality. Albert Einstein, 1921 When we try to pick out anything by itself, we find that it is bound fast by a thousand invisiblecordsthatcannotbebroken,toeverythingintheuniverse. John Muir, 1869 Theactualscienceoflogicisconversantatpresentonlywiththingseithercertain,impossible, or entirely doubtful ...Therefore the true logic for this world is the calculus of probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonableman’smind. James Clerk Maxwell, 1850 The theory of probabilities is at bottom nothing but common sense reduced to calculus; it enablesustoappreciatewithexactnessthatwhichaccuratemindsfeelwithasortofinstinct forwhichofttimestheyareunabletoaccount. Pierre Simon Laplace, 1819 Misunderstandingofprobabilitymaybethegreatestofallimpedimentstoscientificliteracy. Stephen Jay Gould Contents Acknowledgments xxiii ListofFigures xxv ListofAlgorithms xxxi ListofBoxes xxxiii 1 Introduction 1 1.1 Motivation 1 1.2 Structured Probabilistic Models 2 1.2.1 Probabilistic Graphical Models 3 1.2.2 Representation, Inference, Learning 5 1.3 Overview and Roadmap 6 1.3.1 Overview of Chapters 6 1.3.2 Reader’s Guide 9 1.3.3 Connection to Other Disciplines 11 1.4 Historical Notes 12 2 Foundations 15 2.1 Probability Theory 15 2.1.1 Probability Distributions 15 2.1.2 Basic Concepts in Probability 18 2.1.3 Random Variables and Joint Distributions 19 2.1.4 Independence and Conditional Independence 23 2.1.5 Querying a Distribution 25 2.1.6 Continuous Spaces 27 2.1.7 Expectation and Variance 31 2.2 Graphs 34 2.2.1 Nodes and Edges 34 2.2.2 Subgraphs 35 2.2.3 Paths and Trails 36