Logical Argumentation, Abduction and Bayesian Decision Theory: A Bayesian Approach to Logical Arguments and its Application to Legal Evidential Reasoning

David Poole
Department of Computer Science
University of British Columbia
2366 Main Mall
Vancouver, B.C., Canada V6T 1Z4
[email protected]
http://www.cs.ubc.ca/spider/poole/

April 7, 2000

Abstract

There are good normative arguments for using Bayesian decision theory for deciding what to do. However, there are also good arguments for using logic, where we want to have a formal semantics for a language and use the structure of logical argumentation with logical variables to represent multiple individuals (things). This paper shows how decision theory and logical argumentation can be combined into a coherent framework. The Independent Choice Logic can be viewed as a first-order representation of belief networks with conditional probability tables represented as first-order rules, or as an abductive/argument-based logic with probabilities over assumables. Intuitively, we can use logic to model causally (in terms of logic programs with assumables). Given evidence, we abduce to the explanations, and then can predict what follows from these explanations. As well as abduction to the best explanation(s), from which we can bound probabilities, we can also do marginalization to reduce the detail of arguments. An example of Tillers is used to show how the framework could be used for legal reasoning. The code to run this example is available from the author's website.

1 Introduction

There are good normative arguments for using logic to represent knowledge (Nilsson, 1991; Poole, Mackworth & Goebel, 1998). These arguments are usually based on reasoning with symbols with an explicit denotation, allowing relations amongst individuals, and permitting quantification over individuals. This is often translated as needing (at least) the first-order predicate calculus. Unfortunately, the first-order predicate calculus has very primitive mechanisms for handling uncertainty, namely, the use of disjunction and existential quantification.
There are also good normative reasons for using Bayesian decision theory for decision making under uncertainty (Von Neumann & Morgenstern, 1953; Savage, 1972). These arguments can be intuitively interpreted as seeing decision making as a form of gambling, and that probability and utility are the appropriate calculi for gambling.

It is important to note that decision theory has nothing to say about representations. Adopting decision theory doesn't mean adopting any particular representation. While there are some representations that can be directly extracted from the theory, such as the explicit reasoning over the state space or the use of decision trees, these become intractable as the problem domains become large; it is like theorem proving by enumerating the interpretations. Adopting logic doesn't mean you have to enumerate interpretations, nor does adopting decision theory mean you have to use analogous representations.

The independent choice logic can be seen as a representation that combines logic and Bayesian decision theory.

First, I will talk about knowledge representation, in which tradition this representation is built. The ICL will then be presented from three alternate viewpoints: as a semantic framework in terms of choices made by agents, in terms of first-order belief networks (Bayesian networks), and as a framework for abduction and argumentation. I will then show some axioms from an example of Tillers, and show the outputs of our prototype implementation.

1.1 Knowledge Representation

In order to understand what AI can bring to the table, Figure 1 (from (Poole et al., 1998)) shows the knowledge representation (KR) view. Given a problem we want a solution to, we find a representation for the problem, with which we can compute to find an answer that can be interpreted as a solution to the problem.

When considering representations, there are a number of often competing considerations:

Figure 1: Knowledge Representation Framework (an informal problem is represented formally, computed with, and the output interpreted as a solution)

• The representation should be rich enough to be able to contain enough information to actually solve the problem.
• The representation should be as close to the problem as possible. We want the representation to be as "natural" as possible, so that small changes in the problem result in small changes in the representation. Ideally it should be clear what knowledge is expressed so that we can directly argue about the correctness of the knowledge expressed in the representation.

• We want the representation to be amenable to efficient computation. This does not necessarily mean that the representation needs to be efficient in the worst case (because that usually invalidates the first consideration). Rather we would like to be able to exploit features of the problem for computational gain. This means that the representation must be capable of expressing those features of the problem that can be exploited computationally.

Belief networks (or Bayesian networks) (Pearl, 1988) are of interest because they provide a language that represents the sort of knowledge a person may have about a domain, is rich enough for many applications, and because features of the representation can be exploited for computational gain.

Unfortunately, the underlying logic is propositional. We cannot have relations amongst individuals as we can, for example, in the first-order predicate calculus. The predicate calculus, however, has only primitive mechanisms for handling uncertainty (disjunction and existential quantification).

2 The Independent Choice Logic

The independent choice logic (ICL) is a knowledge representation that can be seen in a number of different ways:

• It is a way to add Bayesian probability to the predicate logic. In particular we want to have all uncertainty handled by probabilities (or, for decision problems, as choices of various agents). So we start with logic programs, which can be seen as predicate logic with no uncertainty (no disjunctive assertions), and have independent choices that have associated probability distributions. The logic program specifies what follows from the choices made.

• It is a way to lift belief networks into a first-order language.
In particular a belief network can be seen as a deterministic system with noise inputs (Pearl, 1999; Pearl, 2000). The deterministic system is modelled as a logic program. This can be seen as writing the conditional probability tables in rule form (which also naturally expresses context-specific independence). The noise inputs are given in terms of independent choices.

• It is a sound way to have probabilities over assumptions. Explaining observations means that we use abduction; we find the explanations (sets of hypotheses) that imply the observations, and from these we make predictions. This reasoning is sound probabilistic inference.

2.1 Formal Semantics

In this section we give the language and the semantics of the ICL. This is simplified slightly; the general ICL allows for negation as failure in the logic programs (Poole, 2000) and choices by various agents (Poole, 1997), which lets us model decisions in a decision-theoretic or game-theoretic situation.

We assume that we have atomic formulae as in a normal logical language. We use the Prolog convention of having variables in upper case, and predicate symbols and function symbols in lower case.

A clause is either an atom or is of the form

   h ← a1 ∧ ··· ∧ ak

where h is an atom and each ai is an atom. If k = 0 we just write h. All of the variables are assumed to be universally quantified in the scope of the clause.

A logic program is a set of clauses. We assume the logic program is acyclic1.

1 All recursions for variable-free queries eventually halt. We disallow programs such as {a ← a} and {a ← b, b ← a}.

An atomic choice can be any atom that does not unify with the head of any clause. An alternative is a set of atomic choices. A choice space is a set of alternatives such that an atomic choice can be in at most one alternative.

An ICL theory consists of

F  the facts, an acyclic logic program

C  a choice space

P0 a probability distribution over the alternatives in C. That is, P0 : ∪C → [0,1] such that

   ∀A ∈ C   Σ_{c ∈ A} P0(c) = 1

The semantics is defined in terms of possible worlds, and a probability distribution over possible worlds.
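As an illustration of the conditions just defined, here is a small Python sketch (the data structures and names are my own, not from the paper) that checks that a choice space is well formed: each alternative's P0 values sum to 1, and no atomic choice appears in more than one alternative.

```python
# Sketch of an ICL choice space as a list of alternatives, where each
# alternative maps its atomic choices to their P0 values. These example
# atomic-choice names are made up for illustration.
choice_space = [
    {"aifbnc": 0.7, "naifbnc": 0.3},   # one alternative
    {"b": 0.4, "nb": 0.6},             # another alternative
]

def valid_choice_space(space):
    """Check the two conditions from the text: the P0 values in each
    alternative sum to 1, and an atomic choice is in at most one
    alternative."""
    seen = set()
    for alt in space:
        if abs(sum(alt.values()) - 1.0) > 1e-9:   # distribution condition
            return False
        for c in alt:
            if c in seen:                          # disjointness condition
                return False
            seen.add(c)
    return True

print(valid_choice_space(choice_space))
```

The tolerance in the sum test is just to absorb floating-point rounding; the paper's condition is exact equality.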
Here we present the semantics for the case of a finite choice space, where there are only finitely many possible worlds. The more general case is considered in other places (Poole, 1997; Poole, 2000).

A total choice for choice space C is a selection of exactly one atomic choice from each alternative in C.

There is a possible world for each total choice. What is true in a possible world is defined by the atoms chosen by the total choice together with the logic program. (The acyclicity guarantees that there is a single model for each possible world.) The probability of a possible world is the product of the values P0(c) for each c selected by the atomic choices.

The probability of a proposition is the sum of the probability of each possible world in which the proposition is true.

2.2 ICL and Belief Networks

It may seem that, with independent alternatives, the ICL is restricted in what it can represent. This is not the case; in particular it can represent anything that is representable by a belief network. Moreover the translation is local, and (if all alternatives are binary) there are the same number of alternatives as there are free parameters in the belief network.

For example, if we had binary variables A, B and C, with domains {a,¬a}, {b,¬b} and {c,¬c}, where B and C are the parents of A, we will have rules such as

   a ← b ∧ ¬c ∧ aifbnc

where aifbnc is an atomic choice, and P0(aifbnc) has the same value as the conditional probability P(a|b,¬c) in the belief network. This generalizes to arbitrary discrete belief networks in the analogous way (Poole, 1993b).

This representation lets us naturally specify context-specific independence (Poole, 1997), where, for example, A may be independent of C when B has value ¬b but is dependent when B has value b.

More importantly, this mapping lets us see the relationship of belief networks to logical languages. The logic programs are standard logic programs (they can even have negation as failure (Poole, 2000)). Viewing belief networks as logic programs gives us a natural way to lift them to the first-order case (i.e., with logical variables universally quantified over individuals).
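The possible-worlds semantics above can be sketched by brute-force enumeration for a tiny propositional case. In this illustration (all names are my own assumptions, not the paper's), the rule a ← b ∧ aifb plays the role of a conditional-probability-table entry, and the probability of a proposition is summed over the worlds in which it holds.

```python
from itertools import product
from math import prod

# Choice space: each alternative maps atomic choices to P0 values.
C = [{"b": 0.4, "nb": 0.6}, {"aifb": 0.9, "naifb": 0.1}]
# Acyclic logic program: (head, body) pairs; body is a conjunction of atoms.
F = [("a", ("b", "aifb"))]

def consequences(total_choice):
    """Atoms true in the possible world for a total choice: the chosen
    atomic choices plus everything derivable by the logic program."""
    true = set(total_choice)
    changed = True
    while changed:                      # forward chain to the fixed point
        changed = False
        for head, body in F:
            if head not in true and all(a in true for a in body):
                true.add(head)
                changed = True
    return true

def probability(atom):
    """P(atom): sum, over possible worlds where atom holds, of the
    product of the P0 values of the selected atomic choices."""
    p = 0.0
    for tc in product(*(alt.items() for alt in C)):
        choices = dict(tc)              # selected atomic choice -> P0 value
        if atom in consequences(choices):
            p += prod(choices.values())
    return p

print(probability("a"))  # only the world {b, aifb} makes a true: 0.4 * 0.9
```

Enumerating all total choices is exponential in the number of alternatives, which is exactly why the paper later turns to marginalization and to generating only the most plausible explanations.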
2.3 ICL, Abduction and Logical Argumentation

The ICL can also be seen as a language for abduction. In particular, all of the atomic choices are assumable (they are abducibles or possible hypotheses). An explanation for g is a consistent set of atomic choices that implies g. Consistency means that there is no more than one atomic choice from any alternative. An explanation can be seen as an argument based on explicit assumptions about what is true.

Each of these explanations has an associated probability obtained by computing the product of the probabilities of the atomic choices that make up the explanation. The probability of g can be computed by summing2 the probabilities of the explanations for g (Poole, 1993b; Poole, 2000).

2 This assumes the bodies of the rules for each atom a are mutually exclusive. This is a common practice in logic programming, and the rules obtained from the translation from belief networks have this property. We need to do something a bit more sophisticated if the rules are not disjoint (Poole, 2000).

If we want to do evidential reasoning and compute P(g|obs), we notice that this is P(g ∧ obs)/P(obs). In terms of explanations, we can first find the explanations for obs (which gives us P(obs)) and then try to extend these explanations to also explain g (which gives us P(g ∧ obs)). Intuitively, we explain the observations and see what these explanations also predict.

We can also bound the prior and posterior probabilities by generating only a few of the most plausible explanations (either top-down (Poole, 1993a) or bottom-up (Poole, 1996)). Thus we can use inference to the best explanation(s) to do sound (approximate) probabilistic reasoning.

2.4 Reasoning in the ICL

To do reasoning in the ICL we can either do

• variable elimination (marginalization or partial evaluation) to simplify the model (Poole, 1997). We sum out variables to reduce the detail of the representation.

• generation of explanations to bound the probabilities. Note that if we generate all of the explanations we could compute the probabilities exactly, but there are combinatorially many explanations.

In the description of what follows, I only do the second.
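The conditioning step described above, P(g|obs) = P(g ∧ obs)/P(obs), can be sketched numerically. This sketch assumes explanation enumeration has already been done (the atomic-choice names and explanation sets below are invented for illustration) and, per footnote 2, that the explanations are mutually exclusive so their probabilities may be summed.

```python
from math import prod

# P0 values of the atomic choices (invented for this illustration).
P0 = {"b": 0.4, "aifb": 0.9, "cifb": 0.5}

# Suppose explanation enumeration (top-down or bottom-up, as cited in
# the text) yielded these mutually exclusive explanations:
explanations_obs = [{"b", "cifb"}]                # explanations of obs
explanations_g_and_obs = [{"b", "cifb", "aifb"}]  # extended to imply g too

def prob_of(explanations):
    """Sum of the products of the P0 values of the atomic choices in
    each explanation (valid when explanations are mutually exclusive)."""
    return sum(prod(P0[c] for c in e) for e in explanations)

p_obs = prob_of(explanations_obs)          # P(obs)      = 0.4 * 0.5
p_g_obs = prob_of(explanations_g_and_obs)  # P(g & obs)  = 0.4 * 0.5 * 0.9
print(p_g_obs / p_obs)                     # P(g | obs)
```

If only the most plausible explanations are generated, the same sums give lower bounds on P(obs) and P(g ∧ obs), which is how the paper proposes to bound prior and posterior probabilities.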
3 Tillers' Example

Peter Tillers presented an example of judicial proof in the quandary of Able Attorney. In this section I present a representation of part of this example to show how the ICL could be used for this sort of problem. A complete listing of the ICL representation is given below. The fixed-width font gives the actual input.

All of the background knowledge is given in the form of rules and alternatives. The particulars of the case are given in terms of the observations.

Note also that the axiomatization is clumsy in a number of areas. In particular the ICL provides no facilities for dealing with time (all of the times in the story are ignored), modalities (e.g., reasoning about saying, believing, thinking, possibility, obligation, etc.), aggregates (e.g., probabilities that deal with reasoning about population sizes), or dealing with language (e.g., understanding what a vicious SOB may be). Finally, the numbers are arbitrary3 and the actual rules are pretty stupid. This axiomatization is only intended to give an idea of what can be done.

3 I tuned them a bit because in my first attempts it was always much more likely that someone was lying than that they were truthful when they said some unlikely event occurred.

3.1 Observations

Before we give the axiomatization, the observations that we deal with are:

• says(peter,wentto(peter,hvstore))
  Peter says that he went to the Happy Valley Store.

• says(peter,clerk_at(harry,hvstore))
  Peter says that Harry was a clerk at the Happy Valley Store.

• says(peter,vicious_sob(harry))
  Peter says that Harry is a vicious SOB.

• says(peter,observed(peter,blinding_flash))
  Peter says that he observed a blinding flash.

• says(peter,says(doctor,shot(peter)))
  Peter said that the doctor said he was shot.

• says(peter,says(newspaper,disappeared(harry)))
  Peter said that the newspaper said Harry disappeared.

We ask to explain the conjunction of these to determine the most likely explanations and to derive conditional probabilities.

3.2 Witness Honesty

The first collection of clauses specifies why someone may say something. We divide the people into honest and dishonest people. Honest people rarely say deliberate lies.
While we may assume people are honest, once they have said a few lies, we may assume it's more likely they aren't honest. [Recall that the upper-case letters are universally quantified variables.]

   says(P,F) <-
      thinks_true(P,F) &
      relevant(P,F) &
      honest(P) &
      truthful_h(P,F).
   says(P,F) <-
      thinks_true(P,F) &
      relevant(P,F) &
      dishonest(P) &
      truthful_d(P,F).
   says(P,F) <-
      honest(P) &
      untruthful_h(P,F).
   says(P,F) <-
      dishonest(P) &
      untruthful_d(P,F).

Here truthful_h(P,F) is the atomic choice that specifies the probability that P, who is honest, will say something F that they think is true and relevant.

The following specify the alternatives and the corresponding probabilities:

   random([relevant(P,F):0.05,irrelevant(P,F):0.95]).
   random([honest(P):0.999,dishonest(P):0.001]).
   random([truthful_h(P,F):0.9999,untruthful_h(P,F):0.0001]).
   random([truthful_d(P,F):0.998,untruthful_d(P,F):0.002]).

Whether they think something is true depends on whether they are mistaken.

   thinks_true(P,F) <-
      true(F) &
      notmistaken_t(P,F).
   thinks_true(P,F) <-
      false(F) &
      mistaken_f(P,F).

   random([mistaken_t(P,F):0.02,notmistaken_t(P,F):0.98]).
   random([mistaken_f(P,F):0.06,notmistaken_f(P,F):0.94]).

We haven't axiomatised what may be false.

3.3 What is true

For most things we accept that they are just true, without any deeper explanation4. For other things (e.g., why someone was shot or why they disappeared) we look for deeper explanations.

4 Note that the true predicate is only an artifact of wanting to quantify only over individuals. We reify the relations that people may say are true.

   true(X) <- just_true(X).
   true(says(P,X)) <- says(P,X).
   true(shot(P)) <- shot(X,P).
   true(disappeared(X)) <- left_for_no_reason(X).
   true(disappeared(X)) <-
      disappeared_when_criminal(X) &
      committed_crime(X).

   random([disappeared_when_criminal(X):0.8,
           stayed_when_criminal(X):0.2]).
   random([left_for_no_reason(P):0.001,open_in_whereabouts(P):0.999]).

If someone is shot, we want to explain the means and opportunity as well as the motive. However, not everyone who has means and opportunity and motive actually shoots. We also need to assume that they actually shot.

   shot(X,P) <-
      means_opportunity_to_shoot(X,P) &
      motive_to_shoot(X,P) &
      actually_shot(X,P).

   random([actually_shot(X,P):0.01,didnt_actually_shoot(X,P):0.99]).

Someone has the means and opportunity to shoot someone else if they are both at the same place:

   means_opportunity_to_shoot(X,P) <-
      at(X,L) &
      at(P,L).

   at(X,L) <- true(clerk_at(X,L)).
   at(X,L) <- true(wentto(X,L)).

The fact that someone is a vicious SOB may be a motive to shoot. But the vicious SOB only has a motive to shoot some people.

   motive_to_shoot(X,P) <-
      true(vicious_sob(X)) &
      vicious_sob_shot(X,P).
   motive_to_shoot(X,P) <-
      wanted_money(X) &
      had_money(P).

   random([vicious_sob_shot(X,P):0.2,
           vicious_sob_didnt_shoot(X,P):0.8]).

The most likely explanation for a blinding flash is a picture taken. Another explanation may be that the person is shot.

   true(observed(X,blinding_flash)) <- picture_taken(X).
   true(observed(X,blinding_flash)) <- true(shot(X)).

   random([picture_taken(X):0.06,no_picture_taken(X):0.94]).

A person committed a crime if they shot someone:

   committed_crime(X) <- shot(X,P).

There are some things we just accept without explanation: