The Application of Bayesian Networks in the Domain of Theft Alarm Analysis

Alex Bijsterveld
S0520195
Artificial Intelligence
Radboud University Nijmegen

MSc Thesis in Artificial Intelligence

Supervisors Radboud University:
dr. ir. Johan Kwisthout, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen
dr. Iris van Rooij, Department of Artificial Intelligence, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen

Supervisors Allsetra B.V.:
Steven Hoen
Simon van der Linde
Raymond van Dorresteijn

External examiner:
dr. Marina Velikova, Department of Model-Based System Development, Institute for Computing and Information Sciences, Radboud University Nijmegen

March 2013

Abstract

Following the application of Bayesian networks in domains such as medical diagnosis and weather forecasting, this thesis introduces Bayesian networks into the field of theft alarm analysis. It investigates whether Bayesian networks can assist in making judgments about incoming alarms from vehicles, using historical alarm data available from those same vehicles. The model is tested in the real-world environment of Allsetra, which provides track-and-trace solutions for vehicles using built-in electronics. The test results give insight into what percentage of false alarms could be filtered out using Bayesian networks.

Acknowledgements

First, I want to thank Johan Kwisthout and Iris van Rooij for guiding me during my research and the writing of my thesis. Second, I want to thank Allsetra for the opportunity to do my master's internship with them. Within Allsetra, my special thanks go out to Simon van der Linde and Raymond van Dorresteijn for their technical support and their helpful assistance. I also want to thank Steven Hoen for his support and for bringing me and Allsetra together.

Contents

1 Introduction
1.1 Background
1.2 Medical domain
1.3 Industrial domain
1.4 Research setting
1.5 Scientific aims and relevance
1.6 Overview
2 Preliminaries
2.1 Bayesian networks
3 Modeling
3.1 Motivation
3.2 Network types
3.3 Variables
3.3 Static Bayesian network specifications
3.4 Questionnaire for probabilities
3.5 Prior Information
3.6 Dynamic Bayesian network specifications
3.7 Software
3.8 Data Allsetra
4 Validation
4.1 Static Bayesian network results
4.2 Dynamic Bayesian network results
5 Discussion
5.1 Allsetra
5.2 Innovations
5.3 Future
5.4 Conclusion
A Questionnaire
A.1 Original questionnaire for domain experts in Dutch
A.2 Original questionnaire for domain experts in English
B Source code
B.1 Static network
B.2 Dynamic network

1 Introduction

Bayesian networks are formal models for representing, and reasoning with, uncertain information. Such models have proven useful as decision support systems in several medical situations where the information at hand was uncertain and decisions nevertheless needed to be made. They have also been employed in other areas where diagnoses are made by domain experts. However, they have never been applied in the domain of theft alarm analysis. This domain differs from, e.g., the medical domain, as the genuine alarms (in the medical domain, the persons with the disease) do not have specific symptoms to distinguish them.

In this thesis, the Bayesian network formalism is used to build a system that can assist in diagnosing the validity of theft alarms of vehicles. To achieve this, certain challenges need to be dealt with. First, patterns of recurring circumstances need to be found in the historical data of earlier alarms in order to make decisions about a new incoming alarm. Second, expert knowledge is needed to estimate the probability that an alarm is valid or false. Finally, valid alarms should never be diagnosed as false, as this could result in a stolen vehicle.

1.1 Background

Many companies use call centers to help and assist customers with problems. Large companies with many customers need many employees to serve them. To decrease the number of customers calling the support center, and thus the number of employees needed, some companies create an online support system on their website.
Questions that are often asked by customers are gathered and explained on the website, resulting in fewer phone calls to the call center; sometimes, techniques from artificial intelligence are used to parse natural language queries.

Not all call centers, however, are set up for customers with questions that need to be answered. Some call centers receive automatic messages (e.g., through SMS) with warnings about occurring situations. An example of such a message system can be found at some water boards in the Netherlands, where SMS alarms are sent out when the flow rate of the water becomes too high. This specific type of call center only becomes active when an alarm message is received. Such an alarm can be sent from all sorts of devices and for all sorts of reasons. At some of these call centers, many of these alarms may be triggered without an actual emergency situation occurring, so that only a small number of alarms is truly genuine. Because of the large number of false alarms, employees could become less focused, and the risk of making mistakes in case of a genuine alarm could increase.

To decrease the number of false alarms arriving at the call center, artificial intelligence techniques could be used to filter these alarms beforehand. With these techniques, human methods for carrying out certain tasks could be mimicked or even improved. This way, part of the work that used to be done by employees could be done by a computer. In this thesis, a particular technique in artificial intelligence, namely Bayesian networks, is used to aid in diagnosing the validity of an incoming alarm. Ben-Gal (2007) shows that a Bayesian network, also known as a belief network or probabilistic network, could be a suitable model to estimate the probability that an alarm is valid, especially in an uncertain domain where it is not self-evident whether an incoming alarm is true or false.

1.2 Medical domain

Earlier research shows the use of probabilistic networks for diagnosing certain medical situations. At first sight, diagnosing a medical situation can be compared with diagnosing a theft alarm, because experts in both domains intend to recognize certain indications that usually denote the presence of a disease or a theft. Later on in this chapter I will also show an important difference between the two domains, which makes this novel research of particular applied interest. In the next paragraphs, three examples of research concerning the use of probabilistic networks in the medical domain are reviewed.

First, De Bruijn, Schurink, and Hoepelman (2000) introduced a probabilistic and decision-theoretic system that aims to assist clinicians in diagnosing and treating patients with pneumonia in the intensive-care unit. Its underlying probabilistic network model includes temporal knowledge to diagnose pneumonia on the basis of the likelihood of micro-organisms causing disease, combined with the symptoms and signs actually present in the patient. An optimal antimicrobial therapy is selected by balancing the expected efficacy of the treatment against the spectrum of antimicrobial treatment. Expert knowledge was used as a basis for the models.

Second, Van der Gaag, Renooij, Witteman, Aleman, and Taal (2002) came up with a decision-support system for patient-specific therapy selection for oesophageal carcinoma. The system consists of a probabilistic network that describes the characteristics of oesophageal carcinoma and the pathophysiological processes of invasion and metastasis. The probabilities required for the network were elicited from experts using a new method that combines the ideas of transcribing probabilities as fragments of text and of using a scale with both numerical and verbal anchors for marking assessments.
Using data from 185 patients to test the quality of the probabilities obtained, they found that the network yielded the correct outcome for 85% of the patients.

Finally, Wasyluk, Onisko, and Druzdzel (2001) described a probabilistic causal model for the diagnosis of liver disorders named HEPAR II. It is based on expert knowledge combined with clinical data retrieved from medical records. The Bayesian network captured the causal interactions among various risk factors, diseases, symptoms, and test results. Its main applications were to assist in diagnosing and to train beginning diagnosticians.

Three interesting points come up in all three examples of applications of Bayesian networks in the medical domain. First, expert knowledge is gathered to describe dependencies and to estimate the probabilities used to model the chance that a certain disease occurs. Second, all three Bayesian models are used to assist in a diagnostic situation. Third, they all assist in diagnosing the presence of a medical condition, rather than the absence of such a condition.

1.3 Industrial domain

Besides the application of Bayesian networks in the medical domain, there have also been some applications of Bayesian networks providing diagnoses in industrial circumstances, which are also connected with the research in this thesis. A lot of AI research is done in academia using standard problems and datasets. It is a challenge to apply an AI technique, like Bayesian networks, in an industrial setting. Few guidelines for modeling systems in an industrial context are available, and there is little theory on using Bayesian networks in an industrial setting. In the next paragraphs, four examples of research concerning the use of probabilistic networks in an industrial setting are reviewed.

First, Dey and Stori (2005) created a Bayesian belief network to diagnose the root cause of process variations in a production machining environment.
They used multiple process metrics from multiple sensor sources in sequential machining operations to identify this root cause and provided a probabilistic confidence level for the diagnosis.

Second, Hommersom and Lucas (2010) argue that conventional control engineering solutions increasingly fall short. Conventional control engineering techniques assume that a physical system's dynamic behavior can be completely described by means of a set of equations. This assumption falls short for two reasons: on the one hand, modern systems are often highly complex and incompletely understood; on the other hand, the observations obtained from sensors during runtime give an incomplete picture. They state that probabilistic reasoning would allow one to deal with these sources of incompleteness, yet in the area of control engineering such AI solutions are rare. In their paper they show that it is possible to use a Bayesian network to control a complex system's behavior.

Third, Cofiño, Cano, Sordo, and Gutiérrez (2002) chose to use a Bayesian network to model spatial and temporal dependencies among a network of meteorological stations. The main reason for this decision was that standard approaches like analogue techniques and neural networks do not consider all available information. Cofiño et al. illustrate the efficiency of Bayesian networks by obtaining precipitation forecasts for 100 meteorological stations.

Finally, Kennett, Korb, and Nicholson (2001) examined the use of Bayesian networks for predicting sea breezes. They developed some networks based on expert elicitation and some learned by two machine learning programs. The results were compared with a pre-existing rule-based system. The Bayesian networks clearly outperformed the rule-based system.

These previous studies show that Bayesian networks are often used for diagnostic purposes and that they perform well in a wide range of industrial environments.
In this thesis, the application of Bayesian networks is extended to a novel industrial domain: diagnosing the validity of incoming theft alarms.

1.4 Research setting

The research on which this thesis is built was mainly carried out in the environment of Allsetra. Allsetra (http://www.allsetra.nl/) is a company that provides complex track-and-trace solutions for vehicles, boats, and operating equipment in construction using built-in electronics. These built-in electronics are packed in a box which is hidden in the vehicle. This box registers all sorts of information, like GPS coordinates, driving speed, whether the engine of the vehicle is on, and whether the vehicle is being moved. An important functionality of this box is the ability to send an alert to the Service Center when something is happening to the secured object that should not happen. The object could, for example, be moved at a time it should not move, or it could be moved to a place where it should not go. When the Service Center receives such an alert, it needs to find out whether the object is being stolen and the police need to be informed, or whether the alert is false and no further action needs to be taken.

Many incoming alarms are found to be unnecessary or redundant. Based on human interpretation and intervention, some of the alarms can be eliminated in advance. Recurring events or a certain pattern of alarms can, for example, be interpreted as false alarms. The remaining alarms need to be handled by the Service Center, which causes a heavy workload, and many customers are contacted unnecessarily. Allsetra is interested in an intelligent system that is able to incorporate human interpretation and intervention into the model in order to filter out some of the easy cases, so that the Service Center can focus on the harder cases that need to be checked.
As a result, the number of false emergency calls to customers and the number of messages that need to be interpreted by the co-workers of the Service Center should be considerably reduced. A system like this will have to deal with uncertain information and, combined with other information, it should give an outcome indicating the probability that an alarm signals a genuine alarm situation.

1.5 Scientific aims and relevance

The new domain this research focuses on can be compared to the medical domain reviewed earlier in this introduction. A genuine alarm can be compared with a disease that a patient could be diagnosed with. By taking several symptoms and signs into account, the probability that a specific disease is at hand is calculated. However, a genuine alarm does not have specific symptoms and signs, and it does not occur often enough to find reliable regularities. Therefore, genuine alarms cannot easily be recognized. The number of alarms that could be genuine, however, can be decreased by recognizing the false alarms. False alarms occur very often and are caused by people who repeatedly fail to follow the agreed rules.

Another difference between the medical domain and the current domain is that a Bayesian model in the medical domain is intended to fit a large group of people. The idea is that, if every person in that group were infected with a certain disease, every person's symptoms would be comparable. In the current domain, however, this idea does not hold. The circumstances (in the medical domain, the symptoms) in which a false alarm (the disease) occurs are different for each vehicle (the person). Transposed to the medical domain, this could mean that for a certain disease one person has the symptoms high body temperature and high blood pressure, while another person has the symptoms low body temperature and low blood pressure.
The goal of this research project is to investigate whether a Bayesian network can be used to aid in the diagnosis of incoming theft alarms. Practically, in the Allsetra environment, this means that the number of messages handled by the co-workers of the Service Center can be reduced without missing any genuine alarm situations. To achieve this, some important research questions need to be answered.

The first question concerns the necessity, unique to this domain, of dealing with the different patterns of circumstances in which alarms occur for each vehicle. An example of such a pattern is two alarms from the same vehicle occurring at the same time of day and on the same day of the week. The question is whether it is possible to create a Bayesian model that fits every vehicle, while different instantiations of patterns apply to each vehicle.

The second research question concerns the probabilities that will be used by the Bayesian network to make judgments on incoming alarms. Because no objective information is available about these probabilities, experts will have to provide insight into the domain of judging alarms. Keeping in mind that the circumstances of alarms differ for each vehicle, the question is whether the experts can still provide enough information for the Bayesian network to make a correct probability judgment on the falseness of an incoming alarm.

The last research question concerns Allsetra's constraint to avoid filtering out any true alarm case, which is very important in this domain. Such a constraint, where a certain classification cannot be missed, could also exist in other domains, but for some the consequences of breaking it could be worse than for others. For example, imagine the difference between a wrong weather forecast and a life-threatening disease that is not recognized. Therefore, in this domain, an important research question is: is it feasible to have a Pareto optimal working network?
Pareto efficiency, or Pareto optimality, was introduced by Pareto (1896) in economics, but it also has applications in engineering, e.g., where there are several design objectives, some of which may be competing, and the goal is to choose a solution that maximizes benefits subject to the existing constraints. In the context of this research project, Pareto optimality means that as many false alarms as possible are filtered out, without losing any true alarm cases. Because there will always be a nonzero probability that a valid alarm is judged to be a false alarm, this definition of Pareto optimality should be slightly adjusted to: as many false alarms as possible are filtered out, with a negligible chance of losing any true alarm cases. To be one hundred percent certain that no valid alarms are filtered out, all alarms would have to be diagnosed as valid. As this would not reduce the number of alarms handled by the Service Center, it is not a satisfying solution for Allsetra.

To answer these research questions, several challenges need to be tackled. To test a network, it needs to be implemented in the already existing environment of Allsetra. Several domain experts will be questioned to gather knowledge about the probabilities used and the importance of the information gathered. The test results of the different networks that were created during this project will be compared. What is their performance on the true positive alarm cases? How well do they filter out the alarm cases that actually are not an alarm (the true negatives)? How many cases are judged as an alarm, but actually are not (the false positives)? And how many false negative judgments are made?

1.6 Overview

The remainder of this thesis is structured as follows. In Chapter 2, the Preliminaries section, the basics of Bayesian network theory are explained for readers with little knowledge about this subject.
In Chapter 3, the Modeling section, the design choices are made clear and the way parameters and data were retrieved is shown. In Chapter 4, the Validation section, the results of the research are presented. Lastly, in Chapter 5, the Discussion section, the results are discussed and the research questions are answered.

2 Preliminaries

In this section, a concise introduction is provided for readers with little knowledge about Bayesian networks. For a more thorough discussion of this concept, the reader is referred to standard textbooks like Pearl (1988) or overview articles such as Pearl and Russell (2000).

2.1 Bayesian networks

Bayesian networks, belonging to the family of probabilistic graphical models, are used to represent knowledge about an uncertain domain and were first developed by Pearl. A Bayesian network consists of a graphical structure that models a set of stochastic variables, the conditional independencies among these variables, and a joint probability distribution over these variables. The graphical structure consists of nodes and edges, where the nodes represent random variables and the edges between the nodes represent probabilistic dependencies among the corresponding variables. Statistical and computational methods are used to estimate these conditional dependencies.

Figure 1, adapted from Pearl (1988), is an example of such a Bayesian network. In this example the grass can be wet due to the sprinkler or the rain. The probability that the grass is wet, given that the sprinkler is activated and it rained, is P(W=true | S=true, R=true) = 0.99. When the sprinkler is off and it did not rain, the probability of wet grass is P(W=true | S=false, R=false) = 0.

Different types of Bayesian networks exist. Diard, Bessière, and Mazer (2003) proposed a general-to-specific ordering of probabilistic modeling formalisms.
The more general-purpose models are Bayesian Networks, Dynamic Bayesian Networks, Recursive Bayesian Estimation, Hidden Markov Models, Kalman Filters, and Particle Filters, whereas the more problem-oriented models, focused on the field of robotics, consist of Markov Localization, Decision Theoretic Planning, Bayesian Robot Programming, and Bayesian Maps. Bayesian networks are a primary method for dealing with probabilistic and uncertain information. They combine the theory of (independency in) probability distributions and the theory of graphs in order to yield an efficient representation of probabilistic (in-)dependencies among stochastic variables. Dynamic Bayesian networks are an extension of Bayesian networks that can also deal with stochastic processes that change over time. Recursive Bayesian Estimation, also called Bayesian Filtering, is the generic denomination for a class of numerous different probabilistic models of time series. Examples of tasks in this class are ‘filtering’, ‘prediction’, and ‘smoothing’. Hidden Markov Models and Kalman Filters are specializations of Bayesian Filtering, where ‘filtering’ refers to determining the distribution of a latent variable at a specific time, given all observations up to that time. Particle Filters may be seen as a specific implementation of Bayesian Filtering in which a set of differently weighted samples of the distribution (particles) is used to allow for approximate ‘filtering’. Markov Localization is Bayesian Filtering extended with control variables, used in robotics where observation and movement play a large role. Decision Theoretic Planning is used in robotics to model a robot that has to plan and execute a sequence of actions. Bayesian Robot Programming is applied to mobile robotics, and Bayesian Maps are a generalization of Markov Localization. Some of these methods are particularly suited for specialized tasks, like filtering, which does not apply here; hence we need to use a more general network structure.
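
To make concrete how such a general (static) Bayesian network is evaluated, the sprinkler example of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: it assumes a simplified three-node structure in which Sprinkler (S) and Rain (R) are independent parents of WetGrass (W). Only the two conditional probabilities quoted in the text (0.99 and 0) come from the example; the priors for S and R and the two mixed entries of the table for W are hypothetical values chosen for illustration, and the function names are likewise hypothetical.

```python
from itertools import product

# Prior probabilities: ASSUMED for illustration, not taken from the thesis.
P_SPRINKLER = 0.3
P_RAIN = 0.2

# Conditional probability table P(W=true | S, R).
# (True, True) and (False, False) are the values quoted in the text;
# the two mixed entries are ASSUMED for illustration.
P_WET = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.0,
}


def bernoulli(p_true, value):
    """Probability that a binary variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(s, r, w):
    """P(S=s, R=r, W=w) = P(S=s) * P(R=r) * P(W=w | S=s, R=r)."""
    return (
        bernoulli(P_SPRINKLER, s)
        * bernoulli(P_RAIN, r)
        * bernoulli(P_WET[(s, r)], w)
    )


def posterior_rain_given_wet():
    """P(R=true | W=true), obtained by enumerating the joint distribution."""
    # Numerator: sum out the sprinkler while fixing R=true and W=true.
    numerator = sum(joint(s, True, True) for s in (True, False))
    # Denominator: P(W=true), summing over all sprinkler and rain states.
    evidence = sum(
        joint(s, r, True) for s, r in product((True, False), repeat=2)
    )
    return numerator / evidence


if __name__ == "__main__":
    print(f"P(R=true | W=true) = {posterior_rain_given_wet():.4f}")
```

Inference by full enumeration, as used here, is only practical for very small networks, but it shows the principle: the joint distribution factorizes along the edges of the graph, and observing evidence (wet grass) raises the probability of a possible cause (rain) above its prior.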

When designing the structure of such a network, four main aspects can be considered. The first important decision to make is which variables to use in the network. These variables need to give relevant information concerning the truthfulness of the alarm. A sensitivity analysis, described in detail by Saltelli (2004), can help to find out if the chosen variables are the most important