Spatiotemporal Reasoning about Epidemiological Data

27 Pages·2008·0.72 MB·English

Peter Revesz ∗, Shasha Wu ∗∗

∗ Corresponding author. Complete Address: Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA. Tel.: +1-402-472-3488; fax: +1-402-472-7767. email: [email protected]

∗∗ Complete Address: Department of Computer Science, Spring Arbor University, Spring Arbor, MI 49283, USA. email: [email protected]. The work presented here was done while the author was at UNL.

Keywords: Epidemiology, knowledge-base, recursive definition, spatiotemporal data, visualization, West Nile Virus.

Abstract

Objective. In this article, we propose new methods to visualize and reason about spatiotemporal epidemiological data.

Background. Efficient computerized reasoning about epidemics is important to public health and national security, but it is a difficult task because epidemiological data are usually spatiotemporal, recursive, and fast changing, and hence hard to handle in traditional relational databases and geographic information systems.

Methodology. We describe general methods to (1) store epidemiological data in constraint databases, (2) handle recursive epidemiological definitions, and (3) efficiently reason about epidemiological data based on recursive and non-recursive SQL queries.

Results. We implement a particular epidemiological system called WeNiVIS that enables the visual tracking of and reasoning about the spread of the West Nile Virus epidemic in Pennsylvania. In the system, users can perform many kinds of reasoning based on the spatiotemporal dataset and the recursively defined risk evaluation function through the SQL query interfaces.

Conclusions. In this article, the WeNiVIS system is used to visualize and reason about the spread of West Nile Virus in Pennsylvania as a sample application. Besides this particular case, the general methodology used in the implementation of the system is also appropriate for many other applications. Our general solution for reasoning about epidemics and related spatiotemporal phenomena enables one to solve many problems similar to WNV without much modification.

Preprint submitted to International Journal of Artificial Intelligence in Medicine

1 INTRODUCTION

Infectious disease outbreaks are critical threats to public health and national security [5]. With greatly expanded travel and trade, infectious diseases can quickly spread across large areas, causing major epidemics. Efficient computerized reasoning about epidemics is essential to detect their outbreak and nature, to provide fast medical aid to affected people and animals, to prevent their further spread, and to manage them in other ways.

Several characteristics of epidemics make them special in terms of computer reasoning needs. First, epidemiological data are usually spatiotemporal, that is, they have a spatial distribution that changes over time. Second, epidemiological data are recursive in nature. This means that the best predictions of the spread of infections are based on earlier situations. Third, we need a fast response from any knowledge-base that contains epidemiological data. A flexible information system that can be easily modified to model new epidemics is critical in assisting people to handle the outbreaks of new diseases.

The above three characteristics in combination pose a difficult problem. Geographic information systems generally can represent only static objects that do not change over time, or that change only slowly, for example, the population density of counties. Such a slow change may be represented in a geographic information system by a limited number of separate maps. However, continuous change over time is not easy to represent and is hard to reason about in geographic information systems.

We propose new methods to visualize and reason about epidemiological data. The major contributions and novel features of our article are the following:

• General method for recursively defined spatiotemporal models: We propose a new general method to model a class of recursively defined spatiotemporal concepts, which appear in many research areas including epidemiology. In this article, we extend the definition in [24] to allow a linear combination of the measurements of the indicators and a different time delay for each indicator.

• Recursive epidemiological definitions: We apply this new method to express the recursive epidemiological definitions and predictions about the spread of infectious diseases.

• Implementation using recursive SQL: The Prolog language is the choice for recursive definitions in many knowledge-base systems. However, Prolog is not well suited to querying spatiotemporal data. It is also less well-known than the widely-used Structured Query Language (SQL), which is the standard query language for both relational and constraint databases. The latest SQL standard added a form of recursion to the SQL language, enabling the expression of the needed recursive definitions. It is expected that the latest SQL standard will be implemented in all major relational database products. As part of our contributions, we also implemented SQL recursive queries for the first time in MLPQ [23], which is one of the most sophisticated constraint database systems.

• Epidemiological data stored in constraint databases: Relational databases and geographic information systems cannot easily manage epidemiological data because of their inherently spatiotemporal nature. Constraint databases [10, 12, 22], which are very suitable for spatiotemporal data, were proposed as extensions of both relational databases and geographic information systems.
There are software tools that can export any relational database or geographic information system data into a constraint database [3, 2].

• WeNiVIS: The West Nile Virus Information System: We developed an example epidemic information system for reasoning about West Nile Virus infections. This system can show visually the spread of the epidemic and any other spatiotemporal data that may be generated by the system. We chose this example because it has a typical infection pattern, it is currently still spreading through the United States, and data for it was readily available from Pennsylvania's West Nile Virus Control Program [19].

The rest of the article is organized as follows: Section 2 describes some basic concepts and related work. Section 3 describes the new general method for modeling recursively defined spatiotemporal concepts. Section 3.1 proposes a general recursive definition for spatiotemporal concepts. Section 3.2 describes the solution and optimization for the recursive definition using recursive SQL query language. Section 4 describes the source data we use for the West Nile Virus analysis (in Section 4.1), their interpolation and storage in a constraint database (in Section 4.2), and the West Nile Virus Information System (WeNiVIS) we developed for the WNV analysis (in Section 4.3). Section 5 presents major results and benefits of this project. Section 6 discusses some specific issues about our method and system. Finally, Section 7 gives some conclusions and directions for future work.

2 BASIC CONCEPTS AND RELATED WORK

2.1 Recursive Queries

We give only a brief introduction to recursive queries in relational databases [4, 6, 21, 26]. Figure 1 shows a relational database table that describes child-parent relationships. A recursive query on this table would be to find all the ancestors of David.

    Family
    Child     Parent
    David     Andrew
    David     Jane
    Andrew    Scott
    Andrew    Mary
    Mary      Tracy
    ...       ...

Fig. 1. Relationship of a family.
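As a rough illustration of what such a recursive query computes, the sketch below loads the rows visible in Figure 1 into an in-memory table and collects David's ancestors. It uses SQLite's `WITH RECURSIVE` clause through Python's `sqlite3` module, which is an assumption of this example and not the MLPQ syntax. The function name `ancestors_of` and the constant `FAMILY` are hypothetical.

```python
import sqlite3

# Rows visible in Figure 1 (the elided rows of the figure are not guessed).
FAMILY = [
    ("David", "Andrew"),
    ("David", "Jane"),
    ("Andrew", "Scott"),
    ("Andrew", "Mary"),
    ("Mary", "Tracy"),
]


def ancestors_of(person, family_rows):
    """Return the set of all ancestors of `person` using a recursive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Family (Child TEXT, Parent TEXT)")
    conn.executemany("INSERT INTO Family VALUES (?, ?)", family_rows)
    rows = conn.execute(
        """
        WITH RECURSIVE Ancestors(Ancestor) AS (
            -- basic expression: the direct parents
            SELECT Parent FROM Family WHERE Child = ?
            UNION
            -- recursive expression: parents of already found ancestors
            SELECT F.Parent
            FROM Family AS F, Ancestors AS A
            WHERE F.Child = A.Ancestor
        )
        SELECT Ancestor FROM Ancestors
        """,
        (person,),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


if __name__ == "__main__":
    print(sorted(ancestors_of("David", FAMILY)))
```

The `UNION` (rather than `UNION ALL`) discards duplicates, so the evaluation stops once no new ancestors are produced.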
The latest ANSI (American National Standards Institute) SQL Language allows a form of recursion, enabling the expression of the above recursive query. We implement the recursive SQL for the first time in the MLPQ constraint database system. The syntax of the recursive SQL in the MLPQ system follows the latest SQL standard with only a minor modification.

A non-recursive SQL view definition is a statement of the form:

    create view V_i as B_i ;

where $V_i$ is a view name with attributes and $B_i$ is an SQL statement that uses only input relations (tables). Such $B_i$ are called basic SQL expressions.

A recursive SQL view definition has the form:

    create view V_i with recursive as B_i union R_i ;

where $V_i$ is a view name with attributes. Here $V_i$ is defined using the union of a basic SQL expression $B_i$ and a recursive SQL expression $R_i$, which may contain a reference to $V_i$ or other non-recursive and recursive views.

A sample recursive SQL query that finds all ancestors of David based on the table of Figure 1 can be expressed as follows:

Query 2.1 Find all ancestors of David:

    create view DavidAncestors(Ancestor) with recursive as
        (select Parent
         from Family
         where Child = "David")
        union
        (select F.Parent
         from Family as F, DavidAncestors as D
         where F.Child = D.Ancestor)

Figure 2 displays the implementation of Query 2.1 in the MLPQ constraint database system.

2.2 Constraint Databases Concepts

A constraint database is a finite set of constraint relations. A constraint relation is a finite set of constraint tuples, where each constraint tuple is a conjunction of atomic constraints using the same set of attribute variables [22]. Hence, constraints are hidden inside the constraint tables, and the users only need to understand the logical meaning of the constraint tables as an infinite set of constant tuples represented by the finite set of constraint tuples. Typical atomic constraints include linear or polynomial arithmetic constraints.
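To make the idea of a constraint tuple concrete, here is a minimal sketch, assuming a simple in-memory encoding that is not taken from MLPQ. Each atomic constraint is a linear inequality or equality over the attributes x, y, and t, a constraint tuple is a list of such constraints read as a conjunction, and a constraint relation is a list of constraint tuples. The names `satisfies`, `contains`, `UNIT_SQUARE`, and `MOVING_STRIP` are hypothetical.

```python
import operator

# Comparison operators allowed in an atomic linear constraint.
OPS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq}


def satisfies(constraint_tuple, point):
    """True if `point` satisfies every atomic constraint of the conjunction.

    Each atomic constraint is (coeffs, op, bound) and stands for
    sum(coeffs[v] * point[v]) <op> bound.
    """
    for coeffs, op, bound in constraint_tuple:
        lhs = sum(c * point[var] for var, c in coeffs.items())
        if not OPS[op](lhs, bound):
            return False
    return True


def contains(relation, point):
    """True if `point` belongs to at least one constraint tuple."""
    return any(satisfies(ct, point) for ct in relation)


# Hypothetical tuple: the unit square, valid for 0 <= t <= 10.
UNIT_SQUARE = [
    ({"x": 1}, ">=", 0), ({"x": 1}, "<=", 1),
    ({"y": 1}, ">=", 0), ({"y": 1}, "<=", 1),
    ({"t": 1}, ">=", 0), ({"t": 1}, "<=", 10),
]

# Hypothetical tuple: a strip 0 <= x - t <= 1 that moves right over time.
MOVING_STRIP = [
    ({"x": 1, "t": -1}, ">=", 0), ({"x": 1, "t": -1}, "<=", 1),
    ({"y": 1}, ">=", 0), ({"y": 1}, "<=", 1),
    ({"t": 1}, ">=", 0),
]

RELATION = [UNIT_SQUARE, MOVING_STRIP]

if __name__ == "__main__":
    print(contains(RELATION, {"x": 5.5, "y": 0.5, "t": 5}))
```

Two finite constraint tuples here stand for infinitely many constant tuples (x, y, t), which is the logical reading described above.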
The MLPQ system is a constraint database system that implements rational linear constraint databases and queries. MLPQ is the abbreviation for Management of Linear Programming Queries. Among other functionalities, it supports both SQL and Datalog queries, and minimum/maximum aggregation operators over linear objective functions [23]. It is a suitable tool for representing, querying, and managing spatiotemporal constraint databases. Other constraint database systems that could also be used include CCUBE [1], CQA/CDB [7], and DEDALE [9].

Li and Revesz [15] considered constraint-based visualization for spatiotemporal data but did not consider recursively defined concepts. Revesz and Wu [24] considered constraint-based visualization for recursively defined spatiotemporal data, but they only considered one indicator with a fixed time delay. That is too simple for real epidemiological problems and needs to be extended. In epidemiology, one infectious disease commonly has several indicators (i.e., measurable disease carriers), and different indicators may have different effectiveness with different delay times.

Fig. 2. User interface for recursive SQL in the MLPQ system.

2.3 Interpolation Methods

In a 2-D spatial problem, a point-based spatiotemporal relation has the schema $(x, y, t, w_1, w_2, \ldots, w_m)$, where the attributes $(x, y)$ specify point locations, $t$ specifies a time instance, and $w_i$ $(1 \le i \le m)$ records the features of each location.

A point-based spatiotemporal data set only stores information about some sample points. To represent the features beyond those finite sample points, it is necessary to do spatiotemporal interpolation on them. A shape function based spatiotemporal method [14, 15] was used to interpolate and translate the original point-based spatiotemporal information into a constraint relation.
Li and Revesz [16, 15] did an extensive comparison and found shape functions to perform better than the Inverse Distance Weighting (IDW) [25] and Kriging [11, 17] interpolators in a test example concerning house price estimation. Figure 3 shows a point-based spatiotemporal data set consisting of the vertices shown there, and its "Delaunay Triangulation" network [8].

Fig. 3. The triangulated network of sample points in the state of Pennsylvania.

2.4 GIS Enhancement for Spatiotemporal Information

Geographic Information Systems (GIS) are designed for static data and need to be enhanced to be able to reason about spatiotemporal information [13, 29].

One such GIS enhancement is given by Theophilides et al. [28], who developed DYCAST, which is an epidemic spread prediction system based on spatiotemporal interpolation. The DYCAST system was used to predict human West Nile Virus infections based on dead bird surveillance data. However, the DYCAST system does not provide a flexible reasoning method.

Another GIS enhancement is given by Raffaetà et al. [20], who use MuTACLP, which is a temporal annotated constraint logic programming language. While in theory MuTACLP can describe spatial data by using constraints similarly to constraint databases [10, 12, 22], Raffaetà et al. [20] are only interested in using MuTACLP on top of a GIS. The temporal annotations are simple, that is, they only allow one to declare that some atomic formula is true at a certain time, true throughout a time interval, or true sometime during a time interval. MuTACLP is implemented based on SICStus Prolog 3.8.3.

In contrast to MuTACLP, we use more complex temporal conditions, i.e., we allow any linear constraint on the spatial variables x and y and temporal variable t, and our implementation is based directly on the MLPQ [23] constraint database system.
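The interpolation step of Section 2.3 can be illustrated with a minimal sketch of linear shape functions on a single triangle of a triangulated network. This is standard barycentric interpolation in space only. The exact shape functions and the handling of time in [14, 15] may differ, and the function name `triangle_interpolate` and the sample values are hypothetical.

```python
def triangle_interpolate(point, triangle, values):
    """Linearly interpolate a value at `point` inside `triangle`.

    `triangle` is a sequence of three (x, y) vertices and `values` holds the
    measurement at each vertex. The weights n1, n2, n3 are the linear shape
    functions (barycentric coordinates) of the triangle; they sum to 1.
    """
    (x1, y1), (x2, y2), (x3, y3) = triangle
    x, y = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    n1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    n2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    n3 = 1.0 - n1 - n2
    return n1 * values[0] + n2 * values[1] + n3 * values[2]


if __name__ == "__main__":
    tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    vals = [10.0, 20.0, 30.0]
    # The value at a vertex equals the measurement stored for that vertex.
    print(triangle_interpolate((1.0, 0.0), tri, vals))
```

Because the interpolated value is linear in x and y within each triangle, it can be written as a linear constraint, which is what makes this style of interpolation a natural fit for a linear constraint relation.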
3 METHODOLOGY

3.1 General Definition for Recursively Defined Spatiotemporal Concepts

Revesz and Wu proposed a general definition for recursively defined spatiotemporal concepts in [24]. Unfortunately, that definition is too limited for our current need, because it only deals with one indicator with a fixed one-unit time delay. In epidemiology, one infectious disease commonly has several indicators (i.e., measurable disease carriers), and different indicators may have different effectiveness. The animal indicators may also predict the human infection ahead of time with different delay times. To consider these extra complications, we extend their definition as follows:

Definition 3.1 Let $M_i(x, y, t)$ represent the amount of indicator $i$ measured at location $(x, y)$ at time unit $t$. For each indicator $i$, let $w_i$ be the effectiveness weight and $d_i$ be the time delay to indicate property $P$. Then location $(x, y)$ has property $P$ during time unit $t$ if

(1) \[ \sum_i w_i M_i(x, y, t - d_i) \ge k \] or

(2) \[ k_1 \le \sum_i w_i M_i(x, y, t - d_i) < k \] and the location has property $P$ during time unit $t - 1$.

Part (1) of Definition 3.1 says that property $P$ holds at time $t$ if the linear combination of measurements of the indicators at the appropriate previous times (i.e., with their respective time delays) is at least some threshold value $k$. Part (2) says that $P$ also holds in those areas where the same linear combination is only between $k_1$ and $k$ but which already had property $P$ at time $t - 1$.

Example 3.1 The West Nile virus has four major types of disease indicators: wild bird as indicator 1, mosquito as indicator 2, chicken as indicator 3, and horse as indicator 4. Figure 4 suggests that the onset of human infections generally occurs three weeks later than the onset of wild bird infections, one week later than the onset of mosquito infections, about six weeks after the onset of chicken infections, and almost at the same time as the horse infections.
Hence, we can assign the time delay for these four indicators as follows:

\[ d_1 = 3, \quad d_2 = 1, \quad d_3 = 6, \quad d_4 = 0 \]

Considering that big animals usually contain more virus than small animals contain, we may assign the effectiveness weight of WNV infection to the four major carriers according to their relative body sizes as follows:

\[ w_1 = 1, \quad w_2 = 0.2, \quad w_3 = 1.5, \quad w_4 = 5 \]

[Figure 4: four panels plotting human cases against bird, mosquito, horse, and chicken cases; x-axis ticks from week 23 to week 50, y-axis ticks at 1, 10, 100, and 1000.]

Fig. 4. The comparison of time lags between the infections on human and various types of animal hosts (the x-coordinate represents the week in year 2002 and the y-coordinate is the number of reported infectious cases).

We assume that the infected animals reported at time $t - d_i$ are representative of the entire animal population at the same time and part of the unreported infected animals at that time may continue to be infected at least until time $t$.

Suppose we would like to find the areas on a map that have a high risk of human WNV infections at time $t$. Let $k = 8$ and $k_1 = 4$, and $M_i(x, y, t)$ be as in Definition 3.1.
First, we compute the linear combination of the measurements of the indicators for each area as follows:

\[ w = \sum_i w_i M_i(x, y, t - d_i) = M_1(x, y, t - 3) + 0.2\, M_2(x, y, t - 1) + 1.5\, M_3(x, y, t - 6) + 5\, M_4(x, y, t) \]

Then the area is at high risk of human WNV infections at week $t$ if during week $t$ it has (1) $w \ge 8$ or (2) $4 \le w < 8$ and it is at high risk during week $t - 1$.

3.2 Solution and Optimization

The general solution for the problem defined in Definition 3.1 can be formally expressed as follows.

Given relations $M_i(x, y, t, m)$ where the value $m$ represents the measurement of indicator $i$ at location $(x, y)$ at time $t$ for each $1 \le i \le n$, let us define the following:

\[ A = \{ (x, y, t) \mid M_1(x, y, t - d_1, m_1) \wedge \ldots \wedge M_n(x, y, t - d_n, m_n) \wedge w_1 m_1 + \ldots + w_n m_n \ge k \} \]

\[ B = \{ (x, y, t) \mid M_1(x, y, t - d_1, m_1) \wedge \ldots \wedge M_n(x, y, t - d_n, m_n) \wedge k_1 \le w_1 m_1 + \ldots + w_n m_n < k \} \]

Here $A$ is the part of $M_1, \ldots, M_n$ where the linear combination of measurements of all indicators is greater than or equal to $k$, and $B$ is the part where the linear combination of measurements of all indicators is between $k_1$ and $k$. The above definition can be implemented in the SQL query language as follows:

Query 3.1 SQL query for linear combination and time delay:

    create view A(x,y,t) as
        select M_1.x, M_1.y, t
        from M_1, ..., M_n
        where w_1 M_1.m + ... + w_n M_n.m ≥ k,
              M_1.t = t − d_1, ..., M_n.t = t − d_n,
              M_1.x = ... = M_n.x,
              M_1.y = ... = M_n.y

Relation A returns the spatiotemporal locations (x,y,t) that satisfy part (1)
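The recursion in Definition 3.1 can also be traced procedurally. The sketch below evaluates it for a single location over consecutive weeks, using the weights, delays, and thresholds of Example 3.1. It is an illustration only and not the SQL or MLPQ implementation described in the article. The function name `high_risk_weeks`, the dictionary encoding of the measurements, the assumption that the location is not at high risk before the first evaluated week, and the sample counts are all hypothetical.

```python
def high_risk_weeks(measurements, weights, delays, k, k1, weeks):
    """Evaluate Definition 3.1 for one location.

    measurements[i] maps a week number to the amount of indicator i measured
    in that week (missing weeks count as 0). `weeks` must be consecutive and
    in ascending order. Returns {week: True/False}.
    """
    def score(t):
        # Linear combination sum_i w_i * M_i(t - d_i).
        return sum(
            w * measurements[i].get(t - d, 0)
            for i, (w, d) in enumerate(zip(weights, delays))
        )

    risk = {}
    previous = False  # assumption: no high risk before the first week
    for t in weeks:
        s = score(t)
        # Part (1): s >= k. Part (2): k1 <= s < k and high risk at t - 1.
        previous = s >= k or (k1 <= s < k and previous)
        risk[t] = previous
    return risk


# Parameters from Example 3.1: bird, mosquito, chicken, horse.
WEIGHTS = [1, 0.2, 1.5, 5]
DELAYS = [3, 1, 6, 0]

# Hypothetical weekly counts at one location (not data from the article).
SAMPLE = [{27: 9, 28: 5, 30: 5}, {}, {}, {}]

if __name__ == "__main__":
    print(high_risk_weeks(SAMPLE, WEIGHTS, DELAYS, k=8, k1=4, weeks=range(30, 34)))
```

With these sample counts, week 30 passes the threshold k outright, week 31 stays at high risk only because week 30 was, and week 33 has the same score as week 31 but is not at high risk because week 32 was not.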
