ebook img

FO2(<,+1,~) on data trees, data tree automata and branching vector addition systems PDF

0.32 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview FO2(<,+1,~) on data trees, data tree automata and branching vector addition systems

LogicalMethodsinComputerScience Vol.12(2:3)2016,pp.1–28 Submitted Jan. 5,2015 www.lmcs-online.org Published Apr.25,2016 FO2(<,+1,∼) ON DATA TREES, DATA TREE AUTOMATA AND BRANCHING VECTOR ADDITION SYSTEMS FLORENTJACQUEMARD,LUC SEGOUFIN,ANDJE´RE´MIE DIMINO INRIAand ENS Cachan e-mail address: {florent.jacquemard, luc.segoufin}@inria.fr Abstract. A data tree is an unranked ordered tree where each node carries a label from a finite alphabet and a datum from some infinite domain. We consider the two variable first order logic FO2(<,+1,∼) over data trees. Here +1 refers to the child and the next sibling relations while < refers to the descendant and following sibling relations. Moreover, ∼ is a binary predicate testing data equality. We exhibit an automata model, denoted DTA#, that is more expressive than FO2(<,+1,∼) but such that emptiness of DTA# and satisfiability of FO2(<,+1,∼) are inter-reducible. This is proved via a model of counter tree automata, denoted EBVASS, that extends Branching Vector Addition SystemswithStates(BVASS)withextrafeaturesformergingcounters. Weshowthat,as decisionproblems,reachabilityforEBVASS,satisfiabilityofFO2(<,+1,∼)andemptiness of DTA# are equivalent. Introduction Adatatreeisanunrankedorderedtreewhereeachnodecarriesalabelfromafinitealphabet and a datum from some infinite domain. Together with the special case of data words, they have been considered in the realm of program verification, as they are suitable to model the behavior of concurrent, communicating or timed systems, wheredata can represent e.g., process identifiers or time stamps [1, 6, 7]. Data trees are also a convenient model for XML documents [4], where data represent attribute values or text contents. Therefore finding decidable logics for this model is a central problem as it has applications in most reasoning tasks in databases and in verification. Several logical formalisms and models of automata over data trees have been proposed. Many of them were introduced in relation to XPath, the standard formalism to express properties of XML documents. Although satisfiability of XPath in the presence of data values is undecidable, automata models were introduced for showing decidability of several data-aware fragments [12, 4, 11, 10, 13]. As advocated in [4], the logic FO2(<,+1,∼) can be seen as a relevant fragment of XPath. Here FO2(<,+1,∼) refers to the two-variable fragment of first order logic over 2012 ACM CCS: [Theory of computation]: Formal languages and automata theory—Automata over infiniteobjects; Logic—Logic and verification. Key words and phrases: data automata, counterautomata, two-variable logic. LOGICALMETHODS (cid:13)c F.Jacquemard,L.Segoufin,andJérémieDimino lINCOMPUTERSCIENCE DOI:10.2168/LMCS-12(2:3)2016 (cid:13)CC CreativeCommons 2 F.JACQUEMARD,L.SEGOUFIN,ANDJE´RE´MIEDIMINO unranked ordered data trees, with predicates for the child and the next sibling relations (+1), predicates for the descendant and following sibling relations (<) and a predicate for testing data equality between two nodes (∼). Over data words, FO2(<,+1,∼) was shown to be decidable by a reduction to Petri Nets or, equivalently, Vector Addition Systems with States (VASS) [5]. It is also shown in [4] that reachability for Branching Vector Addition Systems with States, BVASS, reduces to satisfiability of FO2(<,+1,∼) over data trees. The model of BVASS, extends VASS with a natural branching feature for running on trees, see [15] for a survey of the various formalisms equivalent to BVASS. As the reachability of BVASS is a long standing open problem, showing decidability of finite satisfiability for FO2(<,+1,∼) seems unlikely in the near future. This paper is a continuation of the work of [5, 4]. We introduce a model of counter automata, denoted EBVASS,andshowthatsatisfiability ofFO2(<,+1,∼)isinter-reducible toreachabilityinEBVASS.ThismodelextendsBVASSbyallowingnewfeaturesformerging counters. In a BVASS the value of a counter at a node x in a binary tree is the sum of the values of that counter at the children of x, plus or minus some constant specified by the transition relation. In EBVASS constraints can be added modifying this behavior. In particular (see Section 3 for a more precise definition) it can enforce the following at node x: one of the counters of its left child and one of the counters of its right child are decreased by the same arbitrary number n, then the sum is performed as for BVASS, and finally, one of the resulting counters is increased by n. The reduction from FO2(<,+1,∼) to EBVASS goes via a new model of data tree automata, denoted DTA#. Our first result (Section 2) shows that languages of data trees definable in FO2(<,+1,∼) are also recognizable by DTA#. Moreover the construction of the automaton from the formula is effective. Ourautomata model is a non-trivial extension from data words to data trees of the Data Automata (DA) model of [5], chosen with care in order to be powerful enough to capture the logic but also not too powerful in order to match its computational power. The obvious extensions of DA to data trees are either too weak to capture FO2(<,+1,∼) or too expressive and undecidable (see Proposition 1). Here we consider the strongest of these extensions, called DTA, which is undecidable, and restrict it into a model called DTA# with an associated emptiness problem is equivalent to satisfiability of FO2(<,+1,∼). Our second result (Section 3) shows that the emptiness problem for DTA# reduces to the reachability problem for EBVASS. Finally we show in Section 4 that the latter problem can be reduced to the satisfiability of FO2(<,+1,∼), closing the loop. Altogether, this im- plies that showing (un)decidability of any of these problems would show (un)decidability of the threeof them. Although this question of (un)decidability remains open, the equivalence shown in this paper between the decidability of these three problems, the definition of the intermediate model DTA# and the techniques used for proving the interreductions provides a better understanding of the three problems, and in particular of the emptiness of the branching vector addition systems with states. Related work. Therearemanyotherworksintroducingautomataorlogicalformalism for data words or data trees. Some of them are shown to be decidable using counter automata, see for instance [9, 13]. The link between counter automata and data automata isnotsurprisingasthelatteronlycomparedatavaluesviaequality. Hencetheyareinvariant underpermutationofthedatadomainandtherefore,often,itisenoughtocountthenumber of data values satisfying some properties instead of knowing their precise values. FO2(<,+1,∼) ON DATA TREES 3 1. Preliminaries In this paper A or B denote finite alphabets while D denotes an infinite data domain. We use E or F when we do not care whether the alphabet is finite or not. We denote by E the # extension of an alphabet E with a new symbol # that does not occur in E. Unranked ordered data forests. We work with finite unranked ordered trees and forests over an alphabet E, defined inductively as follows: for any a ∈ E, a is a tree. If t ,··· ,t is a 1 k finite non-empty sequence of trees then t +···+t is a forest. If s is a forest and a ∈ E, 1 k then a(s) is a tree. The set of trees and forests over E are respectively denoted Trees(E) and Forests(E). A tree is called unary (resp. binary) when every node has at most one (resp. two) children. We use standard terminology for trees and forests defining nodes, roots, leaves, parents, children, ancestors, descendants, following and preceding siblings. Given a forest t ∈ Forests(E), and a node x of t, we denote by t(x) the label of x in t. We say that two forests t ∈ Forests(E ) and t ∈ Forests(E ) have the same domain 1 1 2 2 if there is a bijection from the nodes of t to the nodes of t that respects the parent 1 2 and the next-sibling relations. In this case we identify the nodes of t with the nodes 1 of t and the difference between t and t lies only in the label associated to each node. 2 1 2 Given two forests t ∈ Forests(E ), t ∈ Forests(E ) having the same domain, we define 1 1 2 2 t ⊗t ∈ Forests(E ×E ) as the forest over the same domain and such that for all nodes x, 1 2 1 2 t ⊗t (x) = ht (x),t (x)i. 1 2 1 2 The set of data forests over a finite alphabet A and an infinite data domain D is defined as Forests(A×D). Note that every t ∈ Forests(A×D) can be decomposed into a ∈ Forests(A) and d∈ Forests(D) such that t = a⊗d. Logics on data forests. A data forest of Forests(A×D) can be seen as a relational model for first order logic. The domain of the model is the set of nodes in the forest. There is a unary relation a(x) for all a ∈ A containing the nodes of label a. There is a binary relation x ∼ y containingallpairsofnodescarryingthesamedatavalueofD,andbinaryrelationsE (x,y) → (y is the sibling immediately next to x), E (x,y) (x is the parent of y), and E , E which ↓ ⇒ (cid:21) are the non reflexive transitive closures respectively of E and E , minus respectively E → ↓ → and E (i.e., they define two or more navigation steps). The reason for this non-standard ↓ definition of E and E is that it will be convenient that equality, E , E , E and E are ⇒ (cid:21) → ↓ ⇒ (cid:21) disjoint binary relations. We will often make use of the macro, x<>y, and say that x and y are incomparable, when none of x = y, E (x,y), E (x,y), E (x,y) and E (x,y) holds. → ↓ ⇒ (cid:21) Let FO2(<,+1,∼) be the set of first order sentences with two variables built on top of the above predicates. Typical examples of properties definable in FO2(<,+1,∼) are key constraints(allnodesoflabelahavedifferentdatavalues),∀x∀ya(x)∧a(y)∧x∼ y → x = y, and downward inclusion constraints (every node x of label a has a node y of label b in its subtree with the same data value), ∀x∃y a(x) → b(y)∧x ∼ y∧(E (x,y)∨E (x,y)) . (cid:21) ↓ We also consider the extension EMSO2(<,+(cid:0)1,∼) of FO2(<,+1,∼) with existen(cid:1)tially quantifiedmonadicsecondordervariables. EveryformulaofEMSO2(<,+1,∼)hastheform ∃R ...∃R φ where φ is a FO2(<,+1,∼) formula called the core, involving the variables 1 n R ,...,R as unarypredicates. Theextension to fullmonadicsecond order logic is denoted 1 n MSO(<,+1,∼). WewriteMSO(<,+1)forthesetofformulasnotusingthe∼predicates. Theseformulas are ignoring the data values, i.e., they are classical monadic second-order formulas over forests. 4 F.JACQUEMARD,L.SEGOUFIN,ANDJE´RE´MIEDIMINO Automata models for forests. We will informally refer to automata and transducers for forests and unranked trees over a finite alphabet. The particular choice of a model of automata is not relevant here and we refer to [8, Chapters 1,6,8] for a detailed description. Asetof forests accepted by anautomaton iscalled aregular language andregular languages are exactly those definable in MSO(<,+1). Automata models for data forests. Given a data forest t = b⊗d∈ Forests(B×D) and a data value d ∈ D, the class forest t[d] of t associated to the datum d is the forest of Forests(B ) # having the same domain as t and such that t[d](x) = b(x) if d(x) = d and t[d](x) = # otherwise. (a,1) a # (b,1) (b,1) b b # # (c,1)(b,1)(a,1) (a,2)(b,2)(a,1) c b a # # a # # # a b # Figure 1: A forest t followed by its class forests t[1] and t[2] We now define two models of automata over data trees. The first and most general one is a straightforward generalization to forests of the automata model over data words of [5]. The second one adds a restriction in order to avoid undecidability. General Data Forest Automata model: DTA.. A DTA is a pair (A,B) where A is a non- deterministic letter-to-letter transducer taking as input a forest in Forests(A) and returning a forest in Forests(B) with the same domain, while B is a forest automaton taking as input a forest in Forests(B ). Intuitively a DTA works as follows on a forest t = a⊗d: first the # transducer A relabels the nodes of a into b and the forest automaton B has to accept all class forests of b⊗d. More formally a data forest t = a⊗d ∈ Forests(A×D) is accepted by (A,B) iff (1) there exists b ∈ Forests(B) such that b is a possible output of A on a and, (2) for all d ∈ D, the class forest (b⊗d)[d] ∈ Forests(B ) is accepted by B. # Over data words this model was shown to be decidable [5]. Unfortunately it is undecidable over data trees. Proposition 1. Emptiness of DTA is undecidable. Proof. WeshowthatDTAcansimulate theClassAutomata of[3]. Thislatter modelhasan undecidable emptiness problem, already when restricted to data words, i.e., forests of the form ha ,d i+...+ha ,d i. It captures indeed the class of languages of words - without 1 1 m m data - recognized by counter automata. Like Data Automata, Class Automata are defined as pairs made of one transducer A and one word automaton B. However, the B part in the Class Automata model has access to the label of the nodes that are not in the class, while it sees only # in the Data Automata case. This extra power implies undecidability. WeassumetwofinitealphabetsAandB,writingthelatterinextenso asB ={b ,...,b }. 1 n A class automaton over A×D is a pair C = (A,B) where A is a non-deterministic letter-to- letter word transducer from A into B and B is a word automaton taking as input words over the alphabet B×{0,1}. In order to define the acceptance of data words by class automata, we shall use a notion of class word associated to a data word w = b⊗d and a value d∈ D, FO2(<,+1,∼) ON DATA TREES 5 denoted wJdK, defined as the word having the same domain as w and such that, for every node x of w, wJdK(x) = hb(x),1i if d(x) = d and wJdK(x) = hb(x),0i otherwise. A data word w = a⊗d is accepted by C iff (1) there exists a word b over B such that b is a possible output of A on a and, (2) for all d ∈ D, the class word (b⊗d)JdK is accepted by B. Given a class automaton C = (A,B) over A×D, we construct a DTA C′ such that C accepts a data word iff C′ accepts a data tree. The idea of the reduction is that we replace each letter b by a tree of depth i. Hence, even if b is replaced by # during the run of C′ i i (conversion to class word), this label can still be recovered. Let O be a new alphabet containing the two symbols b and #. For any symbol s and 1 ≤ i ≤ n, let si be the unary data tree of depth i defined recursively by: s1 = s and si+1 = s(si). We associate to a data word w = hb ,d i + ... + hb ,d i a data forest i1 1 im m wˆ ∈ Forests(O×D) defined by wˆ = hb,d ii1+1+...+hb,d iim+1 . 1 m From the word automaton B we(cid:0)can construct a forest automa(cid:1)ton B′ accepting exactly the set of class forests wˆ[d] such that wJdK is accepted by B, for all d∈ D. The bestway to see this is to use MSO(<,+1) logic. Thelanguage recognized by B can be defined by a formula ϕ of MSO(<,+1). The formula corresponding to B′ is constructed by replacing in ϕ each atom of the form hb ,1i by b (x) and each atom of the form hb ,0i i i i by a formula testing that x has label # and that the subtree rooted at x has depth i. From there it is now easy to construct an A′ such that the DTA (A′,B′) accepts a data forest iff the class automaton C = (A,B) accepts a data word. Restricted Data Forest Automata model: DTA#. The second data tree automata model we consider is defined as DTA with a restriction on B. The restriction makes sure that B ignores repeated and contiguous occurrences of # symbols. This ensures that for each class forest t[d], not only the automata cannot see the label of a node whose data value is not d, but also can not see the shape of subtrees of nodes whose data value differs from d. In particular it can no longer count the number of # symbols in a subtree and the undecidability proof of Proposition 1 no longer works. A set L ⊆ Forests(B) is called #-stuttering iff it is closed under the rules depicted in Figure 2. Intuitively these rules should be understood as follows: if a subforest is matched by a left-hand side of a rule (when the variables x and y are replaced by (possibly empty) forests), then replacing this subforest by the corresponding right-hand side (with the same variable replacement) yields a forest also in L, and the other way round. # ←→ # # + # ←→ # # + # ←→ # # +y ←→ y+ # # x x x x x x x x Figure 2: Closure rules for #-stuttering sets. x represents an arbitrary forest. For instance if L is #-stuttering and contains the trees t[1] and t[2] of Figure 1, then it should also contain the trees in Figure 3. Typical examples of #-stuttering languages are those testing that notwo nodes of label aoccurint[d](key constraint)orthateach nodeoflabelahasadescendantoflabelbint[d] 6 F.JACQUEMARD,L.SEGOUFIN,ANDJE´RE´MIEDIMINO a # # # # # b b # # # # # # # a b # c b a # a # # a b # # a b # a b # a b # Figure 3: Closure of {t[1],t[2]} of Figure 1. (inclusion constraint). Other typical #-stuttering languages are those defined by formulas of the form ∀x∀y a(x)∧b(y)∧x ∼ y → ¬E (x,y). Indeed the #-stuttering rules do not ⇒ affect the relationship between pairs of nodes not labeled by #. Typical examples of languages that are not #-stuttering are those counting the num- ber of nodes of label #. Note that #-stuttering languages are closed under union and intersection. We define DTA# as those DTA (A,B) such that the language recognized by B is #- stuttering. We conclude this section with the following simple lemma whose proof is a straight- forward Cartesian product construction. We use the term letter projection for a relabeling function defined as h: A′ → A, where A and A′ are alphabets. Lemma 2. The class of DTA# languages is closed under union, intersection and letter projection. 2. From FO2(<,+1,∼) to DTA# In this section we show the following result. Theorem 3. Given a formula φ in FO2(<,+1,∼), there exists a DTA#, effectively com- putable from φ, accepting exactly the set of data forests satisfying φ. Theproofworksintwosteps. Inthefirststepweprovideanormalformforsentences of FO2(<,+1,∼) that is essentially an EMSO2(<,+1,∼) formula whose core is a conjunction of simple formula of FO2(<,+1,∼). In a second step, we show that each of the conjunct can be translated into a DTA#, and we conclude using composition of these automata by intersection, see Lemma 2. 2.1. Intermediate Normal Form. Weshow firstthatevery FO2(<,+1,∼)formulaφcan be transformed into an equivalent EMSO2(<,+1,∼) formula in intermediate normal form: ∃R ···∃R χ 1 k i ^i where each χ has one of the following forms: i ∀x∀y α(x)∧β(y)∧δ(x,y) → γ(x,y) (2.1) ∀x∃y α(x) → (β(y)∧δ(x,y)∧ǫ(x,y)) (2.2) where each of α and β is a type, that is, a conjunction of unary predicates or their negation (these unary predicates are either from A or from R ,...,R , i.e., introduced by the exis- 1 k tentially quantified variables), δ(x,y) is either x ∼ y or x 6∼ y, γ(x,y) is one of ¬E (x,y), ⇒ FO2(<,+1,∼) ON DATA TREES 7 ¬E (x,y) or ¬(x<>y), and ǫ(x,y) is one of x = y, E (x,y), E (y,x), E (x,y), E (y,x), (cid:21) → → ↓ ↓ E (x,y), E (y,x), E (x,y), E (y,x), x<>y or false. ⇒ ⇒ (cid:21) (cid:21) This normal form is obtained by simple syntactical manipulation similar to the one given in [5] for the data words case, and detailed below. Scott normal form. We first transform the formula φ into Scott Normal Form obtaining an EMSO2(<,+1,∼) formula of the form: ψ = ∃R ...∃R ∀x∀y χ∧ ∀x∃y χ 1 m i ^i where χ and every χ are quantifier free, and R ,...R are new unary predicates (called i 1 m monadic). This transformation is standard: a new unary predicate R is introduced for θ each subformula θ(x) with one free variable for marking the nodes where the subformula holds. The subformula θ(x) is then replaced by R (x) and a conjunct ∀x R (x) ↔ θ(x) is θ θ added. This yields a formula in the desired normal form. (cid:0) (cid:1) From Scott to intermediate normal form. We show next that every conjunct of the core of the formula ψ in Scott Normal Form can be replaced by an equivalent conjunction of formulas of the form (2.1) or (2.2), possibly by adding new quantifications with unary predicates upfront. Case ∀x∀y χ. Recall that with our definition, the binary relations E , E , E , E ,<>and → ⇒ ↓ (cid:21) = are pairwise disjoint. Hence we can rewrite ∀x∀y χ into an equivalent FO2(<,+1,∼) formula in the following form, x = y → ψ (x,y) ∧ E (x,y) → ψ (x,y) ∧ E (x,y) → ψ (x,y) ∀x∀y = → → ⇒ ⇒ (cid:18)∧ x<>y → ψ<>(x,y) ∧ E↓(x,y) → ψ↓(x,y) ∧ E(cid:21)(x,y) → ψ(cid:21)(x,y) (cid:19) where every subformula ψ is (cid:1)quantifier free and only involves the predicate ∼ together ∗ with unary predicates. They can be obtained from χ via conjunctive normal form and De Morgan’s law. The resulting formula is equivalent to the conjunction ∀x∃y x = y∧ψ (x,y) = ∧ ∀x∃y (cid:0)¬last(x) → (E→((cid:1)x,y)∧ψ→(x,y)) ∧ ∀x∃y (cid:0)¬leaf(x) → (E↓(x,y)∧ψ↓(x,y)) (cid:1) ∧ ∀x∀y E(cid:0)⇒(x,y) → ψ⇒(x,y) (cid:1) ∧ ∀x∀y E (x,y) → ψ (x,y) (cid:21) (cid:21) ∧ ∀x∀y x<>y → ψ (x,y) <> where leaf(x) is a new predicate denoting the leaves of the forest and last(x) is also a new predicate denoting nodes having no right sibling. The predicate leaf is specified by the following formulas, that have the desired form. ∀x∀y E (x,y)∧leaf(x) → false ↓ ∀x∃y ¬leaf(x) → E (x,y) ↓ Similar formulas specify the predicate last. The first three conjuncts, with quantifier prefix ∀x∃y, will be treated later when dealing with the second case. For the next three conjuncts, putting ¬ψ , ¬ψ , ¬ψ in disjunctive normal form (with ⇒ (cid:21) <> an exponential blowup), we rewrite ψ , ψ , ψ as a conjunction of formulas of the form ⇒ (cid:21) <> ¬(α(x)∧β(y)∧δ(x,y)), where α, β, and δ are as in (2.1). By distribution of conjunction 8 F.JACQUEMARD,L.SEGOUFIN,ANDJE´RE´MIEDIMINO over implication, and by contraposition, we obtain for the 3 cases an equivalent conjunction of formulas of the following form (matching the desired form (2.1)) ∀x∀y α(x)∧β(y)∧δ(x,y) → ¬E (x,y) ⇒ ∀x∀y α(x)∧β(y)∧δ(x,y) → ¬E (x,y) (cid:21) ∀x∀y α(x)∧β(y)∧δ(x,y) → ¬(x<>y) Case ∀x∃y χ. We first transform χ (with an exponential blowup) into an equivalent dis- junction of the form χ′ = α (x)∧β (y)∧δ (x,y)∧ǫ (x,y) j j j j _j where α , β , δ and ǫ are as in (2.2). Next, in order to eliminate the disjunctions, we add j j j j a new monadic second-order variables R , that we existentially quantify upfront of the χ,j global formula, and transform ∀x∃y χ′ into the conjunction ∀x∃y (α (x)∧R (x)) → (β (y)∧δ (x,y)∧ǫ (x,y))∧∀x∃y R (x) j χ,j j j j χ,j ^j (cid:0)_j (cid:1) The first conjuncts express that if R (x) holds, then there exists a node y such that the χ,j corresponding conjunct of χ′ holds, and the last conjunct expresses that for all node x, at least one of the R (x) must hold and can be rewritten as ∀x∃y ¬R (x) → false . χ,j χ,j Now all the conjuncts are as in (2.2) and we are done. (cid:0)V (cid:1) 2.2. Case analysis for constructing DTA# from intermediate normal forms. We now show how to transform a formula in intermediate normal form into a DTA#. Let A be theinitialalphabetandletA′ bethenewalphabetformedbycombiningletters ofAwiththe newly quantified unary predicates R ,...,R . By closure of DTA# under intersection and 1 k letter projection (Lemma 2), it is enough to construct a DTA# automaton for each simple formula of the form (2.1) or (2.2), accepting the data forests in Forests(A′ ×D) satisfying the formula. We do a case analysis depending on the atoms involved in the formula of the form (2.1) or (2.2). For each case we construct a DTA# (A,B) recognizing the set of data forests satisfying the formula. The construction borrows several ideas from the data word case [5], but some extra work is needed as the tree structure is more complicated. In the discussion below, a node whose label satisfies the type α will be called an α-node. Many of the cases build on generic constructions that we described in the following remark. Remark1. ADTA# (A,B)canbeusedtodistinguishonespecificdatavalue,byrecoloring, with A, all the nodes carrying the data value, and checking, with B, the correctness of the recoloring. We will then say that (A,B) marks a data value using the new color c. This can be done as follows. The transducer A marks (i.e. relabel the node by adding to its current label an extra color) a node x with this data value with a specific new color c′. At the same time it guesses all the nodes sharing the same data value as x and marks each of them with a new color c. Then, the forest automaton B checks, for every data value, that either none of the nodes are marked with c or c′, or that all nodes not labeled with # are marked with c or c′ and that c′ occurs exactly once in the same class forest. Note that this defines a #-stuttering language. It is now clear that for the run to be accepting, A must color exactly one data value and that all the nodes carrying this data value must bemarked with c or c′. The transducer A can then build on this fact for checking other properties. FO2(<,+1,∼) ON DATA TREES 9 A generic example of the usefulness of this remark is given below. Once an arbitrary data valueismarkedwithacolorc,thenapropertyoftheform∀x∀y α(x)∧β(y)∧x 6∼ y → γ(x,y) is a conjunction of ∀x∀y α(x) ∧c(x)∧β(y)∧¬c(y) → γ(x,y) with ∀x∀y α(x)∧¬c(x)∧ β(y)∧x 6∼ y → γ(x,y). The first part, ∀x∀y α(x)∧c(x)∧β(y)∧¬c(y) → γ(x,y) is now a regular property and can therefore be tested by A. Hence it is enough to consider the case where x does not carry the marked data value. The same reasoning holds if two data values are marked or if the formula starts with a ∀x∃y quantification. We will use this fact implicitly in the case analysis below. Given a data forest, a vertical path is a set of nodes containing exactly one leaf and all its ancestors and nothing else. A horizontal path is a set of nodes containing one node together with all its siblings and nothing else. We start with formulas of the form (2.1). Case 1: ∀x∀y α(x)∧β(y)∧x ∼ y → γ(x,y), where γ(x,y) is as in (2.1). These formulas express a property of pairs of nodes with the same data value. We have seen that those are #-stuttering languages that can be tested by the forest automaton B solely (i.e., by a DTA# with A doing nothing). Case 2: ∀x∀y α(x)∧β(y)∧x 6∼ y → ¬E (x,y). This formula expresses that a data forest ⇒ cannotcontain anα-nodehavingaβ-nodewithadifferentdatavalueasasiblingtoitsright, except if it is the next-sibling. Let X be an horizontal path in a data forest t containing at least one α-node, and let x be the leftmost α-node in X. Let d be the data value of x. Consider an α-node x′ and a β-node y′ that make the formula false within X, in particular we have E (x′,y′). Then, if y′ has a data value different from d we already have E (x,y′) ⇒ ⇒ and the formula is also false for the pair (x,y′). Hence the validity of the formula within X can be tested over pairs (x′,y′) such that either x′ or y′ has data value d. With this discussion in mind we construct (A,B) as follows. In every horizontal path X containing one α-node, the transducer A identify the leftmost occurrence x of an α-node in X, and marks it with a new color c′, and marks all the nodes of X with the same data value as x with a color c. As in Remark 1, the forest automaton B checks that the guesses are correct, i.e. it accepts only forests in which every horizontal path X satisfy one of the following conditions: X contains one occurrence of the color c′ and all other nodes of X not labeled with # are marked with c, or X contains none of the colors c and c′ at all. All these properties define regular and #-stuttering languages, and hence can be checked by a forest automaton B. Assuming this, the transducer A rejects if there are some unmarked β-nodes occurring as a right sibling (except for the next-sibling) of a marked α-node or there is an unmarked α-node as left sibling, except for the previous sibling, of a marked β-node. As explained in Remark 1, this is a regular property. Case 3: ∀x∀y α(x)∧β(y)∧x 6∼ y → ¬E (x,y). The property expressed by this formula (cid:21) is similar to the previous case, replacing the right sibling relationship with the descendant relationship. Let X be a vertical path in a data forest t containing at least one α-node, and let x be the α-node in X the closest to the root. Let d be the data value of x. Consider an α-node x′ and a β-node y′ that make the formula false within X, in particular we have E (x′,y′). (cid:21) Then, if y′ has a data value different from d we already have E (x,y′) and the formula is (cid:21) 10 F.JACQUEMARD,L.SEGOUFIN,ANDJE´RE´MIEDIMINO z z z 1 2 x x 1 2 α α d d 1 2 Figure 4: Subcase 4.1 in the proof of Theorem 3. also false for the pair (x,y′). Hence the validity of the formula within X can be tested over pairs (x′,y′) such that either x′ or y′ has data value d. The construction of (A,B) is similar to the previous case, except that different vertical paths may share some nodes. The transducer A marks all the α-nodes that have no α-node as ancestor, with a new color c′. Then, for every node x marked c′, A guesses all the nodes inside the subtree rooted at x having the same data value as x and mark them with a new color c. AsinRemark1,theforestautomaton B checks thattheguessesofcolors arecorrect for each vertical path (see also the previous case). Assuming this, the transducer A rejects if there are an unmarked β-node that is a descendant, but not a child, of a marked α-node or there is an unmarked α-node as an ancestor, except for the parent, of a marked β-node. This is a regular property that can be checked by A in conjunction with the marking, following the principles of Remark 1. Case 4: ∀x∀y α(x)∧β(y)∧x 6∼ y → ¬(x<>y). The formula expresses that every two nodes of type respectively α and β and with different data values cannot be incomparable. Recall that two nodes are incomparable if they are not ancestors and not siblings. Subcase 4.1: There exist two α-nodes that are incomparable. Let x and x be two incomparable α-nodes and let z be their least common ancestor 1 2 (seeFigure4). Wecanchoosex andx suchthatnoneoftheα-nodesareincomparablewith 1 2 z or sibling of z, because if this was not the case then there is an α-node x incomparable 3 with z or sibling of z, and therefore x is incomparable with x , and we can replace x with 3 1 2 x , continuing with their least common ancestor, a nodewhich is strictly higher than z. Let 3 z and z be the children of z that are respectively ancestors of x and x . Note that by 1 2 1 2 construction, z 6= z . If x = z and there is an α-node x in the subtree of z, different 1 2 1 1 3 from x and incomparable with z , then we replace x by x and proceed. In other words 1 2 1 3 we ensure that if x = z then there is no α-node incomparable with z in the subtree of z. 1 1 2 We proceed similarly to enforce that if x = z then there is no α-node incomparable with 2 2 z in the subtree of z. Notice that we cannot have at the same time x = z and x = z 1 1 1 2 2 because we assumed x and x to be incomparable. All these properties can be specified 1 2 in MSO(<,+1) and therefore can be tested by a forest automaton. Let d and d be the 1 2 respective data values of x and x (possibly d = d ). 1 2 1 2 Consider now a β-node y whose data value is neither d nor d . If y is incomparable 1 2 with z or sibling of z, then the formula cannot be true as it is contradicted by (x ,y). If y 1 is an ancestor of z then, as no α-node is incomparable with z, none is incomparable with y. Hence the formula can only be true with such y. Assume now that y is inside the subtree of z. If y = z and x 6= z , then the formula is contradicted by (x ,y). If y = z and 1 2 2 2 1

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.