ebook img

Improving Strategies via SMT Solving PDF

0.28 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Improving Strategies via SMT Solving

Improving Strategies via SMT Solving ∗ Thomas Martin Gawlitza David Monniaux† January 17, 2011 1 1 0 2 n Abstract a J We consider the problem of computing numerical invariants of programs by 4 abstractinterpretation. Ourmethodeschewstwotraditionalsourcesofimprecision: 1 (i)theuseofwideningoperatorsforenforcingconvergencewithinafinitenumberof iterations(ii)theuseofmergeoperations(often,convexhulls)atthemergepointsof ] L thecontrolflowgraph. Itinsteadcomputestheleastinductiveinvariantexpressible P in the domain at a restricted set of program points, and analyzes the rest of the . code en bloc. We emphasize that we compute this inductive invariant precisely. s c For thatwe extendthe strategyimprovementalgorithmofGawlitzaandSeidl [17]. [ If we applied their method directly, we would have to solve an exponentially sized 1 system of abstract semantic equations, resulting in memory exhaustion. Instead, v wekeepthesystemimplicitanddiscoverstrategyimprovementsusingSATmodulo 2 reallineararithmetic (SMT). For evaluatingstrategieswe use linear programming. 1 Our algorithm has low polynomial space complexity and performs for contrived 8 2 examples in the worst case exponentially many strategy improvement steps; this . is unsurprising, since we show that the associated abstract reachability problem is 1 Πp-complete. 0 2 1 1 1 Introduction : v i X Motivation Staticprogramanalysisattemptstoderivepropertiesabouttherun-time r behavior of a program without running the program. Among interesting properties are a the numerical ones: for instance, that a given variable x always has a value in the range [12,41] when reaching a given program point. An analysis solely based on such interval relationsatallprogrampointsisknownasinterval analysis [11]. Morerefinednumerical analyses include, for instance, finding for each program point an enclosing polyhedron for the vector of program variables [13]. In addition to obtaining facts about the values of numerical program variables, numerical analyses are used as building blocks for e.g. pointer and shape analyses. (cid:1) ∗ This work was partially funded bythe ANRproject “ASOPT”. (cid:1)(cid:2)(cid:3)(cid:4)(cid:5) †Both authors from CNRS (Centre national de la recherche scientifique), VERIMAG laboratory. VERIMAG is a joint laboratory of CNRS and Universit´e Joseph Fourier (Grenoble). Email first- [email protected] 1 However, by Rice’s theorem, only trivial properties can be checked automatically [26]. In order to check non-trivial properties we are usually forced to use abstractions. A systematic way for inferring properties automatically w.r.t. a given abstraction is given through the abstract interpretation framework of Cousot and Cousot [12]. This framework safely over-approximates the run-time behavior of a program. When using the abstract interpretation framework, we usually have two sources of imprecision. The first source of imprecision is the abstraction itself: for instance, if the property to be proved needs a non-convex invariant to be established, and our abstraction can only represent convex sets, then we cannot prove the property. Take for instance the C-code y = 0; if (x <= −1 || x >= 1) { if (x == 0) y = 1; }. No matter what the values of the variables x and y are before the execution of the above C-code, after the execution the value of y is 0. The invariant |x| ≥ 1 in the “then” branch is not convex, and its convex hull includes x = 0. Any static analysis method that computes a convex invariant in this branch will thus also include y = 1. In contrast, our method avoids enforcing convexity, except at the heads of loops. The second source of imprecision are the safe but imprecise methods that are used for solving the abstract semantic equations that describe the abstract semantics: such methodssafelyover-approximateexactsolutions,butdonotreturnexactsolutionsinall cases. The reason is that we are concerned with abstract domains that contain infinite ascendingchains,inparticularifweareinterested innumericalproperties: thecomplete latticeofalln-dimensionalclosedrealintervals, usedforintervalanalysis,isanexample. The traditional methods are based on Kleene fixpoint iteration which (purely applied) is not guaranteed to terminate in interesting cases. In order to enforce termination (for the price of imprecision) traditional methods make use of the widening/narrowing approach of Cousot and Cousot [12]. Grossly, widening extrapolates the first iterations of a sequence to a possiblelimit, butcan easily overshoot the desired result. In order to avoidthis,varioustricksareused,including“wideningupto”[27,Sec.3.2],“delayed”or with “thresholds” [6]. However, these tricks, although they may help in many practical cases, are easily thwarted. Gopan and Reps [25] proposed “lookahead widening”, which discovers new feasible paths and adapts widening accordingly; again this method is no panacea. Furthermore, analyses involving widening are non-monotonic: stronger preconditions can lead to weaker invariants being automatically inferred; a rather non- intuitive behaviour. Since our method does not use widening at all, it avoids these problems. Our Contribution We fight both sources of imprecision noted above: • In order to improve the precision of the abstraction, we abstract sequences of if-then-else statements without loops en bloc. In the above example, we are then able to conclude that y 6= 0 holds. In other words: we abstract sets of states only at the heads of loops, or, more generally, at a cut-set of the control-flow graph (a cut-set is a set of program points such that removing them would cut all loops). • Our main technical contribution consists of a practical method for precisely com- puting abstract semantics of affine programs w.r.t. the template linear constraint 2 domains of Sankaranarayanan et al. [42], with sequences of if-then-else state- ments which do not contain loops abstracted en bloc. Our method is based on a strict generalization of the strategy improvement algorithm of Gawlitza and Seidl [17, 18, 21]. The latter algorithm could be directly applied to the problem we solve in this article, but the size of its input would be exponential in the size of the program, because we then need to explicitly enumerate all program paths between cut-nodes which do not cross other cut-nodes. In this article, we give an algorithm with low polynomial memory consumption that uses exponential time in the worst case. The basic idea consists in avoiding an explicit enumeration of all paths through sequences of if-then-else-statements which do not contain loops. Instead we use a SAT modulo real linear arithmetic solver for improving the cur- rentstrategy locally. Forevaluatingeach strategyencounteredduringthestrategy iteration, we use linear programming. • As a byproduct of our considerations we show that the corresponding abstract p p reachability problem is Π -complete. In fact, we show that it is Π -hard even if 2 2 the loop invariant being computed consists in a single x ≤ C inequality where x is a program variable and C is the parameter of the invariant. Hence, exponential worst-case running-time seems to be unavoidable. Related Work Recently, several alternative approaches for computing numerical in- variants (for instance w.r.t. to template linear constraints) were developed: Strategy Iteration Strategy iteration (also called policy iteration) was introduced by Howardforsolvingstochasticcontrolproblems[29,40]andisalsoappliedtotwo-players zero-sum games [28, 39, 45] or min-max-plus systems [7]. Adj´e et al. [2], Costan et al. [9], Gaubert et al. [16] developed a strategy iteration approach for solving the abstract semantic equations that occur in static program analysis by abstract interpretation. Their approach can be seen as an alternative to the traditional widening/narrowing approach. The goal of their algorithm is to compute least fixpoints of monotone self- maps f, where f(x) = min {π(x) | π ∈ Π} for all x and Π is a family of self-maps. The assumption is that one can efficiently compute the least fixpoint µπ of π for ev- ery π ∈ Π. The π’s are the (min-)strategies. Starting with an arbitrary min-stratgy π(0), the min-strategy is successively improved. The sequence (π(k)) of attained min- k strategies results in a decreasing sequence µπ(0) > µπ(1) > ··· > µπ(k) that stabilizes, whenever µπ(k) is a fixpoint of f — not necessarily the least one. However, there are indeed important cases, where minimality of the obtained fixpoint can be guaranteed [1]. Moreover, an important advantage of their algorithm is that it can be stopped at any time with a safe over-approximation. This is in particular interesting if there are infinitely many min-strategies [2]. Costan et al. [9] showed how to use their framework for performing interval analysis without widening. Gaubert et al. [16] extended this work to the following relational abstract domains: The zone domain [33], the octagon domain [34] and in particular the template linear constraint domains [42]. Gawlitza and Seidl [17] presented a practical (max-)strategy improvement algorithm for com- puting least solutions of systems of rational equations. Their algorithm enables them 3 to perform a template linear constraint analysis precisely — even if the mappings are not non-expansive. This means: Their algorithm always computes least solutions of abstract semantic equations — not just some solutions. Acceleration Techniques Gonnord [23], Gonnord and Halbwachs [24] investigated an improvement of linear relation analysis that consists in computing, when possible, the exact (abstract) effect of a loop. The technique is fully compatible with the use of widening, and whenever it applies, it improves both the precision and the performance of the analysis. Gawlitza et al. [20], Leroux and Sutre [31] studied cases where interval analysis can be done in polynomial time w.r.t. a uniform cost measure, where memory accesses and arithmetic operations are counted for O(1). Quantifier Elimination Recent improvements in SAT/SMT solving techniques have made it possible to perform quantifier elimination on larger formulas [36]. Monniaux [37] developed an analysis method based on quantifier elimination in the theory of rational linear arithmetic. Thismethodtargets thesamedomainsas thepresentarticle; it however produces a richer result. It can not only compute the least invariant inside theabstractdomainofaloop, butalsoexpressitasafunctionof thepreconditionofthe loop; the method outputs the source code of the optimal abstract transformer mapping the precondition to the invariant. Its drawback is its high cost, which makes it practical onlyonsmallcodefragments;thus,itsintendedapplicationismodular analysis: analyze very precisely small portions of code (functions, modules, nodes of a reactive data-flow program, ...), and use the results for analyzing larger portions, perhaps with another method, including the method proposed in this article. Mathematical Programming Colo´n et al. [8], Cousot [10], Sankaranarayanan et al. [41] presented approaches for generating linear invariants that uses non-linear constraint solving. Leconte et al. [30] propose a mathematical programming formulation whose constraints define the space of all post-solutions of the abstract semantic equations. The objective function aims at minimizing the result. For programs that use affine assignments and affine guards, only, this yields a mixed integer linear programming formulation for interval analysis. The resulting mathematical programming problems can thenbesolved toguaranteed global optimality bymeans of generalpurposebranch- and-bound type algorithms. 2 Basics Notations B = {0,1} denotes the set of Boolean values. The set of real numbers is denoted by R. The complete linearly ordered set R∪{−∞,∞} is denoted by R. We call two vectors x,y ∈ Rn comparable iff x ≤ y or y ≤ x holds. For f : X → Rm with X ⊆ Rn, we set dom(f) := {x ∈ X | f(x) ∈ Rm} and fdom(f) := dom(f)∩Rn. We denotethei-throw(resp.thej-thcolumn)ofamatrixAbyA (resp.A ). Accordingly, i· ·j A denotes the component in the i-th row and the j-th column. We also use this i·j notation for vectors and mappings f : X → Yk. Assume that a fixed set X of variables and a domain D is given. We consider equations of the form x = e, where x ∈ X is a variable and e is an expression over D. 4 A system E of (fixpoint) equations is a finite set {x = e ,...,x = e } of equations, 1 1 n n where x ,...,x are pairwise distinct variables. We denote the set {x ,...,x } of 1 n 1 n variables occurring in E by X . We drop the subscript whenever it is clear from the E context. For a variable assignment ρ : X → D, an expression e is mapped to a value JeKρ by setting JxKρ := ρ(x) and Jf(e ,...,e )Kρ := f(Je Kρ,...,Je Kρ), where x ∈ X, f is 1 k 1 k a k-ary operator, for instance +, and e ,...,e are expressions. Let E be a system of 1 k equations. We define the unary operator JEK on X → D by setting (JEKρ)(x) := JeKρ for all x = e ∈ E. A solution is a variable assignment ρ such that ρ = JEKρ holds. The set of solutions is denoted by Sol(E). Let D bea complete lattice. We denote the least upper bound and the greatest lower bound of a set X ⊆ D by X and X, respectively. The least element ∅ (resp. the W V W greatest element ∅) is denoted by ⊥ (resp. ⊤). We define the binary operators ∨ and ∧byx∨y := {xV,y}andx∧y := {x,y}for allx,y ∈ D, respectively. For (cid:3) ∈{∨,∧}, we will also cWonsider x (cid:3) ··· (cid:3) Vx as the application of a k-ary operator. This will 1 k causenoproblems,sincethebinaryoperators∨and∧areassociative andcommutative. An expression e (resp. an equation x= e) is called monotone iff all operators occurring in e are monotone. The set X → D of all variable assignments is a complete lattice. For ρ,ρ′ :X → D, we write ρ⊳ρ′ (resp. ρ⊲ρ′) iff ρ(x) < ρ′(x) (resp. ρ(x) > ρ′(x)) holds for all x ∈ X. For d ∈ D, d denotes the variable assignment {x 7→ d | x ∈ X}. A variable assignment ρ with ⊥ ⊳ ρ ⊳ ⊤ is called finite. A pre-solution (resp. post-solution) is a variable assignment ρ such that ρ ≤ JEKρ (resp. ρ ≥ JEKρ) holds. The set of all pre-solutions (resp. the set of all post-solutions) is denoted by PreSol(E) (resp. PostSol(E)). The least fixpoint (resp. the greatest fixpoint) of an operator f : D → D is denoted by µf (resp. νf), provided that it exists. Thus, the least solution (resp. the greatest solution) of a system E of equations is denoted by µJEK (resp. νJEK), provided that it exists. For a pre-solution ρ (resp. for a post-solution ρ), µ JEK (resp. ν JEK) denotes the ≥ρ ≤ρ least solution that is greater than or equal to ρ (resp. the greatest solution that is less than or equal to ρ). From Knaster-Tarski’s fixpoint theorem we get: Every system E of monotone equations over a complete lattice has a least solution µJEK and a greatest solution νJEK. Furthermore, µJEK= PostSol(E) and νJEK= PreSol(E). V W Linear Programming We consider linear programming problems (LP problems for short) of the form sup {c⊤x | x ∈ Rn,Ax ≤ b}, where A ∈ Rm×n, b ∈ Rm, and c ∈ Rn are the inputs. The convex closed polyhedron {x ∈ Rn | Ax ≤ b} is called the feasible space. The LP problem is called infeasible iff the feasible space is empty. An element of the feasible space, is called feasible solution. A feasible solution x that maximizes c⊤x is called optimal solution. LP problems can be solved in polynomial time through interior point methods [32, 43]. Note, however, that the running-time then crucially depends on the sizes of occurring numbers. At the danger of an exponential running-time in contrived cases, we can also instead rely on the simplex algorithm: its running-time is uniform, i.e., 5 independent of the sizes of occurring numbers (given that arithmetic operations, com- parison, storage and retrieval for numbers are counted for O(1)). SAT modulo real linear arithmetic The set of SAT modulo real linear arithmetic formulas Φ is defined through the grammar e ::= c | x | e + e | c · e′, Φ ::= a | 1 2 e ≤ e | Φ ∨Φ | Φ ∧Φ | Φ′. Here, c ∈ R is a constant, x is a real valued variable, 1 2 1 2 1 2 e,e′,e ,e are real-valued linear expressions, a is a Boolean variable and Φ,Φ′,Φ ,Φ 1 2 1 2 are formulas. An interpretation I for a formula Φ is a mappingthat assigns a real value to every real-valued variable and a Boolean value to every Boolean variable. We write I |= Φ for “I is a model of Φ”, i.e., JcKI = c, JxK = I(x), Je +e KI = Je KI +Je KI, 1 2 1 2 Jc·e′KI = c·Je′KI, and: I |= a ⇐⇒ I(a) = 1 I |= e ≤ e ⇐⇒ Je KI ≤ Je KI 1 2 1 2 I |= Φ ∨Φ ⇐⇒ I |= Φ or I |= Φ I |= Φ ∧Φ ⇐⇒ I |= Φ and I |= Φ 1 2 1 2 1 2 1 2 I |= Φ′ ⇐⇒ I 6|= Φ′ A formula is called satisfiable iff it has a model. The problem of deciding, whether or not a given SAT modulo real linear arithmetic formula is satisfiable, is NP-complete. There nevertheless exist efficient solver implementations for this decision problem [15]. In order to simplify notations we also allow matrices, vectors, the operations ≥, <,>,6=,=, and the Boolean constants 0 and 1 to occur. Collecting and Abstract Semantics The programs that we consider in this article use real-valued variables x ,...,x . Accordingly, we denote by x = (x ,...,x )⊤ the 1 n 1 n vector of all program variables. For simplicity, we only consider elementary statements of the form x := Ax +b, and Ax ≤ b, where A ∈ Rn×n (resp. Rk×n), b ∈ Rn (resp. Rk), and x ∈ Rn denotes the vector of all program variables. Statements of the form x := Ax+b are called (affine) assignments. Statements of the form Ax ≤ b are called (affine)guards. Additionally, weallowstatementsoftheforms ;··· ;s ands | ··· | s , 1 k 1 k where s ,...,s are statements. The operator ; binds tighter than the operator |, and 1 k we consider ; and | to be right-associative, i.e., s | s | s stands for s | (s | s ), and 1 2 3 1 2 3 s ;s ;s stands for s ;(s ;s ). The set of statements is denoted by Stmt. A statement 1 2 3 1 2 3 of the form s | ··· | s , where s does not contain the operator | for all i = 1,...,k, is 1 k i called merge-simple. A merge-simple statement s that does not use the | operator at all is called sequential. A statement is called elementary iff it neither contains the operator | nor the operator ;. Rn Rn The collecting semantics JsK:2 → 2 of a statement s ∈Stmt is defined by Jx := Ax+bKX := {Ax+b |x ∈ X}, JAx ≤ bKX := {x ∈ X | Ax ≤ b}, Js ;··· ;s K := Js K◦···◦Js K Js |··· |s KX := Js KX ∪···∪Js KX 1 k k 1 1 k 1 k for X ⊆ Rn. Note that the operators ; and | are associative, i.e., J(s ;s );s K = 1 2 3 Js ;(s ;s )K and J(s | s )| s K = Js | (s |s )K hold for all statements s ,s ,s . 1 2 3 1 2 3 1 2 3 1 2 3 An(affine) program Gisatriple(N,E,st),whereN isafinitesetofprogram points, E ⊆ N ×Stmt×N is a finite set of control-flow edges, and st ∈N is the start program 6 st x :=0 x :=−2x x1 :=−x +1 1 1 1 1 st 1 x ≤1000 x :=0 1 1 4 2 5 1 x2 :=−x1 x1 ≤1000;x2 :=−x1; 3 x2 ≤−1 x2 ≥0 (x2 ≤−1;x1 :=−2x1 |x2 ≥0;x1 :=−x1+1) (a) (b) Figure 1: point. As usual, the collecting semantics V of a program G = (N,E,st) is the least solution of the following constraint system: V[st] ⊇ Rn V[v] ⊇ JsK(V[u]) for all (u,s,v) ∈ E Rn Here, the variables V[v], v ∈ N take values in 2 . The components of the collecting semantics V are denoted by V[v] for all v ∈ N. Let D be a complete lattice (for instance the complete lattice of all n-dimensional closed real intervals). Let the partial order of D be denoted by ≤. Assume that α : 2Rn → D and γ :D → 2Rn form a Galois connection, i.e., for all X ⊆ Rn and all d∈ D, α(X) ≤ d iff X ⊆ γ(d). The abstract semantics JsK♯ : D → D of a statement s is defined by JsK♯ := α◦JsK◦γ. The abstract semantics V♯ of an affine program G= (N,E,st) is the least solution of the following constraint system: V♯[st]≥ α(Rn) V♯[v] ≥ JsK♯(V♯[u]) for all (u,s,v) ∈ E Here, the variables V♯[v], v ∈ N take values in D. The components of the abstract semantics V♯ are denoted by V♯[v] for all v ∈ N. The abstract semantics V♯ safely over-approximates the collecting semantics V, i.e., γ(V♯[v]) ⊇ V[v] for all v ∈N. Using Cut-Sets to improve Precision Usually, only sequential statements (these statements correspond to basic blocks) are allowed in control flow graphs. However, given a cut-set C, one can systematically transform any control flow graph G into an equivalent control flow graph G′ of our form (up to the fact that G′ has fewer program points than G) with increased precision of the abstract semantics. However, for the sake of simplicity, we do not discuss these aspects in detail. Instead, we consider an example: Example 1 (Using Cut-Sets to improve Precision). As a running example throughout the present article we use the following C-code: int x 1 , x 2; x 1 = 0; while (x 1 <= 1000) { x 2 = −x 1; if (x 2 < 0) x 1 = −2 ∗ x 1; else x 1 = −x 1 + 1; } This C-code is abstracted through the affine program G = (N ,E ,st) which is shown 1 1 1 in Figure 1.(a). However, it is unnecessary to apply abstraction at every program point; it suffices to apply abstraction at a cut-set of G . Since all loops contain program point 1 1, a cut-set of G is {1}. Equivalent to applying abstraction only at program point 1 1 7 is to rewrite the control-flow graph w.r.t. the cut-set {1} into a control-flow graph G equivalent w.r.t. the collecting semantic. The result of this transformation is drawn in Figure 1.(b). This means: the affine program for the above C-code is G = (N,E,st), where N = {st,1},E = {(st,x := 0,1),(1,s,1)}, and 1 s′ = x ≤ 1000;x := −x s = x ≤ −1;x := −2x 1 2 1 1 2 1 1 s = −x ≤ 0;x := −x +1 s= s′;(s | s ) 2 2 1 1 1 2 Let V denote the collecting semantics of G and V denote the collecting semantics of 1 1 G. G and G are equivalent in the following sense: V[v] = V [v] holds for all program 1 1 points v ∈ N. W.r.t. the abstract semantics, G is, is we will see, strictly more precise than G . In general we at least have V♯[v] ⊆ V♯[v] for all program points v ∈ N. This 1 1· is independent of the abstract domain.1 Template Linear Constraints In the present article we restrict our considerations to template linear constraint domains [42]. Assume that we are given a fixed template constraint matrix T ∈ Rm×n. The template linear constraint domain is Rm. As shown by Sankaranarayanan et al. [42], the concretization γ : Rm → 2Rn and the abstraction α :2Rn → Rm, which are defined by γ(d) := {x ∈ Rn |Tx ≤ d} ∀d∈ Rm, α(X) := {d ∈ Rm | γ(d) ⊇ X} ∀X ⊆ Rn, V form a Galois connection. The template linear constraint domains contain intervals, zones, and octagons, with appropriate choices of the template constraint matrix [42]. In a first stage we restrict our considerations to sequential and merge-simple state- ments. Even for these statements we avoid unnecessary imprecision, if we abstract such statements en bloc instead of abstracting each elementary statement separately: Example 2. Inthisexampleweusetheintervaldomainasabstractdomain,i.e.,ourcom- plete lattice consists of all n-dimensional closed real intervals. Our affine program will use2variables, i.e., n = 2. Thecompletelattice ofall2-dimensionalclosedrealintervals canbespecifiedthroughthetemplateconstraintmatrixT = −I I ⊤ ∈R4×2,whereI (cid:0) (cid:1) denotes the identity matrix. Consider the statements s = x := x , s = x := x −x , 1 2 1 2 1 1 2 ands = s ;s andtheabstractvalueI = [0,1]×R(a2-dimensionalclosedrealinterval). 1 2 TheintervalI canw.r.t.T beidentifiedwiththeabstractvalue(0,∞,1,∞)⊤. Moregen- erally, w.r.t. T every 2-dimensional closed real interval [l ,u ]×[l ,u ] can be identified 1 1 2 2 with the abstract value (−l ,−l ,u ,u )⊤. If we abstract each elementary statement 1 2 1 2 separately, then we in fact use Js K♯◦Js K♯ instead of JsK♯ to abstract the collecting se- 2 1 mantics JsKofthestatement s = s ;s . Thefollowing calculation showsthatthiscan be 1 2 important: JsK♯I = [0,0]×[0,1] 6= [−1,1]×[0,1] = Js K♯([0,1]×[0,1]) = (Js K♯◦Js K♯)I. 2 2 1 The imprecision is caused by the additional abstraction. We lose the information that the values of the program variables x and x are equal after executing the first state- 1 2 ment. 1We assume that we have given a Galois-connection and thus in particular monotone best abstract transformers. 8 Another possibility for avoiding unnecessary imprecision in the above example would consist in adding additional rows to the template constraint matrix. Although this works for the above example, it does not work in general, since still only convex sets can be described, but sometimes non-convex sets are required (cf. with the example in the introduction). Provided that s is a merge-simple statement, JsK♯d can be computed in polynomial time through linear programming: Lemma 3 (Merge-Simple Statements). Let s be a merge-simple statement and d∈ Rm. Then JsK♯d can be computed in polynomial time through linear programming. However, the situation for arbitrary statements is significantly more difficult, since, by reducing SAT to the corresponding decision problem, we can show the following: Lemma 4. The problem of deciding, whether or not, for a given template constraint matrix T, and a given statement s, JsK♯∞ > −∞ holds, is NP-complete. Before proving the above lemma, we introduce ∨-strategies for statements as follows: Definition 1 (∨-Strategies for Statements). A ∨-strategy σ for a statement s is a function that maps every position of a |-statement, (a statement of the form s | s ) 0 1 within s to 0 or 1. The application sσ of a ∨-strategy σ to a statement s is inductively defined by sσ = s, (s | s )σ = s σ, and (s ;s )σ = (s σ;s σ), where s is 0 1 σ(pos(s0|s1)) 0 1 0 1 an elementary statement, and s ,s are arbitrary statements. For all occurrences s′, 0 1 pos(s′) denotes the position of s′, i.e., pos(s′) identifies the occurrence. Proof. Firstly, we show containment in NP. Assume JsK♯∞ > −∞. There exists some k such that the k-th component of JsK♯∞ is greater than −∞. We choose k non- deterministically. There exists a ∨-strategy σ for s such that the k-th component of JsσK♯∞ equals the k-th component of JsK♯∞. We choose such a ∨-strategy non- deterministically. By Lemma 3, we can check in polynomial time, whether the k-th component of JsσK♯∞ is greater than −∞. If this is fulfilled, we accept. Inordertoshow NP-hardness,wereducetheNP-hardproblemSAT toourproblem. Let Φ be a propositional formula with n variables. W.l.o.g. we assume that Φ is in normal form, i.e., there are no negated sub-formulas that contain ∧ or ∨. We define the statement s(Φ) that uses the variables of Φ as program variables inductively by s(z) := z = 1,s(z) := z = 0,s(Φ ∧Φ ):= s(Φ );s(Φ ),ands(Φ ∨Φ ):= s(Φ ) | s(Φ ), 1 2 1 2 1 2 1 2 where z is a variable of Φ, and Φ ,Φ are formulas. Here, the statement Ax = b is an 1 2 abbreviation for the statement Ax ≤ b;−Ax ≤ −b. The formula Φ is satisfiable iff Js(Φ)KRn 6= ∅ holds. Moreover, even if we just use the interval domain, Js(Φ)KRn 6= ∅ holdsiffJs(Φ)K♯∞ > −∞holds. Thus,ΦissatisfiableiffJs(Φ)K♯∞ > −∞holds. Obviously, J(s | s );sK = Js ;s | s ;sK and Js;(s | s )K = Js;s | s;s K for all state- 1 2 1 2 1 2 1 2 ments s,s ,s . We can transform any statement s into an equivalent merge-simple 1 2 statement s′ using these rules. We denote the merge-simple statement s′ that is ob- tained from an arbitrary statement sby applyingtheabove rules in somecanonical way by [s]. Intuitively, [s] is an explicit enumeration of all paths through the statement s. 9 Lemma 5. For every statement s, [s] is merge-simple, and JsK = J[s]K. The size of [s] is at most exponential in the size of s. However,intheworstcase,thesizeof[s]isexponentialinthesizeofs. Forthestatement s = (s(11) | s(12));··· ;(s(k1) |s(k2)),forinstance,weget[s]= |(a1,...,ak)∈{1,2}ks1(a1);··· ;sk(ak). After replacing all statements s with [s] it is in principle possible to use the methods of Gawlitza and Seidl [17] in order to compute the abstract semantics V♯ precisely. Because of the exponential blowup, however, this method would be impractical in most cases. 2 Our new method that we are going to present avoids this exponential blowup: in- stead of enumerating all program paths, we shall visit them only as needed. Guided by a SAT modulo real linear arithmetic solver, our method selects a path through s only when it is locally profitable in some sense. In the worst case, an exponential number of paths may be visited (Section 7); but one can hope that this does not happen in many practical cases, in the same way that SAT and SMT solving perform well on many practical cases even though they in principle may visit an exponential number of cases. Abstract Semantic Equations The first step of our method consists of rewriting our program analysis problem into a system of abstract semantic equations that is interpreted over the reals. For that, let G = (N,E,st) be an affine program and V♯ its abstract semantics. We define the system C(G) of abstract semantic inequalities to be the smallest set of inequalities that fulfills the following constraints: • C contains the inequality xst,i ≥ αi·(Rn) for every i ∈ {1,...,m}. • C contains the inequality x ≥ JsK♯(x ,...,x ) for every control-flow edge v,i i· u,1 u,m (u,s,v) ∈ E and every i ∈ {1,...,m}. WedefinethesystemE(G)ofabstract semantic equations byE(G) := E(C(G)). Here,for a system C′ = {x ≥ e ,...,x ≥ e ,...,x ≥ e ,...,x ≥ e } of inequalities, 1 1,1 1 1,k1 n n,1 n n,kn E(C′)isthesystemE(C′)= {x = e ∨···∨e ,...,x =e ∨···∨e }ofequations. 1 1,1 1,k1 n n,1 n,kn The system E(G) of abstract semantic equations captures the abstract semantics V♯ of G: Lemma 6. (V♯[v]) = µJE(G)K(x ) for all program points v, i∈ {1,...,m}. i· v,i Example 7 (Abstract Semantic Equations). We again consider theprogram G of Exam- ple1. Assumethatthetemplate constraintmatrix T ∈ R2×2 isgiven byT = (1,0) and 1· T = (−1,0). Let V♯ denote the abstract semantics of G. Then V♯[1] = (2001,2000)⊤. 2· E(G) consists of the following abstract semantic equations: xst,1 = ∞ x1,1 = Jx1 := 0K♯1·(xst,1,xst,2)∨JsK♯1·(x1,1,x1,2) xst,2 = ∞ x1,2 = Jx1 := 0K♯2·(xst,1,xst,2)∨JsK♯2·(x1,1,x1,2) 2 Note that we cannot expect a polynomial-time algorithm, because of Lemma 4: even without loops, abstract reachability is NP-hard. Even if all statements are merge-simple, we cannot expect a polynomial-time algorithm, since the problem of computing the winning regions of parity games is polynomial-time reducible to abstract reachability [19]. 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.