A bounded degree SOS hierarchy for polynomial optimization∗

Jean B. Lasserre†, Kim-Chuan Toh‡ and Shouguang Yang§

June 29, 2015

∗The work of the first author is partially supported by a PGMO grant from Fondation Mathématique Jacques Hadamard, and a grant from the European Research Council (ERC-AdG: TAMING project).
†LAAS-CNRS and Institute of Mathematics, University of Toulouse, LAAS, 31031 Toulouse cédex 4, France ([email protected]).
‡Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 ([email protected]).
§Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 ([email protected]).

Abstract

We consider a new hierarchy of semidefinite relaxations for the general polynomial optimization problem $(P):\ f^* = \min\{f(x) : x \in K\}$ on a compact basic semi-algebraic set $K \subset \mathbb{R}^n$. This hierarchy combines some advantages of the standard LP-relaxations associated with Krivine's positivity certificate and some advantages of the standard SOS-hierarchy. In particular it has the following attractive features: (a) In contrast to the standard SOS-hierarchy, for each relaxation in the hierarchy, the size of the matrix associated with the semidefinite constraint is the same and fixed in advance by the user. (b) In contrast to the LP-hierarchy, finite convergence occurs at the first step of the hierarchy for an important class of convex problems. Finally (c) some important techniques related to the use of point evaluations for declaring a polynomial to be zero and to the use of rank-one matrices make an efficient implementation possible. Preliminary results on a sample of nonconvex problems are encouraging.

1 Introduction

We consider the polynomial optimization problem:

$$(P):\quad f^* = \min_x\,\{f(x) : x \in K\} \qquad (1)$$

where $f \in \mathbb{R}[x]$ is a polynomial and $K \subset \mathbb{R}^n$ is the basic semi-algebraic set

$$K = \{x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m\}, \qquad (2)$$

for some polynomials $g_j \in \mathbb{R}[x]$, $j = 1,\dots,m$. In order to approximate (and sometimes solve exactly) (P) one may instead solve a hierarchy of convex relaxations of (P) of increasing sizes, namely for instance:

• Semidefinite relaxations based on Putinar's certificate of positivity on K [24], where the d-th convex relaxation of the hierarchy is a semidefinite program given by

$$\gamma_d = \max_{t,\sigma_j}\ \Big\{\, t \ :\ f - t = \sigma_0 + \sum_{j=1}^m \sigma_j\, g_j \,\Big\}. \qquad (3)$$

The unknowns $\sigma_j$ are sums of squares (SOS) polynomials with the degree bound constraint $\deg(\sigma_j g_j) \le 2d$, $j = 0,\dots,m$, and the expression in (3) is a certificate of positivity on K for the polynomial $x \mapsto f(x) - t$.

• LP-relaxations based on Krivine-Stengle's certificate of positivity on K [15, 27], where the d-th convex relaxation of the hierarchy is a linear program given by

$$\theta_d = \max_{\lambda \ge 0,\, t}\ \Big\{\, t \ :\ f - t = \sum_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \lambda_{\alpha\beta} \Big(\prod_{j=1}^m g_j^{\alpha_j}\,(1-g_j)^{\beta_j}\Big) \Big\}, \qquad (4)$$

where $\mathbb{N}^{2m}_d = \{(\alpha,\beta) \in \mathbb{N}^{2m} : \sum_j (\alpha_j + \beta_j) \le d\}$. The unknowns are $t$ and the nonnegative scalars $\lambda = (\lambda_{\alpha\beta})$, and it is assumed that $0 \le g_j \le 1$ on K (possibly after scaling) and that the family $\{1, g_j\}$ generates the algebra $\mathbb{R}[x]$ of polynomials. Problem (4) is an LP because stating that the two polynomials on both sides of "=" are equal yields linear constraints on the $\lambda_{\alpha\beta}$'s. For instance, the LP-hierarchies from Sherali-Adams RLT [25] and their variants [26] are of this form.

In both cases, $(\gamma_d)$ and $(\theta_d)$, $d \in \mathbb{N}$, provide two monotone nondecreasing sequences of lower bounds on $f^*$, and if K is compact then both converge to $f^*$ as $d$ increases.
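To make the certificate (4) concrete, the products $h_{\alpha\beta} = \prod_j g_j^{\alpha_j}(1-g_j)^{\beta_j}$ indexed by $\mathbb{N}^{2m}_d$ can be enumerated and evaluated at a point as in the following MATLAB sketch (an illustration of ours, not the authors' code; the function names ks_products and exponents are ours):

    % Evaluate all products h_{alpha,beta}(x) = prod_j g_j(x)^alpha_j (1-g_j(x))^beta_j
    % over (alpha,beta) in N^{2m}_d, given gx = [g_1(x); ...; g_m(x)].
    function h = ks_products(gx, d)
    E = exponents(2*numel(gx), d);        % each row is one pair (alpha, beta)
    v = [gx(:); 1 - gx(:)].';             % row vector (g_1..g_m, 1-g_1..1-g_m)
    h = prod(repmat(v, size(E,1), 1) .^ E, 2);
    end

    function E = exponents(q, d)
    % All exponent vectors e in N^q with sum(e) <= d, one per row.
    if q == 1, E = (0:d).'; return; end
    E = [];
    for k = 0:d
        T = exponents(q-1, d-k);
        E = [E; k*ones(size(T,1),1), T];  %#ok<AGROW>
    end
    end

For m = 2 and d = 2 this yields $\binom{2m+d}{d} = \binom{6}{2} = 15$ products, i.e., the number of nonnegative variables $\lambda_{\alpha\beta}$ in the LP (4).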
For more details as well as a comparison of such relaxations, the reader is referred to e.g. Lasserre [20, 17] and Laurent [21], as well as Chlamtac and Tulsiani [7] for the impact of LP- and SOS-hierarchies on approximation algorithms in combinatorial optimization.

Of course, in principle, one would much prefer to solve LP-relaxations rather than semidefinite relaxations (i.e. compute $\theta_d$ rather than $\gamma_d$) because present LP-software packages can solve sparse problems with millions of variables and constraints, which is far from being the case for today's semidefinite solvers. And so the hierarchy (3) applies to problems of modest size only, unless some sparsity or symmetry is taken into account, in which case specialized variants can handle problems of much larger size; see e.g. Waki et al. [30]. However, on the other hand, the LP-relaxations (4) suffer from several serious theoretical and practical drawbacks. For instance, it has been shown in [17, 20] that the LP-relaxations cannot be exact for most convex problems, i.e., the sequence of the associated optimal values converges to the global optimum only asymptotically and not in finitely many steps. Moreover, the LPs of the hierarchy are numerically ill-conditioned. This is in contrast with the semidefinite relaxations (3), for which finite convergence takes place for convex problems where $\nabla^2 f(x^*)$ is positive definite at every minimizer $x^* \in K$ (see de Klerk and Laurent [9, Corollary 3.3]) and occurs at the first relaxation for SOS-convex¹ problems [19, Theorem 3.3]. In fact, as demonstrated in the recent works of Marshall [22] and Nie [23], finite convergence is generic even for non-convex problems.

¹An SOS-convex polynomial is a convex polynomial whose Hessian factors as $L(x)L(x)^T$ for some rectangular matrix polynomial L. For instance, separable convex polynomials are SOS-convex.

1.1 Contribution

This paper is in the vein of recent attempts in Lasserre [18] and Ahmadi and Majumdar [1] to overcome the important computational burden associated with the standard SOS-hierarchy (3). In particular, in [18] we have suggested another hierarchy of convex relaxations which combines some of the advantages of the SOS- and LP-hierarchies (3) and (4). In the present paper we take advantage of attractive features of the SDPT3 solver [28, 29] to provide an effective implementation of this new hierarchy. First preliminary tests on a sample of nonconvex problems are encouraging and suggest that this new hierarchy might be efficient.

This new hierarchy is another type of SOS-hierarchy, labelled BSOS (for hierarchy with bounded degree SOS), with the following attractive features:

• In contrast to the standard SOS-hierarchy (3), for each semidefinite program in the hierarchy, the size $\binom{n+k}{n}$ of the semidefinite matrix variable is now fixed, parametrized by an integer k that one fixes in advance. This integer k determines the degree of a certain SOS polynomial (for instance one may fix k = 2), whence the label BSOS (for "bounded"-SOS). Recall that in the standard SOS-hierarchy (3) the size of the semidefinite matrix variable is $\binom{n+d}{n}$ at rank d in the hierarchy. For instance, with n = 20 variables and k = 2, every BSOS relaxation involves a semidefinite matrix of fixed size $\binom{22}{2} = 231$, whereas the matrix of the standard SOS-hierarchy already has size $\binom{23}{3} = 1771$ at rank d = 3.

• In contrast to the LP-hierarchy (4), finite convergence occurs at the first step in the hierarchy for a large class of convex problems; typically convex problems defined with convex quadratic polynomials or SOS-convex polynomials of degree at most k. Recall that such finite convergence is impossible for the LP-hierarchy (4).
• Just as in the standard SOS-hierarchy (3), there also exists a sufficient condition for finite convergence of the hierarchy. Namely, it suffices to check whether, at an optimal solution of the corresponding SDP, some associated moment matrix is rank-one.

• Last but not least, to implement this hierarchy one uses important techniques that dramatically alleviate the computational burden associated with a standard (careless) implementation. Namely, (a) to declare that two polynomials are identical one uses that their values are equal on finitely many randomly chosen points (instead of equating their coefficients), and (b) the SDP solver SDPT3 [28, 29] can be used to handle efficiently some types of matrices used in our positivity certificate.

Preliminary computational experiments. First we have compared our results with those obtained with the GloptiPoly software [13] (devoted to solving the SOS-hierarchy (3)) on a sample of non-convex problems with up to 20 variables. For problems with low degree (in the initial data) and/or low dimension we obtain the global optimum, whereas good lower bounds are always obtained for problems with high degree or higher dimension (e.g. problems with degree 4 and up to 20 variables).

Next, we have also tested the LP-hierarchy (4) on a sample of convex problems and, as expected, the convergence is very poor and the resulting LPs become ill-conditioned. In addition, the LP can be expensive to solve as the LP data is typically dense. In contrast, the new hierarchy (with smallest value k = 1 of its parameter) converges at the first step even though some of the problems are defined with polynomials of degree larger than 2.

We have also considered a sample of non-convex quadratic problems of the form $\inf\{x^T A x : x \in \Delta\}$, where $\Delta \subset \mathbb{R}^n$ is the canonical simplex and A is a randomly generated real symmetric matrix with r negative eigenvalues and n − r positive eigenvalues. For all problems that could be solved with GloptiPoly (up to n = 20 variables) we obtain the optimal values. For the other problems with n = 40, 50, 100 variables, only the first (dense) relaxation of GloptiPoly can be implemented, and it yields only a lower bound on the global optimum. For those problems, a better lower bound is obtained in a reasonable amount of time by running the BSOS hierarchy.

Finally, we have considered the minimization of quadratic and quartic polynomials (with up to 40 variables) on the Euclidean unit ball intersected with the positive orthant. Again, in those examples only the first SDP-relaxation of GloptiPoly can be implemented, providing only a lower bound. In contrast, BSOS solves all problems at step 2 of the hierarchy in a reasonable amount of time.

Of course this new hierarchy of semidefinite relaxations also has its drawbacks (at least in its present version). Namely, some submatrix (of the matrix used to describe the linear equality constraints of the resulting SDP) is fully dense, and many of these linear constraints are nearly dependent, which yields a lack of accuracy in the optimal solution when the order of relaxation d is increased.

2 Main result

2.1 Notation and definitions

Let $\mathbb{R}[x]$ be the ring of polynomials in the variables $x = (x_1,\dots,x_n)$. Denote by $\mathbb{R}[x]_d \subset \mathbb{R}[x]$ the vector space of polynomials of degree at most d, which has dimension $s(d) = \binom{n+d}{d}$, with e.g. the usual canonical basis $(x^\alpha)$ of monomials. Also, denote by $\Sigma[x] \subset \mathbb{R}[x]$ (resp. $\Sigma[x]_d \subset \mathbb{R}[x]_{2d}$) the space of sums of squares (SOS) polynomials (resp. SOS polynomials of degree at most 2d).
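For instance, with n = 2 and d = 2 one has $s(d) = \binom{4}{2} = 6$, corresponding to the monomial basis $(1, x_1, x_2, x_1^2, x_1x_2, x_2^2)$ of $\mathbb{R}[x]_2$.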
If $f \in \mathbb{R}[x]_d$, we write $f(x) = \sum_{\alpha\in\mathbb{N}^n_d} f_\alpha\, x^\alpha$ in the canonical basis and denote by $\mathbf{f} = (f_\alpha) \in \mathbb{R}^{s(d)}$ its vector of coefficients. Finally, let $\mathcal{S}^n$ denote the space of $n\times n$ real symmetric matrices, with inner product $\langle A, B\rangle = \mathrm{trace}\, AB$. We use the notation $A \succeq 0$ (resp. $A \succ 0$) to denote that A is positive semidefinite (resp. positive definite). With $g_0 := 1$, the quadratic module $Q(g_1,\dots,g_m) \subset \mathbb{R}[x]$ generated by the polynomials $g_1,\dots,g_m$ is defined by

$$Q(g_1,\dots,g_m) := \Big\{ \sum_{j=0}^m \sigma_j\, g_j \ :\ \sigma_j \in \Sigma[x] \Big\}.$$

We briefly recall two important theorems by Putinar [24] and Krivine-Stengle [15, 27], respectively, on the representation of polynomials that are positive on K.

Theorem 1. Let $g_0 = 1$ and let K in (2) be compact.
(a) If the quadratic polynomial $x \mapsto M - \|x\|^2$ belongs to $Q(g_1,\dots,g_m)$ and if $f \in \mathbb{R}[x]$ is strictly positive on K, then $f \in Q(g_1,\dots,g_m)$.
(b) Assume that $0 \le g_j \le 1$ on K for every j, and that the family $\{1, g_j\}$ generates $\mathbb{R}[x]$. If f is strictly positive on K, then

$$f = \sum_{\alpha,\beta\in\mathbb{N}^m} c_{\alpha\beta} \prod_j \Big( g_j^{\alpha_j}\,(1-g_j)^{\beta_j} \Big),$$

for some (finitely many) nonnegative scalars $(c_{\alpha\beta})$.

2.2 The Bounded-SOS-hierarchy (BSOS)

Consider the problem

$$(P):\quad f^* = \min\{f(x) \mid x \in K\},$$

where $K \subset \mathbb{R}^n$ is the basic semi-algebraic set defined in (2), assumed to be compact. Moreover, we also assume that $g_j(x) \le 1$ for all $x \in K$ and $j = 1,\dots,m$, and that $\{1, g_j\}$ generates the ring of polynomials $\mathbb{R}[x]$. With $k \in \mathbb{N}$ fixed, consider the family of optimization problems indexed by $d \in \mathbb{N}$:

$$q^k_d := \sup_{\lambda,t}\ \{\, t \ \mid\ L_d(x,\lambda) - t \in \Sigma[x]_k,\ \lambda \ge 0 \,\},\qquad d \in \mathbb{N}. \qquad (5)$$

Observe that when k is fixed, then for each $d \in \mathbb{N}$:

• Computing $q^k_d$ in (5) reduces to solving a semidefinite program, and therefore (5) defines a hierarchy of semidefinite programs because $q^k_{d+1} \ge q^k_d$ for all $d \in \mathbb{N}$.

• The semidefinite constraint is associated with the condition $L_d(x,\lambda) - t \in \Sigma[x]_k$, and the associated matrix has fixed size $\binom{n+k}{n}$, independent of $d \in \mathbb{N}$, a crucial feature for the computational efficiency of the approach.

• If $k = 0$ then (5) is the linear program (4), and so $\theta_d = q^0_d \le q^k_d$ for all d, k.

Interpretation. For any fixed $d \ge 1$, problem (P) is easily seen to be equivalent to the following problem obtained by adding redundant constraints:

$$(\tilde P):\quad f^* = \min\{f(x) \mid h_{\alpha\beta}(x) \ge 0\ \ \forall (\alpha,\beta)\in\mathbb{N}^{2m}_d\},$$

where $\mathbb{N}^{2m}_d = \{(\alpha,\beta)\in\mathbb{N}^{2m} \mid |\alpha|+|\beta| \le d\}$ and $h_{\alpha\beta} \in \mathbb{R}[x]$ is the polynomial

$$x \mapsto h_{\alpha\beta}(x) := \prod_{j=1}^m g_j(x)^{\alpha_j}\,(1-g_j(x))^{\beta_j},\qquad x \in \mathbb{R}^n.$$

Given $\lambda = (\lambda_{\alpha\beta})$, $(\alpha,\beta)\in\mathbb{N}^{2m}_d$, consider the Lagrangian function

$$x \mapsto L_d(x,\lambda) = f(x) - \sum_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \lambda_{\alpha\beta}\, h_{\alpha\beta}(x),\qquad x \in \mathbb{R}^n.$$

The Lagrangian dual of $(\tilde P)$ is given by

$$(\tilde P^*):\quad \sup_\lambda\ \{\,G_d(\lambda) : \lambda \ge 0\,\},$$

where the function $G_d(\cdot)$ is given by

$$\lambda \mapsto G_d(\lambda) := \inf_{x\in\mathbb{R}^n}\ \{L_d(x,\lambda)\},\qquad \lambda \ge 0.$$

Now, for a fixed λ, the evaluation of $G_d(\lambda)$ is computationally intractable. However, let $k \in \mathbb{N}$ be fixed and observe that

$$G_d(\lambda) = \inf_{x\in\mathbb{R}^n} L_d(x,\lambda) = \sup_t\ \{t \mid L_d(x,\lambda) - t \ge 0\ \ \forall x\} \ \ge\ \sup_t\ \{t \mid L_d(x,\lambda) - t \in \Sigma[x]_k\},$$

where we recall that $\Sigma[x]_k$ is the space of SOS polynomials of degree at most 2k. Moreover,

$$f^* \ \ge\ \sup_{\lambda\ge 0} G_d(\lambda) \ \ge\ q^k_d,\qquad \forall d\in\mathbb{N}.$$

So the semidefinite program (5) can be seen as a tractable simplification of the intractable problem $\sup_{\lambda\ge 0} G_d(\lambda)$. The linear program (4) (which is (5) with k = 0) is an even more brutal simplification, so that $q^k_d \ge q^0_d = \theta_d$ for all d, k. As a matter of fact we have the following more precise and interesting result.

Theorem 2 ([18]). Let $K \subset \mathbb{R}^n$ in (2) be compact with nonempty interior and $g_j(x) \le 1$ for $x \in K$ and $j = 1,\dots,m$. Assume further that the family $\{1, g_j\}$ generates the algebra $\mathbb{R}[x]$.
Let $k \in \mathbb{N}$ be fixed and, for each $d \in \mathbb{N}$, let $q^k_d$ be as in (5). Then:
(a) The sequence $(q^k_d)$, $d \in \mathbb{N}$, is monotone nondecreasing and $q^k_d \to f^*$ as $d \to \infty$.
(b) Moreover, assume that Slater's condition holds, i.e., there exists $x_0 \in K$ such that $g_j(x_0) > 0$ for all $j = 1,\dots,m$. If f and $-g_j$, $j = 1,\dots,m$, are SOS-convex² polynomials of degree at most 2k, then $q^k_1 = f^*$, i.e., finite convergence takes place at the first relaxation in the hierarchy! In particular, when $f, -g_j$ are convex quadratic polynomials then $q^1_1 = f^*$.

²A polynomial $f \in \mathbb{R}[x]$ is SOS-convex if its Hessian $\nabla^2 f$ is an SOS matrix, i.e., $\nabla^2 f(x) = L(x)L(x)^T$ for some matrix polynomial $L \in \mathbb{R}[x]^{n\times p}$ and some $p \in \mathbb{N}$.

Proof. The first result is a direct application of Theorem 1(b) since, for any integers d, k, $f^* \ge q^k_d \ge q^0_d = \theta_d$, and by Theorem 1, $\theta_d \to f^*$ as $d \to \infty$. Next, if Slater's condition holds and f and $-g_j$ are SOS-convex, $j = 1,\dots,m$, then there exist nonnegative Lagrange-KKT multipliers $\lambda \in \mathbb{R}^m_+$ such that

$$\nabla f(x^*) - \sum_j \lambda_j\, \nabla g_j(x^*) = 0;\qquad \lambda_j\, g_j(x^*) = 0,\ \ j = 1,\dots,m.$$

In other words, the Lagrangian $x \mapsto L(x) := f(x) - f^* - \sum_j \lambda_j\, g_j(x)$ is SOS-convex and satisfies $L(x^*) = 0$ and $\nabla L(x^*) = 0$. By Helton and Nie [14, Lemma 8, p. 33], L is SOS (of degree at most 2k).

2.3 The SDP formulation of (5)

To formulate (5) as a semidefinite program one has at least two possibilities, depending on how we state that two polynomials $p, q \in \mathbb{R}[x]_d$ are identical: either by equating their coefficients (e.g. in the monomial basis), i.e., $p_\alpha = q_\alpha$ for all $\alpha \in \mathbb{N}^n_d$, or by equating their values on $\binom{n+d}{n}$ generic points (e.g. randomly generated in the box $[-1,1]^n$). In the present context of (5) we prefer the latter option, since expanding the polynomial $h_{\alpha\beta}(x)$ symbolically to get its coefficients with respect to the monomial basis can be expensive and memory intensive.

Let $\tau = \max\{\deg(f),\ 2k,\ d\cdot\max_j\{\deg(g_j)\}\}$. Then for k fixed and for each d, we get

$$q^k_d = \sup\Big\{\, t \ \Big|\ f(x) - t - \sum_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \lambda_{\alpha\beta}\, h_{\alpha\beta}(x) = \langle Q,\, v_k(x)v_k(x)^T\rangle;\ \ Q \in \mathcal{S}^{s(k)}_+,\ \lambda \ge 0 \,\Big\} \qquad (6)$$

$$= \sup\Big\{\, t \ \Big|\ f(x^{(p)}) = t + \sum_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \lambda_{\alpha\beta}\, h_{\alpha\beta}(x^{(p)}) + \langle Q,\, v_k(x^{(p)})v_k(x^{(p)})^T\rangle,\ \ p = 1,\dots,L;\ \ Q \in \mathcal{S}^{s(k)}_+,\ \lambda \ge 0,\ t \in \mathbb{R} \,\Big\} \qquad (7)$$

where $L := |\mathbb{N}^n_\tau| = \binom{n+\tau}{n}$ and $\{x^{(p)} \in \mathbb{R}^n \mid p = 1,\dots,L\}$ are randomly selected points in $[-1,1]^n$; $s(k) = \binom{n+k}{k}$, and $v_k(x)$ is a vector of a polynomial basis for $\mathbb{R}[x]_k$, the space of polynomials of degree at most k. To be rigorous, if the optimal value $q^k_d$ of (6) is finite, then the above equality holds with probability one, and every optimal solution $(q^k_d, \lambda^*, Q^*)$ of (6) is also an optimal solution of (7).

2.4 Sufficient condition for finite convergence

By looking at the dual of the semidefinite program (7) one obtains a sufficient condition for finite convergence. To describe this dual we need to introduce some notation. For every $p = 1,\dots,L$, denote by $\delta_{x^{(p)}}$ the Dirac measure at the point $x^{(p)} \in \mathbb{R}^n$, so that $\langle q, \delta_{x^{(p)}}\rangle = q(x^{(p)})$ for all $p = 1,\dots,L$ and all $q \in \mathbb{R}[x]$.

With a real sequence $y = (y_\alpha)$, $\alpha \in \mathbb{N}^n_{2\ell}$, denote by $M_\ell(y)$ the moment matrix associated with y. It is a real symmetric matrix with rows and columns indexed by $\mathbb{N}^n_\ell$, and with entries

$$M_\ell(y)(\alpha,\beta) = y_{\alpha+\beta},\qquad \forall\, \alpha,\beta \in \mathbb{N}^n_\ell.$$

Similarly, for $j = 1,\dots,m$, writing $g_j(x) = \sum_\gamma g_{j\gamma}\, x^\gamma$, denote by $M_\ell(g_j\, y)$ the localizing matrix associated with y and $g_j \in \mathbb{R}[x]$. Its rows and columns are also indexed by $\mathbb{N}^n_\ell$, and its entries are

$$M_\ell(g_j\, y)(\alpha,\beta) = \sum_{\gamma\in\mathbb{N}^n} g_{j\gamma}\, y_{\alpha+\beta+\gamma},\qquad \forall\, \alpha,\beta \in \mathbb{N}^n_\ell.$$
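As an illustration of these definitions, the following MATLAB sketch (ours; momfun is a hypothetical handle returning $y_\gamma$ for a given exponent row γ) assembles $M_\ell(y)$ entry-by-entry exactly as defined above:

    function M = moment_matrix(momfun, n, l)
    % Moment matrix M_l(y): rows/columns indexed by all alpha in N^n_l,
    % with entry M(i,j) = y_{alpha_i + alpha_j} = momfun(alpha_i + alpha_j).
    A = exponents_(n, l);                 % all exponents of degree <= l, one per row
    N = size(A, 1);                       % N = nchoosek(n+l, n)
    M = zeros(N);
    for i = 1:N
        for j = i:N
            M(i,j) = momfun(A(i,:) + A(j,:));
            M(j,i) = M(i,j);              % symmetry
        end
    end
    end

    function E = exponents_(n, d)         % as in the sketch of Section 1
    if n == 1, E = (0:d).'; return; end
    E = [];
    for k = 0:d
        T = exponents_(n-1, d-k);
        E = [E; k*ones(size(T,1),1), T];  %#ok<AGROW>
    end
    end

The localizing matrix $M_\ell(g_j\, y)$ is obtained in the same way, replacing the entry momfun(A(i,:)+A(j,:)) by the weighted sum $\sum_\gamma g_{j\gamma}\,$momfun(A(i,:)+A(j,:)+γ) over the monomials of $g_j$.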
The dual of the semidefinite program (7) reads:

$$\tilde q^k_d := \inf_{\theta\in\mathbb{R}^L}\ \sum_{p=1}^L \theta_p\, \langle f, \delta_{x^{(p)}}\rangle$$
$$\text{s.t.}\quad \sum_{p=1}^L \theta_p\, \big(v_k(x^{(p)})\, v_k(x^{(p)})^T\big) \succeq 0,$$
$$\qquad\ \ \sum_{p=1}^L \theta_p\, \langle h_{\alpha\beta}, \delta_{x^{(p)}}\rangle \ge 0,\qquad (\alpha,\beta)\in\mathbb{N}^{2m}_d, \qquad (8)$$
$$\qquad\ \ \sum_{p=1}^L \theta_p = 1.$$

(Notice that the weights $\theta_p$ are not required to be nonnegative.) By standard weak duality in convex optimization, and for every fixed $k \in \mathbb{N}$, one has

$$f^* \ \ge\ \tilde q^k_d \ \ge\ q^k_d,\qquad \forall d \in \mathbb{N}.$$

(In full rigor, for each fixed d the above inequality holds with probability one.) Next, let $s \in \mathbb{N}$ be the smallest integer such that $2s \ge \max[\deg(f);\ \max_j \deg(g_j)]$, and let $r := \max_j \lceil \deg(g_j)/2\rceil$. We have the following verification lemma.

Lemma 1. Let K in (2) be compact with nonempty interior, and assume that there exists $x_0 \in K$ such that $0 < g_j(x_0) < 1$ for all $j = 1,\dots,m$.
(a) For every d sufficiently large (say $d \ge d_0$), the semidefinite program (6) has an optimal solution. Hence, for each fixed $d \ge d_0$, with probability one the semidefinite program (7) also has an optimal solution.
(b) Let $\theta^* \in \mathbb{R}^L$ be an optimal solution of (8) (whenever it exists) and let $y^* = (y^*_\alpha)$, $\alpha \in \mathbb{N}^n_{2s}$, with

$$y^*_\alpha := \sum_{p=1}^L \theta^*_p\, (x^{(p)})^\alpha,\qquad \alpha \in \mathbb{N}^n_{2s}. \qquad (9)$$

• If $\mathrm{rank}\, M_s(y^*) = 1$ then $\tilde q^k_d = f^*$ and $x^* = (y^*_\alpha)$, $|\alpha| = 1$, i.e., $x^* = \sum_{p=1}^L \theta^*_p\, x^{(p)}$, is an optimal solution of problem (P).

• If $M_s(y^*) \succeq 0$, $M_{s-r}(g_j\, y^*) \succeq 0$, $j = 1,\dots,m$, and $\mathrm{rank}\, M_s(y^*) = \mathrm{rank}\, M_{s-r}(y^*)$, then $\tilde q^k_d = f^*$ and problem (P) has $\mathrm{rank}\, M_s(y^*)$ global minimizers, which can be extracted by a linear algebra procedure.

The proof is postponed to the Appendix. Notice that we do not claim that problem (8) always has an optimal solution. Lemma 1 is a verification lemma (or a stopping criterion) based on a sufficient rank condition on $M_s(y^*)$ and $M_{s-r}(y^*)$, provided that an optimal solution $y^*$ exists. When the latter condition holds true, then $f^* = \tilde q^k_d$ and we can stop, as at least one globally optimal solution $x^* \in K$ has been identified.

2.5 On the rank-one matrices of (5) and SDPT3

Note that in the SDP (7), the constraint matrices associated with Q are all dense rank-1 matrices of the form $A_p = v_k(x^{(p)})\, v_k(x^{(p)})^T$. Thus, if we let $v_p = v_k(x^{(p)})$, then the linear maps involved in the equality constraints of the SDP can be evaluated cheaply based on the following formulas:

$$\mathcal{A}(X) := \big[\langle A_p, X\rangle\big]_{p=1}^L = \big[\langle v_p, X v_p\rangle\big]_{p=1}^L,\qquad \mathcal{A}^* y := \sum_{p=1}^L y_p\, A_p = V\,\mathrm{Diag}(y)\,V^T,$$

where $X \in \mathcal{S}^{s(k)}$, $y \in \mathbb{R}^L$, and $V = [v_1,\dots,v_L] \in \mathbb{R}^{s(k)\times L}$. Moreover, one need not store the dense constraint matrices $\{A_p \mid p = 1,\dots,L\}$ but only the vectors $\{v_p \mid p = 1,\dots,L\}$. To solve the SDP (7) efficiently, we need to exploit the rank-1 structure of the constraint matrices during the iterations. Fortunately, the SDPT3 solver [28, 29], based on interior point methods, has already been designed to exploit such a rank-1 structure so as to minimize the memory needed to store the constraint matrices, as well as to minimize the computational cost required to compute the Schur complement matrix arising in each interior-point iteration. More precisely, in each iteration where a positive definite matrix $W \in \mathcal{S}^{s(k)}$ is given, one needs to compute the Schur complement matrix S whose (p,q) element is given by

$$S_{pq} = \langle A_p,\, W A_q W\rangle = \langle v_p v_p^T,\, W v_q v_q^T W\rangle = \langle v_p,\, W v_q\rangle^2,\qquad p,q = 1,\dots,L.$$

It is the combination of these two implementation techniques (point evaluation in the formulation and exploiting rank-one structure in the interior-point algorithm) that makes our implementation of the SOS-hierarchy (5) efficient.
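In MATLAB terms, the whole Schur complement matrix can thus be formed from V alone; a minimal sketch of ours (not SDPT3's internal code):

    % Given V = [v_1 ... v_L] in R^{s(k) x L} and a positive definite W,
    % form S with S(p,q) = <v_p, W v_q>^2 without ever storing any A_p.
    U = V.' * (W * V);    % U(p,q) = v_p' * W * v_q:  O(s(k)^2 L + s(k) L^2) flops
    S = U .^ 2;           % elementwise square gives S(p,q) = <v_p, W v_q>^2

In contrast, forming each $\langle A_p,\, W A_q W\rangle$ from the dense $s(k)\times s(k)$ matrices $A_p$ would require storing all L of them and performing matrix-matrix products for every pair (p, q).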
3 Computational issues

Given $f \in \mathbb{R}[x]_d$, in order to efficiently evaluate the vector of values $f(x^{(p)})$, $p = 1,\dots,L$, we need a convenient representation of the polynomial f(x). In our implementation of BSOS, we use the following data format to input a polynomial:

$$F(i,\, 1:n{+}1) = [\alpha^T,\ f_\alpha],$$

where $f_\alpha$ is the i-th coefficient, corresponding to the monomial $x^\alpha$. Note that the order of enumeration of the coefficients of f(x) is not important. For a given point $z \in \mathbb{R}^n$ such that $z_i \ne 0$ for all $i = 1,\dots,n$, we evaluate f(z) via the following procedure, written in Matlab syntax:

Step 1. Set $P = F(:,\, 1:n)$, $f = F(:,\, n{+}1)$, and $s = (s_1,\dots,s_n)^T$, where $s_i = 1$ if $z_i < 0$, and $s_i = 0$ if $z_i \ge 0$.
Step 2. Compute $\bar s = \mathrm{rem}(Ps, 2)$ and $\bar z = \exp(P \log|z|)$.
Step 3. Compute $f(z) = \langle f^{(a)}, \bar z^{(a)}\rangle - \langle f^{(b)}, \bar z^{(b)}\rangle$, where $f^{(a)} = f(\mathrm{find}(\bar s == 0))$ and $f^{(b)} = f(\mathrm{find}(\bar s == 1))$, and similarly for $\bar z^{(a)}$ and $\bar z^{(b)}$.

(The above procedure can be modified slightly to handle the case when z has some zero components. A runnable version is sketched after the two example problems of Section 4.1 below.) Note that in the above procedure, $\langle f^{(a)}, \bar z^{(a)}\rangle$ and $\langle f^{(b)}, \bar z^{(b)}\rangle$ correspond to the sum of the positive terms and the sum of the negative terms in the evaluation of f(z). By separating the summation of the positive and negative terms in the evaluation of f(z), it is hoped that cancellation errors can be minimized.

We should mention that some of the equality constraints in (7) may be redundant. For the sake of reducing the computational cost and improving the numerical stability, we remove these redundant constraints before solving the SDP. However, as d increases, the linear constraints become more and more nearly dependent, and typically the SDP problem cannot be solved accurately by either SDPT3 or SeDuMi. Another numerical issue which we should point out is that the constraint matrix

$$\begin{bmatrix} \big(h_{\alpha\beta}(x^{(1)})\big)_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \\ \vdots \\ \big(h_{\alpha\beta}(x^{(L)})\big)_{(\alpha,\beta)\in\mathbb{N}^{2m}_d} \end{bmatrix}$$

associated with the nonnegative vector $(\lambda_{\alpha\beta})$ is typically fully dense. Such a matrix consumes too much memory and computational cost when d increases or when m is large.

4 Numerical experiments

We call our approach BSOS (for hierarchy with bounded degree SOS). As mentioned in the Introduction, we conduct experiments on three classes of problems, which are described in the ensuing subsections.

4.1 Comparison of BSOS with GloptiPoly

We construct a set of test functions with 5 constraints. The test functions are mainly generated based on the following two problems:

$(P_1)\quad \min\ f = x_1^2 - x_2^2 + x_3^2 - x_4^2 + x_1 - x_2$
s.t. $0 \le g_1 = 2x_1^2 + 3x_2^2 + 2x_1x_2 + 2x_3^2 + 3x_4^2 + 2x_3x_4 \le 1$
$\phantom{\text{s.t. }} 0 \le g_2 = 3x_1^2 + 2x_2^2 - 4x_1x_2 + 3x_3^2 + 2x_4^2 - 4x_3x_4 \le 1$
$\phantom{\text{s.t. }} 0 \le g_3 = x_1^2 + 6x_2^2 - 4x_1x_2 + x_3^2 + 6x_4^2 - 4x_3x_4 \le 1$
$\phantom{\text{s.t. }} 0 \le g_4 = x_1^2 + 4x_2^2 - 3x_1x_2 + x_3^2 + 4x_4^2 - 3x_3x_4 \le 1$
$\phantom{\text{s.t. }} 0 \le g_5 = 2x_1^2 + 5x_2^2 + 3x_1x_2 + 2x_3^2 + 5x_4^2 + 3x_3x_4 \le 1$
$\phantom{\text{s.t. }} 0 \le x.$

The optimal value of $(P_1)$ is $f(x^*) = -0.57491$, as computed by GloptiPoly3. For BSOS, we get the result $q^{k=1}_{d=1} = -0.57491$, which is the exact result.

The second problem is:

$(P_2)\quad \min\ f = x_1^4 x_2^2 + x_1^2 x_2^4 - x_1^2 x_2^2$
s.t. $0 \le g_1 = x_1^2 + x_2^2 \le 1$
$\phantom{\text{s.t. }} 0 \le g_2 = 3x_1^2 + 2x_2^2 - 4x_1x_2 \le 1$
$\phantom{\text{s.t. }} 0 \le g_3 = x_1^2 + 6x_2^4 - 8x_1x_2 + 2.5 \le 1$
$\phantom{\text{s.t. }} 0 \le g_4 = x_1^4 + 3x_2^4 \le 1$
$\phantom{\text{s.t. }} 0 \le g_5 = x_1^2 + x_2^3 \le 1$
$\phantom{\text{s.t. }} 0 \le x_1,\ 0 \le x_2.$

The optimal value of $(P_2)$ is $f(x^*) = -0.037037$, as computed by GloptiPoly3. The results obtained by BSOS are

$q^{k=3}_{d=1} = -0.041855$, $q^{k=3}_{d=2} = -0.037139$, $q^{k=3}_{d=3} = -0.037087$, $q^{k=3}_{d=4} = -0.037073$, $q^{k=3}_{d=5} = -0.037046$;
$q^{k=4}_{d=1} = -0.038596$, $q^{k=4}_{d=2} = -0.037046$, $q^{k=4}_{d=3} = -0.037040$, $q^{k=4}_{d=4} = -0.037038$, $q^{k=4}_{d=5} = -0.037037$.
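As announced in Section 3, here is a runnable MATLAB version of the evaluation procedure, applied for illustration to the objective of $(P_2)$ in the input format $F(i, 1:n{+}1) = [\alpha^T, f_\alpha]$ (a sketch of ours; the function name evalpoly is ours, and it assumes, as in Section 3, that z has no zero components):

    % Objective of (P2): f = x1^4*x2^2 + x1^2*x2^4 - x1^2*x2^2  (n = 2)
    % F = [4 2  1;            % monomial x1^4 x2^2, coefficient  1
    %      2 4  1;            % monomial x1^2 x2^4, coefficient  1
    %      2 2 -1];           % monomial x1^2 x2^2, coefficient -1
    % fz = evalpoly(F, [-0.5; 0.4], 2);

    function fz = evalpoly(F, z, n)
    % Steps 1-3 of Section 3; assumes all components of z are nonzero.
    P  = F(:, 1:n);  f = F(:, n+1);
    s  = double(z(:) < 0);              % Step 1: sign indicator of z
    sb = rem(P * s, 2);                 % Step 2: parity gives the sign of each monomial at z
    zb = exp(P * log(abs(z(:))));       %         |z|^alpha for every exponent row alpha of P
    fz = f(sb == 0).' * zb(sb == 0) ... % Step 3: sum of positive monomial values ...
       - f(sb == 1).' * zb(sb == 1);    %         ... minus the sum of negative ones
    end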
Based on the above two problems, we increase the degree of the objective function and constraint functions to generate other test instances, which are given explicitly in the Appendix. Table 1 compares the results obtained by BSOS and GloptiPoly3 for the tested instances. We observe that BSOS can give the exact result for those problems with either low degree or low dimension, while also providing a good lower bound for high-degree and high-dimensional problems. In particular, on this sample of problems, k is chosen so that the size of the semidefinite constraint (which is $\binom{n+k}{k}$) is the same as the one needed in GloptiPoly to certify global optimality. Notice then that BSOS succeeds in finding the optimal value even though the positivity certificate used in (7) is not Putinar's certificate (3) used in GloptiPoly. In addition, for most of the test problems, BSOS usually obtains better bounds as d increases, and in most cases the bound is already good enough for small d = 2, 3.

In Table 1, we also use the sufficient condition stated in Lemma 1 to check whether the generated lower bound is indeed optimal. For quite a number of instances, the moment matrix $M_\ell(y^*)$ associated with the optimal solution $\theta^*$ of (8) indeed has numerical rank equal to one (we declare that the matrix has numerical rank one if the largest eigenvalue is at least $10^4$ times larger than the second largest eigenvalue), which certifies that the lower bound is actually the optimal value. We should note that for some of the instances, although the lower bound is actually the optimal value (as declared by GloptiPoly), the rank of the moment matrix $M_\ell(y^*)$ is larger than one.

4.2 Comparison of BSOS with the LP relaxations of Krivine-Stengle on convex problems

Here we compare the performance of BSOS with the LP relaxations of Krivine-Stengle on convex problems, where each test problem has 5 constraint functions in addition to the nonnegativity constraint $x \ge 0$. Note that the LP relaxation problem has exactly the same form as (7), except that the positive semidefinite matrix variable Q is set to 0. We should mention that even though the Krivine-Stengle scheme generates LP problems instead of SDP problems, the size of the corresponding LP problems also increases rapidly with d, as for the BSOS scheme. In particular, in both the LP- and BSOS-relaxations, the dimension of the nonnegative variable λ is $\binom{2m+d}{d}$, and the constraint matrix is fully dense. (The BSOS-relaxations include an additional semidefinite constraint with fixed matrix size $\binom{n+k}{k}$.) The following example illustrates the performance of the LP relaxation method:

$(C_1)\quad \min\ f = x_1^4 + x_2^4 + 2x_1^2x_2^2 - x_1 - x_2$
s.t. $0 \le g_1 = -x_1^4 - 2x_2^4 + 1$
$\phantom{\text{s.t. }} 0 \le g_2 = -2x_1^4 - x_2^4 + 1$
$\phantom{\text{s.t. }} 0 \le g_3 = -x_1^4 - 4x_2^2 + 1.25$
$\phantom{\text{s.t. }} 0 \le g_4 = -4x_1^4 - x_2^4 + 1.25$
$\phantom{\text{s.t. }} 0 \le g_5 = -2x_1^4 - 3x_2^2 + 1.1$
$\phantom{\text{s.t. }} 0 \le x_1,\ 0 \le x_2.$