An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model

Erwin Bolthausen*† (Universität Zürich)

*Institute of Mathematics, University of Zürich, e-mail: [email protected]
†Supported by an SNF grant No 200020-125247, and by the Humboldt Society.

1 Introduction

The TAP equations [6] for the Sherrington-Kirkpatrick model describe the quenched expectations of the spin variables in a large system. The standard SK model has the random Hamiltonian on $\Sigma_N \stackrel{\mathrm{def}}{=} \{-1,1\}^N$, $N \in \mathbb{N}$,
\[
H_{N,\beta,h,\omega}(\sigma) \stackrel{\mathrm{def}}{=} -\beta \sum_{1\le i<j\le N} g_{ij}^{(N)}(\omega)\,\sigma_i \sigma_j - h \sum_{i=1}^N \sigma_i,
\]
where $\sigma = (\sigma_1,\ldots,\sigma_N) \in \Sigma_N$, $\beta > 0$, $h \ge 0$, and where the $g_{ij}^{(N)}$, $1 \le i < j \le N$, are i.i.d. centered Gaussian random variables with variance $1/N$, defined on a probability space $(\Omega, \mathcal{F}, P)$. We extend this matrix to a symmetric one by putting $g_{ij} \stackrel{\mathrm{def}}{=} g_{ji}$ for $i > j$, and $g_{ii} \stackrel{\mathrm{def}}{=} 0$. The quenched Gibbs measure on $\Sigma_N$ is
\[
Z_{N,\beta,h,\omega}^{-1} \exp[-H_{N,\beta,h,\omega}(\sigma)],
\]
where $Z_{N,\beta,h,\omega} \stackrel{\mathrm{def}}{=} \sum_\sigma \exp[-H_{N,\beta,h,\omega}(\sigma)]$. We write $\langle\cdot\rangle_{N,\beta,h,\omega}$ for the expectation under this measure. We will often drop the indices $N,\beta,h,\omega$ if there is no danger of confusion. We set
\[
m_i \stackrel{\mathrm{def}}{=} \langle \sigma_i \rangle.
\]
The TAP equations state that
\[
m_i = \tanh\Big( h + \beta \sum_{j=1}^N g_{ij} m_j - \beta^2 (1-q) m_i \Big), \qquad (1.1)
\]
which have to be understood in a limiting sense, as $N \to \infty$. Here $q = q(\beta,h)$ is the solution of the equation
\[
q = \int \tanh^2\big(h + \beta \sqrt{q}\, z\big)\, \phi(dz), \qquad (1.2)
\]
where $\phi(dz)$ is the standard normal distribution. It is known that this equation has a unique solution $q > 0$ for $h > 0$ (see [4], Proposition 1.3.8). If $h = 0$, then $q = 0$ is the unique solution if $\beta \le 1$, and there are two other (symmetric) solutions when $\beta > 1$, which are supposed to be the relevant ones. Mathematically, the validity of the TAP equations has only been proved in the high-temperature case, i.e. when $\beta$ is small, although in the physics literature it is claimed that they are valid also at low temperature; there, however, they have many solutions, and the Gibbs expectation has to be taken inside "pure states". For the best mathematical results, see [4], Chapter 1.7.

The appearance of the so-called Onsager term $\beta^2(1-q)m_i$ is easy to understand. From standard mean-field theory, one would expect an equation
\[
m_i = \tanh\Big( h + \beta \sum_{j=1}^N g_{ij} m_j \Big),
\]
but one has to take into account the stochastic dependence between the random variables $m_j$ and $g_{ij}$. In fact, it turns out that the above equation should be correct when one replaces $m_j$ by $m_j^{(i)}$, where the latter is computed under a Gibbs average dropping the interactions with the spin $i$. Therefore $m_j^{(i)}$ is independent of the $g_{ik}$, $1 \le k \le N$, and one would get
\[
m_i = \tanh\Big( h + \beta \sum_{j=1}^N g_{ij} m_j^{(i)} \Big). \qquad (1.3)
\]
The Onsager term is an Itô-type correction obtained by expanding the dependence of $m_j$ on $g_{ji} = g_{ij}$ and replacing $m_j$ on the right-hand side by $m_j^{(i)}$. The correction term is non-vanishing because of
\[
\sum_j g_{ij}^2 \approx 1,
\]
i.e. exactly for the same reason as for the Itô correction in stochastic calculus. We omit the details, which are explained in [3].

In the present paper, there are no results about the SK model itself. We introduce an iterative approximation scheme for solutions of the TAP equations which is shown to converge below and at the de Almeida-Thouless line, i.e. under condition (2.1) below (see [1]).
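The parameter $q = q(\beta,h)$ from (1.2) is used throughout. For concreteness, here is a minimal numerical sketch (ours, not part of the paper) of how it can be computed: the Gaussian expectation is evaluated by Gauss-Hermite quadrature, and the equation is solved by plain fixed-point iteration. The name `solve_q` and all numerical choices are our own; the iteration converged for every parameter pair we tried, though we make no general claim.

```python
# Minimal numerical sketch (ours, not from the paper): solving the fixed-point
# equation (1.2), q = E[tanh^2(h + beta*sqrt(q) Z)], by fixed-point iteration,
# with E[f(Z)] computed by Gauss-Hermite quadrature.
import numpy as np

def solve_q(beta, h, tol=1e-12, max_iter=10_000):
    """Approximate the solution q = q(beta, h) of (1.2); assumes h > 0."""
    # For Z ~ N(0,1):  E[f(Z)] = (1/sqrt(pi)) * sum_i w_i f(sqrt(2) x_i)
    x, w = np.polynomial.hermite.hermgauss(80)
    z, w = np.sqrt(2.0) * x, w / np.sqrt(np.pi)
    q = 0.5                                   # arbitrary starting point in (0, 1)
    for _ in range(max_iter):
        q_new = float(w @ np.tanh(h + beta * np.sqrt(q) * z) ** 2)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q_new

print(solve_q(beta=1.0, h=0.5))
```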
The AT line is supposed to separate the high-temperature region from the low-temperature one; but although the full Parisi formula for the free energy of the SK model has been proved by Talagrand [5], there is no proof yet that the AT line is the correct phase separation line. The iterative scheme we propose reveals, we believe, an interesting structure of the dependence of the $m_i$ on the family $\{g_{ij}\}$, even below the AT line. The main technical result, Proposition 2.5, is proved at all temperatures, but beyond the AT line it does not give much information.

We finish the section by introducing some notation. If $x,y \in \mathbb{R}^N$, we write
\[
\langle x,y \rangle \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{i=1}^N x_i y_i, \qquad \|x\| \stackrel{\mathrm{def}}{=} \sqrt{\langle x,x \rangle}.
\]
As mentioned above, we suppress $N$ in notations as far as possible, but this parameter is present everywhere. We also define the $N \times N$-matrices $x \otimes_s y$ and $x \otimes y$ by
\[
(x \otimes_s y)_{i,j} \stackrel{\mathrm{def}}{=} \frac{1}{N}\,(x_i y_j + x_j y_i), \qquad (x \otimes y)_{i,j} \stackrel{\mathrm{def}}{=} \frac{x_i y_j}{N}. \qquad (1.4)
\]
If $A$ is an $N \times N$-matrix and $x \in \mathbb{R}^N$, the vector $Ax$ is defined in the usual way (interpreting vectors in $\mathbb{R}^N$ as column matrices). If $f: \mathbb{R} \to \mathbb{R}$ is a function and $x \in \mathbb{R}^N$, we simply write $f(x)$ for the vector obtained by applying $f$ to the coordinates.

$g = (g_{ij})$ is a Gaussian $N \times N$-matrix where the $g_{ij}$ for $i < j$ are independent centered Gaussians with variance $1/N$, and where $g_{ij} = g_{ji}$, $g_{ii} = 0$. We will exclusively reserve the notation $g$ for such a Gaussian matrix.

We will use $Z, Z', Z_1, Z_2, \ldots$ as generic standard Gaussians. Whenever several of them appear in the same formula, they are assumed to be independent, without special mentioning. We then write $E$ when taking expectations with respect to them. (This notation is simply an outflow of the abhorrence probabilists have of using integral signs, as John Westwater once put it.)

If $\{X_N\}$, $\{Y_N\}$ are two sequences of real random variables, defined on $(\Omega, \mathcal{F}, P)$, we write
\[
X_N \simeq Y_N
\]
provided there exists a constant $C > 0$ such that
\[
P\big( |X_N - Y_N| \ge t \big) \le C \exp\big[ -t^2 N / C \big]
\]
for all $N \in \mathbb{N}$, $0 < t \le 1$. Clearly, if $X_N \simeq Y_N$ and $X_N' \simeq Y_N'$, then $X_N + X_N' \simeq Y_N + Y_N'$.

If $X^{(N)} = \big(X_i^{(N)}\big)_{i \le N}$, $Y^{(N)} = \big(Y_i^{(N)}\big)_{i \le N}$ are two sequences of random vectors in $\mathbb{R}^N$, we write $X^{(N)} \approx Y^{(N)}$ if
\[
\frac{1}{N} \sum_{i=1}^N \Big| X_i^{(N)} - Y_i^{(N)} \Big| \simeq 0.
\]
We will use $C > 0$ as a generic positive constant, not necessarily the same at different occurrences. It may depend on $\beta$, $h$, and on the level $k$ of the approximation scheme appearing in the next section, but on nothing else, unless stated otherwise.

In order to avoid endless repetitions of the parameters $h$ and $\beta$, we use the abbreviation
\[
\mathrm{Th}(x) \stackrel{\mathrm{def}}{=} \tanh(h + \beta x).
\]
We always assume $h \ne 0$, and as there is a symmetry between the signs, we assume $h > 0$. $q = q(\beta,h)$ will exclusively be used for the unique solution of (1.2). In the case $h = 0$, $\beta > 1$, there is a unique solution of (1.2) which is positive. Proposition 2.5 is valid in this case, too, but this does not lead to a useful result. So we stick to the case $h > 0$. Gaussian random variables are always assumed to be centered.

2 The recursive scheme for the solutions of the TAP equations

We recursively define a double sequence $m^{(k)} = \big( m_i^{(k)} \big)_{1 \le i \le N,\, k \in \mathbb{N}}$ of random variables by putting
\[
m^{(0)} \stackrel{\mathrm{def}}{=} 0, \qquad m^{(1)} \stackrel{\mathrm{def}}{=} \sqrt{q}\,\mathbf{1},
\]
$\mathbf{1}$ here being the vector with all coordinates equal to $1$, and $q = q(\beta,h)$ the unique solution of (1.2). We define
\[
m^{(k+1)} \stackrel{\mathrm{def}}{=} \mathrm{Th}\big( g m^{(k)} - \beta(1-q)\, m^{(k-1)} \big), \qquad k \ge 1.
\]
$k$ will exclusively be used to number the level of this iteration.
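The scheme is straightforward to simulate. The sketch below (ours, not from the paper) samples one disorder matrix $g$ and runs the recursion; `sample_g` and `tap_iterates` are our own hypothetical helpers, and `solve_q` is the one sketched in the introduction.

```python
# Illustrative sketch (ours, not from the paper): running the scheme
# m^(k+1) = Th(g m^(k) - beta(1-q) m^(k-1)),  Th(x) = tanh(h + beta*x),
# for one sampled disorder matrix.  solve_q is the helper sketched above.
import numpy as np

def sample_g(N, rng):
    """Symmetric Gaussian matrix: variance 1/N off the diagonal, zero on it."""
    a = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    g = np.triu(a, k=1)
    return g + g.T

def tap_iterates(beta, h, N, K, rng):
    """Return the list [m^(1), ..., m^(K)] of TAP iterates."""
    q = solve_q(beta, h)
    g = sample_g(N, rng)
    Th = lambda u: np.tanh(h + beta * u)
    m_prev, m = np.zeros(N), np.sqrt(q) * np.ones(N)   # m^(0) and m^(1)
    out = [m]
    for _ in range(K - 1):
        m_prev, m = m, Th(g @ m - beta * (1.0 - q) * m_prev)
        out.append(m)
    return out

ms = tap_iterates(beta=1.0, h=0.5, N=2000, K=8, rng=np.random.default_rng(0))
```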
Our main result is

Theorem 2.1 Assume $h > 0$. If $\beta > 0$ is below the AT line, i.e. if
\[
\beta^2 E \cosh^{-4}\big( h + \beta\sqrt{q}\,Z \big) \le 1, \qquad (2.1)
\]
then
\[
\lim_{k,k' \to \infty} \limsup_{N \to \infty} E\, \big\| m^{(k)} - m^{(k')} \big\|^2 = 0.
\]
If there is strict inequality in (2.1), then there exist $0 < \lambda(\beta,h) < 1$ and $C > 0$ such that for all $k$
\[
\limsup_{N \to \infty} E\, \big\| m^{(k+1)} - m^{(k)} \big\|^2 \le C \lambda^k.
\]

The theorem is a straightforward consequence of a computation of the inner products $\langle m^{(i)}, m^{(j)} \rangle$, which we explain first. The actual computation of these inner products will be quite involved and will depend on clarifying the structural dependence of $m^{(k)}$ on $g$.

As we assume $h > 0$, we have $q > 0$. We define a function $\psi: [0,q] \to \mathbb{R}$ by
\[
\psi(t) \stackrel{\mathrm{def}}{=} E\, \mathrm{Th}\big( \sqrt{t}\,Z + \sqrt{q-t}\,Z' \big)\, \mathrm{Th}\big( \sqrt{t}\,Z + \sqrt{q-t}\,Z'' \big),
\]
where $Z, Z', Z''$, as usual, are independent standard Gaussians. Remember that $\mathrm{Th}(x) = \tanh(h + \beta x)$. Let $\alpha \stackrel{\mathrm{def}}{=} E\, \mathrm{Th}(\sqrt{q}\,Z) > 0$.

Lemma 2.2
a) $\psi$ satisfies $0 < \psi(0) = \alpha^2 < \psi(q) = q$, and is strictly increasing and convex on $[0,q]$.
b) $\psi'(q) = \beta^2 E \cosh^{-4}(h + \beta\sqrt{q}\,Z)$.

Proof. $\psi(0) = \alpha^2$ and $\psi(q) = q$ are evident from the definitions of $\alpha$ and $q$. Abbreviating $A' \stackrel{\mathrm{def}}{=} \sqrt{t}\,Z + \sqrt{q-t}\,Z'$ and $A'' \stackrel{\mathrm{def}}{=} \sqrt{t}\,Z + \sqrt{q-t}\,Z''$, we compute the first two derivatives of $\psi$:
\[
\psi'(t) = \frac{1}{\sqrt{t}}\, E\big[ Z\, \mathrm{Th}'(A')\, \mathrm{Th}(A'') \big] - \frac{1}{\sqrt{q-t}}\, E\big[ Z'\, \mathrm{Th}'(A')\, \mathrm{Th}(A'') \big]
\]
\[
= E\, \mathrm{Th}''(A')\, \mathrm{Th}(A'') + E\, \mathrm{Th}'(A')\, \mathrm{Th}'(A'') - E\, \mathrm{Th}''(A')\, \mathrm{Th}(A'') = E\, \mathrm{Th}'(A')\, \mathrm{Th}'(A''),
\]
the second equality by Gaussian partial integration. Differentiating once more, we get
\[
\psi''(t) = E\, \mathrm{Th}''(A')\, \mathrm{Th}''(A'').
\]
In both expressions we can first integrate out $Z', Z''$, getting
\[
\psi'(t) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \mathrm{Th}'\big( \sqrt{t}\,x + \sqrt{q-t}\,y \big)\, \phi(dy) \right]^2 \phi(dx) > 0,
\]
and the similar expression for $\psi''$ with $\mathrm{Th}'$ replaced by $\mathrm{Th}''$. So we see that $\psi$ is increasing and convex. Furthermore, as
\[
\mathrm{Th}'(x) = \beta \tanh'(h + \beta x) = \beta\big( 1 - \tanh^2(h + \beta x) \big) = \frac{\beta}{\cosh^2(h + \beta x)},
\]
we get $\psi'(q) = E\, \mathrm{Th}'(\sqrt{q}\,Z)^2 = \beta^2 E \cosh^{-4}(h + \beta\sqrt{q}\,Z)$. □

Corollary 2.3 If (2.1) is satisfied, then $q$ is the only fixed point of $\psi$ in the interval $[0,q]$. If (2.1) is not satisfied, then there is a unique fixed point of $\psi(t) = t$ inside the interval $(0,q)$.

We define sequences $\{\rho_k\}_{k \ge 1}$, $\{\gamma_k\}_{k \ge 1}$ recursively by $\gamma_1 \stackrel{\mathrm{def}}{=} \alpha$, $\rho_1 \stackrel{\mathrm{def}}{=} \gamma_1 \sqrt{q}$, and for $k \ge 2$
\[
\rho_k \stackrel{\mathrm{def}}{=} \psi(\rho_{k-1}), \qquad \gamma_k \stackrel{\mathrm{def}}{=} \frac{\rho_k - \Gamma_{k-1}^2}{\sqrt{q - \Gamma_{k-1}^2}},
\]
where
\[
\Gamma_m^2 \stackrel{\mathrm{def}}{=} \sum_{j=1}^m \gamma_j^2, \qquad \Gamma_0^2 \stackrel{\mathrm{def}}{=} 0.
\]

Lemma 2.4
a) For all $k \in \mathbb{N}$: $\Gamma_{k-1}^2 < \rho_k < q$.
b) If (2.1) is satisfied, then $\lim_{k \to \infty} \rho_k = q$ and $\lim_{k \to \infty} \Gamma_k^2 = q$.
c) If there is strict inequality in (2.1), then $\Gamma_k^2$ and $\rho_k$ converge to $q$ exponentially fast.

Proof. a) $\rho_k < q$ for all $k$ is evident. We prove by induction on $k$ that $\rho_k > \Gamma_{k-1}^2$. For $k = 1$ the statement follows, as $\rho_1 = \gamma_1\sqrt{q} > 0 = \Gamma_0^2$. Assume that it is true for $k$. Then
\[
\gamma_k = \frac{\rho_k - \Gamma_{k-1}^2}{\sqrt{q - \Gamma_{k-1}^2}} < \sqrt{\rho_k - \Gamma_{k-1}^2},
\]
so $\gamma_k^2 < \rho_k - \Gamma_{k-1}^2$, i.e. $\rho_k > \Gamma_k^2$. As $\rho_{k+1} > \rho_k$, the statement follows for $k+1$.

b) Evidently $\lim_{k \to \infty} \rho_k = q$ if (2.1) is satisfied. The sequence $\{\Gamma_k^2\}$ is increasing and bounded (by $q$). If $\zeta \stackrel{\mathrm{def}}{=} \lim_{k \to \infty} \Gamma_k^2 < q$, then $\lim_{k \to \infty} \gamma_k = \sqrt{q - \zeta} > 0$, a contradiction to the boundedness of $\{\Gamma_k^2\}$.

c) Linearization of $\psi$ around $q$ easily shows that the convergence is exponentially fast if $\psi'(q) < 1$. □

Remark that by part a) of the above lemma, one has $\gamma_k > 0$ for all $k$.
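The recursion for $(\rho_k, \gamma_k)$ is explicit enough to evaluate numerically. The following sketch (ours, not from the paper; it reuses the hypothetical helper `solve_q` above) evaluates $\psi$ by first integrating out $Z'$ and $Z''$, exactly as in the proof of Lemma 2.2, and runs the recursion. Below the AT line one observes $\rho_k \to q$ and $\Gamma_k^2 \to q$, as Lemma 2.4 predicts.

```python
# Numerical sketch (ours, not from the paper): the recursion of Lemma 2.4.
# psi is evaluated by integrating out Z', Z'' first (proof of Lemma 2.2):
#   psi(t) = E_Z[ ( E_{Z'} Th(sqrt(t) Z + sqrt(q-t) Z') )^2 ].
import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
z, w = np.sqrt(2.0) * x, w / np.sqrt(np.pi)      # nodes/weights for E[f(Z)]

def recursion(beta, h, K):
    """Return q and the lists [rho_1, ..., rho_K], [gamma_1, ..., gamma_K]."""
    q = solve_q(beta, h)
    Th = lambda u: np.tanh(h + beta * u)
    def psi(t):
        s = np.sqrt(max(q - t, 0.0))             # guard against rounding
        inner = Th(np.sqrt(t) * z[:, None] + s * z[None, :]) @ w
        return float(w @ inner**2)
    alpha = float(w @ Th(np.sqrt(q) * z))
    rho, gamma = [alpha * np.sqrt(q)], [alpha]
    Gamma2 = alpha**2                            # Gamma_1^2
    for _ in range(2, K + 1):
        rho.append(psi(rho[-1]))
        d = q - Gamma2
        if d <= 1e-12:                           # Gamma_k^2 has numerically reached q
            break
        gamma.append((rho[-1] - Gamma2) / np.sqrt(d))
        Gamma2 += gamma[-1]**2
    return q, rho, gamma

q, rho, gamma = recursion(beta=1.0, h=0.5, K=40)
print(q, rho[-1], sum(g * g for g in gamma))     # all three agree below the AT line
```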
Let $\Pi_j$ be the orthogonal projection in $\mathbb{R}^N$, with respect to the inner product $\langle \cdot, \cdot \rangle$, onto $\mathrm{span}\big( m^{(1)}, \ldots, m^{(j)} \big)$. We set
\[
M^{(k,j)} \stackrel{\mathrm{def}}{=} m^{(k)} - \Pi_j\big( m^{(k)} \big), \qquad j < k, \qquad (2.2)
\]
and
\[
M^{(k)} \stackrel{\mathrm{def}}{=} M^{(k,k-1)}. \qquad (2.3)
\]
Let
\[
\phi^{(k)} \stackrel{\mathrm{def}}{=} \frac{M^{(k)}}{\big\| M^{(k)} \big\|} \qquad (2.4)
\]
if $M^{(k)} \ne 0$. In case $m^{(k)} \in \mathrm{span}\big( m^{(1)}, \ldots, m^{(k-1)} \big)$, we define $\phi^{(k)} \stackrel{\mathrm{def}}{=} \mathbf{1}$, to have it defined everywhere, but we will see that this happens only with exponentially small probability. Remark that $\phi^{(1)} = \mathbf{1}$.

The key result is:

Proposition 2.5 For all $k \in \mathbb{N}$
\[
\big\| m^{(k)} \big\|^2 \simeq q, \qquad (2.5)
\]
and for $1 \le j < k$
\[
\big\langle m^{(j)}, m^{(k)} \big\rangle \simeq \rho_j, \qquad (2.6)
\]
\[
\big\langle \phi^{(j)}, m^{(k)} \big\rangle \simeq \gamma_j. \qquad (2.7)
\]

Proof of Theorem 2.1 from Proposition 2.5. As the variables $m^{(k)}$ are bounded, (2.5) implies
\[
\lim_{N \to \infty} E \big\| m^{(k)} \big\|^2 = q,
\]
and similarly for the other statements. Now
\[
E \big\| m^{(k+1)} - m^{(k)} \big\|^2 = E \big\| m^{(k+1)} \big\|^2 + E \big\| m^{(k)} \big\|^2 - 2 E \big\langle m^{(k+1)}, m^{(k)} \big\rangle.
\]
Taking the limit $N \to \infty$ and using Proposition 2.5, this converges to $2q - 2\rho_k$. From Lemma 2.4, the claim follows. □

Remark 2.6 Proposition 2.5 is true at all temperatures. However, beyond the AT line it does not give much information on the behavior of the $m^{(k)}$ for large $k$. It would be very interesting to know if these iterates satisfy some structural properties beyond the AT line.

The main task is to prove Proposition 2.5. It follows by an involved induction argument. We first remark that (2.7) is a consequence of (2.5) and (2.6).

If $J \in \mathbb{N}$, let $\mathrm{COND}(J)$ be the statement that (2.5) and (2.6) hold for $k \le J$. $\mathrm{COND}(1)$ is evidently true. $\mathrm{COND}(J)$ implies that for all $k \le J$ we have, with
\[
\delta_k \stackrel{\mathrm{def}}{=} \sqrt{q - \Gamma_{k-1}^2}\,\big/\,2 > 0,
\]
that
\[
P\Big( \big\| M^{(k)} \big\| \le \delta_k \Big) \le C_k \exp[-N/C_k].
\]
If we put
\[
A_J \stackrel{\mathrm{def}}{=} \bigcap_{k=1}^J \Big\{ \big\| M^{(k)} \big\| > \delta_k \Big\}, \qquad (2.8)
\]
then
\[
P(A_J) \ge 1 - C_J \exp[-N/C_J]. \qquad (2.9)
\]
Evidently, all variables $\phi^{(k)}$ with $k \le J$ are bounded by a constant on $A_J$. The constant may depend on $J$, of course. The $m^{(k)}$ are bounded by $1$ everywhere.

3 Iterative modifications of the interaction variables

Let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$, and $y = (y_{ij})_{1 \le i,j \le N}$ be a random matrix. We are only interested in the case where $y$ is symmetric and $0$ on the diagonal, but this is not important for the moment. We assume that $y$ is jointly Gaussian conditioned on $\mathcal{G}$, i.e. there is a positive semidefinite $N^2 \times N^2$, $\mathcal{G}$-measurable matrix $\Gamma$ such that
\[
E\Big( \exp\Big[ \mathrm{i} \sum_{k,j} t_{kj} y_{kj} \Big] \,\Big|\, \mathcal{G} \Big) = \exp\Big[ -\frac{1}{2} \sum_{k,k',j,j'} t_{kj}\, \Gamma_{kj,k'j'}\, t_{k'j'} \Big].
\]
(We do not assume that $y$ is Gaussian unconditionally.) Consider a $\mathcal{G}$-measurable random vector $x$, and the linear space of random variables
\[
\mathcal{L} \stackrel{\mathrm{def}}{=} \Big\{ \sum\nolimits_{i=1}^N a_i\, (yx)_i : a_1, \ldots, a_N \ \mathcal{G}\text{-measurable} \Big\}.
\]
We consider the linear projection $\pi_{\mathcal{L}}(y)$ of $y$ onto $\mathcal{L}$, which is defined to be the unique matrix with components $\pi_{\mathcal{L}}(y_{ij})$ in $\mathcal{L}$ which satisfy
\[
E\big( \{ y_{ij} - \pi_{\mathcal{L}}(y_{ij}) \}\, U \,\big|\, \mathcal{G} \big) = 0, \qquad \forall\, U \in \mathcal{L}.
\]
As $y$ is assumed to be conditionally Gaussian given $\mathcal{G}$, it follows that $y - \pi_{\mathcal{L}}(y)$ is conditionally independent of the variables in $\mathcal{L}$, given $\mathcal{G}$. If $y$ is symmetric, then clearly $\pi_{\mathcal{L}}(y)$ is symmetric, too.

Remark 3.1 If $X$ is a $\mathcal{G}$-measurable random variable, then $yX$ is conditionally Gaussian as well, and $\pi_{\mathcal{L}}(y)X = \pi_{\mathcal{L}}(yX)$.
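To make the projection concrete, the following brute-force sketch (ours, not from the paper) carries out the construction in the simplest case, with $\mathcal{G}$ trivial and $x$ deterministic, so that $\pi_{\mathcal{L}}(y)$ is an ordinary linear regression of the entries of $y$ on $w = yx$, computed from the exact covariances of a symmetric Gaussian matrix with variance $1/N$. The size $N = 6$ and all names are our own choices; the final two lines verify the defining orthogonality and the identity $(y - \pi_{\mathcal{L}}(y))x = 0$ recorded in (3.1) below.

```python
# Brute-force sketch (ours, not from the paper): pi_L in the unconditional case
# (G trivial, x deterministic), for a small symmetric Gaussian matrix y with
# variance 1/N off the diagonal and zero diagonal.
import numpy as np

N = 6
rng = np.random.default_rng(0)
x = rng.normal(size=N)                       # plays the "G-measurable" vector x

pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
P = len(pairs)                               # number of free entries of y
A = np.zeros((N, P))                         # w = yx as a linear map: w = A y_free
for p, (i, j) in enumerate(pairs):
    A[i, p] += x[j]
    A[j, p] += x[i]

Sigma = np.eye(P) / N                        # Cov of the free entries of y
Cw = A @ Sigma @ A.T                         # Cov(w, w)
B = Sigma @ A.T @ np.linalg.pinv(Cw)         # regression coefficients: pi_L(y) = B w

y_free = rng.normal(scale=1.0 / np.sqrt(N), size=P)   # one sample of y
w = A @ y_free
resid = y_free - B @ w                       # free entries of y - pi_L(y)
R = np.zeros((N, N))
for p, (i, j) in enumerate(pairs):
    R[i, j] = R[j, i] = resid[p]
print(np.abs(R @ x).max())                   # ~1e-16: (y - pi_L(y)) x = 0, cf. (3.1)
print(np.abs(Sigma @ A.T - B @ Cw).max())    # ~1e-17: residual uncorrelated with L
```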
Remark also that
\[
\big( y - \pi_{\mathcal{L}}(y) \big) x = yx - \pi_{\mathcal{L}}(yx) = 0, \qquad (3.1)
\]
as $yx \in \mathcal{L}$.

Using this construction, we define a sequence $g^{(k)}$, $k \ge 1$, of matrices, and a sequence of sub-$\sigma$-fields $\mathcal{F}_k$ of $\mathcal{F}$, starting with $g^{(1)} \stackrel{\mathrm{def}}{=} g$ and $\mathcal{F}_{-1} = \mathcal{F}_0 \stackrel{\mathrm{def}}{=} \{\emptyset, \Omega\}$. The construction is done in such a way that

(C1) $g^{(k)}$ is conditionally Gaussian, given $\mathcal{F}_{k-1}$.
(C2) $m^{(k)}$, $M^{(k)}$, and $\phi^{(k)}$ are $\mathcal{F}_{k-1}$-measurable.

Using that, we define
\[
g^{(k+1)} \stackrel{\mathrm{def}}{=} g^{(k)} - \pi_{\mathcal{L}_k}\big( g^{(k)} \big),
\]
with
\[
\mathcal{L}_k \stackrel{\mathrm{def}}{=} \Big\{ \sum\nolimits_{i=1}^N a_i \big( g^{(k)} M^{(k)} \big)_i : a_i \ \mathcal{F}_{k-1}\text{-measurable} \Big\},
\]
i.e. we perform the above construction with $\mathcal{G} = \mathcal{F}_{k-1}$ and $x = M^{(k)}$. Furthermore, we define
\[
\mathcal{F}_{k+1} \stackrel{\mathrm{def}}{=} \sigma\big( \mathcal{F}_k, \xi^{(k+1)} \big),
\]
where $\xi^{(k)} \stackrel{\mathrm{def}}{=} g^{(k)} \phi^{(k)}$.

In order that the construction is well defined, we have to inductively prove the properties (C1) and (C2). We actually prove a condition which is stronger than (C1):

(C1') Conditionally on $\mathcal{F}_{k-2}$, $g^{(k)}$ is Gaussian, and conditionally independent of $\mathcal{F}_{k-1}$.

(C1') implies that $g^{(k)}$ is conditionally Gaussian given $\mathcal{F}_{k-1}$, and the conditional law given $\mathcal{F}_{k-1}$ is the same as given $\mathcal{F}_{k-2}$.

Inductive proof of (C1') and (C2). The case $k = 1$ is trivial. We first prove (C2) for $k \ge 2$, using (C1') and (C2) up to $k-1$. We claim that
\[
m^{(k)} = \mathrm{Th}\big( g^{(k-1)} M^{(k-1)} + R^{(k-2)} \big), \qquad (3.2)
\]
where $R^{(k-2)}$ stands for a generic $\mathcal{F}_{k-2}$-measurable random variable, not necessarily the same at different occurrences. As $g^{(k-1)} M^{(k-1)} = \| M^{(k-1)} \|\, \xi^{(k-1)}$, and $\| M^{(k-1)} \|$ is $\mathcal{F}_{k-2}$-measurable by the induction hypothesis, it follows from (3.2) that $m^{(k)}$ is $\mathcal{F}_{k-1}$-measurable. The statements for $M^{(k)}$, $\phi^{(k)}$ are then trivial consequences.

We therefore have to prove (3.2). We prove by induction on $j$ that
\[
m^{(k)} = \mathrm{Th}\big( g^{(j)} M^{(k-1,j-1)} + R^{(k-2)} \big). \qquad (3.3)
\]
The case $j = 1$ follows from the definition of $m^{(k)}$, and the case $j = k-1$ is (3.2). Assume that (3.3) is true for $j < k-1$. We replace $g^{(j)}$ by $g^{(j+1)}$ through the recursive definition:
\[
m^{(k)} = \mathrm{Th}\big( g^{(j+1)} M^{(k-1,j-1)} + \pi_{\mathcal{L}_j}\big( g^{(j)} \big) M^{(k-1,j-1)} + R^{(k-2)} \big) = \mathrm{Th}\big( g^{(j+1)} M^{(k-1,j-1)} + R^{(k-2)} \big),
\]
as $\pi_{\mathcal{L}_j}\big( g^{(j)} \big)$ is $\mathcal{F}_j$-measurable and therefore $\pi_{\mathcal{L}_j}\big( g^{(j)} \big) M^{(k-1,j-1)}$ is $\mathcal{F}_{k-2}$-measurable. Using (3.1), one gets $g^{(j+1)} M^{(j)} = 0$, and therefore
\[
g^{(j+1)} M^{(k-1,j-1)} = g^{(j+1)} M^{(k-1,j)}.
\]
This proves (3.2), and therefore (C2) for $k$.

We next prove (C1') for $k$:
\[
g^{(k)} = g^{(k-1)} - \pi_{\mathcal{L}_{k-1}}\big( g^{(k-1)} \big).
\]
We condition on $\mathcal{F}_{k-2}$. By (C2), $M^{(k-1)}$ is $\mathcal{F}_{k-2}$-measurable. As $g^{(k-1)}$, conditioned on $\mathcal{F}_{k-3}$, is Gaussian and independent of $\mathcal{F}_{k-2}$, it has the same distribution also conditioned on $\mathcal{F}_{k-2}$. By the construction of $g^{(k)}$, this variable is, conditioned on $\mathcal{F}_{k-2}$, independent of $\mathcal{F}_{k-1}$, and conditionally Gaussian. □

Lemma 3.2 For $m < k$, one has $g^{(k)} \phi^{(m)} = 0$.

Proof. The proof is by induction on $k$. For $k = 1$, there is nothing to prove. Assume that the statement is proved up to $k$. We want to prove $g^{(k+1)} \phi^{(m)} = 0$ for $m \le k$. The case $m = k$ is covered by (3.1). For $m < k$, it follows by Remark 3.1, as $\phi^{(m)}$ is $\mathcal{F}_{k-1}$-measurable, that
\[
\pi_{\mathcal{L}_k}\big( g^{(k)} \big) \phi^{(m)} = \pi_{\mathcal{L}_k}\big( g^{(k)} \phi^{(m)} \big),
\]
and therefore
\[
g^{(k+1)} \phi^{(m)} = g^{(k)} \phi^{(m)} - \pi_{\mathcal{L}_k}\big( g^{(k)} \big) \phi^{(m)} = g^{(k)} \phi^{(m)} - \pi_{\mathcal{L}_k}\big( g^{(k)} \phi^{(m)} \big) = 0,
\]
as $g^{(k)} \phi^{(m)} = 0$ by the induction hypothesis. □

Lemma 3.3 If $m < k$, then
\[
\sum_i \xi_i^{(k)} \phi_i^{(m)} = 0.
\]

Proof.
\[
\sum_i \xi_i^{(k)} \phi_i^{(m)} = \sum_i \sum_j g_{ij}^{(k)} \phi_j^{(k)} \phi_i^{(m)} = \sum_j \phi_j^{(k)} \sum_i g_{ij}^{(k)} \phi_i^{(m)} = \sum_j \phi_j^{(k)} \sum_i g_{ji}^{(k)} \phi_i^{(m)} = 0
\]
for $m < k$, by the symmetry of $g^{(k)}$ and the previous lemma. □
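Before turning to the covariance computations, the sketches above can be combined into a finite-$N$ Monte Carlo illustration of Proposition 2.5 (whose proof occupies the rest of the paper): run the TAP iteration for one sampled $g$, build $\phi^{(1)}, \ldots, \phi^{(K)}$ by Gram-Schmidt in the inner product $\langle \cdot, \cdot \rangle$, and compare the empirical inner products with $\rho_j$ and $\gamma_j$. The helpers `tap_iterates` and `recursion` are the hypothetical ones sketched earlier.

```python
# Monte Carlo sketch (ours, not from the paper): checking (2.5)-(2.7) at finite N.
import numpy as np

beta, h, N, K = 1.0, 0.5, 4000, 8                 # a point below the AT line
rng = np.random.default_rng(1)
q, rho, gamma = recursion(beta, h, K)
ms = tap_iterates(beta, h, N, K, rng)             # ms[j] is m^(j+1)

inner = lambda u, v: float(u @ v) / N             # <u, v> = (1/N) sum_i u_i v_i
phis = []                                         # phi^(1), phi^(2), ... by Gram-Schmidt
for m in ms:
    r = m.copy()
    for p in phis:
        r -= inner(p, m) * p
    phis.append(r / np.sqrt(inner(r, r)))

k = K - 1                                         # test against m^(K)
print(inner(ms[k], ms[k]), q)                     # ||m^(K)||^2       vs  q           (2.5)
for j in range(k):
    print(inner(ms[j], ms[k]), rho[j],            # <m^(j+1), m^(K)>  vs  rho_{j+1}   (2.6)
          inner(phis[j], ms[k]), gamma[j])        # <phi^(j+1), m^(K)> vs gamma_{j+1} (2.7)
```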
4 Computation of the conditional covariances of g^(k)

We introduce some more notation. We write $O_k(N^{-r})$ for a generic $\mathcal{F}_k$-measurable random variable $X$ which satisfies
\[
P\big( N^r X \ge K \big) \le C \exp[-N/C].
\]