SCSC-TR-98-13 9 9 9 1 n Monte Carlo Quasi-Heatbath a by approximate inversion J 7 2 v Philippe de Forcrand 5 Swiss Center for Scientific Computing (SCSC) 2 ETH-Zu¨rich, CH-8092 Zu¨rich 0 Switzerland 1 [email protected] 1 8 9 / t a m - d n o Abstract c : When sampling the distribution P(φ~) exp( Aφ~ 2), a global v ∝ −| | heatbath normally proceeds by solving the linear system Aφ~ = ~η, i X where ~η is a normal Gaussian vector, exactly. This paper shows how r to preserve the distribution P(φ~) while solving the linear system with a arbitrarily low accuracy. Generalizations are presented. PACS numbers: 02.70.L, 02.50.N, 52.65.P,11.15.H, 12.38.G In Monte Carlo simulations, it is frequently the case that one wants to sample a vector φ~ from a distribution of the Gaussian type exp( Aφ~ 2). Typically,φ~ hasmanycomponents,andAisalarge,sparsem∝atrix. I−n|latt|ice field theory, φ~ is the value of the continuum field φ~ at regular grid points, and A is the discretized version of some differential operator . Illustrative A examples used in this paper are = m+ip~ (free field) and = m+ip/ A A (Dirac operator). The goal of the Monte Carlo simulation is to provide independent configurations of φ~ at the least cost. The brute-force approach consists of drawing successive random vec- tors ~η(k) from the normal Gaussian distribution exp( ~η 2), and of solving Aφ~(k) = ~η(k). The solution of this linear system c−a|n|be efficiently ob- tained with an iterative linear solver (Conjugate Gradient if A is Hermi- tian, BiCGStab otherwise). This approach can be called a global heatbath, because φ~(k+1) has no memory of φ~(k): the heatbath has touched all the components of φ~. To avoid a bias, the solver must be iterated to full con- vergence, which is often prohibitively expensive. One may try to limit the accuracy while maintaining the bias below statistical errors, but this re- quires a delicate compromise difficult to tune a priori. A notable example of this global heatbath approach is thestochastic evaluation of inverse Dirac matrix elements, where several hundred “noise vectors” ~η(k) are inverted to yield (A†A)−1 φ φ† . An abundant literature has been devoted to the ij ≈ h i jik optimization of this procedure [1, 2]. For the free field or the Dirac operator mentioned above, the number of iterations of the solver required to reach a given accuracy grows like the correlation length ξ 1/m. Thus the work per new, independent φ~ is c ξz ≡ where z, the dynamical critical exponent, is 1. However, the prefactor c is large. For this reason, local updates, where only one component of φ~ is changed at a time, are often preferred. They usually provide an indepen- dent φ~ after an amount of work c′ ξ2, but with a much smaller prefactor c′ [3]. This paper presents an adaptation of the global heatbath which allows for arbitrarily low accuracy in the solution of Aφ~ = ~η, thus reducing the prefactor c, while maintaining the correct distribution. This is obtained by the introduction of an accept/reject test of the Metropolis type, making the procedure a “quasi-heatbath.” The method is described in the next section. Generalizations, including a local version, are presented afterwards. 1 1 Quasi-Heatbath Efficient Monte Carlo often relies on the subtle introduction of auxiliary degrees of freedom. Consider here a vector χ~ distributed according to 1 exp( χ~ Aφ~ 2). Note that Z is a constant (πN for an N-component Zχ −| − | χ complex vector) independent of φ~. Therefore, the original distribution of φ~, 1 exp( Aφ~ 2), is unchanged by the introduction of χ~: Zφ −| | 1 φ~ e−|Aφ~|2 = 1 φ~ χ~ e−|Aφ~|2−|χ~−Aφ~|2. (1) Z Z D Z Z Z D D φ φ χ We can now alternate Monte Carlo steps on φ~ and χ~, with the following prescription: 1. Perform a global heatbath on χ~: χ~ Aφ~+~η, (2) ←− where ~η is a normal Gaussian vector; 2. ReflectAφ~ withrespecttotheminimumof thequadraticform(Aφ~ 2+ χ~ Aφ~ 2): | | | − | Aφ~ χ~ Aφ~, ←− − i.e., φ~ A−1χ~ φ~. (3) ←− − Step 2 conserves the probability of φ~ but is not ergodic. Step 1 provides the ergodicity. Note that step 2 exchanges the two terms Aφ~ 2 and χ~ Aφ~ 2 in the quadratic form. Since χ~ Aφ~ in step 1 is set to a|new| rando|m−vecto|r ~η, Aφ~ at the end of step 2 is eq−ual to ~η. Therefore, a completely decorrelated φ~ has been generated. The vector χ~ is not needed any longer and can be discarded. This two-step algorithm can now be modified slightly. The vector A−1χ~ in Eq.(3) need not be computed exactly. Consider an approximate solution ζ~ with Aζ~ = χ~ ~r, where ~r = ~0 is the residual. Step 2 should now be considered as a −way to propos6 e a candidate φ~′ = ζ~ φ~ in a Metropolis procedure. Since ζ~ is completely independent of φ~ or φ−~′, the probability of proposing φ~′ given φ~ is the same as that of proposing φ~ given φ~′. Detailed balance will be satisfied with the additional step: 2 3. Accept the candidate φ~′ = ζ~ φ~ with probability − P (φ~ φ~′) = min(1,e−∆S), (4) acc → where ∆S = Aφ~′ 2+ χ~ Aφ~′ 2 Aφ~ 2 χ~ Aφ~ 2. | | | − | −| | −| − | Simple algebra shows that ∆S = 2Re(r† (Aφ~ Aφ~′)), · − which is antisymmetric under the exchange φ~ φ~′, as it should be. If the linearsystemAζ~ = χ~ issolvedexactly(~r =~0),t↔hen∆S = 0andonerecovers the original global heatbath with acceptance 1. Otherwise, the candidate φ~′ may be rejected, in which case φ~(k) must be included once more in the Monte Carlo sequence: φ~(k+1) = φ~(k). As the residual is allowed to grow, the average acceptance falls. But no bias is introduced: the distribution of φ~ remains 1 exp( Aφ~ 2). Zφ −| | The optimal magnitude of ~r is thus the result of a compromise between accuracy and acceptance. The average acceptance of the prescription (4) is erfc( (∆S)2 /8) [5]. Here (∆S)2 can be evaluated as a function of the h i h i conveprgence criterion ǫ of the linear solver. If the solver yields a residual ~r such that ||~r|| ǫ, then (∆S)2 8Nǫ2, where Aφ~, Aφ~′, and ~r have been ||χ~|| ≤ h i ≤ considered independent random Gaussian vectors with variance N, N, and 2ǫ2N, respectively, and N is the number of their components. Therefore, the acceptance is simply acceptance erfc(ǫ√N). (5) h i≈ Inotherwords,theacceptanceisentirelydeterminedbyǫandN,thenumber of degrees of freedom (the volume) of the system, and is independent of the matrix A. To maintain a constant acceptance as the volume grows, the convergence criterion for the solution of Aζ~ = χ~ should vary like 1/√N. An accuracy ǫ 10−3 10−4 provides an acceptance of 80-90% up to systems ∼ − of 106 degrees of freedom. There is no need for higher accuracy. The convergence of an iterative solver is typically exponential: ǫ n ∼ e−n/ξ after n iterations. Therefore, the above prescription reduces the work byafactorof2to3comparedtotheusualapproachwhichiteratesthesolver until “full” convergence (which typically means ǫ < 10−8 10−12). Illustra- ∼ − tive results are shown in Fig. 1 for the case of the Wilson–Dirac operator. This figure shows the number of iterations, the acceptance, and the work per independent φ~ as a function of ǫ. The acceptance obeys erfc(c ǫ√N), 3 wherec < 1(0.75 here)reflects thefactthattheresidualis always smaller(c times smaller on average) than required by the stopping criterion. For this system of N = 49 152 variables (84 lattice), the optimal stopping criterion is near 10−3. Figure 1: As the stopping criterion in the iterative solver is varied, the number of solver iterations (top) and the acceptance of the quasi-heatbath (middle)change. TheacceptanceiswelldescribedbyEq.(5)(solidline). The work per new φ~ (bottom) shows a clear minimum. The optimal stopping criterion depends on the system size only (49 152 here). 4 2 Generalizations A. Over- and under-relaxation Consider a modification of Eq.(1) with a parameter λ: 1 φ~ e−|Aφ~|2 = 1 φ~ χ~ e−|Aφ~|2−|χ~−λAφ~|2. (6) Z Z D Z Z Z D D φ φ χ The same 3-step algorithm of Section 1 now reads: 1. Heatbath on χ~: χ~ λ Aφ~+~η; (7) ←− 2. Reflection of φ~ abouttheapproximate minimumof thequadratic form: 2λ φ~′ = ζ~ φ~, (8) 1+λ2 − where Aζ~ = χ~ ~r; − 3. Accept φ~′ with probability P (φ~ φ~′) = min(1,e−∆S) where ∆S = acc Aφ~′ 2+ χ~ λAφ~′ 2 Aφ~ 2 χ~ →λAφ~ 2; and, by simple algebra, | | | − | −| | −| − | ∆S = 2λ Re(r† (Aφ~ Aφ~′)). (9) · − Thus, as λ decreases from 1, (∆S)2 also decreases, which boosts the acceptance. On the other hand, Ehq.(8) inidicates that φ~′ approaches φ~ as λ 0, so that φ~′ and φ~ become very (anti)correlated. The parame−ter λ → allows interpolation between simple reflection (λ = 0) and no motion at all (λ = + ). In fact, substituting Eq.(7) into Eq.(8) gives ∞ 1 λ2 2λ φ~′ = − φ~+ (A−1~η ~r). (10) −1+λ2 1+λ2 − Taking~r =~0, onecan identify this prescriptionwith thatof Adler’s stochas- tic over-relaxation (AOR) [4]: φ~′ = (1 ω)φ~+ ω(2 ω)A−1~η, (11) − q − provided ω = 2 . The quasi-heatbath can be viewed as a flexible, global 1+λ2 generalization of Adler’s AOR. 5 It is clear from Eq.(9) that λ < 1 allows for a looser convergence crite- rion ǫ 1/λ. However, the work to reach convergence typically grows like ∼ log ǫ, whereas Eq.(10) indicates that the number of Monte Carlo steps to d−ecorrelate φ~ will grow like 1 . Therefore, it seems inadvisable to depart λ2 from λ =1. Nonetheless, there are many situations where a completely independent φ~ at each Monte Carlo step is a wasteful luxury. When the matrix A fluc- tuates and depends on other variables U, it will take some time for the U to equilibrate in the new background φ~(k+1). Equilibration will be achieved quickly over short distances, more slowly over large ones. In that case it is useful to refresh the short-wavelength modes of φ~ at every MC step, but not the long-wavelength ones. The situation is similar for the stochastic evaluation of inverse Dirac matrix elements: one is interested in estimating (A†A)−1, where the distance i j is short. Refreshing the long-wavelength ij | − | modes every time is wasteful. B. Selective mode refresh The quasi-heatbath may be tailored for this purposeby modifying the basic Eq.(1) to 1 φ~ e−|Aφ~|2 = 1 φ~ χ~ e−|Aφ~|2−|χ~−Cφ~|2. (12) Z Z D Z Z Z D D φ φ χ ThematrixC playstheroleoftheearlierλA,exceptthatnowλdepends ontheeigenmodeconsidered. Thethreebasicstepsofthealgorithmbecome: 1. Heatbath on χ~: χ~ Cφ~+~η; (13) ←− 2. Reflection of φ~ abouttheapproximate minimumof thequadratic form: φ~′ = ζ~ φ~, (14) − where 1 (A†A+C†C)ζ~ = C†χ~ ~r; (15) 2 − 3. Accept φ~′ with probability P (φ~ φ~′)= min(1,e−∆S), where acc → ∆S = 2Re(r† (φ~ φ~′)). · − Forsimplicity,considerthecasewhereC andAcommute. Thecandidate φ~′ can be expressed as 6 φ~′ = (A†A+C†C)−1(A†A C†C)φ~+2(A†A+C†C)−1(C†~η ~r). − − − One wishes to obtain a heatbath (λ 1 in Section A) on short-wavelength ∼ modes. This implies a cancellation of eigenvalues in (A†A C†C) for short − wavelengths. For long wavelengths a heatbath is not necessary, and one could have λ 0 or + . One possible way to implement this would be ∼ ∞ C = F−1ΛFA, where F is the Fourier transform and Λ is a diagonal matrix with entries λ(~k) growing from 0 to 1 with momentum ~k . However, for operators | | A of the free-field or Dirac type, a simpler and equivalent way consists of modifying the mass parameter m to m > m. This is equivalent to λ(~k )= C | | (m2 +k2)/(m2 +k2). C q The mass which enters into the linear system to solve (15) is m = eff (m2+m2)/2. As m is increased, so is m . The work to approximately C C eff q solve Eq.(15) decreases as1/m . Therefore, oneachieves thedesiredeffect eff of refreshing short-wavelength modes at cheaper cost. By drawing m ran- C domly from a suitable distribution at each MC step, the tailored refreshing of all Fourier modes with the desired frequency can be achieved. C. Local version The quasi-heatbath described so far is a global update procedure: all com- ponents of φ~ are updated together. A local version readily suggests it- self: restricting the auxiliary vector χ~ to have only 1 non-zero component, χ = χ δ (or any subset of components). i i,i0 Eq.(1) then becomes 1 φ~ e−|Aφ~|2 = 1 φ~ χ~ e−|Aφ~|2−|χ−(Aφ~)i0|2. (16) Z Z D Z Z Z D D φ φ χ The algorithm is unchanged: 1. Heatbath on χ: χ (Aφ~) +η; ←− i0 2. Approximate reflection of φ~: φ~′ = ζ~ φ~, where Aζ~ = χ~ ~r; − − 3. Accept φ~′ with probability P (φ~ φ~′)= min(1,e−∆S), acc → where ∆S = Re(r† (Aφ~ Aφ~′))+Re(r⋆ ((Aφ~) (Aφ~′) )). · − i0 · i0 − i0 7 Inthiscase, ζ~istheapproximateGreen’sfunctionofAforasourceati . 0 Itwillhave asupportof size (ξ), sothatthelocal change inχ willinduce O i0 a change in φ over a whole domain. By varying i from 1 to N, one sweeps 0 the whole system and generates a new vector φ~(k+1). If the acceptance is maintained close to 1, φ~(k+1) will essentially be uncorrelated with φ~(k). However, the work per local update is proportional to ξd in d dimensions, so that this approach becomes very inefficient when the correlation length ξ is large. Nevertheless, it may be advantageous for moderate ξ. The reason is that the approximate solution ζ~ A−1χδ(i ) need not be obtained by a 0 ≈ Krylov method, which applies successive powers of A to the initial residual. Instead, onemay searchforthebestsolution ζ~amongallvectors oflocalized support, for instance, i and its nearest neighbours. 0 D. Adler’s stochastic over-relaxation Finally, the local variable χ δ(i,i ) may interact with φ~ in the simplest way, 0 with a contact interaction. This modifies Eq.(1) to 1 φ~ e−|Aφ~|2 = 1 φ~dχ e−|Aφ~|2−|χ−λφ(i0)|2. (17) Z Z D Z Z Z D φ φ χ If one chooses to update only φ(i ) and leave the other components of φ~ 0 unchanged, then there is no need to invert the matrix A. The algorithm simplifies to: 1. Heatbath on χ: χ λφ(i )+η; 0 ←− 2. Reflection of φ(i ) with respect to the minimum of the quadratic form 0 (m2+λ2)φ(i )2 +(φ(i )† (ψ λχ)+H.c.)+constant, 0 0 | | · − where m2 (A†A) and ψ (A†A) φ(j). ≡ i0i0 ≡ i0j This reflection is exact, and so the acceptance test disappears. The new reflected value is λχ ψ φ′(i ) = 2 − φ(i ) 0 1+λ2 − 0 1 λ2 2 2λ = − φ(i ) ψ+ η. −1+λ2 0 − 1+λ2 1+λ2 This prescription is identical to Adler’s stochastic over-relaxation [4] with the change of notation ω 2/(1+λ2). ↔ 8 3 Conclusion The quasi-heatbath Eqs.(2)–(4) is a simple and efficient method to globally change a vector φ~ distributed according to 1 e−|Aφ~|2. Like the global heat- Zφ bath consisting of solving Aφ~ = ~η, where ~η is a Gaussian vector, exactly at each Monte Carlo step, the quasi-heatbath also has dynamical critical exponent 1. The prefactor is reduced by a factor of 2 to 3 because the linear system Aφ~ = ~η can now be solved approximately. Whatever the level of accuracy, an acceptance test maintains the exact distribution e−|Aφ~|2. The most efficient choice for the accuracy level is (1/√N), where N is the O volume of the system. Several generalizations of the quasi-heatbath have been proposed. A simple modification makes it possible to refresh each of the Fourier compo- nents of Aφ~ at aprescribedrate. Alocal version may beadvantageous when the correlation length is moderate. In a limiting case, this version becomes identical to Adler’s stochastic over-relaxation. I thank Massimo D’Elia for interesting discussions and valuable com- ments. 9