Third Granada Lectures in Computational Physics: Proceedings of the III Granada Seminar on Computational Physics Held at Granada, Spain, 5–10 September 1994 PDF


I Invited Lectures

Computational Field Theory and Pattern Formation

Raúl Toral
Departament de Física, Universitat de les Illes Balears, 07071-Palma de Mallorca, Spain

1. Monte Carlo Methods

1.1. Introduction

On many occasions we need to compute the value of a high-dimensional integral of the form:

    F = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} dx_1 \cdots dx_N \, f_{\hat{X}}(x_1,\ldots,x_N) \, G(x_1,\ldots,x_N) = \int dX \, f_{\hat{X}}(X) G(X)        (1.1)

here N is a large number and we have denoted by X the set (x_1,...,x_N) of real variables. The function f_{\hat{X}}(X) is non-negative and normalized, i.e.:

    (i)  f_{\hat{X}}(X) \ge 0
    (ii) \int f_{\hat{X}}(X) \, dX = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} dx_1 \cdots dx_N \, f_{\hat{X}}(x_1,\ldots,x_N) = 1

In the next chapters we will give specific examples of when one needs to perform this type of calculation. If the above integral cannot be computed by analytical methods (which happens more often than desired), one has to resort to numerical methods. One must realize, however, that a simple extension of the standard methods used in small-N numerical integration (Simpson rules, Gauss integration, etc. (Faires and Burden, 1993)) to compute numerically the value of the integral will fail for very high-dimensional integrals. For instance, let us take N = 100 (and this is a rather small number for typical applications) and suppose we want to use Simpson methods. We need to generate a grid in the X space and sum up all the values of G(X) on the points of this grid. If we choose, say, a grid of 10 points per coordinate, we will have 10^100 terms to add in the Simpson rule. If every addition takes 10^-12 s (and today's computers take much longer than that), the time needed for the calculation exceeds any astronomical unit of time (including the age of the universe). The situation does not improve by diminishing the number of points in the integration grid. Even 2 points per integration variable is beyond the capability of any present (or future) computer.

The solution to this problem is given by application of Monte Carlo techniques (Kalos and Whitlock, 1986; Binder, 1992; Heermann, 1986). For our particular example, they consist in interpreting f_{\hat{X}}(X) as a probability density function of some random variable \hat{X} and, consequently, the integral is nothing but the average value of the function G(X),

    \langle G \rangle = \int dX \, f_{\hat{X}}(X) G(X)        (1.3)

The key idea behind Monte Carlo integration is to approximate the previous average by a sample average:

    \langle G \rangle \approx \mu_M[G] = \frac{1}{M} \sum_{k=1}^{M} G(X^{(k)})        (1.4)

where X^{(k)}, k = 1,...,M, are values of the N-dimensional variable X distributed according to the probability density function f_{\hat{X}}(X). This is called the technique of importance sampling. The basic principle of sampling is the same one that allows politicians to have an accurate knowledge of the election results before election day by making random polls amongst a representative sample of the electors. A more precise statement on the relation between the average \langle G \rangle and the sample average \mu_M[G] is given by the central limit theorem (Feller, 1971). This theorem states that, if the values X^{(k)} are statistically independent, then the sample average \mu_M[G] tends in the limit M \to \infty to a Gaussian distribution of mean \langle G \rangle and variance \sigma^2_{\mu_M} given by:

    \sigma^2_{\mu_M} = \frac{\sigma^2_G}{M} = \frac{\langle G^2 \rangle - \langle G \rangle^2}{M}        (1.5)

It is usual to express the relation between \mu_M[G] and \langle G \rangle as:

    \langle G \rangle = \mu_M[G] \pm \sigma_{\mu_M}        (1.6)

which is to be interpreted in the usual statistical sense for the Gaussian distribution, i.e. that there is a 95.45% probability that \langle G \rangle lies in the interval (\mu_M[G] - 2\sigma_{\mu_M}, \mu_M[G] + 2\sigma_{\mu_M}), a 99.73% probability that it lies in the interval (\mu_M[G] - 3\sigma_{\mu_M}, \mu_M[G] + 3\sigma_{\mu_M}), etc.
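To make the M^{-1/2} scaling concrete (a simple consequence of Eq. (1.5), added here as an illustration rather than taken from the lectures): if the half-width of the 2\sigma confidence interval is to stay below a prescribed tolerance \epsilon, the number of samples must satisfy

    2\sigma_{\mu_M} = \frac{2\sigma_G}{\sqrt{M}} \le \epsilon \quad \Longrightarrow \quad M \ge \left( \frac{2\sigma_G}{\epsilon} \right)^2

so that each additional decimal digit of accuracy costs a factor of one hundred in the number of samples.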
In practice, the unknown variance \sigma^2_G can be replaced by the sample variance:

    \sigma^2_G \approx \frac{1}{M} \sum_{k=1}^{M} G(X^{(k)})^2 - \left[ \frac{1}{M} \sum_{k=1}^{M} G(X^{(k)}) \right]^2        (1.7)

According to relation (1.5) the error in a Monte Carlo integration decreases as the inverse square root of the number of samplings M. In the case that the values X^{(k)} are not independent, two major modifications are needed:

(a) The variance of the sample average is given by:

    \sigma^2_{\mu_M} = \frac{\sigma^2_G}{M} (2\tau_G + 1)        (1.8)

where the autocorrelation time \tau_G is given (in the limit of large M) by:

    \tau_G = \sum_{k=1}^{\infty} \rho_G(k)        (1.9)

here the normalized autocorrelation function \rho_G(k) is defined as:

    \rho_G(k) = \frac{\langle G(X^{(j)}) G(X^{(j+k)}) \rangle - \langle G \rangle^2}{\sigma^2_G}        (1.10)

The situation of non-zero autocorrelation time can arise, for instance, when the value X^{(k)} depends on the value X^{(k-1)}. In our poll example, this can occur if the k-th person to be questioned lives in the neighbourhood of the (k-1)-th person, such that their social classes are likely to be similar. Intuitively, \tau_G measures the number of values of the sample X^{(k)}, X^{(k+1)}, ..., X^{(k+\tau_G)} that we have to discard in order to consider that X^{(k)} and X^{(k+\tau_G)} are independent of each other. A problem of Monte Carlo methods is that, in many cases of interest, \tau_G becomes very large and grows with the number of variables N.

(b) The relation \langle G \rangle = \mu_M[G] \pm \sigma_{\mu_M} has to be interpreted now according to Chebyshev's theorem (Feller, 1971):

    P(|\mu_M[G] - \langle G \rangle| \le k \sigma_{\mu_M}) \ge 1 - \frac{1}{k^2}        (1.11)

i.e. the probability that \langle G \rangle lies in the interval (\mu_M[G] - 2\sigma_{\mu_M}, \mu_M[G] + 2\sigma_{\mu_M}) is at least 75%, that it lies in (\mu_M[G] - 3\sigma_{\mu_M}, \mu_M[G] + 3\sigma_{\mu_M}) at least 88.89%, etc.

The above Monte Carlo importance sampling procedure is one of the most powerful methods available to compute high-dimensional integrals. In order to completely specify the method, though, we need a procedure to generate the values of the samples X^{(k)} that appear in the above formulae. These should be values of the N-dimensional variable (x_1,...,x_N) distributed according to the probability density function f_{\hat{X}}(X). There are several methods devised to generate the required values X^{(k)}. Before we can explain the very powerful methods available we need to study in detail some simpler cases.

1.2. Uniform sampling

Let us start with a simple example. Consider the one-dimensional (N=1) integral:

    I = \int_0^1 dx \, \cos(x)        (1.12)

this is a particular case of Eq. (1.1) in which the function f_{\hat{x}}(x) is equal to 1 in the interval (0,1) and 0 otherwise. The name uniform sampling comes from the fact that the variable x is equally likely to take any value in the interval (0,1) (or, generally speaking, in any finite interval (a,b)). Although one could, in principle, devise some physical process that would produce values of a variable x uniformly distributed in the interval (0,1), it is preferable, for many reasons, to use simple algorithms that can be programmed on a computer and that produce "random-type" (or pseudo-random) numbers uniformly distributed in the interval (0,1). We will not discuss in these lectures the interesting topic of pseudo-random number generation. An excellent exposition can be found in references (Knuth, 1981) and (James, 1990). Let us simply assume that there exists a fortran function, which we will call ran_u(), which returns an independent random number uniformly distributed in the interval (0,1).
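Since ran_u() is not part of standard Fortran, the reader will need to supply one. The following minimal stand-in (a Park-Miller linear congruential generator; this sketch is not from the original lectures and is not of production quality) is enough to run the example programs that follow:

      function ran_u()
c     Minimal stand-in for ran_u(): Park-Miller linear congruential
c     generator, iseed -> 16807*iseed mod (2**31-1), implemented with
c     Schrage's factorization to avoid integer overflow.  The seed
c     value is arbitrary.
      integer iseed, ia, im, iq, ir, k
      parameter (ia=16807, im=2147483647, iq=127773, ir=2836)
      save iseed
      data iseed /123457/
      k=iseed/iq
      iseed=ia*(iseed-k*iq)-ir*k
      if (iseed.lt.0) iseed=iseed+im
      ran_u=real(iseed)/real(im)
      return
      end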
By using such a function, we can implement very easily the algorithm of uniform sampling:

      m=1000
      r=0.0
      s=0.0
      do 99 k=1,m
         x=ran_u()
         gk=g(x)
         r=r+gk
         s=s+gk*gk
 99   continue
      r=r/m
      s=s/m-r*r
      s=sqrt(s/m)
      write(6,*) r,'+/-',s
      end

      function g(x)
      g=cos(x)
      return
      end

The reader should find out the name given to the function ran_u() on his own computer and run the above program, checking that the error in the output value scales as the inverse square root of the number of samples M.

1.3. Importance sampling for N=1

In the N=1 case, when the variable X \equiv x has only one component, a simple and important theorem allows us to generate x^{(k)} very easily. Recall that the probability distribution function F_{\hat{x}}(x) is defined by integration of the probability density function f_{\hat{x}}(x) as:

    F_{\hat{x}}(x) = \int_{-\infty}^{x} f_{\hat{x}}(y) \, dy        (1.13)

Theorem 1. If f_{\hat{x}}(x) is a one-variable probability density function, then the variable u = F_{\hat{x}}(x) is uniformly distributed in the interval (0,1).

As a consequence, we have the

Corollary 1. If u is uniformly distributed in the interval (0,1), then x = F_{\hat{x}}^{-1}(u) is distributed according to f_{\hat{x}}(x).

(The proof of this theorem is left as an exercise for the reader.) This theorem reduces the problem of generating random numbers according to a given distribution f_{\hat{x}}(x) to the inversion of the corresponding probability distribution function. If, for instance, we want to use the method of importance sampling to compute the integral

    I = \int_0^{\infty} dx \, e^{-x} x^2        (1.14)

we can take f_{\hat{x}}(x) = e^{-x} if x \ge 0, and G(x) = x^2. This choice for f_{\hat{x}}(x) respects positivity and normalization. To generate values of x^{(k)} according to f_{\hat{x}}(x) we need to invert F_{\hat{x}}(x), which in this case can be done analytically:

    F_{\hat{x}}(x) = \int_0^x dy \, e^{-y} = 1 - e^{-x} = u  \Longrightarrow  x = F_{\hat{x}}^{-1}(u) = -\log(1-u)        (1.15)

Since 1-u is also uniformly distributed in the interval (0,1) we can write simply:

    x = -\log(u)        (1.16)

which is equivalent to x = -\log(1-u) from the statistical point of view. A program to compute integral (1.14) can be:

      m=1000
      r=0.0
      s=0.0
      do 99 k=1,m
         x=ran_f()
         gk=g(x)
         r=r+gk
         s=s+gk*gk
 99   continue
      r=r/m
      s=s/m-r*r
      s=sqrt(s/m)
      write(6,*) r,'+/-',s
      end

      function g(x)
      g=x*x
      return
      end

      function ran_f()
      ran_f=-log(ran_u())
      return
      end

A technical point is that, in some cases, the function F_{\hat{x}}^{-1}(u) is not expressible in terms of elementary functions and some kind of approximative method (such as numerical interpolation) might be needed to compute it. It is important to realize that the decomposition of a given integral into f_{\hat{x}}(x) and G(x) to put it in the form of Eq. (1.3) might not be unique. In the previous example, Eq. (1.14), we could have taken as well f_{\hat{x}}(x) = x e^{-x} and G(x) = x. This choice, however, makes the probability distribution function F_{\hat{x}}(x) difficult to invert. An important case in which the function F_{\hat{x}}^{-1}(u) is not easily calculable is that of Gaussian random numbers (of mean zero and variance 1), for which the probability density function is:

    f_{\hat{x}}(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}        (1.17)

(random numbers y of mean \mu and variance \sigma^2 are easily obtained by the linear transformation y = \sigma x + \mu). The inverse probability distribution function is

    x = \sqrt{2} \, \mathrm{erf}^{-1}(2u - 1)        (1.18)

where \mathrm{erf}^{-1}(z) is the inverse error function (Abramowitz and Stegun, 1972; Valls and Mazenko, 1986). The inverse error function does not usually belong to the set of predefined functions in a programming language, although some libraries (for example, the NAG library) do include it in their list of functions.
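When no library routine for the inverse error function is at hand, one practical possibility (a sketch of the interpolation idea mentioned above; it is not code from the original lectures, the routine name ran_gt() is hypothetical, and it assumes the compiler provides the erf intrinsic) is to tabulate the Gaussian distribution function on a grid once and then invert it by bisection and linear interpolation:

      function ran_gt()
c     Gaussian random numbers by numerical inversion: tabulate the
c     distribution function F(x) = (1+erf(x/sqrt(2)))/2 on a grid in
c     (-xmax,xmax) at the first call, then solve F(x) = u for each
c     uniform number u by bisection plus linear interpolation.
      parameter (n=2000, xmax=5.0)
      dimension xt(0:n), ft(0:n)
      logical first
      save xt, ft, first
      data first /.true./
      if (first) then
         do 10 i=0,n
            xt(i)=-xmax+2.0*xmax*real(i)/real(n)
            ft(i)=0.5*(1.0+erf(xt(i)/sqrt(2.0)))
 10      continue
         first=.false.
      endif
      u=ran_u()
c     bisection: find the table interval (ilo,ihi) bracketing u
      ilo=0
      ihi=n
 20   if (ihi-ilo.gt.1) then
         imid=(ilo+ihi)/2
         if (ft(imid).lt.u) then
            ilo=imid
         else
            ihi=imid
         endif
         goto 20
      endif
c     linear interpolation inside the bracketing interval
      ran_gt=xt(ilo)+(xt(ihi)-xt(ilo))*(u-ft(ilo))/(ft(ihi)-ft(ilo))
      return
      end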
An alternative for the generation of Gaussian distributed random numbers is the algorithm of Box-Muller-Wiener (Box and Muller, 1958; Ahrens and Dieter, 1972; 1988), which is based on a change of variables to polar coordinates. Namely,

    x_1 = \sqrt{\rho} \cos(\theta)
    x_2 = \sqrt{\rho} \sin(\theta)        (1.19)

If x_1 and x_2 are independent Gaussian variables, it is easy to prove that \rho and \theta are independent variables, \rho follows an exponential distribution and \theta a uniform distribution, so that their inverse distribution functions can be expressed in terms of elementary functions. Namely,

    f_\rho(\rho) = \tfrac{1}{2} e^{-\rho/2}  \Longrightarrow  F_\rho(\rho) = 1 - e^{-\rho/2},   \rho \ge 0
    f_\theta(\theta) = \frac{1}{2\pi}  \Longrightarrow  F_\theta(\theta) = \frac{\theta}{2\pi},   0 \le \theta \le 2\pi        (1.20)

The resulting algorithm is:

    x_1 = \sqrt{-2\log(u)} \cos(2\pi v)
    x_2 = \sqrt{-2\log(u)} \sin(2\pi v)        (1.21)

where u and v are independent random variables uniformly distributed in the interval (0,1). The main advantage of this Box-Muller-Wiener algorithm is that it is exact, yielding two independent Gaussian variables, x_1, x_2, from two independent uniform variables, u, v. Its main disadvantage, though, is that it is extremely slow, since it involves the calculation of trigonometric, logarithm and square root functions. In most applications a linear interpolation approximation to the inverse error function does produce sufficiently good quality Gaussian random numbers at a considerable gain in speed (Toral and Chakrabarti, 1993). A possible implementation of the Box-Muller-Wiener algorithm is the following:

      function ran_g()
c     the flag is and the stored value x2 must keep their values
c     between successive calls
      save is, x2
      data is /-1/
      is=-is
      if (is.eq.1) then
         a=sqrt(-2.0*log(ran_u()))
         b=6.2831853072*ran_u()
         ran_g=a*cos(b)
         x2=a*sin(b)
         return
      endif
      ran_g=x2
      return
      end

(A practical note: it might be necessary to tell the compiler that the two calls to ran_u() that appear in the previous program produce different values and need to be computed separately; sometimes compilers are too clever.)

Another important case that deserves to be considered explicitly, in spite of its simplicity, concerns the generation of events with a given probability. Imagine we want to simulate tossing a biased coin, with a p = 0.6 probability for heads (and 0.4 for tails). We need to generate a variable that takes some value (1, for example) 60% of the time and another value (0) 40% of the time. This can be done by comparing a random number u uniformly distributed in the interval (0,1) with the given probability p. If u \le p then we take x = 1, otherwise we take x = 0. This is achieved by the program:

      if (ran_u().lt.p) then
         x=1
      else
         x=0
      endif

For N-dimensional variables (x_1,...,x_N) the situation is much more complex. The equivalent of Theorem 1 states that in order to generate values of the N-dimensional variable (x_1,...,x_N) we need (Rubinstein, 1981): (i) to generate N independent random numbers (u_1,...,u_N) uniformly distributed in the interval (0,1) (this is the easy part), and (ii) to solve the following set of equations:

    F_{x_1}(x_1) = u_1
    F_{x_2}(x_2|x_1) = u_2
    ...
    F_{x_N}(x_N|x_1,\ldots,x_{N-1}) = u_N        (1.22)

where, for example, F_{x_2}(x_2|x_1) is the conditional probability distribution function of the variable x_2 given that x_1 has taken a particular value, and so on. The calculation of the conditional probability distribution functions is generally at least as complicated as the calculation of the original integral we wanted to compute numerically, and the above procedure is of little practical use. In order to develop alternative methods suitable for the generation of N-dimensional variables, we need first to introduce the so-called rejection methods for one variable.
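Before moving on, a small worked illustration of the prescription (1.22) (added here; it is not part of the original text) shows how it operates in a case where the conditional distributions do happen to be tractable. Take N = 2 and

    f(x_1,x_2) = 2 e^{-x_1-x_2}, \qquad 0 < x_1 < x_2 < \infty .

The marginal density of x_1 is f_{x_1}(x_1) = \int_{x_1}^{\infty} dx_2 \, f(x_1,x_2) = 2 e^{-2x_1}, so that

    F_{x_1}(x_1) = 1 - e^{-2x_1} = u_1 \;\Longrightarrow\; x_1 = -\tfrac{1}{2}\log(1-u_1),

and the conditional density is f(x_2|x_1) = e^{-(x_2-x_1)} for x_2 > x_1, so that

    F_{x_2}(x_2|x_1) = 1 - e^{-(x_2-x_1)} = u_2 \;\Longrightarrow\; x_2 = x_1 - \log(1-u_2).

For a generic N-dimensional density such closed forms are not available, which is precisely the difficulty pointed out above.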
1.4. Rejection method for N=1

In those cases in which the inverse probability distribution function is difficult to compute, the rejection method offers a very convenient alternative. Also, it is the basis of the N-dimensional methods which we will develop later. The method is based on the fact that the probability density function f_{\hat{x}}(x) is proportional to the probability that the variable x takes a particular value. If, for example, f_{\hat{x}}(x_1) = 2 f_{\hat{x}}(x_2), we can affirm that the value x_1 is twice as probable as the value x_2. The rejection method (in its simplest version) proposes the values x_1 and x_2 with the same probability and then accepts the proposed value x with a probability h(x) proportional to f_{\hat{x}}(x), such that, in our example, x_1 will be accepted twice as many times as x_2. Consequently, x_1 will appear twice as many times as x_2, which is equivalent to saying that x_1 is twice as probable as x_2, as desired. We will illustrate the method by an example. Let us consider the probability density function:

    f_{\hat{x}}(x) = 6x(1-x),   x \in (0,1)        (1.23)

which has the shape indicated in Fig. 1. We propose a value of x uniformly in the interval (0,1). The proposed value has to be accepted with a probability h(x) proportional to f_{\hat{x}}(x): h(x) = \alpha f_{\hat{x}}(x). The constant \alpha is arbitrary, but the resulting function h(x) has to be interpreted as a probability, which means that we have to keep the bounds 0 \le h(x) \le 1. This yields \alpha \le 1/\max_x f_{\hat{x}}(x). Obviously, the rejection method is more efficient when the acceptance probability is high, which means that one has to take the maximum possible value for \alpha. In our example, the maximum of f_{\hat{x}}(x) is at x = 0.5, f_{\hat{x}}(0.5) = 1.5, so that we take \alpha = 2/3 and, consequently, h(x) = 4x(1-x), see Fig. 1. The acceptance process is done as explained in the previous section by comparing the probability h(x) with a random number uniformly distributed in (0,1). The whole process is repeated until the proposed value is accepted. The final algorithm can be coded as:

      function ran_f()
      h(x)=4.0*x*(1.0-x)
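The listing breaks off at this point in the available text. A minimal sketch of how the accept/reject loop described above could be completed (my own completion, under the assumptions stated in the text: uniform proposals on (0,1) and acceptance probability h(x) = 4x(1-x)) is:

      function ran_f()
c     Rejection method for f(x) = 6x(1-x) on (0,1): propose x
c     uniformly, accept it with probability h(x) = 4x(1-x), and
c     repeat until a proposal is accepted.
      h(x)=4.0*x*(1.0-x)
 1    x=ran_u()
      if (ran_u().gt.h(x)) goto 1
      ran_f=x
      return
      end

With this choice of \alpha the overall acceptance probability is \int_0^1 h(x) dx = 2/3, so on average 3/2 proposals are needed per accepted value.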
