Accelerated dynamics: Mathematical foundations and algorithmic improvements

Tony Lelièvre*

January 7, 2015

arXiv:1501.01034v1 [math.NA] 5 Jan 2015

Abstract. We present a review of recent works on the mathematical analysis of algorithms which have been proposed by A.F. Voter and co-workers in the late nineties in order to efficiently generate long trajectories of metastable processes. These techniques have been successfully applied in many contexts, in particular in the field of materials science. The mathematical analysis we propose relies on the notion of quasi-stationary distribution.

* Université Paris-Est, CERMICS (ENPC), INRIA, 6-8 Avenue Blaise Pascal, F-77455 Marne-la-Vallée. The work of T. Lelièvre is supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement number 614492. T. Lelièvre would like to thank D. Perez and A.F. Voter for very instructive discussions on accelerated dynamics, as well as his co-authors D. Aristoff, A. Binder, C. Le Bris, M. Luskin, F. Nier, D. Perez and G. Simpson.

1 Introduction

This article is a review of recent works whose aim is to lay the mathematical foundations of algorithms used in computational statistical physics, namely accelerated dynamics techniques introduced by A.F. Voter and co-workers in the late nineties. These methods have been proposed in order to efficiently sample trajectories in the context of molecular dynamics.

Molecular dynamics is used in various application fields (biology, chemistry, materials science) in order to simulate the evolution of a molecular system, namely interacting particles representing atoms or groups of atoms. The typical dynamics one should have in mind is the Langevin dynamics:

  dq_t = M^{-1} p_t dt
  dp_t = -∇V(q_t) dt - γ M^{-1} p_t dt + √(2γβ^{-1}) dW_t    (1)

where (q_t, p_t) denotes the positions and momenta of the particles at time t ≥ 0, M is the mass tensor, V is the potential function which, to a given set of positions q, associates its energy V(q), γ > 0 is a friction parameter, β = (k_B T)^{-1} is proportional to the inverse temperature, and W_t is a standard Brownian motion. In the following, we assume that q_t ∈ R^d where d is typically very large (say 3 times the number of particles), but generalizations to dynamics on manifolds (systems with constraints) are straightforward. When γ = 0, the Langevin dynamics is nothing but the Hamiltonian dynamics. The terms involving γ model the fact that the system is at a given temperature. Indeed, under loose assumptions on V, this dynamics is ergodic with respect to the canonical measure (NVT ensemble): for any test function φ : R^d × R^d → R,

  lim_{T→∞} (1/T) ∫_0^T φ(q_t, p_t) dt = Z^{-1} ∫ φ(q,p) exp(-β(p^T M^{-1} p/2 + V(q))) dp dq

where Z = ∫ exp(-β(p^T M^{-1} p/2 + V(q))) dp dq < ∞. In the following, we will also consider the overdamped Langevin dynamics, which is obtained from (1) in the limit γ → ∞ or M → 0 (see for example [9, Section 2.2.4]):

  dX_t = -∇V(X_t) dt + √(2β^{-1}) dW_t    (2)

where X_t ∈ R^d denotes the positions of the particles. Again, under loose assumptions on V, this dynamics is ergodic with respect to the canonical measure Z̃^{-1} exp(-βV(x)) dx where Z̃ = ∫ exp(-βV(x)) dx. The aim of molecular simulations is to compute macroscopic properties from the models (1) or (2) at the atomistic level (V being the main modelling ingredient). In this article, we are particularly interested in so-called dynamical quantities, namely macroscopic observables which depend on the path (q_t, p_t)_{t≥0} or (X_t)_{t≥0}.
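To fix ideas, here is a minimal sketch (not from the original paper) of how trajectories of (2) are generated in practice, via an Euler-Maruyama discretization; the double-well potential and all numerical parameters are illustrative choices.

```python
import numpy as np

def euler_maruyama(grad_V, x0, beta, dt, n_steps, rng):
    """Integrate dX_t = -grad V(X_t) dt + sqrt(2/beta) dW_t by Euler-Maruyama."""
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps + 1,) + x.shape)
    traj[0] = x
    noise_scale = np.sqrt(2.0 * dt / beta)
    for k in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(x.shape)
        traj[k + 1] = x
    return traj

# Illustrative 1D double-well potential V(x) = (x^2 - 1)^2, minima at x = -1, +1.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
rng = np.random.default_rng(0)
traj = euler_maruyama(grad_V, x0=[-1.0], beta=6.0, dt=1e-3, n_steps=100_000, rng=rng)
```

At moderate β (low temperature), such a trajectory stays near one minimum for many steps before hopping to the other well, which is exactly the timescale discrepancy discussed next.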
One would for example like to sample the paths which go from one region of the phase space to another one, in order to compute the typical time to observe such transitions, or the intermediate states along the transition path.

The numerical difficulty associated with such computations is that the timescale at the microscopic level is much smaller than the timescale at the macroscopic level. More precisely, the timestep required to obtain a stable discretization of the above dynamics is much smaller than the timescale associated with the macroscopic observables of interest. In other words, one has to simulate very long trajectories of a high-dimensional stochastic dynamics. In practice, for many applications, a naive discretization of the dynamics is not sufficient to reach the timescales of interest, since it would require up to typically 10^15 iterations (the timescale at the atomistic level, bond vibration, is indeed of the order of 10^{-15} s, while transitions between metastable regions may occur over timescales ranging from microseconds to seconds).

The idea of accelerated dynamics is to take advantage of this timescale discrepancy in order to simulate more efficiently paths over very long times. Indeed, the typical trajectories of (1) or (2) are metastable: the trajectory remains trapped for very long times in some region of the phase space, before hopping to another region where it again remains trapped. These regions are called metastable states. Metastability originates from energetic barriers (the path to leave the state requires climbing above a saddle point of the potential energy V), from entropic barriers (the path to leave the state goes through a narrow corridor, due for example to steric constraints in the system), or more generally from a combination of energetic and entropic effects. The bottom line is thus that behind the continuous state space dynamics (1) or (2), there is a discrete state space jump process (encoding the jumps from metastable state to metastable state). Actually, discrete state space Markov dynamics are also very much used in molecular dynamics: they are called kinetic Monte Carlo or Markov state models, see for example [15]. And continuous state space models are typically used in order to parametrize these Markovian models (namely to compute the jump rates between metastable states), using for example Arrhenius (or Eyring-Kramers) formulas. The accelerated dynamics of A.F. Voter follow a different path: the principle is to use the underlying jump process in order to accelerate the sampling of the original dynamics, in the spirit of a predictor-corrector scheme. These are thus numerical methods to efficiently generate the underlying jump process among metastable states.

In the following, we will assume that we are given a mapping

  S : R^d → N    (3)

which to a given set of positions x ∈ R^d associates S(x), the label of the state in which x lies. One should think of the states

  S^{-1}({n}) = {x ∈ R^d, S(x) = n} for n ∈ N

as the metastable states mentioned above. This mapping thus defines a partition of the state space.
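The choice of S is left open here; as a purely illustrative (hypothetical) example, one common concrete choice labels a configuration by the basin of attraction of the gradient flow, i.e. by the local minimum reached by a steepest descent on V. A minimal sketch:

```python
import numpy as np

def state_label(x, grad_V, known_minima, lr=1e-2, n_steps=5000, tol=1e-6):
    """Label x by the local minimum reached by gradient descent on V.

    States are numbered as they are discovered: a configuration descending
    to a previously unseen local minimum opens a new state label.
    """
    y = np.array(x, dtype=float)
    for _ in range(n_steps):              # crude steepest descent on V
        step = lr * grad_V(y)
        y -= step
        if np.linalg.norm(step) < tol:
            break
    for n, m in enumerate(known_minima):  # match against already-seen minima
        if np.linalg.norm(y - m) < 1e-3:
            return n
    known_minima.append(y)                # new state discovered on the fly
    return len(known_minima) - 1

grad_V = lambda x: 4.0 * x * (x**2 - 1.0)   # same double well as above
minima = []
print(state_label([-0.8], grad_V, minima), state_label([0.9], grad_V, minima))  # 0 1
```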
Let us make two comments on this mapping. First, an important message from the mathematical analysis we present below is that whatever the mapping S, the accelerated dynamics algorithms are consistent: they give the correct result in some limiting regime. For example, for the parallel replica method, in the limit when the correlation time (a numerical parameter introduced below) goes to infinity, the generated dynamics are statistically correct. In particular, if some of the states happen not to be metastable, or if, for one specific realization, the stochastic process does not remain trapped in one of these states (because, for example, it enters the state with too large a velocity), the algorithms are still consistent. Second, as will become clear below, the numbering of the states does not need to be known a priori: the states are numbered as the simulation goes, when they are successively discovered by the stochastic process.

Three algorithms have been proposed by A.F. Voter and co-workers. The idea is that if the stochastic process remains trapped for a very long time in a given state S = S^{-1}({n}) (for some given n), then there are ways to generate the exit event from this state much more efficiently than by running the original dynamics until the exit time. The exit event is fully characterized by two random variables: the exit time and the exit point from S, which are defined by considering the first hitting time and point on the boundary ∂S. Let us roughly describe the ideas behind the three algorithms.

In the Parallel Replica method [14], the principle is to simulate in parallel many trajectories following the original dynamics (1) or (2), to consider the first exit event among the replicas, and to generate from this first exit event a consistent exit time and exit point. The gain is thus obtained in terms of wall clock time. This algorithm can be seen as a way to parallelize a computation in time, which is not an easy problem in general due to the sequential nature of time evolutions.

In the hyperdynamics [13], the idea is to modify the potential V within the state S in order to accelerate the exit from the state for the original dynamics (1) or (2). Again, using an appropriate time rescaling, it is possible to generate from the observed exit event on the biased potential an exit event which is consistent with what would have been observed on the original unbiased potential V.

The Temperature Accelerated Dynamics (TAD) [12] consists in considering the original dynamics (1) or (2) at a higher temperature than the original one. The idea is then that, under appropriate assumptions, there is a way to infer from the exit events observed at high temperature the exit event which would have been observed at the original lower temperature.

The ultimate aim of these three techniques is thus to generate efficiently the so-called state-to-state dynamics (S(q_t))_{t≥0} (for (1)) or (S(X_t))_{t≥0} (for (2)), with the correct statistical properties. Let us emphasize that the objective is to get the correct law on the paths (in order to compute dynamical quantities), not only on the time marginals for example.

A crucial mathematical tool to understand these techniques is the Quasi-Stationary Distribution (QSD) introduced in Section 2. We will then describe the mathematical results which have been obtained so far on the three algorithms: Parallel Replica in Section 3, hyperdynamics in Section 4 and Temperature Accelerated Dynamics in Section 5. A few concluding remarks are provided in Section 6.

2 The Quasi-Stationary Distribution and the decorrelation step

Let us consider a fixed state S = S^{-1}({n}) and let us focus for simplicity on the overdamped Langevin dynamics (2). We assume that S is a bounded regular subset of R^d. Let us consider the first exit time from S:

  T_S = inf{t ≥ 0, X_t ∉ S}.
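The exit event (T_S, X_{T_S}) can be sampled directly by running the discretized dynamics until it leaves S. A minimal sketch in the spirit of the integrator above; the interval (-2, 0), standing for the left well of the illustrative double-well potential, plays the role of S.

```python
import numpy as np

def sample_exit_event(grad_V, in_state, x0, beta, dt, rng):
    """Run the Euler-Maruyama discretization of (2) until X_t leaves S.

    Returns (T_S, X_{T_S}): the first exit time and the exit point.
    """
    x = np.array(x0, dtype=float)
    t = 0.0
    noise_scale = np.sqrt(2.0 * dt / beta)
    while in_state(x):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(x.shape)
        t += dt
    return t, x

grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
in_state = lambda x: -2.0 < x[0] < 0.0      # S = (-2, 0): left well, illustrative
rng = np.random.default_rng(1)
T_S, X_exit = sample_exit_event(grad_V, in_state, [-1.0], beta=6.0, dt=1e-3, rng=rng)
```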
2.1 The QSD

Let us start with the definition of the QSD.

Definition 1. A probability measure ν with support in S is called a QSD for the Markov process (X_t)_{t≥0} if and only if

  ∀t > 0, ∀A ⊂ S,  ν(A) = ∫_S P(X_t^x ∈ A, t < T_S^x) ν(dx) / ∫_S P(t < T_S^x) ν(dx).

In other words, ν is a QSD if, when X_0 is distributed according to ν, the law of X_t conditionally to the fact that (X_s)_{0≤s≤t} remains in the state S is still ν, for all positive t.

The QSD satisfies three properties which will be crucial in the following. We refer for example to [6] for a proof of these results and to [4] for more general results on QSDs.

Proposition 2. Let (X_t)_{t≥0} follow the dynamics (2) with an initial condition X_0 ∈ S. Then, there exists a probability distribution ν with support in S such that

  lim_{t→∞} L(X_t | T_S > t) = ν.    (4)

The distribution ν is the QSD associated with S.

A consequence of this proposition is the existence and uniqueness of the QSD. The QSD can thus be seen as the long time limit of the process conditioned to stay in the well. This proposition can be useful to understand what is a metastable state: a metastable state is a state such that the typical exit time is much larger than the local equilibration time, namely the time to observe the convergence to the QSD in (4).

Let us now give a second property of the QSD.

Proposition 3. Let L = -∇V·∇ + β^{-1}Δ be the infinitesimal generator of (X_t)_{t≥0} (satisfying (2)). Let us consider the first eigenvalue and eigenfunction associated with the adjoint operator L* = div(∇V + β^{-1}∇) with homogeneous Dirichlet boundary condition:

  L* u_1 = -λ_1 u_1 on S,
  u_1 = 0 on ∂S.    (5)

The QSD ν associated with S satisfies

  dν = u_1(x) dx / ∫_S u_1(x) dx

where dx denotes the Lebesgue measure on S.

The QSD thus has a density with respect to the Lebesgue measure, which is nothing but the ground state of the Fokker-Planck operator L* associated with the dynamics with absorbing boundary conditions.

Finally, the last property of the QSD concerns the exit event, when X_0 is distributed according to ν.

Proposition 4. Let us assume that X_0 is distributed according to the QSD ν in S. Then the law of the couple (T_S, X_{T_S}) (namely the first exit time and the first exit point) is fully characterized by the following properties:

• T_S is exponentially distributed with parameter λ_1 (defined in Equation (5) above);

• T_S is independent of X_{T_S};

• The law of X_{T_S} is given by: for any bounded measurable function φ : ∂S → R,

  E^ν(φ(X_{T_S})) = - ∫_{∂S} φ ∂_n u_1 dσ / (β λ_1 ∫_S u_1(x) dx)    (6)

where σ denotes the Lebesgue measure on ∂S induced by the Lebesgue measure in R^d and the Euclidean scalar product, and ∂_n u_1 = ∇u_1 · n denotes the outward normal derivative of u_1 on ∂S.

This Proposition explains the interest of the QSD. Indeed, if the process is distributed according to the QSD in S (namely, from Proposition 2, if it remained for a sufficiently long time in S), then the exit event from the state S is Markovian, in terms of state-to-state dynamics. This is reminiscent of what is assumed to build kinetic Monte Carlo models (see [15]).

Remark 5. The existence of the QSD and the convergence of the conditioned process towards the QSD for the Langevin process (1) requires extra work compared to the overdamped Langevin process (2). For results in that direction, we refer to the recent manuscript [11].
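Propositions 3 and 4 can be illustrated numerically in one dimension: discretizing L* with homogeneous Dirichlet boundary conditions and extracting its first eigenpair gives both λ_1 (hence the mean exit time 1/λ_1 from the QSD) and the density of ν. A minimal sketch, for the same illustrative potential and state as above; the grid size and all parameters are assumptions.

```python
import numpy as np

# QSD on S = (-2, 0) for V(x) = (x^2 - 1)^2, via the first Dirichlet
# eigenpair of L* u = (V' u + beta^{-1} u')' = V'' u + V' u' + beta^{-1} u''.
beta = 6.0
V1 = lambda x: 4.0 * x * (x**2 - 1.0)        # V'
V2 = lambda x: 12.0 * x**2 - 4.0             # V''

a, b, N = -2.0, 0.0, 400
x = np.linspace(a, b, N + 2)[1:-1]           # interior grid points
h = x[1] - x[0]

# Centered finite differences; u = 0 on the boundary is built in by truncation.
main = V2(x) - 2.0 / (beta * h**2)
upper = V1(x[:-1]) / (2 * h) + 1.0 / (beta * h**2)    # coefficient of u_{i+1}
lower = -V1(x[1:]) / (2 * h) + 1.0 / (beta * h**2)    # coefficient of u_{i-1}
A = np.diag(main) + np.diag(upper, 1) + np.diag(lower, -1)

vals, vecs = np.linalg.eig(A)
i1 = np.argmax(vals.real)                    # ground state: eigenvalue -lambda_1
lam1 = -vals[i1].real
u1 = np.abs(vecs[:, i1].real)                # ground state has constant sign
qsd = u1 / (u1.sum() * h)                    # density of nu w.r.t. Lebesgue on S

print(f"lambda_1 ~ {lam1:.4g}; mean exit time from the QSD ~ {1/lam1:.4g}")
```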
2.2 The decorrelation step

The accelerated dynamics algorithms will be based on the assumption that the process remained sufficiently long in the state S, so that one can consider it is distributed according to the QSD ν. Then, using this assumption, various techniques are used in order to efficiently generate the exit event from S, starting from the QSD (see the next sections).

A natural preliminary question is therefore: how to assess in practice that the limit has been reached in (4)? This is done in the so-called decorrelation step, which consists in waiting for a given time τ_corr (a so-called decorrelation time) before assuming that the local equilibrium ν has been reached. This correlation time can be state dependent, and is typically supposed to be known a priori.

From a mathematical viewpoint, τ_corr should be chosen sufficiently large so that the distance between the law of X_{τ_corr} conditioned to T_S ≥ τ_corr and the QSD ν is small. In [6], we prove the following:

Proposition 6. Let (X_t)_{t≥0} satisfy (2) with X_0 ∈ S. Let us consider -λ_2 < -λ_1 < 0 the first two eigenvalues of the operator L* on S with homogeneous Dirichlet boundary conditions on ∂S (see Proposition 3 for the definition of L*). Then, there exists a constant C > 0 which depends on the law of X_0, such that, for all t ≥ C/(λ_2 - λ_1),

  sup_{f, ‖f‖_{L∞} ≤ 1} |E(f(T_S - t, X_{T_S}) | T_S ≥ t) - E^ν(f(T_S, X_{T_S}))| ≤ C exp(-(λ_2 - λ_1) t).

In other words, the total variation norm between the law of (T_S - t, X_{T_S}) conditioned to T_S ≥ t (for any initial condition X_0 ∈ S), and the law of (T_S, X_{T_S}) when X_0 is distributed according to ν, decreases exponentially fast with rate λ_2 - λ_1. This means that τ_corr should be chosen of the order of 1/(λ_2 - λ_1). Of course, this is not a very practical result, since computing these eigenvalues is in general impossible. From a theoretical viewpoint, this result tells us that the local equilibration time is of the order of 1/(λ_2 - λ_1), so that the state S will be metastable if this time is much smaller than the exit time (which is typically of the order of 1/λ_1, see Proposition 3).

Let us mention the recent work [3] where we propose a numerical method to approximate the time to reach the QSD, using a Fleming-Viot particle process together with stationarity statistical tests. The interest of the approach is demonstrated on toy examples (including the 7-atom Lennard-Jones cluster), but it remains to test this technique on higher dimensional problems.

From now on, we assume that the decorrelation step has been successful, and we look for efficient techniques to generate the exit event (namely a sample of the random variables (T_S, X_{T_S})). Let us describe successively the three algorithms which have been proposed by A.F. Voter and co-workers.
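For concreteness, here is a minimal sketch of a Fleming-Viot-type particle system of the kind mentioned above for sampling the QSD: N walkers evolve independently, and each time one leaves S it is restarted from the current position of a walker still inside S, chosen uniformly at random. The potential, state and parameters are the illustrative choices of the earlier sketches; the stationarity tests of [3] are not implemented here.

```python
import numpy as np

def fleming_viot(grad_V, in_state, x0, beta, dt, n_walkers, n_steps, rng):
    """Fleming-Viot particle system: the empirical distribution of the walkers
    approximates the law of X_t conditioned on T_S > t, hence the QSD for large t."""
    xs = np.tile(np.array(x0, dtype=float), (n_walkers, 1))
    noise_scale = np.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        xs += -grad_V(xs) * dt + noise_scale * rng.standard_normal(xs.shape)
        inside = [i for i in range(n_walkers) if in_state(xs[i])]
        outside = [i for i in range(n_walkers) if not in_state(xs[i])]
        for i in outside:                    # branch exited walkers onto survivors
            xs[i] = xs[rng.choice(inside)]   # (assumes at least one walker stays in S)
    return xs                                # approximate i.i.d.-like QSD sample

grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
in_state = lambda x: -2.0 < x[0] < 0.0
rng = np.random.default_rng(2)
qsd_sample = fleming_viot(grad_V, in_state, [-1.0], beta=6.0, dt=1e-3,
                          n_walkers=100, n_steps=5000, rng=rng)
```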
3 The Parallel Replica method

Figure 1: The Parallel Replica method: many exit events are simulated in parallel, all starting from the QSD in S. The blue trajectory represents the reference walker which stays sufficiently long within S so that we can assume the blue point is distributed according to the QSD. The red points represent i.i.d. initial conditions distributed according to the QSD. The black trajectories are simulated in parallel.

Let us assume that we are given an initial condition X_0 ∈ S such that X_0 is distributed according to the QSD in S. Let us assume that we are given a computer with many CPUs in parallel. The idea of the parallel replica method is to distribute N independent initial conditions (X_0^i)_{1≤i≤N} in S according to the QSD ν, to let them evolve according to (2) driven by independent Brownian motions (so that the replicas remain independent), and then to consider the first exit event among the replicas:

  I_0 = argmin_{i∈{1,...,N}} T_S^i where T_S^i = inf{t ≥ 0, X_t^i ∉ S}.    (7)

The integer I_0 ∈ {1,...,N} is the index of the first replica which exits S, and min(T_S^1,...,T_S^N) = T_S^{I_0}. The effective exit time is set as N times the first exit time, and the effective exit point is nothing but the exit point for the first exit event. See Figure 1 for a schematic illustration of the method.

The consistency of the method is a corollary of Proposition 4. Indeed, using the fact that, starting from the QSD, the exit time is exponentially distributed and independent of the exit point, we easily obtain that

  N T_S^{I_0} = N min(T_S^1,...,T_S^N) =^L T_S^1    (8)

which means that the effective exit time has the correct law, and

  X^{I_0}_{T_S^{I_0}} =^L X^1_{T_S^1}

which means that the first exit point of the replica I_0 (the first one to exit among N) has the same law as the first exit point of any of them. Moreover, N T_S^{I_0} = N min(T_S^1,...,T_S^N) and X^{I_0}_{T_S^{I_0}} are independent, so that we have proven the following Lemma.

Lemma 7. Let I_0 be the index of the first replica exiting S, defined by (7). Then we have the equality in law:

  (N T_S^{I_0}, X^{I_0}_{T_S^{I_0}}) =^L (T_S^1, X^1_{T_S^1}).

This Lemma shows that the parallel replica method is exact: the law of the effective exit time and exit point is exactly the law of the exit time and exit point which would have been observed for only one replica.

Let us make a few remarks on this algorithm. First, the full algorithm actually iterates three steps (a schematic sketch of the dephasing and parallel steps follows the list below):

• The decorrelation step (see Section 2.2), where a reference walker is run following the dynamics (2) until it remains trapped for a sufficiently long time in one of the sets S^{-1}({n}), so that it can be assumed to be distributed according to the QSD ν associated with this set. During this step, the algorithm thus consists simply in integrating the original dynamics. No error is introduced, and there is no computational gain.

• The dephasing step, which is a preparation step during which N independent initial conditions distributed according to ν are sampled, each one on a different CPU. This is done in parallel. During this step, the simulation clock is stopped: this step is thus pure overhead. It requires appropriate algorithms to sample the QSD, such as the rejection algorithm or Fleming-Viot particle systems (see [6]). For example, the rejection algorithm consists in running independent walkers following the dynamics (2) (starting from a point within S) and in keeping the final point of a trajectory conditionally on the walker having remained in the state, for a sufficiently long trajectory (typically for the time τ_corr introduced in Section 2.2).

• The parallel step, just described above, which consists in running the N replicas independently in parallel, and in waiting for the first exit event among the N replicas. The simulation clock is then updated by adding the effective exit time N T_S^{I_0}. The exit point X^{I_0}_{T_S^{I_0}} is used as the initial condition of the reference walker for the next decorrelation step. The computational gain of the whole algorithm comes from this step, which divides the wall clock time needed to sample the exit event by the number of replicas N.
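Here is the schematic sketch announced above, run serially for simplicity; it assumes the sample_exit_event and fleming_viot helpers from the earlier sketches are in scope, and in a real implementation the N exit simulations of the parallel step would run on separate CPUs.

```python
import numpy as np

def parrep_exit_event(sample_qsd, sample_exit, n_replicas, rng):
    """Dephasing + parallel step of ParRep for one visit to a state S.

    By Lemma 7, (N * T^{I_0}, X^{I_0}) has the law of (T_S, X_{T_S}) for a
    single replica started from the QSD.
    """
    times, points = zip(*(sample_exit(sample_qsd(rng), rng)
                          for _ in range(n_replicas)))
    i0 = int(np.argmin(times))            # index I_0 of the first replica to exit
    return n_replicas * times[i0], points[i0]

# Hypothetical wiring with the illustrative double well and state S = (-2, 0):
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
in_state = lambda x: -2.0 < x[0] < 0.0
rng = np.random.default_rng(3)
sample_qsd = lambda rng: fleming_viot(grad_V, in_state, [-1.0], beta=6.0, dt=1e-3,
                                      n_walkers=100, n_steps=2000, rng=rng)[0]
sample_exit = lambda x0, rng: sample_exit_event(grad_V, in_state, x0,
                                                beta=6.0, dt=1e-3, rng=rng)
t_eff, x_exit = parrep_exit_event(sample_qsd, sample_exit, n_replicas=8, rng=rng)
```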
In practice, if the rejection algorithm is used in the dephasing step, one actually does not need to wait for the N replicas to be dephased to proceed to the parallel step, see [14, 6, 3].

In view of the above discussions, the errors introduced in the algorithm have two origins. First, in the decorrelation step, τ_corr should be chosen sufficiently large so that, at the end of the decorrelation step, the reference walker is indeed distributed according to a probability law sufficiently close to the QSD. The convergence result of Proposition 6 shows that the error is of the order O(exp(-(λ_2 - λ_1) τ_corr)). Second, in the dephasing step, the sampling algorithm of the QSD should be sufficiently precise in order to obtain i.i.d. samples distributed according to ν. For the rejection algorithm, independence is ensured, and the accuracy is again related to the convergence result of Proposition 6. For the Fleming-Viot particle process, some correlations are introduced among the replicas, and it is an open problem to evaluate the error introduced by these correlations. As already mentioned above, in [3] we recently proposed an algorithm to compute on-the-fly a good correlation time, while sampling the QSD, using a Fleming-Viot particle process.

The parallel replica method is thus a very versatile algorithm. In particular, it applies to both energetic and entropic barriers. The only errors introduced in the algorithm are related to the rate of convergence of the conditioned process to the QSD. The algorithm will be all the more efficient as the convergence time to the QSD is small compared to the exit time (namely, the states are metastable): in this case, the speed-up in terms of wall clock time is linear as a function of N. We refer to [3, Section 5.1] for a discussion of the parallel efficiency of the algorithm.

Let us finally mention the recent work [2] where we propose an extension of the Parallel Replica algorithm to Markov chains (namely discrete-in-time stochastic processes). This is indeed a relevant question since, in practice, the continuous-in-time dynamics (such as (1) or (2)) are approximated by discrete-in-time Markov chains using appropriate time discretization schemes. The algorithm has to be slightly adapted since, starting from the QSD (which is still perfectly well defined in this context), the exit time is not exponentially distributed but has a geometric law. Therefore, the formula (8) does not hold and is replaced by the following: for T^1,...,T^N N i.i.d. geometrically distributed random variables,

  N (min(T^1,...,T^N) - 1) + min{i ∈ {1,...,N}, T^i = min(T^1,...,T^N)} =^L T^1.

This yields a natural adaptation of the original Parallel Replica algorithm to Markov chains.
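This discrete-time identity is easy to check by direct simulation: for i.i.d. geometric T^1,...,T^N, the left-hand side reproduces the law of T^1. A minimal sketch (the success probability p and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, n_samples = 8, 0.05, 200_000

# T^1, ..., T^N i.i.d. geometric with support {1, 2, ...}, one row per sample.
T = rng.geometric(p, size=(n_samples, N))
Tmin = T.min(axis=1)
I0 = T.argmin(axis=1) + 1        # smallest index attaining the minimum, in {1,...,N}
lhs = N * (Tmin - 1) + I0        # should be distributed as T^1

# Compare moments and quantiles of the two sides (both means close to 1/p = 20).
print(lhs.mean(), T[:, 0].mean())
print(np.quantile(lhs, 0.9), np.quantile(T[:, 0], 0.9))
```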
4 The hyperdynamics

Let us again assume that we are given an initial condition X_0 ∈ S such that X_0 is distributed according to the QSD in S. In other words, let us assume that we are at the end of the decorrelation step: the reference walker stayed sufficiently long in S.

Figure 2: The hyperdynamics: the exit event is simulated on a biased potential in S. The honeycomb region represents the support of the biasing potential δV.

The principle of the hyperdynamics algorithm is then to raise the potential inside the state in order to accelerate the exit from S. The algorithm thus requires a biasing potential δV : S → R, which satisfies appropriate assumptions detailed below. The algorithm then proceeds as follows:

• Equilibrate the dynamics on the biased potential V + δV, namely run the dynamics (2) on the process (X_t^{δV})_{t≥0} over the biased potential, conditionally to staying in the well, up to the time the random variable X_t^{δV} has a distribution close to the QSD ν^{δV} associated with the biased potential. This first step is a preparation step, which is pure overhead. The end point X_t^{δV} will be used as the initial condition for the next step.

• Run the dynamics (2) over the biased potential V + δV up to the exit time T_S^{δV} from the state S. The simulation clock is updated by adding the effective exit time B T_S^{δV}, where B is the so-called boost factor defined by

  B = (1/T_S^{δV}) ∫_0^{T_S^{δV}} exp(β δV(X_t^{δV})) dt.    (9)

The exit point is then used as the starting point for a new decorrelation step.

See Figure 2 for a schematic illustration of the method. Roughly speaking, the assumptions required on δV in the original paper [13] are twofold:

• δV is sufficiently small so that the exit event from the state S still satisfies the standard assumptions used for kinetic Monte Carlo models and transition state theory;

• δV is zero on (a neighborhood of) the boundary ∂S.

The derivation of the method relies on explicit formulas for the laws of the exit time and exit point, using the transition state theory.
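A minimal sketch of the second step above, accumulating the boost factor (9) along the biased trajectory; the Gaussian bump δV and all parameters are illustrative assumptions (δV is only approximately zero near ∂S here), and the preliminary equilibration on V + δV is skipped for brevity.

```python
import numpy as np

def hyperdynamics_exit(grad_V, grad_dV, dV, in_state, x0, beta, dt, rng):
    """Run dynamics (2) on the biased potential V + dV until the exit from S,
    accumulating the integral of exp(beta * dV(X_t)) dt, i.e. B * T_S^{dV} in (9)."""
    x = np.array(x0, dtype=float)
    t_biased, t_effective = 0.0, 0.0
    noise_scale = np.sqrt(2.0 * dt / beta)
    while in_state(x):
        t_effective += np.exp(beta * dV(x)) * dt   # time rescaling / boost
        x += -(grad_V(x) + grad_dV(x)) * dt + noise_scale * rng.standard_normal(x.shape)
        t_biased += dt
    return t_effective, t_biased, x                # effective time, biased time, exit point

# Illustrative bias: Gaussian bump centered in the well, nearly 0 near dS = {-2, 0}.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
dV = lambda x: 0.5 * np.exp(-8.0 * (x[0] + 1.0) ** 2)
grad_dV = lambda x: -8.0 * (x + 1.0) * np.exp(-8.0 * (x + 1.0) ** 2)
in_state = lambda x: -2.0 < x[0] < 0.0
rng = np.random.default_rng(5)
t_eff, t_b, x_exit = hyperdynamics_exit(grad_V, grad_dV, dV, in_state,
                                        [-1.0], beta=6.0, dt=1e-3, rng=rng)
```

The ratio t_eff / t_b is a realization of the boost factor B, which is large when the trajectory spends its time where δV is large, i.e. near the bottom of the well.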
