Topic 3 Polymers and Neurons Lecture 3 Simulated Annealing Monte Carlo and Genetic Algorithms The article by R. Unger and J. Moult, ”Genetic algorithms for protein folding simulations”, J. Mol. Biol. 231, 75-81 (1993) describes a genetic algorithm search procedure to simulate the folding of a simplified protein model and compares the algorithm with simulated annealing Monte Carlo. A protein with just two types of amino acid, hydrophobic (black) and hydrophilic (white) occupies the sites of a 2-D square lattice. The figure shows two configurations of the 20-residue protein with sequence BWBWWBBWBWWBWBBWWBWB. PHY 411-506 Computational Physics 2 1 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 The energy function is very simple (cid:40) (cid:88) 1 if |r| = 1 and bond is not covalent V = − ∆(r − r ) , ∆(r) = i j 0 otherwise (cid:104)B B (cid:105) i j The configuration on the left has energy −4. A trial move selects the arrowed site and rotates 180◦ to give the folded configuration on the right with minimum energy −9. The model was simulated using two types of algorithms: (cid:4) Genetic Algorithm: A population of 200 configurations was evolved over 300 generations (cid:4) Simulated Annealing: The simulations involved the same number of initial configurations as the GA algorithm. The MC simulation used the same number of energy evaluations as the GA. The “Long MC” and “Multiple MC” simulations were run much longer to improve the results. They found that the GA alogrithms performed much better than the MC. PHY 411-506 Computational Physics 2 2 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 The basic idea to find the absolute minimum energy folded configuration in a complicated landscape of mountains and valleys with many local minima is to start the system at a finite temperature T large enough to allow it explore the landscape. Random moves are made with acceptance probability determined by T-dependent Boltzmann factor, and the system is gradually cooled as its energy tends towards the global minimum. PHY 411-506 Computational Physics 2 3 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 Simulated Annealing Monte Carlo The Monte Carlo algorithm of Metropolis et al. J. Chem. Phys. 21, 1087 (1953) is modified using a Simulated annealing schedule, as outlined in Section 3 of the article. A stopping criterion is chosen, which can be some maximum number of steps, or a target minimum energy. 1. Start from a random coil configuration, for example an extended straight line. 2. Make a trial configuration change, for example by selecting a monomer at random and rotating the C-terminal part of the chain around this monomer. 3. If the new configuration is self-avoiding, compute the change in energy ∆E in this trial move: (cid:4) If ∆E < 0 accept the change (cid:4) Otherwise accept conditionally using the Metropolis criterion w = exp[−∆E/c ] > u , c ≡ k T k k B where u is a random uniform deviate. monomer, as was done in the figure. 4. If the stop criterion is not met, repeat at step 2., gradually reducing T according to an annealing schedule. For example, with a stop criterion of 50,000,000 steps, the temperature is reduced from 2 to 0.15 by reducing c → c = αc with α = 0.99 every 200,000 steps. k k+1 k PHY 411-506 Computational Physics 2 4 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 Genetic Algorithms Genetic algorithms try to model evolution by natural selection. In nature the genetic code is stored in DNA molecules as sequences of bases: adenine (A) which pairs with thymine (T), and cytosine (C) which pairs with guanine (G). The analog of DNA in a digital genetic algorithm is a sequence of binary digits (0) and (1). In nature, the genetic code describes a genotype, which is translated into an organism, a phenotype, by the process of cell division. Digital genetic algorithms can be used to solve a problem, such as finding the global minimum of a compli- cated energy landscape. A famous example is the Traveling Salesman Problem. The phenotype in a genetic algorithm is some state of the model: strings of binary digits are mapped to the states of the model to be solved. Evolution by natural selection is driven in part by changes to the genetic code: Mutations: Random changes can occur, for example caused by radioactivity or cosmic rays damaging a DNA molecule. Mutations of the digital genotype can be modeled by choosing a random bit in the string and changing it 1 → 0 or 0 → 1. Recombination or Crossover: During sexual reproduction the offspring inherit DNA from each of the PHY 411-506 Computational Physics 2 5 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 parents. This can be simulated by taking two strings and exchanging two substrings. Survival of the Fittest: There is some criterion of fitness such that when mutations or recombinations take place, the mutants or offspring either survive and reproduce or die out. These simple ingredients can be used to construct a very wide variety of genetic algorithms. A simple algorithm which can be applied to an energy landscape problem is illustrated by the random Ising model: (cid:88) E = − T s s , ij i j (cid:104)ij(cid:105) where s = ±1 are Ising spins, and the coupling constants T between nearest neighbors are chosen randomly i ij to be ±1. This is a model of a spin glass which has a very complicated energy landscape with numerous local minima. What is a genotype for this model? Suppose we have a 2-D lattice of spins with i,j = 0,2,...,(L − 1), then we can order the spins linearly using the formula n = iL + j = 0,1,...,(L2 − 1) for example. A configuration is of spins is mapped to a genotype of L2 bits by setting the bit with index n to 0 or 1 if s = ±1. ij Since we are seeking the global energy minimum, the fitness of a particular genotype can be taken to be 2L2−E, since the minimum and maximum possible values for the energy are ∓2L2 for a 2-D square lattice and periodic boundary conditions. (Recall that the number of bonds is then twice the number of spins.) The following is one possible evolution protocol: PHY 411-506 Computational Physics 2 6 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 (cid:4) Start with a population of a fixed number N of strings initialized in some way, for example by setting 0 the string bits randomly. (cid:4) Repeat the following “generations”: • Allow some number of mutations. For example, choose 20% of the strings at random, and mutate a random bit (flip a random spin) in each string. • Choose some number of pairs of strings at random and have them “reproduce” as follows: each pair produces two offspring which differ from the parents by exchange of a randomly chosen substring. • The size of the population has now increased from N to N due to reproduction, and the parents 0 and children are competing for the same limited natural resources. Select N fittest survivors as 0 follows: (cid:3) Construct a cumulative histogram k (cid:88) H = (2L2 − E ) , k = 1,2,...,N , k i i=1 where k labels the strings in the population. (cid:3) Repeat N times: 0 ◦ Choose a random H between 0 and the maximum H . N ◦ Select the smallest k such that H > H. k After many generations the population should converge to the global energy minimum configuration! PHY 411-506 Computational Physics 2 7 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 Genetic Algorithm for the Unger-Moult Model PHY 411-506 Computational Physics 2 8 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 (cid:4) The binary strings are taken to be proteins, with hydrophobic and hydrophilic amino acid residues assigned binary 0 or 1. (cid:4) The process starts with N extended linear chains. (cid:4) Each mutation is a single Metropolis Monte Carlo step subject to the Metropolis acceptance criterion at temperature T. (cid:4) A crossover operation is performed by selecting a protein depending on its energy E with probability E i p(S ) = i (cid:80)N E j=1 j Thus the lower energy configurations have a higher chance of being selected. (cid:4) For a pair of selected proteins a random point is chosen along the sequence, and the N-terminus portion of the first protein and connected to the C-terminus potion of the second protein. • There are three ways to do this, with angles of 0◦, 90◦, or 270◦, which are tested in random order to find one that is self-avoiding. • If none of the three ways is valid, another pair of structures is selected. (cid:4) The energy of the valid crossover structure is compared to the average of the energies of its two parents and conditionally accepted based on the Metropolis Boltzmann ratio criterion w = exp[−(E − (cid:104)E(cid:105))/c ] > u , c ≡ k T k k B PHY 411-506 Computational Physics 2 9 Friday, February 28 Topic 3 Polymers and Neurons Lecture 3 (cid:4) The crossover operation is repeated until N −1 new accepted hybrid structures have been constructed. There are numerous resources on genetic algorithms on the web, see for example Wikipedia Genetic Algo- rithm. Python Simulated Annealing Algorithm Code genetic.py import copy import math import random def new_extended_coil(dna): """ input: dna sequence of ’B’ (hydrophopic) and ’W’ (hydrophilic) residues output: extended linear conformation """ L = len(dna) assert L == dna.count(’B’) + dna.count(’W’) return [ [0, i] for i in range(L) ] def E(s, dna): PHY 411-506 Computational Physics 2 10 Friday, February 28
Description: