ASA OPTIONS

Lester Ingber

Abstract

Adaptive Simulated Annealing (ASA) is a C-language code that finds the best global fit of a nonlinear cost-function over a D-dimensional space. ASA has over 100 OPTIONS to provide robust tuning over many classes of nonlinear stochastic systems. These many OPTIONS help ensure that ASA can be used robustly across many classes of systems.

Keywords: simulated annealing, nonlinear systems

Most recent drafts are available as http://www.ingber.com/asa11_options.pdf

$Id: asa11_options,v 1.25 2011/06/15 22:00:47 ingber Exp ingber $

1. Introduction

Simulated annealing (SA) presents an optimization technique that can: (a) process cost functions possessing quite arbitrary degrees of nonlinearities, discontinuities, and stochasticity; (b) process quite arbitrary boundary conditions and constraints imposed on these cost functions; (c) be implemented quite easily, with the degree of coding quite minimal relative to other nonlinear optimization algorithms; (d) statistically guarantee finding an optimal solution.

Adaptive Simulated Annealing (ASA) is a C-language code that finds the best global fit of a nonlinear cost-function over a D-dimensional space. The basic algorithm was originally published as Very Fast Simulated Reannealing (VFSR) in 1989 (Ingber, 1989), after two years of application on combat simulations. The code (Ingber, 1993a) can be used at no charge and downloaded from http://www.ingber.com/#ASA with mirrors at:
http://alumni.caltech.edu/~ingber
http://asa-caltech.sourceforge.net
https://code.google.com/p/adaptive-simulated-annealing .

ASA has over 100 OPTIONS to provide robust tuning over many classes of nonlinear stochastic systems; the current number as of this chapter is 152. These many OPTIONS help ensure that ASA can be used robustly across many classes of systems. In the context of this book, it will be seen in the discussions that the "QUENCHing" OPTIONS are among the most important for controlling Adaptive Simulated Annealing. Fuzzy ASA algorithms in particular offer new ways of controlling how these QUENCHing OPTIONS may be applied across many classes of problems.

1.1. LICENSE and Contributions

The code originally was issued under a BSD-type License. This was changed to a form consistent with the less restrictive New BSD License http://en.wikipedia.org/wiki/BSD_License beginning with Version 28.1 in February 2011. I have had several queries as to why I did not follow a GPL license. I felt and still feel, similar to many other people who make code available at no charge to others, that the GPL license is just too cumbersome and onerous. I have made my code available at no charge to anyone or any company, subject to very simple terms.

If some user contributions do not quite fit into the code per se, I have put or referenced their contributions in the asa_contrib.txt or ASA-NOTES files. I do not think this has stymied people from contributing to the code. For example, in http://www.ingber.com/asa_contrib.txt there are references to several major contributions made by other people, e.g., the Matlab interface, RLAB interface, AMPL interface, and Haskell interface. The ASA_PARALLEL OPTIONS were contributed as a team effort I led, as Principal Investigator of a 1994 National Science Foundation Parallelizing ASA and PATHINT Project (PAPP). The Editor of this book has contributed FUZZY_ASA OPTIONS (Oliveira, 2001; Oliveira, H.R. Petraglia & Petraglia, 2007; Oliveira, A. Petraglia & Petraglia, 2009).
Another user referenced in http://www.ingber.com/asa_contrib.txt contributed explicit code used in ASA to help parallelize optimization of chip design.

The current list of CONTRIBUTORS in the ASA-CHANGES file that comes with the code numbers 56. All these contributions have resulted in many versions of the code. The current list of VERSION DATES in the ASA-CHANGES file that comes with the code numbers 586 since 1987. A few ASA papers showed how the code could be useful for many projects (Ingber, 1993b; Ingber, 1996a; Atiya et al, 2003).

1.2. Organization of Chapter

The next two sections give a short introduction to simulated annealing and to ASA. The first discusses the theoretical foundations of ASA, and the second discusses the practical implementation of ASA. The following section gives an overview of several approaches that consider why tuning is necessary in any sampling algorithm like SA, GA, etc. These issues have been addressed according to user feedback, i.e., what helps many users in many disciplines with a broad range of experience, from extensive to none. This work follows theoretical development of the algorithm that can be found in other ASA papers (Ingber, 1989; Ingber, 1993b; Ingber, 1996a). Other sections that follow illustrate the use of OPTIONS and are devoted to Adaptive OPTIONS and Multiple Systems. The last section is the conclusion.

Most of this chapter organizes information that has been collected on the use of the code since 1987, and is contained in some form in multiple files, e.g., ASA-README, ASA-NOTES, asa_contrib.txt, asa_examples.txt, etc.

2. Theoretical Foundations of Adaptive Simulated Annealing (ASA)

The unique aspect of simulated annealing (SA) is its property of (weak) ergodicity, permitting such code to statistically and reasonably sample a parameter space. Note that for very large systems, ergodicity is not an entirely rigorous concept when faced with the real task of its computation (Ma, 1985). In this chapter "ergodic" is used in a very weak sense, as it is not proposed, theoretically or practically, that all states of the system are actually to be visited.

2.1. Shades of simulated annealing

Even "standard" SA is not without its critics. Some negative features of SA are that it can: (A) be quite time-consuming to find an optimal fit, especially when using the "standard" Boltzmann technique; (B) be difficult to fine tune to specific problems, relative to some other fitting techniques; (C) suffer from "over-hype" and faddish misuse, leading to misinterpretation of results; (D) lose its ergodic property by misuse, e.g., by transforming SA into a method of "simulated quenching" (SQ) for which there is no statistical guarantee of finding an optimal solution.

There also is a large and growing domain of SA-like techniques, which do not theoretically predict general statistical optimality, but which are extremely powerful for certain classes of problems. There are many examples given in published papers addressing robust problems across many disciplines. There are many reviews of simulated annealing, comparisons among simulated annealing algorithms, and comparisons between simulated annealing and other algorithms (Johnson et al, 1987; Gelfand, 1987; van Laarhoven & Aarts, 1987; Collins et al, 1988; Ingber, 1993b; Ingber, 1996a). It is important to compare the basic theoretic constraints of true SA with actual practice on a range of problems spanning many disciplines.
This may help to address what may yet be expected in terms of better necessary conditions on SA to make it a more efficient algorithm, as many believe that the present sufficiency conditions are overly restrictive.

2.2. Critics of SA

The primary criticism is that it is too slow. This is partially addressed here by summarizing some work in appropriately adapting SQ to many problems. Another criticism is that it is "overkill" for many of the problems on which it is used. This is partially addressed here by pointing to much work demonstrating that it is not insignificant that many researchers are using SA/SQ because of the ease with which constraints and complex cost functions can be approached and coded.

There is another class of criticisms that the algorithm is too broadly based on physical intuition and too short on mathematical rigor (Charnes & Wolfe, 1989). In some particularly bitter and scathing critiques, authors take offense at the lack of reference to other prior work (Pincus, 1970), at the use of "metaphysical non-mathematical ideas of melting, cooling, and freezing" in reference to the physical process of annealing as used to popularize SA (Kirkpatrick et al, 1983), and they give their own calculations to demonstrate that SA can be a very poor algorithm to search for global optima in some instances. That there are undoubtedly other references that should be more regularly referenced is an objective issue that has much merit, with respect to SA as well as to other research projects. The other criticisms may be considered by some to be more subjective, but they are likely no more extreme than the use of SQ to solve for global optima under the protective umbrella of SA.

2.3. "Standard" simulated annealing (SA)

The Metropolis Monte Carlo integration algorithm (Metropolis et al, 1953) was generalized by the Kirkpatrick algorithm to include a temperature schedule for efficient searching (Kirkpatrick et al, 1983). A sufficiency proof was then shown to put a lower bound on that schedule as 1/log(t), where t is an artificial time measure of the annealing schedule (Geman & Geman, 1984). However, independent credit usually goes to several other authors for independently developing the algorithm that is now recognized as simulated annealing (Pincus, 1970; Cerny, 1982).

2.4. Boltzmann annealing (BA)

Credit for the first simulated annealing is generally recognized as a Monte Carlo importance-sampling technique for doing large-dimensional path integrals arising in statistical physics problems (Metropolis et al, 1953). This method was generalized to fitting non-convex cost-functions arising in a variety of problems, e.g., finding the optimal wiring for a densely wired computer chip (Kirkpatrick et al, 1983). The choices of probability distributions described in this section are generally specified as Boltzmann annealing (BA) (Szu & Hartley, 1987).

The method of simulated annealing consists of three functional relationships:

1. $g(x)$: probability density of the state-space of $D$ parameters $x = \{x^i; i = 1, D\}$.

2. $h(\Delta E)$: probability for acceptance of a new cost-function given the just-previous value.

3. $T(k)$: schedule of "annealing" the "temperature" $T$ in annealing-time steps $k$, i.e., of changing the volatility or fluctuations of one or both of the two previous probability densities.
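To make this division of labor concrete before the equations that follow, here is a minimal C sketch of how the three functional relationships compose into a generic SA loop. It is illustrative only, not the ASA API: the quadratic cost, the uniform perturbation standing in for $g(x)$, and the logarithmic schedule are all toy stand-ins.

```c
/* sa_sketch.c -- minimal illustrative SA loop (not the ASA API).
   The three ingredients mirror the list above: a generating density g,
   an acceptance probability h, and a temperature schedule T(k). */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define D 3                                    /* parameter dimension */

static double urand(void) { return rand() / (RAND_MAX + 1.0); }

/* toy cost function E(x); replace with the real problem */
static double cost(const double *x) {
    double E = 0.0;
    for (int i = 0; i < D; i++) E += x[i] * x[i];
    return E;
}

/* 1. g(x): here a simple uniform perturbation scaled by T */
static void generate(const double *x, double *xn, double T) {
    for (int i = 0; i < D; i++)
        xn[i] = x[i] + T * (2.0 * urand() - 1.0);
}

/* 2. h(dE): Boltzmann acceptance probability */
static double accept_prob(double dE, double T) {
    return 1.0 / (1.0 + exp(dE / T));
}

int main(void) {
    double x[D] = {2.0, -3.0, 1.5}, xn[D], E = cost(x);
    for (long k = 2; k < 100000; k++) {
        double T = 1.0 / log((double) k);      /* 3. T(k): logarithmic schedule */
        generate(x, xn, T);
        double dE = cost(xn) - E;
        if (urand() < accept_prob(dE, T)) {    /* accept or retain saved point */
            for (int i = 0; i < D; i++) x[i] = xn[i];
            E += dE;
        }
    }
    printf("final E = %g\n", E);
    return 0;
}
```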
The acceptance probability is based on the chances of obtaining a new state with "energy" $E_{k+1}$ relative to a previous state with "energy" $E_k$,

$$
h(\Delta E) = \frac{\exp(-E_{k+1}/T)}{\exp(-E_{k+1}/T) + \exp(-E_k/T)}
            = \frac{1}{1 + \exp(\Delta E/T)}
            \approx \exp(-\Delta E/T) \;, \tag{1}
$$

where $\Delta E$ represents the "energy" difference between the present and previous values of the energies (considered here as cost functions) appropriate to the physical problem, i.e., $\Delta E = E_{k+1} - E_k$. This essentially is the Boltzmann distribution contributing to the statistical mechanical partition function of the system (Binder & Stauffer, 1985).

This can be described by considering: a set of states labeled by $x$, each with energy $e(x)$; a set of probability distributions $p(x)$; and the energy distribution per state $d(e(x))$, giving an aggregate energy $E$,

$$
\sum_x p(x)\, d(e(x)) = E \;. \tag{2}
$$

The principle of maximizing the entropy, $S$,

$$
S = -\sum_x p(x) \ln\!\big[p(x)/\bar p(x)\big] \;, \tag{3}
$$

where $\bar x$ represents a reference state, using Lagrange multipliers (Mathews & Walker, 1970) to constrain the energy to average value $T$, leads to the most likely Gibbs distribution $G(x)$,

$$
G(x) = \frac{1}{Z} \exp\!\big(-H(x)/T\big) \;, \tag{4}
$$

in terms of the normalizing partition function $Z$, and the Hamiltonian $H$ operator as the "energy" function,

$$
Z = \sum_x \exp\!\big(-H(x)/T\big) \;. \tag{5}
$$

For such distributions of states and acceptance probabilities defined by functions such as $h(\Delta E)$, the equilibrium principle of detailed balance holds. I.e., the distributions of states before, $G(x_k)$, and after, $G(x_{k+1})$, applying the acceptance criteria, $h(\Delta E) = h(E_{k+1} - E_k)$, are the same:

$$
G(x_k)\, h\!\big(\Delta E(x)\big) = G(x_{k+1}) \;. \tag{6}
$$

This is sufficient to establish that all states of the system can be sampled, in theory. However, the annealing schedule interrupts equilibrium every time the temperature is changed, and so, at best, this must be done carefully and gradually.

An important aspect of the SA algorithm is to pick the ranges of the parameters to be searched. In practice, computation of continuous systems requires some discretization, so without loss of much generality for applications described here, the space will be assumed to be discretized. There are additional constraints that are required when dealing with generating and cost functions with integral values. Many practitioners use novel techniques to narrow the range as the search progresses. For example, based on functional forms derived for many physical systems belonging to the class of Gaussian-Markovian systems, one could choose an algorithm for $g$,

$$
g(\Delta x) = (2\pi T)^{-D/2} \exp\!\big[-\Delta x^2/(2T)\big] \;, \tag{7}
$$

where $\Delta x = x - x_0$ is the deviation of $x$ from $x_0$ (usually taken to be the just-previously chosen point), proportional to a "momentum" variable, and where $T$ is a measure of the fluctuations of the Boltzmann distribution $g$ in the $D$-dimensional $x$-space. Given $g(\Delta x)$, it has been proven (Geman & Geman, 1984) that it suffices to obtain a global minimum of $E(x)$ if $T$ is selected to be not faster than

$$
T(k) = \frac{T_0}{\ln k} \;, \tag{8}
$$

with $T_0$ "large enough."
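As a concrete illustration of Eq. (7), each component of $\Delta x$ can be drawn as a Gaussian deviate of variance $T$, e.g., via the Box-Muller transform. This is a hedged sketch of the BA generating step only; ASA itself uses the quite different distribution of Section 2.7.

```c
/* Sketch: sampling the Boltzmann/Gaussian generating density of Eq. (7),
   one N(0, T) deviate per dimension, via the Box-Muller transform.
   Illustrative only; not taken from the ASA source. */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double urand_open(void) {               /* uniform in (0,1) */
    return (rand() + 1.0) / (RAND_MAX + 2.0);
}

/* fill dx[0..D-1] with Gaussian deviates of variance T */
void boltzmann_generate(double *dx, int D, double T) {
    for (int i = 0; i < D; i++) {
        double u1 = urand_open(), u2 = urand_open();
        dx[i] = sqrt(T) * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
    }
}
```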
A heuristic demonstration shows that this equation for $T$ will suffice to give a global minimum of $E(x)$ (Szu & Hartley, 1987). In order to statistically assure, i.e., requiring many trials, that any point in $x$-space can be sampled infinitely often in annealing-time (IOT), it suffices to prove that the products of probabilities of not generating a state $x$ IOT for all annealing-times successive to $k_0$ yield zero,

$$
\prod_{k=k_0}^{\infty} (1 - g_k) = 0 \;. \tag{9}
$$

This is equivalent to

$$
\sum_{k=k_0}^{\infty} g_k = \infty \;. \tag{10}
$$

The problem then reduces to finding $T(k)$ to satisfy this equation.

For BA, if $T(k)$ is selected to be the Boltzmann criterion above, then the generating distribution $g$ above gives

$$
\sum_{k=k_0}^{\infty} g_k \ge \sum_{k=k_0}^{\infty} \exp(-\ln k) = \sum_{k=k_0}^{\infty} 1/k = \infty \;. \tag{11}
$$

Although there are sound physical principles underlying the choices of the Boltzmann criteria above (Metropolis et al, 1953), it was noted that this method of finding the global minimum in $x$-space was not limited to physics examples requiring bona fide "temperatures" and "energies." Rather, this methodology can be readily extended to any problem for which a reasonable probability density $h(\Delta x)$ can be formulated (Kirkpatrick et al, 1983).

2.5. Simulated quenching (SQ)

Many researchers have found it very attractive to take advantage of the ease of coding and implementing SA, utilizing its ability to handle quite complex cost functions and constraints. However, the long time of execution of standard Boltzmann-type SA has many times driven these projects to utilize a temperature schedule too fast to satisfy the sufficiency conditions required to establish a true (weak) ergodic search. A logarithmic temperature schedule is consistent with the Boltzmann algorithm, e.g., the temperature schedule is taken to be

$$
T_k = T_0 \frac{\ln k_0}{\ln k} \;, \tag{12}
$$

where $T$ is the "temperature," $k$ is the "time" index of annealing, and $k_0$ is some starting index. This can be written for large $k$ as

$$
\Delta T = -T_0 \frac{\ln k_0}{k (\ln k)^2}\, \Delta k \;, \quad k \gg 1 \;,
\qquad
T_{k+1} = T_k - T_0 \frac{\ln k_0}{k (\ln k)^2} \;. \tag{13}
$$

However, some researchers using the Boltzmann algorithm use an exponential schedule, e.g.,

$$
T_{k+1} = c\, T_k \;, \quad 0 < c < 1 \;,
\qquad
\frac{\Delta T}{T_k} = (c - 1)\, \Delta k \;, \quad k \gg 1 \;,
\qquad
T_k = T_0 \exp\!\big((c - 1) k\big) \;, \tag{14}
$$

with expediency the only reason given. While perhaps someday some less stringent necessary conditions may be developed for the Boltzmann algorithm, this is not now the state of affairs. The question arises, what is the value of this clear misuse of the claim to use SA to help solve these problems/systems? Adaptive simulated annealing (ASA) (Ingber, 1989; Ingber, 1993a) in fact does justify an exponential annealing schedule, but only if a particular distribution is used for the generating function.

In many cases it is clear that the researchers already know quite a bit about their system, and the convenience of the SA algorithm, together with the need for some global search over local optima, makes a strong practical case for the use of SQ. In some of these cases, the researchers have been more diligent with regard to their numerical SQ work, and have compared the efficiency of SQ to some other methods they have tried. Of course, the point must be made that while SA's true strength lies in its ability to statistically deliver a true global optimum, there are no theoretical reasons for assuming it will be more efficient than any other algorithm that also can find this global optimum.

2.6. Fast annealing (FA)

Although there are many variants and improvements made on the "standard" Boltzmann algorithm described above, many textbooks finish just about at this point without going into more detail about other algorithms that depart from this explicit algorithm (van Laarhoven & Aarts, 1987). Specifically, it was noted that the Cauchy distribution has some definite advantages over the Boltzmann form (Szu & Hartley, 1987). The Cauchy distribution,

$$
g(\Delta x) = \frac{T}{(\Delta x^2 + T^2)^{(D+1)/2}} \;, \tag{15}
$$

has a "fatter" tail than the Gaussian form of the Boltzmann distribution, permitting easier access to test local minima in the search for the desired global minimum.
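For intuition, the one-dimensional ($D = 1$) case of Eq. (15) can be sampled by inverse-transform as $\Delta x = T \tan(\pi(u - 1/2))$; the occasional very large deviates produced by the fat tail are exactly what permit jumps out of local minima. A minimal sketch, assuming the standard normalized Cauchy form:

```c
/* Sketch: inverse-transform sampling of the D = 1 Cauchy generating
   density of Eq. (15) (normalized), dx = T tan(pi (u - 1/2)).
   Illustrative only; not the FA or ASA source. */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

double cauchy_generate(double T) {
    double u = (rand() + 0.5) / (RAND_MAX + 1.0);   /* u in (0,1) */
    return T * tan(M_PI * (u - 0.5));               /* fat-tailed deviate */
}
```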
It is instructive to note the similar corresponding heuristic demonstration that the Cauchy $g(\Delta x)$ statistically finds a global minimum. If the Boltzmann $T(k)$ is replaced by

$$
T(k) = \frac{T_0}{k} \;, \tag{16}
$$

then here

$$
\sum_{k_0}^{\infty} g_k \approx \frac{T_0}{\Delta x^{D+1}} \sum_{k_0}^{\infty} \frac{1}{k} = \infty \;. \tag{17}
$$

Note that the "normalization" of $g$ has introduced the annealing-time index $k$, giving some insights into how to construct other annealing distributions. The method of FA is thus seen to have an annealing schedule exponentially faster than the method of BA. This method has been tested in a variety of problems (Szu & Hartley, 1987).

2.7. Adaptive simulated annealing (ASA)

In a variety of physical problems we have a $D$-dimensional parameter-space. Different parameters have different finite ranges, fixed by physical considerations, and different annealing-time-dependent sensitivities, measured by the derivatives of the cost-function at local minima. BA and FA have distributions that sample infinite ranges, and there is no provision for considering differences in each parameter-dimension; e.g., different sensitivities might require different annealing schedules. This prompted the development of a new probability distribution to accommodate these desired features (Ingber, 1989), leading to a variant of SA that in fact justifies an exponential temperature annealing schedule. These are among several considerations that gave rise to Adaptive Simulated Annealing (ASA). Full details are available by obtaining the publicly available source code (Ingber, 1993a).

ASA considers a parameter $\alpha_k^i$ in dimension $i$ generated at annealing-time $k$ with the range

$$
\alpha_k^i \in [A_i, B_i] \;, \tag{18}
$$

calculated with the random variable $y^i$,

$$
\alpha_{k+1}^i = \alpha_k^i + y^i (B_i - A_i) \;, \qquad y^i \in [-1, 1] \;. \tag{19}
$$

Define the generating function

$$
g_T(y) = \prod_{i=1}^{D} \frac{1}{2\,(|y^i| + T_i) \ln(1 + 1/T_i)} \equiv \prod_{i=1}^{D} g_T^i(y^i) \;. \tag{20}
$$

Its cumulative probability distribution is

$$
G_T(y) = \int_{-1}^{y^1} \!\!\cdots \int_{-1}^{y^D} dy'^1 \cdots dy'^D \; g_T(y') \equiv \prod_{i=1}^{D} G_T^i(y^i) \;,
\qquad
G_T^i(y^i) = \frac{1}{2} + \frac{\operatorname{sgn}(y^i)}{2}\, \frac{\ln(1 + |y^i|/T_i)}{\ln(1 + 1/T_i)} \;. \tag{21}
$$

$y^i$ is generated from a $u^i$ from the uniform distribution

$$
u^i \in U[0, 1] \;,
\qquad
y^i = \operatorname{sgn}\!\big(u^i - \tfrac{1}{2}\big)\, T_i \Big[ (1 + 1/T_i)^{|2u^i - 1|} - 1 \Big] \;. \tag{22}
$$

It is straightforward to calculate that for an annealing schedule for $T_i$,

$$
T_i(k) = T_{0i} \exp(-c_i k^{1/D}) \;, \tag{23}
$$

a global minimum statistically can be obtained. I.e.,

$$
\sum_{k_0}^{\infty} g_k \approx \sum_{k_0}^{\infty} \left[ \prod_{i=1}^{D} \frac{1}{2\,|y^i|\, c_i} \right] \frac{1}{k} = \infty \;. \tag{24}
$$

It seems sensible to choose control over $c_i$, such that

$$
T_{fi} = T_{0i} \exp(-m_i) \quad \text{when} \quad k_f = \exp n_i \;,
\qquad
c_i = m_i \exp(-n_i/D) \;, \tag{25}
$$

where $m_i$ and $n_i$ can be considered "free" parameters to help tune ASA for specific problems.

It has proven fruitful to use the same type of annealing schedule for the acceptance function $h$ as used for the generating function $g$, but with the number of acceptance points, instead of the number of generated points, used to determine the $k$ for the acceptance temperature.

New parameters $\alpha_{k+1}^i$ are generated from old parameters $\alpha_k^i$ from

$$
\alpha_{k+1}^i = \alpha_k^i + y^i (B_i - A_i) \;, \tag{26}
$$

constrained by

$$
\alpha_{k+1}^i \in [A_i, B_i] \;. \tag{27}
$$

I.e., $y^i$'s are generated until a set of $D$ are obtained satisfying these constraints.
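The generating step of Eqs. (19), (22), (26), and (27) can be sketched in C as below. The names are illustrative and the per-dimension rejection loop is a simplification; the distributed ASA source differs in organization.

```c
/* Sketch of the ASA generating step, Eqs. (19), (22), (26)-(27):
   each y^i is drawn by inverse-transform sampling from g_T^i, and trial
   parameters falling outside [A_i, B_i] are regenerated.
   Illustrative only; not the distributed ASA source. */
#include <math.h>
#include <stdlib.h>

static double urand(void) { return rand() / (RAND_MAX + 1.0); }

/* y = sgn(u - 1/2) T [(1 + 1/T)^{|2u - 1|} - 1], y in [-1,1]  (Eq. (22)) */
static double asa_y(double T) {
    double u = urand();
    double s = (u >= 0.5) ? 1.0 : -1.0;
    return s * T * (pow(1.0 + 1.0 / T, fabs(2.0 * u - 1.0)) - 1.0);
}

/* generate alpha_{k+1} from alpha_k within [A_i, B_i]  (Eqs. (26)-(27)) */
void asa_generate(const double *a, double *a_new, const double *A,
                  const double *B, const double *T, int D) {
    for (int i = 0; i < D; i++) {
        do {
            a_new[i] = a[i] + asa_y(T[i]) * (B[i] - A[i]);
        } while (a_new[i] < A[i] || a_new[i] > B[i]);
    }
}
```

Note how a small $T_i$ concentrates $y^i$ near zero while the logarithmic tail of $g_T^i$ still permits occasional jumps across the full range; it is this fat tail that supports the exponentially fast schedule of Eq. (23).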
2.7.1. Reannealing

Whenever doing a multi-dimensional search in the course of a real-world nonlinear physical problem, inevitably one must deal with different changing sensitivities of the $\alpha^i$ in the search. At any given annealing-time, it seems sensible to attempt to "stretch out" the range over which the relatively insensitive parameters are being searched, relative to the ranges of the more sensitive parameters. This can be done by periodically rescaling the annealing-time $k$, essentially reannealing, e.g., every hundred or so acceptance-events, in terms of the sensitivities $s_i$ calculated at the most current minimum value of the cost function, $L$,

$$
s_i = \partial L / \partial \alpha^i \;. \tag{28}
$$

In terms of the largest $s_i = s_{\max}$, ASA can reanneal by using a rescaling for each $k_i$ of each parameter dimension,

$$
k_i \to k_i' \;,
\qquad
T'_{ik'} = T_{ik}\, (s_{\max}/s_i) \;,
\qquad
k_i' = \big( \ln(T_{i0}/T_{ik'})/c_i \big)^D \;. \tag{29}
$$

$T_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.

The acceptance temperature is similarly rescaled. In addition, since the initial acceptance temperature is set equal to a trial value of $L$, this is typically very large relative to the global minimum. Therefore, when this rescaling is performed, the initial acceptance temperature is reset to the most current minimum of $L$, and the annealing-time associated with this temperature is set to give a new temperature equal to the lowest value of the cost-function encountered to annealing-date.

Also generated are the "standard deviations" of the theoretical forms, calculated as $[\partial^2 L/(\partial \alpha^i)^2]^{-1/2}$, for each parameter $\alpha^i$. This gives an estimate of the "noise" that accompanies fits to stochastic data or functions. At the end of the run, the off-diagonal elements of the "covariance matrix" are calculated for all parameters. This inverse curvature of the theoretical cost function can provide a quantitative assessment of the relative sensitivity of parameters to statistical errors in fits to stochastic systems.

A few other twists can be added, and such searches undoubtedly will never be strictly by rote. Physical systems are so different that some experience with each one is required to develop a truly efficient algorithm.

2.7.2. Self optimization

Another feature of ASA is its ability to recursively self optimize its own Program Options, e.g., the $c_i$ parameters described above, for a given system. An application is described below.

2.7.3. Quenching

Another adaptive feature of ASA is its ability to perform quenching. This is applied by noting that the temperature schedule above can be redefined as

$$
T_i(k_i) = T_{0i} \exp(-c_i k_i^{Q_i/D}) \;,
\qquad
c_i = m_i \exp(-n_i Q_i/D) \;, \tag{30}
$$

in terms of the "quenching factor" $Q_i$. The above proof fails if $Q_i > 1$, as

$$
\sum_k \prod^D 1/k^{Q_i/D} = \sum_k 1/k^{Q_i} < \infty \;. \tag{31}
$$

This simple calculation shows how the "curse of dimensionality" arises, and also gives a possible way of living with this disease. In ASA, the influence of large dimensions becomes clearly focused on the exponential of the power of $k$ being $1/D$, as the annealing required to properly sample the space becomes prohibitively slow. So, if we cannot commit resources to properly sample the space ergodically, then for some systems perhaps the next best procedure would be to turn on quenching, whereby $Q_i$ can become on the order of the size of the number of dimensions.

The scale of the power of $1/D$ temperature schedule used for the acceptance function can be altered in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used without damaging the (weak) ergodicity property.
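A minimal sketch of the quenched schedule of Eq. (30), with illustrative variable names: $Q = 1$ recovers the true annealing schedule of Eq. (23), while $Q$ on the order of $D$ makes the exponent of $k$ approach unity, i.e., exponential quenching that forfeits the statistical guarantee.

```c
/* Sketch of the quenched ASA temperature schedule, Eq. (30):
   T_i(k_i) = T_{0i} exp(-c_i k_i^{Q_i/D}), c_i = m_i exp(-n_i Q_i/D).
   Q = 1 is true annealing (Eq. (23)); Q > 1 quenches. */
#include <math.h>

double asa_temperature(double T0, double m, double n,
                       double Q, int D, double k) {
    double c = m * exp(-n * Q / (double) D);      /* tuned via m, n, cf. Eq. (25) */
    return T0 * exp(-c * pow(k, Q / (double) D));
}
```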
2.8. VFSR and ASA

The above defines this method of adaptive simulated annealing (ASA), previously called very fast simulated reannealing (VFSR) (Ingber, 1989), named such only to contrast it with the previous method of fast annealing (FA) (Szu & Hartley, 1987). The annealing schedules for the temperatures $T_i$ decrease exponentially in annealing-time $k$, i.e., $T_i = T_{i0} \exp(-c_i k^{1/D})$. Of course, the fatter the tail of the generating function, the smaller the ratio of acceptance to generated points in the fit. However, in practice, when properly tuned, it is found that for a given generating function, this ratio is approximately constant as the fit finds a global minimum. Therefore, for a large parameter space, the efficiency of the fit is determined by the annealing schedule of the generating function.

A major difference between ASA and BA algorithms is that the ergodic sampling takes place in an $n+1$ dimensional space, i.e., in terms of $n$ parameters and the cost function. In ASA the exponential annealing schedules permit resources to be spent adaptively on reannealing and on pacing the convergence in all dimensions, ensuring ample global searching in the first phases of search and ample quick convergence in the final phases. The acceptance function $h(\Delta x)$ chosen is the usual Boltzmann form satisfying detailed balance, and the acceptance-temperature reannealing paces the convergence of the cost function to permit ergodic searching in the $n$-parameter space considered as the independent variables of the dependent cost function.

3. Practical Implementation of ASA

Details of the ASA algorithm are best obtained from the code itself and from published papers. There are three parts to its basic structure.

3.1. Generating Probability Density Function

In a $D$-dimensional parameter space with parameters $p^i$ having ranges $[A_i, B_i]$, about the $k$'th last saved point (e.g., a local optimum), $p_k^i$, a new point is generated using a distribution defined by the product of distributions for each parameter, $g^i(y^i; T_i)$, in terms of random variables $y^i \in [-1, 1]$, where $p_{k+1}^i = p_k^i + y^i (B_i - A_i)$, and "temperatures" $T_i$,

$$
g^i(y^i; T_i) = \frac{1}{2\,(|y^i| + T_i) \ln(1 + 1/T_i)} \;. \tag{32}
$$

The OPTIONS USER_GENERATING_FUNCTION permits using an alternative to this ASA distribution function.

3.2. Acceptance Probability Density Function

The cost functions, $C(p_{k+1}) - C(p_k)$, are compared using a uniform random generator, $U \in [0, 1)$, in a "Boltzmann" test: If

$$
\exp\!\big[-\big(C(p_{k+1}) - C(p_k)\big)/T_{\mathrm{cost}}\big] > U \;, \tag{33}
$$

where $T_{\mathrm{cost}}$ is the "temperature" used for this test, then the new point is accepted as the new saved point for the next iteration. Otherwise, the last saved point is retained. The OPTIONS USER_ACCEPT_ASYMP_EXP or USER_ACCEPT_THRESHOLD permit using alternatives to this Boltzmann distribution function.

3.3. Reannealing Temperature Schedule

The annealing schedule for each parameter temperature, $T_i$, from a starting temperature $T_{0i}$, is

$$
T_i(k_i) = T_{0i} \exp(-c_i k_i^{1/D}) \;. \tag{34}
$$

The annealing schedule for the cost temperature is developed similarly to the parameter temperatures. However, the index for reannealing the cost function, $k_{\mathrm{cost}}$, is determined by the number of accepted cost points, instead of the number of generated points as used for the parameters. This choice was made because the Boltzmann acceptance criterion uses an exponential distribution that is not as fat-tailed as the ASA distribution used for the parameters.
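The following sketch (illustrative names, not the ASA source) ties together Eqs. (33) and (34) and the indexing distinction just described: every trial advances the generated-point index that drives the parameter temperatures, while only acceptances advance $k_{\mathrm{cost}}$.

```c
/* Sketch of the acceptance test of Eq. (33) and schedule of Eq. (34),
   with separate generated-point and accepted-point indices.
   Illustrative only; not the ASA source. */
#include <math.h>
#include <stdlib.h>

static long k_gen = 0, k_cost = 0;    /* generated / accepted counters */

double temp_schedule(double T0, double c, long k, int D) {
    return T0 * exp(-c * pow((double) k, 1.0 / (double) D));  /* Eq. (34) */
}

int accept_test(double C_new, double C_saved, double T_cost) {
    double U = rand() / (RAND_MAX + 1.0);                     /* U in [0,1) */
    k_gen++;                          /* every trial is a generated point */
    if (exp(-(C_new - C_saved) / T_cost) > U) {               /* Eq. (33) */
        k_cost++;                     /* only acceptances advance k_cost */
        return 1;                     /* new point becomes the saved point */
    }
    return 0;                         /* retain last saved point */
}
```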
This schedule can be modified using several OPTIONS. In particular, the Pre-Compile OPTIONS USER_COST_SCHEDULE permits quite arbitrary functional modifications for this annealing schedule.

As determined by the Program Options selected, the parameter "temperatures" may be periodically adaptively reannealed, or increased relative to their previous values, using their relative first derivatives with respect to the cost function, to guide the search "fairly" among the parameters.

As determined by the Program Options selected, the reannealing of the cost temperature resets the scale of the annealing of the cost acceptance criteria as

$$
T_{\mathrm{cost}}(k_{\mathrm{cost}}) = T_{0\mathrm{cost}} \exp(-c_{\mathrm{cost}} k_{\mathrm{cost}}^{1/D}) \;. \tag{35}
$$

The new $T_{0\mathrm{cost}}$ is taken to be the minimum of the current initial cost temperature and the maximum of the absolute values of the best and last cost functions and their difference. The new $k_{\mathrm{cost}}$ is calculated taking $T_{\mathrm{cost}}$ as the maximum of the current value and the absolute value of the difference between the last and best saved minima of the cost function, constrained not to exceed the current initial cost temperature. This procedure essentially resets the scale of the annealing of the cost temperature within the scale of the current best or last saved minimum.

This default algorithm for reannealing the cost temperature, taking advantage of the ASA importance sampling that relates most specifically to the parameter temperatures, while often quite efficient for some systems, may lead to problems of dwelling too long in local minima for other systems. In such cases, the user may also experiment with alternative algorithms effected using the Reanneal_Cost OPTIONS. For example, ASA provides an alternative calculation for the cost temperature, when Reanneal_Cost < -1 or > 1, that periodically calculates the initial and current cost temperatures, or just the initial cost temperature, respectively, as a deviation over a sample of cost functions. These reannealing algorithms can be changed adaptively by the user, e.g., by using USER_REANNEAL_COST and USER_REANNEAL_PARAMETERS.

3.4. QUENCH_PARAMETERS=FALSE

This OPTIONS permits you to alter the basic algorithm to perform selective "quenching," i.e., faster temperature cooling than permitted by the ASA algorithm. This can be very useful, e.g., to quench the system down to some region of interest, and then to perform proper annealing for the rest of the run. However, note that once you decide to quench rather than to truly anneal, there no longer is any statistical guarantee of finding a global optimum. Once you decide you can quench, there are many more alternative algorithms you might wish to choose for your system, e.g., creating a hybrid global-local adaptive quenching search algorithm, e.g., using
Description: