Two globally convergent nonmonotone trust-region methods for unconstrained optimization

Masoud Ahookhosh · Susan Ghaderi

Abstract This paper addresses trust-region methods equipped with nonmonotone strategies for solving nonlinear unconstrained optimization problems. More specifically, the importance of nonmonotone techniques in nonlinear optimization is motivated, two new nonmonotone terms are proposed, and their combination with the traditional trust-region framework is studied. Global convergence to first- and second-order stationary points and local superlinear and quadratic convergence rates are established for both algorithms. Numerical experiments on the CUTEst test collection of unconstrained problems and on some highly nonlinear test functions are reported, where a comparison with state-of-the-art nonmonotone trust-region methods shows the efficiency of the proposed nonmonotone schemes.

Keywords Unconstrained optimization · Black-box oracle · Trust-region method · Nonmonotone strategy · Global convergence

Mathematics Subject Classification (2000) 90C30 · 65K05

1 Introduction

In this paper we consider the unconstrained minimization problem

  minimize f(x)  subject to x ∈ R^n,  (1)

where f : R^n → R is a real-valued nonlinear function that is bounded below and continuously differentiable. We suppose that a first- or second-order black-box oracle of f is available.

Motivation & history. Trust-region methods, also called restricted step methods [21], are a class of iterative schemes developed to solve convex or nonconvex optimization problems, see, for example, [13]. They have also been developed for nonsmooth problems, see [13, 16, 45, 23]. Trust-region methods have strong convergence properties, are reliable and robust in computation, and can handle ill-conditioned problems, cf. [34, 35]. Let x_k be the current iterate. In the trust-region framework, the objective f is approximated by a simple model in a specific region around x_k in which the model is an acceptable approximation of the original objective; this region is called the region of trust. Afterward, the model is minimized subject to the trust-region constraint to find a new trial step d_k. Here "simple" means that the model can be minimized much more easily than the original objective function. If the model is an adequate approximation of the objective within the trust region, then the point x_{k+1} = x_k + d_k is accepted by the trust-region method and the region can be expanded for the next iteration; conversely, if the approximation is poor, the region is contracted and the model is minimized within the contracted region. This scheme is continued until a trial step d_k is found that guarantees an acceptable agreement between the model and the objective function.

M. Ahookhosh
Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria
E-mail: [email protected]

S. Ghaderi
Department of Mathematics, Faculty of Science, Razi University, Kermanshah, Iran
E-mail: [email protected]

Several quadratic and non-quadratic models have been proposed to approximate the objective function in optimization, see [14, 22, 36, 39]; however, the conic and quadratic models are the most popular, see [17, 37].
If the approximating model is quadratic, i.e.,

  q_k(d) := f_k + g_k^T d + (1/2) d^T B_k d,  (2)

where f_k = f(x_k), g_k = ∇f(x_k), and B_k ≈ ∇²f(x_k), the trust-region method can be regarded as a globally convergent generalization of the classical Newton method. The trust-region subproblem is then defined by

  minimize q_k(d)  subject to ‖d‖ ≤ δ_k.  (3)

Hence the trust region is commonly a norm ball C defined by

  C := {d ∈ R^n | ‖d‖ ≤ δ_k},

where δ_k > 0 is a real number called the trust-region radius and ‖·‖ is any norm on R^n, cf. [29]. Since C is compact and the model is continuous, the trust-region subproblem attains its minimizer on the set C. Most of the computational cost of trust-region methods relates to minimizing the model over the trust region C, so finding efficient schemes for solving (3) has received much attention during the past few decades, see [19, 20, 25, 31, 38]. Once the step d_k is computed, the quality of the model in the trust region is evaluated by the ratio of the actual reduction of the objective, f_k − f(x_k + d_k), to the reduction predicted by the model, q_k(0) − q_k(d_k), i.e.,

  r_k = (f_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)).  (4)

For a prescribed constant µ_1 ∈ (0, 1], if r_k ≥ µ_1, the model provides a reasonable approximation, the step is accepted, i.e., x_{k+1} = x_k + d_k, and the trust region C can be expanded for the next step. Otherwise, the trust region C is contracted by decreasing the radius δ_k, and the subproblem (3) is solved in the reduced region. This scheme is continued until the step d_k is accepted by the trust-region test r_k ≥ µ_1. Our discussion can be summarized in the following algorithm:

Algorithm 1: TTR (traditional trust-region algorithm)

Input: x_0 ∈ R^n, B_0 ∈ R^{n×n}, k_max; 0 < µ_1 ≤ µ_2 ≤ 1, 0 < ρ_1 ≤ 1 ≤ ρ_2, ε > 0;
Output: x_b; f_b;

1  begin
2    δ_0 ← ‖g_0‖; k ← 0;
3    while ‖g_k‖ ≥ ε & k ≤ k_max do
4      solve the subproblem (3) to specify d_k;
5      x̂_k ← x_k + d_k; compute f(x̂_k);
6      determine r_k using (4);
7      while r_k < µ_1 do
8        δ_k ← ρ_1 δ_k;
9        solve the subproblem (3) to specify d_k;
10       x̂_k ← x_k + d_k; compute f(x̂_k);
11       determine r_k using (4);
12     end
13     x_{k+1} ← x̂_k;
14     if r_k ≥ µ_2 then
15       δ_{k+1} ← ρ_2 δ_k;
16     end
17     update B_{k+1}; k ← k + 1;
18   end
19   x_b ← x_k; f_b ← f_k;
20 end

In Algorithm 1, it follows from r_k ≥ µ_1 and q_k(0) − q_k(d_k) > 0 that

  f_k − f_{k+1} ≥ µ_1 (q_k(0) − q_k(d_k)) > 0,

implying f_{k+1} < f_k. This means that the sequence of function values {f_k} is monotonically decreasing; for this reason the traditional trust-region method is also called the monotone trust-region method. This feature seems natural for minimization schemes; however, it slows down the convergence of TTR to a minimizer if the objective involves a curved narrow valley, see [1, 27]. To observe the effect of nonmonotonicity on TTR, we study the next example.

Example 1 Consider the two-dimensional Nesterov–Chebyshev–Rosenbrock function, cf. [28],

  f(x_1, x_2) = (1/4)(x_1 − 1)² + (x_2 − 2x_1² + 1)²,

where we solve the problem (1) by Newton's method and by TTR with the initial point x_0 = (−0.61, −1). It is clear that (1, 1) is the minimizer. The implementation indicates that Newton's method needs 7 iterations and 8 function evaluations, while the monotone trust-region method needs 22 iterations and 24 function evaluations.
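For readers who wish to reproduce this comparison, the following is a minimal Python sketch of the Newton branch of the experiment, with hand-derived gradient and Hessian and the stopping rule ‖g_k‖ < 10⁻⁵; it is an illustration under these assumptions, and iteration counts may differ slightly from those reported above depending on implementation details.

import numpy as np

def f(x):
    # Two-dimensional Nesterov-Chebyshev-Rosenbrock function from Example 1.
    return 0.25 * (x[0] - 1.0) ** 2 + (x[1] - 2.0 * x[0] ** 2 + 1.0) ** 2

def grad(x):
    u = x[1] - 2.0 * x[0] ** 2 + 1.0
    return np.array([0.5 * (x[0] - 1.0) - 8.0 * x[0] * u, 2.0 * u])

def hess(x):
    u = x[1] - 2.0 * x[0] ** 2 + 1.0
    return np.array([[0.5 - 8.0 * u + 32.0 * x[0] ** 2, -8.0 * x[0]],
                     [-8.0 * x[0], 2.0]])

x = np.array([-0.61, -1.0])              # initial point from Example 1
for k in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-5:         # assumed accuracy parameter eps = 1e-5
        break
    x = x - np.linalg.solve(hess(x), g)  # pure Newton step, no globalization
print(k, x, f(x))

Since the pure Newton step is not safeguarded, its function values are free to rise and fall along the way, which is exactly the nonmonotone behavior visible in the figure discussed next.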
We depict the contour plot of the objective and the iterations, as well as a diagram of function values versus iterations, attained by these two algorithms in Figure 1. Subfigure (a) of Figure 1 shows that the iterates of TTR follow the bottom of the valley, in contrast to those of Newton's method, which can go up and down to reach the ε-solution with the accuracy parameter ε = 10⁻⁵. We also see that Newton's method takes larger steps than TTR. Subfigure (b) of Figure 1 illustrates function values versus iterations for both algorithms, showing that the function values of TTR decrease monotonically, while those of Newton's method fluctuate nonmonotonically.

Fig. 1: A comparison between Newton's method and TTR: Subfigure (a) illustrates the contour plot of the two-dimensional Nesterov–Chebyshev–Rosenbrock function and the iterations of Newton and TTR; Subfigure (b) shows the diagram of function values versus iterations.

In general, enforcing monotonicity may slow down iterative schemes for highly nonlinear or badly scaled problems. To avoid this algorithmic limitation, the idea of nonmonotone strategies was proposed; it traces back to the watchdog technique for overcoming the Maratos effect in constrained optimization [12]. To improve the performance of Armijo's line search, Grippo et al. in 1986 [27] proposed the modified Armijo rule

  f(x_k + α_k d_k) ≤ f_{l(k)} + σ α_k g_k^T d_k,  k = 0, 1, 2, ...,

with step-size α_k > 0, σ ∈ (0, 1/2), and

  f_{l(k)} = max_{0≤j≤m(k)} {f_{k−j}},  (5)

where m(0) = 0 and m(k) ≤ min{m(k−1) + 1, N} for a nonnegative integer N. It was shown that the associated scheme is globally convergent, and the numerical results reported by Grippo et al. [27] and Toint [40] showed the effectiveness of the proposed idea. Motivated by these results, nonmonotone strategies have received much attention during the past few decades. For example, in 2004, Zhang & Hager [46] proposed the nonmonotone term

  C_k = f_0,  Q_k = 1,  if k = 0,
  C_k = (η_{k−1} Q_{k−1} C_{k−1} + f(x_k)) / Q_k,  Q_k = η_{k−1} Q_{k−1} + 1,  if k ≥ 1,  (6)

where 0 ≤ η_min ≤ η_{k−1} ≤ η_max ≤ 1. Recently, Mo et al. [30] and Ahookhosh et al. [3] studied the nonmonotone term

  D_k = f_k  if k = 1,
  D_k = η_k D_{k−1} + (1 − η_k) f_k  if k ≥ 2,  (7)

where η_k ∈ [η_min, η_max], η_min ∈ [0, 1], and η_max ∈ [η_min, 1]. More recently, Amini et al. [7] proposed the nonmonotone term

  R_k = η_k f_{l(k)} + (1 − η_k) f_k,  (8)

where 0 ≤ η_min ≤ η_max ≤ 1 and η_k ∈ [η_min, η_max]. In all cases it was proved that the schemes are globally convergent and perform better than their monotone counterparts.

Nonmonotone strategies are as important for trust-region methods as they are for inexact line-search techniques, so combining the two is of clear interest. Historically, the first nonmonotone trust-region method was proposed in 1993 by Deng et al. [15] for unconstrained optimization. Under some classical assumptions, global convergence and a local superlinear convergence rate were established. Nonmonotone trust-region methods were also studied by several authors, such as Toint [41], Xiao & Zhou [43], Xiao & Chu [44], Zhou & Xiao [47], Ahookhosh & Amini [2], Amini & Ahookhosh [6], and Mo et al. [30]. Recently, Ahookhosh & Amini [1] and Ahookhosh et al. [4] proposed two nonmonotone trust-region methods using the nonmonotone term (8). Theoretical results were reported, and numerical results showed the efficiency of the proposed nonmonotone methods.
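To make these historical terms concrete, the following minimal Python sketch computes the max-type term (5) (with the common choice m(k) = min{k, N}), the Zhang–Hager term (6) with a fixed weight, and the convex combination (8); the function names and the fixed η values are illustrative assumptions, not prescriptions from the cited papers.

def f_l(fhist, k, N):
    # Max-type term (5) with m(k) = min(k, N): the maximum of the
    # last min(k, N) + 1 function values f_{k-j}, j = 0, ..., m(k).
    return max(fhist[k - j] for j in range(min(k, N) + 1))

def zhang_hager(fhist, eta=0.85):
    # Averaged term (6) of Zhang & Hager with a fixed weight eta in [0, 1];
    # C_k is a weighted average of all function values seen so far.
    C, Q = fhist[0], 1.0
    for f_k in fhist[1:]:
        Q_next = eta * Q + 1.0
        C = (eta * Q * C + f_k) / Q_next
        Q = Q_next
    return C

def R_k(fhist, k, N, eta=0.5):
    # Term (8): a convex combination of the max-type term and f_k.
    return eta * f_l(fhist, k, N) + (1.0 - eta) * fhist[k]

With η close to 1, the term (8) stays near the strongly nonmonotone max-type term (5); with η close to 0 it collapses to the monotone value f_k.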
Content. In this paper we propose a trust-region method equipped with two novel nonmonotone terms. More precisely, we first introduce two nonmonotone terms and then combine them with Algorithm 1 to construct two nonmonotone trust-region algorithms. If k ≥ N, the new nonmonotone terms are defined by a convex combination of the last N successful function values, and if k < N, either a convex combination of the first k successful function values or f_{l(k)} is used. Global convergence to first- and second-order stationary points is established under some classical assumptions. Moreover, local superlinear and quadratic convergence rates of the proposed methods are studied. Numerical results of experiments on some highly nonlinear problems and on 112 unconstrained test problems from the CUTEst test collection [24] are reported, indicating the efficiency of the proposed nonmonotone terms.

The remainder of the paper is organized as follows. In Section 2 we propose the new nonmonotone terms and their combination with the trust-region framework. The global convergence of the proposed methods is established in Section 3. Numerical results are reported in Section 4. Finally, some conclusions are given in Section 5.

2 Novel nonmonotone terms and algorithm

In this section we first present two novel nonmonotone terms and then combine them with the trust-region framework to introduce two nonmonotone trust-region algorithms for solving the unconstrained optimization problem (1). Throughout, k denotes the current iteration and N ∈ N is a constant.

The main idea is to construct a nonmonotone term determined by a convex combination of the last k successful function values if k < N and by a convex combination of the last N successful function values if k ≥ N. In other words, we construct the new terms using function values collected in the set

  F_k := {f_0, f_1, ..., f_k}  if k < N,
  F_k := {f_{k−N+1}, f_{k−N+2}, ..., f_k}  if k ≥ N,  (9)

which is updated in each iteration. To this end, motivated by the term (8), we construct T_k using the following procedure:

  T_0 = f_0  if k = 0,
  T_1 = (1 − η_0) f_1 + η_0 f_0  if k = 1,
  T_2 = (1 − η_1) f_2 + η_1 (1 − η_0) f_1 + η_1 η_0 f_0  if k = 2,
  ...
  T_{N−1} = (1 − η_{N−2}) f_{N−1} + η_{N−2} (1 − η_{N−3}) f_{N−2} + ··· + η_{N−2} ··· η_0 f_0  if k = N − 1,
  T_N = (1 − η_{N−1}) f_N + η_{N−1} (1 − η_{N−2}) f_{N−1} + ··· + η_{N−1} ··· η_0 f_0  if k = N,

where the η_i ∈ [0, 1), i = 0, 1, ..., N − 1, are weight parameters. Hence the new term is generated by

  T_k := (1 − η_{k−1}) f_k + η_{k−1} T_{k−1}  if k < N,
  T_k := (1 − η_{k−1}) f_k + η_{k−1} (1 − η_{k−2}) f_{k−1} + ··· + η_{k−1} ··· η_{k−N} f_{k−N}  if k ≥ N,  (10)

where T_0 = f_0 and η_i ∈ [0, 1). To show that T_k is a convex combination of the collected function values, it is enough to show that the multipliers sum to unity. For k ≥ N, the definition of T_k implies

  (1 − η_{k−1}) + η_{k−1}(1 − η_{k−2}) + ··· + η_{k−1} ··· η_{k−N+1}(1 − η_{k−N}) + η_{k−1} ··· η_{k−N} = 1,  (11)

since the sum telescopes. For k < N, the corresponding multipliers likewise sum to one. Therefore, the generated term T_k is a convex combination of the elements of F_k.

The procedure defining T_k clearly implies that the set F_k must be updated and stored in each iteration. Moreover, N(N+1)/2 multiplications are required to compute T_k directly.
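For concreteness, here is a minimal sketch of this direct evaluation of (10) for k ≥ N, assuming the function values and weights are stored in Python lists (fvals[i] = f_i, etas[i] = η_i); these names are illustrative.

def T_direct(fvals, etas, N):
    # Direct evaluation of (10) at k = len(fvals) - 1, assuming k >= N:
    # T_k = (1 - eta_{k-1}) f_k + eta_{k-1}(1 - eta_{k-2}) f_{k-1} + ...
    #       + eta_{k-1} ... eta_{k-N} f_{k-N}.
    k = len(fvals) - 1
    T, prod = 0.0, 1.0
    for j in range(N):                      # weights of f_k, ..., f_{k-N+1}
        T += prod * (1.0 - etas[k - 1 - j]) * fvals[k - j]
        prod *= etas[k - 1 - j]
    return T + prod * fvals[k - N]          # trailing weight eta_{k-1}...eta_{k-N}

Note that prod accumulates the partial products of the η's, so the weights visibly telescope to one, as in (11).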
To avoid storing F_k and to decrease the number of multiplications, we derive a recursive formula for (10). From the definition of T_k, for k ≥ N, it follows that

  T_k − η_{k−1} T_{k−1}
    = (1 − η_{k−1}) f_k + η_{k−1}(1 − η_{k−2}) f_{k−1} + ··· + η_{k−1} ··· η_{k−N} f_{k−N}
      − η_{k−1}(1 − η_{k−2}) f_{k−1} − ··· − η_{k−1} ··· η_{k−N}(1 − η_{k−N−1}) f_{k−N} − η_{k−1} η_{k−2} ··· η_{k−N−1} f_{k−N−1}
    = (1 − η_{k−1}) f_k + η_{k−1} η_{k−2} ··· η_{k−N−1} (f_{k−N} − f_{k−N−1})
    = (1 − η_{k−1}) f_k + ξ_k (f_{k−N} − f_{k−N−1}),

where ξ_k := η_{k−1} η_{k−2} ··· η_{k−N−1}. For k ≥ N, this equation leads to

  T_k = (1 − η_{k−1}) f_k + η_{k−1} T_{k−1} + ξ_k (f_{k−N} − f_{k−N−1}),  (12)

which requires saving only f_{k−N} and f_{k−N−1} and needs only three multiplications. Moreover, the definition of ξ_k implies

  ξ_k = η_{k−1} η_{k−2} ··· η_{k−N−1} = (η_{k−1} / η_{k−N−2}) η_{k−2} η_{k−3} ··· η_{k−N−2} = (η_{k−1} / η_{k−N−2}) ξ_{k−1}.  (13)

If ξ_k is recursively updated by (13), then (10) and (12) yield the new nonmonotone term

  T_k := f_k + η_{k−1} (T_{k−1} − f_k)  if k < N,
  T_k := max{ (1 − η_{k−1}) f_k + η_{k−1} T_{k−1} + ξ_k (f_{k−N} − f_{k−N−1}), f_k }  if k ≥ N,  (14)

where the max term is added to guarantee T_k ≥ f_k.

As discussed in Section 1, nonmonotone schemes perform better when they use a stronger nonmonotone term far away from the optimizer and a weaker one close to it. This motivates us to consider a second version of the derived nonmonotone term that uses f_{l(k)} whenever k < N. More precisely, the second nonmonotone term is defined by

  T_k := f_{l(k)}  if k < N,
  T_k := max{ (1 − η_{k−1}) f_k + η_{k−1} T_{k−1} + ξ_k (f_{k−N} − f_{k−N−1}), f_k }  if k ≥ N,  (15)

where ξ_k is defined by (13). The new term thus uses the stronger term f_{l(k)} defined by (5) for the first k < N iterations and then employs the relaxed convex term proposed above.

Now, to employ the proposed nonmonotone terms in the trust-region framework, it is enough to replace the ratio r_k of (4) by the nonmonotone ratio

  r̂_k = (T_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)),  (16)

where T_k is defined by (14) or (15). Notice that if r̂_k ≥ µ_1, then

  T_k − f_{k+1} ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ 0.

This implies that f_{k+1} can be larger than f_k; however, the elements of {f_k} cannot increase arbitrarily, and the maximum increase is controlled by the nonmonotone term T_k. Moreover, the definitions (14) and (15) imply r̂_k ≥ r_k, increasing the possibility of attaining larger steps for nonmonotone schemes compared with monotone ones.

The above discussion leads to the following nonmonotone trust-region algorithm:

Algorithm 2: NMTR (nonmonotone traditional trust-region algorithm)

Input: x_0 ∈ R^n, B_0 ∈ R^{n×n}, k_max; 0 < µ_1 ≤ µ_2 ≤ 1, 0 < ρ_1 ≤ 1 ≤ ρ_2, ε > 0;
Output: x_b; f_b;

1  begin
2    δ_0 ← ‖g_0‖; k ← 0;
3    while ‖g_k‖ ≥ ε & k ≤ k_max do
4      solve the subproblem (3) to specify d_k;
5      x̂_k ← x_k + d_k; compute f(x̂_k);
6      determine r̂_k using (16);
7      while r̂_k < µ_1 do
8        δ_k ← ρ_1 δ_k;
9        solve the subproblem (3) to specify d_k;
10       x̂_k ← x_k + d_k; compute f(x̂_k);
11       determine r̂_k using (16);
12     end
13     x_{k+1} ← x̂_k;
14     if r̂_k ≥ µ_2 then
15       δ_{k+1} ← ρ_2 δ_k;
16     end
17     update B_{k+1}; k ← k + 1;
18     update T_{k+1};
19     update η_{k+1};
20   end
21   x_b ← x_k; f_b ← f_k;
22 end

In Algorithm 2, if r̂_k ≥ µ_1 (Line 7), the iteration is called successful, and if r̂_k ≥ µ_2 (Line 14), it is called very successful. In addition, the loop from Line 3 to Line 20 is called the outer cycle, and the loop from Line 7 to Line 12 is called the inner cycle.
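The following minimal Python sketch shows one way the updates in Lines 18–19 can be organized so that T_k of (14) is maintained with constant memory per outer iteration; the class name, the sliding-window bookkeeping, and the caller-supplied η schedule are illustrative assumptions. The window form of ξ_k coincides with the recursion (13) whenever all η_i > 0, and the boundary step k = N falls back on the direct formula (10).

from collections import deque

class NonmonotoneTerm:
    # Maintains T_k of (14) using only a short trailing window of values.
    def __init__(self, f0, N):
        self.N, self.k, self.T = N, 0, f0
        self.fwin = deque([f0], maxlen=N + 2)  # trailing values up to f_{k-1}
        self.ewin = deque(maxlen=N + 1)        # trailing weights up to eta_{k-1}

    def update(self, f_new, eta_prev):
        # f_new = f_k of the accepted iterate, eta_prev = eta_{k-1} in (12)-(14).
        self.k += 1
        self.ewin.append(eta_prev)
        if self.k < self.N:
            self.T = f_new + eta_prev * (self.T - f_new)      # first case of (14)
        elif self.k == self.N:
            T, prod = 0.0, 1.0                 # one-time direct evaluation of (10)
            fv = list(self.fwin) + [f_new]     # f_0, ..., f_N
            ev = list(self.ewin)               # eta_0, ..., eta_{N-1}
            for j in range(self.N):
                T += prod * (1.0 - ev[-1 - j]) * fv[-1 - j]
                prod *= ev[-1 - j]
            self.T = T + prod * fv[0]
        else:
            xi = 1.0                           # xi_k = eta_{k-1} ... eta_{k-N-1}
            for e in self.ewin:
                xi *= e
            T_hat = ((1.0 - eta_prev) * f_new + eta_prev * self.T
                     + xi * (self.fwin[-self.N] - self.fwin[-self.N - 1]))
            self.T = max(T_hat, f_new)         # second case of (14), so T_k >= f_k
        self.fwin.append(f_new)
        return self.T

Inside the outer cycle of Algorithm 2, update would be called once per accepted point, and the returned value of T_k then replaces f_k in the numerator of the ratio r̂_k of (16) at the next iteration.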
3 Convergence analysis

This section concerns the global convergence to first- and second-order stationary points of the sequence {x_k} generated by Algorithm 2. More precisely, we intend to prove that every limit point x* of the sequence {x_k} satisfies g(x*) = 0 and that there exists a point x* satisfying g(x*) = 0 at which H(x*) is positive semidefinite. Furthermore, we show that Algorithm 2 is well-defined, meaning that its inner cycle is left after a finite number of internal iterations, and then prove its global convergence. Moreover, local superlinear and quadratic convergence rates are investigated under some classical assumptions.

To prove the global convergence of the sequence {x_k} generated by Algorithm 2, we require the following assumptions:

(H1) The objective function f is continuously differentiable and bounded below on the level set L(x_0) = {x ∈ R^n | f(x) ≤ f(x_0)}.

(H2) The sequence {B_k} is uniformly bounded, i.e., there exists a constant M > 0 such that ‖B_k‖ ≤ M for all k ∈ N.

(H3) There exists a constant c > 0 such that the trial step d_k satisfies ‖d_k‖ ≤ c ‖g_k‖.

We also assume that the decrease attained on the model q_k is at least a fraction of the decrease obtained by the Cauchy point, guaranteeing that there exists a constant β ∈ (0, 1) such that

  q_k(0) − q_k(d_k) ≥ β ‖g_k‖ min{ δ_k, ‖g_k‖ / ‖B_k‖ }  (17)

for all k. This condition is called the sufficient reduction condition. Inequality (17) implies that d_k ≠ 0 whenever g_k ≠ 0. It is noticeable that there are several schemes for solving the trust-region subproblem (3) such that (17) is valid, see, for example, [13, 32].

Lemma 2 Suppose that the sequence {x_k} is generated by Algorithm 2. Then

  |f_k − f(x_k + d_k) − (q_k(0) − q_k(d_k))| ≤ O(‖d_k‖²).

Proof The proof can be found in [12]. ⊓⊔

Lemma 3 Suppose that the sequence {x_k} is generated by Algorithm 2. Then

  f_k ≤ T_k ≤ f_{l(k)}  (18)

for all k ∈ N ∪ {0}.

Proof For k < N, we consider two cases: (i) T_k is defined by (14); (ii) T_k is defined by (15). In Case (i), Lemma 2.1 in [3], the inequalities f_i ≤ f_{l(k)} for i = 0, 1, ..., k, and the fact that the multipliers in T_k sum to one give the result. Case (ii) is evident from (15). For k ≥ N, if T_k = f_k, the result is evident. Otherwise, since

  (1 − η_{k−1}) + η_{k−1}(1 − η_{k−2}) + ··· + η_{k−1} ··· η_{k−N+1}(1 − η_{k−N}) + η_{k−1} ··· η_{k−N} = 1,  (19)

the fact that f_i ≤ f_{l(k)} for i = k − N, ..., k, together with (10), implies

  f_k ≤ T_k = (1 − η_{k−1}) f_k + η_{k−1}(1 − η_{k−2}) f_{k−1} + ··· + η_{k−1} ··· η_{k−N} f_{k−N}
       ≤ [(1 − η_{k−1}) + η_{k−1}(1 − η_{k−2}) + ··· + η_{k−1} ··· η_{k−N}] f_{l(k)} = f_{l(k)},

giving the result. ⊓⊔

Lemma 4 Suppose that the sequence {x_k} is generated by Algorithm 2. Then the sequence {f_{l(k)}} is non-increasing.

Proof The condition (18) implies that T_k ≤ f_{l(k)}. If x_{k+1} is accepted by Algorithm 2, then

  (f_{l(k)} − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) ≥ (T_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) ≥ µ_1,

leading to

  f_{l(k)} − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ 0, for all k ∈ N,

implying

  f_{l(k)} ≥ f_{k+1}, for all k ∈ N.  (20)

Now, if k ≥ N, by using m(k+1) ≤ m(k) + 1 and (20), we get

  f_{l(k+1)} = max_{0≤j≤m(k+1)} {f_{k−j+1}} ≤ max_{0≤j≤m(k)+1} {f_{k−j+1}} = max{f_{l(k)}, f_{k+1}} ≤ f_{l(k)}.

For k < N, it is obvious that m(k) = k. Since f_k ≤ f_0 for any such k, it is clear that f_{l(k)} = f_0. Therefore, in both cases, the sequence {f_{l(k)}} is non-increasing. ⊓⊔

Lemma 5 Suppose that (H1) holds and the sequence {x_k} is generated by Algorithm 2. Then L(x_0) contains the sequence {x_k}.

Proof The definition of T_k indicates that T_0 = f_0. By induction, we assume that x_i ∈ L(x_0) for all i = 1, 2, ..., k, and then prove that x_{k+1} ∈ L(x_0). From (18), we get

  f_{k+1} ≤ T_{k+1} ≤ f_{l(k+1)} ≤ f_{l(k)} ≤ f_0,

implying that L(x_0) contains the sequence {x_k}. ⊓⊔
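As a side remark, the cheapest step satisfying (17) is the classical Cauchy point, i.e., the minimizer of q_k along −g_k within the trust region, which attains (17) with β = 1/2. The following minimal Python sketch is the textbook construction, not something specific to Algorithm 2.

import numpy as np

def cauchy_point(g, B, delta):
    # Minimizer of q(d) = g^T d + 0.5 d^T B d along -g subject to ||d|| <= delta.
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0.0:
        tau = 1.0                                  # nonpositive curvature: go to the boundary
    else:
        tau = min(gnorm ** 3 / (delta * gBg), 1.0)
    return -(tau * delta / gnorm) * g

Any d_k whose model reduction is at least that of cauchy_point(g_k, B_k, δ_k) therefore satisfies the sufficient reduction condition (17) with β = 1/2.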
Corollary 6 Suppose that (H1) holds and the sequence {x_k} is generated by Algorithm 2. Then the sequence {f_{l(k)}} is convergent.

Proof The assumption (H1) and Lemma 4 imply that there exists a constant λ such that

  λ ≤ f_{l(k+n)} ≤ ··· ≤ f_{l(k+1)} ≤ f_{l(k)},

for all n ∈ N. This implies that the sequence {f_{l(k)}} is convergent. ⊓⊔

Lemma 7 Suppose that (H1)–(H3) hold and the sequence {x_k} is generated by Algorithm 2. Then

  lim_{k→∞} f_{l(k)} = lim_{k→∞} f_k.  (21)

Proof The condition (18) and Lemma 7 of [1] imply that the result is valid. ⊓⊔

Corollary 8 Suppose that (H1)–(H3) hold and the sequence {x_k} is generated by Algorithm 2. Then

  lim_{k→∞} T_k = lim_{k→∞} f_k.  (22)

Proof The result follows from (18) and Lemma 7. ⊓⊔

Lemma 9 Suppose that (H1) and (H2) hold, and the sequence {x_k} is generated by Algorithm 2. Then, if ‖g_k‖ ≥ ε > 0, we have:
(i) the inner cycle of Algorithm 2 is well-defined;
(ii) for any k, there exists a nonnegative integer p such that x_{k+p+1} is a very successful iteration.

Proof (i) Let t_k denote the internal iteration counter at iteration k, and let d_k^{t_k} and δ_k^{t_k} denote, respectively, the solution of the subproblem (3) and the corresponding trust-region radius at internal iteration t_k. The fact that ‖g_k‖ ≥ ε > 0, (H2), and (17) imply

  q_k(0) − q_k(d_k^{t_k}) ≥ β ‖g_k‖ min{ δ_k^{t_k}, ‖g_k‖ / ‖B_k‖ } ≥ β ε min{ δ_k^{t_k}, ε/M }.  (23)

Line 8 of Algorithm 2 implies

  lim_{t_k→∞} δ_k^{t_k} = 0.

From this, Lemma 2, and (23), we obtain

  |r_k − 1| = | (f_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) − 1 |
            = | (f_k − f(x_k + d_k^{t_k}) − (q_k(0) − q_k(d_k^{t_k}))) / (q_k(0) − q_k(d_k^{t_k})) |
            ≤ O(‖d_k^{t_k}‖²) / (β ε min{ δ_k^{t_k}, ε/M })
            ≤ O((δ_k^{t_k})²) / (β ε min{ δ_k^{t_k}, ε/M }) → 0  (t_k → ∞),

implying that there exists a positive integer t_0 such that r_k ≥ µ_1 for all t_k ≥ t_0. This and (18) lead to

  r̂_k = (T_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) ≥ (f_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) ≥ µ_1,

implying that the inner cycle is well-defined.

(ii) Assume, by contradiction, that there exists a positive integer k such that, for an arbitrary nonnegative integer p, the point x_{k+p+1} is not very successful. Hence, for any p = 0, 1, 2, ..., we get

  r̂_{k+p} < µ_2.

The fact that ‖g_{k+p}‖ ≥ ε > 0, (H2), and (17) imply

  T_{k+p} − f(x_{k+p} + d_{k+p}) ≥ µ_1 (q_{k+p}(0) − q_{k+p}(d_{k+p}))
      ≥ β µ_1 ‖g_{k+p}‖ min{ δ_{k+p}, ‖g_{k+p}‖ / ‖B_{k+p}‖ }
      ≥ β µ_1 ε min{ δ_{k+p}, ε/M }.  (24)

By using (22) and (24), we can write

  lim_{p→∞} δ_{k+p} = 0.  (25)

From Lemma 2, (25), and (23), we obtain

  |r_{k+p} − 1| = | (f(x_{k+p}) − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) − 1 |
               = | (f(x_{k+p}) − f(x_{k+p} + d_{k+p}) − (q_{k+p}(0) − q_{k+p}(d_{k+p}))) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) |
               ≤ O(‖d_{k+p}‖²) / (β ε min{ δ_{k+p}, ε/M })
               ≤ O(δ_{k+p}²) / (β ε min{ δ_{k+p}, ε/M }) → 0  (p → ∞).
Then, for a sufficiently large p, we get r_{k+p} ≥ µ_2, leading to

  (T_{k+p} − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) ≥ (f(x_{k+p}) − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) ≥ µ_2,

implying r̂_{k+p} ≥ µ_2 for a sufficiently large p. This contradicts the assumption r̂_{k+p} < µ_2, giving the result. ⊓⊔

Lemma 9(i) implies that the inner cycle is left after a finite number of internal iterations, and Lemma 9(ii) implies that if the current iterate is not a first-order stationary point, then there exists at least one very successful iteration point, i.e., the trust-region radius δ_k can be enlarged. The next result gives the global convergence of the sequence {x_k} of Algorithm 2.

Theorem 10 Suppose that (H1) and (H2) hold, and suppose the sequence {x_k} is generated by Algorithm 2. Then

  liminf_{k→∞} ‖g_k‖ = 0.  (26)

Proof We consider two cases: (i) Algorithm 2 has finitely many very successful iterations; (ii) Algorithm 2 has infinitely many very successful iterations.

In Case (i), suppose that k_0 is the largest index of a very successful iteration. If ‖g_{k_0+1}‖ > 0, then Lemma 9(ii) implies that there exists a very successful iteration with an index larger than k_0. This is a contradiction to the definition of k_0.

In Case (ii), by contradiction, we assume that there exist constants ε > 0 and K > 0 such that

  ‖g_k‖ ≥ ε  (27)

for all k ≥ K. If x_{k+1} is a successful iteration and k ≥ K, then by using (H2), (17), and (27), we get

  T_k − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ β µ_1 ‖g_k‖ min{ δ_k, ‖g_k‖ / ‖B_k‖ } ≥ β µ_1 ε min{ δ_k, ε/M } ≥ 0.  (28)

It follows from this inequality and (22) that

  lim_{k→∞} δ_k = 0.  (29)

Since Algorithm 2 has infinitely many very successful iterations, Lemma 9(ii) and (27) imply that the sequence {x_k} involves infinitely many very successful iterations in which the trust region is enlarged, which is a contradiction with (29). This implies that the result is valid. ⊓⊔

Theorem 11 Suppose that (H1) and (H2) hold, and the sequence {x_k} is generated by Algorithm 2. Then

  lim_{k→∞} ‖g_k‖ = 0.  (30)

Moreover, no limit point of the sequence {x_k} is a local maximizer of f.

Proof By contradiction, we assume lim_{k→∞} ‖g_k‖ ≠ 0. Hence there exist ε > 0 and an infinite subsequence of {x_k}, indexed by {t_i}, such that

  ‖g_{t_i}‖ ≥ 2ε > 0  (31)

for all i ∈ N. Theorem 10 ensures the existence, for each t_i, of a first successful iteration r(t_i) > t_i such that ‖g_{r(t_i)}‖ < ε. We denote r_i = r(t_i). Hence there exists another subsequence, indexed by {r_i}, such that

  ‖g_k‖ ≥ ε for t_i ≤ k < r_i,  ‖g_{r_i}‖ < ε.  (32)

We now restrict our attention to the sequence of successful iterations whose indices are in the set

  κ = {k ∈ N | t_i ≤ k < r_i}.

Using (32), for every k ∈ κ, (28) holds. It follows from (22) and (28) that

  lim_{k→∞} δ_k = 0,  (33)

for k ∈ κ. Now, using (H2), (17), and ‖g_k‖ ≥ ε, the condition (23) holds for k ∈ κ. This, Lemma 2, and (33) lead to

  |r_k − 1| = | (f_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) − 1 |
            = | (f_k − f(x_k + d_k) − (q_k(0) − q_k(d_k))) / (q_k(0) − q_k(d_k)) |
            ≤ O(‖d_k‖²) / (β ε min{ δ_k, ε/M })
            ≤ O(δ_k²) / (β ε δ_k) → 0  (k → ∞, k ∈ κ).
Thus, for all sufficiently large k ∈ κ, the step is accepted and we get

  f_k − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ β µ_1 ‖g_k‖ min{ δ_k, ‖g_k‖ / ‖B_k‖ } ≥ β µ_1 ε min{ δ_k, ε/M }.  (34)

The condition (33) implies that eventually δ_k ≤ ε/M. Hence, for all sufficiently large k ∈ κ, we obtain

  δ_k ≤ (1 / (β µ_1 ε)) (f_k − f_{k+1}).  (35)

Then (18) and (35) imply

  ‖x_{t_i} − x_{r_i}‖ ≤ Σ_{j∈κ, j=t_i}^{r_i−1} ‖x_j − x_{j+1}‖ ≤ Σ_{j∈κ, j=t_i}^{r_i−1} δ_j ≤ (1/(β µ_1 ε)) (f_{t_i} − f_{r_i}) ≤ (1/(β µ_1 ε)) (T_{t_i} − f_{r_i}),  (36)

for all sufficiently large i. Now, Corollary 8 implies

  0 ≤ lim_{i→∞} ‖x_{t_i} − x_{r_i}‖ ≤ lim_{i→∞} (1/(β µ_1 ε)) (T_{t_i} − f_{r_i}) = 0,

leading to lim_{i→∞} ‖x_{t_i} − x_{r_i}‖ = 0. Since the gradient is continuous, we get

  lim_{i→∞} ‖g_{t_i} − g_{r_i}‖ = 0.  (37)

In view of the definitions of {t_i} and {r_i}, however, we have ‖g_{t_i} − g_{r_i}‖ ≥ ε (since ‖g_{t_i}‖ ≥ 2ε and ‖g_{r_i}‖ < ε), which contradicts (37). Therefore, there is no subsequence satisfying (31), giving the result. To see that no limit point of the sequence {x_k} is a local maximizer of f, see [27]. ⊓⊔

The next result gives the global convergence of the sequence generated by Algorithm 2 to second-order stationary points. To this end, similar to [15], an additional assumption is needed:

(H4) If λ_min(B_k) represents the smallest eigenvalue of the symmetric matrix B_k, then there exists a positive scalar c_3 such that

  q_k(0) − q_k(d_k) ≥ c_3 λ_min(B_k) δ_k².
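A numerical sanity check of (H4), exactly as printed above, is straightforward; the function name and the way its inputs are supplied below are illustrative assumptions for a sketch, not part of the analysis.

import numpy as np

def satisfies_H4(g, B, d, delta, c3):
    # Model reduction q_k(0) - q_k(d) for the quadratic model (2).
    pred = -(g @ d + 0.5 * d @ B @ d)
    lam_min = np.linalg.eigvalsh(B).min()  # smallest eigenvalue of the symmetric B_k
    return pred >= c3 * lam_min * delta ** 2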