Generalized Forward-Backward Splitting

Hugo Raguet¹, Jalal Fadili², Gabriel Peyré¹

¹ Ceremade, CNRS-Université Paris-Dauphine, Pl. De Lattre De Tassigny, 75775 Paris Cedex 16, France — {raguet,peyre}@ceremade.dauphine.fr
² GREYC, CNRS-ENSICAEN-Université de Caen, 6, Bd du Maréchal Juin, 14050 Caen Cedex, France — [email protected]

January 13, 2012

Abstract

This paper introduces the generalized forward-backward splitting algorithm for minimizing convex functions of the form F + ∑_{i=1}^n G_i, where F has a Lipschitz-continuous gradient and the G_i's are simple in the sense that their Moreau proximity operators are easy to compute. While the forward-backward algorithm cannot deal with more than n = 1 non-smooth function, our method generalizes it to the case of arbitrary n. Our method makes explicit use of the regularity of F in the forward step, and the proximity operators of the G_i's are applied in parallel in the backward step. This allows the generalized forward-backward to efficiently address an important class of convex problems. We prove its convergence in infinite dimension, and its robustness to errors on the computation of the proximity operators and of the gradient of F. Examples on inverse problems in imaging demonstrate the advantage of the proposed method in comparison to other splitting algorithms.

1 Introduction

Throughout this paper, H denotes a real Hilbert space endowed with scalar product ⟨⋅ ∣ ⋅⟩ and associated norm ||⋅||, and n is a positive integer. We consider the following minimization problem:

    min_{x∈H} { Ψ(x) := F(x) + ∑_{i=1}^n G_i(x) } ,    (1)

where all considered functions belong to the class Γ_0(H) of lower semicontinuous, proper (i.e. with non-empty domain) and convex functions from H to ]−∞,+∞].

1.1 State-of-the-Art in Splitting Methods

The decomposition (1) is fairly general, and a wide range of iterative algorithms takes advantage of the specific properties of the functions in the summand. One crucial property is the possibility to compute the associated proximity operators [54], defined as

    prox_G(x) := argmin_{y∈H} (1/2)||x − y||² + G(y) .    (2)

This is in itself a convex optimization problem, which can be solved efficiently for many functions, e.g. when the solution, unique by strong convexity, can be written in closed form. Such functions are referred to as "simple".

Another important feature is the differentiability of the functional to be minimized. However, gradient-descent approaches do not apply as soon as one of the functions G_i is non-smooth. For n = 1 and G_1 simple, the forward-backward algorithm circumvents this difficulty if F is differentiable with a Lipschitz-continuous gradient. This scheme consists in alternately performing a gradient descent (corresponding to an explicit step on the function F) followed by a proximal step (corresponding to an implicit step on the function G_1). Such a scheme can be understood as a generalization of the projected gradient method. This algorithm has been well studied [50, 40, 67, 16, 69, 24, 7]. Accelerated multistep versions have been proposed [55, 70, 6], which enjoy a faster convergence rate of O(1/t²) on the objective Ψ.

Other splitting methods do not require any smoothness on any part of the composite functional Ψ. The Douglas-Rachford [27] and Peaceman-Rachford [57] schemes were developed to minimize G_1(x) + G_2(x), provided that G_1 and G_2 are simple [47, 45, 33, 17]; they rely only on the use of proximity operators.
The backward-backward algorithm [46, 56, 1, 5, 17] can be used to minimize Ψ(x) = G_1(x) + G_2(x) when the functions involved are the indicator functions of non-empty closed convex sets, or involve Moreau envelopes. Interestingly, if one of the functions G_1 or G_2 is a Moreau envelope and the other is simple, the forward-backward algorithm amounts to a backward-backward scheme.

If L is a bounded injective linear operator, it is possible to minimize Ψ(x) = G_1 ∘ L(x) + G_2(x) by applying these splitting schemes to the Fenchel-Rockafellar dual problem. It was shown that applying the Douglas-Rachford scheme leads to the alternating direction method of multipliers (ADMM) [39, 40, 41, 42, 33]. For a not necessarily injective L and G_2 strongly convex with a Lipschitz-continuous gradient, the forward-backward algorithm can be applied to the Fenchel-Rockafellar dual [36, 19]. Dealing with an arbitrary bounded linear operator L can be achieved using primal-dual methods motivated by the classical Kuhn-Tucker theory. Starting from methods to solve saddle function problems such as the Arrow-Hurwicz method [2] and its modification [60], and the extragradient method [44], this problem has received a lot of attention more recently [15, 68, 64, 53, 12, 9].

It is also possible to extend the Douglas-Rachford algorithm to an arbitrary number n > 2 of simple functions. Inspired by the method of partial inverses [65, Section 5], most methods rely either explicitly or implicitly on introducing auxiliary variables and bringing the original problem back to the case n = 2 in the product space H^n. Doing so yields iterative schemes in which one performs independent parallel proximal steps on each of the simple functions and then computes the next iterate by essentially averaging the results. Variants have been proposed in [21] and [34], the latter describing a general projective framework that does not reduce the problem to the case n = 2. Note however that these extensions do not apply to the forward-backward scheme, which can only handle n = 1. It is at the heart of this paper to present such an extension.

Recently proposed methods extend existing splitting schemes to handle the sum of any number n ≥ 2 of composite functions of the form G_i = H_i ∘ L_i, where the H_i's are simple and the L_i's are bounded linear operators. Let us denote L_i^* the adjoint operator of L_i. If L_i satisfies L_i ∘ L_i^* = νId for some ν > 0 (it is a so-called tight frame), H_i ∘ L_i is simple as soon as H_i is simple and L_i^* is easy to compute [20]. This case thus reduces to the previously reviewed ones. If L_i is not a tight frame but (Id + L_i ∘ L_i^*) or (Id + L_i^* ∘ L_i) is easily invertible, it is again possible to reduce the problem to the previous cases by introducing as many auxiliary variables as the number of L_i's, each belonging to the range of L_i. Note however that, if solved with the Douglas-Rachford algorithm on the product space, the auxiliary variables are also duplicated, which would significantly increase the dimensionality of the problem. Some dedicated parallel implementations were specifically designed for the case where (∑_i L_i ∘ L_i^*) or (∑_i L_i^* ∘ L_i) is (easily) invertible, see for instance [32, 58]. If the L_i's satisfy none of the above properties, it is still possible to call on primal-dual methods, either by writing Ψ(x) = ∑_{i=1}^n H_i(L_i x) = G(Lx) with L(x) = (L_i(x))_i and G((x_i)_i) = ∑_i H_i(x_i), see for instance [29]; or Ψ((x_i)_i) = ι_S((x_i)_i) + ∑_i H_i(L_i x_i) [9], where S is the closed convex set defined in Section 3.2.
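For the tight-frame case just mentioned, the reduction rests on a closed-form composition rule for proximity operators. It is not stated explicitly in this section, so we recall it for the reader's convenience (a standard proximal-calculus identity, under the stated assumption on L): if L is a bounded linear operator with L ∘ L^* = νId for some ν > 0 and H is simple, then

    prox_{H∘L}(x) = x + ν^{-1} L^*( prox_{νH}(Lx) − Lx ) ,

so that prox_{H∘L} is computable whenever prox_{νH} and L^* are.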
In spite of the wide range of already existing proximal splitting methods, none seems satisfactory to address explicitly the case where n > 1 and F is smooth but not necessarily simple. A workaround that has been proposed previously uses nested algorithms to compute the proximity operator of ∑_i G_i within sub-iterations, see for instance [30, 14]; this leads to practical as well as theoretical difficulties in selecting the number of sub-iterations. More recently, [53] proposed an algorithm for minimizing Ψ(x) = F(x) + G(x) under linear constraints. We show in Section 5 how this can be adapted to address the general problem (1) while achieving full splitting of the proximity operators of the G_i's and using the gradient of F. It suffers however from limitations, in particular the introduction of many auxiliary variables and the fact that the gradient descent cannot be directly applied to the minimizer; see Sections 5 and 6 for details. The generalized forward-backward algorithm introduced in this paper is intended to avoid all those shortcomings.

As this paper was being finalized, the authors of [23] independently developed a primal-dual algorithm to solve a class of problems that covers those we consider here. Their approach and algorithm are however very different from ours in many important ways. We will provide a detailed comparison with this work in Section 5 and will also show on numerical experiments in Section 6 that our algorithm seems better adapted to problems of the form (1).

1.2 Applications in Image Processing

Many imaging applications require solving ill-posed inverse problems to recover high-quality images from low-dimensional and noisy observations. These challenging problems necessitate the use of regularization through prior models to capture the geometry of natural signals, images or videos. The resolution of the inverse problem can be achieved by minimizing objective functionals, with respect to a high-dimensional variable, that take into account both a fidelity term to the observations and regularization terms reflecting the priors. Clearly, such functionals are composite by construction. Section 6 details several examples of such inverse problems.

In many situations, this leads to the optimization of a convex functional that can be split into the sum of convex smooth and non-smooth terms. The smooth part of the objective is in some cases a data fidelity term and reflects some specific knowledge about the forward model, i.e. the noise and the measurement/degradation operator. This is for instance the case if the operator is linear and the noise is additive Gaussian, in which case the data fidelity is a quadratic function. The most successful regularizations that have been advocated are non-smooth, which typically allows one to preserve sharp and intricate structures in the recovered data. Among such priors, sparsity-promoting ones have become popular, e.g. the ℓ₁-norm of coefficients in a wisely chosen dictionary [49], or the total variation (TV) prior [63]. To better model the data, composite priors can be constructed by summing several suitable regularizations, see for instance the morphological diversity framework [66]. The proximity operator of the ℓ₁-norm penalization is a simple soft-thresholding [26], whereas the use of complex or mixed regularization priors justifies the splitting of non-smooth terms into several simpler functions (see Section 6 for concrete examples).
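To make the notion of a "simple" function concrete, here is a minimal NumPy sketch of the soft-thresholding operator just mentioned, i.e. the proximity operator of the ℓ₁-norm (our illustration; the function name is ours):

    import numpy as np

    def prox_l1(x, gamma):
        # Proximity operator of gamma * ||.||_1, i.e. component-wise soft-thresholding:
        # solves argmin_y 0.5 * ||x - y||^2 + gamma * ||y||_1.
        return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

    x = np.array([3.0, -0.5, 1.2, -2.0])
    print(prox_l1(x, 1.0))   # [ 2.  -0.   0.2 -1. ]

Each component is shrunk towards zero by gamma and set exactly to zero when its magnitude is below gamma, which is why the ℓ₁ penalty promotes sparsity.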
The composite structure of convex optimization problems arising when solving inverse problems, in the form of a sum of simple and/or smooth functions involving linear operators, explains the popularity of proximal splitting schemes in imaging science. Depending on the structure of the objective functional as detailed in the previous section, one can resort to the appropriate splitting algorithm. For instance, the forward-backward algorithm and its modifications have become popular for sparse regularization with a smooth data fidelity, see for instance [38, 25, 24, 35, 13, 6, 8]. The Douglas-Rachford algorithm and its parallelized extensions were also used in a variety of inverse problems involving only non-smooth functions, see for instance [20, 21, 30, 14, 10, 28, 31, 61]. The ADMM (which is nothing but Douglas-Rachford on the dual) was also applied to some linear inverse problems in [48, 37]. Primal-dual schemes [12, 29] are among the most flexible schemes to handle more complicated priors. The interested reader may refer to [66, Chapter 7] and [22] for extensive reviews.

1.3 Contributions and Paper Organization

This paper introduces a novel generalized forward-backward algorithm to solve (1) when F is convex with a Lipschitz-continuous gradient, and the G_i's are convex and simple. The algorithm achieves full splitting where all operators are used separately: an explicit step for ∇F (single-valued) and a parallelized implicit step through the proximity operators of the G_i's. We prove convergence of the algorithm as well as its robustness to errors that may contaminate the iterations. To the best of our knowledge, it is among the first algorithms to tackle the case where n > 1 and F is smooth (see Section 5 for the relation to a recent work developed in parallel to ours). Although our numerical results are reported only on imaging applications, the algorithm may prove useful for many other applications such as machine learning or statistical estimation.

Section 2 presents the algorithm and states our main theoretical result. Section 3, which can be skipped by experienced readers, sets out some necessary material from the framework of monotone operator theory. Section 4 reformulates the generalized forward-backward algorithm for finding the zeros of a sum of maximal monotone operators, and proves its convergence and its robustness. Special instances of the algorithm, its potential extensions and a discussion of its relation to two alternatives in the literature are given in Section 5. Numerical examples are reported in Section 6 to show the usefulness of this approach for applications to imaging problems.

2 Generalized Forward-Backward Algorithm for Minimization Problems

We consider problem (1) where all functions are in Γ_0(H), F is differentiable on H with 1/β-Lipschitz gradient, where β ∈ ]0,+∞[, and for all i, G_i is simple. We also assume the following:

(H1) The set of minimizers of (1), argmin(Ψ), is non-empty;

(H2) The domain qualification condition holds, i.e.

    (0,...,0) ∈ sri { (x − y_1, ..., x − y_n) ∣ x ∈ H and ∀i, y_i ∈ dom(G_i) } ,

where dom(G_i) := {x ∈ H ∣ G_i(x) < +∞} and sri(C) is the strong relative interior of a non-empty convex subset C of H [4].

Under (H1)-(H2), it follows from [62, 4, Theorem 16.2 and Theorem 16.37(i)] that

    ∅ ≠ argmin(Ψ) = zer(∂Ψ) = zer(∇F + ∑_i ∂G_i) ,

where ∂G_i denotes the subdifferential of G_i and zer(A) is the set of zeros of a set-valued map A (see Definition 3.1 in Section 3.1). Therefore, solving (1) is equivalent to

    Find x ∈ H such that 0 ∈ ∇F(x) + ∑_i ∂G_i(x) .    (3)
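As a point of reference (our remark, not part of the original derivation), for n = 1 and any γ > 0 the inclusion (3) is equivalent to the fixed-point equation underlying the classical forward-backward iteration recalled in Section 1.1:

    0 ∈ ∇F(x) + ∂G_1(x)  ⇔  x − γ∇F(x) ∈ (Id + γ∂G_1)(x)  ⇔  x = prox_{γG_1}( x − γ∇F(x) ) ,

i.e. a forward (gradient) step followed by a backward (proximal) step. Algorithm 1 below generalizes this characterization to n > 1 through auxiliary variables (cf. Proposition 4.1).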
The generalized forward-backward we propose to minimize (1) (or equivalently to solve (3)) is detailed in Algorithm 1.

Algorithm 1 Generalized Forward-Backward Algorithm for solving (1). β^{-1} ∈ ]0,+∞[ is the Lipschitz constant of ∇F; I_λ is defined in Theorem 2.1.

Require (z_i)_{i∈⟦1,n⟧} ∈ H^n ; (ω_i)_{i∈⟦1,n⟧} ∈ ]0,1[^n s.t. ∑_{i=1}^n ω_i = 1 ; γ_t ∈ ]0,2β[ ∀t ∈ ℕ ; λ_t ∈ I_λ ∀t ∈ ℕ.
Initialization x ← ∑_i ω_i z_i ; t ← 0.
Main iteration
repeat
    for i ∈ ⟦1,n⟧ do
        z_i ← z_i + λ_t ( prox_{(γ_t/ω_i) G_i}( 2x − z_i − γ_t ∇F(x) ) − x ) ;
    x ← ∑_i ω_i z_i ;
    t ← t + 1.
until convergence ;
Return x.

To state our main theorem that ensures the convergence of the algorithm and its robustness, for each i let ε_{1,t,i} be the error at iteration t when computing prox_{(γ_t/ω_i) G_i} at its argument, and let ε_{2,t} be the error at iteration t when applying ∇F to its argument. Algorithm 1 generates sequences (z_{i,t})_{t∈ℕ}, i ∈ ⟦1,n⟧, and (x_t)_{t∈ℕ}, such that for all i and t,

    z_{i,t+1} = z_{i,t} + λ_t ( prox_{(γ_t/ω_i) G_i}( 2x_t − z_{i,t} − γ_t(∇F(x_t) + ε_{2,t}) ) + ε_{1,t,i} − x_t ) .    (4)

The following theorem introduces two different sets of assumptions to guarantee convergence. Assumption (A1) allows one to use a greater range for the relaxation parameters λ_t, while assumption (A2) enables varying gradient-descent step-sizes γ_t and ensures strong convergence in the uniformly convex case. Recall that a function F ∈ Γ_0(H) is uniformly convex if there exists a non-decreasing function φ : [0,+∞[ → [0,+∞] that vanishes only at 0, such that for all x and y in dom(F), the following holds:

    ∀ρ ∈ ]0,1[, F(ρx + (1−ρ)y) + ρ(1−ρ) φ(||x − y||) ≤ ρF(x) + (1−ρ)F(y) .    (5)

Theorem 2.1. Set γ̄ = lim sup_t γ_t and define the following assumptions:

(A0) (i) 0 < lim inf_t λ_t ≤ lim sup_t λ_t < min( 3/2 , (1 + 2β/γ̄)/2 );
     (ii) ∑_{t=0}^{+∞} ||ε_{2,t}|| < +∞, and for all i, ∑_{t=0}^{+∞} ||ε_{1,t,i}|| < +∞.

(A1) (i) ∀t, γ_t = γ̄ ∈ ]0,2β[;
     (ii) I_λ = ]0, min( 3/2 , (1 + 2β/γ̄)/2 )[.

(A2) (i) 0 < lim inf_t γ_t ≤ γ̄ < 2β;
     (ii) I_λ = ]0,1].

Suppose that (H1), (H2) and (A0) are satisfied. Then, if either (A1) or (A2) is satisfied, (x_t)_{t∈ℕ} defined in (4) converges weakly towards a minimizer of (1). Moreover, if (A2) is satisfied and F is uniformly convex, the convergence is strong to the unique global minimizer of (1).

This theorem will be proved after casting it in the more general framework of monotone operator splitting in Section 4.

Remark 2.1. The sufficient condition for strong convergence in Theorem 2.1 can be weakened, and other ones can be stated as well. Indeed, the generalized forward-backward algorithm has a structure that bears similarities with the classical forward-backward, since it consists of an explicit forward step, followed by an implicit step where the proximity operators are computed in parallel. In fact, it turns out that the backward step involves a firmly non-expansive operator (see next section), and therefore the statements of [24, Theorem 3.4(iv) and Proposition 3.6] can be transposed with some care to our algorithm.

The formulation of Algorithm 1 is general, but it can be simplified for practical purposes. In particular, the auxiliary variables z_i can all be initialized to 0, the weights ω_i set equally to 1/n, and for simplicity the relaxation parameters λ_t and the gradient-descent step-size γ_t can be set constant along the iterations. This is typically what has been done in the numerical experiments.
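Before moving to the operator-theoretic analysis, the following NumPy sketch illustrates Algorithm 1 in this simplified setting on a toy problem (our illustration, not the authors' reference implementation), with F(x) = ½||Ax − b||², G_1 = μ||⋅||_1 and G_2 the indicator function of the non-negative orthant, both simple:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    b = rng.standard_normal(20)
    mu = 0.1

    grad_F = lambda x: A.T @ (A @ x - b)
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad F, i.e. 1/beta
    prox = [
        lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * mu, 0.0),  # prox of g*G_1
        lambda v, g: np.maximum(v, 0.0),                                # prox of g*G_2
    ]

    n = len(prox)
    w = np.full(n, 1.0 / n)                # weights omega_i = 1/n
    gamma, lam = 1.8 / L, 1.0              # gamma in ]0, 2*beta[, constant relaxation
    z = np.zeros((n, A.shape[1]))          # auxiliary variables z_i, initialized to 0
    x = w @ z

    for t in range(500):
        gx = grad_F(x)
        for i in range(n):                 # backward steps, computable in parallel
            z[i] += lam * (prox[i](2 * x - z[i] - gamma * gx, gamma / w[i]) - x)
        x = w @ z                          # x <- sum_i omega_i z_i

    print(0.5 * np.linalg.norm(A @ x - b) ** 2 + mu * np.abs(x).sum(), x.min())

At convergence x approaches a minimizer of the sum, hence non-negative and sparse; the loop over i is written sequentially here, but the n proximal steps are independent and can be evaluated in parallel, as emphasized above.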
3 Monotone Operators and Inclusions

The subdifferential of a function in Γ_0(H) is the best-known example of a maximal monotone operator. Therefore, it is natural to extend the generalized forward-backward, Algorithm 1, to find the zeros of a sum of maximal monotone operators, i.e. to solve the monotone inclusion (3) when the subdifferentials are replaced by arbitrary maximal monotone operators. This is the goal pursued in Section 4, where we provide the proof of a general convergence and robustness theorem whose byproduct is the proof of Theorem 2.1. We first begin by recalling some essential definitions and properties of monotone operators that are necessary to our exposition. The interested reader may refer to [59, 4] for a comprehensive treatment.

3.1 Definitions and Properties

In the following, A : H → 2^H is a set-valued operator, and Id is the identity operator on H. A is single-valued if the cardinality of Ax is at most 1.

Definition 3.1 (Graph, inverse, domain, range and zeros). The graph of A is the set gra(A) := {(x,y) ∈ H² ∣ y ∈ Ax}. The inverse of A is the operator whose graph is gra(A⁻¹) := {(x,y) ∈ H² ∣ (y,x) ∈ gra(A)}. The domain of A is dom(A) := {x ∈ H ∣ Ax ≠ ∅}. The range of A is ran(A) := {y ∈ H ∣ ∃x ∈ H : y ∈ Ax}, and its set of zeros is zer(A) := {x ∈ H ∣ 0 ∈ Ax} = A⁻¹(0).

Definition 3.2 (Resolvent and reflection operators). The resolvent of A is the operator J_A := (Id + A)⁻¹. The reflection operator associated to J_A is the operator R_A := 2J_A − Id.

Definition 3.3 (Maximal monotone operator). A is monotone if

    ∀x,y ∈ H, u ∈ Ax and v ∈ Ay  ⇒  ⟨u − v ∣ x − y⟩ ≥ 0 .

It is moreover maximal if its graph is not strictly contained in the graph of any other monotone operator.

Definition 3.4 (Non-expansive and α-averaged operators). A is non-expansive if

    ∀x,y ∈ H, u ∈ Ax and v ∈ Ay  ⇒  ||u − v|| ≤ ||x − y|| .

For α ∈ ]0,1[, A is α-averaged if there exists R non-expansive such that A = (1−α)Id + αR. We denote by A(α) the class of α-averaged operators on H. In particular, A(1/2) is the class of firmly non-expansive operators.

Note that non-expansive operators are necessarily single-valued and 1-Lipschitz continuous, and so are α-averaged operators since they are also non-expansive. The following lemma gives some useful characterizations of firmly non-expansive operators.

Lemma 3.1. Let A : dom(A) = H → H. The following statements are equivalent:
(i) A is firmly non-expansive;
(ii) 2A − Id is non-expansive;
(iii) ∀x,y ∈ H, ||Ax − Ay||² ≤ ⟨Ax − Ay ∣ x − y⟩;
(iv) A is the resolvent of a maximal monotone operator A′, i.e. A = J_{A′}.

Proof. (i) ⇔ (ii): A ∈ A(1/2) ⇔ A = (Id + R)/2 for some R non-expansive. (i) ⇔ (iii): see [72]. (i) ⇔ (iv): see [51].

We now summarize some properties of the subdifferential that will be useful in the sequel.

Lemma 3.2. Let F : H → R be a convex differentiable function with 1/β-Lipschitz continuous gradient, β ∈ ]0,+∞[, and let G : H → ]−∞,+∞] be a function in Γ_0(H). Then,
(i) β∇F ∈ A(1/2), i.e. β∇F is firmly non-expansive;
(ii) ∂G is maximal monotone;
(iii) the resolvent of ∂G is the proximity operator of G, i.e. prox_G = J_{∂G}.

Proof. (i) This is the Baillon-Haddad theorem [3]. (ii) See [62]. (iii) See [54].

We thus consider in the following n maximal monotone operators A_i : H → 2^H indexed by i ∈ ⟦1,n⟧, and a (single-valued) operator B : H → H and β ∈ ]0,+∞[ such that βB ∈ A(1/2). Therefore, solving (3) can be translated into the more general language of maximal monotone operators as solving the monotone inclusion

    Find x ∈ H such that 0 ∈ Bx + ∑_i A_i x ,    (6)

where it is assumed that zer(B + ∑_i A_i) ≠ ∅.
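As a quick numerical illustration of Lemma 3.1 (iii) combined with Lemma 3.2 (iii) (our addition), one can check on random points that a proximity operator — here the projection onto the unit Euclidean ball, i.e. the proximity operator of its indicator function and hence the resolvent of its normal cone — satisfies the firm non-expansiveness inequality:

    import numpy as np

    rng = np.random.default_rng(1)

    def proj_ball(v):
        # Projection onto the closed unit Euclidean ball = prox of its indicator function.
        nv = np.linalg.norm(v)
        return v if nv <= 1.0 else v / nv

    # Lemma 3.1 (iii): ||Ax - Ay||^2 <= <Ax - Ay | x - y> with A = proj_ball.
    for _ in range(1000):
        x, y = rng.standard_normal(10), rng.standard_normal(10)
        px, py = proj_ball(x), proj_ball(y)
        assert np.dot(px - py, px - py) <= np.dot(px - py, x - y) + 1e-12

    print("firm non-expansiveness verified on 1000 random pairs")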
3.2 Product Space

The previous definitions being valid for any real Hilbert space, they also apply to the product space H^n endowed with a scalar product and norm derived from the ones associated to H.

Let (ω_i)_{i∈⟦1,n⟧} ∈ ]0,1[^n be such that ∑_{i=1}^n ω_i = 1. We consider 𝐇 := H^n endowed with the scalar product ⟨⟨⋅ ∣ ⋅⟩⟩ defined as

    ∀x = (x_i)_i , y = (y_i)_i ∈ 𝐇 ,  ⟨⟨x ∣ y⟩⟩ = ∑_{i=1}^n ω_i ⟨x_i ∣ y_i⟩ ,

and with the corresponding norm ||⋅||. S ⊂ 𝐇 denotes the closed convex set defined by S := {x = (x_i)_i ∈ 𝐇 ∣ x_1 = x_2 = ⋯ = x_n}, whose orthogonal complement is the closed linear subspace S⊥ = {x = (x_i)_i ∈ 𝐇 ∣ ∑_i ω_i x_i = 0}. We denote by Id the identity operator on 𝐇, and we define the canonical isometry

    C : H → S , x ↦ (x, ..., x) .

ι_S : 𝐇 → ]−∞,+∞] and N_S : 𝐇 → 2^𝐇 are respectively the indicator function and the normal cone of S, that is

    ι_S(x) := 0 if x ∈ S, +∞ otherwise ;   and   N_S(x) := S⊥ if x ∈ S, ∅ otherwise .

Since S is non-empty, closed and convex, it is straightforward to see that N_S is maximal monotone. To lighten the notation in the sequel, we introduce the following concatenated operators. For every i ∈ ⟦1,n⟧, let A_i and B be as defined in (6). For γ = (γ_i)_{i∈⟦1,n⟧} ∈ ]0,+∞[^n, we define γ𝐀 : 𝐇 → 2^𝐇, x = (x_i)_i ↦ ⨉_{i=1}^n γ_i A_i(x_i), i.e. its graph is

    gra(γ𝐀) := ⨉_{i=1}^n gra(γ_i A_i) = { (x,y) ∈ 𝐇² ∣ x = (x_i)_i , y = (y_i)_i , and ∀i, y_i ∈ γ_i A_i x_i } ,

and 𝐁 : 𝐇 → 𝐇, x = (x_i)_i ↦ (Bx_i)_i. Using the maximal monotonicity of A_1, ..., A_n and B, it is an easy exercise to establish that γ𝐀 and 𝐁 are maximal monotone on 𝐇.

4 Generalized Forward-Backward Algorithm for Monotone Inclusions

Now that we have set up all the necessary material, we are ready to solve the monotone inclusion (6). First, we derive an equivalent fixed point equation satisfied by any solution of (6). From this, we draw an algorithmic scheme and prove its convergence towards a solution, as well as its robustness to errors. Finally, we derive the proof of Theorem 2.1.

4.1 Fixed Point Equation

From now on, we denote the set of fixed points of an operator T : 𝐇 → 𝐇 by Fix T := {z ∈ 𝐇 ∣ Tz = z}.

Proposition 4.1. Let (ω_i)_{i∈⟦1,n⟧} ∈ ]0,1[^n. For any γ > 0, x ∈ H is a solution of (6) if and only if there exists (z_i)_{i∈⟦1,n⟧} ∈ H^n such that

    ∀i, z_i = R_{(γ/ω_i) A_i}( 2x − z_i − γBx ) − γBx ,   and   x = ∑_i ω_i z_i .    (7)

Proof. Set γ > 0; we have the equivalence

    0 ∈ Bx + ∑_i A_i x  ⇔  ∃ (z_i)_i ∈ H^n :  ∀i, ω_i( x − z_i − γBx ) ∈ γA_i x  and  x = ∑_i ω_i z_i .

Now,

    ω_i( x − z_i − γBx ) ∈ γA_i x
        ⇔ ( 2x − z_i − γBx ) − x ∈ (γ/ω_i) A_i x
        ⇔ x = J_{(γ/ω_i) A_i}( 2x − z_i − γBx )                      (by Lemma 3.1 (iv))
        ⇔ 2x − (2x − z_i) = 2 J_{(γ/ω_i) A_i}( 2x − z_i − γBx ) − ( 2x − z_i − γBx ) − γBx
        ⇔ z_i = R_{(γ/ω_i) A_i}( 2x − z_i − γBx ) − γBx .

Before formulating a fixed point equation, consider the following preparatory lemma.

Lemma 4.1. For all z = (z_i)_i ∈ 𝐇, b = (b, ..., b) ∈ S, and γ = (γ_i)_i ∈ ]0,+∞[^n,
(i) J_{N_S} is the orthogonal projector onto S, and J_{N_S} z = C( ∑_i ω_i z_i );
(ii) R_{N_S}( z − b ) = R_{N_S} z − b;
(iii) R_{γ𝐀} z = ( R_{γ_i A_i}(z_i) )_i.

Proof. (i) From Lemma 3.2 (iii), we have for z ∈ 𝐇, J_{N_S}(z) = argmin_{y∈S} ||z − y|| =: proj_S(z). Now, argmin_{y∈S} ||z − y||² = C( argmin_{y∈H} ∑_i ω_i ||z_i − y||² ), where the unique minimizer of ∑_i ω_i ||z_i − y||² is the barycenter of (z_i)_i, i.e. ∑_i ω_i z_i.
(ii) J_{N_S} is obviously linear, and so is R_{N_S}. Since b ∈ S, R_{N_S} b = b and the result follows.
(iii) This is a consequence of the separability of γ𝐀 in terms of the components of z, implying that J_{γ𝐀} z = ( J_{γ_i A_i} z_i )_i. The result follows from the definition of R_{γ𝐀}.

Proposition 4.2. (z_i)_{i∈⟦1,n⟧} ∈ H^n satisfies (7) if and only if z = (z_i)_i is a fixed point of the following operator:

    𝐇 → 𝐇 , z ↦ (1/2)[ R_{γ𝐀} R_{N_S} + Id ][ Id − γ𝐁 J_{N_S} ](z) ,    (8)

with γ_i = γ/ω_i for each i.

Proof. Using Lemma 4.1 in (7), we have C(x) = J_{N_S} z, C(Bx) = 𝐁 J_{N_S}(z) and R_{N_S} − γ𝐁 J_{N_S} = R_{N_S}[ Id − γ𝐁 J_{N_S} ].
Altogether, this yields

    z satisfies (7)  ⇔  z = R_{γ𝐀} R_{N_S}[ Id − γ𝐁 J_{N_S} ] z − γ𝐁 J_{N_S} z
                     ⇔  2z = R_{γ𝐀} R_{N_S}[ Id − γ𝐁 J_{N_S} ] z + [ Id − γ𝐁 J_{N_S} ] z
                     ⇔  z = (1/2)[ R_{γ𝐀} R_{N_S} + Id ][ Id − γ𝐁 J_{N_S} ] z .

4.2 Algorithmic Scheme and Convergence

The expression (8) gives us the operator on which the generalized forward-backward is based. We first study the properties of this operator before establishing the convergence and robustness results for our algorithm, which is derived from the Krasnoselskij-Mann scheme associated to it.

Proposition 4.3. For all γ ∈ ]0,+∞[^n, define

    T_{1,γ} : 𝐇 → 𝐇 , z ↦ (1/2)[ R_{γ𝐀} R_{N_S} + Id ] z .    (9)

Then T_{1,γ} is firmly non-expansive, i.e. T_{1,γ} ∈ A(1/2).

Proof. From Lemma 3.1, R_{γ_i A_i} and R_{N_S} are non-expansive. In view of Lemma 4.1 (iii), R_{γ𝐀} is non-expansive as well. Finally, as a composition of non-expansive operators, R_{γ𝐀} R_{N_S} is also non-expansive, and the proof is complete by the definition of A(1/2).

Proposition 4.4. For all γ ∈ ]0,2β[, define

    T_{2,γ} : 𝐇 → 𝐇 , z ↦ [ Id − γ𝐁 J_{N_S} ] z .    (10)

Then T_{2,γ} ∈ A( γ/(2β) ).

Proof. By hypothesis, βB ∈ A(1/2), and so is β𝐁. Then, we have for any x, y ∈ 𝐇

    ||β𝐁 J_{N_S} x − β𝐁 J_{N_S} y||² ≤ ⟨⟨β𝐁 J_{N_S} x − β𝐁 J_{N_S} y ∣ J_{N_S} x − J_{N_S} y⟩⟩
                                     = ⟨⟨β J_{N_S} 𝐁 J_{N_S} x − β J_{N_S} 𝐁 J_{N_S} y ∣ x − y⟩⟩
                                     = ⟨⟨β𝐁 J_{N_S} x − β𝐁 J_{N_S} y ∣ x − y⟩⟩ ,    (11)

where we derive the first equality from the fact that J_{N_S} is self-adjoint (Lemma 4.1 (i)), and the second equality from the fact that for all x ∈ S, 𝐁x ∈ S and J_{N_S} x = x. From Lemma 3.1 (iii)⇔(i), we establish that β𝐁 J_{N_S} ∈ A(1/2). We conclude using [17, Lemma 2.3].

Proposition 4.5. For all γ ∈ ]0,+∞[^n and γ ∈ ]0,2β[, T_{1,γ} T_{2,γ} ∈ A(α), with α = max( 2/3 , 2/(1 + 2β/γ) ).

Proof. As T_{1,γ} and T_{2,γ} are α-averaged operators by Proposition 4.3 and Proposition 4.4, it follows from [17, Lemma 2.2 (iii)] that their composition is also α-averaged, with the given value of α.
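To indicate how Proposition 4.5 feeds into the convergence analysis (a bridging remark of ours; the formal statements follow), recall that if T is α-averaged with Fix T ≠ ∅, the relaxed fixed-point (Krasnoselskij-Mann) iteration

    z_{t+1} = z_t + λ_t( T z_t − z_t ) ,   with 0 < lim inf_t λ_t ≤ lim sup_t λ_t < 1/α ,

converges weakly to a fixed point (under summable errors). Applied to T = T_{1,γ} T_{2,γ} with α = max( 2/3 , 2/(1 + 2β/γ) ), this gives 1/α = min( 3/2 , (1 + 2β/γ)/2 ), which is precisely the relaxation range I_λ appearing in Theorem 2.1 under (A1).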