IEEE SIGNAL PROCESSING LETTERS (PREPRINT)

Total Variation Denoising via the Moreau Envelope

Ivan Selesnick

Abstract—Total variation denoising is a nonlinear filtering method well suited for the estimation of piecewise-constant signals observed in additive white Gaussian noise. The method is defined by the minimization of a particular non-differentiable convex cost function. This paper describes a generalization of this cost function that can yield more accurate estimation of piecewise-constant signals. The new cost function involves a non-convex penalty (regularizer) designed to maintain the convexity of the cost function. The new penalty is based on the Moreau envelope. The proposed total variation denoising method can be implemented using forward-backward splitting.

The author is with the Tandon School of Engineering, New York University, New York, USA. Email: [email protected]. This work was supported by NSF under grant 1525398 and ONR under grant N00014-15-1-2314.

I. INTRODUCTION

Total variation (TV) denoising is a nonlinear filtering method based on the assumption that the underlying signal is piecewise constant (equivalently, that the derivative of the underlying signal is sparse) [45]. Such signals arise in geoscience, biophysics, and other areas [31]. The TV denoising technique is also used in conjunction with other methods in order to process more general types of signals [20], [23], [24], [26].

Total variation denoising is prototypical of methods based on sparse signal models. It is defined by the minimization of a convex cost function comprising a quadratic data fidelity term and a non-differentiable convex penalty term. The penalty term is the composition of a linear operator and the ℓ1 norm. Although the ℓ1 norm stands out as the convex penalty that most effectively induces sparsity [27], non-convex penalties can lead to more accurate estimation of the underlying signal [37], [38], [40], [44], [50].

A few recent papers consider the prescription of non-convex penalties that maintain the convexity of the TV denoising cost function [1], [30], [32], [48]. (The motivation is to leverage the benefits of both non-convex penalization and convex optimization, e.g., to accurately estimate the amplitudes of jump discontinuities while guaranteeing the uniqueness of the solution.) The penalties considered in these works are separable (additive). But non-separable penalties can outperform separable penalties in this context, because preserving the convexity of the cost function is a severely limiting requirement, and non-separable penalties, being more general, can meet this requirement more successfully [47].

This paper proposes a non-separable non-convex penalty for total variation denoising that generalizes the standard penalty and maintains the convexity of the cost function to be minimized.(1) The new penalty, which is based on the Moreau envelope, can more accurately estimate the amplitudes of jump discontinuities in an underlying piecewise-constant signal.

(1) Software is available at http://eeweb.poly.edu/iselesni/mtvd

A. Relation to Prior Work

Numerous non-convex penalties and algorithms have been proposed to outperform ℓ1-norm regularization for the estimation of sparse signals, e.g., [8], [10], [12], [13], [15], [33], [34], [39], [43], [51], [53]. However, few of these methods maintain the convexity of the cost function. The prescription of non-convex penalties that maintain cost function convexity was pioneered by Blake, Zisserman, and Nikolova [6], [36], [39], [40], and further developed in Refs. [3], [4], [14], [20], [28], [30], [32], [42], [46], [48]. These works rely on the presence of both strongly and weakly convex terms, a property also exploited in [35].

The proposed penalty is expressed as a differentiable convex function subtracted from the standard penalty (i.e., the ℓ1 norm). Previous works also use this idea [41], [42], [47], but the differentiable convex functions used therein are either separable [41], [42] or sums of bivariate functions [47].

In parallel with the submission of this paper, Carlsson has also proposed using Moreau envelopes to prescribe non-trivial convex cost functions [9]. While the approach in [9] starts with a given non-convex cost function (e.g., with the ℓ0 pseudo-norm penalty) and seeks the convex envelope, our approach starts with the ℓ1-norm penalty and seeks a class of convexity-preserving penalties.

Some forms of generalized TV are based on infimal convolution (related to the Moreau envelope) [5], [7], [11], [49]. But those works propose convex penalties suited to signals that are not piecewise constant, whereas we propose non-convex penalties suited to piecewise-constant signals.
II. TOTAL VARIATION DENOISING

Definition 1. Given y ∈ R^N and λ > 0, total variation denoising is defined as

  tvd(y; λ) = arg min_{x ∈ R^N} { (1/2) ||y − x||_2^2 + λ ||Dx||_1 }    (1)
            = prox_{λ||D·||_1}(y)                                        (2)

where D is the (N−1)×N first-order difference matrix

      [ −1   1             ]
  D = [     −1   1         ]    (3)
      [          ⋱    ⋱    ]
      [             −1   1 ]

As indicated in (2), TV denoising is the proximity operator [16] of the function x ↦ λ||Dx||_1. Conveniently, TV denoising can be calculated exactly in finite time [17], [18], [22], [29].
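Definition 1 translates directly into code. Below is a minimal Python sketch (NumPy assumed). The paper computes (1) exactly with Condat's direct algorithm [17]; the projected-gradient iteration on the dual problem used here is a slower stand-in of our own, adequate for illustration only (the function name tvd and the iteration count are our choices, not the paper's).

```python
import numpy as np

def tvd(y, lam, iters=1000):
    # TV denoising (1) via projected gradient on the dual problem:
    #   minimize ||y - D'z||_2^2 over ||z||_inf <= lam, then x = y - D'z.
    # A stand-in for the exact direct algorithm of Condat [17].
    y = np.asarray(y, dtype=float)
    z = np.zeros(y.size - 1)        # dual variable, one entry per difference
    mu = 0.25                       # step size: ||D D'|| < 4, so any mu < 1/2 works
    x = y.copy()
    for _ in range(iters):
        x = y + np.diff(z, prepend=0, append=0)       # x = y - D'z
        z = np.clip(z + mu * np.diff(x), -lam, lam)   # project onto ||z||_inf <= lam
    return x
```

Here np.diff(x) applies D, and the matrix-vector product D'z is expressed as a difference of the zero-padded dual variable, so no matrix is ever formed.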
III. MOREAU ENVELOPE

Before we define the non-differentiable non-convex penalty in Sec. IV, we first define a differentiable convex function. We use the Moreau envelope from convex analysis [2].

Definition 2. Let α ≥ 0. We define S_α : R^N → R as

  S_α(x) = min_{v ∈ R^N} { ||Dv||_1 + (α/2) ||x − v||_2^2 }    (4)

where D is the first-order difference matrix (3).

If α > 0, then S_α is the Moreau envelope of index α^{-1} of the function x ↦ ||Dx||_1.

Proposition 1. The function S_α can be calculated by

  S_0(x) = 0                                                                (5)
  S_α(x) = ||D tvd(x; 1/α)||_1 + (α/2) ||x − tvd(x; 1/α)||_2^2,  α > 0.     (6)

Proof: For α = 0: setting α = 0 and v = 0 in (4) gives (5). For α > 0: by the definition of TV denoising, the v ∈ R^N minimizing the function in (4) is the TV denoising of x, i.e., v_opt = tvd(x; 1/α).

Proposition 2. Let α ≥ 0. The function S_α satisfies

  0 ≤ S_α(x) ≤ ||Dx||_1,  ∀ x ∈ R^N.    (7)

Proof: From (4), we have S_α(x) ≤ ||Dv||_1 + (α/2)||x − v||_2^2 for all v ∈ R^N. In particular, v = x leads to S_α(x) ≤ ||Dx||_1. Also, S_α(x) ≥ 0 since S_α(x) is defined as the minimum of a non-negative function.

Proposition 3. Let α ≥ 0. The function S_α is convex and differentiable.

Proof: It follows from Proposition 12.15 in Ref. [2].

Proposition 4. Let α ≥ 0. The gradient of S_α is given by

  ∇S_0(x) = 0                                   (8)
  ∇S_α(x) = α ( x − tvd(x; 1/α) ),  α > 0       (9)

where tvd denotes total variation denoising (1).

Proof: Since S_α is the Moreau envelope of index α^{-1} of the function x ↦ ||Dx||_1 when α > 0, it follows by Proposition 12.29 in Ref. [2] that

  ∇S_α(x) = α ( x − prox_{(1/α)||D·||_1}(x) ).    (10)

This proximity operator is TV denoising, giving (9).

IV. NON-CONVEX PENALTY

To strongly induce sparsity of Dx, we define a non-convex generalization of the standard TV penalty. The new penalty is defined by subtracting a differentiable convex function from the standard penalty.

Definition 3. Let α ≥ 0. We define the penalty ψ_α : R^N → R as

  ψ_α(x) = ||Dx||_1 − S_α(x)    (11)

where D is the matrix (3) and S_α is defined by (4).

The proposed penalty is upper bounded by the standard TV penalty, which is recovered as a special case.

Proposition 5. Let α ≥ 0. The penalty ψ_α satisfies

  ψ_0(x) = ||Dx||_1,  ∀ x ∈ R^N    (12)

and

  0 ≤ ψ_α(x) ≤ ||Dx||_1,  ∀ x ∈ R^N.    (13)

Proof: It follows from (5) and (7).

When a convex function is subtracted from another convex function [as in (11)], the resulting function may well be negative on part of its domain. Inequality (13) states that the proposed penalty ψ_α avoids this fate, which matters because a penalty function should be non-negative.

Figures in the supplemental material show examples of the proposed penalty ψ_α and the function S_α.

V. ENHANCED TV DENOISING

We now define 'Moreau-enhanced' TV denoising. If α > 0, then the proposed penalty penalizes large-amplitude values of Dx less than the ℓ1 norm does (i.e., ψ_α(x) ≤ ||Dx||_1); hence, it is less likely to underestimate jump discontinuities.

Definition 4. Given y ∈ R^N, λ > 0, and α ≥ 0, we define Moreau-enhanced total variation denoising as

  mtvd(y; λ, α) = arg min_{x ∈ R^N} { (1/2) ||y − x||_2^2 + λ ψ_α(x) }    (14)

where ψ_α is given by (11).

The parameter α controls the non-convexity of the penalty. If α = 0, then the penalty is convex and Moreau-enhanced TV denoising reduces to TV denoising. Greater values of α make the penalty more non-convex. What is the greatest value of α that maintains convexity of the cost function? The critical value is given by Theorem 1.

Theorem 1. Let λ > 0 and α ≥ 0. Define F_α : R^N → R as

  F_α(x) = (1/2) ||y − x||_2^2 + λ ψ_α(x)    (15)

where ψ_α is given by (11). If

  0 ≤ α ≤ 1/λ    (16)

then F_α is convex. If 0 ≤ α < 1/λ, then F_α is strongly convex.

Proof: We write the cost function as

  F_α(x) = (1/2) ||y − x||_2^2 + λ ||Dx||_1 − λ S_α(x)                                              (17)
         = (1/2) ||y − x||_2^2 + λ ||Dx||_1 − λ min_{v ∈ R^N} { ||Dv||_1 + (α/2) ||x − v||_2^2 }    (18)
         = max_{v ∈ R^N} { (1/2) ||y − x||_2^2 + λ ||Dx||_1 − λ ||Dv||_1 − (λα/2) ||x − v||_2^2 }   (19)
         = max_{v ∈ R^N} { (1/2)(1 − λα) ||x||_2^2 + λ ||Dx||_1 + g(x, v) }                         (20)
         = (1/2)(1 − λα) ||x||_2^2 + λ ||Dx||_1 + max_{v ∈ R^N} g(x, v)                             (21)

where g(x, v) is affine in x. The last term is convex as it is the point-wise maximum of a set of convex functions. Hence, F_α is a convex function if 1 − λα ≥ 0. If 1 − λα > 0, then F_α is strongly convex (and strictly convex).
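Propositions 1 and 5 make S_α and ψ_α straightforward to evaluate given any TV denoising routine. A minimal sketch, reusing the tvd stand-in from Sec. II (the names S_alpha and psi_alpha are ours):

```python
import numpy as np

def S_alpha(x, alpha):
    # S_alpha(x) via Proposition 1: S_0(x) = 0, and for alpha > 0,
    #   S_alpha(x) = ||Dv||_1 + (alpha/2)||x - v||_2^2  with  v = tvd(x; 1/alpha).
    if alpha == 0:
        return 0.0
    x = np.asarray(x, dtype=float)
    v = tvd(x, 1.0 / alpha)                  # the minimizer in (4)
    return np.abs(np.diff(v)).sum() + 0.5 * alpha * np.sum((x - v) ** 2)

def psi_alpha(x, alpha):
    # Proposed penalty (11): the standard TV penalty minus its Moreau envelope.
    # By Proposition 5, 0 <= psi_alpha(x) <= ||Dx||_1 for all x.
    return np.abs(np.diff(x)).sum() - S_alpha(x, alpha)
```

The bounds (13) offer a quick sanity check: for random x and any alpha >= 0, psi_alpha(x, alpha) should fall between 0 and np.abs(np.diff(x)).sum().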
VI. ALGORITHM

Proposition 6. Let y ∈ R^N, λ > 0, and 0 < α < 1/λ. Then the sequence x^(k) produced by the iteration

  z^(k)   = y + λα ( x^(k) − tvd(x^(k); 1/α) )    (22a)
  x^(k+1) = tvd(z^(k); λ)                         (22b)

converges to the solution of the Moreau-enhanced TV denoising problem (14).

Proof: If the cost function (15) is strongly convex, then the minimizer can be calculated using the forward-backward splitting (FBS) algorithm [2], [16]. This algorithm minimizes a function of the form

  F(x) = f_1(x) + f_2(x)    (23)

where both f_1 and f_2 are convex and ∇f_1 is additionally Lipschitz continuous. The FBS algorithm is given by

  z^(k)   = x^(k) − μ ∇f_1(x^(k))                              (24a)
  x^(k+1) = arg min_x { (1/2) ||z^(k) − x||_2^2 + μ f_2(x) }   (24b)

where 0 < μ < 2/ρ and ρ is the Lipschitz constant of ∇f_1. The iterates x^(k) converge to a minimizer of F.

To apply the FBS algorithm to the proposed cost function (15), we write it as

  F_α(x) = (1/2) ||y − x||_2^2 + λ ψ_α(x)                (25)
         = (1/2) ||y − x||_2^2 + λ ||Dx||_1 − λ S_α(x)   (26)
         = f_1(x) + f_2(x)                               (27)

where

  f_1(x) = (1/2) ||y − x||_2^2 − λ S_α(x)    (28a)
  f_2(x) = λ ||Dx||_1.                       (28b)

The gradient of f_1 is given by

  ∇f_1(x) = x − y − λ ∇S_α(x)                   (29)
          = x − y − λα ( x − tvd(x; 1/α) )      (30)

using Proposition 4. Subtracting λS_α does not increase the Lipschitz constant of ∇f_1, the value of which is 1. Hence, we may set 0 < μ < 2.

Using (28), the FBS algorithm (24) becomes

  z^(k)   = x^(k) − μ [ x^(k) − y − λα ( x^(k) − tvd(x^(k); 1/α) ) ]    (31a)
  x^(k+1) = arg min_x { (1/2) ||z^(k) − x||_2^2 + μλ ||Dx||_1 }.        (31b)

Note that (31b) is TV denoising (1). Using the value μ = 1 gives iteration (22). (Experimentally, we found this value yields fast convergence.)

Each iteration of (22) entails solving two standard TV denoising problems. In this work, we calculate TV denoising using the fast exact C-language program by Condat [17]. Like the iterative shrinkage/thresholding algorithm (ISTA) [19], [25], algorithm (22) can be accelerated in various ways. We suggest not setting α too close to the critical value 1/λ, because the FBS algorithm generally converges faster when the cost function is more strongly convex (i.e., when λα is well below 1).

In summary, the proposed Moreau-enhanced TV denoising method comprises the following steps (translated into code after Sec. VII):
1) Set the regularization parameter λ (λ > 0).
2) Set the non-convexity parameter α (0 ≤ α < 1/λ).
3) Initialize x^(0) = 0.
4) Run iteration (22) until convergence.

VII. OPTIMALITY CONDITION

To avoid terminating the iterative algorithm too early, it is useful to verify convergence using an optimality condition.

Proposition 7. Let y ∈ R^N, λ > 0, and 0 < α < 1/λ. If x is a solution to (14), then

  [ C ( (x − y)/λ + α ( tvd(x; 1/α) − x ) ) ]_n ∈ sign([Dx]_n)    (32)

for n = 0, ..., N − 2, where C ∈ R^{(N−1)×N} is given by

  C_{n,m} = { 1,  m ≤ n
            { 0,  m > n,     i.e.,  [Cx]_n = Σ_{m ≤ n} x_m    (33)

and sign is the set-valued signum function

             { {−1},     t < 0
  sign(t) =  { [−1, 1],  t = 0    (34)
             { {1},      t > 0.

According to (32), if x ∈ R^N is a minimizer, then the points ([Dx]_n, u_n) ∈ R^2 must lie on the graph of the signum function, where u_n denotes the value on the left-hand side of (32). Hence, the optimality condition can be depicted as a scatter plot. Figures in the supplemental material show how the points in the scatter plot converge to the signum function as the algorithm (22) progresses.

Proof of Proposition 7: A vector x minimizes a convex function F if 0 ∈ ∂F(x), where ∂F(x) is the subdifferential of F at x. The subdifferential of the cost function (15) is given by

  ∂F_α(x) = x − y − λ ∇S_α(x) + ∂( λ ||D·||_1 )(x)    (35)

which can be written as

  ∂F_α(x) = { x − y − λ ∇S_α(x) + λ D^T u : u_n ∈ sign([Dx]_n), u ∈ R^{N−1} }.    (36)

Hence, the condition 0 ∈ ∂F_α(x) can be written as

  (y − x)/λ + ∇S_α(x) ∈ { D^T u : u_n ∈ sign([Dx]_n), u ∈ R^{N−1} }.    (37)

Let C be a matrix of size (N−1)×N such that C D^T = −I, e.g., (33). It follows that the condition 0 ∈ ∂F_α(x) implies

  [ C ( (x − y)/λ − ∇S_α(x) ) ]_n ∈ sign([Dx]_n)    (38)

for n = 0, ..., N − 2. Using Proposition 4 gives (32).
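The four-step method and the check (32) are short to implement. A sketch under the same assumptions as the earlier snippets (the names mtvd and optimality_scatter are ours; the author's own software is at the URL in footnote 1):

```python
import numpy as np

def mtvd(y, lam, alpha, iters=100):
    # Moreau-enhanced TV denoising (14) via the forward-backward iteration (22).
    # Requires 0 < alpha < 1/lam so that the cost function (15) is strongly convex.
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)                                  # step 3: x^(0) = 0
    for _ in range(iters):
        z = y + lam * alpha * (x - tvd(x, 1.0 / alpha))   # (22a)
        x = tvd(z, lam)                                   # (22b)
    return x

def optimality_scatter(x, y, lam, alpha):
    # Points ([Dx]_n, u_n) for the optimality condition (32): at a minimizer
    # they must lie on the graph of the set-valued signum function (34).
    g = (x - y) / lam + alpha * (tvd(x, 1.0 / alpha) - x)
    u = np.cumsum(g)[:-1]        # [Cg]_n = sum_{m <= n} g_m, per (33)
    return np.diff(x), u
```

The running-sum in optimality_scatter is exactly the matrix C of (33) applied to g, since C D^T = −I; plotting the returned pairs against the signum graph reproduces the scatter-plot check described above.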
[Fig. 1. Total variation denoising using three different penalties; the dashed line is the true noise-free signal. Panels: (a) noisy signal (σ = 0.50); (b) standard convex penalty (ℓ1 norm), RMSE = 0.318, MAE = 0.225; (c) scalar MC penalty, RMSE = 0.240, MAE = 0.172; (d) proposed penalty, RMSE = 0.200, MAE = 0.130.]

[Fig. 2. TV denoising using four penalties (ℓ1 norm, scalar log, scalar MC, proposed): average RMSE as a function of the noise standard deviation σ.]

VIII. EXAMPLE

This example applies TV denoising to the noisy piecewise-constant signal shown in Fig. 1(a). This is the 'blocks' signal (length N = 256) generated by the Wavelab [21] function MakeSignal, with additive white Gaussian noise (σ = 0.5). We set the regularization parameter to λ = √N σ / 4, following a discussion in Ref. [22]. For Moreau-enhanced TV denoising, we set the non-convexity parameter to α = 0.7/λ.

Figure 1 shows the result of TV denoising with three different penalties. In each case, a convex cost function is minimized. Figure 1(b) shows the result using standard TV denoising (i.e., using the ℓ1 norm). This denoised signal consistently underestimates the amplitudes of jump discontinuities, especially those occurring near other jump discontinuities of opposite sign. Figure 1(c) shows the result using a separable non-convex penalty [48]. This method can use any non-convex scalar penalty satisfying a prescribed set of properties; here we use the minimax-concave (MC) penalty [3], [52] with the non-convexity parameter set to maintain cost function convexity. This result significantly improves the root-mean-square error (RMSE) and mean absolute error (MAE), but still underestimates the amplitudes of jump discontinuities.

Moreau-enhanced TV denoising, shown in Fig. 1(d), further reduces the RMSE and MAE and more accurately estimates the amplitudes of jump discontinuities. The proposed non-separable non-convex penalty avoids the consistent underestimation of discontinuities seen in Figs. 1(b) and 1(c).

To further compare the denoising capability of the considered penalties, we calculate the average RMSE as a function of the noise level. We let the noise standard deviation span the interval 0.2 ≤ σ ≤ 1.0; for each σ value, we calculate the average RMSE over 100 noise realizations. Figure 2 shows that the proposed penalty yields the lowest average RMSE for all σ ≥ 0.4. However, at low noise levels, the separable convexity-preserving penalties [48] perform better than the proposed non-separable convexity-preserving penalty. (A small end-to-end sketch of this experiment follows.)
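An end-to-end version of this experiment might look as follows. Wavelab's MakeSignal is MATLAB code, so the piecewise-constant test signal below, with its made-up jump locations and amplitudes, is a hypothetical stand-in for the 'blocks' signal, not the signal itself; λ and α follow the settings given above, and tvd and mtvd are the sketches from Secs. II and VI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical piecewise-constant test signal (a stand-in for Wavelab's 'blocks').
N = 256
s = np.zeros(N)
for pos, amp in [(40, 4.0), (80, -3.0), (120, 2.5), (160, -2.0), (200, 1.5)]:
    s[pos:] += amp                              # step of height amp at index pos

sigma = 0.5
y = s + sigma * rng.standard_normal(N)          # additive white Gaussian noise

lam = np.sqrt(N) * sigma / 4                    # regularization parameter (Ref. [22])
alpha = 0.7 / lam                               # non-convexity parameter

rmse = lambda x: np.sqrt(np.mean((x - s) ** 2))
print("TV   RMSE:", round(rmse(tvd(y, lam)), 3))
print("MTVD RMSE:", round(rmse(mtvd(y, lam, alpha)), 3))
```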
IX. CONCLUSION

This paper demonstrates the use of the Moreau envelope to define a non-separable non-convex TV denoising penalty that maintains the convexity of the TV denoising cost function. The basic idea is to subtract from a convex penalty its own Moreau envelope. This idea should also be useful for other problems, e.g., analysis tight-frame denoising [41].

Separable convexity-preserving penalties [48] outperformed the proposed one at low noise levels in the example. It is yet to be determined whether a more general class of convexity-preserving penalties can outperform both across all noise levels.

REFERENCES

[1] F. Astrom and C. Schnorr. On coupled regularization for non-convex variational image enhancement. In IAPR Asian Conf. on Pattern Recognition (ACPR), pages 786–790, November 2015.
[2] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011.
[3] İ. Bayram. Penalty functions derived from monotone mappings. IEEE Signal Processing Letters, 22(3):265–269, March 2015.
[4] I. Bayram. On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty. IEEE Trans. Signal Process., 64(6):1597–1608, March 2016.
[5] S. Becker and P. L. Combettes. An algorithm for splitting parallel sums of linearly composed monotone operators, with applications to signal recovery. J. Nonlinear and Convex Analysis, 15(1):137–159, 2014.
[6] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 1987.
[7] M. Burger, K. Papafitsoros, E. Papoutsellis, and C.-B. Schönlieb. Infimal convolution regularisation functionals of BV and Lp spaces. J. Math. Imaging and Vision, 55(3):343–369, 2016.
[8] E. J. Candès, M. B. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., 14(5):877–905, December 2008.
[9] M. Carlsson. On convexification/optimization of functionals including an l2-misfit term. https://arxiv.org/abs/1609.09378, September 2016.
[10] M. Castella and J.-C. Pesquet. Optimization of a Geman-McClure like criterion for sparse signal deconvolution. In IEEE Int. Workshop Comput. Adv. Multi-Sensor Adaptive Proc., pages 309–312, December 2015.
[11] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76:167–188, 1997.
[12] R. Chartrand. Shrinkage mappings and their induced penalty functions. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages 1026–1029, May 2014.
[13] L. Chen and Y. Gu. The convergence guarantees of a non-convex approach for sparse recovery. IEEE Trans. Signal Process., 62(15):3754–3767, August 2014.
[14] P.-Y. Chen and I. W. Selesnick. Group-sparse signal denoising: Non-convex regularization, convex optimization. IEEE Trans. Signal Process., 62(13):3464–3478, July 2014.
[15] E. Chouzenoux, A. Jezierska, J. Pesquet, and H. Talbot. A majorize-minimize subspace approach for ℓ2-ℓ0 image regularization. SIAM J. Imag. Sci., 6(1):563–591, 2013.
[16] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In H. H. Bauschke et al., editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185–212. Springer-Verlag, 2011.
[17] L. Condat. A direct algorithm for 1-D total variation denoising. IEEE Signal Processing Letters, 20(11):1054–1057, November 2013.
[18] J. Darbon and M. Sigelle. Image restoration with discrete constrained total variation Part I: Fast and exact optimization. J. Math. Imaging and Vision, 26(3):261–276, 2006.
[19] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math., 57(11):1413–1457, 2004.
[20] Y. Ding and I. W. Selesnick. Artifact-free wavelet denoising: Non-convex sparse regularization, convex optimization. IEEE Signal Processing Letters, 22(9):1364–1368, September 2015.
[21] D. Donoho, A. Maleki, and M. Shahram. Wavelab 850, 2005. http://www-stat.stanford.edu/%7Ewavelab/.
[22] L. Dümbgen and A. Kovac. Extensions of smoothing via taut strings. Electron. J. Statist., 3:41–75, 2009.
[23] S. Durand and J. Froment. Reconstruction of wavelet coefficients using total variation minimization. SIAM J. Sci. Comput., 24(5):1754–1767, 2003.
[24] G. R. Easley, D. Labate, and F. Colonna. Shearlet-based total variation diffusion for denoising. IEEE Trans. Image Process., 18(2):260–268, February 2009.
[25] M. Figueiredo and R. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process., 12(8):906–916, August 2003.
[26] A. Gholami and S. M. Hosseini. A balanced combination of Tikhonov and total variation regularizations for reconstruction of piecewise-smooth signals. Signal Processing, 93(7):1945–1960, 2013.
[27] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, 2015.
[28] W. He, Y. Ding, Y. Zi, and I. W. Selesnick. Sparsity-based algorithm for detecting faults in rotating machines. Mechanical Systems and Signal Processing, 72-73:46–64, May 2016.
[29] N. A. Johnson. A dynamic programming algorithm for the fused lasso and L0-segmentation. J. Computat. Graph. Stat., 22(2):246–260, 2013.
[30] A. Lanza, S. Morigi, and F. Sgallari. Convex image denoising via non-convex regularization with parameter selection. J. Math. Imaging and Vision, pages 1–26, 2016.
[31] M. A. Little and N. S. Jones. Generalized methods and solvers for noise removal from piecewise constant signals: Part I – background theory. Proc. R. Soc. A, 467:3088–3114, 2011.
[32] M. Malek-Mohammadi, C. R. Rojas, and B. Wahlberg. A class of nonconvex penalties preserving overall convexity in optimization-based mean filtering. IEEE Trans. Signal Process., 64(24):6650–6664, December 2016.
[33] Y. Marnissi, A. Benazza-Benyahia, E. Chouzenoux, and J.-C. Pesquet. Generalized multivariate exponential power prior for wavelet-based multichannel image restoration. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages 2402–2406, September 2013.
[34] H. Mohimani, M. Babaie-Zadeh, and C. Jutten. A fast approach for overcomplete sparse decomposition based on smoothed l0 norm. IEEE Trans. Signal Process., 57(1):289–301, January 2009.
[35] T. Möllenhoff, E. Strekalovskiy, M. Moeller, and D. Cremers. The primal-dual hybrid gradient method for semiconvex splittings. SIAM J. Imag. Sci., 8(2):827–857, 2015.
[36] M. Nikolova. Estimation of binary images by minimizing convex criteria. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages 108–112, vol. 2, 1998.
[37] M. Nikolova. Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math., 61(2):633–658, 2000.
[38] M. Nikolova. Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. Multiscale Model. Simul., 4(3):960–991, 2005.
[39] M. Nikolova. Energy minimization methods. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging, chapter 5, pages 138–186. Springer, 2011.
[40] M. Nikolova, M. K. Ng, and C.-P. Tam. Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Trans. Image Process., 19(12):3073–3088, December 2010.
[41] A. Parekh and I. W. Selesnick. Convex denoising using non-convex tight frame regularization. IEEE Signal Processing Letters, 22(10):1786–1790, October 2015.
[42] A. Parekh and I. W. Selesnick. Enhanced low-rank matrix approximation. IEEE Signal Processing Letters, 23(4):493–497, April 2016.
[43] J. Portilla and L. Mancera. L0-based sparse approximation: two alternative methods and some applications. In Proceedings of SPIE, volume 6701 (Wavelets XII), San Diego, CA, USA, 2007.
[44] P. Rodriguez and B. Wohlberg. Efficient minimization method for a generalized total variation functional. IEEE Trans. Image Process., 18(2):322–332, February 2009.
[45] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[46] I. W. Selesnick and I. Bayram. Sparse signal estimation by maximally sparse convex optimization. IEEE Trans. Signal Process., 62(5):1078–1092, March 2014.
[47] I. W. Selesnick and I. Bayram. Enhanced sparsity by non-separable regularization. IEEE Trans. Signal Process., 64(9):2298–2313, May 2016.
[48] I. W. Selesnick, A. Parekh, and I. Bayram. Convex 1-D total variation denoising with non-convex regularization. IEEE Signal Processing Letters, 22(2):141–144, February 2015.
[49] S. Setzer, G. Steidl, and T. Teuber. Infimal convolution regularizations with discrete l1-type functionals. Commun. Math. Sci., 9(3):797–827, 2011.
[50] M. Storath, A. Weinmann, and L. Demaret. Jump-sparse and sparse recovery using Potts functionals. IEEE Trans. Signal Process., 62(14):3654–3666, July 2014.
[51] D. P. Wipf, B. D. Rao, and S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Inform. Theory, 57(9):6236–6255, September 2011.
[52] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, pages 894–942, 2010.
[53] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist., 36(4):1509–1533, 2008.

X. SUPPLEMENTAL FIGURES

To gain intuition about the proposed penalty function and how it induces sparsity of Dx while maintaining convexity of the cost function, a few illustrations are useful. Figure 3 illustrates the proposed penalty, its sparsity-inducing behavior, and its relationship to the differentiable convex function S_α. Figure 4 illustrates how the proposed penalty is able to maintain the convexity of the cost function.

Figure 3 shows the proposed penalty ψ_α defined in (11) for α = 1 and N = 2. It can be seen that the penalty approximates the standard TV penalty ||D·||_1 for signals x for which Dx is approximately zero, but it increases more slowly than the standard TV penalty as ||Dx|| → ∞. In that sense, it penalizes large values less than the standard TV penalty does.

As shown in Fig. 3, the proposed penalty is expressed as the standard TV penalty minus the differentiable convex non-negative function S_α. Since S_α is flat around the null space of D, the penalty ψ_α approximates the standard TV penalty around the null space of D. In addition, since S_α is non-negative, the penalty ψ_α lies below the standard TV penalty.

Figure 4 shows the differentiable part of the cost function F_α for α = 1, λ = 1, and N = 2. The differentiable part is given by f_1 in (28a). The total cost function is obtained by adding the standard TV penalty to f_1; see (23). Hence, F_α is convex if the differentiable part f_1 is convex. As can be seen in Fig. 4, the function f_1 is convex. We note that the function f_1 in this figure is not strongly convex; this is because we have used α = 1/λ. If α < 1/λ, then the function f_1 will be strongly convex (and hence F_α will also be strongly convex and have a unique minimizer). We recommend α < 1/λ.

Figure 5 shows the differentiable part f_1 of the cost function F_α for α = 2, λ = 1, and N = 2. Here, the function f_1 is non-convex because α > 1/λ, which violates the convexity condition.

In order to simplify the illustration, we have set y = 0 in Fig. 4. In practice y ≠ 0, but the only difference between the cases y = 0 and y ≠ 0 is an additive affine function, which does not alter the convexity properties of the function.

In practice we are interested in the case N ≫ 2, i.e., signals much longer than two samples. However, in order to illustrate the functions, we are limited to the case of N = 2. We note that the case of N = 2 does not fully illustrate the behavior of the proposed penalty.
In particular, when N = 2 the penalty is simply a linear transformation of a scalar function, which does not convey the non-separable behavior of the penalty for N > 2.

A separate document contains additional supplemental figures illustrating the convergence of the iterative algorithm (22). These figures show the optimality condition as a scatter plot; the points in the scatter plot converge to the signum function as the algorithm converges.

[Fig. 3. Penalty ψ_α with α = 1 for N = 2. Surface plots of ||Dx||_1, S_α(x), and ψ_α(x) = ||Dx||_1 − S_α(x).]

[Fig. 4. Differentiable part f_1 of the cost function with λ = 1, α = 1, for N = 2. Surface plots of (1/2)||x||_2^2, S_α(x), and f_1(x) = (1/2)||x||_2^2 − λ S_α(x). The function is convex as α ≤ 1/λ.]

[Fig. 5. Differentiable part f_1 of the cost function with λ = 1, α = 2, for N = 2. Surface plots of (1/2)||x||_2^2, S_α(x), and f_1(x) = (1/2)||x||_2^2 − λ S_α(x). The function is non-convex as α > 1/λ.]
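As a numerical complement to Figs. 4 and 5, the following sketch (reusing the S_alpha stand-in defined earlier) samples the differentiable part f_1(x) = (1/2)||x||_2^2 − λ S_α(x), with y = 0 and N = 2, along the line x = (t, −t), the direction in which S_α curves most. Second differences of a convex function are non-negative, so we would expect them to be roughly zero for α = 1/λ and clearly negative for α > 1/λ, up to the accuracy of the iterative tvd stand-in; the grid and the test direction are our choices.

```python
import numpy as np

lam = 1.0
t = np.linspace(-2.0, 2.0, 81)
for alpha in (1.0, 2.0):            # alpha = 1/lam (convex) vs. alpha > 1/lam (not)
    # On the line x = (s, -s): the quadratic term 0.5*||x||_2^2 equals s^2.
    f1 = np.array([s**2 - lam * S_alpha(np.array([s, -s]), alpha) for s in t])
    d2 = np.diff(f1, n=2)           # discrete second differences along the line
    print(f"alpha = {alpha}: min second difference = {d2.min():.4f}")
```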