International Journal of Computer Vision manuscript No. (will be inserted by the editor)

Linearized Alternating Direction Method with Adaptive Penalty and Warm Starts for Fast Solving Transform Invariant Low-Rank Textures

Xiang Ren · Zhouchen Lin

Received: date / Accepted: date

Abstract Transform Invariant Low-rank Textures (TILT) is a novel and powerful tool that can effectively rectify a rich class of low-rank textures in 3D scenes from 2D images despite significant deformation and corruption. The existing algorithm for solving TILT is based on the alternating direction method (ADM). It suffers from high computational cost and is not theoretically guaranteed to converge to a correct solution of the inner loop. In this paper, we propose a novel algorithm to speed up solving TILT, with guaranteed convergence for the inner loop. Our method is based on the recently proposed linearized alternating direction method with adaptive penalty (LADMAP). To further reduce computation, warm starts are also introduced to initialize the variables better and to cut the cost of singular value decomposition. Extensive experimental results on both synthetic and real data demonstrate that this new algorithm works much more efficiently and robustly than the existing algorithm. It can be at least five times faster than the previous method.

Keywords Transform Invariant Low-Rank Textures · Linearized Alternating Direction Method with Adaptive Penalty · Warm Start · Singular Value Decomposition

Xiang Ren
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana 61801, IL, USA
E-mail: [email protected]

Z. Lin (corresponding author)
Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing 100871, P.R. China
E-mail: [email protected]

1 Introduction

Extracting invariants in images is a fundamental problem in computer vision. It is key to many vision tasks, such as 3D reconstruction, object recognition, and scene understanding. Most of the existing methods start from low-level local features, such as SIFT points (Schindler et al, 2008), corners, edges, and small windows, which are inaccurate and not robust. Recently, Zhang et al. (Zhang et al, 2012b) proposed a holistic method called Transform Invariant Low-rank Textures (TILT) that can recover the deformation of a relatively large image patch so that the underlying textures become regular (see Figures 1(a) and (b)). This method utilizes the global structure of the image patch, e.g., its regularity, such as symmetry, rectilinearity and repetition, which can be measured by low-rankness, rather than low-level local features; hence it can be very robust to significant deformation and corruption.

TILT has been applied to many vision tasks and has been extended for many computer vision applications. For example, Zhang et al. (Zhang et al, 2011b) used TILT to correct lens distortion and perform camera calibration, without detecting feature points and straight lines. Zhang et al. (Zhang et al, 2012a) applied TILT to rectify texts in natural scenes to improve text recognition on mobile phones. Zhao et al. (Zhao and Quan, 2011) proposed a method for detecting translational symmetry using TILT. Zhang et al. (Zhang et al, 2011a) further generalized the transforms allowed by TILT to polynomially parameterized ones so that the shape and pose of low-rank textures on generalized cylindrical surfaces can be recovered. Mobahi et al. (Mobahi et al, 2011) used the low-rank textures recovered by TILT
as the building block for modeling urban scenes and reconstructing the 3D structure.

Fig. 1 TILT for rectifying textures. Left column: the original image patches, specified by red rectangles, and the transforms found by TILT, indicated by green quadrilaterals. Right column: rectified textures using the transforms found. Top row: results by our method for solving TILT. Bottom row: results by the original method in (Zhang et al, 2012b) for solving TILT. The original method does not converge to a correct solution. (Images in this paper are best viewed on screen!)

TILT was inspired by Robust Alignment by Sparse and Low-rank decomposition (RASL) (Peng et al, 2010), which has been successfully applied to align faces and video frames. TILT and RASL share the same mathematical model, namely that after an appropriate transform a data matrix can be decomposed into a low-rank component and a sparse one (see (1)). The only difference between TILT and RASL is that the data matrix in TILT is an image patch, while that in RASL consists of multiple images or frames, each being a column of the matrix. So in the sequel we focus on TILT.
The most efficient existing solver for TILT is based on the alternating direction method (ADM) (Zhang et al, 2012b; Lin et al, 2009). It has been adopted by all the researchers who use TILT for their problems. However, it still requires multiple seconds to rectify a small-sized image patch, making the applications of TILT far from interactive. Another critical drawback of the existing solver is that it was derived out of intuition by simply mimicking the ADM for the robust principal component analysis (RPCA) problem presented in (Lin et al, 2009). In the literature on ADM in the Gauss-Seidel fashion, almost all convergence results were proven for the case of two variables, while the inner loop of the TILT problem (see (3)) has three variables. Hence the convergence of ADM for the inner loop of TILT is not theoretically guaranteed. In our experiments, we did find many examples where the existing solver for TILT failed to produce a correct solution (e.g., see Figures 1(c), 1(d), 2, 3 and 4). Consequently, to make the algorithm workable, its parameters have to be tuned carefully in order to best trade off between convergence speed and robustness (cf. the last paragraph of Section 2.2), which is difficult across different applications. These drawbacks of the existing algorithm motivated us to design a more efficient algorithm for TILT, with a theoretical guarantee on the convergence of its inner loop.

We observe that the original method does not always converge to a correct solution because the inner loop of the original TILT problem has three variables. This motivates us to cancel one of the variables and solve an equivalent TILT problem that has only two variables in its inner loop. Then, using the recently developed linearized alternating direction method with adaptive penalty (LADMAP) (Lin et al, 2011) and some good properties of the annihilation matrix, the inner loop can be solved efficiently and with a convergence guarantee. We further speed up the computation by warm starts in two ways. First, we initialize the variables in the inner loop with their values upon exiting the previous inner loop. This gives the variables very good initial values, so the number of iterations in the inner loop can be greatly cut. Second, as singular value decomposition (SVD) is the major bottleneck of the computation, we also compute the SVD with a warm start, where the singular vectors and singular values are initialized as those of the previous iteration. The SVD update is based on a recently developed technique for optimization with orthogonality constraints (Wen and Yin, 2012). As a result, our new algorithm, called LADMAP with warm starts (LADMAP+WS), can be at least five times faster than the original method and has a convergence guarantee for the inner loop.

The rest of this paper is organized as follows. In Section 2 we review the TILT problem and the ADM method for solving it. In Section 3 we introduce the LADMAP method for solving the inner loop of TILT and the variable warm start technique. In Section 4 we present some important details of applying LADMAP to TILT so that the computations can be made efficient. In Section 5 we present the warm start technique for computing the SVD. Experimental results on both simulated and real data are reported in Section 6. Finally, we conclude our paper in Section 7.

2 Transform Invariant Low-rank Textures

In this section, we first briefly review the mathematical model of TILT. Then we introduce its corresponding convex surrogate and the existing ADM based method for solving it.

2.1 Mathematical Model

Consider a 2D texture as a matrix A ∈ R^{m×n}. It is called a low-rank texture if r ≪ min(m,n), where r = rank(A). However, a real texture image is hardly an ideal low-rank texture, mainly due to two factors: 1. it undergoes a deformation, e.g., a perspective transform from the 3D scene to the 2D image; 2. it may be subject to many types of corruption, such as noise and occlusion. So if we could rectify a deformed texture D with a proper inverse transform τ and then remove the corruptions E, the resulting texture A would be low-rank. This inspires the following mathematical model for TILT (Zhang et al, 2012b):

  min_{A,E,τ} rank(A) + λ‖E‖_0,  s.t.  D∘τ = A + E,   (1)

where τ: R² → R² belongs to a certain group of transforms, e.g., affine transforms, perspective transforms, and generalized cylindrical transforms (Zhang et al, 2011a), ‖E‖_0 denotes the number of non-zero entries of E, and λ > 0 is a weighting parameter that trades off the rank of the underlying texture against the sparsity of the corruption.
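To make the notion of a low-rank texture concrete, the following NumPy sketch (our own illustration, not part of the authors' MATLAB implementation) builds an ideal checkerboard, whose rank is exactly 2, and shows that even a mild rotation — one instance of the deformations τ that TILT inverts — sharply raises the numerical rank:

```python
import numpy as np
from scipy.ndimage import rotate

# Ideal checkerboard: every column lies in the span of two vectors,
# so the matrix has rank 2.
i, j = np.indices((64, 64))
A = (((i // 8) + (j // 8)) % 2).astype(float)   # 8-pixel checker cells
print(np.linalg.matrix_rank(A))                 # -> 2

# A mild deformation destroys the low-rank structure.
D = rotate(A, angle=10.0, reshape=False)        # 10-degree in-plane rotation
print(np.linalg.matrix_rank(D, tol=1e-3))       # -> far larger than 2
```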
2.2 The Existing Method for Solving TILT

The above problem (1) is not directly tractable, because the rank of a matrix and the ℓ_0-norm are discrete and nonconvex; solving the optimization problem is thus NP-hard. As a common practice, the rank and the ℓ_0-norm can be replaced by the nuclear norm (Candès et al, 2011) (the sum of the singular values) and the ℓ_1-norm (Donoho, 2004) (the sum of the absolute values of the entries), respectively. This yields the following optimization problem with a convex objective function (Zhang et al, 2012b):

  min_{A,E,τ} ‖A‖_* + λ‖E‖_1,  s.t.  D∘τ = A + E.   (2)

The above problem is still not a convex program, as the constraint is nonconvex. So Zhang et al. (Zhang et al, 2012b) proposed to linearize D∘τ at the previous τ^i as D∘(τ^i + ∆τ) ≈ D∘τ^i + J∆τ and determine the increment ∆τ of the transform by solving

  min_{A,E,∆τ} ‖A‖_* + λ‖E‖_1,  s.t.  D∘τ^i + J∆τ = A + E,   (3)

where J is the Jacobian: the derivative of the image with respect to the transform parameters. J has full column rank. We call the iterative procedure that solves (3) the inner loop for solving TILT. When the increment ∆τ is computed, the transform is updated as

  τ^{i+1} = τ^i + ∆τ.

In the following, for brevity, when we focus on the inner loop we may omit the superscripts i+1 or i of the variables. This should not cause ambiguity, as the meaning is clear from the context.

Zhang et al. (Zhang et al, 2012b) proposed an ADM based method to solve (3). It minimizes the augmented Lagrangian function of problem (3),

  L(A,E,∆τ,Y,µ) = ‖A‖_* + λ‖E‖_1 + ⟨Y, D∘τ + J∆τ − A − E⟩ + (µ/2)‖D∘τ + J∆τ − A − E‖²_F,

with respect to the unknown variables alternately (hence the name ADM), where Y is the Lagrange multiplier, ⟨A,B⟩ ≡ tr(AᵀB) is the inner product, ‖·‖_F is the Frobenius norm, and µ > 0 is the penalty parameter. The ADM in (Zhang et al, 2012b) goes as follows:

  A_{k+1} = argmin_A L(A, E_k, ∆τ_k, Y_k, µ_k),   (4)
  E_{k+1} = argmin_E L(A_{k+1}, E, ∆τ_k, Y_k, µ_k),   (5)
  ∆τ_{k+1} = argmin_{∆τ} L(A_{k+1}, E_{k+1}, ∆τ, Y_k, µ_k),   (6)
  Y_{k+1} = Y_k + µ_k (D∘τ + J∆τ_{k+1} − A_{k+1} − E_{k+1}),
  µ_{k+1} = ρ µ_k,

where ρ > 1 is a constant. All subproblems (4)-(6) have closed-form solutions (Zhang et al, 2012b) (cf. (24) and (26)). More complete details of the algorithm can be found as Algorithm 1 in (Zhang et al, 2012b).

Although the above ADM in the Gauss-Seidel fashion empirically works well in most cases, there is no theoretical guarantee on its convergence. Two factors may result in its non-convergence: all previous convergence results for ADM in the Gauss-Seidel fashion were proven under the conditions that the number of unknown variables is only two¹ and that the penalty parameter is upper bounded. In contrast, the inner loop of TILT has three variables and its penalty parameter is not upper bounded.

¹ For the case of more than two variables, ADM with some appropriate modifications, such as introducing a dummy variable and treating all the original variables as a super block (Bertsekas and Tsitsiklis, 1997) or introducing a Gaussian back substitution (He et al, 2012), can be proven to converge.

As one will see, the naive ADM algorithm above may not produce a correct solution. In particular, the choice of ρ critically influences the number of iterations of the inner loop and the quality of the solution. If ρ is small, there will be many iterations but the solution is very likely to be correct. If ρ is large, the number of iterations is small but an incorrect solution may be produced. So one has to tune ρ very carefully so that the number of iterations and the quality of the solution can be best traded off. This is difficult when the textures differ. So in this paper we aim at addressing both the computation speed and the convergence issue of the inner loop of the original method.
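For concreteness, the following NumPy sketch (our own illustration) performs one Gauss-Seidel pass of (4)-(6) together with the multiplier and penalty updates. The shrinkage operators anticipate their definitions in (24)-(26), and D∘τ is vectorized so that the mn×p Jacobian applies to it directly:

```python
import numpy as np

def svt(W, eps):                                   # S_eps: see (24)-(25) below
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt

def soft(W, eps):                                  # T_eps: see (26) below
    return np.sign(W) * np.maximum(np.abs(W) - eps, 0.0)

def adm_step(Dtau, J, A, E, dtau, Y, mu, lam, rho):
    """One Gauss-Seidel pass of (4)-(6) plus the Y and mu updates.
    Dtau: the m-by-n patch D∘τ; J: mn-by-p Jacobian; dtau: p-vector ∆τ."""
    m, n = Dtau.shape
    R = Dtau + (J @ dtau).reshape(m, n)            # linearized constraint term
    A = svt(R - E + Y / mu, 1.0 / mu)              # (4)
    E = soft(R - A + Y / mu, lam / mu)             # (5)
    rhs = (A + E - Dtau - Y / mu).reshape(-1)      # (6): least squares in ∆τ
    dtau = np.linalg.solve(J.T @ J, J.T @ rhs)
    R = Dtau + (J @ dtau).reshape(m, n)
    Y = Y + mu * (R - A - E)
    return A, E, dtau, Y, rho * mu
```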
3 Solving the Inner Loop by LADMAP

In this section, we show how to apply the recently developed LADMAP method to solve the inner loop of TILT. We first reformulate (3) so that ∆τ is canceled and hence LADMAP can be applied.

3.1 Sketch of LADMAP

LADMAP (Lin et al, 2011) aims at solving the following type of convex programs:

  min_{x,y} f(x) + g(y),  s.t.  A(x) + B(y) = c,   (7)

where x, y and c can be either vectors or matrices, f and g are convex functions, and A and B are linear mappings.

If the original ADM is applied to (7), x and y are updated by minimizing the augmented Lagrangian function of (7), resulting in the following updating scheme:

  x_{k+1} = argmin_x f(x) + (µ_k/2)‖A(x) + B(y_k) − c + γ_k/µ_k‖²,   (8)
  y_{k+1} = argmin_y g(y) + (µ_k/2)‖B(y) + A(x_{k+1}) − c + γ_k/µ_k‖²,   (9)
  γ_{k+1} = γ_k + µ_k [A(x_{k+1}) + B(y_{k+1}) − c],   (10)

where γ is the Lagrange multiplier and µ is the penalty parameter. In many problems, f and g are vector or matrix norms, and the subproblems (8) and (9) have closed-form solutions when A and B are identity mappings (Cai et al, 2010; Donoho, 1995; Liu et al, 2010; Yang and Zhang, 2011), hence are easily solvable. However, when A and B are not identity mappings, subproblems (8) and (9) may not be easy to solve. So Lin et al. (Lin et al, 2011) proposed linearizing the quadratic penalty term of the augmented Lagrangian function and adding a proximal term in (8) and (9) for updating x and y, resulting in the following updating scheme:

  x_{k+1} = argmin_x f(x) + (µ_k η_A/2)‖x − x_k + A*(γ_k + µ_k(A(x_k) + B(y_k) − c))/(µ_k η_A)‖²,   (11)
  y_{k+1} = argmin_y g(y) + (µ_k η_B/2)‖y − y_k + B*(γ_k + µ_k(A(x_{k+1}) + B(y_k) − c))/(µ_k η_B)‖²,   (12)

and γ is still updated as in (10), where η_A > 0 and η_B > 0 are some parameters and A* and B* are the adjoint operators of A and B. One can then see that the new subproblems (11) and (12) again have closed-form solutions when f and g are norms. Lin et al. also proposed a strategy to adaptively update the penalty parameter µ² and proved that when µ is non-decreasing and upper bounded, and η_A > ‖A‖² and η_B > ‖B‖², (x_k, y_k) converges to an optimal solution of (7), where ‖A‖ and ‖B‖ are the operator norms of A and B, respectively. For more details, please refer to (Lin et al, 2011).

² Please refer to (20) and (23). For succinctness, we do not repeat it here. Disregarding the adaptive penalty, LADMAP is a special case of the generalized ADM (Deng and Yin, 2012) and is closely related to the predictor corrector proximal multiplier method (Chen and Teboulle, 1994), which updates the variables in parallel rather than alternately.
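To see what the linearization buys, the sketch below (our own illustration, with f = ‖·‖_* and the linear map A represented by a matrix acting on the vectorized x) implements the x-update (11): the subproblem collapses to a single singular value shrinkage of an explicitly computable matrix, which is exactly why LADMAP remains cheap when A is not the identity:

```python
import numpy as np

def svt(W, eps):
    """Singular value shrinkage: the proximal mapping of eps * nuclear norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt

def ladmap_x_update(x, y_resid, gamma, Amap, mu, eta_A):
    """The x-update (11) for f = nuclear norm.
    Amap:    (q, m*n) matrix acting on vec(x); eta_A must exceed ‖Amap‖²;
    y_resid: A(x_k) + B(y_k) - c as a q-vector; gamma: the multiplier."""
    g = Amap.T @ (gamma + mu * y_resid)        # A*(γ_k + µ_k(A(x_k)+B(y_k)-c))
    target = x - g.reshape(x.shape) / (mu * eta_A)
    return svt(target, 1.0 / (mu * eta_A))     # one shrinkage, no inner solver
```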
3.2 Reformulating the Inner Loop of TILT

As almost all existing convergence results for ADM or linearized ADM in the Gauss-Seidel fashion are proven for the case of two variables, while the inner loop of TILT has three variables, we aim at canceling one variable by taking advantage of the special structure of the problem.

We notice that ∆τ does not appear in the objective function of (3), and it is easy to see that if (A*, E*) is an optimal solution, then the optimal ∆τ must be

  ∆τ* = (JᵀJ)^{-1}Jᵀ(A* + E* − D∘τ),   (13)

as (A*, E*, ∆τ*) must satisfy the linear constraint in (3). So we have to find an optimization problem whose solution is (A*, E*).

By plugging (13) into the linear constraint in (3), one can see that (A*, E*) must satisfy the following constraint:

  J⊥A + J⊥E = J⊥(D∘τ),   (14)

where J⊥ = I − J(JᵀJ)^{-1}Jᵀ and I is the identity matrix. So we have an optimization problem for (A*, E*):

  min_{A,E} ‖A‖_* + λ‖E‖_1,  s.t.  J⊥A + J⊥E = J⊥(D∘τ).   (15)

We can prove the following proposition.

Proposition 1 Problems (3) and (15) have the same optimal solution (A*, E*).

Proof. We only have to prove that the constraints for (A,E) do not change. The constraints for (A,E) in (3) and (15) can be written as

  S_1 = {(A,E) | ∃∆τ such that D∘τ + J∆τ = A + E}

and

  S_2 = {(A,E) | J⊥A + J⊥E = J⊥(D∘τ)},

respectively. If (A,E) ∈ S_1, then we can multiply both sides of D∘τ + J∆τ = A + E by J⊥ to see that (A,E) ∈ S_2. If (A,E) ∈ S_2, then D∘τ − A − E ∈ null(J⊥). Since null(J⊥) = span(J), there exists ∆τ such that D∘τ − A − E = J∆τ. Thus (A,E) ∈ S_1. □

One can easily check that J⊥ has the following nice properties:

  (J⊥)ᵀ = J⊥ and J⊥J⊥ = J⊥.   (16)

Moreover, denote the operator norm of J⊥ by ‖J⊥‖. Then we have

Proposition 2 ‖J⊥‖ = 1.

Proof: From (16) we have (J⊥)² = J⊥. So the eigenvalues λ of J⊥ satisfy λ² − λ = 0. Thus, the eigenvalues of J⊥ are either 1 or 0. As J⊥ is symmetric, ‖J⊥‖ = 1. □

Applying LADMAP ((11), (12), and (10)) directly to (15) (where x, y, and γ are A, E, and Y, respectively), with some algebra we obtain the following updating scheme:

  A_{k+1} = argmin_A ‖A‖_* + (µ_k/2)‖A − M_k‖²_F,   (17)
  E_{k+1} = argmin_E λ‖E‖_1 + (µ_k/2)‖E − N_k‖²_F,   (18)
  Y_{k+1} = Y_k + µ_k J⊥(A_{k+1} + E_{k+1} − D∘τ),   (19)
  µ_{k+1} = min(µ_max, ρ·µ_k),   (20)

where

  M_k = A_k − J⊥(A_k + E_k − D∘τ + Y_k/µ_k),   (21)
  N_k = E_k − J⊥(A_{k+1} + E_k − D∘τ + Y_k/µ_k),   (22)

µ_max is an upper bound of µ_k, and

  ρ = ρ_0, if µ_k·max(‖A_{k+1} − A_k‖_F, ‖E_{k+1} − E_k‖_F)/‖J⊥(D∘τ)‖_F < ε_2,
      1,  otherwise,   (23)

in which ρ_0 ≥ 1 is a constant and ε_2 > 0 is a small threshold. Note that in (17)-(18) we have utilized the properties of J⊥ in (16) and ‖J⊥‖ = 1. The updating scheme (20) and (23) for µ comes from the adaptive penalty strategy in LADMAP (Lin et al, 2011).

The closed-form solution to (17) is as follows (Cai et al, 2010):

  A_{k+1} = S_{1/µ_k}(M_k),   (24)

where S_ε(·) is the singular value shrinkage operator:

  S_ε(W) = U T_ε(Σ) Vᵀ,   (25)

in which UΣVᵀ is the SVD of W and T_ε(·) is the scalar shrinkage operator defined as (Donoho, 1995):

  T_ε(x) = sgn(x) max(|x| − ε, 0).

Subproblem (18) also has a closed-form solution as follows (Yang and Zhang, 2011):

  E_{k+1} = T_{λ/µ_k}(N_k).   (26)

The iterations (17)-(20) stop when the following two conditions are satisfied:

  ‖J⊥(A_{k+1} + E_{k+1} − D∘τ)‖_F / ‖J⊥(D∘τ)‖_F < ε_1   (27)

and

  µ_k·max(‖A_{k+1} − A_k‖_F, ‖E_{k+1} − E_k‖_F)/‖J⊥(D∘τ)‖_F < ε_2.   (28)

These two stopping criteria are set based on the KKT conditions of problem (15). Details of the deduction can be found in (Lin et al, 2011).

3.3 Warm Starting Variables in the Inner Loop

Since the number of iterations of the inner loop greatly affects the efficiency of the whole algorithm, we should provide good initial values to the variables in the inner loop so that the number of iterations can be reduced. The original algorithm initializes A as the input D∘τ, and E and Y as 0. Such a cold start strategy does not utilize any information from the previous loop. We observe that the solutions from the previous inner loop are good initializations for the next inner loop, because the difference in D∘τ between successive inner loops becomes smaller and smaller, so their solutions should be close to each other. This motivates the following warm start strategy for the variables:

  A^{i+1}_0 = A^i_∞,  E^{i+1}_0 = E^i_∞,  and  Y^{i+1}_0 = Y^i_∞,   (29)

where the subscripts and superscripts are the indices of the inner and outer loops, respectively. The subscripts 0 and ∞ stand for the initial and final values in an inner loop, respectively.

We summarize our LADMAP with variable warm start (LADMAP+VWS) approach for solving TILT in Algorithm 1. Note that we only change the inner loop part; the rest of the algorithm is inherited from that in (Zhang et al, 2012b).
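Collecting (17)-(23) and the stopping criteria (27)-(28), the inner loop can be sketched as below (our own illustration; svt and soft are the operators S and T of (24)-(26), apply_Jperp is the matrix-free multiplication by J⊥ described in Section 4.1, and A, E, Y are warm-started per (29)):

```python
import numpy as np

def svt(W, eps):                                   # S_eps, cf. (24)-(25)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt

def soft(W, eps):                                  # T_eps, cf. (26)
    return np.sign(W) * np.maximum(np.abs(W) - eps, 0.0)

def inner_loop(Dtau, apply_Jperp, A, E, Y, lam, mu, mu_max, rho0,
               eps1=1e-7, eps2=1e-7, max_iter=1000):
    """LADMAP for (15). A, E, Y should be warm-started per (29)."""
    denom = np.linalg.norm(apply_Jperp(Dtau))      # ‖J⊥(D∘τ)‖_F
    for _ in range(max_iter):
        G = apply_Jperp(A + E - Dtau + Y / mu)
        A_new = svt(A - G, 1.0 / mu)               # (17), (21), (24)
        G = apply_Jperp(A_new + E - Dtau + Y / mu)
        E_new = soft(E - G, lam / mu)              # (18), (22), (26)
        resid = apply_Jperp(A_new + E_new - Dtau)
        Y = Y + mu * resid                         # (19)
        crit2 = mu * max(np.linalg.norm(A_new - A),
                         np.linalg.norm(E_new - E)) / denom
        A, E = A_new, E_new
        if np.linalg.norm(resid) / denom < eps1 and crit2 < eps2:  # (27)-(28)
            break
        mu = min(mu_max, (rho0 if crit2 < eps2 else 1.0) * mu)     # (20), (23)
    return A, E, Y
```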
Algorithm 1 (TILT via LADMAP+VWS)

Input: A rectangular window D ∈ R^{m×n} in an image, the initial parameters τ⁰ of the transform, weight parameter λ > 0, ρ_0 ≥ 1 and µ_max > 0.
Initialize: A_0 = D∘τ⁰, E_0 = 0, Y_0 = 0, and i = 0.
While not converged Do
  Step 1: normalize the image and compute the Jacobian w.r.t. the transform:
    D∘τ^i ← (D∘τ^i)/‖D∘τ^i‖_F,  J^{i+1} ← ∂/∂ζ ((D∘ζ)/‖D∘ζ‖_F) |_{ζ=τ^i};
  Step 2: solve the linearized convex optimization
    min_{A,E} ‖A‖_* + λ‖E‖_1,  s.t.  (J^{i+1})⊥(D∘τ^i) = (J^{i+1})⊥(A + E),
  with the initial conditions A^{i+1}_0 = A^i_∞, E^{i+1}_0 = E^i_∞ and Y^{i+1}_0 = Y^i_∞, µ_0 > 0, k = 0:
  While (27) or (28) is not satisfied Do
    A^{i+1}_{k+1} = S_{1/µ_k}(A^{i+1}_k − (J^{i+1})⊥(A^{i+1}_k + E^{i+1}_k − D∘τ^i + µ_k^{-1}Y^{i+1}_k)),
    E^{i+1}_{k+1} = T_{λ/µ_k}(E^{i+1}_k − (J^{i+1})⊥(A^{i+1}_{k+1} + E^{i+1}_k − D∘τ^i + µ_k^{-1}Y^{i+1}_k)),
    Y^{i+1}_{k+1} = Y^{i+1}_k + µ_k (J^{i+1})⊥(A^{i+1}_{k+1} + E^{i+1}_{k+1} − D∘τ^i),
    µ_{k+1} = min(µ_max, ρ·µ_k), where ρ is given by (23),
    k ← k + 1.
  End While
  Step 3: compute the optimal ∆τ* using Eq. (13), where A* = A^{i+1}_∞, E* = E^{i+1}_∞;
  Step 4: update the transform: τ^{i+1} = τ^i + ∆τ*, i ← i + 1;
End While
Output: converged solution (A*, E*, τ*).

4 Implementation Details

In the previous section we introduced the basic ideas of the LADMAP with variable warm start (LADMAP+VWS) algorithm for solving the inner loop of TILT (15). However, there are still some details that need to be handled carefully so that the computation can be most efficient. In this section, we first show how to compute with J⊥ efficiently; then we discuss how to modify the algorithm in order to accommodate some additional constraints when putting TILT to real applications.

4.1 Multiplying with J⊥

In Algorithm 1, J is actually an order-3 tensor (Zhang et al, 2012b). For computation, J is arranged as an mn×p matrix, where m×n is the size of D∘τ and p is the number of variables that parameterize the transform τ. When multiplying J⊥ with a matrix M, we actually mean to rearrange M into an mn-dimensional vector and then multiply it by J⊥. However, J⊥ is an mn×mn matrix, which is huge and often cannot fit into memory. Moreover, the computation cost would be O((mn)²), which is also unaffordable. So we cannot form J⊥ explicitly and then multiply it with a vectorized matrix. Recalling that J⊥ = I − J(JᵀJ)^{-1}Jᵀ, we may overcome this difficulty by multiplying successively:

  J⊥v = v − J·((JᵀJ)^{-1}·(Jᵀv)),   (30)

whose computation cost is only O(pmn). Note that (JᵀJ)^{-1} is only a small p×p matrix, which can be pre-computed and saved. So the above strategy is much more efficient in both memory and computation.
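A direct transcription of (30) reads as follows (our own illustration; the p×p Gram matrix JᵀJ is factored once and cached). The last lines numerically confirm the idempotency stated in (16):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_apply_Jperp(J, shape):
    """Return v -> J⊥v = v - J((JᵀJ)⁻¹(Jᵀv)) without forming the
    mn-by-mn matrix J⊥ explicitly (cf. (30)); cost O(p*mn) per call."""
    gram = cho_factor(J.T @ J)          # p-by-p; J has full column rank
    def apply_Jperp(V):
        v = V.reshape(-1)
        return (v - J @ cho_solve(gram, J.T @ v)).reshape(shape)
    return apply_Jperp

# Quick check of (16): J⊥ is idempotent.
rng = np.random.default_rng(0)
J = rng.standard_normal((30 * 30, 8))
P = make_apply_Jperp(J, (30, 30))
V = rng.standard_normal((30, 30))
print(np.allclose(P(P(V)), P(V)))       # True
```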
4.2 Initializing Y

In Algorithm 1, we have to multiply (J^{i+1})⊥ with three different matrices. If we initialize Y^{i+1}_0 in the subspace spanned by (J^{i+1})⊥, then Y^{i+1}_k stays in the subspace spanned by (J^{i+1})⊥ throughout the iteration. Then, thanks to the idempotency of (J^{i+1})⊥, we have (J^{i+1})⊥Y^{i+1}_k = Y^{i+1}_k, and hence A and E can be updated as

  A^{i+1}_{k+1} = S_{1/µ_k}(A^{i+1}_k − (J^{i+1})⊥(A^{i+1}_k + E^{i+1}_k − D∘τ^i) − µ_k^{-1}Y^{i+1}_k),
  E^{i+1}_{k+1} = T_{λ/µ_k}(E^{i+1}_k − (J^{i+1})⊥(A^{i+1}_{k+1} + E^{i+1}_k − D∘τ^i) − µ_k^{-1}Y^{i+1}_k).

One can then see that we only have to multiply (J^{i+1})⊥ with two different matrices, A^{i+1}_k + E^{i+1}_k − D∘τ^i and A^{i+1}_{k+1} + E^{i+1}_k − D∘τ^i, because (J^{i+1})⊥(A^{i+1}_{k+1} + E^{i+1}_{k+1} − D∘τ^i) is used for updating both Y^{i+1}_{k+1} and A^{i+1}_{k+2}. This saves one multiplication by (J^{i+1})⊥ in each iteration.

To combine with the warm start technique, Y should be initialized as

  Y^{i+1}_0 = (J^{i+1})⊥ Y^i_∞.   (31)

Because the dual problem of the inner loop (15) is

  max_Y ⟨J⊥Y, D∘τ⟩  s.t.  ‖J⊥Y‖_2 ≤ 1, ‖J⊥Y‖_∞ ≤ λ^{-1},

where ‖·‖_∞ is the maximum absolute value in a matrix, we see that the optimal Y can be chosen in span(J⊥). So constraining Y to span(J⊥) does not affect the convergence of LADMAP+VWS to the optimal solution. As Y^i_∞ ∈ span((J^i)⊥), when J^{i+1} is close to J^i we have Y^{i+1}_0 ≈ Y^i_∞. So (31) is a good combination of warm starting Y and keeping Y ∈ span(J⊥).

4.3 Handling Additional Constraints on τ

As discussed in (Zhang et al, 2012b), additional constraints should be imposed on τ so as to eliminate degenerate or trivial solutions, e.g., τ* being 0. Typical constraints include fixing both the center and the area of the rectangle. These additional constraints can be formulated as l linear constraints on ∆τ (Zhang et al, 2012b):

  Q·∆τ = 0,   (32)

where Q ∈ R^{l×p}.

Following the same idea as in Section 3.2, we aim at eliminating ∆τ together with the l additional constraints. As ∆τ needs to satisfy both the linear constraints in Eq. (32) and that in problem (3), the overall constraints on ∆τ are

  [J; Q] ∆τ = [A + E − D∘τ; 0],   (33)

where the semicolon denotes vertical stacking. Similar to the deductions in Section 3.2, we have an equivalent problem:

  min_{A,E} ‖A‖_* + λ‖E‖_1,  s.t.  WA + WE = W(D∘τ),   (34)

where

  W = [ I − J(JᵀJ + QᵀQ)^{-1}Jᵀ ; −Q(JᵀJ + QᵀQ)^{-1}Jᵀ ].

Matrix W also enjoys a nice property similar to that of J⊥:

  WᵀW = I − J(JᵀJ + QᵀQ)^{-1}Jᵀ,   (35)

which can be utilized to reduce the computational cost. Moreover, ‖W‖_2 = 1. Then, with some algebra, LADMAP applied to the above problem goes as follows:

  A_{k+1} = S_{1/µ_k}(M_k),
  E_{k+1} = T_{λ/µ_k}(N_k),
  Y_{k+1} = Y_k + µ_k·W(A_{k+1} + E_{k+1} − D∘τ),
  µ_{k+1} = min(ρ·µ_k, µ_max),

where

  M_k = A_k − µ_k^{-1}WᵀY_k − WᵀW(A_k + E_k − D∘τ),
  N_k = E_k − µ_k^{-1}WᵀY_k − WᵀW(A_{k+1} + E_k − D∘τ),

and ρ is still computed as in (23).

Thanks to (35), the multiplication of WᵀW with a vectorized matrix can be done similarly to (30). To further reduce the computational cost, we introduce Ỹ_k = WᵀY_k and initialize it in span(WᵀW). Then M_k, N_k and Ỹ_k are computed as follows:

  M_k = A_k − µ_k^{-1}Ỹ_k − WᵀW(A_k + E_k − D∘τ),
  N_k = E_k − µ_k^{-1}Ỹ_k − WᵀW(A_{k+1} + E_k − D∘τ),   (36)
  Ỹ_{k+1} = Ỹ_k + µ_k·WᵀW(A_{k+1} + E_{k+1} − D∘τ).

Again, WᵀW(A_{k+1} + E_{k+1} − D∘τ) is used to update both M_{k+1} and Ỹ_{k+1}. In this way, the iterations of the inner loop can be computed very efficiently.

Now the warm start of Y is replaced by that of Ỹ, which is:

  Ỹ^{i+1}_0 = (W^{i+1})ᵀW^{i+1} Ỹ^i_∞.   (37)

As l ≪ mn, (W^{i+1})ᵀW^{i+1} is actually very close to (J^{i+1})⊥. So (37) is again a good combination of warm starting Ỹ and keeping Ỹ ∈ span(WᵀW).
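The same matrix-free trick carries over to WᵀW through (35) (our own illustration; Q is the l×p constraint matrix of (32)):

```python
import numpy as np

def make_apply_WtW(J, Q, shape):
    """Apply WᵀW = I - J(JᵀJ + QᵀQ)⁻¹Jᵀ (cf. (35)) to an m-by-n matrix,
    in the same matrix-free fashion as (30); Q encodes the constraints (32)."""
    G = np.linalg.inv(J.T @ J + Q.T @ Q)       # p-by-p, precomputed once
    def apply_WtW(V):
        v = V.reshape(-1)
        return (v - J @ (G @ (J.T @ v))).reshape(shape)
    return apply_WtW
```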
5 Warm Starting for SVD in the Inner Loop

As shown in (24)-(25), to update A we have to compute the SVD of M_k. Unlike in other low-rank recovery problems (Cai et al, 2010; Lin et al, 2009; Toh and Yun, 2009), partial SVD cannot be used here. This is because partial SVD is faster than full SVD only when the rank of A_{k+1} is less than min(m,n)/5 (Lin et al, 2009), while when rectifying general textures this condition is often violated. So computing the full SVD of M_k is very costly, as its complexity is O(mn·min(m,n)). Without exaggeration, the efficiency of computing the full SVD dominates the computing time of solving TILT. So we have to reduce the computation cost of the full SVD as well.

We observe that, except for the first several steps of the inner loop, the change in M_k between two iterations is relatively small. So we may expect that the SVDs of M_k and M_{k−1} in two successive iterations are close to each other. This naturally inspires us to utilize the SVD of M_{k−1} to estimate the SVD of M_k.

To do so, we first formulate the SVD problem as follows:

  (U*, Σ*, V*) = argmin_{U,Σ,V} F(U,Σ,V),  s.t.  UᵀU = I, Σ is diagonal, and VᵀV = I,   (38)

where F(U,Σ,V) = (1/2)‖M − UΣVᵀ‖²_F and M ∈ R^{m×n} is the matrix to be decomposed. Without loss of generality, we may assume m ≥ n. U ∈ R^{m×n} and V ∈ R^{n×n} are column-orthonormal and orthogonal matrices, respectively. As we can negate the columns of U or V, we need not require the diagonal entries of Σ to be nonnegative.

5.1 Optimization with Orthogonality Constraints

The usual method for solving a general orthogonality constrained optimization problem is to search along the geodesic of the Stiefel manifold³ in the direction of the gradient of the objective function projected onto the tangent plane of the manifold (Edelman et al, 1999). This may require SVDs in order to generate feasible points on the geodesic. Fortunately, Wen and Yin (Wen and Yin, 2012) recently developed a technique that does not rely on SVDs, making our warm start for SVD possible.

Denote the unknown variable by X. Suppose the gradient of the objective function at X is G; then the projection of G onto the tangent plane of the Stiefel manifold at X is P = GXᵀ − XGᵀ (Edelman et al, 1999; Wen and Yin, 2012). Instead of parameterizing the geodesic of the Stiefel manifold along direction P using the exponential function, Wen and Yin (Wen and Yin, 2012) proposed generating feasible points by the Cayley transform:

  X(τ) = C(τ)X, where C(τ) = (I + (τ/2)P)^{-1}(I − (τ/2)P).

It can be verified that X(τ) has the following properties:
1. X(τ) is smooth in τ;
2. (X(τ))ᵀX(τ) = I, ∀τ ∈ R, given XᵀX = I;
3. X(0) = X;
4. (d/dτ)X(0) = −PX.

So when τ > 0 is sufficiently small, X(τ) can result in a smaller objective function value than X.

X(τ) can be viewed as reparameterizing the geodesic by τ, which does not exactly equal the geodesic distance. However, when τ is small it is very close to the geodesic distance, as it is the length of the two segments enclosing the geodesic (Wen and Yin, 2012). When computing X(τ), no SVD is required. A matrix inversion and some matrix multiplications are required instead, which is of much lower cost than an SVD. However, as both matrix multiplication/inversion and SVD are of the same order of complexity, we have to control the number of matrix multiplications and inversions, so that our warm start based method can be faster than directly computing the full SVD.

³ A Stiefel manifold is a set of column-orthonormal matrices whose geodesic distance is induced from the Euclidean distance of its ambient space.
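The essential point is that the Cayley curve preserves feasibility exactly, because P = GXᵀ − XGᵀ is skew-symmetric and hence C(τ) is orthogonal. A quick NumPy check (our own illustration):

```python
import numpy as np

def cayley_step(X, G, tau):
    """Move along X(τ) = (I + τ/2·P)⁻¹(I - τ/2·P)·X with P = GXᵀ - XGᵀ."""
    P = G @ X.T - X @ G.T                      # skew-symmetric by construction
    I = np.eye(P.shape[0])
    return np.linalg.solve(I + tau / 2 * P, (I - tau / 2 * P) @ X)

rng = np.random.default_rng(0)
X = np.linalg.qr(rng.standard_normal((8, 3)))[0]   # XᵀX = I
G = rng.standard_normal((8, 3))                    # an arbitrary gradient
X1 = cayley_step(X, G, tau=0.3)
print(np.allclose(X1.T @ X1, np.eye(3)))           # True: feasibility preserved
```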
5.2 SVD with Warm Start

Now, for our SVD problem (38), we can compute the gradient (G_U, G_Σ, G_V) of the objective function F(U,Σ,V) with respect to (U,Σ,V) and search on a geodesic on the constraint manifold C_U × C_Σ × C_V in the gradient direction for the next best solution, where C_U is the Stiefel manifold of all column-orthonormal m×n matrices, C_Σ is the subspace of all n×n diagonal matrices, and C_V is the Stiefel manifold of all n×n orthogonal matrices.

We search on the constraint manifold along the following curve:

  U(t) = (I + (t/2)P_U)^{-1}(I − (t/2)P_U)U,
  Σ(t) = Σ − t·P_Σ,   (39)
  V(t) = (I + (t/2)P_V)^{-1}(I − (t/2)P_V)V,

where P_U = G_U Uᵀ − U G_Uᵀ and P_V = G_V Vᵀ − V G_Vᵀ are the projections of G_U and G_V onto the tangent planes of C_U and C_V at U and V, respectively, and P_Σ = diag(G_Σ) is the projection of G_Σ onto C_Σ. The details of computing the gradients can be found in the Appendix.

Then we may find the optimal t* such that

  t* = argmin_t f(t) = (1/2)‖M − U(t)·Σ(t)·V(t)ᵀ‖²_F.   (40)

As f(t) is a complicated function of t, the optimal t* would have to be found by iteration. This would be costly, as many matrix inversions and multiplications would be required. So we choose to approximate f(t) by a quadratic function via Taylor expansion:

  f(t) ≈ f(0) + f′(0)·t + (1/2)f″(0)·t²,   (41)

where f′(0) and f″(0) are the first and second order derivatives of f(t) evaluated at 0, respectively. These two derivatives can be computed efficiently. Details of the deductions can be found in the Appendix. Then we can obtain an approximate optimal solution t̃* = −f′(0)/f″(0) and approximate the SVD of M as U(t̃*)Σ(t̃*)V(t̃*)ᵀ.

The warm start SVD method is summarized in Algorithm 2. It is called only when the difference between M_k and M_{k−1} is smaller than a pre-defined threshold ε_svd.

Algorithm 2 (SVD with Warm Start)

Input: The decomposed matrices U_{k−1}, Σ_{k−1} and V_{k−1} of the SVD of M_{k−1}, and a matrix M_k.
If ‖M_k − M_{k−1}‖ < ε_svd (Else do full SVD)
  Step 0: M = M_k, U(0) = U_{k−1}, Σ(0) = Σ_{k−1}, and V(0) = V_{k−1}.
  Step 1: compute the projected gradients P_U, P_Σ, and P_V of F at U(0), Σ(0), and V(0) using (42) to (44), respectively.
  Step 2: compute f′(0), f″(0), and t̃* = −f′(0)/f″(0) using (45) to (47).
  Step 3: compute U_k = U(t̃*), Σ_k = Σ(t̃*), V_k = V(t̃*) using (39).
Output: U_k, Σ_k, and V_k as the decomposed matrices of the SVD of M_k.
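The sketch below (our own illustration) mirrors one call of Algorithm 2. The gradients follow from F = (1/2)‖M − UΣVᵀ‖²_F by standard matrix calculus; since the paper's closed forms (42)-(47) for the projected gradients and for f′(0), f″(0) live in the Appendix (not reproduced here), the sketch estimates the derivatives of the quadratic model (41) by finite differences instead, assuming f″(0) > 0:

```python
import numpy as np

def warm_svd_step(M, U, S, V, h=1e-4):
    """One refinement in the spirit of Algorithm 2. (U, diag(S), V) is the
    previous SVD estimate; S holds the diagonal entries of Σ."""
    D = np.diag(S)
    R = U @ D @ V.T - M                        # residual of the estimate
    GU, GV = R @ V @ D, R.T @ U @ D            # gradients of F w.r.t. U, V
    GS = np.diag(U.T @ R @ V)                  # gradient w.r.t. Σ, kept diagonal
    PU = GU @ U.T - U @ GU.T                   # projected (skew) gradients
    PV = GV @ V.T - V @ GV.T

    def curve(t):                              # the search curve (39)
        Iu, Iv = np.eye(PU.shape[0]), np.eye(PV.shape[0])
        Ut = np.linalg.solve(Iu + t / 2 * PU, (Iu - t / 2 * PU) @ U)
        Vt = np.linalg.solve(Iv + t / 2 * PV, (Iv - t / 2 * PV) @ V)
        return Ut, S - t * GS, Vt

    def f(t):                                  # the objective (40) along (39)
        Ut, St, Vt = curve(t)
        return 0.5 * np.linalg.norm(Ut @ np.diag(St) @ Vt.T - M) ** 2

    d1 = (f(h) - f(-h)) / (2 * h)              # ≈ f'(0)
    d2 = (f(h) - 2 * f(0.0) + f(-h)) / h ** 2  # ≈ f''(0), assumed positive
    return curve(-d1 / d2)                     # (U, S, V) at t̃* = -f'(0)/f''(0)
```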
Although U(t̃*)Σ(t̃*)V(t̃*)ᵀ is an approximate SVD of M, our final goal is to compute the singular value shrinkage of M_k in order to update A_{k+1} (see (17), (24) and (25)), not the SVD of M_k itself. We can show that when M_k is close enough to M_{k−1}, computing the SVD of M_k approximately still produces a highly accurate A_{k+1}. The cornerstone of our proof is the following pseudo-contraction property of the singular value shrinkage operator:

  ‖S_ε(W_1) − S_ε(W_2)‖²_F ≤ ‖W_1 − W_2‖²_F − ‖[S_ε(W_1) − W_1] − [S_ε(W_2) − W_2]‖²_F,

thanks to Lemma 3.3 of (Pierra, 1984) and the fact that S_ε(·) is the proximal mapping of the nuclear norm (Cai et al, 2010). Then we have:

  ‖S_ε(U(t̃*)Σ(t̃*)V(t̃*)ᵀ) − S_ε(M_k)‖_F ≤ ‖U(t̃*)Σ(t̃*)V(t̃*)ᵀ − M_k‖_F ≤ ‖U(0)Σ(0)V(0)ᵀ − M_k‖_F = ‖M_{k−1} − M_k‖_F.

Since we switch to the warm start technique for SVD in the inner loop only when M_k is very close to M_{k−1} (i.e., ‖M_{k−1} − M_k‖ < ε_svd), it is guaranteed that

  ‖S_ε(U(t̃*)Σ(t̃*)V(t̃*)ᵀ) − S_ε(M_k)‖_F < ε_svd.

Hence our approximate SVD of M_k still results in a highly accurate A_{k+1}.

6 Experiments

In this section, we conduct several experiments on both the inner loop only and the complete TILT problem to evaluate the performance of our proposed LADMAP with warm starts method. Numerical experiments on the inner loop only are conducted using synthetic data in order to demonstrate the effectiveness of LADMAP and the warm start for SVD. For the complete TILT problem, we conduct experiments on images with simulated and real deformations to further test the efficiency and robustness of LADMAP and the warm start techniques. The images we use are from a low-rank textures data set and a natural images data set, respectively.

The code for the original ADM based method is provided by the first author of (Zhang et al, 2012b). For fair comparison, we set the common parameters in all compared methods to the same values. All the code is in MATLAB and the experiments are run on a workstation with an Intel Xeon [email protected] CPU and 48GB memory.

6.1 Numerical Study on the Inner Loop Only

In this subsection, we use synthetic data to compare the efficiency of the original ADM based algorithm, our LADMAP based algorithm, and LADMAP with warm start for computing SVD (LADMAP+SVDWS) on the inner loop only. As we only focus on the inner loop here, the effectiveness of variable warm start, which requires outer loops, cannot be shown. We generate D∘τ and the Jacobian J randomly and use them as the input for ADM, LADMAP, and LADMAP+SVDWS. For fair comparison, we tune the extra parameters ε_2 and ρ_0 in LADMAP and LADMAP+SVDWS and ε_svd in LADMAP+SVDWS so that the three methods stop with almost the same objective function values. All methods are initialized with A_0 = D∘τ and E_0 = 0. ∆τ in the ADM method is initialized as ∆τ_0 = 0. Under these conditions, we compare the three methods on their computation time, the number of iterations needed to converge, and the objective function value when the iterations stop. The comparison is done under different sizes of D∘τ. All tests are repeated 10 times and the average quantities are reported.
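The synthetic protocol can be reproduced along the following lines (our reading of the setup; the patch sizes follow Table 1, while the transform dimension p = 8 is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8  # assumed number of transform parameters
for size in (10, 50, 100, 300, 500, 1000):
    for trial in range(10):
        Dtau = rng.standard_normal((size, size))   # random patch D∘τ
        J = rng.standard_normal((size * size, p))  # random Jacobian, full column rank w.h.p.
        # run ADM / LADMAP / LADMAP+SVDWS on (Dtau, J) and record the
        # time, iteration count, and final objective; report the averages
```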
Table 1 Comparison of the efficiency of the algorithms on the inner loop of TILT only. The time costs, numbers of iterations, and optimal objective function values are averaged over 10 trials on random data.

Size of D∘τ |          ADM           |         LADMAP         |     LADMAP+SVDWS
            | time(s)  #iter. obj.func. | time(s)  #iter. obj.func. | time(s)  #iter. obj.func.
10×10       | 1.98E-05  49.2   6.4836 | 7.49E-06  20.4   6.4814 | 8.76E-06  20.9   6.4815
50×50       | 0.00112   51.5  76.6508 | 0.00054   21.2  76.6468 | 0.00056   22.2  76.6585
100×100     | 0.00538   51.1  217.001 | 0.00252   22.2  216.985 | 0.00225   22.9  216.987
300×300     | 0.1859    50    1123.66 | 0.10501   21.9  1123.67 | 0.08792   22.9  1123.70
500×500     | 1.3802    50    2420.41 | 0.6574    22    2419.63 | 0.5343    21    2420.28
1000×1000   | 53.0936   50    6844.82 | 25.8139   23    6844.14 | 18.5953   23    6845.32

The comparative results are shown in Table 1. We can observe that all three methods arrive at roughly the same objective function values. However, LADMAP uses fewer than half as many iterations as ADM does, and the SVD warm start changes the number of iterations only slightly. Consequently, LADMAP is much faster than ADM, while SVDWS further speeds up the computation of LADMAP when the size of D∘τ is not too small (e.g., ≥ 100×100). The acceleration rate also increases as the size of D∘τ grows⁴. When the size of D∘τ is very small (e.g., < 100×100), SVDWS does not seem to speed up the computation. This may be due to the ultra-short computing time, in which case other processes on the workstation supporting the computing environment can influence the total computing time. In any case, the slowdown is rather insignificant. As the speed of solving large sized TILT problems matters more in real applications, adopting SVDWS is still advantageous.

⁴ We will see much higher acceleration rates when using SVDWS on real data (see Section 6.2).

6.2 Comparisons on the Complete TILT Problem

In this subsection, using real image data we compare the original ADM method, LADMAP, LADMAP with variable warm start (LADMAP+VWS), and LADMAP with both variable warm start and SVD warm start (LADMAP+VWS+SVDWS) on solving the whole TILT problem, in order to show the effectiveness of LADMAP and the two warm start techniques. As we have pointed out before, the original ADM may not converge to the optimal solution of the inner loop, so we first compare the convergence performance of ADM and LADMAP. Then we test the robustness of ADM and LADMAP when there are corruptions. We also present some examples on which ADM fails but our LADMAP works well. Finally, we compare the computation time of the various methods on both synthetically and naturally deformed images. The synthetically deformed images are generated by deforming the low-rank textures with predefined transforms. The naturally deformed images are from our image data set, which contains over 100 images downloaded from the web.

Fig. 2 Range of convergence for the affine transform. The x-axis and y-axis stand for the rotation angle θ and the skew value t of an affine transform, respectively. Regions inside the curves are affine transforms that are successfully recovered in all trials. The blue dashed curve and the red solid curve are the boundaries of the parameters of affine transforms that can be successfully recovered by ADM and LADMAP, respectively.

Fig. 3 Visual comparison between the results by ADM and LADMAP. First and second columns: rectification results by ADM. Third and fourth columns: results by LADMAP. First and third columns: rectified regions of interest. Second and fourth columns: the initial transform (illustrated by red rectangles, which are manually prescribed) and the computed transform (illustrated by green quadrilaterals).

Range of Convergence. Since LADMAP for the inner loop is proven to converge to the optimal solution, we expect that it will outperform ADM in recovering the correct deformation when solving the whole TILT problem⁵. To show that LADMAP can recover a broader range of deformations than ADM does, we test them on a standard checkerboard pattern.

⁵ Note that as the whole TILT problem (2) is not a convex program, LADMAP cannot achieve the global optimum either. So LADMAP can also fail to recover the correct deformation.

Following the same setting as in (Zhang et al, 2012b), we deform the checkerboard pattern by an affine transform y = Ax + b, where x, y ∈ R², and test whether the two algorithms can recover the correct transform. The matrix A is parameterized as

  A(θ,t) = [cos θ, −sin θ; sin θ, cos θ]·[1, t; 0, 1],

where θ is the rotation angle and t is the skew value. We change θ within the range [0, π/6] with a step size of π/60,