Regularized Principal Component Analysis for Spatial Data

Wen-Ting Wang
Institute of Statistics, National Chiao Tung University
[email protected]

Hsin-Cheng Huang
Institute of Statistical Science, Academia Sinica
[email protected]

arXiv:1501.03221v3 [stat.ME] 3 Feb 2016

Abstract: In many atmospheric and earth sciences, it is of interest to identify dominant spatial patterns of variation based on data observed at $p$ locations and $n$ time points with the possibility that $p > n$. While principal component analysis (PCA) is commonly applied to find the dominant patterns, the eigenimages produced from PCA may exhibit patterns that are too noisy to be physically meaningful when $p$ is large relative to $n$. To obtain more precise estimates of eigenimages, we propose a regularization approach incorporating smoothness and sparseness of eigenimages, while accounting for their orthogonality. Our method allows data taken at irregularly spaced or sparse locations. In addition, the resulting optimization problem can be solved using the alternating direction method of multipliers, which is easy to implement, and applicable to a large spatial dataset. Furthermore, the estimated eigenfunctions provide a natural basis for representing the underlying spatial process in a spatial random-effects model, from which spatial covariance function estimation and spatial prediction can be efficiently performed using a regularized fixed-rank kriging method. Finally, the effectiveness of the proposed method is demonstrated by several numerical examples.

Keywords: Alternating direction method of multipliers, empirical orthogonal functions, fixed rank kriging, Lasso, non-stationary spatial covariance estimation, orthogonal constraint, smoothing splines.

1. Introduction

In many atmospheric and earth sciences, it is of interest to identify dominant spatial patterns of variation based on data observed at $p$ locations with $n$ repeated measurements, where $p$ may be larger than $n$. The dominant patterns are the eigenimages of the underlying (nonstationary) spatial covariance function with large eigenvalues. A commonly used approach for estimating the eigenimages is principal component analysis (PCA), also known as empirical orthogonal function analysis in atmospheric science. However, when $p$ is large relative to $n$, the leading eigenimages produced from PCA may be noisy with high estimation variability, or exhibit some bizarre patterns that are not physically meaningful. To enhance the interpretability, a few approaches, such as rotation of components according to some criteria (see Richman (1986), Jolliffe (1987), Richman (1987)), have been proposed to form more desirable patterns. However, how to obtain a desired rotation in practice is not completely clear. Some discussion can be found in Hannachi, Jolliffe and Stephenson (2007).

Another approach to aid interpretation is to seek sparse or spatially localized patterns, which can be done by imposing an $L_1$ constraint or adding an $L_1$ penalty to an original PCA optimization formulation (Jolliffe, Uddin and Vines (2002), Zou, Hastie and Tibshirani (2006), Shen and Huang (2008), d'Aspremont, Bach and Ghaoui (2008), and Lu and Zhang (2012)). However, this approach may produce a pattern with isolated zero and nonzero components, and except Jolliffe, Uddin and Vines (2002) and Lu and Zhang (2012), the PC estimates produced from these approaches may not have orthogonal PC loadings.
For continuous spatial domains, the problem becomes even more challenging. Instead of looking for eigenimages on a lattice, we need to find eigenfunctions by essentially solving an infinite-dimensional problem based on data observed at possibly sparse and irregularly spaced locations. Although some approaches have been developed using functional principal component analysis (see, e.g., Ramsay and Silverman (2005), Yao, Muller and Wang (2005) and Huang, Shen and Buja (2008)), they typically focus on one-dimensional processes, or require data observed at dense locations. In particular, these methods generally do not work well when data are observed at fixed but sparse locations. Reviews of PCA on spatial data can be found in Hannachi, Jolliffe and Stephenson (2007) and Demsar et al. (2013).

In this research, we propose a regularization approach for estimating the dominant patterns, taking into account smoothness and localized features that are expected in real-world spatial processes. The proposed estimates are directly obtained by solving a minimization problem. We call our method SpatPCA, which not only gives effective estimates of dominant patterns, but also provides an ideal set of basis functions for estimating the underlying (nonstationary) spatial covariance function, even when data are irregularly or sparsely located in space. In addition, we develop a fast algorithm to solve the resulting optimization problem using the alternating direction method of multipliers (ADMM) (see Boyd et al. (2011)). An R package called SpatPCA is developed and available on the Comprehensive R Archive Network (CRAN).

The rest of this paper is organized as follows. In Section 2, we introduce the proposed SpatPCA method, including dominant pattern estimation and spatial covariance function estimation. Our ADMM algorithm for computing the SpatPCA estimates is provided in Section 3. Some simulation experiments that illustrate the superiority of SpatPCA and an application of SpatPCA to a global sea surface temperature dataset are presented in Section 4.

2. The Proposed Method

Consider a sequence of zero-mean $L^2$-continuous spatial processes, $\{\eta_i(s); s \in D\}$, $i = 1, \dots, n$, defined on a spatial domain $D \subset \mathbb{R}^d$, which are mutually uncorrelated and have a common spatial covariance function, $C_\eta(s, s^*) = \mathrm{cov}(\eta_i(s), \eta_i(s^*))$. We consider a rank-$K$ spatial random-effects model for $\eta_i(\cdot)$:

$$\eta_i(s) = (\phi_1(s), \dots, \phi_K(s))\,\xi_i = \sum_{k=1}^K \xi_{ik}\,\phi_k(s); \quad s \in D, \; i = 1, \dots, n,$$

where $\{\phi_k(\cdot)\}$ are unknown orthonormal basis functions, $\xi_i = (\xi_{i1}, \dots, \xi_{iK})' \sim (0, \Lambda)$; $i = 1, \dots, n$, are uncorrelated random vectors, and $\Lambda$ is an unknown symmetric nonnegative-definite matrix, denoted by $\Lambda \succeq 0$. A similar model based on given $\{\phi_k(\cdot)\}$ was introduced by Cressie and Johannesson (2008), and in a Bayesian framework by Kang and Cressie (2011).

Let $\lambda_{kk'}$ be the $(k, k')$-th entry of $\Lambda$. Then the spatial covariance function of $\eta_i(\cdot)$ is:

$$C_\eta(s, s^*) = \mathrm{cov}(\eta_i(s), \eta_i(s^*)) = \sum_{k=1}^K \sum_{k'=1}^K \lambda_{kk'}\,\phi_k(s)\,\phi_{k'}(s^*). \tag{1}$$

Note that $\Lambda$ is not restricted to be a diagonal matrix.

Let $\Lambda = V \Lambda^* V'$ be the eigen-decomposition of $\Lambda$, where $V$ consists of $K$ orthonormal eigenvectors, and $\Lambda^* = \mathrm{diag}(\lambda_1^*, \dots, \lambda_K^*)$ consists of eigenvalues with $\lambda_1^* \geq \cdots \geq \lambda_K^*$. Let $\xi_i^* = V'\xi_i$ and

$$(\phi_1^*(s), \dots, \phi_K^*(s)) = (\phi_1(s), \dots, \phi_K(s))\,V; \quad s \in D.$$

Then the $\phi_k^*(\cdot)$'s are also orthonormal, and $\xi_{ik}^* \sim (0, \lambda_k^*)$; $i = 1, \dots, n$, $k = 1, \dots, K$, are mutually uncorrelated. Therefore, we can rewrite $\eta_i(\cdot)$ in terms of the $\phi_k^*(\cdot)$'s:

$$\eta_i(s) = (\phi_1^*(s), \dots, \phi_K^*(s))\,\xi_i^* = \sum_{k=1}^K \xi_{ik}^*\,\phi_k^*(s); \quad s \in D. \tag{2}$$

The above expansion is known as the Karhunen-Loève expansion of $\eta_i(\cdot)$ (Karhunen (1947); Loève (1978)) with $K$ nonzero eigenvalues, where $\phi_k^*(\cdot)$ is the $k$-th eigenfunction of $C_\eta(\cdot, \cdot)$ with $\lambda_k^*$ the corresponding eigenvalue.
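To make the rotation behind expansion (2) concrete, the following is a minimal R check (not from the paper; the rank $K$, the matrix $\Lambda$, and the basis values $\Phi$ are all made up for illustration). It eigen-decomposes a nonnegative-definite $\Lambda$, rotates simulated scores by $V$, and verifies that the rotated scores are uncorrelated with variances $\lambda_k^*$ while the covariance in (1) is unchanged.

```r
# A minimal numerical check of the rotation behind expansion (2). The rank K,
# the matrix Lambda, and the basis values Phi are all made up for illustration.
set.seed(1)
K <- 3
A <- matrix(rnorm(K * K), K, K)
Lambda <- crossprod(A)                      # a symmetric nonnegative-definite Lambda
eig <- eigen(Lambda, symmetric = TRUE)
V <- eig$vectors                            # K orthonormal eigenvectors
Lambda_star <- diag(eig$values)             # diag(lambda*_1 >= ... >= lambda*_K)

# Simulate scores xi_i ~ (0, Lambda) (rows of xi) and rotate: xi*_i = V' xi_i.
n <- 10000
xi <- matrix(rnorm(n * K), n, K) %*% chol(Lambda)
xi_star <- xi %*% V

round(cov(xi_star), 2)                      # approximately Lambda_star (uncorrelated)
round(Lambda_star, 2)

# The rotated basis gives the same covariance (1):
# Phi Lambda Phi' = (Phi V) Lambda* (Phi V)' for any matrix Phi of basis values.
p <- 5
Phi <- qr.Q(qr(matrix(rnorm(p * K), p, K)))
max(abs(Phi %*% Lambda %*% t(Phi) -
        (Phi %*% V) %*% Lambda_star %*% t(Phi %*% V)))   # ~ 0 up to rounding
```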
Suppose that we observe data $Y_i = (Y_i(s_1), \dots, Y_i(s_p))'$ with added white noise $\epsilon_i \sim (0, \sigma^2 I)$ at $p$ spatial locations, $s_1, \dots, s_p \in D$, according to

$$Y_i = \eta_i + \epsilon_i = \Phi\,\xi_i + \epsilon_i; \quad i = 1, \dots, n, \tag{3}$$

where $\eta_i = (\eta_i(s_1), \dots, \eta_i(s_p))'$, $\Phi = (\phi_1, \dots, \phi_K)$ is a $p \times K$ matrix with the $(j, k)$-th entry $\phi_k(s_j)$, and the $\epsilon_i$'s and $\xi_i$'s are uncorrelated. Our goal is to identify the first $L \leq K$ dominant patterns, $\phi_1(\cdot), \dots, \phi_L(\cdot)$, with relatively large $\lambda_1^*, \dots, \lambda_L^*$. Additionally, we are interested in estimating $C_\eta(\cdot, \cdot)$, which is essential for spatial prediction.

Let $Y = (Y_1, \dots, Y_n)'$ be the $n \times p$ data matrix. Throughout the paper, we assume that the mean of $Y$ is known to be zero, so the sample covariance matrix of $Y$ is $S = Y'Y/n$. A popular approach for estimating $\{\phi_k^*(\cdot)\}$ is PCA, which estimates $(\phi_k^*(s_1), \dots, \phi_k^*(s_p))'$ by $\tilde{\phi}_k$, the $k$-th eigenvector of $S$, for $k = 1, \dots, K$. Let $\tilde{\Phi} = \big(\tilde{\phi}_1, \dots, \tilde{\phi}_K\big)$ be the $p \times K$ matrix formed by the first $K$ principal component loadings. Then $\tilde{\Phi}$ solves the following constrained optimization problem:

$$\min_\Phi \|Y - Y\Phi\Phi'\|_F^2 \quad \text{subject to } \Phi'\Phi = I_K,$$

where $\Phi = (\phi_1, \dots, \phi_K)$ and $\|M\|_F = \big(\sum_{i,j} m_{ij}^2\big)^{1/2}$ is the Frobenius norm of a matrix $M$. Unfortunately, $\tilde{\Phi}$ tends to have high estimation variability when $p$ is large (leading to an excessive number of parameters), $n$ is small, or $\sigma^2$ is large. Consequently, the patterns of $\tilde{\Phi}$ may be too noisy to be physically interpretable. In addition, for a continuous spatial domain $D$, we also need to estimate the $\phi_k^*(s)$'s at locations with no data observed (i.e., $s \notin \{s_1, \dots, s_p\}$); see some discussion in Sections 12.4 and 13.6 of Jolliffe (2002).
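For reference, the snippet below is a minimal R sketch of this PCA baseline: it forms $S = Y'Y/n$ from a zero-mean data matrix and takes the leading $K$ eigenvectors of $S$ as $\tilde{\Phi}$. The dimensions and the simulated $Y$ are placeholders rather than data from the paper; with $p$ much larger than $n$, as here, these are exactly the noisy loadings that the regularization below is designed to stabilize.

```r
# A minimal sketch of the PCA baseline Phi-tilde: the leading K eigenvectors of
# S = Y'Y / n for a zero-mean n x p data matrix Y. The dimensions and the
# simulated Y are placeholders, not data from the paper.
set.seed(2)
n <- 50; p <- 200; K <- 2
Y <- matrix(rnorm(n * p), n, p)             # placeholder zero-mean data, p > n

S <- crossprod(Y) / n                       # sample covariance matrix (p x p)
Phi_tilde <- eigen(S, symmetric = TRUE)$vectors[, 1:K, drop = FALSE]

# Phi_tilde is orthonormal and minimizes ||Y - Y Phi Phi'||_F^2 over Phi'Phi = I_K.
round(crossprod(Phi_tilde), 10)             # = I_K
norm(Y - Y %*% Phi_tilde %*% t(Phi_tilde), "F")^2
```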
2.1. Regularized Spatial PCA

To prevent high estimation variability of PCA, we adopt a regularization approach by minimizing the following objective function:

$$\|Y - Y\Phi\Phi'\|_F^2 + \tau_1 \sum_{k=1}^K J(\phi_k) + \tau_2 \sum_{k=1}^K \sum_{j=1}^p \big|\phi_k(s_j)\big|, \tag{4}$$

over $\phi_1(\cdot), \dots, \phi_K(\cdot)$, subject to $\Phi'\Phi = I_K$ and $\phi_1' S \phi_1 \geq \phi_2' S \phi_2 \geq \cdots \geq \phi_K' S \phi_K$, where

$$J(\phi) = \sum_{z_1 + \cdots + z_d = 2} \int_{\mathbb{R}^d} \left(\frac{\partial^2 \phi(s)}{\partial x_1^{z_1} \cdots \partial x_d^{z_d}}\right)^2 ds$$

is a roughness penalty, $s = (x_1, \dots, x_d)'$, $\tau_1 \geq 0$ is a smoothness parameter, and $\tau_2 \geq 0$ is a sparseness parameter. The objective function (4) consists of two penalty terms. The first one is designed to enhance the smoothness of $\phi_k(\cdot)$ through the smoothing spline penalty $J(\phi_k)$, while the second one is the $L_1$ Lasso penalty (Tibshirani (1996)), used to promote sparse patterns by shrinking some PC loadings to zero. While the $L_1$ penalty alone may lead to isolated zero and nonzero components with no global feature, when it is paired with the smoothness penalty, local sparsity translates into global sparsity, resulting in connected zero and nonzero patterns. Hence the two penalty terms together lead to desired patterns that are not only smooth but also localized. Specifically, when $\tau_1$ is larger, the $\hat{\phi}_k(\cdot)$'s tend to be smoother, and vice versa. When $\tau_2$ is larger, the $\hat{\phi}_k(\cdot)$'s are forced to be zero at some $s \in D$. On the other hand, when both $\tau_1$ and $\tau_2$ are close to zero, the estimates are close to those obtained from PCA. By suitably choosing $\tau_1$ and $\tau_2$, we can obtain a good compromise among goodness of fit, smoothness of the eigenfunctions, and sparseness of the eigenfunctions, leading to more interpretable results. Note that, due to computational difficulty, the orthogonal constraint is not considered by many PCA regularization methods (e.g., Zou, Hastie and Tibshirani (2006), Shen and Huang (2008), Guo et al. (2010), Hong and Lian (2013)).

Although $J(\phi)$ involves integration, it is well known from the theory of smoothing splines (Green and Silverman (1994)) that for each $k = 1, \dots, K$, $\hat{\phi}_k(\cdot)$ has to be a natural cubic spline when $d = 1$, and a thin-plate spline when $d \in \{2, 3\}$, with nodes at $\{s_1, \dots, s_p\}$. Specifically,

$$\hat{\phi}_k(s) = \sum_{i=1}^p a_i\, g(\|s - s_i\|) + b_0 + \sum_{j=1}^d b_j x_j, \tag{5}$$

where $s = (x_1, \dots, x_d)'$,

$$g(r) = \begin{cases} \dfrac{1}{16\pi}\, r^2 \log r; & \text{if } d = 2,\\[1ex] \dfrac{\Gamma(d/2 - 2)}{16\pi^{d/2}}\, r^{4-d}; & \text{if } d = 1, 3,\end{cases}$$

and the coefficients $a = (a_1, \dots, a_p)'$ and $b = (b_0, b_1, \dots, b_d)'$ satisfy

$$\begin{pmatrix} G & E \\ E' & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \hat{\phi}_k \\ 0 \end{pmatrix}.$$

Here $G$ is a $p \times p$ matrix with the $(i, j)$-th element $g(\|s_i - s_j\|)$, and $E$ is a $p \times (d+1)$ matrix with the $i$-th row $(1, s_i')$. Consequently, $\hat{\phi}_k(\cdot)$ in (5) can be expressed in terms of $\hat{\phi}_k$. Additionally, the roughness penalty can also be written as

$$J(\phi_k) = \phi_k' \Omega \phi_k, \tag{6}$$

with $\Omega$ a known $p \times p$ matrix determined only by $s_1, \dots, s_p$. The reader is referred to Green and Silverman (1994) for more details regarding smoothing splines.

From (4) and (6), the proposed SpatPCA estimate of $\Phi$ can be written as:

$$\hat{\Phi}_{\tau_1, \tau_2} = \underset{\Phi:\,\Phi'\Phi = I_K}{\arg\min}\; \|Y - Y\Phi\Phi'\|_F^2 + \tau_1 \sum_{k=1}^K \phi_k' \Omega \phi_k + \tau_2 \sum_{k=1}^K \sum_{j=1}^p |\phi_{jk}|, \tag{7}$$

subject to $\phi_1' S \phi_1 \geq \phi_2' S \phi_2 \geq \cdots \geq \phi_K' S \phi_K$. The resulting estimates of $\phi_1(\cdot), \dots, \phi_K(\cdot)$ can be directly computed from (5). When no confusion may arise, we shall simply write $\hat{\Phi}_{\tau_1, \tau_2}$ as $\hat{\Phi}$. Note that the SpatPCA estimate of (7) reduces to a sparse PCA estimate of Zou, Hastie and Tibshirani (2006) if the orthogonal constraint is dropped and $\Omega = I$ (i.e., no spatial structure is considered).

The tuning parameters $\tau_1$ and $\tau_2$ are selected using $M$-fold cross-validation (CV). First, we partition $\{1, \dots, n\}$ into $M$ parts of sizes as equal as possible. Let $Y^{(m)}$ be the sub-matrix of $Y$ corresponding to the $m$-th part, for $m = 1, \dots, M$. For each part, we treat $Y^{(m)}$ as the validation data, and obtain the estimate $\hat{\Phi}_{\tau_1, \tau_2}^{(-m)}$ of $\Phi$ for $(\tau_1, \tau_2) \in A$ based on the remaining data $Y^{(-m)}$ using the proposed method, where $A \subset [0, \infty)^2$ is a candidate index set. The proposed CV criterion is given in terms of an average residual sum of squares:

$$CV_1(\tau_1, \tau_2) = \frac{1}{M} \sum_{m=1}^M \big\| Y^{(m)} - Y^{(m)} \hat{\Phi}_{\tau_1, \tau_2}^{(-m)} \big(\hat{\Phi}_{\tau_1, \tau_2}^{(-m)}\big)' \big\|_F^2, \tag{8}$$

where $Y^{(m)} \hat{\Phi}_{\tau_1, \tau_2}^{(-m)} \big(\hat{\Phi}_{\tau_1, \tau_2}^{(-m)}\big)'$ is the projection of $Y^{(m)}$ onto the column space of $\hat{\Phi}_{\tau_1, \tau_2}^{(-m)}$. The final $\tau_1$ and $\tau_2$ values are $(\hat{\tau}_1, \hat{\tau}_2) = \arg\min_{(\tau_1, \tau_2) \in A} CV_1(\tau_1, \tau_2)$.
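The following R sketch spells out criterion (8). Here fit_loadings() is a hypothetical placeholder standing in for the SpatPCA fit of (7) at a given $(\tau_1, \tau_2)$ (the actual fit is the ADMM procedure of Section 3); ordinary PCA is substituted only so that the sketch runs end to end.

```r
# A sketch of the M-fold cross-validation criterion CV_1 in (8) for choosing
# (tau1, tau2). fit_loadings() is a hypothetical placeholder for the SpatPCA
# fit of (7); plain PCA is substituted so the sketch runs end to end.
fit_loadings <- function(Y, tau1, tau2, K) {
  # placeholder: ignores (tau1, tau2) and returns ordinary PC loadings
  eigen(crossprod(Y) / nrow(Y), symmetric = TRUE)$vectors[, 1:K, drop = FALSE]
}

cv1 <- function(Y, tau1, tau2, K, M = 5) {
  n <- nrow(Y)
  fold <- sample(rep_len(1:M, n))           # partition {1, ..., n} into M parts
  rss <- 0
  for (m in 1:M) {
    Y_val <- Y[fold == m, , drop = FALSE]   # Y^(m), validation part
    Y_trn <- Y[fold != m, , drop = FALSE]   # Y^(-m), remaining data
    Phi_hat <- fit_loadings(Y_trn, tau1, tau2, K)
    rss <- rss + norm(Y_val - Y_val %*% Phi_hat %*% t(Phi_hat), "F")^2
  }
  rss / M                                   # average residual sum of squares
}

# Example (using the Y simulated in the previous snippet):
# A_grid <- expand.grid(tau1 = c(0, 1, 10), tau2 = c(0, 0.1, 1))
# scores <- mapply(function(t1, t2) cv1(Y, t1, t2, K = 2), A_grid$tau1, A_grid$tau2)
# A_grid[which.min(scores), ]
```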
2.2. Estimation of Spatial Covariance Function

To estimate $C_\eta(\cdot, \cdot)$ in (1), we also need to estimate the spatial covariance parameters, $\sigma^2$ and $\Lambda$. We apply the regularized least squares method of Tzeng and Huang (2015):

$$\big(\hat{\sigma}^2, \hat{\Lambda}\big) = \underset{(\sigma^2, \Lambda):\, \sigma^2 \geq 0,\, \Lambda \succeq 0}{\arg\min} \left\{ \frac{1}{2}\big\| S - \hat{\Phi}\Lambda\hat{\Phi}' - \sigma^2 I \big\|_F^2 + \gamma \big\|\hat{\Phi}\Lambda\hat{\Phi}'\big\|_* \right\}, \tag{9}$$

where $\gamma \geq 0$ is a tuning parameter, and $\|M\|_* = \mathrm{tr}\big((M'M)^{1/2}\big)$ is the nuclear norm of $M$. The first term of (9) corresponds to goodness of fit by noting that $\mathrm{var}(Y_i) = \Phi\Lambda\Phi' + \sigma^2 I$. The second term of (9) is a convex penalty, shrinking the eigenvalues of $\hat{\Phi}\Lambda\hat{\Phi}'$ to promote a low-rank structure and to avoid the eigenvalues being overestimated. By suitably choosing the tuning parameter $\gamma$, we can control the bias while reducing the estimation variability. This is particularly effective when $K$ is large.

Tzeng and Huang (2015) provide a closed-form solution for $\hat{\Lambda}$, but require an iterative procedure for obtaining $\hat{\sigma}^2$. We found that closed-form expressions for both $\hat{\sigma}^2$ and $\hat{\Lambda}$ are available; they are shown in the following proposition, with its proof given in the Appendix.

Proposition 1. The solutions of (9) are given by

$$\hat{\Lambda} = \hat{V} \mathrm{diag}\big(\hat{\lambda}_1^*, \dots, \hat{\lambda}_K^*\big) \hat{V}', \tag{10}$$

$$\hat{\sigma}^2 = \begin{cases} \dfrac{1}{p - \hat{L}} \left( \mathrm{tr}(S) - \displaystyle\sum_{k=1}^{\hat{L}} \big(\hat{d}_k - \gamma\big) \right); & \text{if } \hat{d}_1 > \gamma,\\[2ex] \dfrac{1}{p}\, \mathrm{tr}(S); & \text{if } \hat{d}_1 \leq \gamma, \end{cases} \tag{11}$$

where $\hat{V} \mathrm{diag}(\hat{d}_1, \dots, \hat{d}_K) \hat{V}'$ is the eigen-decomposition of $\hat{\Phi}' S \hat{\Phi}$ with $\hat{d}_1 \geq \cdots \geq \hat{d}_K$,

$$\hat{L} = \max\left\{ L : \hat{d}_L - \gamma > \frac{1}{p - L}\left( \mathrm{tr}(S) - \sum_{k=1}^L \big(\hat{d}_k - \gamma\big) \right),\; L = 1, \dots, K \right\}, \tag{12}$$

and $\hat{\lambda}_k^* = \max\big(\hat{d}_k - \hat{\sigma}^2 - \gamma,\, 0\big)$; $k = 1, \dots, K$.

With $\hat{\Lambda} = \big(\hat{\lambda}_{kk'}\big)_{K \times K}$ given by (9) and $\hat{\phi}_k(s)$ given by (5), the proposed estimate of $C_\eta(s, s^*)$ is

$$\hat{C}_\eta(s, s^*) = \sum_{k=1}^K \sum_{k'=1}^K \hat{\lambda}_{kk'}\, \hat{\phi}_k(s)\, \hat{\phi}_{k'}(s^*). \tag{13}$$

Then the proposed estimate of $(\phi_1^*(s), \dots, \phi_K^*(s))$ is

$$(\hat{\phi}_1^*(s), \dots, \hat{\phi}_K^*(s)) = (\hat{\phi}_1(s), \dots, \hat{\phi}_K(s))\, \hat{V}; \quad s \in D.$$

We consider $M$-fold CV to select $\gamma$. As in the previous section, we partition the data into $M$ parts, $Y^{(1)}, \dots, Y^{(M)}$. For $m = 1, \dots, M$, we estimate $\mathrm{var}\big(Y^{(-m)}\big)$ by $\hat{\Sigma}^{(-m)} = \hat{\Phi}^{(-m)} \hat{\Lambda}_\gamma^{(-m)} \big(\hat{\Phi}^{(-m)}\big)' + \big(\hat{\sigma}_\gamma^2\big)^{(-m)} I$ based on the remaining data $Y^{(-m)}$ obtained by removing $Y^{(m)}$ from $Y$, where $\hat{\Lambda}_\gamma^{(-m)}$, $\big(\hat{\sigma}_\gamma^2\big)^{(-m)}$ and $\hat{\Phi}^{(-m)}$ are the estimates of $\Lambda$, $\sigma^2$ and $\Phi$ based on $Y^{(-m)}$, and for notational simplicity, their dependence on the selected $(\tau_1, \tau_2)$ and $K$ is suppressed. The proposed CV criterion is given by

$$CV_2(K, \gamma) = \frac{1}{M} \sum_{m=1}^M \big\| S^{(m)} - \hat{\Phi}^{(-m)} \hat{\Lambda}_\gamma^{(-m)} \big(\hat{\Phi}^{(-m)}\big)' - \big(\hat{\sigma}_\gamma^2\big)^{(-m)} I \big\|_F^2, \tag{14}$$

where $S^{(m)} = \big(Y^{(m)}\big)' Y^{(m)} / n$. Then the $\gamma$ selected by $CV_2$ based on $K$ is $\hat{\gamma}_K = \arg\min_{\gamma \geq 0} CV_2(K, \gamma)$.

The dimension of the eigen-space, $K$, corresponding to the maximum rank of $\Phi\Lambda\Phi'$, could be selected by traditional approaches based on a given proportion of total variation explained or the scree plot of the sample eigenvalues. However, these approaches tend to be more subjective and may not be effective for the covariance estimation purpose. We propose to select $K$ using $CV_2$ of (14) by successively increasing the value of $K$ from $K = 1, 2, \dots$, until no further reduction of the CV value is obtained. Specifically, we select

$$\hat{K} = \min\{K : CV_2(K, \hat{\gamma}_K) \leq CV_2(K + 1, \hat{\gamma}_{K+1}),\; K = 1, 2, \dots\}. \tag{15}$$
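The closed-form solution in Proposition 1 is simple enough to transcribe directly. The R sketch below computes $\hat{\sigma}^2$, $\hat{L}$ and $\hat{\Lambda}$ from $\hat{\Phi}$, $S$ and $\gamma$ following (10)-(12); the function name and the any(ok) guard are our own additions, and the inputs are assumed to come from earlier steps.

```r
# A sketch of the closed-form estimates in Proposition 1: given Phi_hat, S and
# gamma, compute sigma2_hat via (11)-(12) and Lambda_hat via (10). The any(ok)
# guard is only a numerical safeguard, not part of the proposition.
estimate_covariance <- function(Phi_hat, S, gamma) {
  p <- nrow(S)
  K <- ncol(Phi_hat)
  eig <- eigen(t(Phi_hat) %*% S %*% Phi_hat, symmetric = TRUE)
  d <- eig$values                            # d_hat_1 >= ... >= d_hat_K
  V_hat <- eig$vectors
  tr_S <- sum(diag(S))

  cums <- cumsum(d - gamma)
  ok <- (d - gamma) > (tr_S - cums) / (p - seq_len(K))   # condition in (12)
  if (d[1] > gamma && any(ok)) {
    L_hat <- max(which(ok))                              # L_hat in (12)
    sigma2_hat <- (tr_S - cums[L_hat]) / (p - L_hat)     # (11), first case
  } else {
    L_hat <- 0
    sigma2_hat <- tr_S / p                               # (11), second case
  }

  lambda_star <- pmax(d - sigma2_hat - gamma, 0)         # truncated eigenvalues
  Lambda_hat <- V_hat %*% diag(lambda_star, nrow = K) %*% t(V_hat)   # (10)
  list(sigma2 = sigma2_hat, Lambda = Lambda_hat, L = L_hat, V = V_hat)
}

# Example (using Phi_tilde and S from the earlier PCA snippet in place of the
# SpatPCA estimate):
# est <- estimate_covariance(Phi_tilde, S, gamma = 0.1)
# est$sigma2; eigen(est$Lambda)$values
```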
3. Computation Algorithm

Solving (7) is a challenging problem, especially when both the orthogonal constraint and the $L_1$ penalty are involved simultaneously. Consequently, many regularized PCA approaches, such as sparse PCA (Zou, Hastie and Tibshirani, 2006), do not cope with the orthogonal constraint. We adopt the ADMM algorithm by decomposing the original constrained optimization problem into small subproblems that can be efficiently handled through an iterative procedure. This type of algorithm was developed early on in Gabay and Mercier (1976), and was systematically studied by Boyd et al. (2011) more recently.

First, the optimization problem of (7) is transformed into the following equivalent problem by adding a $p \times K$ parameter matrix $Q$:

$$\min_{\Phi, Q \in \mathbb{R}^{p \times K}} \|Y - Y\Phi\Phi'\|_F^2 + \tau_1 \sum_{k=1}^K \phi_k' \Omega \phi_k + \tau_2 \sum_{k=1}^K \sum_{j=1}^p |\phi_{jk}|, \tag{16}$$

subject to $Q'Q = I_K$, $\phi_1' S \phi_1 \geq \phi_2' S \phi_2 \geq \cdots \geq \phi_K' S \phi_K$, and a new constraint, $\Phi = Q$. Then the resulting constrained optimization problem of (16) is solved using the augmented Lagrangian method, with its Lagrangian given by

$$L(\Phi, Q, \Gamma) = \|Y - Y\Phi\Phi'\|_F^2 + \tau_1 \sum_{k=1}^K \phi_k' \Omega \phi_k + \tau_2 \sum_{k=1}^K \sum_{j=1}^p |\phi_{jk}| + \mathrm{tr}\big(\Gamma'(\Phi - Q)\big) + \frac{\rho}{2}\|\Phi - Q\|_F^2,$$

subject to $Q'Q = I_K$ and $\phi_1' S \phi_1 \geq \phi_2' S \phi_2 \geq \cdots \geq \phi_K' S \phi_K$, where $\Gamma$ is a $p \times K$ matrix of the Lagrange multipliers, and $\rho > 0$ is a penalty parameter to facilitate convergence. Note that the value of $\rho$ does not affect the original optimization problem. The ADMM algorithm iteratively updates one group of parameters at a time in both the primal and the dual spaces until convergence. Given the initial estimates, $Q^{(0)}$ and $\Gamma^{(0)}$, of $Q$ and $\Gamma$, our ADMM algorithm consists of the following steps at
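The preview ends before the explicit update formulas, so the sketch below shows only a generic ADMM skeleton in the spirit of Boyd et al. (2011), organized around the splitting $\Phi = Q$ in (16); it is not the paper's algorithm. The function update_Phi() is a hypothetical placeholder for the $\Phi$-subproblem, the $Q$-update is the standard projection onto the orthogonality constraint via an SVD, and the last step in the loop is the usual dual update of the multipliers $\Gamma$.

```r
# A generic ADMM skeleton around the splitting Phi = Q, in the spirit of
# Boyd et al. (2011). update_Phi() is a hypothetical placeholder; the actual
# SpatPCA update formulas are not shown in this preview.
admm_skeleton <- function(Y, Omega, tau1, tau2, K, rho = 1,
                          max_iter = 200, tol = 1e-6) {
  p <- ncol(Y)
  Q <- diag(p)[, 1:K, drop = FALSE]          # Q^(0): an orthonormal p x K start
  Gamma <- matrix(0, p, K)                   # Gamma^(0): Lagrange multipliers

  update_Phi <- function(Q, Gamma) {
    Q                                        # placeholder: minimize L(., Q, Gamma)
  }
  update_Q <- function(Phi, Gamma) {
    s <- svd(Phi + Gamma / rho)              # nearest matrix with Q'Q = I_K
    s$u %*% t(s$v)
  }

  for (iter in 1:max_iter) {
    Phi <- update_Phi(Q, Gamma)              # primal update for Phi
    Q_new <- update_Q(Phi, Gamma)            # primal update for Q
    Gamma <- Gamma + rho * (Phi - Q_new)     # dual (multiplier) update
    converged <- norm(Phi - Q_new, "F") < tol && norm(Q_new - Q, "F") < tol
    Q <- Q_new
    if (converged) break                     # stop when residuals are small
  }
  Q
}
```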
