Smooth Principal Component Analysis over two-dimensional manifolds with an application to Neuroimaging

Eardi Lila*¹,³, John A. D. Aston†², and Laura M. Sangalli‡³

¹Cambridge Centre for Analysis, University of Cambridge
²Statistical Laboratory, DPMMS, University of Cambridge
³MOX, Dipartimento di Matematica, Politecnico di Milano

arXiv:1601.03670v2 [stat.AP] 12 Sep 2016

*[email protected]  †[email protected]  ‡[email protected]

Abstract

Motivated by the analysis of high-dimensional neuroimaging signals located over the cortical surface, we introduce a novel Principal Component Analysis technique that can handle functional data located over a two-dimensional manifold. For this purpose a regularization approach is adopted, introducing a smoothing penalty coherent with the geodesic distance over the manifold. The model introduced can be applied to any manifold topology, and can naturally handle missing data and functional samples evaluated on different grids of points. We approach the discretization task by means of finite element analysis and propose an efficient iterative algorithm for its resolution. We compare the performance of the proposed algorithm with other approaches classically adopted in the literature. We finally apply the proposed method to resting-state functional magnetic resonance imaging data from the Human Connectome Project, where the method shows substantial differential variations between brain regions that were not apparent with other approaches.

1 Introduction

The recent growth of data arising from neuroimaging has led to profound changes in the understanding of the brain. Neuroimaging is a multidisciplinary activity and the role of statistics in its success should not be underestimated. Much of the work to date has been to determine how to use statistical models in the high-dimensional settings that arise out of such imaging modalities as functional Magnetic Resonance Imaging (fMRI) and Electroencephalography (EEG). However, it is becoming increasingly clear that there is now a need to incorporate more and more complex information about brain structure and function into the statistical analysis, to enhance our present understanding of the brain.

Considerable amounts of the brain signal captured, for example, by fMRI arise from the cerebral cortex. The cerebral cortex is the highly convoluted thin sheet where most neural activity is focused. It is natural to represent this thin sheet as a 2D surface embedded in a 3D space, structured with a 2D geodesic distance, rather than the 3D Euclidean distance within the volume. In fact, functionally distinct areas may be close to each other if measured with Euclidean distance but, due to the highly convoluted morphology of the cerebral cortex, their 2D geodesic distance along the cortical surface can be far greater. While early approaches to the analysis of hemodynamic signals ignore the morphology of the cortical surface, it has now been well established [Glasser et al. (2013) and references therein] that it is beneficial to analyze neuroimaging data through the processing of the signals on the cortical surface, using surface-constrained techniques. Classical tools such as non-parametric smoothing models have already been adapted to deal with this kind of data; see e.g. Chung et al. (2014).
The goal of the present paper is to introduce a novel Principal Component Analysis (PCA) technique suitable for working with functional signals distributed over curved domains, and specifically over two-dimensional smooth Riemannian manifolds such as the cortical surface. The cortical surface can be extracted from structural Magnetic Resonance Imaging (MRI), a non-invasive scanning technique used to visualize the internal structure of the brain, rendering it as a 3D image with high spatial resolution. The signal of interest, which we want to analyse with respect to the surface, comes from fMRI, which detects a Blood Oxygen Level Dependent (BOLD) signal [Ogawa et al. (1990)] as a series of repeated measurements in time, yielding a time series of 3D images. Increased neural activity in a particular area of the brain causes an increased demand for oxygen. As the fMRI signal is related to changes in the relative ratio of oxy- to deoxy-hemoglobin, due to their differing magnetic properties, the signal captured within an fMRI scan is considered to be a surrogate for neural activity and is used to produce activation maps or to investigate brain functional connectivity. The fMRI signal of each individual related to the neural activity in the cerebral cortex is generally mapped onto a common template cortical surface, to allow multi-subject statistical analysis. In this paper, in particular, we will focus our attention on functional connectivity (FC). FC maps, on the cortical surface, can be constructed by computing the pairwise correlation between each vertex's fMRI time-series and the mean time-series of a region of interest. The resulting FC map for each subject provides a clear view of the areas to which the region of interest is functionally connected.

In practice, the template cortical surface is represented by a triangulated surface that can be considered a discrete approximation of the underlying smooth compact two-dimensional Riemannian manifold $\mathcal{M} \subset \mathbb{R}^3$ modelling the cortical surface. Each resting-state FC map can be represented by a function $x_i : \mathcal{M} \to \mathbb{R}$. Once we have the correlation maps on the cortical surface, we want to study how the phenomenon varies from subject to subject. A statistical technique for this study is PCA. It is natural to contextualize this task in the framework of Functional Data Analysis [Ramsay and Silverman (2005)].

In Section 2 we establish the formal theoretical properties of Functional PCA (FPCA) in the case of random functions whose domain is a manifold $\mathcal{M}$. In Section 3 we introduce a novel FPCA model and propose an algorithm for its resolution. We then give some simulation results in Section 4, indicating the performance of our methodology as compared to other methods in the literature. We then return to the FC maps example in Section 5, to consider how the surface-based PCA analysis might be used in this case, and draw some concluding remarks in Section 6.

2 Functional principal component analysis

Consider the space of square-integrable functions on $\mathcal{M}$:
$$L^2(\mathcal{M}) = \Big\{f : \mathcal{M} \to \mathbb{R} : \int_{\mathcal{M}} |f(p)|^2 \, dp < \infty\Big\},$$
with inner product $\langle f, g\rangle_{\mathcal{M}} = \int_{\mathcal{M}} f(p)\,g(p)\,dp$ and norm $\|f\|_{\mathcal{M}} = \big(\int_{\mathcal{M}} |f(p)|^2\,dp\big)^{1/2}$. Consider the random variable $X$ with values in $L^2(\mathcal{M})$, mean $\mu = \mathbb{E}[X]$ and finite second moment, i.e. $\int_{\mathcal{M}} \mathbb{E}[X^2] < \infty$, and assume that its covariance function $K(p,q) = \mathbb{E}[(X(p)-\mu(p))(X(q)-\mu(q))]$ is square-integrable.
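All the inner products and norms above are integrals over $\mathcal{M}$; on the triangulated surfaces used later in the paper they must be approximated by quadrature. As a minimal sketch (our illustration, not part of the paper's method), assuming functions sampled at the mesh vertices and a simple lumped mass quadrature, the $L^2(\mathcal{M})$ inner product can be approximated as follows; `triangle_areas`, `lumped_mass` and `l2_inner` are hypothetical helper names:

```python
import numpy as np

def triangle_areas(verts, tris):
    # verts: (K, 3) vertex coordinates; tris: (T, 3) vertex indices.
    a = verts[tris[:, 1]] - verts[tris[:, 0]]
    b = verts[tris[:, 2]] - verts[tris[:, 0]]
    return 0.5 * np.linalg.norm(np.cross(a, b), axis=1)

def lumped_mass(verts, tris):
    # Lumped mass quadrature weights: each vertex receives one third of
    # the area of every triangle it belongs to.
    area = triangle_areas(verts, tris)
    w = np.zeros(len(verts))
    for k in range(3):
        np.add.at(w, tris[:, k], area / 3.0)
    return w

def l2_inner(f, g, w):
    # <f, g>_M  ~  sum_k f(xi_k) g(xi_k) w_k  for vertex samples f, g.
    return np.sum(f * g * w)
```

With `w = lumped_mass(verts, tris)`, the norm $\|f\|_{\mathcal{M}}$ is then approximated by `np.sqrt(l2_inner(f, f, w))`.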
Mercer's Lemma [Riesz and Sz.-Nagy (1955)] guarantees the existence of a non-increasing sequence $(\kappa_j)$ of eigenvalues of $K$ and an orthonormal sequence of corresponding eigenfunctions $(\psi_j)$, such that

$$\int_{\mathcal{M}} K(p,q)\,\psi_j(p)\,dp = \kappa_j \psi_j(q), \qquad \forall q \in \mathcal{M}, \tag{1}$$

and that $K(p,q)$ can be written as $K(p,q) = \sum_{j=1}^{\infty} \kappa_j \psi_j(p)\psi_j(q)$ for each $p,q \in \mathcal{M}$. Thus $X$ can be expanded as $X = \mu + \sum_{j=1}^{\infty} \varepsilon_j \psi_j$, where the random variables $\varepsilon_1, \varepsilon_2, \ldots$ are uncorrelated and are given by $\varepsilon_j = \int_{\mathcal{M}} \{X(p)-\mu(p)\}\psi_j(p)\,dp$. This is also known as the Karhunen-Loève (KL) expansion of $X$.

The collection $(\psi_j)$ defines the strongest modes of variation in the random function $X$, and these are called Principal Component (PC) functions. In fact, $\psi_1$ is such that

$$\psi_1 = \operatorname*{argmax}_{\phi:\, \|\phi\|_{\mathcal{M}}=1} \int_{\mathcal{M}}\int_{\mathcal{M}} \phi(p)\,K(p,q)\,\phi(q)\,dp\,dq,$$

while $\psi_m$, for $m > 1$, solves an analogous problem with the added constraint of $\psi_m$ being orthogonal to the previous $m-1$ functions $\psi_1, \ldots, \psi_{m-1}$, i.e.

$$\psi_m = \operatorname*{argmax}_{\substack{\phi:\, \|\phi\|_{\mathcal{M}}=1 \\ \langle\phi,\psi_j\rangle_{\mathcal{M}}=0,\; j=1,\ldots,m-1}} \int_{\mathcal{M}}\int_{\mathcal{M}} \phi(p)\,K(p,q)\,\phi(q)\,dp\,dq.$$

The random variables $\varepsilon_1, \varepsilon_2, \ldots$ are called PC scores.

Another important property of PC functions is best $M$-basis approximation: for any fixed $M \in \mathbb{N}$, the first $M$ PC functions of $X$ satisfy

$$(\psi_m)_{m=1}^{M} = \operatorname*{argmin}_{\{\phi_m\}_{m=1}^{M}:\, \langle\phi_m,\phi_l\rangle = \delta_{ml}} \mathbb{E}\int_{\mathcal{M}} \Big\{X - \mu - \sum_{m=1}^{M}\langle X, \phi_m\rangle \phi_m\Big\}^2, \tag{2}$$

where $\delta_{ml}$ is the Kronecker delta, i.e. $\delta_{ml} = 1$ for $m = l$ and $0$ otherwise.

Suppose $x_1, \ldots, x_n$ are $n$ smooth samples from $X$. Usually, for each of these functions, only noisy evaluations $x_i(p_j)$ on a fixed discrete grid of points $p_1, \ldots, p_s$ are given. In this setting, we will now recall the two standard approaches to FPCA: the pre-smoothing approach and the regularized-PCA approach.

The pre-smoothing approach is based on the two following steps. In the first step, the observations associated with each function are smoothed, in order to obtain smooth representations of $x_1, \ldots, x_n$. Then, the sample mean $\bar{x} = n^{-1}\sum_i x_i$ and the sample covariance $\hat{K}(p,q) = \frac{1}{n}\sum_{i=1}^{n}(x_i(p)-\bar{x}(p))(x_i(q)-\bar{x}(q))$ are used to estimate $\mu$ and $K$ respectively. Finally, the estimates of the PC functions $\hat{\psi}_1, \hat{\psi}_2, \ldots$ are computed through the characterization $\int_{\mathcal{M}} \hat{K}(p,q)\hat{\psi}_j(p)\,dp = \hat{\kappa}_j\hat{\psi}_j(q)$, which is solved by discretization of the problem on a fine grid or by basis expansion of the estimated smooth functions. In the case where the domain is an interval of the real line, a theoretical study of the accuracy of $\hat{\psi}_j$ as an estimate of $\psi_j$ is offered, for example, in Hall and Hosseini-Nasab (2006).

Define the $n \times s$ matrix $\mathbf{X} = (x_i(p_j))$, the column vector $\boldsymbol{\mu} = \big(\frac{1}{n}\sum_{i=1}^{n} x_i(p_j)\big)$ of length $s$, the $n \times M$ matrix $\mathbf{A} = (\langle X_i, \phi_m\rangle)$ and the $s \times M$ matrix $\boldsymbol{\Phi} = (\phi_m(p_j))$. Let $\mathbf{1}$ denote the column vector of length $n$ with all entries equal to 1. The empirical counterpart of the objective function in (2) becomes

$$\frac{1}{n}\,\|\mathbf{X} - \mathbf{1}\boldsymbol{\mu}^T - \mathbf{A}\boldsymbol{\Phi}^T\|_F^2, \tag{3}$$

where $\|\cdot\|_F$ is the Frobenius norm, defined as the square root of the sum of the squares of the matrix entries. This last formulation gives a natural way to deal with the fact that only pointwise and noisy evaluations $x_i(p_j)$, $i = 1,\ldots,n$, $j = 1,\ldots,s$, of the underlying functional samples are usually available. However, it does not incorporate any information on the smoothness of the functional data.
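Before turning to that smoothness question, the following minimal NumPy sketch (ours, on synthetic data) shows the objective (3) being minimized by a truncated SVD, anticipating the characterization recalled next:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, M = 50, 200, 3                      # subjects, grid points, components

X = rng.standard_normal((n, s))           # stand-in for the data matrix (x_i(p_j))
mu = X.mean(axis=0)                       # pointwise sample means
U, D, Vt = np.linalg.svd(X - mu, full_matrices=False)

A_hat = U[:, :M] * D[:M]                  # scores,   n x M
Phi_hat = Vt[:M].T                        # loadings, s x M (orthonormal columns)

# Objective (3): squared Frobenius norm of the rank-M residual, scaled by 1/n.
residual = (X - mu) - A_hat @ Phi_hat.T
print(np.linalg.norm(residual, "fro") ** 2 / n)
```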
In fact, considering the Singular Value Decomposition (SVD) of $\mathbf{X} - \mathbf{1}\boldsymbol{\mu}^T = \mathbf{U}\mathbf{D}\mathbf{V}^T$, it can be shown that the minimizing arguments of (3) are $\hat{\boldsymbol{\Phi}} = \mathbf{V}$ and $\hat{\mathbf{A}} = \mathbf{U}\mathbf{D}$; the formulation thus obtained is a multivariate PCA applied to the data matrix $\mathbf{X}$.

The regularized-PCA approach consists in adding a penalization term to the classic formulation of PCA, in order to recover a desired feature of the estimated underlying functions. In particular, the formulation (3) has shown great flexibility for this purpose. Examples of models where a sparseness property is assumed on the data are offered, for instance, in Jolliffe et al. (2003); Zou and Hastie (2005); Shen and Huang (2008). In the specific case of functional data analysis, the penalization term usually encourages the PC functions to be smooth. Examples of PCA models that explicitly incorporate a smoothing penalization term are given by Rice and Silverman (1991); Silverman (1996); Huang et al. (2008). The cited works deal with functions whose domain is a bounded interval in $\mathbb{R}$; in particular, our proposal can be seen as an extension of Huang et al. (2008) to the case of functions whose domain is a two-dimensional manifold. Zhou and Pan (2014) recently proposed a smooth FPCA for two-dimensional functions on irregular planar domains; their approach is based on a mixed-effects model that specifies the PC functions as bivariate splines on triangulations and the PC scores as random effects.

Here we propose an FPCA model that can handle real functions observable on a two-dimensional manifold. We shall consider a smoothing penalty operator coherent with the 2D geodesic distances on the manifold. This leads to the definition of a model that can fully exploit the information about the geometry of the manifold.

3 Smooth FPCA over two-dimensional manifolds

3.1 Geometric concepts

We first introduce the essential geometric concepts that allow the definition of the Laplace-Beltrami operator, which plays a central role in the proposed model. In detail, let the bijective and smooth function $\varphi : U \subset \mathbb{R}^2 \to \mathbb{R}^3$ be a local parametrization of $\mathcal{M}$ around the point $p \in \mathcal{M}$, as depicted in Figure 1. Let $\boldsymbol{\theta} \in U$ be such that $\boldsymbol{\theta} = \varphi^{-1}(p)$; then

$$\Big\{\frac{\partial\varphi}{\partial\theta_i}(\boldsymbol{\theta})\Big\}_{i=1,2} \tag{4}$$

defines a basis for the tangent space $T_p\mathcal{M}$ at the point $p$.

The Riemannian manifold $\mathcal{M}$ can be equipped with a metric by defining a scalar product $g_p$ on the tangent space $T_p\mathcal{M}$. This enables, for instance, the computation of lengths of curves or of integrals on the surface. Fixing the reference system on the tangent plane with the basis (4), we can represent $g_p$ as the matrix $\mathbf{G} = (g_{ij})_{i,j=1,2}$ such that

$$g_p(v, w) = \sum_{i,j=1}^{2} g_{ij}\, v_i w_j$$

for all $v = \sum_i v_i \frac{\partial\varphi}{\partial\theta_i}(\boldsymbol{\theta})$ and $w = \sum_i w_i \frac{\partial\varphi}{\partial\theta_i}(\boldsymbol{\theta})$. In our case it is natural to consider the scalar product induced by the Euclidean embedding space $\mathbb{R}^3$, i.e. the first fundamental form

$$g_{ij}(\boldsymbol{\theta}) = \frac{\partial\varphi}{\partial\theta_i}(\boldsymbol{\theta}) \cdot \frac{\partial\varphi}{\partial\theta_j}(\boldsymbol{\theta}),$$

where $\cdot$ denotes the inner product in $\mathbb{R}^3$. Moreover, we denote by $\mathbf{G}^{-1} = (g^{ij})_{i,j=1,2}$ the inverse of the matrix $\mathbf{G}$ and by $g = \det(\mathbf{G})$ the determinant of the matrix $\mathbf{G}$.

Let now $f : \mathcal{M} \to \mathbb{R}$ be a real-valued, twice-differentiable function on the manifold $\mathcal{M}$. Let $F = f \circ \varphi$; then the gradient $\nabla_{\mathcal{M}} f$ is defined as

$$(\nabla_{\mathcal{M}} f)(p) = \sum_{i,j=1}^{2} g^{ij}(\boldsymbol{\theta})\,\frac{\partial F}{\partial\theta_j}(\boldsymbol{\theta})\,\frac{\partial\varphi}{\partial\theta_i}(\boldsymbol{\theta}).$$

In the case of a flat manifold $\mathcal{M}$, the last expression reduces to the expression of the gradient in $\mathbb{R}^2$, i.e. $\nabla = \big(\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}\big)$.
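As a concrete illustration of these objects (ours, not the paper's), the first fundamental form and the surface gradient can be evaluated numerically for a simple graph surface $\varphi(\theta_1,\theta_2) = (\theta_1, \theta_2, \sin\theta_1\cos\theta_2)$; all names below are illustrative:

```python
import numpy as np

def phi_jac(t1, t2):
    # Columns are the tangent basis vectors d(phi)/d(theta_i) of (4)
    # for the graph surface phi(t1, t2) = (t1, t2, sin(t1)cos(t2)).
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [np.cos(t1) * np.cos(t2), -np.sin(t1) * np.sin(t2)]])

def surface_gradient(t1, t2, dF):
    # dF: partial derivatives (dF/dt1, dF/dt2) of F = f o phi at (t1, t2).
    J = phi_jac(t1, t2)
    G = J.T @ J                     # first fundamental form g_ij
    coeff = np.linalg.solve(G, dF)  # applies g^{ij} to dF/dtheta_j
    return J @ coeff                # tangent vector of R^3: (grad_M f)(p)

# For f equal to the height function, F(t1, t2) = sin(t1)cos(t2):
t1, t2 = 0.3, 0.7
dF = np.array([np.cos(t1) * np.cos(t2), -np.sin(t1) * np.sin(t2)])
print(surface_gradient(t1, t2, dF))
```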
The Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ is a generalization to surfaces of the standard Laplacian defined on $\mathbb{R}^n$, i.e. $\Delta = \sum_{i=1}^{n}\frac{\partial^2}{\partial\theta_i^2}$. It is related to the second partial derivatives of $f$ on $\mathcal{M}$, i.e. its local curvature, and it is defined as

$$(\Delta_{\mathcal{M}} f)(p) = \frac{1}{\sqrt{g(\boldsymbol{\theta})}}\sum_{i,j=1}^{2}\frac{\partial}{\partial\theta_i}\Big(\sqrt{g(\boldsymbol{\theta})}\; g^{ij}\,\frac{\partial F}{\partial\theta_j}(\boldsymbol{\theta})\Big).$$

Figure 1: A pictorial representation of the geometric objects modelling the idealized cortical surface $\mathcal{M}$.

The operator so defined is invariant with respect to rigid transformations of the reference system on $U$.

3.2 Model

Suppose now the sample of $n$ functions $x_i : \mathcal{M} \to \mathbb{R}$ is observed at a fixed set of points $p_1, \ldots, p_s$ in $\mathcal{M}$ (this will be relaxed later). Let $\mathbf{u} = \{u_i\}_{i=1,\ldots,n}$ be an $n$-dimensional real column vector. We propose to estimate the first PC function $\hat{f} : \mathcal{M} \to \mathbb{R}$ and the associated PC-scores vector $\hat{\mathbf{u}}$ by solving

$$(\hat{\mathbf{u}}, \hat{f}) = \operatorname*{argmin}_{\mathbf{u}, f}\; \sum_{i=1}^{n}\sum_{j=1}^{s}\big(x_i(p_j) - u_i f(p_j)\big)^2 + \lambda\,\mathbf{u}^T\mathbf{u}\int_{\mathcal{M}} (\Delta_{\mathcal{M}} f)^2, \tag{5}$$

where the Laplace-Beltrami operator is integrated over the manifold $\mathcal{M}$, enabling a global roughness penalty on $f$, while the empirical term encourages $f$ to capture the strongest mode of variation. The parameter $\lambda$ controls the trade-off between the empirical term of the objective function and the roughness-penalizing term. The $\mathbf{u}^T\mathbf{u}$ term is justified by some invariance considerations on the objective function, as done in the case of one-dimensional domains in Huang et al. (2008). Consider the transformation $(\mathbf{u} \to c\mathbf{u},\ f \to \frac{1}{c}f)$, with $c$ a constant, and the transformation $(\mathbf{X} \to c\mathbf{X},\ \mathbf{u} \to c\mathbf{u})$, where $\mathbf{X} = (x_i(p_j))$. The objective function in (5) is invariant with respect to the first transformation, while the empirical and smoothness terms are re-scaled by the same coefficient under the second transformation.

The subsequent PCs can be extracted sequentially by removing the preceding estimated components from the data matrix $\mathbf{X}$. This allows the selection of a different penalization parameter $\lambda$ for each PC estimate. We will refer to the model introduced as Smooth Manifold FPCA (SM-FPCA).

3.3 Iterative algorithm

Here we present the numerical algorithm for the resolution of the model introduced above. Our approach to the minimization of the functional in (5) can be summarized in the following two steps:

• splitting the optimization into a finite-dimensional optimization in $\mathbf{u}$ and an infinite-dimensional optimization in $f$;

• approximating the infinite-dimensional solution by means of a Surface Finite Element discretization.

Let $\mathbf{f}_s$ be the vector of length $s$ such that $\mathbf{f}_s = (f(p_1), \ldots, f(p_s))^T$. The expression in (5) can be rewritten as

$$(\hat{\mathbf{u}}, \hat{f}) = \operatorname*{argmin}_{\mathbf{u}, f}\; \|\mathbf{X} - \mathbf{u}\mathbf{f}_s^T\|_F^2 + \lambda\,\mathbf{u}^T\mathbf{u}\int_{\mathcal{M}}(\Delta_{\mathcal{M}} f)^2. \tag{6}$$

A normalization constraint must be considered in this minimization problem to make the representation unique, since multiplying $\mathbf{u}$ by a constant and dividing $\mathbf{f}_s$ by the same constant does not change the objective function (6). In particular we set the constraint $\|\mathbf{u}\|_2 = 1$, as this allows us to leave the infinite-dimensional optimization in $f$ unconstrained.

Our proposal for the minimization of the criterion (6) is to alternate the minimization in $\mathbf{u}$ and in $f$ in an iterative algorithm:

Step 1: Estimation of $\mathbf{u}$ given $f$. For a given $f$, the minimizing $\mathbf{u}$ of the objective function in (6) is

$$\mathbf{u} = \frac{\mathbf{X}\mathbf{f}_s}{\|\mathbf{f}_s\|_2^2 + \lambda\int_{\mathcal{M}}(\Delta_{\mathcal{M}} f)^2}, \tag{7}$$

and the minimizing unitary-norm vector is

$$\mathbf{u} = \frac{\mathbf{X}\mathbf{f}_s}{\|\mathbf{X}\mathbf{f}_s\|_2}. \tag{8}$$

Step 2: Estimation of $f$ given $\mathbf{u}$. For a given $\mathbf{u}$, solving (6) with respect to $f$ is equivalent to finding the $f$ that minimizes

$$J_{\lambda,\mathbf{u}}(f) = \mathbf{f}_s^T\mathbf{f}_s + \lambda\int_{\mathcal{M}}(\Delta_{\mathcal{M}} f)^2 - 2\,\mathbf{f}_s^T\mathbf{X}^T\mathbf{u}. \tag{9}$$
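Taken together, the two steps suggest the following alternating scheme, sketched here with the Step 2 solver left abstract; function and variable names are our own illustrative choices, not the paper's:

```python
import numpy as np

def sm_fpca_first_pc(X, smooth_on_manifold, lam, n_iter=20):
    """Alternating minimization of (6) for the first PC (sketch).

    X                  : (n, s) data matrix (x_i(p_j))
    smooth_on_manifold : callable (z, lam) -> f_s solving the penalized
                         smoothing problem (10) for observations z at
                         the points p_1, ..., p_s
    lam                : smoothing parameter lambda
    """
    # Initialize f_s with the leading right singular vector of X.
    f_s = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        # Step 1, eq. (8): unit-norm scores given the PC function values.
        u = X @ f_s
        u /= np.linalg.norm(u)
        # Step 2, eqs. (9)-(10): smooth the back-projected data z = X^T u.
        f_s = smooth_on_manifold(X.T @ u, lam)
    return u, f_s
```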
Step 1 is basically the classical expression of the score vector given the loadings vector, where in this case the loadings vector is given by $\mathbf{f}_s$, the evaluations of the PC function at $p_1, \ldots, p_s$. The problem in Step 2 is not trivial, consisting in an infinite-dimensional minimization problem. Let $z_j$ denote the $j$th element of the vector $\mathbf{X}^T\mathbf{u}$; then minimizing the functional in (9) is equivalent to minimizing

$$\sum_{j=1}^{s}\big(z_j - f(p_j)\big)^2 + \lambda\int_{\mathcal{M}}(\Delta_{\mathcal{M}} f)^2. \tag{10}$$

This problem involves estimating a smooth field $f$ defined on a manifold, starting from noisy observations $z_j$ at the points $p_j$. In the case of real functions defined on the real line, adopting a penalty of the form $\lambda\int (f'')^2$, the minimization problem turns out to have a finite-dimensional closed-form solution, a cubic spline [Green and Silverman (1993)]. For real functions defined on a Euclidean space, cubic splines are generalized by thin-plate splines. In this case, for an appropriate smoothing penalty, the solution of the minimization problem can be expressed as a finite linear combination of radial basis functions [Duchon (1977)]. However, the case of real functions defined on a non-Euclidean domain $\mathcal{M}$ is more involved. In the special case where $\mathcal{M}$ is a sphere or a sphere-like surface, that is $\mathcal{M} = \{\sigma(v) = \rho(v)v : v \in S\}$ where $S \subset \mathbb{R}^3$ is the unit sphere centered at the origin, this smoothing problem has been considered, among others, by Wahba (1981) and Alfeld et al. (1996). Moreover, the functional (10) is considered, among others, by Ettinger et al. (2016) and Dassi et al. (2015), where $\mathcal{M}$ is respectively a manifold homeomorphic to an open-ended cylinder and a manifold homeomorphic to a sphere. In the latter two works the field $f$ is estimated by first conformally recasting the problem to a planar domain and then discretizing it by means of planar finite elements, generalizing the planar smoothing model in Ramsay (2002). Our approach is also based on a Finite Element (FE) discretization but, differently from Ettinger et al. (2016) and Dassi et al. (2015), we construct here an FE space directly on the triangulated surface $\mathcal{M}_T$ that approximates the manifold $\mathcal{M}$, i.e. we use surface FE, avoiding any flattening step and thereby allowing the formulation to be applicable to any manifold topology.

3.4 Surface Finite Element discretization

Assume, for clarity of exposition only, that $\mathcal{M}$ is a closed surface, as in our motivating application. The case of non-closed surfaces can be handled by considering appropriate boundary conditions, as done for instance in the planar case in Sangalli et al. (2013). Consider the linear functional space $H^2(\mathcal{M})$, the space of functions in $L^2(\mathcal{M})$ with first and second weak derivatives in $L^2(\mathcal{M})$. The infinite-dimensional part of the estimation problem can be reformulated as follows: find $\hat{f} \in H^2(\mathcal{M})$ such that

$$\hat{f} = \operatorname*{argmin}_{f \in H^2(\mathcal{M})} J_{\lambda,\mathbf{u}}(f). \tag{11}$$

Proposition 1. The solution $\hat{f} \in H^2(\mathcal{M})$ exists, is unique, and is such that

$$\sum_{j=1}^{s}\varphi(p_j)\hat{f}(p_j) + \lambda\int_{\mathcal{M}}\Delta_{\mathcal{M}}\varphi\,\Delta_{\mathcal{M}}\hat{f} = \sum_{j=1}^{s}\varphi(p_j)\sum_{i=1}^{n} x_i(p_j)\,u_i \tag{12}$$

for every $\varphi \in H^2(\mathcal{M})$.

As detailed in the Supplementary Material, the key idea is to minimize $J_{\lambda,\mathbf{u}}(f)$ by differentiating this functional with respect to $f$. This leads to (12), which characterizes the estimate $\hat{f}$ as the solution of a linear fourth-order problem.

Consider now a triangulated surface $\mathcal{M}_T$, union of the finite set of triangles $T$, giving an approximated representation of the manifold $\mathcal{M}$.
Figure 2, for instance, shows the triangulated surface approximating the left hemisphere of a template brain. We then consider the linear finite element space $V$ consisting of globally continuous functions over $\mathcal{M}_T$ that are linear affine where restricted to any triangle $\tau$ in $T$, i.e.

$$V = \{v \in C^0(\mathcal{M}_T) : v|_\tau \text{ is linear affine for each } \tau \in T\}.$$

Figure 2: The triangulated surface approximating the left hemisphere of the template brain. The mesh is composed of 32K nodes and 64K triangles.

This space is spanned by the nodal basis $\psi_1, \ldots, \psi_K$ associated with the nodes $\xi_1, \ldots, \xi_K$, corresponding to the vertices of the triangulation $\mathcal{M}_T$. Such basis functions are Lagrangian, meaning that $\psi_i(\xi_j) = 1$ if $i = j$ and $\psi_i(\xi_j) = 0$ otherwise. Setting $\mathbf{f} = (f(\xi_1), \ldots, f(\xi_K))^T$ and $\boldsymbol{\psi} = (\psi_1, \ldots, \psi_K)^T$, every function $f \in V$ has the form

$$f(p) = \sum_{k=1}^{K} f(\xi_k)\,\psi_k(p) = \mathbf{f}^T\boldsymbol{\psi}(p) \tag{13}$$

for each $p \in \mathcal{M}_T$. The surface finite element space provides a finite-dimensional subspace of $H^1(\mathcal{M})$ [Dziuk (1988)]. To use this finite element space to discretize the infinite-dimensional problem (12), which is well posed in $H^2(\mathcal{M})$, we first need a reformulation of (12) that involves only first-order derivatives. This can be obtained by introducing an auxiliary function $g$ that plays the role of $\Delta_{\mathcal{M}} f$, splitting equation (12) into a coupled system of second-order problems, and finally integrating the second-order terms by parts. The details of this derivation can be found in the Supplementary Material. The discrete estimators $\hat{f}_h, \hat{g}_h \in V$ are then obtained by solving

$$\begin{cases}\displaystyle \int_{\mathcal{M}_T}\nabla_{\mathcal{M}_T}\hat{f}_h\cdot\nabla_{\mathcal{M}_T}\varphi_h - \int_{\mathcal{M}_T}\hat{g}_h\,\varphi_h = 0 \\[2ex] \displaystyle \lambda\int_{\mathcal{M}_T}\nabla_{\mathcal{M}_T}\hat{g}_h\cdot\nabla_{\mathcal{M}_T}v_h + \sum_{j=1}^{s}\hat{f}_h(p_j)\,v_h(p_j) = \sum_{j=1}^{s}v_h(p_j)\sum_{i=1}^{n}x_i(p_j)\,u_i\end{cases} \tag{14}$$

for all $\varphi_h, v_h \in V$. Define the $s \times K$ matrix $\boldsymbol{\Psi} = (\psi_k(p_j))$ and the $K \times K$ matrices $\mathbf{R}_0 = \int_{\mathcal{M}_T}\boldsymbol{\psi}\boldsymbol{\psi}^T$ and $\mathbf{R}_1 = \int_{\mathcal{M}_T}(\nabla_{\mathcal{M}_T}\boldsymbol{\psi})(\nabla_{\mathcal{M}_T}\boldsymbol{\psi})^T$. Then, exploiting the representation (13) of functions in $V$, we can rewrite (14) as a linear system. Specifically, the Finite Element solution $\hat{f}_h(p)$ of the discrete counterpart (14) is given by $\hat{f}_h(p) = \boldsymbol{\psi}(p)^T\hat{\mathbf{f}}$, where $\hat{\mathbf{f}}$ is the solution of

$$\begin{bmatrix}\boldsymbol{\Psi}^T\boldsymbol{\Psi} & \lambda\mathbf{R}_1 \\ \lambda\mathbf{R}_1 & -\lambda\mathbf{R}_0\end{bmatrix}\begin{bmatrix}\hat{\mathbf{f}} \\ \hat{\mathbf{g}}\end{bmatrix} = \begin{bmatrix}\boldsymbol{\Psi}^T\mathbf{X}^T\mathbf{u} \\ \mathbf{0}\end{bmatrix}. \tag{15}$$

Solving (15) leads to

$$\hat{\mathbf{f}} = (\boldsymbol{\Psi}^T\boldsymbol{\Psi} + \lambda\mathbf{R}_1\mathbf{R}_0^{-1}\mathbf{R}_1)^{-1}\boldsymbol{\Psi}^T\mathbf{X}^T\mathbf{u}. \tag{16}$$

Although this last formula is a compact expression of the solution, it is preferable to compute the solution from the linear system (15), due to the sparsity of the matrix on the left-hand side. As an example, in the simulations and the application shown in Sections 4 and 5, respectively less than 1% and less than 0.1% of the entries of the matrix on the left-hand side of (15) are different from zero, allowing a very efficient solution of the linear system.
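To fix ideas, Step 2 can be implemented from the sparse system (15) roughly as follows; this is a sketch under the assumption that $\boldsymbol{\Psi}$, $\mathbf{R}_0$ and $\mathbf{R}_1$ have already been assembled from the mesh (all helper and variable names are ours):

```python
import numpy as np
from scipy.sparse import bmat
from scipy.sparse.linalg import spsolve

def solve_smoothing_step(Psi, R0, R1, X, u, lam):
    """Solve (15) for the FE coefficients f_hat of the PC function.

    Psi : (s, K) sparse matrix of basis evaluations psi_k(p_j)
    R0  : (K, K) sparse mass matrix,      int psi psi^T
    R1  : (K, K) sparse stiffness matrix, int (grad psi)(grad psi)^T
    X   : (n, s) data matrix; u : (n,) current unit-norm score vector
    """
    K = R0.shape[0]
    rhs_top = Psi.T @ (X.T @ u)                    # Psi^T X^T u
    lhs = bmat([[Psi.T @ Psi, lam * R1],
                [lam * R1,   -lam * R0]], format="csc")
    rhs = np.concatenate([rhs_top, np.zeros(K)])
    sol = spsolve(lhs, rhs)                        # sparse direct solve
    return sol[:K]                                 # f_hat, as in (16)
```

Solving the sparse block system rather than forming the dense inverse in (16) is precisely the efficiency point made above.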
In the model introduced, we assume that all the observed functions $x_i$ are sampled on a common set of points $p_1, \ldots, p_s \in \mathcal{M}$. Suppose moreover that $p_1, \ldots, p_s$ coincide with the vertices of the triangulated surface $\mathcal{M}_T$. In this particular case, an alternative approach could consist of interpreting the points $p_1, \ldots, p_s$ as the nodes of a graph linked by the edges of the triangulation and considering the model (5) with a discrete smoothness-operator term instead of the Laplace-Beltrami operator (see e.g. Belkin and Niyogi (2001) for the choice of the penalization term and Cai et al. (2011) for an application to matrix decomposition). However, thanks to its functional nature, the formulation (5) can be easily extended to the case of missing data or sparsely sampled functional data.

Specifically, suppose now that each function $x_i$ is observable on its own set of points $p_1^i, \ldots, p_{s_i}^i$; then the natural extension of the model (5) becomes

$$(\hat{\mathbf{u}}, \hat{f}) = \operatorname*{argmin}_{\mathbf{u}, f}\; \sum_{i=1}^{n}\sum_{j=1}^{s_i}\big(x_i(p_j^i) - u_i f(p_j^i)\big)^2 + \lambda\,\mathbf{u}^T\mathbf{u}\int_{\mathcal{M}}(\Delta_{\mathcal{M}} f)^2. \tag{17}$$

Following the same procedure, we can define an analogous algorithm based on the following two steps.

Step 1: For a given $f$, the unitary-norm vector $\mathbf{u}$ minimizing (17) is given by

$$u_i = \frac{\sum_{j=1}^{s_i} x_i(p_j^i)\,f(p_j^i)}{\sqrt{\sum_{i=1}^{n}\Big(\sum_{j=1}^{s_i} x_i(p_j^i)\,f(p_j^i)\Big)^2}}.$$
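In code, this version of Step 1 amounts to normalizing the vector of per-subject inner products; a minimal sketch with hypothetical list containers (the paper itself works at the continuous level):

```python
import numpy as np

def scores_missing_data(xs, fs):
    """Step 1 of the extended algorithm for model (17).

    xs : list of length n; xs[i] holds the observations x_i(p^i_j)
    fs : list of length n; fs[i] holds the PC function values f(p^i_j)
         at subject i's own sampling points
    """
    u = np.array([np.dot(x, f) for x, f in zip(xs, fs)])
    return u / np.linalg.norm(u)  # unit Euclidean norm, as in the display above
```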
