Bernoulli 17(4), 2011, 1386–1399
DOI: 10.3150/10-BEJ317
arXiv:1201.1146v1 [math.ST] 5 Jan 2012

Tensor-based projection depth

YONGGANG HU*, YONG WANG** and YI WU†

Department of Mathematics and Systems Science, National University of Defense Technology, Changsha 410073, China.
E-mail: *[email protected]; **[email protected]; †wuyi [email protected]

The conventional definition of a depth function is vector-based. In this paper, a novel projection depth (PD) technique directly based on tensors, such as matrices, is instead proposed. Tensor projection depth (TPD) is still an ideal depth function and its computation can be achieved through the iteration of PD. Furthermore, we also discuss the cases of sparse samples and higher order tensors. Experimental results in data classification with the two projection depths show that TPD performs much better than PD for data with a natural tensor form, and even when the data have a natural vector form, TPD appears to perform no worse than PD.

Keywords: data depth; Rayleigh projection depth; statistical depth; tensor-based projection depth

1. Introduction

In the last ten years, statistical depth functions have increasingly served as a useful tool in multidimensional exploratory data analysis and inference. The depth of a point in the multidimensional space measures the centrality of that point with respect to a multivariate distribution or a given multivariate data cloud. Depth functions have been successfully used in many fields, such as quality indices [17, 20], multivariable regression [24], limiting p-values [18], robust estimation [3], nonparametric tests [4] and discriminant analysis [6, 11, 12, 14]. Some common statistical depths which have been defined include half-space depth [25], simplicial depth [19], projection depth [7, 8, 23, 29], spatial depth [26], spatial rank depth [10] and integrated dual depth [5]. Compared to the others, projection depth (PD) is preferable because of its good properties such as robustness, affine invariance, maximality at center, monotonicity relative to the deepest point, vanishing at infinity and so on. However, almost all the depths proposed in the literature so far are defined over a vector space, and the fact is that not all observations are naturally in vector form.

In the real world, the extracted feature of an object often has some specialized structure, and such structures take the form of a second- or even higher-order tensor. For example, this is the case when a captured image is a second-order tensor, that is, a matrix, and when sequential data, such as a video sequence for event analysis, is in the form of a third-order tensor. It would be desirable to keep the underlying structures of the data unchanged during the data analysis.

Most of the previous work on depth has first transformed the input tensor data into vectors, which in fact changes the underlying structure of the data sets. At the same time, such a transformation often leads to the curse of dimensionality problem and the small sample size problem, since most depth functions (such as the Mahalanobis depth) require the covariance matrix to be positive definite.
Therefore, it is necessary to extend the definition of depth to tensor spaces in order to process such data sets directly as tensors, without modifying their structures. In fact, many tensor-based methods in discriminant analysis have been proposed and have led to many nice results [2, 27, 28]. In this paper, informed by the aforementioned works, we propose a tensor-based projection depth (TPD) in order to extend the definition of projection depth to tensor spaces. We will prove that TPD is still an ideal depth according to the criteria of [30]. Also, we will explore the theoretical characteristics of higher order tensor projection depth. We will demonstrate that TPD allows us to avoid the above two problems encountered when using a vector representation.

The paper is organized as follows. Section 2 briefly introduces tensor algebra. Section 3 introduces the projection depth and gives the solution to the Rayleigh projection depth. Section 4 gives the definition of tensor projection depth and discusses its properties. Section 5 supplies the algorithm for TPD and analyzes its convergence. Section 6 analyzes the special case of sparse samples. Section 7 discusses TPD for higher order tensors. Section 8 gives numerical results for TPD. Section 9 concludes the paper, and proofs of selected theorems and propositions are given in the Appendix.

2. Tensor algebra

A tensor T of order k is a real-valued multilinear function on k vector spaces [13]:

\[
T\colon \mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_k}\to\mathbb{R}.
\]

A multilinear function is linear as a function of each variable considered separately. The set of all kth-order tensors on R^{n_i}, i = 1,...,k, denoted by T^k, is a vector space under the usual operations of pointwise addition and scalar multiplication:

\[
(aT)(a_1,\ldots,a_k)=a\bigl(T(a_1,\ldots,a_k)\bigr),\qquad
(T+T')(a_1,\ldots,a_k)=T(a_1,\ldots,a_k)+T'(a_1,\ldots,a_k),
\]

where a_i ∈ R^{n_i}.

Given two tensors, S ∈ T^k and T ∈ T^l, their product, S ⊗ T: R^{n_1} × ··· × R^{n_{k+l}} → R, is defined as

\[
S\otimes T(a_1,\ldots,a_{k+l})=S(a_1,\ldots,a_k)\,T(a_{k+1},\ldots,a_{k+l}).
\]

It is immediate from the multilinearity of S and T that S ⊗ T depends linearly on each argument a_i separately, so it is a (k+l)th-order tensor.

First-order tensors are simply vectors on R^{n_1}; that is, T^1 = (R^{n_1})^*, where (R^{n_1})^* is the dual space of R^{n_1}. A second-order tensor space is a product of two first-order tensor spaces, that is, T^2 = R^{n_1} ⊗ R^{n_2}. Let e_1,...,e_{n_1} be the standard basis of R^{n_1} and ε_1,...,ε_{n_1} be the dual basis [21] of R^{n_1}, which is formed from the coordinate functions with respect to the basis of R^{n_1}. Likewise, let ẽ_1,...,ẽ_{n_2} be a basis of R^{n_2} and ε̃_1,...,ε̃_{n_2} be the dual basis of R^{n_2}. We have

\[
\varepsilon_i(e_j)=\delta_{ij}\quad\text{and}\quad\tilde\varepsilon_i(\tilde e_j)=\delta_{ij},
\]

where δ_ij is the Kronecker delta function. Thus, {ε_i ⊗ ε̃_j} (1 ≤ i ≤ n_1, 1 ≤ j ≤ n_2) forms a basis for R^{n_1} ⊗ R^{n_2}. For any second-order tensor T, we can write

\[
T=\sum_{i,j}T_{ij}\,\varepsilon_i\otimes\tilde\varepsilon_j.
\]

Given two vectors a = \sum_{k=1}^{n_1} a_k e_k ∈ R^{n_1} and b = \sum_{l=1}^{n_2} b_l ẽ_l ∈ R^{n_2}, we have

\[
T(a,b)=\sum_{i,j}T_{ij}\,\varepsilon_i\otimes\tilde\varepsilon_j\Biggl(\sum_{k=1}^{n_1}a_k e_k,\ \sum_{l=1}^{n_2}b_l\tilde e_l\Biggr)
=\sum_{i,j}T_{ij}\,\varepsilon_i\Biggl(\sum_{k=1}^{n_1}a_k e_k\Biggr)\tilde\varepsilon_j\Biggl(\sum_{l=1}^{n_2}b_l\tilde e_l\Biggr)
=\sum_{i,j}T_{ij}a_ib_j=a^T T b. \tag{2.1}
\]

This shows that every second-order tensor in R^{n_1} ⊗ R^{n_2} uniquely corresponds to an n_1 × n_2 matrix.

Note that in this paper, our primary interest is focused on second-order tensors. However, most of our conclusions for second-order TPD can be naturally extended to higher orders. We will discuss this question in Section 7.
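The identity (2.1) is easy to verify numerically. The following short sketch (ours, not from the paper; it assumes only numpy) evaluates a randomly generated second-order tensor as a multilinear function and checks that it agrees with the bilinear form a^T T b of the corresponding matrix.

```python
import numpy as np

# Numerical check of (2.1): a second-order tensor T on R^{n1} x R^{n2}, stored as the
# n1 x n2 coefficient matrix (T_ij), evaluated at (a, b) equals the bilinear form a^T T b.
rng = np.random.default_rng(0)
n1, n2 = 3, 4
T = rng.standard_normal((n1, n2))   # coefficients T_ij in the basis {eps_i (x) eps~_j}
a = rng.standard_normal(n1)
b = rng.standard_normal(n2)

# Direct evaluation of the multilinear function: sum_{i,j} T_ij * a_i * b_j.
direct = sum(T[i, j] * a[i] * b[j] for i in range(n1) for j in range(n2))

# Matrix form a^T T b.
matrix_form = a @ T @ b

assert np.isclose(direct, matrix_form)
print(direct, matrix_form)
```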
3. Projection depth

According to [29], the definition of projection depth can be expressed as follows.

Definition 3.1. Let µ and σ be univariate location and scale measures, respectively. Define the outlyingness of a point x ∈ R^p with respect to a given distribution function F of X in R^p, p ≥ 1, as

\[
O(x,F)=\sup_{\|u\|=1}\frac{|u^T x-\mu(F_u)|}{\sigma(F_u)}, \tag{3.1}
\]

where F_u is the distribution of u^T X. Then, O(x, F) is defined to be 0 if u^T x − µ(F_u) = σ(F_u) = 0. The projection depth (PD) of a point x ∈ R^p with respect to the given F, PD(x, F), is then defined as

\[
\mathrm{PD}(x,F)=\frac{1}{1+O(x,F)}. \tag{3.2}
\]

Remark 3.1. Here, we also assume that µ and σ exist uniquely, that µ is translation and scale equivariant, and that σ is scale equivariant and translation invariant, that is, µ(F_{sY+c}) = sµ(F_Y) + c and σ(F_{sY+c}) = |s|σ(F_Y), respectively, for any scalars s, c and random variable Y ∈ R^1.

The most popular outlying function is defined as

\[
O(x,F)=\sup_{\|u\|=1}\frac{|u^T x-\mathrm{Med}(F_u)|}{\mathrm{MAD}(F_u)}, \tag{3.3}
\]

where F_u is the distribution of u^T X, Med(F_u) is the median of F_u and MAD(F_u) is the median of the distribution of |u^T X − Med(F_u)|.

Apart from having the good properties of a statistical depth function, this version of PD is more robust than other depths. However, it is hard to compute for high-dimensional samples.

Obviously, the variance and mean are also natural choices for σ and µ, respectively. It is easy to prove that such a projection-based depth is also an ideal depth function and, most importantly, its computation is very simple.

Theorem 3.1 (Rayleigh projection depth). Let (µ, σ) = (mean, variance), and suppose that the second moments of X exist and that X ∼ F. The solution of the outlying function (3.1) is then that of a Rayleigh quotient problem,

\[
O_R(x,F)=\sup_{\|u\|=1}\frac{|u^T x-\mathrm{E}(u^T X)|}{\sqrt{\mathrm{E}\bigl(u^T X-\mathrm{E}(u^T X)\bigr)^2}}
=\sqrt{\frac{u_1^T A u_1}{u_1^T B u_1}}=\sqrt{\lambda_1}, \tag{3.4}
\]

where A is the matrix (x − EX)(x − EX)^T, B is E(X − EX)(X − EX)^T, λ_1 is the largest eigenvalue of the generalized eigenvalue problem

\[
Az=\lambda Bz,\qquad z\neq 0,
\]

and u_1 is the corresponding eigenvector of λ_1. We call this projection depth the Rayleigh projection depth.

Remark 3.2. In this paper, for the convenience of computation, the examples in the experiments are all based on the Rayleigh projection depth, that is, (µ, σ) = (mean, variance).

Remark 3.3. Obviously, RPD requires the covariance matrix B to be positive definite. When it is not, as for sparse samples, we simply project the samples into their nonzero subspace using principal component analysis (PCA).
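Theorem 3.1 reduces the computation of PD with (mean, variance) to a generalized eigenvalue problem. The following minimal Python sketch is our reading of (3.2) and (3.4), not the authors' code; it assumes the sample is stored in the rows of an array and uses scipy's generalized symmetric eigensolver.

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_pd(x, sample):
    """Rayleigh projection depth (3.2)+(3.4) of the vector x w.r.t. the rows of `sample`."""
    mean = sample.mean(axis=0)
    B = np.cov(sample, rowvar=False)       # sample version of E(X - EX)(X - EX)^T
    d = x - mean
    A = np.outer(d, d)                     # (x - EX)(x - EX)^T
    # Largest eigenvalue of the generalized problem A z = lambda B z;
    # B must be positive definite (Remark 3.3).
    lam = eigh(A, B, eigvals_only=True)[-1]
    outlyingness = np.sqrt(max(lam, 0.0))  # O_R(x, F) = sqrt(lambda_1)
    return 1.0 / (1.0 + outlyingness)      # PD(x, F) = 1 / (1 + O(x, F))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((200, 3))
    print(rayleigh_pd(np.zeros(3), sample))       # near the centre: depth close to 1
    print(rayleigh_pd(np.full(3, 5.0), sample))   # far from the centre: depth close to 0
```

For data clouds with a singular covariance matrix, the projection onto the nonzero subspace mentioned in Remark 3.3 would be applied before calling such a function.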
4. Tensor projection depth

Before describing tensor projection depth, we first review the terminology associated with tensor operations [15, 16]. The inner product of tensors A and B (with the same orders and dimensions) is ⟨A, B⟩ = \sum_{i,j} A_{ij} B_{ij}. The norm of a tensor A is defined as its Frobenius norm, that is, ||A|| = \sqrt{⟨A, A⟩}, and the distance between two tensors A and B in R^{n_1} ⊗ R^{n_2} is defined as ||A − B||, where A − B = (A_{ij} − B_{ij})_{n_1×n_2}.

From the tensorial viewpoint, if we take X as a random variable in the first-order tensor space R^{n_1}, then the outlyingness of the projection depth in Definition 3.1 can be expressed as

\[
O(x,X)=\sup_{\|u\|=1}\frac{|x(u)-\mu(X(u))|}{\sigma(X(u))}.
\]

Thus, if X ∈ R^{n_1} ⊗ R^{n_2} is a random variable, then, according to formula (2.1), the outlying function in the tensor space R^{n_1} ⊗ R^{n_2} can be naturally defined as

\[
O(X,X)=\sup_{\|u\|=\|v\|=1}\frac{|X(u,v)-\mu(X(u,v))|}{\sigma(X(u,v))}
=\sup_{\|u\|=\|v\|=1}\frac{|u^T X v-\mu(u^T X v)|}{\sigma(u^T X v)}, \tag{4.1}
\]

where u ∈ R^{n_1} and v ∈ R^{n_2}.

Definition 4.1 (Tensor projection depth). The projection depth with outlying function given by formula (4.1) is called tensor projection depth.

For a given univariate location (or "center") measure µ, a distribution function F_X is called µ-symmetric about the point θ ∈ R^{n_1} ⊗ R^{n_2} if µ(u^T X v) = u^T θ v for any pair of unit vectors u ∈ R^{n_1}, v ∈ R^{n_2}.

We have the following theorem.

Theorem 4.1. Suppose that θ in R^{n_1} ⊗ R^{n_2} is the point of symmetry of a distribution F(X) with respect to a given notion of symmetry. The tensor projection depth function TPD(X, X) is:

1. convex;
2. symmetric for µ-symmetric F;
3. affine invariant;
4. monotonic relative to the deepest point;
5. vanishing at infinity, that is, TPD(X, X) → 0 as ||X|| → ∞;
6. maximized at the center of µ-symmetric F.

Remark 4.1. Theorem 4.1 shows that TPD is still an ideal depth according to the criteria of [30]. Furthermore, we can easily obtain many other properties of TPD beyond those of the PD in [29], such as the properties of its sample versions and its medians. However, these are not the key points of this paper and so we omit any detailed discussion here.

5. Algorithm

Suppose that the elements of S_n = {X_1,...,X_n} are generated from F (with F_n its empirical distribution) and that X is a fixed tensor. The TPD of X with respect to F_n can then be computed by the following algorithm:

1. Initialization: Let u = (1,...,1)^T.
2. Computing v: Let x = X^T u for the fixed tensor X, x_i = X_i^T u and F_n^u = u^T F_n. Then, v can be computed by solving the vector-based projection depth problem
\[
\sup_{\|v\|=1}\frac{|v^T x-\mu(F_n^u v)|}{\sigma(F_n^u v)}. \tag{5.1}
\]
3. Computing u: Once v is obtained, let x̃ = X v, x̃_i = X_i v and F_n^v = F_n v. Then, u can be computed by solving the following optimization problem:
\[
\sup_{\|u\|=1}\frac{|u^T \tilde x-\mu(u^T F_n^v)|}{\sigma(u^T F_n^v)}. \tag{5.2}
\]
4. Iteratively computing u and v: Using steps 2 and 3, we iteratively compute u and v until they converge.

Remark 5.1. The optimization problems (5.1) and (5.2) are the same as (3.3) in the vector-based projection depth algorithm. Thus, any computational method for the projection depth can also be used here.

The following theorem shows that the above algorithm converges.

Theorem 5.1. The iterative procedure used to solve the optimization problems (5.1) and (5.2) monotonically increases the objective function value in (4.1); hence, the algorithm converges.

Remark 5.2. Furthermore, if the optimization problem (3.1) is convex, then the solution of (4.1) is also globally optimal. For instance, if (µ, σ) = (mean, variance) (the Rayleigh projection depth), then its solution is also globally optimal.
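To make the alternating scheme concrete, here is a small Python sketch of the algorithm of this section for second-order tensors, instantiated with the Rayleigh choice (µ, σ) = (mean, variance), so that each inner problem is the generalized eigenvalue problem of Theorem 3.1. The function names, iteration cap and tolerance are our own choices, not the paper's.

```python
import numpy as np
from scipy.linalg import eigh

def _rayleigh_direction(x, sample):
    """Optimal unit direction and outlyingness for vector data (Theorem 3.1)."""
    d = x - sample.mean(axis=0)
    vals, vecs = eigh(np.outer(d, d), np.cov(sample, rowvar=False))
    u = vecs[:, -1]
    return u / np.linalg.norm(u), np.sqrt(max(vals[-1], 0.0))

def tensor_pd(X, samples, n_iter=50, tol=1e-8):
    """TPD of the matrix X w.r.t. an array of matrix samples (Section 5, Rayleigh version)."""
    n1, _ = X.shape
    u = np.ones(n1)                        # step 1: initialization u = (1, ..., 1)^T
    out_prev = -np.inf
    for _ in range(n_iter):
        # step 2: fix u, project the data to u^T X_i and solve a vector PD problem for v
        v, _ = _rayleigh_direction(X.T @ u, np.array([S.T @ u for S in samples]))
        # step 3: fix v, project the data to X_i v and solve a vector PD problem for u
        u, out = _rayleigh_direction(X @ v, np.array([S @ v for S in samples]))
        if out - out_prev < tol:           # Theorem 5.1: the objective never decreases
            break
        out_prev = out
    return 1.0 / (1.0 + out)               # TPD = 1 / (1 + outlyingness)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    samples = rng.standard_normal((300, 4, 5))
    print(tensor_pd(np.zeros((4, 5)), samples))       # central matrix: depth close to 1
    print(tensor_pd(np.full((4, 5), 3.0), samples))   # outlying matrix: depth close to 0
```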
6. Sparse samples

As with RPD, TPD based on RPD also faces the problem of sparse samples. From formulas (5.1) and (5.2), we know that for any sample set S_n = {X_1,...,X_n} and its corresponding empirical distribution F_n, the algorithm of the previous section requires the covariance matrices of F_n^u and F_n^v to be positive definite for any u ∈ R^{n_1} and v ∈ R^{n_2}. However, in practice, the tensor data usually do not satisfy such requirements.

There are two factors that can lead to such non-positive-definiteness. First, the sample size is too small, that is, the size of S_n is less than n_1 or n_2. Second, the data have some common columns or rows (e.g., the images have identical color edges or patterns). In the vector space, we usually use PCA to remove the redundant null space of the samples; correspondingly, we can use the tensor PCA proposed by Cai et al. [1] to reduce the dimensionality of the tensor samples. Suppose that

\[
M_X=\frac{1}{n}\sum_{i=1}^{n}X_i,\qquad
M_V=\sum_{i=1}^{n}(X_i-M_X)(X_i-M_X)^T,\qquad
M_U=\sum_{i=1}^{n}(X_i-M_X)^T(X_i-M_X),
\]

where the columns of V are the eigenvectors of M_V and the columns of U are the eigenvectors of M_U. Thus, the new mappings of F_n can be expressed as

\[
F_n^{(r_1,r_2)}=\{V_{r_1}^T X_1 U_{r_2},\ldots,V_{r_1}^T X_n U_{r_2}\}, \tag{6.1}
\]

where r_1 and r_2 are the mapping dimensions, and V_{r_1} and U_{r_2} are the first r_1 and r_2 columns of V and U, respectively. Here, we take r_1 and r_2 to be the ranks of M_V and M_U.

Theorem 6.1. For any u ∈ R^{r_1}, v ∈ R^{r_2} with ||u|| = ||v|| = 1, the covariance matrices of u^T F_n^{(r_1,r_2)} and F_n^{(r_1,r_2)} v are always positive definite.

7. Higher order tensors

The algorithm described above takes second-order tensors (i.e., matrices) as input data. However, it can also be extended to higher order tensors. In this section, we briefly describe the TPD algorithm for higher order tensors.

Let S_n = {X_i, i = 1,...,n} denote the sample set and F_n its empirical distribution, where X_i ∈ R^{n_1} ⊗ ··· ⊗ R^{n_k}. The outlying function of TPD is then

\[
O(X,X)=\sup_{\|u_1\|=\cdots=\|u_k\|=1}\frac{|X(u_1,\ldots,u_k)-\mu(F_n(u_1,\ldots,u_k))|}{\sigma(F_n(u_1,\ldots,u_k))}, \tag{7.1}
\]

where u_i ∈ R^{n_i}.

Before stating the algorithm, we first introduce an item of notation which we will need. If T ∈ R^{n_1} ⊗ ··· ⊗ R^{n_k}, then for any a_l ∈ R^{n_l}, 1 ≤ l ≤ k, we use T ×_l a_l to denote a new tensor in R^{n_1} ⊗ ··· ⊗ R^{n_{l−1}} ⊗ R^{n_{l+1}} ⊗ ··· ⊗ R^{n_k}, namely

\[
(T\times_l a_l)_{i_1,\ldots,i_{l-1},i_{l+1},\ldots,i_k}=\sum_{i_l=1}^{n_l}T_{i_1,\ldots,i_l,\ldots,i_k}\cdot a_{i_l}. \tag{7.2}
\]

Thus, the algorithm for higher order tensors can naturally be expressed as follows:

1. Initialization: Let u_i^0 = (x_1,...,x_{n_i})^T, x_j ∈ R, j = 1,...,n_i, i = 1,...,k−1.
2. Computing u_k^0: If we let x^k = X ×_1 u_1^0 ×_2 u_2^0 × ··· ×_{k−1} u_{k−1}^0, then u_k^0 can be computed by solving the vector-based projection depth problem
\[
\sup_{\|u_k^0\|=1}\frac{|u_k^{0T}x^k-\mu(F_n\times_1 u_1^0\times_2 u_2^0\times\cdots\times_{k-1}u_{k-1}^0)|}{\sigma(F_n\times_1 u_1^0\times_2 u_2^0\times\cdots\times_{k-1}u_{k-1}^0)}. \tag{7.3}
\]
3. Computing u_{k−1}^1: Once u_k^0 is obtained, we let x^{k−1} = X ×_1 u_1^0 × ··· ×_{k−2} u_{k−2}^0 ×_k u_k^0, and u_{k−1}^1 can be computed by solving the optimization problem
\[
\sup_{\|u_{k-1}^1\|=1}\frac{|u_{k-1}^{1T}x^{k-1}-\mu(F_n\times_1 u_1^0\times\cdots\times_{k-2}u_{k-2}^0\times_k u_k^0)|}{\sigma(F_n\times_1 u_1^0\times\cdots\times_{k-2}u_{k-2}^0\times_k u_k^0)}. \tag{7.4}
\]
4. Iteratively compute u_i, i = 1,...,k, until they converge.

Remark 7.1. It is easy to prove that TPD in a higher order tensor space still satisfies the above theorems and that its convergence is also guaranteed by Theorem 5.1.
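The mode contraction ×_l in (7.2) is the only new ingredient needed for the higher order algorithm. The following small numpy check (ours, for illustration only) contracts a random third-order tensor with one vector per mode and confirms that it reproduces the multilinear evaluation T(u_1, u_2, u_3).

```python
import numpy as np

def mode_contract(T, a, mode):
    """T x_mode a, as in (7.2): contract the given index of T with the vector a.
    Note that the remaining modes are renumbered after each contraction."""
    return np.tensordot(T, a, axes=([mode], [0]))

rng = np.random.default_rng(2)
T = rng.standard_normal((2, 3, 4))   # a third-order tensor in R^2 (x) R^3 (x) R^4
u1, u2, u3 = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

# Contracting all three modes (always mode 0 of what is left) gives the scalar
# T(u1, u2, u3) = sum_{i,j,k} T_ijk * u1_i * u2_j * u3_k.
val = mode_contract(mode_contract(mode_contract(T, u1, 0), u2, 0), u3, 0)
check = np.einsum('ijk,i,j,k->', T, u1, u2, u3)
assert np.isclose(val, check)
print(val, check)
```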
8. Experiments

First, we use data classification to demonstrate the validity of TPD. Consider a multivariate data set C that is partitioned into given classes C_1,...,C_q. An additional data point x has to be assigned to one of the given classes of objects. Suppose that there are q classes. The most natural classifier, provided by [14], is then

\[
\mathrm{class}_d(x)=\arg\max_{j} D(x\mid C_j), \tag{8.1}
\]

where D(x | C_j) is the depth of x with respect to class C_j, j = 1,...,q. This assigns x to the class C_j in which x is deepest.

The Columbia Object Image Library (COIL-20) [22] is a database of grayscale images of 20 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary the object pose with respect to a fixed camera. Images of the objects were taken at pose intervals of 5 degrees. This corresponds to 72 images of 32 × 32 pixels per object. Here, we only take the first 10 objects as examples.

In the experiments, recognition rates under different training sizes are computed by means of the following steps:

1. Select the test sets: Randomly select p test samples X_j^test from the object set X_j for each class, where j = 1,...,10.
2. For each training size n_k and for each repetition round t:
   • Randomly select the training sets: Randomly select n_k training samples from X_j \ X_j^test (the remaining samples of X_j) for each j, j = 1,...,10.
   • Compute the recognition rate: Compute the number ℓ_j of correctly recognized samples for each test set X_j^test by using formula (8.1), and compute the overall test recognition rate η_t = \sum_{j=1}^{10} ℓ_j / (10p).
3. Compute the mean and variance of η_t.

Here, p = 7 and the training sizes are 25, 30, 35, 40, 45, 50 and 55, respectively. The results are shown in Figure 1 and Table 1.

Figure 1. Recognition rates of TPD and PD under different training sample sizes.

Table 1. The mean, minimum, maximum and variance of the recognition rates by TPD and PD with the COIL-20 set

                        Tensor projection depth                   Projection depth
Training size   Mean     Min.     Max.     Variance       Mean     Min.     Max.     Variance
25              0.3638   0.2571   0.4143   0.0018         0.0695   0.0286   0.1571   0.0017
30              0.6571   0.5286   0.7429   0.0028         0.1333   0.0571   0.2429   0.0028
35              0.7857   0.7000   0.8571   0.0017         0.1526   0.0429   0.2571   0.0044
40              0.8390   0.7714   0.9000   0.0010         0.3010   0.1714   0.3714   0.0029
45              0.8610   0.7714   0.9429   0.0019         0.4410   0.3571   0.5571   0.0041
50              0.8990   0.8429   0.9286   0.0008         0.5124   0.4429   0.6000   0.0017
55              0.9086   0.8714   0.9571   0.0006         0.5419   0.4429   0.6286   0.0021

From Figure 1 and Table 1, we can see that for such samples with an intrinsic tensor form, TPD performs better than PD. A question then naturally arises: if the data sets are naturally in vector form, how does TPD perform compared with PD? We answer this question by means of the following experiment.

We consider the famous Iris data [9], which contain measurements of four different features (sepal length, sepal width, petal length and petal width) for each of 150 observations from three different types of iris plant: (1) setosa; (2) virginica; (3) versicolor. We randomly choose 10 observations from each class to construct the test sets and then randomly select 10, 15, 20, 25, 30, 35, 40 samples from the remaining observations as the respective training sets. For the computation of TPD, the samples are reshaped as 2 × 2 matrices. From Table 2 we can see that there is no apparent difference between the two results. Therefore, data from vector spaces can be converted into tensors and the depth procedure can be performed with TPD.

Table 2. The mean, minimum, maximum and variance of the recognition rates by TPD and PD with the Iris set

                        Tensor projection depth                   Projection depth
Training size   Mean     Min.     Max.     Variance       Mean     Min.     Max.     Variance
10              0.9698   0.8571   1.0000   0.0016         0.9476   0.7619   1.0000   0.0041
15              0.9889   0.9524   1.0000   0.0004         0.9841   0.9524   1.0000   0.0005
20              0.9952   0.9048   1.0000   0.0004         0.9921   0.8571   1.0000   0.0008
25              0.9984   0.9524   1.0000   0.0001         0.9984   0.9524   1.0000   0.0001
30              0.9984   0.9524   1.0000   0.0001         0.9984   0.9524   1.0000   0.0001
35              1.0000   1.0000   1.0000   0.0000         1.0000   1.0000   1.0000   0.0000
40              1.0000   1.0000   1.0000   0.0000         1.0000   1.0000   1.0000   0.0000
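For concreteness, the following sketch (ours; the data, names and depth choice are illustrative assumptions) implements the maximum-depth rule (8.1) with the Rayleigh projection depth of Theorem 3.1 and evaluates a recognition rate on synthetic Gaussian classes, mirroring the protocol above in a simplified form.

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_pd(x, sample):
    """Rayleigh projection depth (Theorem 3.1) of x w.r.t. the rows of `sample`."""
    d = x - sample.mean(axis=0)
    lam = eigh(np.outer(d, d), np.cov(sample, rowvar=False), eigvals_only=True)[-1]
    return 1.0 / (1.0 + np.sqrt(max(lam, 0.0)))

def depth_classify(x, train):
    """Maximum-depth rule (8.1): assign x to the class in which it is deepest."""
    return max(train, key=lambda label: rayleigh_pd(x, train[label]))

def recognition_rate(train, test):
    """Fraction of test points assigned to their true class (cf. step 2 of the protocol)."""
    hits = sum(depth_classify(x, train) == label for label, X in test.items() for x in X)
    return hits / sum(len(X) for X in test.values())

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    data = {label: rng.standard_normal((80, 4)) + 2.5 * label for label in range(3)}
    train = {label: X[:60] for label, X in data.items()}
    test = {label: X[60:] for label, X in data.items()}
    print(recognition_rate(train, test))   # close to 1 for well-separated classes
```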
