
Correlation Matrices in Cosine Space By Alexandria Ree Hadd Thesis Submitted to the Faculty of ... PDF

40 Pages·2016·1.33 MB·English


Correlation Matrices in Cosine Space

By Alexandria Ree Hadd

Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Psychology

December, 2016
Nashville, Tennessee

Approved:
Joseph L. Rodgers, Ph.D.
Kristopher J. Preacher, Ph.D.
Andrew J. Tomarken, Ph.D.

To Kirsten with timeless love.

TABLE OF CONTENTS

DEDICATION
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
Chapter
1. The 3x3 Cosine Space of Correlation Matrices
   The Space of Correlation Matrices
   The 3x3 Correlation Space
   The 3x3 Cosine Space
2. Cosine Spaces for Higher Dimension Correlation Matrices
   Reproducing the 3x3 Cosine Space with 4x4 Correlation Matrices
   4x4 Cosine Spaces Within the Tetrahedron
   Banded Cosine Spaces and Those Spaces That Do Not Subset 3x3 Cosine Space
   Cosine Space in Higher Dimensions
3. Application of the 3x3 Cosine Space to Correlation Matrix Generation
   Methods of Generating Random Correlation Matrices
   Demonstration Simulation
   Results of Simulation Study
   Summary of Simulation Findings
4. Discussion and Conclusion
REFERENCES

LIST OF FIGURES

Figure
1. The 3x3 correlation space
2. The 3x3 cosine space
3. Random sampling from $[-1,1]^3$
4. Transformation of the points in $[-1,1]^3$ to $[0,180]^3$
5. 4x4 correlation matrices with two variables constrained to equality
6. 4x4 correlation matrices with one variable uncorrelated with other variables
7. Subsets of the 4x4 cosine space that are subsets of the regular tetrahedron
8. 4x4 banded correlation matrices and correlation space
9. 4x4 banded cosine space
10. Subsets of the 4x4 cosine space that are not subsets of the tetrahedron
11. 6x6 correlation matrices such that only pairs of items correlate
12. Distribution of correlations across four correlation matrix generation methods ($p = 3$)
13. Distribution of eigenvalues across four correlation matrix generation methods ($p = 3$)
14. Distribution of correlations across four correlation matrix generation methods ($p = 6$)
15. Distribution of eigenvalues across four correlation matrix generation methods ($p = 6$)

LIST OF TABLES

Table
1. General properties of correlation and cosine spaces for $p \times p$ correlation matrices
2. The proportion of the hypercube occupied by correlation and cosine spaces

INTRODUCTION

In the introduction of their article, The Shape of Correlation Matrices, Rousseeuw and Molenberghs (1994, p. 276) asserted that “the correlation coefficient is one of the most frequently used statistical tools.” Correlations and correlation matrices are foundational concepts in all disciplines that use statistical analysis, including psychology, genetics, and finance. Diversity in the potential applications of the correlation leads to similar diversity in potential interpretations; the correlation coefficient can be variously interpreted as a mean, a ratio, a cross product, and through several trigonometric functions (Rodgers & Nicewander, 1988). Of particular interest is one of the trigonometric interpretations.
Specifically, in person space (where $N$ individuals define the axes of an $N$-dimensional space, and centered or standardized scores on variable vectors are plotted on these axes), the correlation between two variables $X_1$ and $X_2$ can be expressed as

$$r_{12} = \cos\theta_{12} \tag{1}$$

where $\theta_{12}$ is the angle between the centered/standardized variable vectors $X_1$ and $X_2$. (Box, 1978, documents Fisher’s reliance on this person space in the development of his statistical insights.)

Rousseeuw and Molenberghs (1994) expanded on the geometric literature for the correlation coefficient. The authors demonstrated the three-dimensional closed surface that summarizes the space of true 3x3 correlation matrices: the 3x3 correlation space. This space provides insight for understanding individual correlation matrices, as well as the relationships among the correlations within a correlation matrix.

Using the 3x3 correlation space as a starting point, in this thesis I accomplish three things. In Chapter 1, I show how their space can be usefully re-portrayed using the cosine formulation of the correlation coefficient, into the so-named 3x3 cosine space. This reportrayal carries forward the strengths of the 3x3 correlation space for understanding individual correlation matrices, but also provides insight into the correlation space itself. In Chapter 2, I discuss how the 3x3 cosine space can provide insight into the shape of cosine (and correlation) spaces in higher dimensions. I give particular attention to the cosine space of 4x4 correlation matrices as a case study before considering properties of higher dimension cosine (and correlation) spaces. Third, I give a practical demonstration of the utility of 3x3 cosine space in generating random correlation matrices with a relatively high frequency of extreme correlations, that is, correlation matrices near the boundary of the correlation/cosine space.
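The cosine interpretation in equation (1) can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated scores; the helper name `cosine_between_centered` and the simulated data are illustrative and do not come from the thesis.

```python
import numpy as np


def cosine_between_centered(x1, x2):
    """Cosine of the angle between two mean-centered score vectors."""
    c1 = x1 - x1.mean()
    c2 = x2 - x2.mean()
    return float(c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2)))


# Simulated scores for N = 50 individuals (illustrative data only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.6 * x1 + rng.normal(size=50)

# Pearson correlation computed the usual way.
r12 = float(np.corrcoef(x1, x2)[0, 1])

# Cosine of the angle between the centered vectors in person space.
cos_theta12 = cosine_between_centered(x1, x2)

# The same relationship expressed as an angle in degrees, in [0, 180].
theta12_deg = float(np.degrees(np.arccos(cos_theta12)))
```

Here `r12` and `cos_theta12` agree up to floating-point error, and `theta12_deg` is the corresponding angle on the [0,180] scale.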
Throughout the thesis, I refer to the space originally envisioned by Rousseeuw and Molenberghs as the correlation space, and the transformation of this space by the cosine function as the cosine space. Technically, both spaces are “correlation spaces,” as correlations are cosines of angles. I could instead refer to the so-called correlation space as the [-1,1] space, the linear axis space, or the R&M space (or any combination of such descriptors of the space), and I could refer to the so-called cosine space as the angle space, the [0,180] space, the nonlinear axis space, or the transformed space. For ease, I simply refer to these spaces as the correlation spaces and cosine spaces, respectively, for correlation matrices of given dimension.

CHAPTER 1

THE 3x3 COSINE SPACE OF CORRELATION MATRICES

The Space of Correlation Matrices

Correlations among a set of variables (e.g., $X_1, X_2, \ldots, X_p$) are typically summarized in a correlation matrix $\mathbf{R}$. Let $\mathbf{R}$ be a square matrix of order $p$; the rows and columns of $\mathbf{R}$ indicate the variables being correlated, and the entries $r_{ij}$ in $\mathbf{R}$ are correlation coefficients between pairs of variables $X_i$ and $X_j$ with three necessary properties:

(i) $r_{ij} = r_{ji}$ (i.e., $\mathbf{R}$ is symmetric)
(ii) $r_{ij} = 1$ if $i = j$ (i.e., the diagonals of $\mathbf{R}$ are 1)
(iii) $-1 \le r_{ij} \le 1$ if $i \ne j$ (i.e., the off-diagonals of $\mathbf{R}$ are correlation coefficients)

These three properties are necessary for $\mathbf{R}$ and are simple to check, but they are not sufficient. To be a true correlation matrix, $\mathbf{R}$ must also be positive semidefinite (PSD). As such, we add a fourth property to $\mathbf{R}$ that ensures it is a PSD matrix:

(iv) $\lambda_1, \lambda_2, \ldots, \lambda_p \ge 0$, where $\lambda_i$, $i = 1, 2, \ldots, p$, are the eigenvalues of $\mathbf{R}$.

This fourth property is equivalently satisfied by ensuring that the determinants of $\mathbf{R}$ and all principal minor submatrices of $\mathbf{R}$ are nonnegative. (Matrices with all positive $\lambda_p$ and positive determinants for the principal minor submatrices are said to be positive definite, or PD.)
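Properties (i) through (iv) translate directly into a numerical check. The sketch below assumes NumPy; the function names, the tolerance `tol`, and the two example matrices are hypothetical choices made for illustration, not part of the thesis.

```python
import numpy as np


def is_pseudo_correlation(R, tol=1e-10):
    """Check properties (i)-(iii): symmetric, unit diagonal, entries in [-1, 1]."""
    R = np.asarray(R, dtype=float)
    if R.ndim != 2 or R.shape[0] != R.shape[1]:
        return False
    symmetric = np.allclose(R, R.T, atol=tol)
    unit_diagonal = np.allclose(np.diag(R), 1.0, atol=tol)
    bounded = bool(np.all(np.abs(R) <= 1.0 + tol))
    return bool(symmetric and unit_diagonal and bounded)


def is_true_correlation(R, tol=1e-10):
    """Check properties (i)-(iv): a pseudo-correlation matrix that is also PSD."""
    if not is_pseudo_correlation(R, tol):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    # Allow a small negative tolerance for floating-point error.
    return bool(eigenvalues.min() >= -tol)


# Illustrative matrices (not taken from the thesis).
R_ok = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
R_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])

ok_is_true = is_true_correlation(R_ok)
bad_is_pseudo = is_pseudo_correlation(R_bad)
bad_is_true = is_true_correlation(R_bad)
```

`R_bad` passes the three easy checks but has a negative eigenvalue, so it fails property (iv).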
Matrices that satisfy the first three properties are called pseudo-correlation matrices ($\mathbf{R}^*$), with the subset of $\mathbf{R}^*$ also satisfying the fourth property being true correlation matrices ($\mathbf{R}$). Non-PSD pseudo-correlation matrices cannot occur under typical data circumstances; they may occur through pairwise deletion of variables or use of tetrachoric or polychoric correlations (Knol & Berger, 1991), but matrices constructed from complete, quantitative data must be true correlation matrices. (Note that Rousseeuw and Molenberghs frame the PSD condition in terms of determinants of $\mathbf{R}^*$. This framing suffices for the 3x3 case, and aids in providing the equation for the boundary of the correlation space, but I define the PSD condition with eigenvalues of $\mathbf{R}^*$ to facilitate expansion of the PSD condition to higher orders of $p$, where calculation of determinants of the principal minor submatrices of $\mathbf{R}^*$ becomes increasingly cumbersome.)

Because of the symmetry and unit diagonal of $\mathbf{R}^*$, only the upper-triangular portion of the matrix need be represented. A shorthand for $\mathbf{R}^*$ is obtained by half-vectorization of $\mathbf{R}^*$, that is, concatenating the rows of the upper-triangular portion of $\mathbf{R}^*$ to form an ordered $n = p(p-1)/2$-tuple, $\mathbf{r}^*$, which uniquely identifies the original pseudo-correlation matrix (also referred to as the vecp or vech operator; Browne & Shapiro, 1986). Note that $p$ is the dimension of $\mathbf{R}^*$ and $n$ is the dimension of the set of $\mathbf{R}^*$ (i.e., $\mathbf{r}^* \in [-1,1]^n$). For example, the matrix

$$\mathbf{R}^* = \begin{bmatrix} 1 & -.23 & .04 & -.14 \\ -.23 & 1 & .35 & .05 \\ .04 & .35 & 1 & -.06 \\ -.14 & .05 & -.06 & 1 \end{bmatrix}$$

can be uniquely identified by the ordered sextuple $\mathbf{r}^* = (-.23, .04, -.14, .35, .05, -.06) \in [-1,1]^6$. This shorthand will prove useful in depicting the subset of $\mathbf{R}$ within the set of all $\mathbf{R}^*$ in three dimensions.

The 3x3 Correlation Space

The correlations among three variables, $X_1$, $X_2$, and $X_3$, produce a correlation matrix $\mathbf{R}$ of order $n = p = 3$, which can be represented in the ordered triple $\mathbf{r} = (r_{12}, r_{13}, r_{23})$.
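The half-vectorization shorthand can be sketched in a few lines. This example assumes NumPy; `half_vectorize` and `rebuild_matrix` are hypothetical helper names, and the matrix is the 4x4 example given in the text.

```python
import numpy as np


def half_vectorize(R):
    """Concatenate the rows of the strict upper triangle of R into an n-tuple."""
    R = np.asarray(R, dtype=float)
    rows, cols = np.triu_indices(R.shape[0], k=1)  # row-major order, diagonal excluded
    return R[rows, cols]


def rebuild_matrix(r):
    """Recover the p x p matrix from its n = p(p-1)/2 off-diagonal entries."""
    r = np.asarray(r, dtype=float)
    n = r.size
    # Solve n = p(p-1)/2 for p.
    p = int(round((1 + np.sqrt(1 + 8 * n)) / 2))
    if p * (p - 1) // 2 != n:
        raise ValueError("length of r is not of the form p(p-1)/2")
    R = np.eye(p)
    rows, cols = np.triu_indices(p, k=1)
    R[rows, cols] = r
    R[cols, rows] = r  # mirror into the lower triangle
    return R


# The 4x4 pseudo-correlation matrix used as the example in the text.
R_star = np.array([[1.00, -0.23, 0.04, -0.14],
                   [-0.23, 1.00, 0.35, 0.05],
                   [0.04, 0.35, 1.00, -0.06],
                   [-0.14, 0.05, -0.06, 1.00]])

r_star = half_vectorize(R_star)      # ordered sextuple
R_rebuilt = rebuild_matrix(r_star)   # round trip back to the matrix
```

Because the tuple and the matrix determine each other, a point in $[-1,1]^n$ can stand in for the full matrix, which is what makes the three-dimensional depiction of the 3x3 case possible.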
The set of all possible $\mathbf{r}$ within the cube $[-1,1]^3$ is depicted in Figure 1, and is the subject of Rousseeuw and Molenberghs’ (1994) article.

Figure 1. The 3x3 correlation space. The larger point ($r_{12} = .3$, $r_{13} = -.6$, $r_{23} = -.3$) lies within the correlation space and corresponds to a true correlation matrix $\mathbf{R}$. The smaller point ($r_{12} = -.7$, $r_{13} = .8$, $r_{23} = .8$) lies outside the correlation space, and corresponds to a non-PSD pseudo-correlation matrix $\mathbf{R}^*$.

The convex shape, described as an elliptical tetrahedron or an elliptope (Chai, 2014), meets the edges of the cube at four of the eight corners, and has diagonal lines across the six cube faces. Slicing parallel to any face of the cube, the shape evolves from a diagonal line to ellipses to a perfect circle, before reversing back to ellipses, then a diagonal line along the opposite face from which it started. In the case of 3x3 correlation matrices, the shape of true correlation matrices (i.e., the set of all $\mathbf{R}$) occupies approximately 61.7% of the cube. Practically speaking, any $\mathbf{r}^*$ generated randomly and uniformly from the $[-1,1]^3$ cube has a 61.7% chance of corresponding to a true correlation matrix. An $\mathbf{r}$ closer to the surface of the 3x3 correlation space corresponds to an $\mathbf{R}$ that has closer-to-zero eigenvalues or, alternatively stated, has near-linear dependency among the three variables.

Recently, Waller (2016) used the 3x3 correlation space to demonstrate the geometry of fungible correlation matrices. Fungible correlation matrices are $\mathbf{R}^*$ that, given a pre-specified set ...
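The 61.7% figure quoted in this section can be approximated by simulation, and the two points from Figure 1 can be checked directly. The sketch below assumes NumPy; the function names, sample size, and seed are illustrative assumptions. For the 3x3 case it uses the determinant condition, which the text notes is sufficient here because the 1x1 and 2x2 principal minors are automatically nonnegative when every correlation lies in [-1,1].

```python
import numpy as np


def is_psd_triple(r12, r13, r23, tol=1e-12):
    """Eigenvalue check (property iv) for the 3x3 matrix defined by a triple."""
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    return bool(np.linalg.eigvalsh(R).min() >= -tol)


def proportion_psd(n_draws=200_000, seed=0):
    """Monte Carlo estimate of the share of the cube [-1, 1]^3 that is PSD."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(-1.0, 1.0, size=(n_draws, 3))
    # Determinant of the 3x3 matrix with unit diagonal and off-diagonals r12, r13, r23.
    det = 1.0 + 2.0 * r[:, 0] * r[:, 1] * r[:, 2] - (r ** 2).sum(axis=1)
    return float((det >= 0.0).mean())


# The two points highlighted in Figure 1.
larger_point_inside = is_psd_triple(0.3, -0.6, -0.3)
smaller_point_inside = is_psd_triple(-0.7, 0.8, 0.8)

# Simulation-based estimate of the occupied proportion of the cube.
estimate = proportion_psd()
```

With a few hundred thousand draws the estimate should land close to 0.617; the exact value depends on the seed and sample size.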
