ABSTRACT
Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond "null models", which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n, of number of variables p to sample size n, converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.
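The contrast between covariance-based and correlation-based fluctuations can be explored with a small Monte Carlo experiment. The sketch below is illustrative only and is not taken from the paper. It assumes a hypothetical spiked model in which the first m coordinates are equicorrelated with correlation r and the remaining coordinates are independent standard Gaussians. The function name `leading_eigs` and all parameter values are chosen here for illustration.

```python
import numpy as np


def leading_eigs(n, p, m, r, reps, seed=0):
    """Simulate the largest eigenvalue of the sample covariance and the
    sample correlation matrix under an assumed block-equicorrelated spike.

    Population covariance: first m coordinates have unit variance and
    pairwise correlation r; the other p - m coordinates are independent
    with unit variance. Its largest eigenvalue is 1 + (m - 1) * r.
    """
    rng = np.random.default_rng(seed)
    block = (1.0 - r) * np.eye(m) + r * np.ones((m, m))
    chol = np.linalg.cholesky(block)  # block = chol @ chol.T

    cov_eigs = np.empty(reps)
    corr_eigs = np.empty(reps)
    for i in range(reps):
        x = rng.standard_normal((n, p))
        # Give the first m columns the equicorrelated covariance.
        x[:, :m] = x[:, :m] @ chol.T
        # eigvalsh returns eigenvalues in ascending order.
        cov_eigs[i] = np.linalg.eigvalsh(np.cov(x, rowvar=False))[-1]
        corr_eigs[i] = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[-1]
    return cov_eigs, corr_eigs


if __name__ == "__main__":
    cov_eigs, corr_eigs = leading_eigs(n=400, p=100, m=4, r=0.9, reps=200)
    print("mean (covariance, correlation):", cov_eigs.mean(), corr_eigs.mean())
    print("std  (covariance, correlation):", cov_eigs.std(), corr_eigs.std())
```

In a run of this kind the two sample means should be close to each other, while the standard deviations need not be. How large the gap is depends on the assumed spike structure.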
ABSTRACT
We study improved approximations to the distribution of the largest eigenvalue ℓ̂ of the sample covariance matrix of n zero-mean Gaussian observations in dimension p + 1. We assume that one population principal component has variance ℓ > 1 and the remaining 'noise' components have common variance 1. In the high-dimensional limit p/n → γ > 0, we study Edgeworth corrections to the limiting Gaussian distribution of ℓ̂ in the supercritical case ℓ > 1 + √γ. The skewness correction involves a quadratic polynomial, as in classical settings, but the coefficients reflect the high-dimensional structure. The methods involve Edgeworth expansions for sums of independent non-identically distributed variates obtained by conditioning on the sample noise eigenvalues, and the limiting bulk properties and fluctuations of these noise eigenvalues.
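The setting can be made concrete with a short simulation. The sketch below is a minimal illustration, not the authors' method. It draws n zero-mean Gaussian observations in dimension p + 1 with one component of variance ℓ and p noise components of variance 1, then compares the simulated largest eigenvalue with the classical first-order limit ℓ(1 + γ/(ℓ − 1)) and the Gaussian standard deviation ℓ·sqrt(2(1 − γ/(ℓ − 1)²)/n). Both formulas are brought in from the standard spiked-covariance literature as assumptions and are not stated in the abstract. The function names are hypothetical. The sample skewness is reported only as a diagnostic of departure from the Gaussian limit; the Edgeworth coefficients themselves are not computed.

```python
import numpy as np


def simulate_largest_eigenvalue(n, p, ell, reps, seed=0):
    """Largest eigenvalue of X^T X / n for n Gaussian rows in dimension p + 1,
    with population covariance diag(ell, 1, ..., 1) and known zero mean."""
    rng = np.random.default_rng(seed)
    scale = np.ones(p + 1)
    scale[0] = np.sqrt(ell)  # the single spiked component
    out = np.empty(reps)
    for i in range(reps):
        x = rng.standard_normal((n, p + 1)) * scale
        # Singular values come back in descending order, so s[0] is the largest;
        # the largest eigenvalue of X^T X / n is s[0]**2 / n.
        s = np.linalg.svd(x, compute_uv=False)
        out[i] = s[0] ** 2 / n
    return out


def first_order_limit(ell, gamma):
    """Assumed almost-sure limit of the largest eigenvalue when ell > 1 + sqrt(gamma)."""
    return ell * (1.0 + gamma / (ell - 1.0))


def gaussian_sd(ell, gamma, n):
    """Assumed standard deviation of the limiting Gaussian approximation."""
    return ell * np.sqrt(2.0 * (1.0 - gamma / (ell - 1.0) ** 2) / n)


def sample_skewness(values):
    """Third standardized moment of the simulated values."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))


if __name__ == "__main__":
    n, p, ell = 400, 100, 4.0
    gamma = p / n
    eigs = simulate_largest_eigenvalue(n, p, ell, reps=500)
    print("simulated mean:", eigs.mean(), "limit:", first_order_limit(ell, gamma))
    print("simulated sd:  ", eigs.std(), "Gaussian sd:", gaussian_sd(ell, gamma, n))
    print("sample skewness:", sample_skewness(eigs))
```

A sample skewness clearly different from zero at moderate n is the kind of discrepancy that a skewness correction to the Gaussian approximation is meant to capture.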