Number of relevant directions in Principal Component Analysis and Wishart random matrices
Abstract
We compute analytically, for large $N$, the probability $\mathcal{P}(N_+,N)$ that an $N\times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the Principal Component Analysis of a large empirical dataset. We find that $\mathcal{P}(N_+,N)\approx\exp(-\beta N^2 \psi_\zeta(N_+/N))$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0\leq \kappa\leq 1$ and for any $\zeta$. The rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity close to its minimum $\kappa^\star(\zeta)$. This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance $\Delta(N)$ of the number of relevant components is also shown to grow universally (independent of $\zeta$) as $\Delta(N)\sim (\beta \pi^2)^{-1}\ln N$ for large $N$.
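The counting statistic in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a real ensemble ($\beta=1$), a square $N\times N$ data matrix $X$ with i.i.d. standard normal entries, and $W=X^{T}X$, so that the eigenvalues scaled by $N$ follow the Marchenko-Pastur density $\frac{1}{2\pi}\sqrt{(4-x)/x}$ on $[0,4]$. Under these assumptions the typical fraction $\kappa^\star(\zeta)$ is taken to be the Marchenko-Pastur mass above $\zeta$. The function names `count_above_threshold` and `mp_fraction_above` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def count_above_threshold(n, zeta, rng):
    """Return N_+, the number of eigenvalues of W = X^T X larger than n * zeta."""
    # Assumption: square data matrix with i.i.d. N(0, 1) entries (beta = 1).
    x = rng.standard_normal((n, n))
    w = x.T @ x  # real Wishart matrix
    eigenvalues = np.linalg.eigvalsh(w)  # W is symmetric, so eigvalsh applies
    return int(np.sum(eigenvalues > n * zeta))


def mp_fraction_above(zeta):
    """Mass of the density (1 / (2 pi)) * sqrt((4 - x) / x) on [zeta, 4]."""
    z = np.clip(zeta, 0.0, 4.0)
    # Closed-form cumulative distribution of the density on [0, z].
    cdf = (np.sqrt(z * (4.0 - z)) + 4.0 * np.arcsin(np.sqrt(z) / 2.0)) / (2.0 * np.pi)
    return float(1.0 - cdf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, zeta = 200, 1.0
    counts = [count_above_threshold(n, zeta, rng) for _ in range(50)]
    print("mean N_+ / N:", np.mean(counts) / n)
    print("Marchenko-Pastur fraction above zeta:", mp_fraction_above(zeta))
    print("sample variance of N_+:", np.var(counts))
```

Repeating the run for several values of $N$ gives a rough check of the slow growth of the variance with $N$. Direct sampling of this kind only probes typical fluctuations, not the large deviation tails described by $\psi_\zeta(\kappa)$.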
 Publication:

arXiv e-prints
 Pub Date:
 December 2011
 DOI:
 10.48550/arXiv.1112.5391
 arXiv:
 arXiv:1112.5391
 Bibcode:
 2011arXiv1112.5391M
 Keywords:

 Condensed Matter - Statistical Mechanics;
 Mathematical Physics
 EPrint:
 5 pag., 2 fig