Number of relevant directions in Principal Component Analysis and Wishart random matrices
Abstract
We compute analytically, for large $N$, the probability $\mathcal{P}(N_+,N)$ that an $N\times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the Principal Component Analysis of a large empirical dataset. We find that $\mathcal{P}(N_+,N)\approx\exp(-\beta N^2 \psi_\zeta(N_+/N))$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0\leq \kappa\leq 1$ and for any $\zeta$. Close to its minimum $\kappa^\star(\zeta)$, the rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity. This singularity is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance $\Delta(N)$ of the number of relevant components is also shown to grow universally (independent of $\zeta$) as $\Delta(N)\sim (\beta \pi^2)^{-1}\ln N$ for large $N$.
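The counting statistic described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes the real case ($\beta=1$), builds $W = X^{T}X$ from an $N\times N$ matrix $X$ with independent standard normal entries, and counts the eigenvalues above $N\zeta$. Under that normalization the rescaled eigenvalues are assumed to follow the Marchenko-Pastur law on $[0,4]$, which gives a closed form for the typical fraction $\kappa^\star(\zeta)$. The function names `mp_fraction_above` and `count_above_threshold` are hypothetical and introduced only for illustration.

```python
import numpy as np


def mp_fraction_above(zeta):
    """Fraction of Marchenko-Pastur mass above zeta (assumed: aspect ratio 1,
    density sqrt((4 - x) / x) / (2 pi) on [0, 4])."""
    zeta = float(np.clip(zeta, 0.0, 4.0))
    return (
        1.0
        - (2.0 / np.pi) * np.arcsin(np.sqrt(zeta) / 2.0)
        - np.sqrt(zeta * (4.0 - zeta)) / (2.0 * np.pi)
    )


def count_above_threshold(n, zeta, n_samples, rng):
    """Sample real Wishart matrices W = X^T X (assumed beta = 1 convention)
    and return N_+, the number of eigenvalues exceeding n * zeta, per sample."""
    counts = np.empty(n_samples, dtype=int)
    for k in range(n_samples):
        x = rng.standard_normal((n, n))
        eigenvalues = np.linalg.eigvalsh(x.T @ x)  # symmetric, so eigvalsh applies
        counts[k] = np.count_nonzero(eigenvalues > n * zeta)
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, zeta = 200, 1.0
    counts = count_above_threshold(n, zeta, n_samples=50, rng=rng)
    print("empirical mean of N_+/N:", counts.mean() / n)
    print("Marchenko-Pastur estimate:", mp_fraction_above(zeta))
    print("empirical variance of N_+:", counts.var())
```

At sizes this small the empirical variance of $N_+$ gives only a rough impression of the logarithmic growth stated in the abstract; checking the $(\beta\pi^2)^{-1}\ln N$ law would require a range of much larger $N$ and many more samples.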
- Publication: arXiv e-prints
- Pub Date: December 2011
- DOI: 10.48550/arXiv.1112.5391
- arXiv: arXiv:1112.5391
- Bibcode: 2011arXiv1112.5391M
- Keywords: Condensed Matter - Statistical Mechanics; Mathematical Physics
- E-Print: 5 pages, 2 figures