Feynman-Hellmann Theorem and Signal Identification from Sample Covariance Matrices
Abstract
A common method for extracting true correlations from large data sets is to look for variables with unusually large coefficients on those principal components with the biggest eigenvalues. Here, we show that even if the top principal components have no unusually large coefficients, large coefficients on lower principal components can still correspond to a valid signal. This contradicts the typical mathematical justification for principal component analysis, which requires that eigenvalue distributions from relevant random matrix ensembles have compact support, so that any eigenvalue above the upper threshold corresponds to signal. The new possibility arises via a mechanism based on a variant of the Feynman-Hellmann theorem, and leads to significant correlations between a signal and principal components when the underlying noise is not both independent and uncorrelated, so the eigenvalue spacing of the noise distribution can be sufficiently large. This mechanism justifies a new way of using principal component analysis and rationalizes recent empirical findings that lower principal components can have information about the signal, even if the largest ones do not.
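As a rough illustration of the ingredient the abstract names, the sketch below numerically checks the standard Feynman-Hellmann relation for a real symmetric matrix: the first-order change of an eigenvalue under a perturbation, dλ_k/dε = v_kᵀ V v_k, where v_k is the unperturbed eigenvector. The paper uses a variant of the theorem that is not spelled out in this record, so the code shows only the textbook form. The function name, the matrix size, and the random matrices `c0` (a stand-in for a noise covariance) and `v` (a stand-in for a signal perturbation) are assumptions made for this example, not quantities from the paper.

```python
import numpy as np


def feynman_hellmann_check(n=6, eps=1e-6, seed=0):
    """Compare Feynman-Hellmann eigenvalue slopes with finite differences.

    Hypothetical helper for illustration only. Returns (predicted, numeric),
    two arrays of length n holding d(lambda_k)/d(eps) at eps = 0.
    """
    rng = np.random.default_rng(seed)

    # Random symmetric "unperturbed" matrix (stand-in for a noise covariance).
    a = rng.standard_normal((n, n))
    c0 = (a + a.T) / 2

    # Random symmetric perturbation (stand-in for a signal contribution).
    b = rng.standard_normal((n, n))
    v = (b + b.T) / 2

    # Eigenvectors of the unperturbed matrix are the columns of evecs.
    _, evecs = np.linalg.eigh(c0)

    # Feynman-Hellmann: slope of eigenvalue k is v_k^T V v_k.
    predicted = np.einsum("ik,ij,jk->k", evecs, v, evecs)

    # Central finite difference of the sorted eigenvalues.
    plus = np.linalg.eigvalsh(c0 + eps * v)
    minus = np.linalg.eigvalsh(c0 - eps * v)
    numeric = (plus - minus) / (2 * eps)

    return predicted, numeric


if __name__ == "__main__":
    predicted, numeric = feynman_hellmann_check()
    print(np.max(np.abs(predicted - numeric)))
```

The comparison assumes the unperturbed eigenvalues are non-degenerate, which holds with probability one for a random symmetric matrix of this kind; for nearly degenerate eigenvalues the first-order formula stops being a good approximation.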
- Publication: Physical Review X
- Pub Date: July 2014
- DOI: 10.1103/PhysRevX.4.031032
- Bibcode: 2014PhRvX...4c1032C
- Keywords: 05.40.-a; Fluctuation phenomena, random processes, noise, and Brownian motion