Sparse PCA: Phase Transitions in the Critical Sparsity Regime
Abstract
This work studies estimation of sparse principal components in high dimensions. Specifically, we consider a class of estimators based on kernel PCA, generalizing the covariance thresholding algorithm proposed by Krauthgamer et al. (2015). Focusing on Johnstone's spiked covariance model, we investigate the "critical" sparsity regime, where the sparsity level $m$, sample size $n$, and dimension $p$ each diverge and $m/\sqrt{n} \rightarrow \beta$, $p/n \rightarrow \gamma$. Within this framework, we develop a fine-grained understanding of signal detection and recovery. Our results establish a detectability phase transition, analogous to the Baik--Ben Arous--Péché (BBP) transition: above a certain threshold -- depending on the kernel function, $\gamma$, and $\beta$ -- kernel PCA is informative. Conversely, below the threshold, kernel principal components are asymptotically orthogonal to the signal. Notably, above this detection threshold, we find that consistent support recovery is possible with high probability. Sparsity plays a key role in our analysis, and results in more nuanced phenomena than in related studies of kernel PCA with delocalized (dense) components. Finally, we identify optimal kernel functions for detection -- and consequently, support recovery -- and numerical calculations suggest that soft thresholding is nearly optimal.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.21038
- Bibcode:
- 2024arXiv241221038F
- Keywords:
-
- Mathematics - Statistics Theory