Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval
Abstract
We study the fundamental tradeoffs between statistical accuracy and computational tractability in the analysis of high-dimensional heterogeneous data. As examples, we study the sparse Gaussian mixture model, the mixture of sparse linear regressions, and the sparse phase retrieval model. For these models, we exploit an oracle-based computational model to establish conjecture-free, computationally feasible minimax lower bounds, which quantify the minimum signal strength required for the existence of any algorithm that is both computationally tractable and statistically accurate. Our analysis shows that there exist significant gaps between the computationally feasible minimax risks and the classical ones. These gaps quantify the statistical price we must pay to achieve computational tractability in the presence of data heterogeneity. Our results cover the problems of detection, estimation, support recovery, and clustering, and moreover resolve several conjectures of Azizyan et al. (2013, 2015), Verzelen and Arias-Castro (2017), and Cai et al. (2016). Interestingly, our results reveal a new but counter-intuitive phenomenon in heterogeneous data analysis: more data might lead to lower computational complexity.
- Publication: arXiv e-prints
- Pub Date: August 2018
- arXiv: arXiv:1808.06996
- Bibcode: 2018arXiv180806996F
- Keywords: Mathematics - Statistics Theory; Computer Science - Information Theory; Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: 75 pages