List-Decodable Covariance Estimation
Abstract
We give the first polynomial-time algorithm for \emph{list-decodable covariance estimation}. For any $\alpha > 0$, our algorithm takes as input a sample $Y \subseteq \mathbb{R}^d$ of size $n \geq d^{\mathsf{poly}(1/\alpha)}$ obtained by adversarially corrupting $(1-\alpha)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $\mu_*$ and covariance $\Sigma_*$. In $n^{\mathsf{poly}(1/\alpha)}$ time, it outputs a constant-size list of $k = k(\alpha) = (1/\alpha)^{\mathsf{poly}(1/\alpha)}$ candidate parameters that, with high probability, contains a pair $(\hat{\mu},\hat{\Sigma})$ such that the total variation distance $\mathrm{TV}(\mathcal{N}(\mu_*,\Sigma_*),\mathcal{N}(\hat{\mu},\hat{\Sigma})) < 1 - O_{\alpha}(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius approximation of the parameters with dimension-independent error.

Our algorithm works more generally for $(1-\alpha)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree-2 polynomials.

Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery, due to Karmalkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020), and Bakshi and Kothari (2020). These results need super-polynomial time to obtain any sub-constant error in the underlying dimension. Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery; in particular, it allows obtaining $2^{-\mathsf{poly}(d)}$ error in polynomial time. Our result also implies an improved algorithm for clustering non-spherical mixtures.
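The two analytic conditions named in the abstract can be made concrete. The following is a sketch in the standard formulations from the robust-statistics literature; the paper's exact constants, quantifiers, and sum-of-squares degrees may differ, and $C > 0$ denotes an absolute constant introduced here for illustration.

```latex
% 1) Anti-concentration of one-dimensional marginals: no direction $v$ places
%    much probability mass near zero, with a low-degree SoS certificate:
\[
  \Pr_{x \sim D}\!\left[\, \lvert \langle x - \mu, v \rangle \rvert
    \le \delta \sqrt{v^{\top} \Sigma v} \,\right] \;\le\; C\delta
  \qquad \text{for all } v \in \mathbb{R}^d \setminus \{0\},\ \delta > 0.
\]

% 2) Hypercontractivity of degree-2 polynomials: even moments of any quadratic
%    $p$ grow no faster than for a Gaussian, again SoS-certifiably:
\[
  \mathbb{E}_{x \sim D}\!\left[ p(x)^{2t} \right]
    \;\le\; (Ct)^{2t} \left( \mathbb{E}_{x \sim D}\!\left[ p(x)^{2} \right] \right)^{t}
  \qquad \text{for every degree-2 polynomial } p \text{ and } t \in \mathbb{N}.
\]
```

The Gaussian distribution satisfies both properties with low-degree sum-of-squares certificates, which is why the general result specializes to the Gaussian setting stated first.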
 Publication:

arXiv e-prints
 Pub Date:
 June 2022
 DOI:
 10.48550/arXiv.2206.10942
 arXiv:
 arXiv:2206.10942
 Bibcode:
 2022arXiv220610942I
 Keywords:

 Computer Science - Data Structures and Algorithms;
 Computer Science - Machine Learning;
 Mathematics - Statistics Theory;
 Statistics - Machine Learning;
 F.2.1
 E-Print:
 Abstract slightly clipped. To appear at STOC 2022