Robust Mean Estimation in High Dimensions: An Outlier Fraction Agnostic and Efficient Algorithm

doi:10.48550/arXiv.2102.08573

The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the $\ell_0$-`norm' of an \emph{outlier indicator vector}, under a second moment constraint on the datapoints. The $\ell_0$-`norm' is then relaxed to the $\ell_p$-norm ($0<p\leq 1$) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative $\ell_p$-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point $\approx 0.3$) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for $p=1$ it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
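The formulation described in the abstract can be sketched as follows. This is only an illustrative reading: the notation (datapoints $y_1,\dots,y_n \in \mathbb{R}^d$, candidate mean $x$, outlier indicator vector $h$, and threshold constant $c$) is assumed here and is not given in this record, and the exact form of the second moment constraint in the paper may differ.

```latex
% Assumed notation: y_i are the datapoints, x is the candidate mean,
% h_i = 1 marks y_i as an outlier, c is a hypothetical threshold constant.
\begin{aligned}
\text{($\ell_0$ objective)} \quad
  & \min_{x \in \mathbb{R}^d,\; h \in \{0,1\}^n} \ \|h\|_0
  \quad \text{s.t.} \quad
  \lambda_{\max}\!\Big( \sum_{i=1}^{n} (1-h_i)\,(y_i - x)(y_i - x)^{\top} \Big) \le c\,n, \\
\text{($\ell_p$ relaxation)} \quad
  & \min_{x \in \mathbb{R}^d,\; h \in [0,1]^n} \ \|h\|_p^p
  \quad \text{s.t. the same constraint}, \qquad 0 < p \le 1 .
\end{aligned}
```

Under this reading, the indicator vector $h$ selects the points to discard, and the constraint bounds the largest eigenvalue of the second moment matrix of the retained points around $x$, which is one plausible form of the "second moment constraint on the datapoints" that the abstract mentions.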

Publication:

arXiv e-prints

Pub Date:

February 2021

DOI:

10.48550/arXiv.2102.08573

arXiv:

arXiv:2102.08573

Bibcode:

2021arXiv210208573D

Keywords:

Statistics - Applications;
Computer Science - Information Theory

E-Print:

arXiv admin note: text overlap with arXiv:2008.09239
