Sparse least trimmed squares regression for analyzing high-dimensional large data sets

doi:10.48550/arXiv.1304.4773

Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an $L_1$ penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
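The construction described in the abstract (an $L_1$ penalty added to the LTS criterion) can be illustrated with a small sketch. The abstract does not state the objective, so the form used here is an assumption: the sum of the h smallest squared residuals plus a penalty h * lam * sum(|beta_j|). The function name, the argument names, and the toy data are hypothetical.

```python
def sparse_lts_objective(X, y, beta, h, lam):
    """Trimmed sum of squares plus an L1 penalty (assumed form, see lead-in).

    X    : list of rows, each a list of predictor values
    y    : list of responses
    beta : list of coefficients (no intercept in this sketch)
    h    : number of observations kept after trimming
    lam  : non-negative penalty parameter
    """
    # Squared residuals for every observation, sorted ascending.
    squared_residuals = sorted(
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    # LTS part: keep only the h smallest squared residuals.
    trimmed_sum = sum(squared_residuals[:h])
    # Sparsity part: L1 penalty, scaled by h (an assumption of this sketch).
    penalty = h * lam * sum(abs(b) for b in beta)
    return trimmed_sum + penalty


# Toy data: the last observation is a gross outlier.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 100.0]
beta = [1.0]

trimmed = sparse_lts_objective(X, y, beta, h=3, lam=0.5)    # outlier trimmed away
untrimmed = sparse_lts_objective(X, y, beta, h=4, lam=0.5)  # outlier included
```

With h = 3 the outlying observation is excluded from the loss, so only the penalty term contributes for this choice of beta. With h = 4 the outlier's squared residual dominates the objective, which is the sensitivity that trimming is meant to remove.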

Publication:

arXiv e-prints

Pub Date:

April 2013

DOI:

10.48550/arXiv.1304.4773

arXiv:

arXiv:1304.4773

Bibcode:

2013arXiv1304.4773A

Keywords:

Statistics - Applications

E-Print:

Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/12-AOAS575 by the Institute of Mathematical Statistics (http://www.imstat.org)
