On cross-validated Lasso in high dimensions
Abstract
In this paper, we derive non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with Gaussian noise and under fairly general assumptions on the candidate set of values of the penalty parameter, the estimation error of the cross-validated Lasso estimator converges to zero in the prediction norm at the $\sqrt{s\log p / n}\times \sqrt{\log(p n)}$ rate, where $n$ is the sample size of available data, $p$ is the number of covariates, and $s$ is the number of nonzero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence in the prediction norm up to a small logarithmic factor $\sqrt{\log(p n)}$, and similar conclusions apply to the convergence rates in both the $L^2$ and $L^1$ norms. Importantly, our results cover the case when $p$ is (potentially much) larger than $n$ and also allow for non-Gaussian noise. Our paper therefore serves as a justification for the widespread practice of using cross-validation to choose the penalty parameter for the Lasso estimator.
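The selection rule analyzed in the paper, choosing the Lasso penalty by $K$-fold cross-validation over a candidate grid, can be sketched with scikit-learn's `LassoCV`. This is an illustration under assumed simulation parameters (the design, noise level, and grid size below are arbitrary choices, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated sparse high-dimensional regression: n samples, p covariates,
# of which only the first s have nonzero coefficients.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# K-fold cross-validation (K=5) over an automatically generated grid of
# candidate penalty values; the penalty minimizing CV error is selected.
model = LassoCV(cv=5, n_alphas=50).fit(X, y)

print("CV-selected penalty:", model.alpha_)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

In this regime ($p > n$), the cross-validated choice of penalty typically retains the true support while shrinking the remaining coefficients toward zero, which is the behavior the paper's rate bounds quantify.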
Publication: arXiv e-prints
Pub Date: May 2016
DOI: 10.48550/arXiv.1605.02214
arXiv: arXiv:1605.02214
Bibcode: 2016arXiv160502214C
Keywords: Mathematics - Statistics Theory