Sign-constrained least squares estimation for high-dimensional regression
Abstract
Many regularization schemes for high-dimensional regression have been put forward. Most require the choice of a tuning parameter, using model selection criteria or cross-validation schemes. We show that non-negative or sign-constrained least squares is a simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraint has to be derived via prior knowledge or an initial estimator, but no further tuning or cross-validation is necessary. The success depends on conditions that are easy to check in practice. A sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, a non-asymptotic bound on the L1-error of the regression coefficients is then proven. Without using any further regularization, the regression vector can be estimated consistently as long as $\log(p)\, s/n \to 0$ as $n \to \infty$, where $s$ is the sparsity of the optimal regression vector, $p$ is the number of variables, and $n$ is the sample size. Network tomography is shown to be an application where the necessary conditions for success of non-negative least squares are naturally fulfilled, and empirical results confirm the effectiveness of the sign constraint for sparse recovery.
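As a rough illustration of the estimator described in the abstract, the following minimal sketch fits a sign-constrained least squares problem by flipping columns and calling SciPy's non-negative least squares solver. The helper name `sign_constrained_ls`, the simulated design (independent uniform entries on [0, 1], chosen so that columns are positively correlated), and the values of `n`, `p`, `s` and the noise level are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import nnls


def sign_constrained_ls(X, y, signs):
    """Least squares subject to signs[j] * beta[j] >= 0 for every j.

    Hypothetical helper: `signs` holds +1 or -1 per column.
    """
    signs = np.asarray(signs, dtype=float)
    # Flipping the columns that carry a negative constraint turns the
    # problem into ordinary non-negative least squares (NNLS).
    coef, _ = nnls(X * signs, y)
    # Undo the flip so the coefficients refer to the original columns.
    return signs * coef


# Simulated data (assumed setup): more variables than samples, sparse
# non-negative truth, and a non-negative design matrix.
rng = np.random.default_rng(0)
n, p, s = 60, 120, 3
X = rng.uniform(0.0, 1.0, size=(n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)

# No tuning parameter is chosen anywhere; the sign constraint is the
# only regularization.
signs = np.ones(p)
beta_hat = sign_constrained_ls(X, y, signs)
l1_error = np.abs(beta_hat - beta).sum()
print(f"L1 error: {l1_error:.3f}, nonzeros: {np.count_nonzero(beta_hat)}")
```

The column-flipping step is one convenient way to reuse an NNLS solver for general sign constraints; how well the fit recovers the sparse vector depends on the design satisfying conditions of the kind the abstract describes.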
- Publication: arXiv e-prints
- Pub Date: February 2012
- DOI: 10.48550/arXiv.1202.0889
- arXiv: arXiv:1202.0889
- Bibcode: 2012arXiv1202.0889M
- Keywords: Statistics - Methodology