From safe screening rules to working sets for faster Lasso-type solvers
Abstract
Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few nonzero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop defines the WS and the inner loop calls a solver for the resulting subproblems. For the Lasso estimator a WS is a set of features, while for the Group Lasso it is a set of groups. In practice, working sets are generally small in this context, so the associated feature Gram matrix can fit in memory. Here we show that the Gauss-Southwell rule (a greedy strategy for block coordinate descent techniques) leads to fast solvers in this case. Combined with a working set strategy based on an aggressive use of so-called Gap Safe screening rules, we propose a solver achieving state-of-the-art performance on sparse learning problems. Results are presented on Lasso and multi-task Lasso estimators.
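The two nested loops described in the abstract can be sketched as follows. This is an illustrative toy implementation, not the paper's solver: the outer loop here grows the working set by ranking features by correlation with the residual (a crude stand-in for the Gap Safe screening-based priorities the paper uses), and the inner loop runs a Gauss-Southwell-style greedy coordinate descent restricted to the working set. The function name and all parameters are hypothetical; the design assumes the objective 0.5*||y - Xw||^2 + alpha*||w||_1 and no all-zero columns in X.

```python
import numpy as np

def lasso_working_set(X, y, alpha, n_outer=10, n_inner=1000, tol=1e-10):
    """Toy working-set Lasso solver (illustrative sketch only)."""
    n, p = X.shape
    w = np.zeros(p)
    L = (X ** 2).sum(axis=0)          # per-coordinate Lipschitz constants
    r = y.copy()                       # residual y - Xw (w = 0 initially)
    ws = np.array([], dtype=int)
    for _ in range(n_outer):
        # Outer loop: grow the working set. Features are ranked by
        # |X^T r|; real solvers would use screening-based priorities.
        scores = np.abs(X.T @ r)
        scores[ws] = np.inf            # never drop current working set
        k = min(p, max(2 * len(ws), 5))
        ws = np.argsort(scores)[::-1][:k]
        # Inner loop: greedy (Gauss-Southwell-style) coordinate descent
        # on the subproblem restricted to the working set.
        for _ in range(n_inner):
            z = w[ws] + (X[:, ws].T @ r) / L[ws]
            w_new = np.sign(z) * np.maximum(np.abs(z) - alpha / L[ws], 0.0)
            deltas = np.abs(w_new - w[ws])
            i = np.argmax(deltas)      # greedy rule: largest update wins
            if deltas[i] < tol:
                break                  # subproblem solved to tolerance
            j = ws[i]
            r -= X[:, j] * (w_new[i] - w[j])
            w[j] = w_new[i]
    return w
```

Because the working set stays small, each greedy inner step only touches the selected columns, which is what makes the subproblem cheap relative to full-dimensional coordinate descent.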
 Publication:
 arXiv e-prints
 Pub Date:
 March 2017
 DOI:
 10.48550/arXiv.1703.07285
 arXiv:
 arXiv:1703.07285
 Bibcode:
 2017arXiv170307285M
 Keywords:
 Statistics - Machine Learning;
 Computer Science - Machine Learning;
 Mathematics - Optimization and Control;
 Statistics - Computation