LassoNet: A Neural Network with Feature Sparsity
Abstract
Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso applies only to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically, a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection directly with the parameter learning. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
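The hierarchy described in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: each input feature j has a linear ("skip") weight `theta[j]` and a row of first-layer weights `W[j, :]`, and the constraint is taken to be |W[j, k]| <= M * |theta[j]|. The names `theta`, `W`, `lam`, `M`, and `sparsify_step` are hypothetical. The step shown soft-thresholds `theta` and then clips `W` into the allowed range; this is a simplified stand-in for illustration, not necessarily the exact proximal operator used by the authors.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparsify_step(theta, W, lam, M):
    """Illustrative sparsification step (assumed form, not the paper's operator).

    theta : (d,)   linear weight per input feature
    W     : (d, k) first hidden layer weights, one row per input feature
    lam   : l1 penalty strength applied to theta
    M     : hierarchy multiplier, enforcing |W[j, k]| <= M * |theta[j]|
    """
    # Shrink the linear weights; small ones become exactly zero.
    theta_new = soft_threshold(theta, lam)
    # A feature whose linear weight is zero gets a zero bound,
    # so it can no longer feed any hidden unit.
    bound = M * np.abs(theta_new)[:, None]
    W_new = np.clip(W, -bound, bound)
    return theta_new, W_new


theta = np.array([0.9, 0.05, -0.4])
W = np.array([[2.0, -3.0],
              [1.0, 1.0],
              [0.1, -0.9]])
theta_new, W_new = sparsify_step(theta, W, lam=0.1, M=2.0)
print(theta_new)
print(W_new)
```

In this toy run the second feature's linear weight is shrunk to zero, so its entire row of hidden-layer weights is zeroed as well, which is how the feature drops out of the network. In a training loop, a step of this kind would be applied after each ordinary gradient update, and increasing `lam` would trace out progressively sparser solutions.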
- Publication:
- arXiv e-prints
- Pub Date:
- July 2019
- DOI:
- 10.48550/arXiv.1907.12207
- arXiv:
- arXiv:1907.12207
- Bibcode:
- 2019arXiv190712207L
- Keywords:
- Statistics - Machine Learning;
- Computer Science - Machine Learning
- E-Print:
- 18 pages, 10 figures. arXiv admin note: text overlap with arXiv:1901.09346 by other authors