Neural Conditional Gradients
Abstract
The move from hand-designed to learned optimizers in machine learning has been quite successful for gradient-based and -free optimizers. When facing a constrained problem, however, maintaining feasibility typically requires a projection step, which might be computationally expensive and not differentiable. We show how the design of projection-free convex optimization algorithms can be cast as a learning problem based on Frank-Wolfe Networks: recurrent networks implementing the Frank-Wolfe algorithm aka. conditional gradients. This allows them to learn to exploit structure when, e.g., optimizing over rank-1 matrices. Our LSTM-learned optimizers outperform hand-designed as well learned but unconstrained ones. We demonstrate this for training support vector machines and softmax classifiers.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2018
- DOI:
- 10.48550/arXiv.1803.04300
- arXiv:
- arXiv:1803.04300
- Bibcode:
- 2018arXiv180304300S
- Keywords:
-
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- arXiv admin note: text overlap with arXiv:1610.05120 by other authors