Neural Conditional Gradients

doi:10.48550/arXiv.1803.04300

The move from hand-designed to learned optimizers in machine learning has been quite successful for both gradient-based and gradient-free optimizers. When facing a constrained problem, however, maintaining feasibility typically requires a projection step, which can be computationally expensive and non-differentiable. We show how the design of projection-free convex optimization algorithms can be cast as a learning problem based on Frank-Wolfe Networks: recurrent networks that implement the Frank-Wolfe algorithm, also known as conditional gradients. This allows them to learn to exploit structure when, e.g., optimizing over rank-1 matrices. Our LSTM-learned optimizers outperform hand-designed as well as learned but unconstrained ones. We demonstrate this for training support vector machines and softmax classifiers.
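For readers unfamiliar with the projection-free idea the abstract refers to, the following is a minimal sketch of the classical, hand-designed Frank-Wolfe iteration, not the learned Frank-Wolfe Network from the paper. The feasible set (the probability simplex), the quadratic objective, the step size 2/(k+2), and all names (`frank_wolfe_simplex`, `grad`, `target`) are assumptions chosen for illustration only.

```python
def frank_wolfe_simplex(grad, x0, num_steps=200):
    """Classical Frank-Wolfe over the probability simplex (illustrative sketch).

    grad: function returning the gradient of the objective as a list.
    x0:   feasible starting point (non-negative entries summing to 1).
    """
    x = list(x0)
    for k in range(num_steps):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex e_i with the smallest gradient entry.
        i = min(range(len(x)), key=lambda j: g[j])
        # Standard open-loop step size (an assumption; a learned optimizer
        # would produce its own update instead).
        gamma = 2.0 / (k + 2.0)
        # Convex combination of x and the vertex e_i: the iterate stays
        # feasible without any projection step.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical example: minimize 0.5 * ||x - target||^2 over the simplex.
target = [0.2, 0.5, 0.3]


def grad(x):
    return [xj - tj for xj, tj in zip(x, target)]


x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

Every iterate is a convex combination of simplex vertices, so feasibility holds by construction; this is the property that makes the method attractive when projections are expensive.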

Publication: arXiv e-prints
Pub Date: March 2018
DOI: 10.48550/arXiv.1803.04300
arXiv: arXiv:1803.04300
Bibcode: 2018arXiv180304300S
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
E-Print: arXiv admin note: text overlap with arXiv:1610.05120 by other authors
