Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

doi:10.48550/arXiv.2006.16495


Choosing the right parameters for optimization algorithms is often the key to their success in practice. Solving this problem with a learning-to-learn approach (meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates) was recently shown to be effective. However, the meta-optimization problem is difficult. In particular, the meta-gradient can often explode or vanish, and the learned optimizer may not generalize well if the meta-objective is not chosen carefully. In this paper we give meta-optimization guarantees for the learning-to-learn approach on a simple problem: tuning the step size for a quadratic loss. Our results show that the naïve objective suffers from the meta-gradient explosion/vanishing problem. Although the meta-objective can be designed so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues. We also characterize when it is necessary to compute the meta-objective on a separate validation set to ensure the generalization performance of the learned optimizer. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks.
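As a rough illustration of the explosion/vanishing claim, the sketch below works under several assumptions that are not taken from the paper: a diagonal quadratic loss f(w) = ½ Σ λ_i w_i² with hypothetical eigenvalues `lams` and initial point `w0`, the naïve meta-objective taken to be the loss f(w_t) after t gradient-descent steps with step size η, and (1/t) log f(w_t) used as the "bounded" alternative. The abstract does not specify the redesigned meta-objective, so the log variant is an assumption. For a quadratic, w_t = (I − ηH)^t w_0, so the meta-gradient has a closed form and no backpropagation is involved. The function names are hypothetical.

```python
import math

# Hypothetical setup (assumption): quadratic loss f(w) = 0.5 * sum_i lam_i * w_i**2
# with a diagonal Hessian, optimized by t steps of gradient descent with step size eta,
# so each coordinate evolves as w_t[i] = (1 - eta * lam_i)**t * w0[i].


def naive_meta_objective(eta, lams, w0, t):
    """Loss after t GD steps: 0.5 * sum_i lam_i * (1 - eta*lam_i)**(2t) * w0_i**2."""
    return 0.5 * sum(lam * (1.0 - eta * lam) ** (2 * t) * w ** 2 for lam, w in zip(lams, w0))


def naive_meta_gradient(eta, lams, w0, t):
    """Closed-form derivative of f(w_t) with respect to the step size eta."""
    return -t * sum(lam ** 2 * (1.0 - eta * lam) ** (2 * t - 1) * w ** 2 for lam, w in zip(lams, w0))


def log_meta_gradient(eta, lams, w0, t):
    """Derivative of the assumed alternative objective (1/t) * log f(w_t), i.e. f'(eta) / (t * f(eta))."""
    return naive_meta_gradient(eta, lams, w0, t) / (t * naive_meta_objective(eta, lams, w0, t))


if __name__ == "__main__":
    lams, w0 = [1.0, 0.5], [1.0, 1.0]  # hypothetical eigenvalues and initial point
    # eta = 0.1 contracts every coordinate; eta = 3.0 gives |1 - eta*lam| = 2 > 1 for lam = 1.
    for eta in (0.1, 3.0):
        for t in (10, 50, 100):
            print(f"eta={eta} t={t} "
                  f"naive={naive_meta_gradient(eta, lams, w0, t):.3e} "
                  f"log={log_meta_gradient(eta, lams, w0, t):.3f}")
```

In this toy run the naïve meta-gradient shrinks toward zero as t grows when every |1 − ηλ_i| < 1, and grows exponentially in t once some |1 − ηλ_i| > 1, while the log-based meta-gradient stays of order one in both regimes. Evaluating f(w_t) as a product of t factors is also where floating-point underflow or overflow appears for large t, which is one simple way to see why differentiating the trajectory directly can be numerically fragile.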

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.16495

arXiv:

arXiv:2006.16495

Bibcode:

2020arXiv200616495W

Keywords:

Statistics - Machine Learning;
Computer Science - Machine Learning

E-Print:

ICML 2021. Added proof sketch
