Gradient-based Hyperparameter Optimization through Reversible Learning
Abstract
Tuning hyperparameters of learning algorithms is hard because gradients with respect to the hyperparameters are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
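As a rough illustration of what "chaining derivatives backwards through the entire training procedure" means, the sketch below differentiates a toy, fully unrolled run of SGD with momentum using JAX and obtains the gradient of a validation loss with respect to the learning rate and momentum. This naive version stores the whole weight trajectory via reverse-mode autodiff, which is exactly the memory cost the paper's reversible-dynamics trick avoids; the quadratic objectives, function names, and step count are illustrative assumptions, not from the paper.

```python
# Minimal hypergradient sketch (assumptions: toy quadratic losses, 50 steps).
# It differentiates through an unrolled training loop rather than using the
# paper's memory-efficient exact reversal of the momentum dynamics.
import jax
import jax.numpy as jnp

def train_loss(w):
    # Toy training objective standing in for a network's training loss.
    return jnp.sum((w - 3.0) ** 2)

def val_loss(w):
    # Toy validation objective with a slightly different optimum.
    return jnp.sum((w - 2.5) ** 2)

def train_then_validate(hypers, w0, num_steps=50):
    # Run SGD with momentum from w0, then report the validation loss
    # of the final weights as a function of the hyperparameters.
    lr, momentum = hypers[0], hypers[1]
    w = w0
    v = jnp.zeros_like(w0)
    grad_fn = jax.grad(train_loss)
    for _ in range(num_steps):
        v = momentum * v - lr * grad_fn(w)
        w = w + v
    return val_loss(w)

w0 = jnp.array([0.0, 0.0])
hypers = jnp.array([0.05, 0.9])  # (learning rate, momentum coefficient)

# Exact hypergradient: d(validation loss) / d(learning rate, momentum),
# computed by chaining derivatives backwards through every training step.
hypergrad = jax.grad(train_then_validate)(hypers, w0)
print(hypergrad)
```

In the paper, the same kind of hypergradient is obtained without storing the weight trajectory: the reverse pass reconstructs earlier weights and velocities by running the SGD-with-momentum update backwards, saving only a small amount of extra information to compensate for finite-precision loss.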
- Publication: arXiv e-prints
- Pub Date: February 2015
- DOI: 10.48550/arXiv.1502.03492
- arXiv: arXiv:1502.03492
- Bibcode: 2015arXiv150203492M
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning
- E-Print: 10 figures. Submitted to ICML