Using Deep Q-Learning to Control Optimization Hyperparameters
Abstract
We present a novel definition of the reinforcement learning state, actions, and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and to output the expected discounted return of rewards, or q-values, associated with the actions of either adjusting the learning rate or leaving it unchanged. The two DQNs learn a policy similar to a line search, but differ in the number of allowed actions. The trained DQNs, in combination with a gradient-based update routine, form the basis of the Q-gradient descent algorithms. To demonstrate the viability of this framework, we show that the DQN's q-values associated with the optimal action converge and that the Q-gradient descent algorithms outperform gradient descent with an Armijo or nonmonotone line search. Unlike traditional optimization methods, Q-gradient descent can incorporate any objective statistic, and by varying the actions we gain insight into the types of learning rate adjustment strategies that are successful for neural network optimization.
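As a rough illustration of the control loop the abstract describes, the sketch below pairs a gradient step with a two-action choice: halve the learning rate, or leave it unchanged and accept the step. It is not the paper's method. The function `toy_q_values` is a hand-written, hypothetical stand-in for a trained DQN, and the state (current and trial objective values), the halving factor, and the quadratic objective are all assumptions made for the example.

```python
HALVE, KEEP = 0, 1  # assumed action indices: adjust the learning rate, or leave it unchanged


def objective(x):
    """Toy quadratic objective f(x) = 0.5 * ||x||^2 (an assumption for this sketch)."""
    return 0.5 * sum(v * v for v in x)


def gradient(x):
    """Gradient of the toy quadratic, which is x itself."""
    return list(x)


def toy_q_values(f_curr, f_trial):
    """Hypothetical stand-in for a trained DQN.

    Scores each action from the change in objective value, so that HALVE is
    preferred when the trial step fails to decrease the objective and KEEP
    is preferred when it does. A real DQN would learn these values instead.
    """
    improvement = f_curr - f_trial
    return [-improvement, improvement]


def q_gradient_descent(x0, lr=3.0, steps=50):
    """Gradient descent whose learning rate is controlled by the greedy action."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(x)
        trial = [xi - lr * gi for xi, gi in zip(x, g)]
        q = toy_q_values(objective(x), objective(trial))
        action = q.index(max(q))  # greedy choice; ties resolve to HALVE
        if action == HALVE:
            lr *= 0.5   # reject the trial point and shrink the learning rate
        else:
            x = trial   # accept the trial point with the learning rate unchanged
    return x, lr


if __name__ == "__main__":
    x_final, lr_final = q_gradient_descent([3.0, -4.0])
    print(objective(x_final), lr_final)
```

With this hand-written rule the loop behaves like a simple backtracking line search, which is the kind of policy the abstract says the DQNs learn; replacing `toy_q_values` with a learned network and a richer state is where the approach described in the paper would differ.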
- Publication: arXiv e-prints
- Pub Date: February 2016
- DOI: 10.48550/arXiv.1602.04062
- arXiv: arXiv:1602.04062
- Bibcode: 2016arXiv160204062H
- Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning