AEGD: Adaptive Gradient Descent with Energy
Abstract
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired convergence rates of batch gradient descent. We also provide an energy-dependent bound on the stationary convergence of AEGD in the stochastic non-convex setting. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a large variety of optimization problems: it is robust with respect to initial data and capable of making rapid initial progress. Stochastic AEGD shows generalization performance comparable to, and often better than, that of SGD with momentum for deep neural networks.
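The abstract does not spell out the update rule, so the following is only a minimal sketch of what an energy-based step could look like. It assumes an element-wise update of the form v = grad f / (2 sqrt(f + c)), r <- r / (1 + 2 eta v^2), x <- x - 2 eta r v, where c > 0 is a shift chosen so that f + c stays positive. The function name `aegd_minimize`, the parameters `eta` and `c`, and the quadratic test objective are illustrative choices, not taken from the abstract.

```python
import math

def aegd_minimize(f, grad, x0, eta=0.1, c=1.0, steps=200):
    """Sketch of an energy-adaptive gradient method (assumed update rule)."""
    x = list(x0)
    # One energy variable per coordinate, initialised from the shifted objective.
    r = [math.sqrt(f(x) + c)] * len(x)
    energies = [list(r)]
    for _ in range(steps):
        scale = 2.0 * math.sqrt(f(x) + c)
        v = [g / scale for g in grad(x)]  # gradient of sqrt(f + c)
        # Energy update: the divisor is >= 1, so r never grows, whatever eta is.
        r = [ri / (1.0 + 2.0 * eta * vi * vi) for ri, vi in zip(r, v)]
        # Parameter update uses the freshly updated energy.
        x = [xi - 2.0 * eta * ri * vi for xi, ri, vi in zip(x, r, v)]
        energies.append(list(r))
    return x, energies

# Illustrative objective: f(x) = sum of squares, with its exact gradient.
def quad(x):
    return sum(xi * xi for xi in x)

def quad_grad(x):
    return [2.0 * xi for xi in x]

if __name__ == "__main__":
    x_final, energies = aegd_minimize(quad, quad_grad, [3.0, -2.0])
    print("final point:", x_final)
    print("final energy:", energies[-1])
```

Under this assumed form, the energy update divides by a quantity that is at least 1, so each component of r is non-increasing for any positive step size. That is one way to read the "unconditionally energy stable" claim, though the precise statement and its proof are in the paper itself.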
- Publication: arXiv e-prints
- Pub Date: October 2020
- DOI: 10.48550/arXiv.2010.05109
- arXiv: arXiv:2010.05109
- Bibcode: 2020arXiv201005109L
- Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning; Mathematics - Numerical Analysis; Statistics - Machine Learning; 65K10 (Primary) 90C15; 68Q25 (Secondary)
- E-Print: 25 pages, 6 figures, submitted to SIAM J. Optimization