Continuous-time Models for Stochastic Optimization Algorithms
Abstract
We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis and tools from stochastic calculus, to derive convergence bounds for various types of non-convex functions. Guided by this analysis, we show that the same Lyapunov arguments hold in discrete time, leading to matching rates. In addition, we use these models and Itô calculus to derive new insights into the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
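To make the idea of a continuous-time model of SGD concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a one-dimensional quadratic objective f(x) = x^2 / 2, Gaussian gradient noise with standard deviation `sigma`, and a commonly used diffusion approximation dX = -f'(X) dt + sqrt(h) * sigma dW for SGD with learning rate `h`. The function names `sgd_path` and `sde_path` and all parameter values are hypothetical, chosen for illustration only.

```python
import numpy as np


def sgd_path(x0, h, sigma, n_steps, rng):
    """SGD on f(x) = x^2 / 2 with additive Gaussian gradient noise (assumed noise model)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        noisy_grad = x[k] + sigma * rng.standard_normal()  # f'(x) = x, plus noise
        x[k + 1] = x[k] - h * noisy_grad
    return x


def sde_path(x0, h, sigma, n_steps, rng, substeps=1):
    """Euler-Maruyama simulation of dX = -X dt + sqrt(h) * sigma dW.

    One SGD iteration corresponds to a time interval of length h, which is
    split into `substeps` integration steps of length dt = h / substeps.
    """
    dt = h / substeps
    x = np.empty(n_steps + 1)
    x[0] = x0
    cur = x0
    for k in range(n_steps):
        for _ in range(substeps):
            dW = np.sqrt(dt) * rng.standard_normal()  # Brownian increment over dt
            cur = cur - cur * dt + np.sqrt(h) * sigma * dW
        x[k + 1] = cur  # state recorded at time (k + 1) * h
    return x


# Illustrative usage with arbitrary parameter values.
rng = np.random.default_rng(0)
discrete = sgd_path(x0=1.0, h=0.1, sigma=0.5, n_steps=100, rng=rng)
continuous = sde_path(x0=1.0, h=0.1, sigma=0.5, n_steps=100, rng=rng, substeps=10)
```

With `substeps=1` the Euler-Maruyama update has the same distribution as the SGD update in this toy setting, which is one way to see why the diffusion is a natural continuous-time counterpart. Larger values of `substeps` integrate the SDE more finely between SGD iterations.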
- Publication: arXiv e-prints
- Pub Date: October 2018
- DOI: 10.48550/arXiv.1810.02565
- arXiv: arXiv:1810.02565
- Bibcode: 2018arXiv181002565O
- Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning
- E-Print: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada