Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization
Abstract
Recent years have seen increased interest in performance guarantees of gradient descent algorithms for non-convex optimization. A number of works have uncovered that gradient noise plays a critical role in the ability of gradient descent recursions to efficiently escape saddle-points and reach second-order stationary points. Most available works limit the gradient noise component to be bounded with probability one or sub-Gaussian and leverage concentration inequalities to arrive at high-probability results. We present an alternate approach, relying primarily on mean-square arguments and show that a more relaxed relative bound on the gradient noise variance is sufficient to ensure efficient escape from saddle-points without the need to inject additional noise, employ alternating step-sizes or rely on a global dispersive noise assumption, as long as a gradient noise component is present in a descent direction for every saddle-point.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2019
- DOI:
- 10.48550/arXiv.1908.07023
- arXiv:
- arXiv:1908.07023
- Bibcode:
- 2019arXiv190807023V
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning;
- Statistics - Machine Learning