Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization

doi:10.48550/arXiv.1908.07023

Recent years have seen increased interest in performance guarantees of gradient descent algorithms for non-convex optimization. A number of works have shown that gradient noise plays a critical role in the ability of gradient descent recursions to efficiently escape saddle points and reach second-order stationary points. Most available works require the gradient noise component to be bounded with probability one or to be sub-Gaussian, and leverage concentration inequalities to arrive at high-probability results. We present an alternative approach that relies primarily on mean-square arguments, and show that a more relaxed relative bound on the gradient noise variance is sufficient to ensure efficient escape from saddle points, without the need to inject additional noise, employ alternating step-sizes, or rely on a global dispersive noise assumption, as long as a gradient noise component is present in a descent direction at every saddle point.
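
The role of gradient noise described above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy cost J(w) = 0.5*w0^2 - 0.5*w1^2 + 0.25*w1^4, the names `grad`, `cost` and `run`, the constants `beta` and `sigma`, and the Gaussian noise model, which is scaled so that its variance equals beta^2*||grad J(w)||^2 + sigma^2 as one simple instance of a relative variance bound. Started exactly at the saddle point, the noiseless recursion never moves, while the noisy recursion has a noise component along the descent direction (the w1 axis) and drifts toward a local minimum.

```python
import math
import random


def grad(w):
    # Gradient of the toy cost J(w) = 0.5*w0^2 - 0.5*w1^2 + 0.25*w1^4.
    # The origin is a strict saddle point; the minima sit at w = (0, +/-1).
    return (w[0], -w[1] + w[1] ** 3)


def cost(w):
    return 0.5 * w[0] ** 2 - 0.5 * w[1] ** 2 + 0.25 * w[1] ** 4


def run(mu, steps, beta, sigma, seed=0):
    """Run w <- w - mu * (grad J(w) + s) starting exactly at the saddle point.

    The noise s is zero-mean Gaussian with E||s||^2 = beta^2 * ||grad J(w)||^2 + sigma^2,
    an assumed noise model chosen only to illustrate a relative variance bound.
    """
    rng = random.Random(seed)
    w = (0.0, 0.0)
    for _ in range(steps):
        g = grad(w)
        grad_norm_sq = g[0] ** 2 + g[1] ** 2
        # Per-coordinate standard deviation; dividing by sqrt(2) spreads the
        # total variance evenly over the two coordinates.
        scale = math.sqrt(beta ** 2 * grad_norm_sq + sigma ** 2) / math.sqrt(2)
        s = (scale * rng.gauss(0.0, 1.0), scale * rng.gauss(0.0, 1.0))
        w = (w[0] - mu * (g[0] + s[0]), w[1] - mu * (g[1] + s[1]))
    return w


# Noiseless gradient descent: the gradient at the saddle is zero, so it stays put.
w_gd = run(mu=0.05, steps=2000, beta=0.0, sigma=0.0)

# Noisy recursion: the perturbation along w1 is amplified by the negative curvature.
w_sgd = run(mu=0.05, steps=2000, beta=0.5, sigma=0.1)

print("gradient descent:", w_gd, "cost:", cost(w_gd))
print("stochastic gradient descent:", w_sgd, "cost:", cost(w_sgd))
```

The sketch only shows the qualitative effect on one low-dimensional example; it does not reproduce the mean-square analysis or the escape-time guarantees that the abstract refers to.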

Publication:

arXiv e-prints

Pub Date:

August 2019

DOI:

10.48550/arXiv.1908.07023

arXiv:

arXiv:1908.07023

Bibcode:

2019arXiv190807023V

Keywords:

Mathematics - Optimization and Control;
Computer Science - Machine Learning;
Statistics - Machine Learning
