Escaping Saddle Points Faster under Interpolation-like Conditions
Abstract
In this paper, we show that under overparametrization, several standard stochastic optimization algorithms escape saddle points and converge to local minimizers much faster. One of the fundamental aspects of overparametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an overparametrization setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local minimizer matches the corresponding deterministic rate of $\tilde{\mathcal{O}}(1/\epsilon^{2})$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that its oracle complexity to reach an $\epsilon$-local minimizer is $\tilde{\mathcal{O}}(1/\epsilon^{2.5})$. While this complexity is better than the corresponding complexity of either PSGD or SCRN without interpolation-like assumptions, it does not match the rate of $\tilde{\mathcal{O}}(1/\epsilon^{1.5})$ of the deterministic Cubic-Regularized Newton method; it seems that further Hessian-based interpolation-like assumptions are necessary to bridge this gap. We also discuss the corresponding improved complexities in the zeroth-order settings.
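To make the PSGD mechanism concrete, here is a minimal sketch of perturbed (stochastic) gradient descent: ordinary gradient steps interleaved with an occasional uniform-ball perturbation so that iterates do not stall at strict saddle points. The fixed perturbation schedule and the hyperparameter names (`lr`, `noise_radius`, `perturb_every`) are illustrative assumptions for this sketch, not the exact algorithm or tuning analyzed in the paper.

```python
import numpy as np

def perturbed_sgd(grad, x0, lr=0.05, noise_radius=0.1,
                  perturb_every=100, n_steps=2000, rng=None):
    """Sketch of perturbed gradient descent (illustrative, not the
    paper's exact PSGD): take gradient steps, and every
    `perturb_every` iterations add noise drawn uniformly from a
    ball of radius `noise_radius` to help escape saddle points."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        if t > 0 and t % perturb_every == 0:
            # sample uniformly from the ball of radius noise_radius:
            # random direction, radius scaled by u^(1/d)
            xi = rng.normal(size=x.shape)
            xi *= noise_radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi
        x = x - lr * grad(x)
    return x

def grad(v):
    # gradient of f(x, y) = x^2/2 - y^2/2 + y^4/4,
    # which has a strict saddle at the origin and minima at (0, +/-1)
    x, y = v
    return np.array([x, -y + y**3])

# Started exactly at the saddle, plain gradient descent never moves
# (the gradient is zero there); the perturbation lets the iterates
# escape and converge to one of the minima (0, +/-1).
x_final = perturbed_sgd(grad, [0.0, 0.0], rng=0)
```

The toy objective is a hypothetical example chosen so that the saddle point is an exact stationary point, making the role of the perturbation visible.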
 Publication:
 arXiv e-prints
 Pub Date:
 September 2020
 arXiv:
 arXiv:2009.13016
 Bibcode:
 2020arXiv200913016R
 Keywords:
 Statistics - Machine Learning;
 Computer Science - Machine Learning;
 Mathematics - Optimization and Control;
 Mathematics - Statistics Theory
 E-Print:
 To appear in NeurIPS, 2020