Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

doi:10.48550/arXiv.2112.14872

Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants (e.g., Adagrad and Adam) have become prominent choices for machine learning practitioners. While a large number of works have demonstrated that these first-order optimization methods can achieve sublinear or linear convergence, we establish local quadratic convergence of stochastic gradient descent with adaptive step size on problems such as matrix inversion.
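The abstract contrasts the sublinear or linear rates usually proved for first-order methods with local quadratic convergence. The sketch below illustrates only that distinction. It uses matrix inversion as the test problem but does not implement the authors' adaptive-step-size SGD, whose update rule is not given in this record. Instead it compares two standard deterministic methods. The first is fixed-step gradient descent on f(X) = 0.5 * ||AX - I||_F^2, whose residual R = I - AX shrinks by a roughly constant factor per step. The second is the classical Newton-Schulz iteration X <- X(2I - AX), whose residual is squared at every step (R_{k+1} = R_k^2). The matrix A, the start X0 = I/5, the step size eta = 0.04, and all function names are hypothetical choices made for illustration.

```python
# Hedged illustration (NOT the paper's algorithm): contrasts linear and
# quadratic convergence when computing the inverse of a small matrix A.
# All names, the matrix A, X0, and eta are hypothetical choices.


def matmul(P, Q):
    """Product of two square matrices stored as lists of lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def residual(A, X):
    """Return R = I - A X; it is zero exactly when X is the inverse of A."""
    AX = matmul(A, X)
    n = len(A)
    return [[(1.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
            for i in range(n)]


def frobenius(R):
    """Frobenius norm of a matrix."""
    return sum(v * v for row in R for v in row) ** 0.5


def gd_inverse(A, X0, eta, steps):
    """Fixed-step gradient descent on f(X) = 0.5 * ||A X - I||_F^2.

    The gradient is A^T (A X - I) = -A^T R, so each step adds eta * A^T R.
    The residual obeys R_{k+1} = (I - eta * A A^T) R_k: linear convergence.
    """
    n = len(A)
    At = [list(col) for col in zip(*A)]
    X = [row[:] for row in X0]
    history = []
    for _ in range(steps):
        G = matmul(At, residual(A, X))  # A^T R, the negative gradient
        X = [[X[i][j] + eta * G[i][j] for j in range(n)] for i in range(n)]
        history.append(frobenius(residual(A, X)))
    return X, history


def newton_schulz_inverse(A, X0, steps):
    """Classical Newton-Schulz iteration X <- X (2I - A X) = X + X R.

    The residual obeys R_{k+1} = R_k^2, so ||R_{k+1}||_F <= ||R_k||_F^2:
    quadratic convergence whenever ||R_0||_F < 1.
    """
    n = len(A)
    X = [row[:] for row in X0]
    history = []
    for _ in range(steps):
        XR = matmul(X, residual(A, X))
        X = [[X[i][j] + XR[i][j] for j in range(n)] for i in range(n)]
        history.append(frobenius(residual(A, X)))
    return X, history


# Symmetric positive definite example with eigenvalues of about 2.38 and 4.62.
A = [[4.0, 1.0], [1.0, 3.0]]
# X0 = I / 5 gives ||I - A X0||_F = sqrt(0.28), about 0.53 < 1.
X0 = [[0.2, 0.0], [0.0, 0.2]]

# eta = 0.04 is below 2 / lambda_max(A A^T) (about 0.094), so GD converges.
_, gd_hist = gd_inverse(A, X0, eta=0.04, steps=6)
X_ns, ns_hist = newton_schulz_inverse(A, X0, steps=6)

for k, (g, q) in enumerate(zip(gd_hist, ns_hist), start=1):
    print(f"step {k}: GD residual {g:.2e}   Newton-Schulz residual {q:.2e}")
```

Running it prints the two residual histories side by side: the gradient descent residual shrinks by a roughly constant factor per step, while the Newton-Schulz residual is squared each step until it reaches floating-point precision.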

Publication:

arXiv e-prints

Pub Date:

December 2021

DOI:

10.48550/arXiv.2112.14872

arXiv:

arXiv:2112.14872

Bibcode:

2021arXiv211214872R

Keywords:

Mathematics - Optimization and Control;
Computer Science - Machine Learning

E-Print:

ICML 2021 Workshop on Beyond first-order methods in ML systems
