Stochastic Gradient Methods with Preconditioned Updates
Abstract
This work considers the non-convex finite sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to give new scaled algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2022
- DOI:
- 10.48550/arXiv.2206.00285
- arXiv:
- arXiv:2206.00285
- Bibcode:
- 2022arXiv220600285S
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning
- E-Print:
- 40 pages, 2 new algorithms, 20 figures, 4 tables