A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast
Abstract
Since Nesterov's seminal 1983 work, many accelerated first-order optimization methods have been proposed, but their analyses lack a common unifying structure. In this work, we identify a geometric structure satisfied by a wide range of first-order accelerated methods. Using this geometric insight, we present several novel generalizations of accelerated methods. Most interesting among them is a method that reduces the squared gradient norm with $\mathcal{O}(1/K^4)$ rate in the prox-grad setup, faster than the $\mathcal{O}(1/K^3)$ rates of Nesterov's FGM or Kim and Fessler's FPGM-m.
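For orientation, the baseline that the abstract compares against, Nesterov's FGM, can be sketched in a few lines. This is a minimal illustration of the classical method in its common momentum form with step size 1/L, not the new method proposed in the paper. The diagonal quadratic test problem and the names `fgm`, `D`, `f`, and `grad` are assumptions made only for this example.

```python
import math


def fgm(grad, x0, L, K):
    """Run K iterations of Nesterov's fast gradient method with step size 1/L.

    Returns the final iterate x_K and the squared gradient norms at x_1..x_K.
    """
    x = list(x0)
    y = list(x0)
    t = 1.0
    sq_norms = []
    for _ in range(K):
        g = grad(y)
        # Gradient step taken from the extrapolated point y_k.
        x_next = [yi - gi / L for yi, gi in zip(y, g)]
        # Classical momentum parameter update.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        # Extrapolate using the last two iterates.
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_next, x)]
        x, t = x_next, t_next
        sq_norms.append(sum(gi * gi for gi in grad(x)))
    return x, sq_norms


# Hypothetical test problem: f(x) = 0.5 * sum_i D[i] * x[i]^2, minimized at 0.
D = [1.0, 0.1, 0.01]
L = max(D)  # smoothness constant of f


def f(x):
    return 0.5 * sum(d * xi * xi for d, xi in zip(D, x))


def grad(x):
    return [d * xi for d, xi in zip(D, x)]


x0 = [1.0, 1.0, 1.0]
K = 200
x_K, sq_norms = fgm(grad, x0, L, K)
print("f(x_K) =", f(x_K))
print("min squared gradient norm =", min(sq_norms))
```

Printing the smallest squared gradient norm for several values of `K` gives a rough empirical sense of how quickly this baseline makes gradients small on the toy problem; it does not reproduce the rates stated in the abstract.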
- Publication: arXiv e-prints
- Pub Date: June 2021
- DOI:
- arXiv: arXiv:2106.10439
- Bibcode: 2021arXiv210610439L
- Keywords: Mathematics - Optimization and Control
- E-Print: Published in Neural Information Processing Systems, 2021