Toward a Unified Theory of Gradient Descent under Generalized Smoothness
Abstract
We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \leq L$), the point minimizing the quadratic upper bound $f(x_k) + \langle\nabla f(x_k), x_{k+1} - x_k\rangle + \frac{L}{2} \|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \leq \ell(\|\nabla f(x)\|)$). In this case, we derive the step size $$\gamma_k = \int_{0}^{1} \frac{d v}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}.$$ Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.
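The step size rule above can be evaluated numerically for any positive function $\ell$. The following is a minimal sketch, not the authors' implementation: the function names `gd_step_size` and `l0l1_step_size` are hypothetical, the midpoint rule is just one convenient quadrature, and the choice $\ell(s) = L_0 + L_1 s$ is an illustrative assumption that the abstract does not state. For a constant $\ell \equiv L$ the integral reduces to the classical $\gamma_k = 1/L$.

```python
import math


def gd_step_size(grad_norm, ell, n=1000):
    """Approximate gamma = int_0^1 dv / ell(g + g*v) with the composite midpoint rule.

    grad_norm is g = ||grad f(x_k)||; ell is a positive function of a
    nonnegative scalar (hypothetical interface, chosen for illustration).
    """
    h = 1.0 / n
    return h * sum(1.0 / ell(grad_norm * (1.0 + (i + 0.5) * h)) for i in range(n))


def l0l1_step_size(grad_norm, L0, L1):
    """Closed form of the same integral for the assumed ell(s) = L0 + L1 * s."""
    c = L1 * grad_norm
    if c == 0.0:
        return 1.0 / L0  # the integrand is the constant 1 / L0
    # int_0^1 dv / (L0 + c + c*v) = ln((L0 + 2c) / (L0 + c)) / c
    return math.log((L0 + 2.0 * c) / (L0 + c)) / c


# Constant ell(s) = L recovers the classical step size 1 / L.
L = 4.0
gamma_const = gd_step_size(3.0, lambda s: L)

# Assumed example: ell(s) = L0 + L1 * s with gradient norm g.
L0, L1, g = 1.0, 2.0, 3.0
gamma_num = gd_step_size(g, lambda s: L0 + L1 * s)
gamma_exact = l0l1_step_size(g, L0, L1)
```

In this example the step size shrinks as the gradient norm grows, and it tends to $1/L_0$ as the gradient norm goes to zero.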
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.11773
- Bibcode: 2024arXiv241211773T
- Keywords: Mathematics - Optimization and Control