Convergence Rates of Stochastic Zeroth-order Gradient Descent for Ł ojasiewicz Functions
Abstract
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Lojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $ \widehat{\nabla} f (\mathbf{x}_t) $ is the approximate gradient estimated using zeroth-order information only. Our results show that $ \{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N} } $ can converge faster than $ \{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N} }$, regardless of whether the objective $f$ is smooth or nonsmooth.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2022
- DOI:
- 10.48550/arXiv.2210.16997
- arXiv:
- arXiv:2210.16997
- Bibcode:
- 2022arXiv221016997W
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning