Almost sure convergence of stochastic Hamiltonian descent methods

doi:10.48550/arXiv.2406.16649

Gradient normalization and soft clipping are two popular techniques for tackling instability issues and improving convergence of stochastic gradient descent (SGD) with momentum. In this article, we study these types of methods through the lens of dissipative Hamiltonian systems. Gradient normalization and certain types of soft clipping algorithms can be seen as (stochastic) implicit-explicit Euler discretizations of dissipative Hamiltonian systems, where the kinetic energy function determines the type of clipping that is applied. We make use of a unified theory from dynamical systems to show that all of these schemes converge almost surely to stationary points of the objective function.
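To make the Hamiltonian view concrete, here is a minimal Python sketch. It assumes the dissipative system q' = ∇φ(p), p' = −∇f(q) − γp with linear damping, and one plausible implicit-explicit splitting: the stochastic gradient is treated explicitly, the damping implicitly, and the position is updated with the new momentum. The article's exact schemes, damping term and step-size conditions may differ. The quadratic kinetic energy φ(p) = |p|²/2 recovers heavy-ball momentum, while φ(p) = sqrt(ε² + |p|²) yields a softly clipped step whose norm stays below one and approaches normalized momentum p/|p| as ε → 0. All function names and the noisy quadratic test problem are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def grad_phi_quadratic(p: Vector) -> Vector:
    """phi(p) = |p|^2 / 2, so grad phi(p) = p: classical heavy-ball momentum."""
    return list(p)


def grad_phi_soft_clip(p: Vector, eps: float = 1.0) -> Vector:
    """phi(p) = sqrt(eps^2 + |p|^2), so grad phi(p) = p / sqrt(eps^2 + |p|^2).

    The returned direction always has norm < 1 (a soft clip); as eps -> 0
    it approaches normalized momentum p / |p|.
    """
    scale = math.sqrt(eps * eps + sum(x * x for x in p))
    return [x / scale for x in p]


def stochastic_hamiltonian_descent(
    stoch_grad: Callable[[Vector, random.Random], Vector],
    q0: Vector,
    grad_phi: Callable[[Vector], Vector],
    step_size: Callable[[int], float] = lambda k: 0.1,
    gamma: float = 1.0,
    steps: int = 1000,
    seed: int = 0,
) -> Vector:
    """Hypothetical IMEX Euler scheme for the ASSUMED system
    q' = grad phi(p),  p' = -grad f(q) - gamma * p."""
    rng = random.Random(seed)
    q = list(q0)
    p = [0.0] * len(q)
    for k in range(steps):
        h = step_size(k)
        g = stoch_grad(q, rng)  # noisy estimate of grad f(q_k)
        # Momentum: gradient explicit, damping implicit, i.e.
        # p_{k+1} = p_k - h * g_k - h * gamma * p_{k+1}, solved for p_{k+1}.
        p = [(pi - h * gi) / (1.0 + h * gamma) for pi, gi in zip(p, g)]
        # Position: explicit update driven by the kinetic-energy gradient
        # of the *new* momentum; grad_phi decides the kind of clipping.
        v = grad_phi(p)
        q = [qi + h * vi for qi, vi in zip(q, v)]
    return q


def noisy_quadratic_grad(sigma: float) -> Callable[[Vector, random.Random], Vector]:
    """Hypothetical test problem f(q) = |q|^2 / 2 with Gaussian gradient noise."""
    return lambda q, rng: [qi + rng.gauss(0.0, sigma) for qi in q]
```

For example, `stochastic_hamiltonian_descent(noisy_quadratic_grad(0.1), [5.0, -3.0], grad_phi_soft_clip, step_size=lambda k: 0.5 / (k + 1) ** 0.6, steps=5000)` runs the soft-clipped variant with decreasing step sizes. This is the kind of schedule under which almost-sure convergence of stochastic methods is typically analyzed. With a constant step and persistent gradient noise, the iterates generally settle only in a noise-dependent neighbourhood of a stationary point.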

Publication: arXiv e-prints
Pub Date: June 2024
DOI: 10.48550/arXiv.2406.16649
arXiv: arXiv:2406.16649
Bibcode: 2024arXiv240616649W
Keywords: Mathematics - Optimization and Control
