Quasi-hyperbolic momentum and Adam for deep learning

doi:10.48550/arXiv.1810.06801

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
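The abstract's description of QHM as an average of a plain SGD step and a momentum step can be illustrated with a minimal sketch. The abstract itself gives no formula, so the symbols below (learning rate `lr`, momentum discount `beta`, averaging weight `nu`) and the use of a normalized, exponentially weighted momentum buffer are assumptions drawn from the usual statement of the rule. The function names and the toy quadratic objective are hypothetical and serve only as illustration.

```python
def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    """One QHM-style update on lists of floats; returns (new_theta, new_buf).

    Assumed rule (not stated in the abstract):
        buf   <- beta * buf + (1 - beta) * grad
        theta <- theta - lr * ((1 - nu) * grad + nu * buf)
    """
    # Exponentially weighted moving average of gradients (momentum buffer).
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    # Weighted average of the raw gradient (plain SGD direction) and the
    # momentum buffer (momentum direction), mixed by nu.
    new_theta = [
        t - lr * ((1.0 - nu) * g + nu * b)
        for t, g, b in zip(theta, grad, new_buf)
    ]
    return new_theta, new_buf


def minimize_quadratic(steps=200, lr=0.1, beta=0.9, nu=0.7):
    """Toy run on f(x) = 0.5 * x^2 per coordinate, whose gradient is x."""
    theta = [1.0, -2.0]
    buf = [0.0, 0.0]
    for _ in range(steps):
        grad = list(theta)  # gradient of the toy objective at theta
        theta, buf = qhm_step(theta, grad, buf, lr, beta, nu)
    return theta
```

Under these assumptions, `nu=0.0` reduces the step to plain SGD and `nu=1.0` reduces it to a pure momentum step on the normalized buffer; intermediate values interpolate between the two.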

Publication:

arXiv e-prints

Pub Date:

October 2018

DOI:

10.48550/arXiv.1810.06801

arXiv:

arXiv:1810.06801

Bibcode:

2018arXiv181006801M

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

Published as a conference paper at ICLR 2019. This version corrects one typographical error in the published text.
