SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

doi:10.48550/arXiv.2210.05995

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for GDA with an arbitrary step-size ratio, which matches the full-batch upper bound for the primal-PŁ-PŁ case.
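The difference between the simultaneous and alternating variants of SGDA-RR can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `sgda_rr`, the scalar variables, the step sizes `alpha` and `beta`, and the toy quadratic components are all assumptions chosen for readability. The toy objective is strongly-convex-strongly-concave, so it is far easier than the nonconvex-PŁ class studied in the paper.

```python
import random

def sgda_rr(grad_x, grad_y, n, x, y, alpha, beta, epochs,
            alternating=False, seed=0):
    """Sketch of SGDA with random reshuffling on scalars x (min) and y (max).

    grad_x(i, x, y) and grad_y(i, x, y) return the partial derivatives of
    the i-th component f_i. All names here are hypothetical.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)              # fresh permutation every epoch
        for i in order:                 # each component is used once per epoch
            gx = grad_x(i, x, y)
            if alternating:
                x = x - alpha * gx
                # the ascent step sees the already-updated x
                y = y + beta * grad_y(i, x, y)
            else:
                # both gradients are evaluated at the same point
                gy = grad_y(i, x, y)
                x = x - alpha * gx
                y = y + beta * gy
    return x, y

# Toy components (an assumption for illustration only):
# f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2, with saddle point (0, 0).
a = [1.0, 2.0, 3.0]
b = [0.5, -1.0, 1.5]
c = [2.0, 1.0, 3.0]

def gx(i, x, y):
    return a[i] * x + b[i] * y

def gy(i, x, y):
    return b[i] * x - c[i] * y

x_sim, y_sim = sgda_rr(gx, gy, 3, 1.0, 1.0, 0.05, 0.05, epochs=200)
x_alt, y_alt = sgda_rr(gx, gy, 3, 1.0, 1.0, 0.05, 0.05, epochs=200,
                       alternating=True)
```

Replacing the per-epoch shuffle with an independent uniform draw of `i` at every step would give with-replacement SGDA, the baseline the abstract compares against.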

Publication: arXiv e-prints

Pub Date: October 2022

DOI: 10.48550/arXiv.2210.05995

arXiv: arXiv:2210.05995

Bibcode: 2022arXiv221005995C

Keywords: Mathematics - Optimization and Control; Statistics - Machine Learning

E-Print: ICLR 2023 camera-ready version
