Near-Optimal No-Regret Learning in General Games
Abstract
We show that Optimistic Hedge, a common variant of multiplicative-weights-updates with recency bias, attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O(T^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the $O(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac{1}{T}\right)$.
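A minimal sketch of the setting described above, under stated assumptions: a randomly drawn 3-player, 2-action general-sum game with losses in [0, 1], and a fixed step size eta = 0.1 (both our choices; the paper's analysis uses its own step-size schedule, which is not reproduced here). Every player runs the standard Optimistic Hedge update on her expected loss vector, x_{t+1}(a) ∝ x_t(a) exp(-eta (2 ell_t(a) - ell_{t-1}(a))), and the helper names `expected_losses`, `optimistic_hedge_selfplay`, and `cce_gap` are hypothetical. The script also computes the coarse correlated equilibrium (CCE) gap of the time-averaged product distribution directly, which equals the largest player's regret divided by T.

```python
import numpy as np


def expected_losses(loss_tensors, strategies, i):
    """Hypothetical helper: expected loss of each pure action of player i
    when every other player j plays the mixed strategy strategies[j]."""
    t = loss_tensors[i]
    # Contract the other players' axes, highest index first, so the
    # axis numbers of the axes not yet contracted stay unchanged.
    for j in reversed(range(len(strategies))):
        if j != i:
            t = np.tensordot(t, strategies[j], axes=([j], [0]))
    return t


def optimistic_hedge_selfplay(loss_tensors, T, eta):
    """Hypothetical helper: all players run Optimistic Hedge for T rounds.
    Returns each player's external regret and the history of mixed strategies."""
    m = len(loss_tensors)
    sizes = loss_tensors[0].shape
    cum_loss = [np.zeros(n) for n in sizes]   # sum_{s<=t} ell_s(a)
    x = [np.full(n, 1.0 / n) for n in sizes]  # x_1 is uniform
    realized = np.zeros(m)                    # sum_t <x_t, ell_t>
    history = []
    for _ in range(T):
        history.append([xi.copy() for xi in x])
        # All loss vectors are computed from the current strategies x_t.
        ell = [expected_losses(loss_tensors, x, i) for i in range(m)]
        for i in range(m):
            realized[i] += x[i] @ ell[i]
            cum_loss[i] += ell[i]
            # x_{t+1} ∝ exp(-eta * (sum_{s<=t} ell_s + ell_t)), which unrolls to
            # x_{t+1}(a) ∝ x_t(a) * exp(-eta * (2 ell_t(a) - ell_{t-1}(a))).
            logits = -eta * (cum_loss[i] + ell[i])
            w = np.exp(logits - logits.max())  # shift for numerical stability
            x[i] = w / w.sum()
    regret = np.array([realized[i] - cum_loss[i].min() for i in range(m)])
    return regret, history


def cce_gap(loss_tensors, history):
    """Hypothetical helper: largest gain any player gets from a fixed deviation
    under sigma = (1/T) * sum_t x_t^1 (x) ... (x) x_t^m."""
    m = len(loss_tensors)
    sigma = np.zeros(loss_tensors[0].shape)
    for xs in history:
        prod = xs[0]
        for xi in xs[1:]:
            prod = np.multiply.outer(prod, xi)
        sigma += prod
    sigma /= len(history)
    gaps = []
    for i in range(m):
        L = loss_tensors[i]
        eq_loss = (sigma * L).sum()
        # Marginal of sigma on the other players, broadcast over player i's axis.
        others = sigma.sum(axis=i, keepdims=True)
        dev_loss = (others * L).sum(axis=tuple(j for j in range(m) if j != i))
        gaps.append(eq_loss - dev_loss.min())
    return max(gaps)


rng = np.random.default_rng(0)
m, n, T, eta = 3, 2, 1000, 0.1
loss_tensors = [rng.uniform(0.0, 1.0, size=(n,) * m) for _ in range(m)]
regret, history = optimistic_hedge_selfplay(loss_tensors, T, eta)
print("per-player regret:", regret)
print("CCE gap:", cce_gap(loss_tensors, history), "max regret / T:", regret.max() / T)
```

Because the CCE gap of the averaged play is exactly the maximum regret divided by T, a ${\rm poly}(\log T)$ regret bound translates directly into the $\tilde{O}(1/T)$ rate stated above. The simulation only illustrates that identity; it does not verify the asymptotic regret bound.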
Publication: arXiv e-prints
Pub Date: August 2021
DOI: 10.48550/arXiv.2108.06924
arXiv: arXiv:2108.06924
Bibcode: 2021arXiv210806924D
Keywords: Computer Science - Machine Learning
E-Print: 40 pages