Independent Policy Gradient for LargeScale Markov Potential Games: Sharper Rates, Function Approximation, and GameAgnostic Convergence
Abstract
We examine global nonasymptotic convergence properties of policy gradient methods for multiagent reinforcement learning (RL) problems in Markov potential games (MPG). To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity which does not explicitly depend on the state space size. When the exact gradient is not available, we establish $O(1/\epsilon^5)$ sample complexity bound in a potentially infinitely large state space for a samplebased algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoys convergence for both zerosum Markov games and Markov cooperative games with the players that are oblivious to the types of games being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.
 Publication:

arXiv eprints
 Pub Date:
 February 2022
 arXiv:
 arXiv:2202.04129
 Bibcode:
 2022arXiv220204129D
 Keywords:

 Computer Science  Machine Learning;
 Computer Science  Computer Science and Game Theory;
 Computer Science  Multiagent Systems;
 Mathematics  Optimization and Control
 EPrint:
 55 pages, 6 figures