Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
Abstract
Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.
- Publication: arXiv e-prints
- Pub Date: February 2021
- DOI: 10.48550/arXiv.2102.08903
- arXiv: arXiv:2102.08903
- Bibcode: 2021arXiv210208903Z
- Keywords: Computer Science - Machine Learning; Computer Science - Computer Science and Game Theory; Mathematics - Optimization and Control; Statistics - Machine Learning
- E-Print: AISTATS 2022