Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

doi:10.48550/arXiv.2102.08903

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.

Publication:

arXiv e-prints

Pub Date:

February 2021

DOI:

10.48550/arXiv.2102.08903

arXiv:

arXiv:2102.08903

Bibcode:

2021arXiv210208903Z

Keywords:

Computer Science - Machine Learning;
Computer Science - Computer Science and Game Theory;
Mathematics - Optimization and Control;
Statistics - Machine Learning

E-Print:

AISTATS 2022

