On Bellman's Optimality Principle for zs-POSGs

doi:10.48550/arXiv.2006.16395

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., by exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how this principle can be applied to (infinite-horizon) two-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking the viewpoint of a central planner, who can only reason about a sufficient statistic called the occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz continuity of the value function in occupancy space, one can derive a version of the HSVI (Heuristic Search Value Iteration) algorithm that provably finds an $\epsilon$-Nash equilibrium in finite time.
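As a rough illustration of why Lipschitz continuity matters for an HSVI-style scheme, the generic bounding argument can be sketched as follows. All symbols here are assumptions introduced for illustration and are not taken from the abstract: $V$ is the value function, $\sigma$ an occupancy state, $\lambda$ a Lipschitz constant, $\sigma_i$ previously visited occupancy states with stored upper values $u_i \ge V(\sigma_i)$ and lower values $l_i \le V(\sigma_i)$, and $\sigma_0$ the initial occupancy state. The choice of the 1-norm is also an assumption; the paper's actual bounds may take a different form.

```latex
% Lipschitz continuity of V in occupancy space (assumed norm and constant):
\[
  |V(\sigma) - V(\sigma')| \;\le\; \lambda \,\|\sigma - \sigma'\|_1 .
\]
% Point-based bounds that follow from it at any occupancy state \sigma:
\[
  \underbrace{\max_i \bigl[\, l_i - \lambda \,\|\sigma - \sigma_i\|_1 \bigr]}_{\underline{V}(\sigma)}
  \;\le\; V(\sigma) \;\le\;
  \underbrace{\min_i \bigl[\, u_i + \lambda \,\|\sigma - \sigma_i\|_1 \bigr]}_{\overline{V}(\sigma)} .
\]
% A heuristic-search loop of this kind can stop once the gap at the start is small:
\[
  \overline{V}(\sigma_0) - \underline{V}(\sigma_0) \;\le\; \epsilon .
\]
```

Under these assumptions, each stored point tightens the bounds in a neighborhood whose size depends on $\lambda$, which is the kind of property a finite-time termination argument can build on.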

Publication: arXiv e-prints
Pub Date: June 2020
DOI: 10.48550/arXiv.2006.16395
arXiv: arXiv:2006.16395
Bibcode: 2020arXiv200616395B
Keywords: Computer Science - Artificial Intelligence; Computer Science - Computer Science and Game Theory; I.2.8
E-Print: 18 pages, 0 figures, 1 algorithm
