Generalized Individual Q-learning for Polymatrix Games with Partial Observations

doi:10.48550/arXiv.2409.02663

This paper addresses the challenge of limited observations in non-cooperative multi-agent systems where agents may have only partial access to other agents' actions. We present the generalized individual Q-learning dynamics, which combine belief-based and payoff-based learning for networked interconnections of more than two self-interested agents. This approach leverages access to opponents' actions whenever possible, achieving guaranteed, and demonstrably faster, convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games. Notably, the dynamics reduce to the well-studied smoothed fictitious play under full access to opponent actions and to individual Q-learning under no access. We further quantify, through numerical simulations, the improvement in convergence rate due to observing opponents' actions.
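The idea of mixing belief-based and payoff-based updates can be illustrated with a small sketch. This is not the paper's exact dynamics: the function names (`softmax`, `gen_q_step`, `simulate`), the per-step observation probability `p_obs`, the step-size clipping, and the choice of a two-player matching-pennies game are all assumptions made here for illustration. When the opponent's action is observed, every entry of the agent's Q-vector is updated with the counterfactual payoff (as in smoothed fictitious play); otherwise only the played action is updated from the received payoff, with a step normalized by the probability of playing that action (as in individual Q-learning).

```python
import math
import random


def softmax(q, tau):
    """Smoothed (logit) response to Q-values q with temperature tau."""
    m = max(q)
    w = [math.exp((x - m) / tau) for x in q]
    s = sum(w)
    return [x / s for x in w]


def gen_q_step(q, policy, own_action, payoff, counterfactual, alpha):
    """One hypothetical update of an agent's Q-vector.

    counterfactual: payoffs for every own action against the observed
    opponent action, or None when the opponent's action was not observed.
    """
    new_q = list(q)
    if counterfactual is not None:
        # Belief-based branch: update all actions.
        for a, u in enumerate(counterfactual):
            new_q[a] += alpha * (u - q[a])
    else:
        # Payoff-based branch: update the played action only, with the step
        # normalized by its play probability (clipped at 1, an assumption).
        step = min(1.0, alpha / policy[own_action])
        new_q[own_action] += step * (payoff - q[own_action])
    return new_q


def simulate(p_obs, steps=2000, tau=0.5, seed=0):
    """Matching pennies with observation probability p_obs per agent per step."""
    rng = random.Random(seed)
    A = [[1.0, -1.0], [-1.0, 1.0]]  # row player's payoffs; zero-sum game

    def payoff(i, own, other):
        return A[own][other] if i == 0 else -A[other][own]

    q = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, steps + 1):
        alpha = 1.0 / (t + 1)
        pi = [softmax(q[0], tau), softmax(q[1], tau)]
        a = [rng.choices([0, 1], weights=pi[i])[0] for i in range(2)]
        for i in range(2):
            j = 1 - i
            observed = rng.random() < p_obs
            cf = [payoff(i, b, a[j]) for b in range(2)] if observed else None
            q[i] = gen_q_step(q[i], pi[i], a[i], payoff(i, a[i], a[j]), cf, alpha)
    return [softmax(q[0], tau), softmax(q[1], tau)]


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, simulate(p))
```

In this sketch, `p_obs = 1.0` always takes the belief-based branch and `p_obs = 0.0` always takes the payoff-based branch, mirroring the two limiting cases named in the abstract.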

Publication: arXiv e-prints

Pub Date: September 2024

DOI: 10.48550/arXiv.2409.02663

arXiv: arXiv:2409.02663

Bibcode: 2024arXiv240902663S

Keywords: Computer Science - Computer Science and Game Theory; Electrical Engineering and Systems Science - Systems and Control

E-Print: Extended version (including proofs of Propositions 1 and 2) of the paper: A. S. Donmez and M. O. Sayin, "Generalized individual Q-learning for polymatrix games with partial observations", to appear in the Proceedings of the 63rd IEEE Conference on Decision and Control, 2024
