Generalized Individual Q-learning for Polymatrix Games with Partial Observations
Abstract
This paper addresses the challenge of limited observations in non-cooperative multi-agent systems where agents may have only partial access to other agents' actions. We present generalized individual Q-learning dynamics, which combine belief-based and payoff-based learning, for networked interconnections of more than two self-interested agents. The approach uses opponents' actions whenever they are observable, achieving guaranteed and demonstrably faster convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games. Notably, the dynamics reduce to the well-studied smoothed fictitious play under full access to opponent actions and to individual Q-learning under no access. We further quantify, through numerical simulations, the improvement in convergence rate due to observing opponents' actions.
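The combination of belief-based and payoff-based learning described above can be illustrated with a minimal sketch. It is not the paper's exact dynamics: the function names `logit_response` and `generalized_q_step`, the single-opponent payoff matrix, the step-size handling, and the clipping of the normalized rate are all assumptions made for illustration. In a polymatrix game the payoff would instead be summed over an agent's neighbors. When the opponent's action is observed, every Q-entry is updated as in smoothed fictitious play; when it is not, only the played entry is updated from the received payoff, as in individual Q-learning.

```python
import math

def logit_response(q, tau):
    """Softmax (logit) response to Q-values q with temperature tau > 0."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((v - m) / tau) for v in q]
    s = sum(w)
    return [x / s for x in w]

def generalized_q_step(q, policy, action, payoff_matrix, opp_action, observed, step):
    """One hypothetical Q-update for a single agent facing one opponent.

    observed=True:  belief-style update of every entry, using the
                    opponent's observed action (fictitious-play-like).
    observed=False: payoff-style update of the played entry only, with
                    the step normalized by the probability of that action
                    (individual-Q-learning-like). Clipping the rate at 1
                    is an assumption of this sketch.
    """
    new_q = list(q)
    if observed:
        for a in range(len(q)):
            new_q[a] += step * (payoff_matrix[a][opp_action] - q[a])
    else:
        reward = payoff_matrix[action][opp_action]
        rate = min(1.0, step / policy[action])
        new_q[action] += rate * (reward - q[action])
    return new_q

# Example: matching-pennies payoffs for the row player (illustrative only).
U = [[1.0, -1.0], [-1.0, 1.0]]
q0 = [0.0, 0.0]
pi0 = logit_response(q0, tau=1.0)
q_seen = generalized_q_step(q0, pi0, action=1, payoff_matrix=U,
                            opp_action=0, observed=True, step=0.5)
q_blind = generalized_q_step(q0, pi0, action=1, payoff_matrix=U,
                             opp_action=0, observed=False, step=0.25)
```

With an observed action, both entries of the Q-vector move toward the payoffs each action would have earned; without it, only the entry for the action actually played changes.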
- Publication: arXiv e-prints
- Pub Date: September 2024
- arXiv: arXiv:2409.02663
- Bibcode: 2024arXiv240902663S
- Keywords: Computer Science - Computer Science and Game Theory; Electrical Engineering and Systems Science - Systems and Control
- E-Print: Extended version (including proofs of Propositions 1 and 2) of the paper: A. S. Donmez and M. O. Sayin, "Generalized individual Q-learning for polymatrix games with partial observations", to appear in the Proceedings of the 63rd IEEE Conference on Decision and Control, 2024