On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
Abstract
In this work, we study a system of two interacting non-cooperative Q-learning agents, in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
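The asymmetric setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a repeated matching-pennies game, stateless (discount factor zero) tabular Q-learning, and epsilon-greedy exploration, and it does not implement the predictive observation of two subsequent opponent actions. All names (`payoff_informed`, `epsilon_greedy`, `train`) and hyperparameters are hypothetical. The only asymmetry is that the informed agent indexes its Q-table by the uninformed agent's current action.

```python
import random

# Assumed toy game: matching pennies. The informed agent (the "matcher")
# receives +1 if both actions coincide and -1 otherwise; the game is zero-sum.
def payoff_informed(a_uninformed, a_informed):
    return 1.0 if a_uninformed == a_informed else -1.0

def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon explore uniformly, otherwise act greedily.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def train(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Uninformed agent: one Q-value per own action.
    q_uninformed = [0.0, 0.0]
    # Informed agent: Q-values indexed by [observed opponent action][own action].
    q_informed = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        a_u = epsilon_greedy(q_uninformed, epsilon, rng)
        # Information asymmetry: the informed agent sees a_u before choosing.
        a_i = epsilon_greedy(q_informed[a_u], epsilon, rng)
        r_i = payoff_informed(a_u, a_i)
        r_u = -r_i
        # Stateless Q-learning updates (assumption: discount factor of zero).
        q_uninformed[a_u] += alpha * (r_u - q_uninformed[a_u])
        q_informed[a_u][a_i] += alpha * (r_i - q_informed[a_u][a_i])
    return q_uninformed, q_informed

if __name__ == "__main__":
    q_u, q_i = train()
    print("uninformed Q:", q_u)
    print("informed Q:", q_i)
```

In this toy game the informed agent's greedy policy settles on matching whatever action it observes, which is a best response to any stationary strategy of the uninformed agent. The sketch only shows how the observation enters the Q-table; it says nothing about the convergence results claimed in the abstract.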
- Publication:
- arXiv e-prints
- Pub Date:
- October 2020
- DOI:
- 10.48550/arXiv.2010.10901
- arXiv:
- arXiv:2010.10901
- Bibcode:
- 2020arXiv201010901T
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Computer Science and Game Theory;
- Computer Science - Multiagent Systems;
- Economics - Theoretical Economics;
- Electrical Engineering and Systems Science - Systems and Control
- E-Print:
- Preprint