On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
Abstract
In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
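The asymmetric setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the matching-pennies payoff, the stateless update without bootstrapping, the epsilon-greedy exploration, and all names (`payoff_b`, `epsilon_greedy`, `simulate`) and parameter values are assumptions chosen only for illustration. The uninformed agent A keeps one Q-value per own action, while the informed agent B indexes its Q-table by A's observed action.

```python
import numpy as np


def payoff_b(a_action: int, b_action: int) -> float:
    """Hypothetical zero-sum payoff (matching pennies): B wins when it matches A."""
    return 1.0 if a_action == b_action else -1.0


def epsilon_greedy(q_row: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def simulate(steps: int = 5000, alpha: float = 0.1, epsilon: float = 0.1, seed: int = 0):
    """Run two stateless Q-learners in a repeated 2x2 game (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    q_a = np.zeros(2)       # uninformed agent: one value per own action
    q_b = np.zeros((2, 2))  # informed agent: rows indexed by A's observed action
    for _ in range(steps):
        a = epsilon_greedy(q_a, epsilon, rng)
        # B observes A's current action before choosing its own.
        b = epsilon_greedy(q_b[a], epsilon, rng)
        r_b = payoff_b(a, b)
        # Stateless updates (no bootstrapped next-state term); A receives -r_b.
        q_a[a] += alpha * (-r_b - q_a[a])
        q_b[a, b] += alpha * (r_b - q_b[a, b])
    return q_a, q_b
```

Calling `simulate()` returns the two Q-tables. In this toy game the informed agent's greedy action in each row of `q_b` is the best response to the observed action, which is the kind of advantage the information asymmetry provides.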
 Publication:

arXiv e-prints
 Pub Date:
 October 2020
 DOI:
 10.48550/arXiv.2010.10901
 arXiv:
 arXiv:2010.10901
 Bibcode:
 2020arXiv201010901T
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Computer Science and Game Theory;
 Computer Science - Multiagent Systems;
 Economics - Theoretical Economics;
 Electrical Engineering and Systems Science - Systems and Control
 E-Print:
 Preprint