On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

doi:10.48550/arXiv.2010.10901

In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of two subsequent actions of the opponent and yields an optimal strategy provided that the opponent applies a stationary strategy. We also discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
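The asymmetric setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a stateless repeated matrix game (matching pennies), epsilon-greedy exploration, and a one-step value update, and the names `PAYOFF_A`, `epsilon_greedy`, and `train` are hypothetical. The uninformed agent A keeps one value per own action, while the informed agent B indexes its values by A's observed action.

```python
import random

# Hypothetical zero-sum payoff (matching pennies) for the uninformed agent A;
# the informed agent B receives the negative of it. This game is an assumption
# chosen for illustration, not taken from the paper.
PAYOFF_A = [[1.0, -1.0],
            [-1.0, 1.0]]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def train(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q_a = [0.0, 0.0]                # A: one value per own action
    q_b = [[0.0, 0.0], [0.0, 0.0]]  # B: indexed by [A's action][own action]
    for _ in range(steps):
        a = epsilon_greedy(q_a, epsilon, rng)
        b = epsilon_greedy(q_b[a], epsilon, rng)  # B observes a before acting
        r_a = PAYOFF_A[a][b]
        r_b = -r_a
        # Stateless (bandit-style) update: move each estimate toward its reward.
        q_a[a] += alpha * (r_a - q_a[a])
        q_b[a][b] += alpha * (r_b - q_b[a][b])
    return q_a, q_b


q_a, q_b = train()
# B's greedy reply to each observed action of A.
best_response = [max(range(2), key=lambda j: row[j]) for row in q_b]
print(best_response)
```

In this toy game the informed agent's table settles on a fixed reply to each observed action, which gives a rough intuition for why observing the opponent can stabilize learning. The paper's actual setting and guarantees may differ from this simplification.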

Publication:

arXiv e-prints

Pub Date:

October 2020

arXiv:

arXiv:2010.10901

Bibcode:

2020arXiv201010901T

Keywords:

Computer Science - Machine Learning;
Computer Science - Computer Science and Game Theory;
Computer Science - Multiagent Systems;
Economics - Theoretical Economics;
Electrical Engineering and Systems Science - Systems and Control

E-Print:

Preprint
