Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning
Abstract
Learning in multi-agent systems is highly challenging due to the inherent complexity introduced by agents' interactions. We tackle systems with a huge population of interacting agents (e.g., swarms) via Mean-Field Control (MFC). MFC considers an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward. Specifically, we consider the case of unknown system dynamics where the goal is to simultaneously optimize for the rewards and learn from experience. We propose an efficient model-based reinforcement learning algorithm $\text{M}^3\text{-UCRL}$ that runs in episodes and provably solves this problem. $\text{M}^3\text{-UCRL}$ uses upper-confidence bounds to balance exploration and exploitation during policy learning. Our main theoretical contributions are the first general regret bounds for model-based RL for MFC, obtained via a novel mean-field type analysis. $\text{M}^3\text{-UCRL}$ can be instantiated with different models such as neural networks or Gaussian Processes, and effectively combined with neural network policy learning. We empirically demonstrate the convergence of $\text{M}^3\text{-UCRL}$ on the swarm motion problem of controlling an infinite population of agents seeking to maximize a location-dependent reward and avoid congested areas.
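To make the abstract's episodic loop concrete, below is a minimal toy sketch (an assumption of ours, not the paper's exact algorithm) of optimistic model-based learning for a mean-field population: a distribution over discrete locations evolves under unknown transition success probabilities, the learner keeps count-based confidence intervals on those probabilities, plans greedily under the upper-confidence model, and updates its estimates from the executed episode. All names, the reward shape, and the one-step planner are illustrative choices.

```python
import numpy as np

# Toy mean-field control loop in the spirit of M^3-UCRL (illustrative only).
# State: a distribution mu over S discrete locations. Each location's agents
# choose a move in {-1, 0, +1}; moves succeed with an unknown probability
# that the learner estimates with count-based confidence widths.

S, A = 5, 3                                        # locations, actions
rng = np.random.default_rng(0)
true_p = rng.uniform(0.6, 0.9, size=(S, A))        # unknown success probs

def step_distribution(mu, policy, p):
    """Propagate the population distribution one step under `policy`."""
    new_mu = np.zeros(S)
    for s in range(S):
        a = policy[s]
        target = np.clip(s + (a - 1), 0, S - 1)    # intended move
        new_mu[target] += mu[s] * p[s, a]          # move succeeds
        new_mu[s] += mu[s] * (1 - p[s, a])         # move fails, stay put
    return new_mu

def reward(mu):
    """Location-dependent reward minus a quadratic congestion penalty."""
    loc_reward = np.linspace(0.0, 1.0, S)
    return float(mu @ loc_reward - 0.5 * mu @ mu)

counts = np.ones((S, A))                           # pseudo-counts
est = np.full((S, A), 0.5)                         # running estimate of true_p
mu = np.full(S, 1.0 / S)                           # start uniform

for episode in range(200):
    beta = np.sqrt(1.0 / counts)                   # confidence width
    optimistic = np.clip(est + beta, 0.0, 1.0)     # upper-confidence model
    # One-step greedy optimistic planning: per location, pick the action
    # that maximizes next-step reward under the optimistic model.
    policy = np.zeros(S, dtype=int)
    for s in range(S):
        vals = []
        for a in range(A):
            cand = policy.copy()
            cand[s] = a
            vals.append(reward(step_distribution(mu, cand, optimistic)))
        policy[s] = int(np.argmax(vals))
    # Execute in the true system; update counts and mean estimates.
    for s in range(S):
        a = policy[s]
        success = rng.random() < true_p[s, a]
        counts[s, a] += 1
        est[s, a] += (success - est[s, a]) / counts[s, a]
    mu = step_distribution(mu, policy, true_p)
```

The key structural point this illustrates is that exploration is driven entirely by the confidence width `beta`, which shrinks as state-action pairs are visited; the paper's algorithm replaces the tabular estimates with learned models (e.g., Gaussian Processes or neural networks) and the greedy planner with policy learning.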
 Publication:

arXiv e-prints
 Pub Date:
 July 2021
 arXiv:
 arXiv:2107.04050
 Bibcode:
 2021arXiv210704050P
 Keywords:

 Statistics - Machine Learning;
 Computer Science - Machine Learning;
 Computer Science - Multiagent Systems
 EPrint:
 28 pages, 2 figures, Preprint, Submitted to NeurIPS 2021