Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

doi:10.48550/arXiv.1910.04295


We investigate reinforcement learning for mean-field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean-field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit dispatching the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We prove rigorously the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of the convergence based on implementations of our algorithms.

Publication: arXiv e-prints

Pub Date: October 2019

DOI: 10.48550/arXiv.1910.04295

arXiv: arXiv:1910.04295

Bibcode: 2019arXiv191004295C

Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning

NASA/ADS
