Distributed Policy Evaluation Under Multiple Behavior Strategies

doi:10.48550/arXiv.1312.7606


We apply diffusion strategies to develop a fully distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces the bias and variance of the prediction error; more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sampling only a small portion of the state space).
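The abstract does not state the update equations, so the following is only a rough, hypothetical sketch of the general idea: each agent runs an off-policy gradient-TD step on its own samples ("adapt"), then averages its intermediate estimate with its immediate neighbors ("combine"). The GTD2-style local step, the Metropolis combination rule, the toy chain environment, the step sizes, and the helper names `metropolis_weights` and `diffusion_gtd2` are all assumptions for illustration, not the authors' exact algorithm. Per agent and per step, the adapt step costs O(d) in the feature dimension d and the combine step O(|neighbors| · d), which is in the spirit of the linear-complexity claim.

```python
import numpy as np


def metropolis_weights(adj):
    """Hypothetical helper: Metropolis combination matrix for an undirected graph.

    adj[k, l] = 1 if agents k and l are neighbors. Returns a symmetric matrix
    whose columns sum to 1; agent k only mixes with its neighbors and itself.
    """
    K = adj.shape[0]
    deg = adj.sum(axis=1)
    A = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if l != k and adj[k, l]:
                A[l, k] = 1.0 / (1.0 + max(deg[k], deg[l]))
        A[k, k] = 1.0 - A[:, k].sum()  # self-weight fills the rest of column k
    return A


def diffusion_gtd2(A, phi, P, R, target_pi, behavior_pis, gamma=0.9,
                   alpha=0.05, beta=0.1, steps=5000, seed=0):
    """Sketch (assumed, not the paper's verbatim method): adapt-then-combine
    diffusion where each agent's local step is a GTD2-style off-policy update.

    A[l, k] is the weight agent k gives to neighbor l; phi is S x d features;
    P is n_actions x S x S; R is S x n_actions; policies are S x n_actions.
    """
    rng = np.random.default_rng(seed)
    K, (S, d) = A.shape[0], phi.shape
    n_actions = R.shape[1]
    theta = np.zeros((K, d))  # value-function weights, one row per agent
    w = np.zeros((K, d))      # auxiliary gradient-TD weights, one row per agent
    states = rng.integers(S, size=K)  # each agent follows its own trajectory
    for _ in range(steps):
        psi_theta, psi_w = np.empty_like(theta), np.empty_like(w)
        for k in range(K):
            s = states[k]
            a = rng.choice(n_actions, p=behavior_pis[k][s])  # behavior policy
            s_next = rng.choice(S, p=P[a, s])
            rho = target_pi[s, a] / behavior_pis[k][s, a]  # importance weight
            f, f_next = phi[s], phi[s_next]
            delta = R[s, a] + gamma * theta[k] @ f_next - theta[k] @ f
            # Adapt: local update that uses only agent k's own sample.
            psi_w[k] = w[k] + beta * rho * (delta - f @ w[k]) * f
            psi_theta[k] = theta[k] + alpha * rho * (f - gamma * f_next) * (f @ w[k])
            states[k] = s_next
        # Combine: theta_k = sum_l A[l, k] * psi_l (only neighbors have A[l, k] > 0).
        theta, w = A.T @ psi_theta, A.T @ psi_w
    return theta


# Toy example (assumed): a 6-state chain, action 0 moves left, action 1 right.
S = 6
P = np.zeros((2, S, S))
for s in range(S):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, S - 1)] = 1.0
R = np.zeros((S, 2))
R[S - 1, 1] = 1.0             # reward for pushing right at the right end
phi = np.eye(S)               # tabular features
target = np.full((S, 2), 0.5)  # target policy to evaluate
left = np.tile([0.8, 0.2], (S, 1))   # behavior biased toward the left
right = np.tile([0.2, 0.8], (S, 1))  # behavior biased toward the right
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
A = metropolis_weights(ring)
theta = diffusion_gtd2(A, phi, P, R, target, [left, right, left, right])

# Exact target-policy values for comparison: v = (I - gamma * P_pi)^{-1} r_pi.
P_pi = np.einsum("sa,ast->st", target, P)
r_pi = (target * R).sum(axis=1)
v_true = np.linalg.solve(np.eye(S) - 0.9 * P_pi, r_pi)
print("max |error| of network average:", np.abs(theta.mean(axis=0) - v_true).max())
```

Each agent here samples mostly one side of the chain, mirroring the abstract's scenario where behavior policies restrict agents to part of the state space; the neighbor averaging is what lets information from the other side reach every agent. No particular error value is claimed for this toy run.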

Publication:

arXiv e-prints

Pub Date:

December 2013

DOI:

10.48550/arXiv.1312.7606

arXiv:

arXiv:1312.7606

Bibcode:

2013arXiv1312.7606V

Keywords:

Computer Science - Multiagent Systems;
Computer Science - Artificial Intelligence;
Computer Science - Distributed, Parallel, and Cluster Computing;
Computer Science - Machine Learning

E-Print:

36 pages, 4 figures, accepted for publication in IEEE Transactions on Automatic Control
