Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

doi:10.48550/arXiv.2006.07781

Conventional Reinforcement Learning (RL) algorithms usually have a single agent learning to solve the task independently. As a result, the agent can explore only a limited part of the state-action space, and the learned behavior is highly correlated with the agent's previous experience, making training prone to getting stuck in a local minimum. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE uses a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings, and the experimental results show that DiCE achieves substantial improvement over the baselines in the MuJoCo locomotion tasks.
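The abstract names two ingredients, shared experience across a team of agents and a measure of team diversity, without giving their exact form. The following is only a minimal sketch of how those two pieces might look, not the authors' implementation. Everything in it is an assumption made for illustration: the linear policies, the function names (`act_means`, `team_diversity`, `collect_shared_batch`), the random stand-in for environment states, and the choice of mean pairwise squared distance between action means as the diversity measure.

```python
import numpy as np


def act_means(weights, states):
    """Action means of one hypothetical linear policy: states @ weights."""
    return states @ weights


def team_diversity(team_weights, states):
    """Mean pairwise squared distance between agents' action means.

    Assumed diversity measure for illustration; evaluated on the same
    (shared) states for every agent so the comparison is like-for-like.
    """
    means = [act_means(w, states) for w in team_weights]
    total, pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            total += float(np.mean(np.sum((means[i] - means[j]) ** 2, axis=1)))
            pairs += 1
    return total / pairs if pairs else 0.0


def collect_shared_batch(team_weights, sample_states, rng, per_agent=8, noise=0.1):
    """Each agent gathers its own samples; the team pools them into one batch."""
    states, actions, owners = [], [], []
    for idx, w in enumerate(team_weights):
        s = sample_states(per_agent)  # stand-in for environment interaction
        a = act_means(w, s) + noise * rng.standard_normal((per_agent, w.shape[1]))
        states.append(s)
        actions.append(a)
        owners.append(np.full(per_agent, idx))  # remember who collected what
    return np.concatenate(states), np.concatenate(actions), np.concatenate(owners)


rng = np.random.default_rng(0)
state_dim, action_dim, n_agents = 4, 2, 3

# Heterogeneous team: each agent starts from different random weights.
team = [rng.standard_normal((state_dim, action_dim)) for _ in range(n_agents)]

states, actions, owners = collect_shared_batch(
    team, lambda n: rng.standard_normal((n, state_dim)), rng
)
diversity = team_diversity(team, states)
```

In a full training loop, every agent would update on the pooled batch, and a term based on `diversity` would be combined with the task objective to keep the agents from collapsing onto the same behavior. How the two are combined is left open here because the abstract does not specify it.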

Publication: arXiv e-prints

Pub Date: June 2020

DOI: 10.48550/arXiv.2006.07781

arXiv: arXiv:2006.07781

Bibcode: 2020arXiv200607781P

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: https://decisionforce.github.io/DiCE/

