DisCoRL: Continual Reinforcement Learning via Policy Distillation

doi:10.48550/arXiv.1907.05855

In multi-task reinforcement learning there are two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. Continual reinforcement learning adds a third challenge: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach that combines state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheeled omnidirectional robot. We also test our approach's robustness by transferring the final policy to a real-life setting. The resulting policy solves all tasks and automatically infers which one to run.
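The abstract names policy distillation as the mechanism for folding several task-specific policies into one model that picks the right behaviour without a task label. The following is a minimal sketch of generic teacher-student policy distillation under stated assumptions, not DisCoRL's actual architecture. The student is trained to match each teacher's action distribution by minimising KL(teacher || student). The linear student, the toy two-task data, and the helper names `softmax`, `kl_distillation_loss` and `distill` are all hypothetical. The student's input vector stands in for whatever state representation the real system uses.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over a batch of states (hypothetical helper)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits)
    eps = 1e-12  # avoids log(0)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def distill(states, teacher_logits, n_steps=500, lr=0.5, temperature=1.0, seed=0):
    """Fit a hypothetical linear student W so that softmax(states @ W) matches the teachers."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((states.shape[1], teacher_logits.shape[1]))
    p = softmax(teacher_logits, temperature)  # fixed teacher targets
    for _ in range(n_steps):
        q = softmax(states @ W)
        # Gradient of the mean KL w.r.t. W for a softmax student: X^T (q - p) / N
        grad = states.T @ (q - p) / len(states)
        W -= lr * grad
    return W


# Toy data (assumption): two tasks whose observations are distinguishable
# via the first two features; the third feature is irrelevant noise.
rng = np.random.default_rng(1)
states_a = np.hstack([np.ones((50, 1)), np.zeros((50, 1)), rng.standard_normal((50, 1))])
states_b = np.hstack([np.zeros((50, 1)), np.ones((50, 1)), rng.standard_normal((50, 1))])
logits_a = np.tile([3.0, 0.0], (50, 1))  # teacher for task A favours action 0
logits_b = np.tile([0.0, 3.0], (50, 1))  # teacher for task B favours action 1

states = np.vstack([states_a, states_b])
teacher_logits = np.vstack([logits_a, logits_b])

W = distill(states, teacher_logits)
# One student, no task label: the action is chosen from the state alone.
student_actions = softmax(states @ W).argmax(axis=1)
```

In this toy setting the single student reproduces teacher A's preferred action on task A states and teacher B's on task B states. Task inference is implicit because the observations themselves separate the tasks. That is the property the abstract relies on when it says the policy "automatically infers which one to run".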

Publication:

arXiv e-prints

Pub Date:

July 2019

DOI:

10.48550/arXiv.1907.05855

arXiv:

arXiv:1907.05855

Bibcode:

2019arXiv190705855T

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Statistics - Machine Learning

E-Print:

arXiv admin note: text overlap with arXiv:1906.04452

DisCoRL: Continual Reinforcement Learning via Policy Distillation

Abstract