Learning Memory-Dependent Continuous Control from Demonstrations

doi:10.48550/arXiv.2102.09208

Efficient exploration is a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces the number of interactions with the environment while using a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.

Publication: arXiv e-prints

Pub Date: February 2021

DOI: 10.48550/arXiv.2102.09208

arXiv: arXiv:2102.09208

Bibcode: 2021arXiv210209208H

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence

E-Print: 10 pages, 6 figures

NASA/ADS
