Sample-efficient Cross-Entropy Method for Real-time Planning

doi:10.48550/arXiv.2008.06389

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.

Publication: arXiv e-prints
Pub Date: August 2020
DOI: 10.48550/arXiv.2008.06389
arXiv: arXiv:2008.06389
Bibcode: 2020arXiv200806389P
Keywords: Computer Science - Machine Learning; Computer Science - Robotics; Statistics - Machine Learning
