Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

doi:10.48550/arXiv.2006.07549


We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on a lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as hindsight experience replay (HER), handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
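The split between a hindsight E-step and a supervised M-step can be loosely illustrated with a small tabular sketch. It is not the authors' hEM implementation. It assumes a hypothetical 1-D chain environment with positions 0 to 6, a uniformly random data-collection policy, HER's "future" relabeling strategy standing in for the E-step, and a count-based maximum-likelihood fit of pi(a | s, g) standing in for the supervised M-step. All names (`env_step`, `collect`, `relabel`, `fit_policy`, `greedy_action`, `reach`) are illustrative. A full EM-style method would presumably repeat these steps, collecting fresh data with the updated policy; the sketch runs a single round.

```python
import random
from collections import Counter, defaultdict

N_STATES = 7          # hypothetical 1-D chain with positions 0..6
ACTIONS = (-1, +1)    # move left / move right

def env_step(s, a):
    """Deterministic chain dynamics; moves past either end are clipped."""
    return min(max(s + a, 0), N_STATES - 1)

def collect(num_traj, horizon, rng):
    """Roll out a uniformly random behaviour policy as (s, a, s_next) lists."""
    trajs = []
    for _ in range(num_traj):
        s, traj = rng.randrange(N_STATES), []
        for _ in range(horizon):
            a = rng.choice(ACTIONS)
            s_next = env_step(s, a)
            traj.append((s, a, s_next))
            s = s_next
        trajs.append(traj)
    return trajs

def relabel(traj):
    """E-step stand-in ('future' hindsight relabeling): pair each (s, a) with
    every state the trajectory actually reached afterwards, so every tuple
    is a success for its relabeled goal despite the sparse reward."""
    out = []
    for t, (s, a, _) in enumerate(traj):
        for _, _, achieved in traj[t:]:
            if achieved != s:            # skip the trivial goal g == s
                out.append((s, achieved, a))
    return out

def fit_policy(tuples):
    """M-step stand-in: tabular maximum-likelihood estimate of pi(a | s, g),
    i.e. plain supervised learning on the relabeled (s, g, a) tuples."""
    counts = defaultdict(Counter)
    for s, g, a in tuples:
        counts[(s, g)][a] += 1
    return counts

def greedy_action(counts, s, g):
    """Most frequent relabeled action for (s, g); None if never observed."""
    c = counts.get((s, g))
    return c.most_common(1)[0][0] if c else None

def reach(counts, start, goal, max_steps=20):
    """Follow the greedy fitted policy and report whether it hits the goal."""
    s = start
    for _ in range(max_steps):
        if s == goal:
            return True
        a = greedy_action(counts, s, goal)
        if a is None:
            return False
        s = env_step(s, a)
    return s == goal

rng = random.Random(0)
data = [x for traj in collect(2000, 12, rng) for x in relabel(traj)]
policy = fit_policy(data)
print(greedy_action(policy, 2, 5), reach(policy, 1, 5))
```

No reward signal enters the fit: each action is labeled with a goal its trajectory actually reached. This is why a purely supervised update can learn goal-directed behaviour from random, almost always "failed" rollouts.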

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.07549

arXiv:

arXiv:2006.07549

Bibcode:

2020arXiv200607549T

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
