MIME: Mutual Information Minimisation Exploration

We show that reinforcement learning agents that learn by surprise (surprisal) get stuck at abrupt environmental transition boundaries because these transitions are difficult to learn. We propose a counter-intuitive solution that we call Mutual Information Minimising Exploration (MIME), in which an agent learns a latent representation of the environment without trying to predict future states. We show that our agent performs significantly better over sharp transition boundaries while matching the performance of surprisal-driven agents elsewhere. In particular, we show state-of-the-art performance on difficult learning games such as Gravitar, Montezuma's Revenge and Doom.

Publication:

arXiv e-prints

Pub Date:

January 2020

DOI:

10.48550/arXiv.2001.05636

arXiv:

arXiv:2001.05636

Bibcode:

2020arXiv200105636X

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning
