Recurrent World Models Facilitate Policy Evolution

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of the paper is available at https://worldmodels.github.io

Publication: arXiv e-prints

Pub Date: September 2018

DOI: 10.48550/arXiv.1809.01999

arXiv: arXiv:1809.01999

Bibcode: 2018arXiv180901999H

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: To appear at NIPS 2018, selected for an oral presentation. arXiv admin note: substantial text overlap with arXiv:1803.10122
