Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

doi:10.48550/arXiv.1705.03562

Stenberg Hansen, Steven

We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides "one-shot" transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
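The abstract gives no implementation details, so the following is only a rough sketch of the general idea of non-parametric, similarity-weighted value iteration over stored transitions. It is not the paper's architecture. The names `similarity` and `plan_over_memory`, the fixed Gaussian kernel (standing in for the learned deep similarity metric), the `temperature` value, and the toy two-state problem are all assumptions made for illustration.

```python
import math


def similarity(a, b, temperature=0.01):
    # Assumed placeholder for the learned metric: a fixed Gaussian kernel
    # on raw state vectors. DEVI learns its metric with a deep network.
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / temperature)


def plan_over_memory(memory, num_actions, gamma=0.9, sweeps=200):
    """Value iteration restricted to stored transitions.

    memory: list of (state, action, reward, next_state) tuples.
    Returns q, where q[i][a] estimates Q(next_state of transition i, a).
    """
    n = len(memory)
    # weights[i][a] holds (j, w) pairs: stored transitions j that took
    # action a, weighted by normalised similarity of their start state
    # to the next_state of transition i.
    weights = []
    for _, _, _, s_next in memory:
        per_action = []
        for a in range(num_actions):
            idx = [j for j, t in enumerate(memory) if t[1] == a]
            sims = [similarity(s_next, memory[j][0]) for j in idx]
            total = sum(sims)
            pairs = [(j, s / total) for j, s in zip(idx, sims)] if total > 0 else []
            per_action.append(pairs)
        weights.append(per_action)

    values = [0.0] * n  # values[j]: value of the next_state of transition j
    q = [[0.0] * num_actions for _ in range(n)]
    for _ in range(sweeps):
        # Bellman backup using only remembered rewards and successors.
        q = [
            [sum(w * (memory[j][2] + gamma * values[j]) for j, w in pairs)
             for pairs in weights[i]]
            for i in range(n)
        ]
        values = [max(row) for row in q]
    return q


# Hypothetical toy problem: action 0 moves to state 1, action 1 moves to
# state 0; only taking action 0 from state 1 is rewarded.
memory = [
    ((0.0,), 0, 0.0, (1.0,)),
    ((1.0,), 0, 1.0, (1.0,)),
    ((0.0,), 1, 0.0, (0.0,)),
    ((1.0,), 1, 0.0, (0.0,)),
]
q = plan_over_memory(memory, num_actions=2)
```

Because planning in this sketch reads rewards and successor states directly from `memory`, editing the stored tuples and calling `plan_over_memory` again yields new value estimates without changing `similarity`. That is one way to read the abstract's claim about transfer to changed reward and transition structure, though the paper's actual mechanism may differ.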

Publication:

arXiv e-prints

Pub Date:

May 2017

DOI:

10.48550/arXiv.1705.03562

arXiv:

arXiv:1705.03562

Bibcode:

2017arXiv170503562S

Keywords:

Statistics - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning

NASA/ADS