Model-Based Regularization for Deep Reinforcement Learning with Transcoder Networks

doi:10.48550/arXiv.1809.01906

This paper proposes a new optimization objective for value-based deep reinforcement learning. We extend conventional Deep Q-Networks (DQNs) by adding a model-learning component, yielding a transcoder network. The model's prediction errors are added to the basic DQN loss as additional regularizers. This augmented objective provides a richer training signal, with feedback at every time step. Moreover, because learning an environment model shares a common structure with the RL problem, we hypothesize that the resulting objective improves both sample efficiency and performance. We empirically confirm this hypothesis on 20 games from the Atari benchmark, attaining superior results over vanilla DQN without model-based regularization.
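As a rough illustration of the idea, the following sketch shows one way a DQN loss could be augmented with model prediction errors. It is not the paper's implementation: the abstract does not give the exact loss terms or their weighting, so the mean-squared-error penalties, the function name `regularized_dqn_loss`, its arguments, and the weights `reward_weight` and `state_weight` are all assumptions made for this example.

```python
import numpy as np

def regularized_dqn_loss(q_values, actions, rewards, next_q_target, dones,
                         pred_rewards, pred_next_states, next_states,
                         gamma=0.99, reward_weight=1.0, state_weight=1.0):
    """Hypothetical DQN loss with model-based regularizers (illustrative only).

    q_values:         (B, A) Q-values of the online network for the current states
    actions:          (B,)   integer actions taken
    rewards:          (B,)   observed rewards
    next_q_target:    (B, A) Q-values of the target network for the next states
    dones:            (B,)   1.0 where the episode ended, else 0.0
    pred_rewards:     (B,)   rewards predicted by the model head (assumed)
    pred_next_states: (B, D) next states predicted by the model head (assumed)
    next_states:      (B, D) observed next states
    """
    batch = np.arange(len(actions))
    q_taken = q_values[batch, actions]

    # Standard one-step TD target with bootstrapping cut off at terminal states.
    td_target = rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)
    td_loss = np.mean((td_target - q_taken) ** 2)

    # Model prediction errors used as regularizers (MSE is an assumption).
    reward_loss = np.mean((pred_rewards - rewards) ** 2)
    state_loss = np.mean((pred_next_states - next_states) ** 2)

    total = td_loss + reward_weight * reward_loss + state_weight * state_loss
    return total, td_loss, reward_loss, state_loss
```

In this sketch the two model terms are nonzero on every transition, including those where the reward is zero, which is one way to read the abstract's remark that the objective provides feedback at every time step.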

Publication: arXiv e-prints
Pub Date: September 2018
DOI: 10.48550/arXiv.1809.01906
arXiv: arXiv:1809.01906
Bibcode: 2018arXiv180901906L
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
E-Print: Presented at the NIPS Deep Reinforcement Learning Workshop, Montreal, Canada, 2018
