Measuring and Characterizing Generalization in Deep Reinforcement Learning

doi:10.48550/arXiv.1812.02868

Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.

Publication: arXiv e-prints

Pub Date: December 2018

DOI: 10.48550/arXiv.1812.02868

arXiv: arXiv:1812.02868

Bibcode: 2018arXiv181202868W

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning
