MDP Playground: A Design and Debug Testbed for Reinforcement Learning
Abstract
We present \emph{MDP Playground}, an efficient testbed for Reinforcement Learning (RL) agents with \textit{orthogonal} dimensions that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in generated environments. We consider and allow control over a wide variety of dimensions, including \textit{delayed rewards}, \textit{rewardable sequences}, \textit{density of rewards}, \textit{stochasticity}, \textit{image representations}, \textit{irrelevant features}, \textit{time unit}, \textit{action range} and more. We define a parameterised collection of fast-to-run toy environments in \textit{OpenAI Gym} by varying these dimensions and propose using these for the initial design and development of agents. We also provide wrappers that inject these dimensions into complex environments from \textit{Atari} and \textit{Mujoco} to allow for evaluating agent robustness. We further provide various example use-cases and instructions on how to use \emph{MDP Playground} to design and debug agents. We believe that \emph{MDP Playground} is a valuable testbed for researchers designing new, adaptive and intelligent RL agents and those wanting to unit test their agents.
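To give a concrete sense of what injecting one of these dimensions into an existing environment can look like, below is a minimal sketch of a reward-delay wrapper written against the classic OpenAI Gym API. It is illustrative only: the class name `DelayedRewardWrapper`, the `delay` parameter, and the buffering scheme are assumptions for this example and are not MDP Playground's actual wrapper interface.

```python
# Minimal sketch (NOT MDP Playground's actual API): a wrapper that injects a
# "delayed rewards" dimension into any Gym environment by holding back each
# reward for a fixed number of steps. Assumes the classic Gym step() API
# returning (obs, reward, done, info).
from collections import deque

import gym


class DelayedRewardWrapper(gym.Wrapper):
    """Hypothetical example: delays every reward by `delay` steps."""

    def __init__(self, env, delay=3):
        super().__init__(env)
        self.delay = delay
        self._buffer = deque()

    def reset(self, **kwargs):
        # Drop any rewards still buffered from the previous episode.
        self._buffer.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._buffer.append(reward)
        # Emit a reward only once `delay` steps have elapsed since it was earned;
        # until then the agent receives 0, making credit assignment harder.
        if len(self._buffer) > self.delay:
            delayed_reward = self._buffer.popleft()
        else:
            delayed_reward = 0.0
        return obs, delayed_reward, done, info


# Example usage (hypothetical): wrap a standard environment with a 3-step delay.
# env = DelayedRewardWrapper(gym.make("CartPole-v1"), delay=3)
```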
- Publication: arXiv e-prints
- Pub Date: September 2019
- DOI: 10.48550/arXiv.1909.07750
- arXiv: arXiv:1909.07750
- Bibcode: 2019arXiv190907750R
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning
- E-Print: NeurIPS 2021 Data and Benchmark Track submission (with slight formatting differences, most notably citation style)