Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
Abstract
In many environments, only a relatively small subset of the complete state space is needed to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence against a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that this reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
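As a rough illustration of the e-stop idea described in the abstract, the sketch below wraps an environment's step function so that an episode is cut short whenever the agent leaves a permitted set of states. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `build_support`, `with_estop`, and `chain_step`, the toy chain environment, the `penalty` parameter, and the choice to build the permitted set from the states seen in observed trajectories are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

State = Hashable
# A step function maps (state, action) to (next_state, reward, done).
StepFn = Callable[[State, int], Tuple[State, float, bool]]


def build_support(trajectories: Iterable[Iterable[State]]) -> Set[State]:
    """Collect every state visited in the observed trajectories (assumed input)."""
    return {s for traj in trajectories for s in traj}


def with_estop(step: StepFn, support: Set[State], penalty: float = 0.0) -> StepFn:
    """Wrap `step` so that leaving `support` ends the episode (the e-stop)."""
    def estop_step(state: State, action: int) -> Tuple[State, float, bool]:
        next_state, reward, done = step(state, action)
        if next_state not in support:
            # E-stop: terminate immediately; `penalty` is an assumed reward choice.
            return next_state, penalty, True
        return next_state, reward, done
    return estop_step


def chain_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy 1-D chain (hypothetical): action is -1 or +1, reward 1.0 on reaching 5."""
    nxt = state + action
    reached_goal = nxt == 5
    return nxt, (1.0 if reached_goal else 0.0), reached_goal


# One observed trajectory walking from state 0 to the goal at state 5.
support = build_support([[0, 1, 2, 3, 4, 5]])
safe_step = with_estop(chain_step, support)
```

With this wrapper, a learner exploring the chain never spends time in states below 0 or above 5, because stepping outside the observed set ends the episode at once. Any standard reinforcement learning method could in principle be run against `safe_step` in place of `chain_step`.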
- Publication: arXiv e-prints
- Pub Date: December 2019
- DOI: 10.48550/arXiv.1912.01649
- arXiv: arXiv:1912.01649
- Bibcode: 2019arXiv191201649A
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: NeurIPS 2019