Active Reinforcement Learning: Observing Rewards at a Cost
Abstract
Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
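To make the setting concrete, the following is a minimal sketch of an active multi-armed bandit, assuming Bernoulli arms. The names `ActiveBandit` and `run_query_then_commit` are hypothetical, and the query-then-commit rule is only a simple illustrative baseline, not one of the heuristics evaluated in the paper. The reward is always earned, but the agent sees it only when it pays the query cost c.

```python
import random


class ActiveBandit:
    """Bernoulli bandit in which observing the reward costs `query_cost`.

    Hypothetical illustration of the ARL setting, not the paper's code.
    """

    def __init__(self, arm_probs, query_cost, seed=0):
        self.arm_probs = list(arm_probs)
        self.query_cost = query_cost
        self.rng = random.Random(seed)
        self.total_return = 0.0  # rewards earned minus query costs paid

    def step(self, arm, query):
        # The reward is always received, whether or not it is observed.
        reward = 1.0 if self.rng.random() < self.arm_probs[arm] else 0.0
        self.total_return += reward
        if query:
            # Paying c reveals the reward to the agent.
            self.total_return -= self.query_cost
            return reward
        return None  # reward stays hidden from the agent


def run_query_then_commit(env, n_arms, queries_per_arm, horizon):
    """Query each arm a fixed number of times, then exploit without querying.

    Assumed baseline for illustration only. Returns the number of queries made.
    """
    if queries_per_arm < 1:
        raise ValueError("queries_per_arm must be at least 1")
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    n_queries = 0
    for _ in range(horizon):
        under_sampled = [a for a in range(n_arms) if counts[a] < queries_per_arm]
        if under_sampled:
            arm = under_sampled[0]
            observed = env.step(arm, query=True)
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
        else:
            # Commit to the arm with the best observed mean and stop paying c.
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            env.step(arm, query=False)
    return n_queries


if __name__ == "__main__":
    env = ActiveBandit(arm_probs=[0.2, 0.8], query_cost=0.1, seed=1)
    n = run_query_then_commit(env, n_arms=2, queries_per_arm=5, horizon=100)
    print("queries made:", n, "net return:", env.total_return)
```

A fixed query budget ignores how the value of reward information changes with the cost c, the remaining horizon, and what has already been observed, which is the quantity the abstract describes as intractable to compute exactly.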
- Publication: arXiv e-prints
- Pub Date: November 2020
- DOI: 10.48550/arXiv.2011.06709
- arXiv: arXiv:2011.06709
- Bibcode: 2020arXiv201106709K
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning
- E-Print: Originally appeared at the NeurIPS 2016 "Future of Interactive Learning Machines (FILM)" workshop