Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Abstract
It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires substantial computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting and prove its regret bound. Then, from a computational standpoint, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network, to learn the indexed value function. Finally, we demonstrate the efficacy of PINs through computational experiments.
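As a minimal sketch of the idea described above, one can model an indexed action-value function as Q_z(s, a) = mu(s, a) + z * sigma(s, a), where a scalar index z is drawn once per episode and exploration comes from acting greedily under the sampled index. The tabular parameterization, the Gaussian index, and all names below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

class ParameterizedIndexedQ:
    """Hypothetical sketch: indexed action values Q_z(s, a) = mu(s, a) + z * sigma(s, a).

    The "mean network" and "uncertainty network" are reduced here to tabular
    parameters; a scalar index z resampled each episode induces exploration.
    """

    def __init__(self, n_states, n_actions, seed=0):
        # mu plays the role of the mean network's output,
        # exp(log_sigma) the (positive) uncertainty network's output.
        self.mu = np.zeros((n_states, n_actions))
        self.log_sigma = np.zeros((n_states, n_actions))
        self.rng = np.random.default_rng(seed)
        self.z = 0.0

    def resample_index(self):
        # Draw a fresh index at the start of each episode (assumed Gaussian).
        self.z = self.rng.standard_normal()

    def indexed_values(self, s):
        # Indexed value estimate for every action in state s.
        return self.mu[s] + self.z * np.exp(self.log_sigma[s])

    def act(self, s):
        # Act greedily with respect to the sampled indexed value function.
        return int(np.argmax(self.indexed_values(s)))
```

Because z is held fixed within an episode, the agent commits to one plausible value function at a time, in the spirit of posterior sampling, rather than dithering action by action.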
Publication: arXiv e-prints
Pub Date: December 2019
arXiv: arXiv:1912.10577
Bibcode: 2019arXiv191210577T
Keywords:
Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Statistics - Machine Learning
E-Print: 17 pages, 4 figures, Proceedings of the 34th AAAI Conference on Artificial Intelligence