Better Optimism By Bayes: Adaptive Planning with Rich Models
Abstract
The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian non-parametric models but using simple, myopic planning strategies such as Thompson sampling. We ask whether it is feasible and truly beneficial to combine rich probabilistic models with a closer approximation to fully Bayesian planning. First, we use a collection of counterexamples to show formal problems with the over-optimism inherent in Thompson sampling. Then we leverage state-of-the-art techniques in efficient Bayes-adaptive planning and non-parametric Bayesian methods to perform qualitatively better than both existing conventional algorithms and Thompson sampling on two contextual bandit-like problems.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2014
- DOI:
- 10.48550/arXiv.1402.1958
- arXiv:
- arXiv:1402.1958
- Bibcode:
- 2014arXiv1402.1958G
- Keywords:
-
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 11 pages, 11 figures