Generalised Entropy MDPs and Minimax Regret
Abstract
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
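The abstract says that finding a worst-case prior means solving a zero-sum game. Below is a minimal sketch of that idea under simplifying assumptions that do not come from the paper. Nature picks a prior over a small finite set of candidate models. The agent picks a mixture over a finite set of candidate policies. A hypothetical matrix `regret[i][j]` gives the regret of policy `i` when model `j` is true. Under these assumptions, the worst-case prior is nature's equilibrium strategy and the minimax policy is the agent's. The game is solved here with fictitious play, a generic matrix-game solver swapped in for illustration rather than the authors' method. The function name `fictitious_play` and all example numbers are hypothetical.

```python
from typing import List, Tuple


def fictitious_play(
    regret: List[List[float]], iters: int = 20000
) -> Tuple[List[float], List[float], float]:
    """Approximate the mixed equilibrium of a finite zero-sum game.

    Hypothetical setup (not from the paper): regret[i][j] is the regret of
    candidate policy i when model j is true. The agent (rows) minimises
    expected regret; nature (columns) picks a prior over models that
    maximises it. Returns (policy mixture, worst-case prior, game value).
    """
    n_pol, n_mod = len(regret), len(regret[0])
    pol_counts = [0] * n_pol
    mod_counts = [0] * n_mod
    # Cumulative regret of each row / column against the opponent's past play.
    row_loss = [0.0] * n_pol  # sum over past model choices j of regret[k][j]
    col_gain = [0.0] * n_mod  # sum over past policy choices i of regret[i][k]
    i, j = 0, 0
    for _ in range(iters):
        pol_counts[i] += 1
        mod_counts[j] += 1
        for k in range(n_pol):
            row_loss[k] += regret[k][j]
        for k in range(n_mod):
            col_gain[k] += regret[i][k]
        # Each side best-responds to the opponent's empirical mixture so far.
        i = min(range(n_pol), key=lambda k: row_loss[k])
        j = max(range(n_mod), key=lambda k: col_gain[k])
    policy = [c / iters for c in pol_counts]
    prior = [c / iters for c in mod_counts]
    value = sum(
        policy[a] * prior[b] * regret[a][b]
        for a in range(n_pol)
        for b in range(n_mod)
    )
    return policy, prior, value


if __name__ == "__main__":
    # Hypothetical example: policy 0 suits model 0, policy 1 suits model 1.
    example = [[0.0, 0.2], [0.5, 0.0]]
    policy, prior, value = fictitious_play(example)
    print("minimax policy:", policy)
    print("worst-case prior:", prior)
    print("minimax regret:", value)
```

For the two-by-two example, making each side indifferent gives an equilibrium policy mixture of (5/7, 2/7), a worst-case prior of (2/7, 5/7) and a minimax regret of 1/7. The empirical frequencies from fictitious play should approach these values as `iters` grows. In the sequential setting the abstract refers to, the policy and model spaces are not small finite sets, so this matrix view is only a toy illustration of the game structure.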
- Publication:
- arXiv e-prints
- Pub Date:
- December 2014
- DOI:
- 10.48550/arXiv.1412.3276
- arXiv:
- arXiv:1412.3276
- Bibcode:
- 2014arXiv1412.3276A
- Keywords:
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 7 pages, NIPS workshop "From bad models to good policies"