Generalised Entropy MDPs and Minimax Regret

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.

Publication: arXiv e-prints

Pub Date: December 2014

DOI: 10.48550/arXiv.1412.3276

arXiv: arXiv:1412.3276

Bibcode: 2014arXiv1412.3276A

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: 7 pages, NIPS workshop "From bad models to good policies"
