Central-limit approach to risk-aware Markov decision processes
Abstract
Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
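As a rough illustration of the central-limit idea, and not the paper's own algorithm, the sketch below assumes a fixed policy that induces an ergodic finite-state Markov chain with a known transition matrix `P` and a per-state reward vector `r`. Under that assumption the cumulative reward over a horizon `T` is approximately normal with mean `T * mu` and variance `T * sigma2`, where `mu` is the stationary mean reward and `sigma2` is the asymptotic variance. The function names, the truncated-series solver, and the choice of shortfall probability as the risk measure are all hypothetical; the abstract does not specify them.

```python
from statistics import NormalDist


def stationary(P, iters=10_000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def clt_parameters(P, r, iters=10_000):
    """Return (mu, sigma2): stationary mean reward and asymptotic variance."""
    n = len(P)
    pi = stationary(P, iters)
    mu = sum(pi[i] * r[i] for i in range(n))
    centered = [r[i] - mu for i in range(n)]
    # h = sum_{k >= 0} P^k centered, truncated after `iters` terms.
    h = [0.0] * n
    term = centered[:]
    for _ in range(iters):
        h = [h[i] + term[i] for i in range(n)]
        term = [sum(P[i][j] * term[j] for j in range(n)) for i in range(n)]
    # sigma2 = 2 * pi(centered * h) - pi(centered^2)
    sigma2 = sum(
        pi[i] * (2.0 * centered[i] * h[i] - centered[i] ** 2) for i in range(n)
    )
    return mu, sigma2


def shortfall_probability(P, r, horizon, threshold):
    """Normal approximation to P(cumulative reward <= threshold).

    Assumes sigma2 > 0 and a horizon long enough for the CLT to be reasonable.
    """
    mu, sigma2 = clt_parameters(P, r)
    return NormalDist(horizon * mu, (horizon * sigma2) ** 0.5).cdf(threshold)


if __name__ == "__main__":
    # Hypothetical two-state chain: reward 1 in state 0, reward 0 in state 1.
    P = [[0.9, 0.1], [0.2, 0.8]]
    r = [1.0, 0.0]
    print(clt_parameters(P, r))
    print(shortfall_probability(P, r, horizon=1000, threshold=600.0))
```

When the transition probabilities are unknown, `mu` and `sigma2` would have to be estimated from observed trajectories instead (for example with batch means); that variant is not shown here.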
- Publication: arXiv e-prints
- Pub Date: December 2015
- DOI: 10.48550/arXiv.1512.00583
- arXiv: arXiv:1512.00583
- Bibcode: 2015arXiv151200583Y
- Keywords: Mathematics - Optimization and Control; Computer Science - Systems and Control
- E-Print: arXiv admin note: text overlap with arXiv:1403.6530 by other authors