Combinations and Mixtures of Optimal Policies in Unichain Markov Decision Processes are Optimal
Abstract
We show that combinations of optimal (stationary) policies in unichain Markov decision processes are optimal. That is, let M be a unichain Markov decision process with state space S, action space A, and policies \pi_j^*: S \to A (1 \leq j \leq n) that attain the optimal average infinite-horizon reward. Then any combination \pi of these policies, where for each state i in S there is a j such that \pi(i) = \pi_j^*(i), is optimal as well. Furthermore, we prove that any mixture of optimal policies, where at each visit to a state i an arbitrary action \pi_j^*(i) of an optimal policy is chosen, also yields the optimal average reward.
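As an illustration only, the statement can be checked numerically on a small example. The sketch below is not taken from the paper: the two-state MDP, its transition probabilities `P`, rewards `R`, and the helper names `gain` and `simulate_mixture` are all hypothetical, chosen so that every policy is unichain and two distinct policies are optimal. It computes exact average rewards of stationary policies from the closed-form stationary distribution of a two-state chain, and then simulates one particular mixture (a uniformly random choice between the two optimal actions at each visit).

```python
from fractions import Fraction as F
from itertools import product
import random

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# P[s][a] = (probability of moving to state 0, probability of moving to state 1).
# All entries are positive, so every stationary policy induces a single
# recurrent class, i.e. the MDP is unichain.
P = {
    0: {"a": (F(1, 2), F(1, 2)), "b": (F(1, 4), F(3, 4)), "c": (F(1, 2), F(1, 2))},
    1: {"a": (F(1, 2), F(1, 2)), "b": (F(3, 4), F(1, 4))},
}
# R[s][a] = immediate reward for taking action a in state s.
R = {
    0: {"a": F(1, 2), "b": F(1, 4), "c": F(0)},
    1: {"a": F(3, 2), "b": F(7, 4)},
}

def gain(policy):
    """Exact average reward of a stationary policy (one action per state)."""
    p01 = P[0][policy[0]][1]  # probability of moving 0 -> 1 under the policy
    p10 = P[1][policy[1]][0]  # probability of moving 1 -> 0 under the policy
    # Stationary distribution of a two-state chain in closed form.
    mu0 = p10 / (p01 + p10)
    mu1 = p01 / (p01 + p10)
    return mu0 * R[0][policy[0]] + mu1 * R[1][policy[1]]

# Optimal average reward, found by enumerating all stationary policies.
all_policies = list(product(P[0], P[1]))
best = max(gain(p) for p in all_policies)

# Two optimal policies and every state-wise combination of them.
pi1_star, pi2_star = ("a", "a"), ("b", "b")
combinations = list(product(*zip(pi1_star, pi2_star)))

def simulate_mixture(steps, seed=0):
    """Empirical average reward when, at each visit, the action of a randomly
    chosen optimal policy is played (one possible mixture among many)."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        action = rng.choice((pi1_star[state], pi2_star[state]))
        total += float(R[state][action])
        state = 0 if rng.random() < float(P[state][action][0]) else 1
    return total / steps
```

In this toy instance `gain` returns the same value as `best` for all four combinations, while policies using the deliberately suboptimal action `"c"` fall below it. The simulation covers only one mixture rule and gives an empirical estimate, so it illustrates the second claim rather than verifying it.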
- Publication: arXiv Mathematics e-prints
- Pub Date: August 2005
- arXiv: arXiv:math/0508319
- Bibcode: 2005math......8319O
- Keywords: Combinatorics; Discrete Mathematics; Learning; Optimization and Control; Probability; 90C40
- E-Print: 9 pages