Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Abstract
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies, notably, that the problem of approximating the long-run value is recursively enumerable, and that the value has a weak continuity property with respect to the transition function.
- Publication: arXiv e-prints
- Pub Date: April 2019
- DOI: 10.48550/arXiv.1904.13360
- arXiv: arXiv:1904.13360
- Bibcode: 2019arXiv190413360C
- Keywords: Computer Science - Computer Science and Game Theory; Mathematics - Optimization and Control; 90C39; 90C40; 37A50
- E-Print: doi:10.1287/moor.2020.1116