Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

doi:10.48550/arXiv.1904.13360

Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. Notably, this implies that approximating the long-run value is recursively enumerable, and that the value satisfies a weak continuity property with respect to the transition function.
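To make "finite-memory strategy" and "long-run average objective" concrete, here is a minimal sketch. It is a toy illustration and not the paper's construction. It assumes a hypothetical two-state POMDP in which the hidden state flips every step, the only observation carries no information, and the reward is 1 when the action matches the hidden state. A finite-memory strategy is modelled as a hypothetical pair of functions, `act(m)` (choose an action from memory `m`) and `upd(m, o)` (update memory after observation `o`). Running such a strategy on the POMDP yields a finite Markov chain over (state, memory) pairs. The long-run average is approximated here by a finite-horizon Cesàro average of expected rewards rather than computed exactly. The brute-force search over all deterministic controllers with `k` memory states loosely mirrors the enumeration idea behind recursive enumerability: each controller's exact value would be a lower bound on the optimal value, and the best value found never decreases as `k` grows.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy POMDP (illustration only, not from the paper):
# the hidden state flips every step, the only observation is "o",
# and the reward is 1 when the action equals the hidden state.
ACTIONS = (0, 1)


def transition(s, a):
    """Return {next_state: probability}; here the state flips deterministically."""
    return {1 - s: 1.0}


def observe(s):
    """Observation function: every state emits the same uninformative symbol."""
    return "o"


def reward(s, a):
    return 1.0 if a == s else 0.0


def long_run_average(act, upd, m0, s0=0, horizon=10_000):
    """Approximate the long-run average reward of a finite-memory strategy.

    act(m) picks the action from memory m; upd(m, o) updates memory after
    observation o. The pair induces a finite Markov chain on (state, memory);
    we return the Cesàro average (1/n) * sum_{t<n} E[r_t] for n = horizon.
    """
    dist = {(s0, m0): 1.0}  # probability distribution over (state, memory)
    total = 0.0
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, m), p in dist.items():
            a = act(m)
            total += p * reward(s, a)
            for s2, q in transition(s, a).items():
                nxt[(s2, upd(m, observe(s2)))] += p * q
        dist = nxt
    return total / horizon


def best_deterministic_controller(k, observations=("o",), horizon=1_000):
    """Brute-force every deterministic controller with k memory states
    (initial memory 0) and return the best approximate long-run average."""
    mems = range(k)
    keys = [(m, o) for m in mems for o in observations]
    best = float("-inf")
    for acts in product(ACTIONS, repeat=k):
        for targets in product(mems, repeat=len(keys)):
            table = dict(zip(keys, targets))  # memory-update table
            value = long_run_average(
                act=lambda m: acts[m],
                upd=lambda m, o: table[(m, o)],
                m0=0,
                horizon=horizon,
            )
            best = max(best, value)
    return best


print(best_deterministic_controller(1), best_deterministic_controller(2))  # 0.5 1.0
```

In this toy model a memoryless strategy (one memory state) can match the hidden state only every other step, so its average is 0.5, while two memory states used as a parity counter track the state exactly and reach 1.0.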

Publication: arXiv e-prints
Pub Date: April 2019
DOI: 10.48550/arXiv.1904.13360
arXiv: arXiv:1904.13360
Bibcode: 2019arXiv190413360C
Keywords: Computer Science - Computer Science and Game Theory; Mathematics - Optimization and Control; 90C39; 90C40; 37A50
E-Print: doi:10.1287/moor.2020.1116
