Learning Causal State Representations of Partially Observable Environments
Abstract
Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose an algorithm to approximate causal states, which are the coarsest partition of the joint history of actions and observations in partially observable Markov decision processes (POMDPs). Our method learns approximate causal state representations from RNNs trained to predict subsequent observations given the history. We demonstrate that these learned state representations are useful for learning policies efficiently in reinforcement learning problems with rich observation spaces. We connect causal states with causal feature sets from the causal inference literature, and also provide theoretical guarantees on the optimality of the continuous version of this causal state representation under Lipschitz assumptions by proving equivalence to bisimulation, a relation between behaviorally equivalent systems. This allows for lower bounds on the optimal value function of the learned representation, which are tight given certain assumptions. Finally, we empirically evaluate causal state representations using multiple partially observable tasks and compare with prior methods.
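As a rough illustration of the partition idea in the abstract, the sketch below groups action-observation histories whose empirical next-observation distributions coincide. It is not the paper's RNN-based method: it is a tabular toy, it uses one-step prediction as a simplifying stand-in for prediction of the full future, and the function names (`next_obs_distribution`, `group_histories`), the tolerance `tol`, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def next_obs_distribution(samples):
    """Estimate the empirical P(next observation | history) for each history."""
    counts = defaultdict(Counter)
    for history, next_obs in samples:
        counts[history][next_obs] += 1
    return {
        h: {o: n / sum(c.values()) for o, n in c.items()}
        for h, c in counts.items()
    }


def group_histories(dists, tol=1e-9):
    """Merge histories whose predictive distributions agree within tol."""
    groups = []  # each entry: (representative distribution, member histories)
    for h in sorted(dists):
        d = dists[h]
        for rep, members in groups:
            keys = set(rep) | set(d)
            if all(abs(rep.get(k, 0.0) - d.get(k, 0.0)) <= tol for k in keys):
                members.append(h)
                break
        else:
            # No existing group predicts the same thing: start a new one.
            groups.append((d, [h]))
    return [members for _, members in groups]


# Hypothetical toy data: a history is a tuple of (action, observation) pairs,
# paired with the observation that followed it.
samples = [
    ((("a", 0),), 1), ((("a", 0),), 1),
    ((("b", 0),), 1), ((("b", 0),), 1),
    ((("a", 1),), 0), ((("a", 1),), 1),
]

states = group_histories(next_obs_distribution(samples))
```

In this toy data the histories ending in observation 0 predict the same next observation regardless of the action taken, so they collapse into one group, while the history ending in observation 1 stays separate.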
 Publication:

arXiv e-prints
 Pub Date:
 June 2019
 DOI:
 10.48550/arXiv.1906.10437
 arXiv:
 arXiv:1906.10437
 Bibcode:
 2019arXiv190610437Z
 Keywords:

 Computer Science - Machine Learning;
 Statistics - Machine Learning
 E-Print:
 35 pages, 8 figures