Sufficient Markov Decision Processes with Alternating Deep Neural Networks
Abstract
Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is nontrivial. We propose a method for constructing a low-dimensional representation of the original decision process for which: (1) the MDP model holds; (2) a decision strategy that maximizes mean utility when applied to the low-dimensional representation also maximizes mean utility when applied to the original process. We use a deep neural network to define a class of potential process representations and estimate the process of lowest dimension within this class. The method is illustrated using data from a mobile study on heavy drinking and smoking among college students.
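The core idea above (mapping a high-dimensional state process to low-dimensional features that are still sufficient for prediction and decision making) can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's alternating-DNN estimator: it uses a simulated trajectory, a linear feature map in place of a deep network, and one-step prediction of the next observation and reward as a crude stand-in for the Markov-sufficiency criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trajectory: a d-dimensional state S_t in which only the first
# two coordinates drive the dynamics and the reward; the rest is noise.
d, k, T = 10, 2, 500
A = 0.5 * rng.normal(size=(2, 2))        # stable 2x2 transition matrix
S = np.zeros((T, d))
S[0] = rng.normal(size=d)
R = np.zeros(T - 1)
for t in range(T - 1):
    S[t + 1, :2] = S[t, :2] @ A + 0.1 * rng.normal(size=2)
    S[t + 1, 2:] = rng.normal(size=d - 2)   # nuisance coordinates
    R[t] = S[t, 0] - S[t, 1]

# Linear "encoder" phi(s) = s @ W mapping R^d -> R^k, fit so that the
# k-dimensional features predict the next observation and the reward.
W = 0.1 * rng.normal(size=(d, k))   # feature map (deep net in the paper)
C = 0.1 * rng.normal(size=(k, d))   # next-observation predictor
b = 0.1 * rng.normal(size=k)        # reward predictor
lr, losses = 1e-3, []
X, Y = S[:-1], S[1:]
N = len(X)
for epoch in range(300):
    Z = X @ W                        # low-dimensional features
    e = Z @ C - Y                    # next-observation residual
    e_r = Z @ b - R                  # reward residual
    losses.append(np.mean(e ** 2) + np.mean(e_r ** 2))
    # hand-derived gradients of the quadratic objective
    gC = 2 * Z.T @ e / N
    gb = 2 * Z.T @ e_r / N
    gW = 2 * (X.T @ (e @ C.T) + X.T @ np.outer(e_r, b)) / N
    W -= lr * gW
    C -= lr * gC
    b -= lr * gb

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

If the learned features retained no information about the first two coordinates, neither the reward nor the predictable part of the next state could be fit, so the prediction loss acts as a (very rough) proxy for whether the k-dimensional summary suffices for the decision process.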
Publication:
 arXiv e-prints
Pub Date:
 April 2017
DOI:
 10.48550/arXiv.1704.07531
arXiv:
 arXiv:1704.07531
Bibcode:
 2017arXiv170407531W
Keywords:
 Statistics - Methodology;
 Mathematics - Statistics Theory;
 Statistics - Machine Learning
E-Print:
 31 pages, 3 figures; extended abstract in the proceedings of RLDM 2017. (v2 revisions: fixed a minor bug in the code w.r.t. setting the seed; as a result, numbers in the simulation experiments changed slightly, but the conclusions are unchanged. Corrected typos. Improved notation.)