Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

doi:10.48550/arXiv.1911.06854


We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study how different attributes interact to affect method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to contribute further to the benchmark.
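As a concrete reference point for what an OPE method computes, the sketch below implements two classic estimators, trajectory-wise importance sampling (IS) and weighted importance sampling (WIS), which estimate the value of an evaluation policy using only episodes collected by a different behavior policy. This is a generic textbook illustration, not code from COBS; the function name `is_estimates`, the `(state, action, reward)` episode format, and the callable policy interface are assumptions made for this example.

```python
import numpy as np


def is_estimates(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise importance sampling (IS) and weighted IS (WIS).

    Hypothetical interface, assumed for illustration only:
      trajectories: list of episodes, each a list of (state, action, reward)
                    tuples collected by the behavior policy.
      pi_e, pi_b:   callables (state, action) -> probability of that action
                    under the evaluation and behavior policies.
    """
    weights, returns = [], []
    for episode in trajectories:
        rho = 1.0  # cumulative likelihood ratio pi_e / pi_b over the episode
        g = 0.0    # discounted return of the episode
        for t, (s, a, r) in enumerate(episode):
            rho *= pi_e(s, a) / pi_b(s, a)
            g += (gamma ** t) * r
        weights.append(rho)
        returns.append(g)
    weights = np.asarray(weights)
    returns = np.asarray(returns)
    # IS: unbiased but can have high variance when ratios are large.
    is_est = float(np.mean(weights * returns))
    # WIS: normalizes by the total weight, trading a small bias for lower variance.
    wis_est = float(np.sum(weights * returns) / np.sum(weights))
    return is_est, wis_est


# Toy example: one-step episodes, a uniform behavior policy, and an
# evaluation policy that picks action 1 (reward 1.0) with probability 0.9.
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 0.9 if a == 1 else 0.1
data = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
is_est, wis_est = is_estimates(data, pi_e, pi_b)
print(is_est, wis_est)
```

In this balanced toy dataset both estimates recover the evaluation policy's true value of 0.9; on realistic data they generally differ, and that kind of estimator behavior across diverse environments is what a benchmark such as COBS is designed to stress test.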

Publication: arXiv e-prints
Pub Date: November 2019
DOI: 10.48550/arXiv.1911.06854
arXiv: arXiv:1911.06854
Bibcode: 2019arXiv191106854V
Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Robotics; Statistics - Machine Learning
