Deeply-Debiased Off-Policy Interval Estimation

doi:10.48550/arXiv.2105.04646

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

Publication:

arXiv e-prints

Pub Date:

May 2021

DOI:

10.48550/arXiv.2105.04646

arXiv:

arXiv:2105.04646

Bibcode:

2021arXiv210504646S

Keywords:

Statistics - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning
