Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Off-policy Actor-Critic algorithms have demonstrated strong experimental performance but still lack a satisfying explanation. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into three terms: a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitudes of the bias and the variance, we explain the success of Emphasizing Recent Experience sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance than uniform sampling and are hence preferable to it.
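As a rough illustration of the 1/age weighted sampling named in the abstract, the sketch below draws replay-buffer indices with probability proportional to 1/age. It is not the authors' implementation. The function names (`inverse_age_weights`, `sample_indices`) are hypothetical, and the convention that the newest transition has age 1 is an assumption made here for illustration.

```python
import random


def inverse_age_weights(buffer_size):
    """Return normalized sampling weights proportional to 1/age.

    Index 0 is the oldest transition and index buffer_size - 1 the newest.
    Assumption (not from the source): the newest transition has age 1.
    """
    ages = [buffer_size - i for i in range(buffer_size)]
    raw = [1.0 / age for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def sample_indices(buffer_size, batch_size, rng):
    """Draw batch_size buffer indices, with replacement, weighted by 1/age."""
    weights = inverse_age_weights(buffer_size)
    return rng.choices(range(buffer_size), weights=weights, k=batch_size)


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights = inverse_age_weights(4)
batch = sample_indices(buffer_size=4, batch_size=8, rng=rng)
```

With a buffer of four transitions, the newest one is four times as likely to be drawn as the oldest, whereas uniform sampling would give each the same probability.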

Publication: arXiv e-prints

Pub Date: October 2021

DOI: 10.48550/arXiv.2110.02421

arXiv: arXiv:2110.02421

Bibcode: 2021arXiv211002421F

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
