Bellman Error Centering
Abstract
This paper revisits the recently proposed reward centering algorithms including simple reward centering (SRC) and value-based reward centering (VRC), and points out that SRC is indeed the reward centering, while VRC is essentially Bellman error centering (BEC). Based on BEC, we provide the centered fixpoint for tabular value functions, as well as the centered TD fixpoint for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove the convergence of both algorithms. Finally, we experimentally validate the stability of our proposed algorithms. Bellman error centering facilitates the extension to various reinforcement learning algorithms.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2025
- DOI:
- arXiv:
- arXiv:2502.03104
- Bibcode:
- 2025arXiv250203104C
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence