Average-Reward Reinforcement Learning with Entropy Regularization
Abstract
The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years due to its ability to solve temporally extended problems without discounting. Independently, RL algorithms have benefited from entropy regularization: an approach used to make the optimal policy stochastic, and thereby more robust to noise. Despite the distinct benefits of the two approaches, the combination of entropy regularization with an average-reward objective is not well studied in the literature, and there has been limited development of algorithms for this setting. To address this gap in the field, we develop algorithms for solving entropy-regularized average-reward RL problems with function approximation. We experimentally validate our method, comparing it with existing algorithms on standard benchmarks for RL.
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.09080
- Bibcode: 2025arXiv250109080A
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- E-Print: Accepted at the AAAI-25 Eighth Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)