Hierarchical Universal Value Function Approximators

doi:10.48550/arXiv.2410.08997

Arora, Rushiv

There have been key advancements in building universal approximators for multi-goal collections of reinforcement learning value functions -- central elements in estimating long-term returns of states in a parameterized manner. We extend this to hierarchical reinforcement learning, using the options framework, by introducing hierarchical universal value function approximators (H-UVFAs). This allows us to leverage the added benefits of scaling, planning, and generalization expected in temporal abstraction settings. We develop supervised and reinforcement learning methods for learning embeddings of the states, goals, options, and actions in the two hierarchical value functions: $Q(s, g, o; \theta)$ and $Q(s, g, o, a; \theta)$. Finally, we demonstrate generalization of the H-UVFAs and show that they outperform corresponding UVFAs.
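
To make the two value functions named in the abstract concrete, here is a minimal sketch of how $Q(s, g, o; \theta)$ and $Q(s, g, o, a; \theta)$ could be assembled from per-entity embeddings. The abstract does not specify the architecture, so everything here is an illustrative assumption rather than the paper's method: the multilinear (elementwise product, then sum) combination, the sharing of embedding tables between the two functions, the table sizes, the rank, the random initialization, and all function names.

```python
import random

def make_table(n_items, rank, rng):
    """Return one rank-dimensional embedding vector per discrete item (assumed layout)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_items)]

def multilinear_q(*vectors):
    """Multiply the vectors elementwise, then sum over the shared latent dimension."""
    total = 0.0
    for components in zip(*vectors):
        prod = 1.0
        for c in components:
            prod *= c
        total += prod
    return total

# Hypothetical problem sizes and rank; random values stand in for learned embeddings.
rng = random.Random(0)
RANK = 4
state_emb = make_table(5, RANK, rng)
goal_emb = make_table(3, RANK, rng)
option_emb = make_table(2, RANK, rng)
action_emb = make_table(4, RANK, rng)

def q_option(s, g, o):
    """Sketch of Q(s, g, o; theta): value of choosing option o in state s for goal g."""
    return multilinear_q(state_emb[s], goal_emb[g], option_emb[o])

def q_action(s, g, o, a):
    """Sketch of Q(s, g, o, a; theta): value of action a while executing option o."""
    return multilinear_q(state_emb[s], goal_emb[g], option_emb[o], action_emb[a])

def greedy_option(s, g):
    """Pick the option index with the highest sketched value for this state and goal."""
    return max(range(len(option_emb)), key=lambda o: q_option(s, g, o))
```

In this sketch a new goal needs only a new goal embedding to be scored against every state and option, which is the kind of generalization a universal approximator is meant to support; how the embeddings are actually trained is left to the supervised and reinforcement learning methods the abstract mentions.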

Publication:

arXiv e-prints

Pub Date:

October 2024

DOI:

10.48550/arXiv.2410.08997

arXiv:

arXiv:2410.08997

Bibcode:

2024arXiv241008997A

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Statistics - Machine Learning;
I.2.6

E-Print:

13 pages, 11 figures, 3 appendices. Currently under review

NASA/ADS
