$\pi2\text{vec}$: Policy Representations with Successor Features
Abstract
This paper describes $\pi2\text{vec}$, a method for representing the behavior of black-box policies as feature vectors. The policy representations capture, in a task-agnostic way, how the statistics of foundation model features change in response to the policy's behavior. They can be trained from offline data, which allows them to be used for offline policy selection. This work provides a key piece of a recipe for fusing three modern lines of research: offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource-constrained environments.
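Successor features, named in the title, summarize a policy $\pi$ by the expected discounted sum of the state features it visits, $\psi^\pi = \mathbb{E}_\pi\left[\sum_t \gamma^t \phi(s_t)\right]$. What follows is a minimal Monte Carlo sketch of that quantity computed from logged trajectories. It assumes the per-step features $\phi(s_t)$ have already been extracted (for example by a frozen foundation model). It is an illustration of the definition only, not the paper's training procedure, and the helper names `successor_features` and `policy_embedding` and the default discount are hypothetical.

```python
import numpy as np

def successor_features(features, gamma):
    """Discounted feature sum for one trajectory: psi = sum_t gamma^t * phi(s_t).

    features: array of shape (T, d) holding phi(s_0), ..., phi(s_{T-1}),
    assumed to come from some pre-trained feature extractor.
    """
    features = np.asarray(features, dtype=float)
    discounts = gamma ** np.arange(features.shape[0])  # shape (T,)
    return discounts @ features  # weighted sum over time, shape (d,)

def policy_embedding(trajectories, gamma=0.99):
    """Hypothetical helper: Monte Carlo estimate of a policy's vector,
    the mean discounted feature sum over trajectories collected by that policy."""
    return np.mean([successor_features(f, gamma) for f in trajectories], axis=0)

# Usage: one constant-feature trajectory of length 3 with gamma = 0.5
# gives (1 + 0.5 + 0.25) * [1, 1] = [1.75, 1.75].
psi = successor_features(np.ones((3, 2)), gamma=0.5)
```

A plain average like this needs rollouts from every policy being compared; the point made in the abstract is that the representations can instead be trained from offline data, which this sketch does not attempt.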
- Publication: arXiv e-prints
- Pub Date: June 2023
- DOI: 10.48550/arXiv.2306.09800
- arXiv: arXiv:2306.09800
- Bibcode: 2023arXiv230609800S
- Keywords: Computer Science - Machine Learning; Computer Science - Robotics
- E-Print: Accepted paper at ICLR 2024