Batch Active Learning of Reward Functions from Human Preferences
Abstract
Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms, batch active preference-based learning methods, that enable efficient learning of reward functions using as few data samples as possible while still having short query generation times and also retaining parallelizability. We introduce a method based on determinantal point processes (DPP) for active batch generation and several heuristic-based alternatives. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users' preferences.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2024
- DOI:
- 10.48550/arXiv.2402.15757
- arXiv:
- arXiv:2402.15757
- Bibcode:
- 2024arXiv240215757B
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Computer Science - Robotics;
- Statistics - Machine Learning
- E-Print:
- To appear in ACM Transactions on Human-Robot Interaction (THRI). 27 pages, 12 figures, 2 tables. arXiv admin note: text overlap with arXiv:1810.04303