Configuration Path Control
Abstract
Reinforcement learning methods often produce brittle policies: policies that perform well during training but generalize poorly beyond their direct training experience, and thus become unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, as well as on an instantaneous control-matrix estimation. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations. The control policies obtained via reinforcement learning are compared against their stabilized counterparts. Across different experiments, we find a two- to four-fold increase in stability, measured in terms of the perturbation amplitudes. We also provide a zero-dynamics interpretation of our approach.
- Publication: arXiv e-prints
- Pub Date: April 2022
- DOI: 10.48550/arXiv.2204.02471
- arXiv: arXiv:2204.02471
- Bibcode: 2022arXiv220402471P
- Keywords: Computer Science - Robotics; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Systems and Control
- E-Print: 12 pages, 3 figures, accepted for publication