Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning
Abstract
Real-world accidents in learning-enabled CPS frequently occur in challenging corner cases. During the training of deep reinforcement learning (DRL) policy, the standard setup for training conditions is either fixed at a single initial condition or uniformly sampled from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, a simulated and a real quadruped robot, showing remarkably improved sampling efficiency to learn more robust safe policies.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.13224
- Bibcode:
- 2024arXiv241213224C
- Keywords:
-
- Computer Science - Robotics;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning
- E-Print:
- under review