Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients
Abstract
Policy iteration is one of the classical frameworks of reinforcement learning, which requires a known initial stabilizing control. However, finding the initial stabilizing control depends on the known system model. To relax this requirement and achieve model-free optimal control, in this paper, two different reinforcement learning algorithms based on policy iteration and variable damping coefficients are designed for unknown discrete-time linear systems. First, a stable artificial system is designed, and this system is gradually iterated to the original system by varying the damping coefficients. This allows the initial stabilizing control to be obtained in a finite number of iteration steps. Then, an off-policy iteration algorithm and an off-policy $\mathcal{Q}$-learning algorithm are designed to select the appropriate damping coefficients and realize data-driven. In these two algorithms, the current estimates of optimal control gain are not applied to the system to re-collect data. Moreover, they are characterized by the fast convergence of the traditional policy iteration. Finally, the proposed algorithms are validated by simulation.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.20845
- Bibcode:
- 2024arXiv241220845L
- Keywords:
-
- Electrical Engineering and Systems Science - Systems and Control