Robust exploration in linear quadratic reinforcement learning

doi:10.48550/arXiv.1906.01584

This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly: i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

Publication: arXiv e-prints

Pub Date: June 2019

DOI: 10.48550/arXiv.1906.01584

arXiv: arXiv:1906.01584

Bibcode: 2019arXiv190601584U

Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning; Statistics - Machine Learning
