Learning the Linear Quadratic Regulator from Nonlinear Observations
Abstract
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
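To make the RichLQR setting concrete, the following minimal sketch simulates the generative model described above: a latent state with linear dynamics and quadratic cost, observed only through a nonlinear map. This is an illustration, not the paper's RichID algorithm; the matrices `A`, `B`, `Q`, `R` and the observation map `observe` are arbitrary placeholders chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_u, d_y = 2, 1, 8                   # latent, control, and observation dims

A = np.array([[1.0, 0.1], [0.0, 1.0]])    # latent linear dynamics (placeholder)
B = np.array([[0.0], [0.1]])
Q = np.eye(d_x)                           # quadratic state cost
R = np.eye(d_u)                           # quadratic control cost
W = rng.standard_normal((d_y, d_x))       # fixed lifting used by the observation map

def observe(x):
    """Nonlinear 'rich' observation of the latent state (standing in for, e.g.,
    camera pixels); the learner never sees x directly."""
    return np.tanh(W @ x)

x = np.zeros(d_x)
total_cost = 0.0
for t in range(10):
    y = observe(x)                        # what the agent actually receives
    u = 0.1 * rng.standard_normal(d_u)    # placeholder exploratory control
    total_cost += x @ Q @ x + u @ R @ u   # cost is defined on the *latent* state
    x = A @ x + B @ u + 0.01 * rng.standard_normal(d_x)

print(y.shape, total_cost)
```

The learner's task in this setting is to recover a decoder mapping `y` back to (an affine transform of) `x` well enough to run LQR in the latent space.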
Publication: arXiv e-prints
Pub Date: October 2020
arXiv: arXiv:2010.03799
Bibcode: 2020arXiv201003799M
Keywords: Computer Science - Machine Learning; Mathematics - Optimization and Control; Mathematics - Statistics Theory; Statistics - Machine Learning
E-Print: To appear at NeurIPS 2020