Improper Learning for Non-Stochastic Control
Abstract
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies. In the fully adversarial setting, our controller attains an optimal regret bound of $\sqrt{T}$ when the system is known, and, when combined with an initial stage of least-squares estimation, $T^{2/3}$ when the system is unknown; both yield the first sublinear regret for the partially observed setting. Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and $\sqrt{T}$ when unknown. To our knowledge, this gives the first end-to-end $\sqrt{T}$ regret for the online Linear Quadratic Gaussian controller, and applies in a more general setting with adversarial losses and semi-adversarial noise.
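The core idea above can be illustrated with a minimal sketch: a controller parametrized as a linear map over a sliding window of past "denoised" signals, updated by online gradient descent. This is not the paper's exact algorithm; the dimensions, the surrogate quadratic loss, the stand-in signal `w_nat`, and the step size below are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's exact method): online gradient descent
# over a disturbance-response-style parametrization. The controller plays
#   u_t = sum_i M[i] @ w_nat[t - i]
# where w_nat stands in for the denoised observation sequence.
rng = np.random.default_rng(0)
T, m, p, H = 200, 2, 2, 5                # horizon, input dim, signal dim, memory
eta = 0.05                               # OGD step size (assumed, not tuned)
M = 0.1 * rng.normal(size=(H, m, p))     # controller parameters M^{[0..H-1]}
w_nat = rng.normal(size=(T, p))          # stand-in denoised signals

def control(M, w_hist):
    # u_t = sum_i M[i] w_{t-i}, using the H most recent signals
    return sum(M[i] @ w_hist[-(i + 1)] for i in range(min(H, len(w_hist))))

losses = []
for t in range(H, T):
    hist = list(w_nat[:t + 1])
    u = control(M, hist)
    # surrogate quadratic loss on the input; in the paper the losses are
    # adversarially chosen convex functions
    losses.append(0.5 * u @ u)
    # gradient of the surrogate loss with respect to each M[i]
    for i in range(H):
        M[i] -= eta * np.outer(u, hist[-(i + 1)])
```

Because the parametrization is linear in `M`, the surrogate loss is convex in the parameters, which is what lets online gradient descent give regret guarantees against the comparator class.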
Publication: arXiv e-prints
Pub Date: January 2020
arXiv: arXiv:2001.09254
Bibcode: 2020arXiv200109254S
Keywords: Computer Science - Machine Learning; Mathematics - Optimization and Control; Statistics - Machine Learning