Online estimation and control with optimal pathlength regret
Abstract
A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g., Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional $H_2$ and $H_{\infty}$ algorithms when the environment varies over time.
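As a rough illustration of the "temporal variation" that such bounds depend on, the sketch below computes the pathlength of an input sequence. It assumes one common definition, the sum of squared Euclidean distances between consecutive inputs; the abstract does not state the exact formula, and the function name `pathlength` and both example sequences are hypothetical.

```python
def pathlength(seq):
    """Sum of squared Euclidean distances between consecutive inputs.

    Assumed definition for illustration only; `seq` is a list of
    equal-length tuples, one per time step.
    """
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return total


# A slowly drifting scalar sequence: 0.0, 0.1, ..., 1.0
slow = [(0.1 * t,) for t in range(11)]

# A rapidly alternating scalar sequence: 0, 1, 0, 1, ...
fast = [(float(t % 2),) for t in range(11)]

print(pathlength(slow), pathlength(fast))
```

Both sequences have the same length and stay within [0, 1], yet the slow one has a pathlength of about 0.1 while the alternating one has a pathlength of 10. A pathlength regret bound would therefore promise much lower regret on the first sequence than on the second.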
Publication: arXiv e-prints
Pub Date: October 2021
arXiv: arXiv:2110.12544
Bibcode: 2021arXiv211012544G
Keywords: Computer Science - Machine Learning; Mathematics - Optimization and Control