Optimal Dynamic Regret in LQR Control

Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track

Authors

Dheeraj Baby, Yu-Xiang Wang

Abstract

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(n^{1/3}\,\mathrm{TV}(M_{1:n})^{2/3} \vee 1)$, where $\mathrm{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1, \ldots, M_n$, chosen in hindsight to cater to unknown nonstationarity. This rate improves the best known rate of $\tilde{O}(\sqrt{n(\mathrm{TV}(M_{1:n})+1)})$ for general convex losses and is information-theoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster & Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
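For readers outside nonstochastic control, a minimal sketch of the objects in the bound may help. The memory length $m$, the stabilizing gain $K$, and the choice of Frobenius norm below follow standard conventions for Disturbance Action controllers and are illustrative assumptions, not notation fixed by this abstract:

% Disturbance Action policy: a stabilizing linear term plus a learned
% linear map of the past m disturbances w_{t-i} (m, K assumed for illustration).
$$u_t = -K x_t + \sum_{i=1}^{m} M_t^{[i]} w_{t-i}$$

% Total variation (path length) of the comparator sequence, here in
% Frobenius norm; this is the quantity TV(M_{1:n}) appearing in the regret bound.
$$\mathrm{TV}(M_{1:n}) = \sum_{t=2}^{n} \lVert M_t - M_{t-1} \rVert_F$$

Dynamic regret then compares the learner's cumulative quadratic cost against that of the best time-varying sequence $M_1, \ldots, M_n$ of such policies chosen in hindsight, with the bound scaling as $\tilde{O}(n^{1/3}\,\mathrm{TV}(M_{1:n})^{2/3} \vee 1)$.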