{"title": "Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 14887, "page_last": 14899, "abstract": "This paper studies the online optimal control problem with time-varying convex stage costs for a time-invariant linear dynamical system, where a finite lookahead window of accurate predictions of the stage costs is available at each time. We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations. We study the algorithm performance measured by dynamic regret: the online performance minus the optimal performance in hindsight. It is shown that the dynamic regret of RHGC decays exponentially with the size of the lookahead window. In addition, we provide a fundamental limit of the dynamic regret for any online algorithm by considering linear quadratic tracking problems. The regret upper bound of one RHGC method almost reaches the fundamental limit, demonstrating the effectiveness of the algorithm. Finally, we numerically test our algorithms for both linear and nonlinear systems to show the effectiveness and generality of our RHGC.", "full_text": "Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis

Yingying Li, SEAS, Harvard University, Cambridge, MA 02138, yingyingli@g.harvard.edu
Xin Chen, SEAS, Harvard University, Cambridge, MA 02138, chen_xin@g.harvard.edu
Na Li, SEAS, Harvard University, Cambridge, MA 02138, nali@seas.harvard.edu

Abstract

This paper studies the online optimal control problem with time-varying convex stage costs for a time-invariant linear dynamical system, where a finite lookahead window of accurate predictions of the stage costs is available at each time. 
We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations. We study the algorithm performance measured by dynamic regret: the online performance minus the optimal performance in hindsight. It is shown that the dynamic regret of RHGC decays exponentially with the size of the lookahead window. In addition, we provide a fundamental limit of the dynamic regret for any online algorithm by considering linear quadratic tracking problems. The regret upper bound of one RHGC method almost reaches the fundamental limit, demonstrating the effectiveness of the algorithm. Finally, we numerically test our algorithms for both linear and nonlinear systems to show the effectiveness and generality of our RHGC.

1 Introduction

In this paper, we consider an N-horizon discrete-time sequential decision-making problem. At each time t = 0, . . . , N - 1, the decision maker observes a state x_t of a dynamical system, receives a W-step lookahead window of future cost functions of states and control actions, i.e. f_t(x) + g_t(u), . . . , f_{t+W-1}(x) + g_{t+W-1}(u), then decides the control input u_t, which drives the system to a new state x_{t+1} following some known dynamics. For simplicity, we consider a linear time-invariant (LTI) system x_{t+1} = A x_t + B u_t with (A, B) known in advance. The goal is to minimize the overall cost over the N time steps. This problem enjoys many applications in, e.g., data center management [1, 2], robotics [3], autonomous driving [4, 5], energy systems [6], and manufacturing [7, 8]. Hence, there has been growing interest in the problem from both the control and online optimization communities. In the control community, studies on the above problem focus on economic model predictive control (EMPC), a variant of model predictive control (MPC) whose primary goal is optimizing economic costs [9, 10, 11, 12, 13, 14, 15, 16]. 
Recent years have seen much attention on the optimality analysis of EMPC, under both time-invariant costs [17, 18, 19] and time-varying costs [20, 12, 14, 21, 22]. However, most studies focus on asymptotic performance, and there is still limited understanding of the non-asymptotic performance, especially under time-varying costs. Moreover, for computationally efficient algorithms, e.g. suboptimal MPC and inexact MPC [23, 24, 25, 26], there is limited work on optimality guarantees.

In online optimization, on the contrary, there are many papers on non-asymptotic performance analysis, where the performance is usually measured by regret, e.g. static regret [27, 28], dynamic regret [29], etc., but most work does not consider predictions and/or dynamical systems. Further, motivated by applications with predictions, e.g. predictions of electricity prices in data center management problems [30, 31], there is growing interest in the effect of predictions on online problems [32, 33, 30, 34, 31, 35, 36]. However, though some papers consider switching costs, which can be viewed as a simple and special dynamical model [37, 36], there is a lack of study of general dynamical systems and of how predictions affect the online problem with dynamical systems.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In this paper, we propose novel gradient-based online control algorithms, receding horizon gradient-based control (RHGC), and provide nonasymptotic optimality guarantees via dynamic regret. RHGC can be based on many gradient methods, e.g. vanilla gradient descent, Nesterov's accelerated gradient, triple momentum, etc. [38, 39]. Due to the space limit, this paper only presents receding horizon gradient descent (RHGD) and receding horizon triple momentum (RHTM). 
For the theoretical analysis, we assume strongly convex and smooth cost functions, whereas applying RHGC does not require these conditions. Specifically, we show that the regret bounds of RHGD and RHTM decay exponentially with the prediction window's size W, demonstrating that our algorithms efficiently utilize the prediction. Besides, our regret bounds decrease when the system is more "agile" in the sense of a controllability index [40]. Further, we provide a fundamental limit for any online control algorithm and show that this fundamental lower bound almost matches the regret upper bound of RHTM. This indicates that RHTM achieves near-optimal performance, at least in the worst case. We also provide some discussion of the classic linear quadratic tracking problem, a widely studied control problem in the literature, to provide more insightful interpretations of our results. Finally, we numerically test our algorithms. In addition to linear systems, we also apply RHGC to a nonlinear dynamical system: path tracking by a two-wheeled robot. Results show that RHGC works effectively for nonlinear systems, though RHGC is only presented and theoretically analyzed for LTI systems.

Results in this paper are built on a paper on online optimization with switching costs [36]. Compared with [36], this paper studies online optimal control with general linear dynamics, which includes [36] as a special case, and studies how the system's controllability index affects the regrets.

There has been some recent work on online optimal control problems with time-varying costs [41, 42, 37, 43] and/or time-varying disturbances [43], but most papers focus on the no-prediction case. As we show later in this paper, these algorithms can be used in our RHGC methods as initialization oracles. 
Moreover, our regret analysis shows that RHGC can reduce the regret of these no-prediction online algorithms by a factor decaying exponentially with the prediction window's size.

Finally, we would like to mention another related line of work: learning-based control [44, 45, 46, 47, 48]. In some sense, the results in this paper are orthogonal to those of learning-based control, because learning-based control usually considers a time-invariant environment with unknown dynamics and aims to learn the system dynamics or optimal controllers from data, while this paper considers a time-varying scenario with known dynamics but changing objectives and studies decision making with limited predictions. It is an interesting future direction to combine the two lines of work for designing more applicable algorithms.

Notations. Consider matrices A and B; A >= B means A - B is positive semidefinite, and [A, B] denotes a block matrix. The norm \| \cdot \| refers to the L2 norm for both vectors and matrices. Let x^i denote the ith entry of the vector x. Consider a set I = {k_1, . . . , k_m}; then x_I = (x^{k_1}, . . . , x^{k_m})^T, and A(I, :) denotes the I rows of matrix A stacked together. Let I_m be the identity matrix in R^{m x m}.

2 Problem formulation and preliminaries

Consider a finite-horizon discrete-time optimal control problem with time-varying cost functions f_t(x_t) + g_t(u_t) and a linear time-invariant (LTI) dynamical system:

min_{x,u} J(x, u) = \sum_{t=0}^{N-1} [f_t(x_t) + g_t(u_t)] + f_N(x_N)    (1)
s.t. x_{t+1} = A x_t + B u_t,  t >= 0,

where x_t \in R^n, u_t \in R^m, x = (x_1^T, . . . , x_N^T)^T, u = (u_0^T, . . . , u_{N-1}^T)^T, x_0 is given, and f_N(x_N) is the terminal cost.^1 To solve the optimal control problem (1), all cost functions from t = 0 to t = N are needed. 
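To make the setup in (1) concrete, the following minimal sketch rolls out the LTI dynamics and evaluates J(x, u) for a given control sequence. The 2-dimensional system, the quadratic tracking costs, and the targets are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of problem (1): roll out x_{t+1} = A x_t + B u_t and evaluate
# J(x, u) for given controls.  System and costs are illustrative choices.
A = np.array([[0.0, 1.0], [0.2, 0.5]])   # hypothetical A (n = 2)
B = np.array([[0.0], [1.0]])             # m = 1
N = 5
theta = [np.array([np.sin(0.3 * t), 0.0]) for t in range(N + 1)]  # targets

def f(t, x):          # tracking stage cost f_t(x_t)
    d = x - theta[t]
    return 0.5 * d @ d

def g(t, u):          # control stage cost g_t(u_t)
    return 0.5 * u @ u

def J(u_seq, x0=np.zeros(2)):
    """Total cost sum_t [f_t(x_t) + g_t(u_t)] + f_N(x_N)."""
    x, cost = x0, 0.0
    for t in range(N):
        cost += f(t, x) + g(t, u_seq[t])
        x = A @ x + B @ u_seq[t]         # LTI dynamics
    return cost + f(N, x)

zero_u = [np.zeros(1) for _ in range(N)]
print(J(zero_u))   # cost of applying no control at all
```

An online algorithm must commit to u_t with only W of the future f, g pairs revealed, which is what makes (1) an online, rather than offline, problem.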
However, at each time t, usually only a finite lookahead window of cost functions is available, and the decision maker needs to make an online decision u_t using the available information.

^1 The results in this paper can be extended to cost c_t(x_t, u_t) with proper assumptions.

In particular, we consider a simplified prediction model: at each time t, the decision maker obtains accurate predictions for the next W time steps, f_t, g_t, . . . , f_{t+W-1}, g_{t+W-1}, but no further prediction beyond these W steps, meaning that f_{t+W}, g_{t+W}, . . . can even be adversarially generated. Though this prediction model may be too optimistic in the short term and over-pessimistic in the long term, this model i) captures a commonly observed phenomenon in predictions, that short-term predictions are usually much more accurate than long-term predictions; and ii) allows researchers to derive insights on the role of predictions and possibly to extend to more complicated cases [31, 30, 49, 50].

The online optimal control problem is described as follows: at each time step t = 0, 1, . . . ,
- the agent observes state x_t and receives predictions f_t, g_t, . . . , f_{t+W-1}, g_{t+W-1};
- the agent decides and implements a control u_t and suffers the cost f_t(x_t) + g_t(u_t);
- the system evolves to the next state x_{t+1} = A x_t + B u_t.^2

An online control algorithm, denoted as A, can be defined as a mapping from the prediction information and the history information to the control action:

u_t(A) = A(x_t(A), . . . , x_0(A), {f_s, g_s}_{s=0}^{t+W-1}),  t >= 0,    (2)

where x_t(A) is the state generated by implementing A and x_0(A) = x_0 is given.

This paper evaluates the performance of online control algorithms by comparing against the optimal control cost J* in hindsight, that is, J* := min{J(x, u) | x_{t+1} = A x_t + B u_t, for all t >= 0}. The performance of an online algorithm A is measured by^3

Regret(A) := J(A) - J* = J(x(A), u(A)) - J*,    (3)

which is sometimes called the dynamic regret [29, 51] or competitive difference [52]. Another popular regret notion is the static regret, which compares the online performance with the optimal static controller/policy [42, 41]. The benchmark in static regret is weaker than that in dynamic regret because the optimal controller may be far from static, and it has been shown in the literature that o(N) static regret can be achieved even without predictions (i.e., W = 0). Thus, we focus on dynamic regret and study how predictions can improve it.

Example 1 (Linear quadratic (LQ) tracking). Consider a discrete-time tracking problem for a system x_{t+1} = A x_t + B u_t. The goal is to minimize the quadratic tracking loss of a trajectory {theta_t}_{t=0}^N:

J(x, u) = (1/2) \sum_{t=0}^{N-1} [(x_t - theta_t)^T Q_t (x_t - theta_t) + u_t^T R_t u_t] + (1/2) (x_N - theta_N)^T Q_N (x_N - theta_N).

In practice, it is usually difficult to know the complete trajectory {theta_t}_{t=0}^N a priori; what are revealed are usually the next few steps, making it an online control problem with predictions.

Assumptions and useful concepts. Firstly, we assume controllability, which is standard in control theory and roughly means that the system can be steered to any state by proper control inputs [53].

Assumption 1. 
The LTI system x_{t+1} = A x_t + B u_t is controllable.

It is well known that any controllable LTI system can be linearly transformed to a canonical form [40], and the linear transformation can be computed efficiently a priori using A and B, which can further be used to reformulate the cost functions f_t, g_t. Thus, without loss of generality, this paper only considers LTI systems in the canonical form, defined as follows.

Definition 1 (Canonical form). A system x_{t+1} = A x_t + B u_t is said to be in the canonical form if

A = \begin{bmatrix}
0 & 1 & & & & & \\
\vdots & & \ddots & & & & \\
0 & \cdots & 0 & 1 & & & \\
* & * & \cdots & * & * & \cdots & * \\
& & & & \ddots & & \\
* & * & \cdots & * & * & \cdots & *
\end{bmatrix}, \qquad
B = \begin{bmatrix}
0 & & \\
\vdots & & \\
1 & & \\
& \ddots & \\
& & 0 \\
& & \vdots \\
& & 1
\end{bmatrix},

where each * represents a (possibly) nonzero entry, the remaining rows of A each have a single 1 on the superdiagonal, and the rows of B containing a 1 are the same rows of A containing *; the indices of these rows are denoted {k_1, . . . , k_m} =: I. Moreover, let p_i = k_i - k_{i-1} for 1 <= i <= m, where k_0 = 0. The controllability index of a canonical-form (A, B) is defined as

p = max{p_1, . . . , p_m}.

^2 We assume known A, B, no process noise, and state feedback, and leave relaxing these assumptions as future work.
^3 The optimality gap depends on the initial state x_0 and {f_t, g_t}_{t=0}^N, but we omit them for simplicity of notation.

Next, we introduce assumptions on the cost functions and their optimal solutions.

Assumption 2. Assume f_t is mu_f-strongly convex and l_f-Lipschitz smooth for 0 <= t <= N, and g_t is convex and l_g-Lipschitz smooth for 0 <= t <= N - 1, for some mu_f, l_f, l_g > 0.

Assumption 3. Assume the minimizers of f_t, g_t, denoted as theta_t = argmin_x f_t(x), xi_t = argmin_u g_t(u), are uniformly bounded, i.e. there exist \bar{theta}, \bar{xi} such that \|theta_t\| <= \bar{theta}, \|xi_t\| <= \bar{xi}, for all t.

These assumptions are commonly adopted in convex analysis. The uniform bounds rule out extreme cases. 
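Assumption 1 can be verified numerically via the standard rank test on the controllability matrix [B, AB, . . . , A^{n-1}B]. The sketch below does this for an Example-2-style canonical-form system; the coefficients a1, a2 are hypothetical.

```python
import numpy as np

# Controllability (Assumption 1) via rank of [B, AB, ..., A^{n-1}B].
def is_controllable(A, B):
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks)) == n

# Canonical form with n = 2, m = 1, so I = {2} and controllability index p = 2.
a1, a2 = 0.3, -0.4                       # illustrative coefficients
A = np.array([[0.0, 1.0], [a1, a2]])
B = np.array([[0.0], [1.0]])
print(is_controllable(A, B))             # True
```

For a single-input canonical-form system like this one, m = 1 and k_1 = n, so the controllability index is simply p = n.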
Notice that the LQ tracking problem in Example 1 satisfies Assumptions 2 and 3 if Q_t, R_t are positive definite with uniform bounds on the eigenvalues and if theta_t are uniformly bounded for all t.

3 Online control algorithms: receding horizon gradient-based control

This section introduces our online control algorithms, receding horizon gradient-based control (RHGC). The design is by first converting the online control problem to an equivalent online optimization problem with finite temporal-coupling costs, then designing gradient-based online optimization algorithms by utilizing this finite temporal-coupling property.

3.1 Problem transformation

Firstly, we notice that the offline optimal control problem (1) can be viewed as an optimization with equality constraints over x and u. The individual stage cost f_t(x_t) + g_t(u_t) only depends on the current x_t and u_t, but the equality constraints couple x_t, u_t with x_{t+1} for each t. In the following, we rewrite (1) in an equivalent form of an unconstrained optimization problem over some entries of x_t for all t, but the new stage cost at each time t will depend on these new entries across a few nearby time steps. We will harness this structure to design our online algorithm.

In particular, the entries of x_t adopted in the reformulation are x_t^{k_1}, . . . , x_t^{k_m}, where I = {k_1, . . . , k_m} is defined in Definition 1. For ease of notation, we define

z_t := (x_t^{k_1}, . . . , x_t^{k_m})^T,  t >= 0,    (4)

and write z_t^j = x_t^{k_j} for j = 1, . . . , m. Let z := (z_1^T, . . . , z_N^T)^T. By the canonical-form equality constraint x_t = A x_{t-1} + B u_{t-1}, we have x_t^i = x_{t-1}^{i+1} for i not in I, so x_t can be represented by z_{t-p+1}, . . . , z_t in the following way:

x_t = (z_{t-p_1+1}^1, . . . , z_t^1, z_{t-p_2+1}^2, . . . , z_t^2, . . . , z_{t-p_m+1}^m, . . . , z_t^m)^T,  t >= 0,    (5)

where the jth group contains p_j entries and z_t for t <= 0 is determined by x_0 in a way that lets (5) hold for t = 0. For ease of exposition and without loss of generality, we consider x_0 = 0 in this paper; then z_t = 0 for t <= 0. Similarly, u_t can be determined by z_{t-p+1}, . . . , z_t, z_{t+1} by

u_t = z_{t+1} - A(I, :) x_t = z_{t+1} - A(I, :)(z_{t-p_1+1}^1, . . . , z_t^1, . . . , z_{t-p_m+1}^m, . . . , z_t^m)^T,  t >= 0,    (6)

where A(I, :) consists of rows k_1, . . . , k_m of A.

It is straightforward to verify that equations (4, 5, 6) describe a bijective transformation between {(x, u) | x_{t+1} = A x_t + B u_t} and z in R^{mN}, since the LTI constraint x_{t+1} = A x_t + B u_t is naturally embedded in relations (5, 6). Therefore, based on the transformation, an optimization problem with respect to z in R^{mN} can be designed to be equivalent to (1). Notice that the resulting optimization problem has no constraint on z. Moreover, the cost functions of z can be obtained by substituting (5, 6) into f_t(x_t) and g_t(u_t), i.e. \tilde{f}_t(z_{t-p+1}, . . . , z_t) := f_t(x_t) and \tilde{g}_t(z_{t-p+1}, . . . , z_t, z_{t+1}) := g_t(u_t). Correspondingly, the objective function of the equivalent optimization with respect to z is

C(z) := \sum_{t=0}^{N} \tilde{f}_t(z_{t-p+1}, . . . , z_t) + \sum_{t=0}^{N-1} \tilde{g}_t(z_{t-p+1}, . . . , z_{t+1}).    (7)

C(z) has many nice properties, some of which are formally stated below.

Lemma 1. The function C(z) has the following properties:

i) C(z) is mu_c-strongly convex and l_c-smooth with mu_c = mu_f and l_c = p l_f + (p + 1) l_g \|[I_m, -A(I, :)]\|^2;
ii) for any (x, u) s.t. x_{t+1} = A x_t + B u_t, C(z) = J(x, u), where z is defined in (4). 
Conversely, for any z, the (x, u) determined by (5, 6) satisfies x_{t+1} = A x_t + B u_t and J(x, u) = C(z);

iii) each stage cost \tilde{f}_t + \tilde{g}_t in (7) only depends on z_{t-p+1}, . . . , z_{t+1}.

Property ii) implies that any online algorithm for deciding z can be translated to an online algorithm for x and u by (5, 6) with the same costs. Property iii) highlights one nice property of C(z), finite temporal-coupling, which serves as a foundation for our online algorithm design.

Example 2. For illustration, consider the following dynamical system with n = 2, m = 1:

x_{t+1} = \begin{bmatrix} x_{t+1}^1 \\ x_{t+1}^2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ a_1 & a_2 \end{bmatrix} \begin{bmatrix} x_t^1 \\ x_t^2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u_t.    (8)

Here, k_1 = 2, I = {2}, A(I, :) = (a_1, a_2), and z_t = x_t^2. By (8), x_t^1 = x_{t-1}^2 = z_{t-1} and x_t = (z_{t-1}, z_t)^T. Similarly, u_t = x_{t+1}^2 - A(I, :) x_t = z_{t+1} - A(I, :)(z_{t-1}, z_t)^T. Hence, \tilde{f}_t(z_{t-1}, z_t) = f_t(x_t) = f_t((z_{t-1}, z_t)^T) and \tilde{g}_t(z_{t-1}, z_t, z_{t+1}) = g_t(u_t) = g_t(z_{t+1} - A(I, :)(z_{t-1}, z_t)^T).

Remark 1. This paper considers a reparameterization method with respect to the states x via the canonical form, and it is interesting to compare it with the more direct reparameterization with respect to the control inputs u. The control-based reparameterization has been discussed in the literature [54]. It has been observed in [54] that when A is not stable, the condition number of the cost function derived from the control-based reparameterization goes to infinity as W goes to infinity, which may result in computation issues when W is large. In contrast, the state-based reparameterization considered in this paper guarantees a bounded condition number for all W even for unstable A, as shown in Lemma 1. This is one major advantage of the state-based reparameterization method considered in this paper.

3.2 Online algorithm design: RHGC

This section introduces our RHGC based on the reformulation (7) and inspired by [36]. As mentioned earlier, any online algorithm for z_t can be translated to an online algorithm for x_t, u_t. Hence, we focus on designing an online algorithm for z_t in the following. By the finite temporal-coupling property of C(z), the partial gradient of the total cost C(z) only depends on the finite neighboring stage costs {\tilde{f}_tau, \tilde{g}_tau}_{tau=t}^{t+p-1} and the finite neighboring stage variables (z_{t-p}, . . . , z_{t+p}) =: z_{t-p:t+p}:

\frac{\partial C}{\partial z_t}(z) = \sum_{tau=t}^{t+p-1} \frac{\partial \tilde{f}_tau}{\partial z_t}(z_{tau-p+1}, . . . , z_tau) + \sum_{tau=t-1}^{t+p-1} \frac{\partial \tilde{g}_tau}{\partial z_t}(z_{tau-p+1}, . . . , z_{tau+1}).

Without causing any confusion, we use \frac{\partial C}{\partial z_t}(z_{t-p:t+p}) to denote \frac{\partial C}{\partial z_t}(z), highlighting this local dependence. Thanks to the local dependence, despite the fact that not all future costs are available, it is still possible to compute the partial gradient of the total cost by using only a finite lookahead window of the cost functions. 
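The finite temporal coupling above can be checked numerically: the partial derivative of C with respect to z_t should be unaffected by cost functions far from t. The sketch below does this for an Example-2-style system with illustrative quadratic tracking costs (all constants are assumptions), comparing finite-difference gradients before and after perturbing distant targets.

```python
import numpy as np

# Check of finite temporal coupling: dC/dz_t should not depend on stage
# costs more than p steps away.  n = 2, m = 1 system (so p = 2); the
# coefficients and quadratic costs are illustrative.
a1, a2 = 0.3, -0.4
N, p = 8, 2

def C(z, theta):
    """Cost (7) under the Example-2-style reparameterization; z[i] = z_{i-1}."""
    total = 0.0
    for t in range(N):
        x_t = np.array([z[t], z[t + 1]])                 # (5)
        u_t = z[t + 2] - np.array([a1, a2]) @ x_t        # (6)
        total += 0.5 * (x_t - theta[t]) @ (x_t - theta[t]) + 0.5 * u_t ** 2
    x_N = np.array([z[N], z[N + 1]])
    return total + 0.5 * (x_N - theta[N]) @ (x_N - theta[N])

rng = np.random.default_rng(1)
z = np.concatenate(([0.0, 0.0], rng.standard_normal(N)))  # z_{-1} = z_0 = 0
theta = rng.standard_normal((N + 1, 2))

def dC_dzt(z, theta, t, eps=1e-6):   # central finite difference w.r.t. z_t
    zp, zm = z.copy(), z.copy()
    zp[t + 1] += eps                  # slot t + 1 stores z_t
    zm[t + 1] -= eps
    return (C(zp, theta) - C(zm, theta)) / (2 * eps)

t = 4
g1 = dC_dzt(z, theta, t)
theta2 = theta.copy()
theta2[0] += 10.0                     # perturb a cost far in the past
theta2[N] += 10.0                     # and far in the future
g2 = dC_dzt(z, theta2, t)
print(abs(g1 - g2) < 1e-6)            # True: only nearby costs matter
```

This is exactly the property that lets an online algorithm with a W-step lookahead evaluate exact partial gradients of the full N-stage cost.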
This observation motivates the design of our receding horizon gradient-based control (RHGC) methods, which are online implementations of gradient methods such as vanilla gradient descent, Nesterov's accelerated gradient, triple momentum, etc. [38, 39].

Algorithm 1: Receding Horizon Gradient Descent (RHGD)
1: inputs: canonical form (A, B), W >= 1, K = floor((W - 1)/p), stepsize gamma_g, initialization oracle phi.
2: for t = 1 - W : N - 1 do
3:   Step 1: initialize z_{t+W}(0) by oracle phi.
4:   for j = 1, . . . , K do
5:     Step 2: update z_{t+W-jp}(j) by gradient descent:
         z_{t+W-jp}(j) = z_{t+W-jp}(j - 1) - gamma_g * (partial C / partial z_{t+W-jp})(z_{t+W-(j+1)p : t+W-(j-1)p}(j - 1)).
6:   end for
7:   Step 3: compute u_t from z_{t+1}(K) and the observed state x_t: u_t = z_{t+1}(K) - A(I, :) x_t.
8: end for

Firstly, we illustrate the main idea of RHGC by receding horizon gradient descent (RHGD), based on vanilla gradient descent. In RHGD (Algorithm 1), index j refers to the iteration number of the corresponding gradient update of C(z). There are two major steps to decide z_t. Step 1 initializes the decision variables z(0). Here, we do not restrict the initialization algorithm phi and allow any oracle/online algorithm that does not use lookahead information, i.e. z_{t+W}(0) is selected based only on the information up to t + W - 1: z_{t+W}(0) = phi({\tilde{f}_s, \tilde{g}_s}_{s=0}^{t+W-1}). One example of phi will be provided in Section 4. Step 2 uses the W-lookahead costs to conduct gradient updates. Notice that the gradient update from z_tau(j - 1) to z_tau(j) is implemented in a backward order of tau, i.e. from tau = t + W to tau = t. Moreover, since the partial gradient (partial C / partial z_t) requires the local decision variables z_{t-p:t+p}, given W-lookahead information, RHGD can only conduct K = floor((W - 1)/p) iterations of gradient descent for the total cost C(z). For more discussion, we refer the reader to [36] for the p = 1 case.

In addition to RHGD, RHGC can also incorporate accelerated gradient methods in the same way, such as Nesterov's accelerated gradient and triple momentum. Due to the space limit, we only formally present receding horizon triple momentum (RHTM) in Algorithm 2, based on triple momentum [39]. RHTM also consists of two major steps when determining z_t: initialization and gradient updates based on the lookahead window. The two major differences from RHGD are that the decision variables in RHTM include not only z(j) but also auxiliary variables omega(j) and y(j), which are adopted in triple momentum to accelerate convergence, and that the gradient update is by triple momentum instead of gradient descent. Nevertheless, RHTM can also conduct K = floor((W - 1)/p) iterations of triple momentum for C(z), since the triple momentum update requires the same neighboring cost functions. Though it appears that RHTM does not fully exploit the lookahead information, since only a few gradient updates are used, we show in Section 5 that RHTM achieves near-optimal performance with respect to W, which means that RHTM successfully extracts and utilizes the prediction information.

Finally, we briefly introduce MPC [55] and suboptimal MPC [23] and compare them with our algorithms. MPC solves a W-stage optimization at each t and implements the first control input. Suboptimal MPC, a variant of MPC aiming at reducing computation, runs an optimization method for only a few iterations without solving the optimization completely. Our algorithm's computation time is similar to that of suboptimal MPC with a few gradient iterations. 
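A rough sketch of Algorithm 1's scheduling for the Example-2 system is given below. The quadratic tracking costs, the stepsize, the horizon, and the oracle phi (which simply follows the current target) are all illustrative assumptions; the local partial gradient is approximated by finite differences over only the stage costs that involve z_t, and entries outside the maintained window default to zero, so this is a sketch of the bookkeeping rather than a faithful implementation.

```python
import numpy as np

# Sketch of RHGD (Algorithm 1) scheduling on an Example-2-style system
# (n = 2, m = 1, p = 2) with illustrative quadratic costs tracking x^2.
a1, a2 = 0.3, -0.4
N, p, W = 12, 2, 7
K = (W - 1) // p                              # number of gradient iterations
gamma = 0.3                                   # stepsize (illustrative)
rng = np.random.default_rng(2)
theta = rng.standard_normal(N + 1)            # targets for the x^2 component

def local_grad(zbuf, t):
    """dC/dz_t via central differences over the stage costs touching z_t."""
    def local_cost(zt):
        zb = dict(zbuf); zb[t] = zt; c = 0.0
        for tau in range(max(t - 1, 0), min(t + p, N + 1)):
            x = np.array([zb.get(tau - 1, 0.0), zb.get(tau, 0.0)])  # (5)
            c += 0.5 * (x[1] - theta[tau]) ** 2                     # f_tau
            if tau < N:
                u = zb.get(tau + 1, 0.0) - np.array([a1, a2]) @ x   # (6)
                c += 0.5 * u ** 2                                   # g_tau
        return c
    eps = 1e-6
    return (local_cost(zbuf[t] + eps) - local_cost(zbuf[t] - eps)) / (2 * eps)

# z[j] maps a time index to the j-th gradient-descent iterate of z_t.
z = [dict() for _ in range(K + 1)]
for t in range(1 - W, N):
    if 1 <= t + W <= N:
        z[0][t + W] = theta[t + W]            # oracle phi: follow the target
    for j in range(1, K + 1):                 # backward sweep over tau
        tau = t + W - j * p
        if 1 <= tau <= N:
            z[j][tau] = z[j - 1][tau] - gamma * local_grad(z[j - 1], tau)
# The control applied at time t would be u_t = z[K][t+1] - A(I,:) x_t.
print(sorted(z[K].keys()) == list(range(1, N + 1)))  # every z_t got K updates
```

Note how each z_tau receives exactly K = floor((W - 1)/p) updates before it is needed at Step 3, which is the mechanism behind the exponential-in-K regret factors in Section 4.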
However, the major difference between our algorithm and suboptimal MPC is that suboptimal MPC conducts gradient updates for a truncated W-stage optimal control problem based on W-lookahead information, while our algorithm is able to conduct gradient updates for the complete N-stage optimal control problem based on the same W-lookahead information by utilizing the reformulation (4, 5, 6, 7).

4 Regret upper bounds

Because our RHTM (RHGD) is designed to exactly implement triple momentum (gradient descent) on C(z) for K iterations, it is straightforward to obtain the following regret guarantees, which connect the regrets of RHTM and RHGD with the regret of the initialization oracle phi.

Algorithm 2: Receding Horizon Triple Momentum (RHTM)
inputs: canonical form (A, B), W >= 1, K = floor((W - 1)/p), stepsizes gamma_c, gamma_z, gamma_omega, gamma_y > 0, oracle phi.
for t = 1 - W : N - 1 do
  Step 1: initialize z_{t+W}(0) by oracle phi, then let omega_{t+W}(-1), omega_{t+W}(0), y_{t+W}(0) be z_{t+W}(0).
  for j = 1, . . . , K do
    Step 2: update omega_{t+W-jp}(j), y_{t+W-jp}(j), z_{t+W-jp}(j) by triple momentum:
      omega_{t+W-jp}(j) = (1 + gamma_omega) omega_{t+W-jp}(j - 1) - gamma_omega omega_{t+W-jp}(j - 2)
                          - gamma_c * (partial C / partial y_{t+W-jp})(y_{t+W-(j+1)p : t+W-(j-1)p}(j - 1)),
      y_{t+W-jp}(j) = (1 + gamma_y) omega_{t+W-jp}(j) - gamma_y omega_{t+W-jp}(j - 1),
      z_{t+W-jp}(j) = (1 + gamma_z) omega_{t+W-jp}(j) - gamma_z omega_{t+W-jp}(j - 1).
  end for
  Step 3: compute u_t from z_{t+1}(K) and the observed state x_t: u_t = z_{t+1}(K) - A(I, :) x_t.
end for

Theorem 1. Let zeta = l_c / mu_c denote the condition number of C(z) and phi = 1 - 1/sqrt(zeta). Consider W >= 1 and stepsizes gamma_g = 1/l_c, gamma_c = (1 + phi)/l_c, gamma_omega = phi^2/(2 - phi), gamma_y = phi^2/((1 + phi)(2 - phi)), gamma_z = phi^2/(1 - phi^2). For any oracle phi,

Regret(RHGD) <= zeta ((zeta - 1)/zeta)^K Regret(phi),    Regret(RHTM) <= zeta^2 ((sqrt(zeta) - 1)/sqrt(zeta))^{2K} Regret(phi),

where K = floor((W - 1)/p) and Regret(phi) is the regret of the initial controller u_t(0) = z_{t+1}(0) - A(I, :) x_t(0).

Theorem 1 suggests that for any online algorithm phi without predictions, RHGD and RHTM can use predictions to lower the regret by factors of zeta ((zeta - 1)/zeta)^K and zeta^2 ((sqrt(zeta) - 1)/sqrt(zeta))^{2K}, respectively, via K = floor((W - 1)/p) additional gradient updates. Moreover, the factors decay exponentially with K = floor((W - 1)/p), and K increases almost linearly with W. This indicates that RHGD and RHTM improve the performance exponentially fast with an increase in the prediction window W for any initialization method. In addition, K = floor((W - 1)/p) decreases with p, implying that the regrets increase with the controllability index p (Definition 1). This is intuitive because p roughly indicates how fast the controller can influence the system state effectively: the larger p is, the longer it takes. To see this, consider Example 2: since u_{t-1} does not directly affect x_t^1, it takes at least p = 2 steps to change x_t^1 to a desirable value. Finally, RHTM's regret decays faster than RHGD's, which is intuitive because triple momentum converges faster than gradient descent. Thus, we focus on RHTM in the following.

An initialization method: follow the optimal steady state (FOSS). To complete the regret analysis for RHTM, we provide a simple initialization method, FOSS, and its dynamic regret bound. As mentioned before, any online control algorithm without predictions, e.g. 
[42, 41], can be applied as\nan initialization oracle \u03d5. However, most literature study static regrets rather than dynamic regrets.\nDe\ufb01nition 2 (Follow the optimal steady state (FOSS)). The optimal steady state for stage cost\nf (x) + g(u) refers to (xe, ue) := arg minx=Ax+Bu(f (x) + g(u)).\nFollow the optimal steady state algorithm (FOSS) \ufb01rst solves the optimal steady state (xe\ncost ft(x) + gt(u), then determines zt+1 by xe\n\nt , it takes at least p = 2 steps to change x1\n\nt , ue\n)(cid:62) at each t + 1.\n\nt , i.e. zt+1 = (xe,k1\n\n, . . . , xe,km\n\nt ) for\n\nt\n\nt\n\n(cid:80)N\u22121\n\nFOSS is motivated by the fact that the optimal steady state cost is the optimal in\ufb01nite-horizon\naverage cost for LTI systems with time-invariant cost functions [56], so FOSS should yield acceptable\nperformance at least for slowly changing cost functions. Nevertheless, we admit that FOSS is\nproposed mainly for analytical purposes and other online algorithms may outperform FOSS in various\nperspectives. The following is a regret bound for FOSS, relying on the solution to Bellman equations.\nDe\ufb01nition 3 (Solution to the Bellman equations [57]). Consider optimal control problem:\nt=0 (f (xt) + g(ut)) where xt+1 = Axt + But. Let \u03bbe be the optimal steady\nmin limN\u2192+\u221e 1\nN\nstate cost f (xe) + g(ue), which is also the optimal in\ufb01nite-horizon average cost [56]. The Bellman\nequations for the problem is he(x) + \u03bbe = minu(f (x) + g(u) + he(Ax + Bu)). The solution to\nthe Bellman equations, denoted by he(x), is sometimes called as a bias function [57]. To ensure the\nuniqueness of the solution, some extra conditions, e.g. he(0) = 0, are usually imposed.\nTheorem 2 (Regret bound of FOSS). Let (xe\nthe bias function with respect to cost ft(x) + gt(u) respectively for 0 \u2264 t \u2264 N \u2212 1. 
Suppose h_t^e exists for 0 ≤ t ≤ N − 1.⁴ Then the regret of FOSS can be bounded by

Regret(FOSS) = O( Σ_{t=0}^{N} ( ‖x_{t−1}^e − x_t^e‖ + h_{t−1}^e(x_t^*) − h_t^e(x_t^*) ) ),

where {x_t^*}_{t=0}^N denotes the optimal state trajectory for (1), x_{−1}^e = x_0^* = x_0 = 0, h_{−1}^e(x) = h_N^e(x) = f_N(x), and x_N^e = θ_N. Consequently, by Theorem 1, the regret bound of RHTM with initialization FOSS is Regret(RHTM) = O( ζ² ((√ζ−1)/√ζ)^{2K} Σ_{t=0}^{N} ( ‖x_{t−1}^e − x_t^e‖ + h_{t−1}^e(x_t^*) − h_t^e(x_t^*) ) ).

Theorem 2 bounds the regret by the variation of the optimal steady states x_t^e and the bias functions h_t^e. If f_t and g_t do not change with t, then x_t^e and h_t^e do not change, yielding a small O(1) regret, i.e., O(‖x_0^e‖ + h_0^e(x_0)), matching our intuition. Though Theorem 2 requires that h_t^e exists, the existence is guaranteed for many control problems, e.g., LQ tracking and control problems with turnpike properties [58, 22].

⁴h_t^e may not be unique, so extra conditions can be imposed on h_t^e for more interesting regret bounds.

5 Linear quadratic tracking: regret upper bounds and a fundamental limit

To provide more intuitive meaning for the regret analysis in Theorem 1 and Theorem 2, we apply RHTM to the LQ tracking problem in Example 1. Results for time-varying Q_t, R_t, θ_t are provided in Appendix E; here we focus on a special case that yields clean expressions for the regret bounds: both an upper bound for RHTM with initialization FOSS and a lower bound for any online algorithm.
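As a concrete instance of Definition 2, for a scalar system the FOSS steady-state computation reduces to a one-dimensional quadratic minimization with a closed-form solution. The sketch below is illustrative only; all system and cost parameters are assumptions, not values from the paper.

```python
import numpy as np

def optimal_steady_state(a, b, q, r, theta):
    """Optimal steady state (Definition 2) for the scalar system
    x' = a*x + b*u with stage cost f(x) + g(u) = q/2*(x-theta)^2 + r/2*u^2.

    The steady-state constraint x = a*x + b*u gives u = (1 - a)/b * x,
    so the problem reduces to an unconstrained quadratic in x alone:
        minimize q/2*(x - theta)^2 + r/2*(c*x)^2,   c = (1 - a)/b,
    whose minimizer is x_e = q*theta / (q + r*c^2).
    """
    c = (1.0 - a) / b
    x_e = q * theta / (q + r * c**2)
    return x_e, c * x_e

x_e, u_e = optimal_steady_state(a=0.5, b=1.0, q=1.0, r=1.0, theta=2.0)
assert np.isclose(x_e, 0.5 * x_e + 1.0 * u_e)  # steady-state constraint holds
```

FOSS would recompute (x_t^e, u_t^e) from this formula whenever the stage cost changes, which is why its regret scales with the variation of the steady states.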
Further, we show that the lower bound and the upper bound almost match each other, implying that our online algorithm RHTM uses the predictions in a nearly optimal way even though it only conducts a few gradient updates at each time step.

The special case of LQ tracking problems is of the following form:

Σ_{t=0}^{N−1} ½ [ (x_t − θ_t)^⊤ Q (x_t − θ_t) + u_t^⊤ R u_t ] + ½ x_N^⊤ P^e x_N,    (9)

where Q > 0, R > 0, and P^e is the solution to the algebraic Riccati equation with respect to Q, R [59]. Basically, in this special case, Q_t = Q, R_t = R for 0 ≤ t ≤ N − 1, Q_N = P^e, θ_N = 0, and only θ_t changes for t = 0, 1, …, N − 1. The LQ tracking problem (9) aims to follow a time-varying trajectory {θ_t} with constant weights on the tracking cost and the control cost.

Regret upper bound. Firstly, based on Theorem 1 and Theorem 2, we have the following bound.

Corollary 1. Under the stepsizes in Theorem 1, RHTM with FOSS as the initialization rule satisfies

Regret(RHTM) = O( ζ² ((√ζ−1)/√ζ)^{2K} Σ_{t=0}^{N} ‖θ_t − θ_{t−1}‖ ),

where K = ⌊(W−1)/p⌋, ζ is the condition number of the corresponding C(z), and θ_{−1} = 0.

This corollary shows that the regret can be bounded by the total variation of θ_t for constant Q, R.

Fundamental limit. For any online algorithm, we have the following lower bound.

Theorem 3 (Lower Bound). Consider 1 ≤ W ≤ N/3, any condition number ζ > 1, any variation budget 4θ̄ ≤ L_N ≤ (2N+1)θ̄, and any controllability index p ≥ 1.
For any online algorithm A, there exists an LQ tracking problem in form (9) where i) the canonical-form system (A, B) has controllability index p, ii) the sequence {θ_t} satisfies the variation budget Σ_{t=0}^{N} ‖θ_t − θ_{t−1}‖ ≤ L_N, and iii) the corresponding C(z) has condition number ζ, such that the following lower bound holds:

Regret(A) = Ω( ((√ζ−1)/(√ζ+1))^{2K} Σ_{t=0}^{N} ‖θ_t − θ_{t−1}‖ ) = Ω( ((√ζ−1)/(√ζ+1))^{2K} L_N ),    (10)

where K = ⌊(W−1)/p⌋ and θ_{−1} = 0.

Firstly, the lower bound in Theorem 3 almost matches the upper bound in Corollary 1, especially when ζ is large, demonstrating that RHTM utilizes the predictions in a near-optimal way. The major conditions in Theorem 3 require that the prediction window be short compared with the horizon, W ≤ N/3, and that the variation of the cost functions not be too small, L_N ≥ 4θ̄; otherwise the online control problem is too easy and the regret can be very small. Moreover, the small gap between the regret bounds is conjectured to be nontrivial, because it coincides with the long-standing gap in the convergence rates of first-order algorithms for strongly convex and smooth optimization. In particular, the lower bound in Theorem 3 matches the fundamental convergence limit in [38], and the upper bound follows from triple momentum's convergence rate, which is the best rate known to us.

6 Numerical experiments

LQ tracking problem in Example 1. The system considered here has n = 2, m = 1, and p = 2. More details of the experiment settings are provided in Appendix H. We compare RHGC with a suboptimal MPC algorithm, fast gradient MPC (subMPC) [23].
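Setting up such an LQ tracking instance requires the terminal weight P^e in (9), obtained from the discrete algebraic Riccati equation. A minimal sketch using SciPy is below; the system matrices and weights are illustrative assumptions, not the Appendix H values.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative controllable system with n = 2 states, m = 1 input.
A = np.array([[0.0, 1.0],
              [0.1, 0.2]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)           # tracking weight, Q > 0
R = np.array([[1.0]])   # control weight, R > 0

# Terminal weight P^e in (9): solution of the discrete algebraic
# Riccati equation for (A, B, Q, R).
Pe = solve_discrete_are(A, B, Q, R)

def lq_tracking_cost(xs, us, thetas):
    """Objective (9): stage costs plus the Riccati terminal cost
    (theta_N = 0 in the special case)."""
    cost = sum(0.5 * ((x - th) @ Q @ (x - th) + u @ R @ u)
               for x, u, th in zip(xs[:-1], us, thetas))
    return cost + 0.5 * xs[-1] @ Pe @ xs[-1]
```

Using P^e as the terminal weight is what makes the finite-horizon objective consistent with the infinite-horizon LQ cost-to-go.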
Roughly speaking, subMPC solves the W-stage truncated optimal control problem from t to t + W − 1 by Nesterov's accelerated gradient [38]; one iteration of Nesterov's accelerated gradient requires 2W gradient evaluations of stage cost functions, since W stages are considered and each stage has two costs, f_t and g_t. This implies that, in terms of the number of gradient evaluations, subMPC with one iteration corresponds to our RHTM, because RHTM also requires roughly 2W gradient evaluations per stage. Therefore, Figure 1 compares our RHGC algorithms with subMPC with one iteration; subMPC with 3 and 5 iterations is also plotted for more insight. Besides RHGD and RHTM, Figure 1 also plots RHAG, which is based on Nesterov's accelerated gradient.

Figure 1: Regret for LQ tracking.

Figure 2: Two-wheel robot tracking with nonlinear dynamics.

Figure 1 shows that all our algorithms achieve exponentially decaying regrets with respect to W, and the regrets are piecewise constant, matching Theorem 1. Further, RHTM and RHAG perform better than RHGD, which is intuitive because triple momentum and Nesterov's accelerated gradient are accelerated versions of gradient descent. In addition, our algorithms are much better than subMPC with 1 iteration, implying that our algorithms utilize the lookahead information more efficiently given similar computational time. Finally, subMPC achieves better performance by increasing the iteration number, but the improvement saturates as W increases, in contrast to the steady improvement of RHGC.

Path tracking for a two-wheel mobile robot. Though we presented our online algorithms for an LTI system, our RHGC methods are applicable to some nonlinear systems as well.
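Such an extension only needs the ability to roll out the dynamics over the lookahead window and to differentiate the tracking cost through the rollout. The following is a minimal sketch with an Euler-discretized unicycle model and a finite-difference gradient; the horizon, weights, and reference below are illustrative assumptions, not the paper's experiment settings.

```python
import numpy as np

DT = 0.025  # discretization interval

def step(state, control):
    """One Euler step of the unicycle kinematics:
    xdot = v*cos(delta), ydot = v*sin(delta), deltadot = w."""
    x, y, d = state
    v, w = control
    return np.array([x + DT * v * np.cos(d),
                     y + DT * v * np.sin(d),
                     d + DT * w])

def rollout_cost(controls, state0, ref, c=1.0, cv=0.01, cw=0.01):
    """Tracking cost of a W-step rollout against a reference path."""
    s, cost = state0, 0.0
    for (v, w), (xr, yr) in zip(controls, ref):
        s = step(s, (v, w))
        cost += c * ((s[0] - xr) ** 2 + (s[1] - yr) ** 2) + cv * v**2 + cw * w**2
    return cost

def grad(controls, state0, ref, eps=1e-6):
    """Central-difference gradient of the rollout cost in the controls."""
    g = np.zeros_like(controls)
    for i in np.ndindex(controls.shape):
        e = np.zeros_like(controls)
        e[i] = eps
        g[i] = (rollout_cost(controls + e, state0, ref)
                - rollout_cost(controls - e, state0, ref)) / (2 * eps)
    return g

# One gradient step over a 5-step window lowers the tracking cost.
W, state0 = 5, np.zeros(3)
ref = [(1.0, 0.0)] * W
u0 = np.zeros((W, 2))
u1 = u0 - 0.1 * grad(u0, state0, ref)
assert rollout_cost(u1, state0, ref) < rollout_cost(u0, state0, ref)
```

A receding-horizon scheme would repeat such gradient steps over the sliding window at each time, commit the first control, and shift the window forward.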
Here we consider a two-wheel mobile robot with nonlinear kinematic dynamics ẋ = v cos δ, ẏ = v sin δ, δ̇ = w, where (x, y) is the robot location, v and w are the tangential and angular velocities respectively, and δ denotes the tangent angle between v and the x axis [60]. The control acts directly on v and w, e.g., via the pulse-width modulation (PWM) of the motor [61]. Given a reference path (x_t^r, y_t^r), the objective is to balance the tracking performance and the control cost, i.e.,

min Σ_{t=0}^{N} [ c_t · ((x_t − x_t^r)² + (y_t − y_t^r)²) + c_t^v · v_t² + c_t^w · w_t² ].

We discretize the dynamics with time interval Δt = 0.025 s, then follow ideas similar to those in this paper to reformulate the optimal path tracking problem as an unconstrained optimization with respect to (x_t, y_t), and apply RHGC; see Appendix H for details. Figure 2 plots the tracking results with windows W = 40 and W = 80, corresponding to lookahead times of 1 s and 2 s. A video showing the dynamic processes with different W is provided at https://youtu.be/fal56LTBD1s. It is observed that the robot follows the reference trajectory well, especially when the path is smooth, but deviates a little more when the path has sharp turns, and a longer lookahead window leads to better tracking performance. These results confirm that our RHGC works effectively on nonlinear systems.

7 Conclusion

This paper studies the role of predictions in the dynamic regret of online control problems with linear dynamics. We design RHGC algorithms and provide regret upper bounds for two specific algorithms, RHGD and RHTM. We also provide a fundamental limit and show that it almost matches RHTM's upper bound. This paper leads to many interesting future directions, some of which are briefly discussed below.
The first direction is to study more realistic prediction models that consider random prediction noises, e.g., [33, 35, 62]. The second direction is to consider unknown systems with process noises, possibly by applying learning-based control tools [44, 46, 48]. Further, more studies could be conducted on general control problems, including nonlinear control and control with input and state constraints. Besides, it is interesting to consider other performance metrics, such as the competitive ratio, since the dynamic regret is non-vanishing. Finally, other future directions include closing the gap between the regret bounds and further discussion of the effect of the canonical-form transformation on the condition number.

Acknowledgement

This work was supported by NSF Career 1553407, ARPA-E NODES, AFOSR YIP and ONR YIP programs.

References

[1] Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle. Data center cooling using model-predictive control. In Advances in Neural Information Processing Systems, pages 3814–3823, 2018.

[2] Wei Xu, Xiaoyun Zhu, Sharad Singhal, and Zhikui Wang. Predictive control for dynamic resource allocation in enterprise data centers. In 2006 IEEE/IFIP Network Operations and Management Symposium NOMS 2006, pages 115–126. IEEE, 2006.

[3] Tomas Baca, Daniel Hert, Giuseppe Loianno, Martin Saska, and Vijay Kumar. Model predictive trajectory tracking and collision avoidance for reliable outdoor deployment of unmanned aerial vehicles. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6753–6760. IEEE, 2018.

[4] Jackeline Rios-Torres and Andreas A Malikopoulos.
A survey on the coordination of connected\nand automated vehicles at intersections and merging at highway on-ramps. IEEE Transactions\non Intelligent Transportation Systems, 18(5):1066\u20131077, 2016.\n\n[5] Kyoung-Dae Kim and Panganamala Ramana Kumar. An mpc-based approach to provable\nsystem-wide safety and liveness of autonomous ground traf\ufb01c. IEEE Transactions on Automatic\nControl, 59(12):3341\u20133356, 2014.\n\n[6] Samir Kouro, Patricio Cort\u00e9s, Ren\u00e9 Vargas, Ulrich Ammann, and Jos\u00e9 Rodr\u00edguez. Model pre-\ndictive control\u2014a simple and powerful method to control power converters. IEEE Transactions\non industrial electronics, 56(6):1826\u20131838, 2008.\n\n[7] Edgar Perea-Lopez, B Erik Ydstie, and Ignacio E Grossmann. A model predictive control\nstrategy for supply chain optimization. Computers & Chemical Engineering, 27(8-9):1201\u2013\n1218, 2003.\n\n[8] Wenlin Wang, Daniel E Rivera, and Karl G Kempf. Model predictive control strategies for\nsupply chain management in semiconductor manufacturing. International Journal of Production\nEconomics, 107(1):56\u201377, 2007.\n\n[9] Moritz Diehl, Rishi Amrit, and James B Rawlings. A lyapunov function for economic optimizing\n\nmodel predictive control. IEEE Transactions on Automatic Control, 56(3):703\u2013707, 2010.\n\n[10] Matthias A M\u00fcller and Frank Allg\u00f6wer. Economic and distributed model predictive control:\nRecent developments in optimization-based control. SICE Journal of Control, Measurement,\nand System Integration, 10(2):39\u201352, 2017.\n\n[11] Matthew Ellis, Helen Durand, and Panagiotis D Christo\ufb01des. A tutorial review of economic\n\nmodel predictive control methods. Journal of Process Control, 24(8):1156\u20131178, 2014.\n\n[12] Antonio Ferramosca, James B Rawlings, Daniel Lim\u00f3n, and Eduardo F Camacho. Economic\nmpc for a changing economic criterion. In 49th IEEE Conference on Decision and Control\n(CDC), pages 6131\u20136136. 
IEEE, 2010.

[13] Matthew Ellis and Panagiotis D Christofides. Economic model predictive control with time-varying objective function for nonlinear process systems. AIChE Journal, 60(2):507–519, 2014.

[14] David Angeli, Alessandro Casavola, and Francesco Tedesco. Theoretical advances on economic model predictive control with time-varying costs. Annual Reviews in Control, 41:218–224, 2016.

[15] Rishi Amrit, James B Rawlings, and David Angeli. Economic optimization using model predictive control with a terminal cost. Annual Reviews in Control, 35(2):178–186, 2011.

[16] Lars Grüne. Economic receding horizon control without terminal constraints. Automatica, 49(3):725–734, 2013.

[17] David Angeli, Rishi Amrit, and James B Rawlings. On average performance and stability of economic model predictive control. IEEE Transactions on Automatic Control, 57(7):1615–1626, 2012.

[18] Lars Grüne and Marleen Stieler. Asymptotic stability and transient optimality of economic MPC without terminal conditions. Journal of Process Control, 24(8):1187–1196, 2014.

[19] Lars Grüne and Anastasia Panin. On non-averaged performance of economic MPC with terminal conditions. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 4332–4337. IEEE, 2015.

[20] Antonio Ferramosca, Daniel Limon, and Eduardo F Camacho. Economic MPC for a changing economic criterion for linear systems. IEEE Transactions on Automatic Control, 59(10):2657–2667, 2014.

[21] Lars Grüne and Simon Pirkelmann. Closed-loop performance analysis for economic model predictive control of time-varying systems. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 5563–5569. IEEE, 2017.

[22] Lars Grüne and Simon Pirkelmann. Economic model predictive control for time-varying system: Performance and stability results.
Optimal Control Applications and Methods, 2018.\n\n[23] Melanie Nicole Zeilinger, Colin Neil Jones, and Manfred Morari. Real-time suboptimal\nmodel predictive control using a combination of explicit mpc and online optimization. IEEE\nTransactions on Automatic Control, 56(7):1524\u20131534, 2011.\n\n[24] Yang Wang and Stephen Boyd. Fast model predictive control using online optimization. IEEE\n\nTransactions on Control Systems Technology, 18(2):267\u2013278, 2010.\n\n[25] Knut Graichen and Andreas Kugi. Stability and incremental improvement of suboptimal mpc\nwithout terminal constraints. IEEE Transactions on Automatic Control, 55(11):2576\u20132580,\n2010.\n\n[26] Douglas A Allan, Cuyler N Bates, Michael J Risbeck, and James B Rawlings. On the inherent\nrobustness of optimal and suboptimal nonlinear mpc. Systems & Control Letters, 106:68\u201378,\n2017.\n\n[27] E. Hazan. Introduction to Online Convex Optimization. Foundations and Trends(r) in Optimiza-\n\ntion Series. Now Publishers, 2016.\n\n[28] S. Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and\n\nTrends(r) in Machine Learning. Now Publishers, 2012.\n\n[29] Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, and Karthik Sridharan. Online\noptimization: Competing with dynamic comparators. In Arti\ufb01cial Intelligence and Statistics,\npages 398\u2013406, 2015.\n\n[30] Minghong Lin, Adam Wierman, Lachlan LH Andrew, and Eno Thereska. Dynamic right-\nsizing for power-proportional data centers. IEEE/ACM Transactions on Networking (TON),\n21(5):1378\u20131391, 2013.\n\n[31] Minghong Lin, Zhenhua Liu, Adam Wierman, and Lachlan LH Andrew. Online algorithms\nfor geographical load balancing. In Green Computing Conference (IGCC), 2012 International,\npages 1\u201310. IEEE, 2012.\n\n[32] Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. 
In Conference on Learning Theory, pages 993–1019, 2013.

[33] Niangjun Chen, Anish Agarwal, Adam Wierman, Siddharth Barman, and Lachlan LH Andrew. Online convex optimization using predictions. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 191–204. ACM, 2015.

[34] Masoud Badiei, Na Li, and Adam Wierman. Online convex optimization with ramp constraints. In Decision and Control (CDC), 2015 IEEE 54th Annual Conference on, pages 6730–6736. IEEE, 2015.

[35] Niangjun Chen, Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman. Using predictions in online optimization: Looking forward with an eye on the past. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pages 193–206. ACM, 2016.

[36] Yingying Li, Guannan Qu, and Na Li. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. arXiv preprint arXiv:1801.07780, 2018.

[37] Gautam Goel and Adam Wierman. An online algorithm for smoothed regression and LQR control. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2504–2513, 2019.

[38] Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.

[39] Bryan Van Scoy, Randy A Freeman, and Kevin M Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1):49–54, 2017.

[40] David Luenberger. Canonical forms for linear multivariable systems. IEEE Transactions on Automatic Control, 12(3):290–293, 1967.

[41] Yasin Abbasi-Yadkori, Peter Bartlett, and Varun Kanade. Tracking adversarial targets. In International Conference on Machine Learning, pages 369–377, 2014.

[42] Alon Cohen, Avinatan Hasidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar.
Online linear quadratic control. In International Conference on Machine Learning,\npages 1028\u20131037, 2018.\n\n[43] Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh. Online control\nwith adversarial disturbances. In International Conference on Machine Learning, pages 111\u2013\n119, 2019.\n\n[44] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. On the sample\n\ncomplexity of the linear quadratic regulator. arXiv preprint arXiv:1710.01688, 2017.\n\n[45] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for\nrobust adaptive control of the linear quadratic regulator. In Advances in Neural Information\nProcessing Systems, pages 4188\u20134197, 2018.\n\n[46] Stephen Tu and Benjamin Recht. Least-squares temporal difference learning for the linear\n\nquadratic regulator. arXiv preprint arXiv:1712.08642, 2017.\n\n[47] Kyriakos G Vamvoudakis and Frank L Lewis. Online actor\u2013critic algorithm to solve the\ncontinuous-time in\ufb01nite horizon optimal control problem. Automatica, 46(5):878\u2013888, 2010.\n[48] Yi Ouyang, Mukul Gagrani, and Rahul Jain. Learning-based control of unknown linear systems\n\nwith thompson sampling. arXiv preprint arXiv:1709.04047, 2017.\n\n[49] Lian Lu, Jinlong Tu, Chi-Kin Chau, Minghua Chen, and Xiaojun Lin. Online energy generation\nscheduling for microgrids with intermittent energy sources and co-generation, volume 41. ACM,\n2013.\n\n[50] Allan Borodin, Nathan Linial, and Michael E Saks. An optimal on-line algorithm for metrical\n\ntask system. Journal of the ACM (JACM), 39(4):745\u2013763, 1992.\n\n[51] Aryan Mokhtari, Shahin Shahrampour, Ali Jadbabaie, and Alejandro Ribeiro. Online optimiza-\ntion in dynamic environments: Improved regret rates for strongly convex problems. In 2016\nIEEE 55th Conference on Decision and Control (CDC), pages 7195\u20137201. 
IEEE, 2016.

[52] Lachlan Andrew, Siddharth Barman, Katrina Ligett, Minghong Lin, Adam Meyerson, Alan Roytman, and Adam Wierman. A tale of two metrics: Simultaneous bounds on competitiveness and regret. In Conference on Learning Theory, pages 741–763, 2013.

[53] Joao P Hespanha. Linear systems theory. Princeton University Press, 2018.

[54] Stefan Richter, Colin Neil Jones, and Manfred Morari. Computational complexity certification for real-time MPC with input constraints based on the fast gradient method. IEEE Transactions on Automatic Control, 57(6):1391–1403, 2011.

[55] JB Rawlings and DQ Mayne. Postface to model predictive control: Theory and design. Nob Hill Pub, pages 155–158, 2012.

[56] David Angeli, Rishi Amrit, and James B Rawlings. Receding horizon cost optimization for overly constrained nonlinear plants. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pages 7972–7977. IEEE, 2009.

[57] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.

[58] Tobias Damm, Lars Grüne, Marleen Stieler, and Karl Worthmann. An exponential turnpike theorem for dissipative discrete time optimal control problems. SIAM Journal on Control and Optimization, 52(3):1935–1957, 2014.

[59] Dimitri P Bertsekas. Dynamic programming and optimal control, volume 1. 2011.

[60] Gregor Klancar, Drago Matko, and Saso Blazic. Mobile robot control on a reference path. In Proceedings of the 2005 IEEE International Symposium on Intelligent Control / Mediterranean Conference on Control and Automation, pages 1343–1348. IEEE, 2005.

[61] Pololu Corporation. Pololu m3pi User's Guide. Available at https://www.pololu.com/docs/pdf/0J48/m3pi.pdf.

[62] Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models.
Biometrics, 61(4):962–973, 2005.