{"title": "Neural Lyapunov Control", "book": "Advances in Neural Information Processing Systems", "page_first": 3245, "page_last": 3254, "abstract": "We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging robot control problems such as path tracking for wheeled vehicles and humanoid robot balancing.", "full_text": "Neural Lyapunov Control\n\nYa-Chien Chang\n\nUCSD\n\nyac021@eng.ucsd.edu\n\nNima Roohi\n\nUCSD\n\nnroohi@eng.ucsd.edu\n\nSicun Gao\n\nUCSD\n\nsicung@eng.ucsd.edu\n\nAbstract\n\nWe propose new methods for learning control policies and neural network Lyapunov\nfunctions for nonlinear control problems, with provable guarantee of stability. The\nframework consists of a learner that attempts to \ufb01nd the control and Lyapunov\nfunctions, and a falsi\ufb01er that \ufb01nds counterexamples to quickly guide the learner\ntowards solutions. The procedure terminates when no counterexample is found by\nthe falsi\ufb01er, in which case the controlled nonlinear system is provably stable. The\napproach signi\ufb01cantly simpli\ufb01es the process of Lyapunov control design, provides\nend-to-end correctness guarantee, and can obtain much larger regions of attraction\nthan existing methods such as LQR and SOS/SDP. 
We show experiments on how the\nnew methods obtain high-quality solutions for challenging robot control problems\nsuch as path tracking for wheeled vehicles and humanoid robot balancing.\n\n1\n\nIntroduction\n\nLearning-based methods hold the promise of solving hard nonlinear control problems in robotics.\nMost existing work focuses on learning control functions represented as neural networks through\nrepeated interactions of an unknown environment in the framework of deep reinforcement learning,\nwith notable success. However, there are still well-known issues that impede the immediate use of\nthese methods in practical control applications, including sample complexity, interpretability, and\nsafety [5]. Our work investigates a different direction: Can learning methods be valuable even in\nthe most classical setting of nonlinear control design? We focus on the challenging problem of\ndesigning feedback controllers for stabilizing nonlinear dynamical systems with provable guarantee.\nThis problem captures the core dif\ufb01culty of underactuated robotics [25]. We demonstrate that neural\nnetworks and deep learning can \ufb01nd provably stable controllers in a direct way and tackle the\nfull nonlinearity of the systems, and signi\ufb01cantly outperform existing methods based on linear or\npolynomial approximations such as linear-quadratic regulators (LQR) [17] and sum-of-squares (SOS)\nand semide\ufb01nite programming (SDP) [21]. The results show the promise of neural networks and\ndeep learning in improving the solutions of many challenging problems in nonlinear control.\nThe prevalent way of stabilizing nonlinear dynamical systems is to linearize the system dynamics\naround an equilibrium, and formulate LQR problems to minimize deviation from the equilibrium.\nLQR methods compute a linear feedback control policy, with stability guarantee within a small\nneighborhood where linear approximation is accurate. 
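The LQR baseline just described can be sketched concretely. The following is a minimal illustration assuming an inverted-pendulum-like linearization with made-up constants (not taken from the paper), using SciPy's continuous-time Riccati solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR: solve the algebraic Riccati equation for P
    and return the state-feedback gain K, so that u = -K x."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P), P

# Linearization of an inverted-pendulum-like plant about its unstable
# equilibrium (g/l = 10, unit inertia -- illustrative numbers only).
A = np.array([[0.0, 1.0], [10.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr_gain(A, B, np.eye(2), np.eye(1))

# LQR guarantees the closed loop A - B K is Hurwitz, and V(x) = x' P x
# is a Lyapunov function -- but only for the linearized system.
assert np.all(np.linalg.eigvals(A - B @ K).real < 0)
```

The certificate xᵀPx says nothing about the nonlinear plant away from the equilibrium, which is the gap the methods below address.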
However, the dependence on linearization\nproduces extremely conservative systems, and it explains why agile robot locomotion is hard [25].\nTo control nonlinear systems outside their linearizable regions, we need to rely on Lyapunov meth-\nods [13]. Following the intuition that a dynamical system stabilizes when its energy decreases over\ntime, Lyapunov methods construct a scalar \ufb01eld that can force stabilization. These \ufb01elds are highly\nnonlinear and the need for function approximations has long been recognized [13]. Many existing\napproaches rely on polynomial approximations of the dynamics and the search of sum-of-squares\npolynomials as Lyapunov functions through semide\ufb01nite programming (SDP) [21]. A rich theory\nhas been developed around the approach, but in practice the polynomial approximations pose much\nrestriction on the systems and the Lyapunov landscape. Moreover, well-known numerical sensitivity\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fissues in SDP [18] make it very hard to \ufb01nd solutions that fully satisfy the Lyapunov conditions. In\ncontrast, we exploit the expressive power of neural networks, the convenience of gradient descent for\nlearning, and the completeness of nonlinear constraint solving methods to provide full guarantee of\nLyapunov conditions. We show that the combination of these techniques produces control designs\nthat can stabilize various nonlinear systems with veri\ufb01ed regions of attraction that are much larger\nthan what can be obtained by existing control methods.\nWe propose an algorithmic framework for learning control functions and neural network Lyapunov\nfunctions for nonlinear systems without any local approximation of their dynamics. The framework\nconsists of a learner and a falsi\ufb01er. 
The learner uses stochastic gradient descent to find parameters of both a control function and a neural Lyapunov function, by iteratively minimizing the Lyapunov risk, which measures the violation of the Lyapunov conditions. The falsifier takes a control function and Lyapunov function from the learner, and searches for counterexample state vectors that violate the Lyapunov conditions. The counterexamples are added to the training set for the next iteration of learning, generating an effective curriculum. The falsifier uses delta-complete constraint solving [11], which guarantees that when no violation is found, the Lyapunov conditions hold for all states in the verified domain. In this framework, the learner and falsifier are given tasks that are difficult in different ways and cannot be achieved by the other. Moreover, we show that the framework provides the flexibility to fine-tune the control performance by directly enlarging the region of attraction on demand, through regulator terms added to the learning cost.

We experimented with several challenging nonlinear control problems in robotics, such as drone landing, wheeled vehicle path following, and humanoid robot balancing. We are able to find new control policies that produce certified regions of attraction significantly larger than what could be established previously. We provide a detailed analysis of the performance comparison between the proposed methods and the LQR/SOS/SDP methods.

Related Work. The recent work of Richards et al. [24] has also proposed and shown the effectiveness of using neural networks to learn safety certificates in a Lyapunov framework, but our goals and approaches are different. Richards et al. focus on discrete-time polynomial systems and the use of neural networks to learn the region of attraction of a given controller.
The Lyapunov conditions are validated in relaxed forms through sampling. Special design of the neural architecture is required to compensate for the lack of complete checking over all states. In comparison, we focus on learning the control and the Lyapunov function together with a provable guarantee of stability in larger regions of attraction. Our approach directly handles non-polynomial continuous dynamical systems, does not assume control functions are given other than an initialization, and uses generic feed-forward network representations without manual design. Our approach successfully works on many more nonlinear systems, and finds new control functions that enlarge the regions of attraction obtainable from standard control methods. Related learning-based approaches for finding Lyapunov functions include [6, 7, 10, 22]. There is strong evidence that linear control functions are all we need for solving highly nonlinear control problems through reinforcement learning as well [20], suggesting convergence of different learning approaches. In the control and robotics community, similar learner-falsifier frameworks have been proposed by [23, 16] without using neural network representations. The common assumption is that the Lyapunov functions are high-degree polynomials. In these methods, an explicit control function and Lyapunov function cannot be learned together because of the bilinear optimization problems that they generate. Our approach significantly simplifies the algorithms in this direction and has worked reliably on much harder control problems than existing methods. Several theoretical results on asymptotic Lyapunov stability [2, 4, 3, 1] show that some very simple dynamical systems do not admit a polynomial Lyapunov function of any degree, despite being globally asymptotically stable. Such results further motivate the use of neural networks as a more suitable function approximator.
A large body of work in control uses SOS representations and SDP optimization in the search for Lyapunov functions [14, 21, 9, 15, 19]. However, scalability and numerical sensitivity issues have been the main challenge in practice. As is well known, the size of the semidefinite programs arising from the SOS decomposition grows quickly even for low-degree polynomials [21].

2 Preliminaries

We consider the problem of designing control functions to stabilize a dynamical system at an equilibrium point. We make extensive use of the following results from Lyapunov stability theory.

Definition 1 (Controlled Dynamical Systems). An n-dimensional controlled dynamical system is

dx/dt = fu(x),  x(0) = x0,  (1)

where fu : D → R^n is a Lipschitz-continuous vector field, and D ⊆ R^n is an open set with 0 ∈ D that defines the state space of the system. Each x(t) ∈ D is a state vector. The feedback control is defined by a continuous function u : R^n → R^m, used as a component in the full dynamics fu.

Definition 2 (Asymptotic Stability). We say that the system (1) is stable at the origin if for any ε ∈ R+, there exists δ(ε) ∈ R+ such that if ‖x(0)‖ < δ then ‖x(t)‖ < ε for all t ≥ 0. The system is asymptotically stable at the origin if it is stable and also lim_{t→∞} ‖x(t)‖ = 0 for all ‖x(0)‖ < δ.

Definition 3 (Lie Derivatives). The Lie derivative of a continuously differentiable scalar function V : D → R over a vector field fu is defined as

∇fuV(x) = Σ_{i=1}^{n} (∂V/∂xi)(dxi/dt) = Σ_{i=1}^{n} (∂V/∂xi)[fu]i(x).

It measures the rate of change of V along the direction of the system dynamics.

Proposition 1 (Lyapunov Functions for Asymptotic Stability). Consider a controlled system (1) with equilibrium at the origin, i.e., fu(0) = 0.
Suppose there exists a continuously differentiable function V : D → R that satisfies the following conditions:

V(0) = 0, and, ∀x ∈ D \ {0}, V(x) > 0 and ∇fuV(x) < 0.  (2)

Then the system is asymptotically stable at the origin, and V is called a Lyapunov function.

The linear-quadratic regulator (LQR) is a widely adopted optimal control strategy. LQR controllers are guaranteed to work within a small neighborhood around the stationary point where the dynamics can be approximated as linear systems. A detailed description can be found in [17].

3 Learning to Stabilize with Neural Lyapunov Functions

We now describe how to learn both a control function and a neural Lyapunov function together, so that the Lyapunov conditions can be rigorously verified to ensure stability of the system. We provide pseudocode of the algorithm in Algorithm 1.

3.1 Control and Lyapunov Function Learning

We design the hypothesis class of candidate Lyapunov functions to be multilayered feedforward networks with tanh activation functions. It is important to note that, unlike most other deep learning applications, we cannot use non-smooth networks, such as those with ReLU activations. This is because we need to analytically determine whether the Lyapunov conditions hold for these neural networks, which requires the existence of their Lie derivatives.

For a neural network Lyapunov function, the input is any state vector of the system in Definition 1 and the output is a scalar value. We write θ to denote the parameter vector for a Lyapunov function candidate Vθ. For notational convenience, we write u to denote both the control function and the parameters that define it. The learning process updates both the θ and u parameters to improve the likelihood of satisfying the Lyapunov conditions, which we formulate as a cost function named the Lyapunov risk.
The Lyapunov risk measures the degree of violation of the Lyapunov conditions in Proposition 1: first, the value of Vθ(x) should be positive; second, the value of the Lie derivative ∇fuVθ(x) should be negative; third, the value of Vθ(0) should be zero. Conceptually, the overall Lyapunov control design problem is about minimizing a minimax cost of the form

inf_{θ,u} sup_{x∈D} ( max(0, −Vθ(x)) + max(0, ∇fuVθ(x)) + Vθ²(0) ).

The difficulty in control design problems is that the violation of the Lyapunov conditions cannot just be estimated, but needs to be fully guaranteed over all states in D. Thus, we need to rely on global search with complete guarantees for the inner maximization part, which we delegate to the falsifier explained in Section 3.2. For the learning step, we define the following Lyapunov risk function.

Definition 4 (Lyapunov Risk). Consider a candidate Lyapunov function Vθ for a controlled dynamical system fu from Definition 1. The Lyapunov risk is defined by the function

Lρ(θ, u) = E_{x∼ρ(D)} [ max(0, −Vθ(x)) + max(0, ∇fuVθ(x)) ] + Vθ²(0),  (3)

where x is a random variable over the state space of the system with distribution ρ. In practice, we work with the Monte Carlo estimate, named the empirical Lyapunov risk, obtained by drawing samples:

LN,ρ(θ, u) = (1/N) Σ_{i=1}^{N} ( max(0, −Vθ(xi)) + max(0, ∇fuVθ(xi)) ) + Vθ²(0),  (4)

where xi, 1 ≤ i ≤ N, are state vectors sampled according to ρ(D). It is clear that the empirical Lyapunov risk is an unbiased estimator of the Lyapunov risk function.
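The empirical risk of Eq. (4) is straightforward to compute. The sketch below is a generic Python stand-in (not the paper's implementation), sanity-checked on a system with a known Lyapunov function:

```python
import numpy as np

def empirical_lyapunov_risk(V, lie_dV, xs):
    """Monte Carlo estimate of the Lyapunov risk in Definition 4:
    hinge penalties for V(x) <= 0 and for a non-negative Lie
    derivative, plus V(0)^2 to pin the equilibrium value at zero."""
    hinges = [max(0.0, -V(x)) + max(0.0, lie_dV(x)) for x in xs]
    return float(np.mean(hinges) + V(np.zeros_like(xs[0]))**2)

# Sanity check: for x' = -x, V(x) = ||x||^2 is a true Lyapunov function
# (Lie derivative -2||x||^2 < 0), so its empirical risk is exactly 0.
V = lambda x: float(x @ x)
lie_dV = lambda x: float(2 * x @ (-x))
xs = np.random.default_rng(1).uniform(-1.0, 1.0, size=(100, 2))
assert empirical_lyapunov_risk(V, lie_dV, xs) == 0.0

# A violating candidate (-||x||^2) is penalized with positive risk.
assert empirical_lyapunov_risk(lambda x: -V(x), lambda x: -lie_dV(x), xs) > 0.0
```

In the actual framework Vθ is a tanh network and the hinge terms are differentiated through by SGD; the scalar version above only illustrates the loss itself.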
Note that Lρ is positive semidefinite, and any (θ, u) that corresponds to a true Lyapunov function satisfies Lρ(θ, u) = 0. Thus, Lyapunov functions define global minimizers of the Lyapunov risk function.

Proposition 2. Let Vθo be a Lyapunov function for the dynamical system fuo, where uo are the control parameters. Then (θo, uo) is a global minimizer for Lρ and Lρ(θo, uo) = 0.

Note that both Vθ and fu are highly nonlinear (even though u is almost always linear in practice), and thus Lρ(θ, u) generates a highly complex landscape. Surprisingly, multilayer feedforward tanh networks and stochastic gradient descent can quickly produce generalizable Lyapunov functions with nice geometric properties, as we report in detail in the experiments. In Figure 1(b), we show an example of how the Lyapunov risk is minimized over iterations on the inverted pendulum example.

Initialization and improvement of control performance over LQR. Because of the local nature of stochastic gradient descent, it is hard to learn good control functions through random initialization of control parameters. Instead, the parameters u in the control function are initialized to the LQR solution, obtained for the linearized dynamics in a small neighborhood around the stationary point. On the other hand, the initialization of the neural network Lyapunov function can be completely random. We observe that the final learned controller often delivers significantly better control solutions than the initialization from LQR. Figure 1(a) shows how the learned control reduces oscillation of the system behavior in the humanoid robot balancing example and achieves more stable control.

Figure 1: (a) Comparison between LQR and deep-learned controllers for humanoid balancing. (b) The Lyapunov risk decreases quickly over iterations.
(c) Counterexamples returned by falsifiers from several epochs, which quickly guide the learner to focus on special regions in the space.

3.2 Falsification and Counterexample Generation

For each control and Lyapunov function pair (Vθ, u) that the learner obtains, the falsifier's task is to find states that violate the Lyapunov conditions in Proposition 1. We formulate the negations of the Lyapunov conditions as a nonlinear constraint solving problem over the real numbers. These falsification constraints are defined as follows.

Definition 5 (Lyapunov Falsification Constraints). Let V be a candidate Lyapunov function for a dynamical system fu defined on state space D. Let ε ∈ Q+ be a small constant parameter that bounds the tolerable numerical error. The Lyapunov falsification constraint is the following first-order logic formula over the real numbers:

Φε(x) := ( Σ_{i=1}^{n} xi² ≥ ε ) ∧ ( V(x) ≤ 0 ∨ ∇fuV(x) ≥ 0 ),

where x is bounded in the state space D of the system. The numerical error parameter ε is explicitly introduced for controlling numerical sensitivity near the origin. Here ε is orders of magnitude smaller than the range of the state variables, i.e., √ε ≪ min(1, ‖D‖₂).

Remark 1. The numerical error parameter ε allows us to avoid pathological problems in numerical algorithms such as arithmetic underflow. Values inside this tiny ball correspond to disturbances that are physically insignificant. This parameter is important for eliminating from our framework the numerical sensitivity issues commonly observed in SOS/SDP methods.
Also note that the ε-ball does not affect properties of the Lyapunov level sets and regions of attraction outside it (i.e., D \ Bε).

The falsifier computes solutions of the falsification constraint Φε(x). Solving the constraints requires global minimization of highly nonconvex functions (involving Lie derivatives of the neural network Lyapunov function), which is a computationally expensive (NP-hard) task. We rely on recent progress in nonlinear constraint solving in SMT solvers such as dReal [11], which has been used for similar control design problems [16] that do not involve neural networks.

Example 1. Consider a candidate Lyapunov function V(x) = tanh(a1x1 + a2x2 + b) and dynamics ẋ1 = −x2² and ẋ2 = sin(x1). The falsification constraint is of the form

Φε(x) := (x1² + x2² ≥ ε) ∧ ( tanh(a1x1 + a2x2 + b) ≤ 0 ∨ a1(1 − tanh²(a1x1 + a2x2 + b))(−x2²) + a2(1 − tanh²(a1x1 + a2x2 + b)) sin(x1) ≥ 0 ),

which is a nonlinear, non-polynomial disjunctive constraint system. The actual examples used in our experiments all use larger two-layer tanh networks and much more complex dynamics.

To completely certify the Lyapunov conditions, the constraint solving step for Φε(x) can never fail to report solutions if any exist. This requirement is rigorously proved for algorithms in SMT solvers such as dReal [12], as a property called delta-completeness [11].

Definition 6 (Delta-Complete Algorithms). Let C be a class of quantifier-free first-order constraints. Let δ ∈ Q+ be a fixed constant. We say an algorithm A is δ-complete for C if, for any φ(x) ∈ C, A always returns one of the following answers correctly: φ does not have a solution (unsatisfiable), or there is a solution x = a that satisfies φδ(a).
Here, \u03d5\u03b4 is de\ufb01ned as a small syntactic variation of the\noriginal constraint (precise de\ufb01nitions are in [11]).\n\nIn other words, if a delta-complete algorithm concludes that a formula \u03a6\u03b5(x) is unsatis\ufb01able, then it is\nguaranteed to not have any solution. In our context, this is exactly what we need for ensuring that the\nLyapunov condition holds over all state vectors. If \u03a6\u03b5(x) is determined to be \u03b4-satis\ufb01able, we obtain\ncounterexamples that are added to the training set for the learner. Note that the counterexamples are\nsimply state vectors without labels, and their Lyapunov risk will be determined by the learner, not\nthe falsi\ufb01er. Thus, although it is possible to have spurious counterexamples due to the \u03b4 error, they\nare used as extra samples and do not harm correctness of the end result. In all, when delta-complete\nalgorithms in dReal return that the falsi\ufb01cation constraints are unsatis\ufb01able, we conclude that the\nLyapunov conditions are satis\ufb01ed by the candidate Lyapunov and control functions. Figure 1(c)\nshows a sequence of counterexamples found by the falsi\ufb01er to improve the learned results.\nRemark 2. When solving \u03a6\u03b5(x) with \u03b4-complete constraint solving algorithms, we use \u03b4 (cid:28) \u03b5 to\nreduce the number of spurious counterexamples. Following delta-completeness, the choice of \u03b4 does\nnot affect the guarantee for the validation of the Lyapunov conditions.\n\n3.3 Tuning Region of Attraction\n\nAn important feature of the proposed learning framework is that we can adjust the cost functions\nto learn control and Lyapunov functions favoring various additional properties. 
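As an aside, the falsification constraint of Example 1 can be probed numerically. The sketch below replaces the delta-complete SMT search with plain sampling, which can only find counterexamples, never certify their absence (the coefficient values are illustrative, not from the paper):

```python
import numpy as np

def V(x, a1, a2, b):
    return np.tanh(a1*x[0] + a2*x[1] + b)

def lie_dV(x, a1, a2, b):
    # dV/dx . f(x) for the Example 1 dynamics x1' = -x2^2, x2' = sin(x1)
    s = 1.0 - np.tanh(a1*x[0] + a2*x[1] + b)**2
    return a1*s*(-x[1]**2) + a2*s*np.sin(x[0])

def falsify(a1, a2, b, eps=0.25, n=100_000, seed=0):
    """Sampling stand-in for the SMT falsifier: return a state where
    Phi_eps holds (a counterexample), or None if none was sampled.
    Unlike dReal, a None answer here proves nothing."""
    xs = np.random.default_rng(seed).uniform(-2.0, 2.0, size=(n, 2))
    for x in xs:
        if x @ x >= eps and (V(x, a1, a2, b) <= 0 or lie_dV(x, a1, a2, b) >= 0):
            return x
    return None

# tanh of a linear function is negative on half the plane, so this
# candidate family always admits a counterexample to V(x) > 0.
cx = falsify(1.0, 1.0, 0.0)
assert cx is not None
assert V(cx, 1.0, 1.0, 0.0) <= 0 or lie_dV(cx, 1.0, 1.0, 0.0) >= 0
```

A returned state is fed back to the learner as a new training sample; only the unsatisfiability verdict of a delta-complete solver can close the loop with a guarantee.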
In fact, the most practically important performance metric for a stabilizing controller is its region of attraction (ROA). An ROA defines a forward invariant set that is guaranteed to contain all possible trajectories of the system, and thus can conclusively establish safety properties. Note that the Lyapunov conditions themselves do not directly ensure safety, because the system can deviate arbitrarily far before coming back to the stable equilibrium. Formally, the ROA of an asymptotically stable system is defined as follows.

Definition 7 (Region of Attraction). Let fu define a system asymptotically stable at the origin with Lyapunov function V for domain D. A region of attraction R is a subset of D that contains the origin and guarantees that the system never leaves R. Any level set of V completely contained in D defines an ROA. That is, for β > 0, if Rβ = {x : V(x) ≤ β} ⊆ D, then Rβ is an ROA for the system.

To maximize the ROA produced by a pair of Lyapunov function and control function, we add a cost term to the Lyapunov risk that regulates how quickly the Lyapunov function value increases with respect to the radius of the level sets, by using LN,ρ(θ, u) + (1/N) Σ_{i=1}^{N} ( ‖xi‖² − αVθ(xi) ), following Definition 4. Here α is a tunable parameter. We observe that the regulator can have a major effect on the performance of the learned control functions. Figure 2 illustrates such an example, showing how different control functions are obtained by regulating the Lyapunov risk to achieve a larger ROA.

Figure 2: (a) Lyapunov function found by the initial LQR controller. (b) Lyapunov function found by learning without tuning the ROA. (c) Lyapunov function found by learning after adding the ROA tuning term.
(d) Comparison of ROA for the different Lyapunov functions.

Algorithm 1 Neural Lyapunov Control
1: function LEARNING(X, f, qlqr)
2:   Set learning rate (0.01), input dimension (# of state variables), output dimension (1)
3:   Initialize feedback controller u to LQR solution qlqr
4:   repeat
5:     Vθ(x), u(x) ← NNθ,u(x)  ▷ Forward pass of neural network
6:     ∇fuVθ(x) ← Σ_{i=1}^{Din} (∂V/∂xi)[fu]i(x)
7:     Compute Lyapunov risk L(θ, u)
8:     θ ← θ − α∇θL(θ, u)  ▷ Update weights using SGD
9:     u ← u − α∇uL(θ, u)
10:   until convergence
11:   return Vθ, u
12: end function
13: function FALSIFICATION(f, u, Vθ, ε, δ)
14:   Encode the conditions in Definition 5
15:   Use an SMT solver with precision δ to verify the conditions
16:   return satisfiability
17: end function
18: function MAIN( )
19:   Input: dynamical system (f), parameters of LQR (qlqr), radius (ε), precision (δ), and an initial set X of randomly sampled states in D
20:   while satisfiable do
21:     Vθ, u ← LEARNING(X, f, qlqr)
22:     CE ← FALSIFICATION(f, u, Vθ, ε, δ)
23:     Add counterexamples CE to X
24:   end while
25: end function

4 Experiments

We demonstrate that the proposed methods find provably stable control and Lyapunov functions on various nonlinear robot control problems. In all the examples, we use a learning rate of 0.01 for the learner, an ε value of 0.25 and a δ value of 0.01 for the falsifier, and re-verify the result with smaller ε in Table 1. We emphasize that the choices of these parameters do not affect the stability guarantees on the final design of the control and Lyapunov functions. We show that the region of attraction is enlarged by 300% to 600% compared to LQR results in these examples.
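To make the structure of the learner-falsifier loop in Algorithm 1 concrete, the following toy instance uses a one-parameter quadratic candidate V(x) = x1² + c·x2² on fixed stable dynamics, a hand-derived hinge gradient, and a sampling falsifier in place of dReal. All of these are simplifications: unlike the paper's procedure, termination here carries no formal guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy stable dynamics standing in for the controlled system f_u(x).
    return np.array([-x[0] + x[1], -x[0] - x[1]])

def lie_dV(x, c):
    # Lie derivative of the candidate V(x) = x1^2 + c*x2^2 along f.
    return np.array([2*x[0], 2*c*x[1]]) @ f(x)

def falsifier(c, n=10_000, eps=1e-3):
    """Sampling stand-in for the delta-complete falsifier: collect states
    outside the eps-ball where the Lyapunov conditions fail.  (V > 0
    holds by construction for c > 0, so only the Lie-derivative
    condition can fail.)  An empty answer here proves nothing."""
    xs = rng.uniform(-1.0, 1.0, size=(n, 2))
    xs = xs[np.einsum('ij,ij->i', xs, xs) >= eps]
    return [x for x in xs if lie_dV(x, c) >= 0][:50]

c, lr, data = 10.0, 0.3, []          # deliberately bad initial candidate
for epoch in range(500):
    ces = falsifier(c)
    if not ces:                       # no violation sampled: stop
        break
    data += ces
    # Gradient step on the empirical Lyapunov risk w.r.t. c; the hinge
    # gradient d(lie_dV)/dc equals 2*x2*f(x)[1] where the hinge is active.
    grad = np.mean([2*x[1]*f(x)[1] for x in data if lie_dV(x, c) > 0])
    c = max(c - lr*grad, 1e-2)

# V = x1^2 + c*x2^2 has a negative-definite Lie derivative here iff
# (1-c)^2 < 4c, i.e. roughly 0.17 < c < 5.83; the loop drives c there.
assert 4.5 < c < 5.9
```

The counterexample set `data` plays the role of the growing training set X: each round of falsification concentrates the learner's gradient on exactly the states where the candidate still fails.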
Full details of the results and system dynamics are provided in the Appendix. Note that for the Caltech ducted fan and humanoid balancing examples, we numerically relaxed the conditions slightly when the learning had converged, so that the SMT solver dReal does not run into numerical issues. More details on the effect of such relaxation can be found on the paper website [8].

Benchmarks            Learning time   Falsification time   # samples   # iterations   ε
Inverted Pendulum     25.5            0.6                  500         430            0.04
Path Following        36.3            0.2                  500         610            0.01
Caltech Ducted Fan    1455.16         50.84                1000        3000           0.01
Humanoid Balancing    6000            458.27               1000        4000           0.01

Table 1: Runtime statistics of the full procedures on four nonlinear control examples.

Inverted pendulum. The inverted pendulum is a standard nonlinear control problem for testing different control methods. This system has two state variables, the angular position θ and angular velocity θ̇, and one control input u. Our learning procedure finds a neural Lyapunov function that is proved to be valid within the domain ‖x‖ ≤ 6. In contrast, the ROA found by SOS/SDP techniques is an ellipse with major diameter 1.75 and minor diameter 1.2. Using LQR control on the linearized dynamics, we obtain an ellipse with major diameter 6 and minor diameter 0.1. We observe that among all the examples in our experiments, this is the only one where the SOS Lyapunov function has passed the complete check by the constraint solver, so that we can compare to it. The Lyapunov function obtained by LQR gives a larger ROA if we ignore the linearization error. The different regions of attraction are shown in Figure 3. These values are consistent with the approximate maximum region of attraction reported in [24].
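Definition 7's level-set construction explains the ellipse-shaped ROAs compared above: for a quadratic Lyapunov function V(x) = xᵀPx on a ball-shaped domain, the largest certified level is available in closed form. A small sketch (illustrative P, not taken from the paper; the paper's neural V requires the SMT check instead):

```python
import numpy as np

def roa_level(P, r):
    """Largest beta with {x : x^T P x <= beta} inside the ball ||x|| <= r
    (Definition 7 specialized to a quadratic V and a ball domain): the
    minimum of V on the sphere of radius r is r^2 * lambda_min(P)."""
    return r**2 * np.linalg.eigvalsh(P).min()

P = np.array([[2.0, 0.5], [0.5, 1.0]])
beta = roa_level(P, r=1.0)

# Check by sampling the boundary sphere: V there is always >= beta,
# so the level set {V <= beta} cannot poke outside the domain.
theta = np.linspace(0, 2*np.pi, 1000)
sphere = np.stack([np.cos(theta), np.sin(theta)], axis=1)
assert all(x @ P @ x >= beta - 1e-9 for x in sphere)
```

Larger level sets would intersect the sphere, so β = r²·λ_min(P) is tight; this is why a badly conditioned P (like the LQR ellipse with diameters 6 and 0.1) certifies only a thin sliver of the domain.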
In particular, Figure 3(c) shows that the SOS function does not define a big enough ROA, as many trajectories escape its region.

Figure 3: Results of Lyapunov functions for the inverted pendulum. (a) Lie derivative of the learned Lyapunov function over the valid region. Its value is negative over the valid region, satisfying the Lyapunov conditions. (b) ROA estimated by different Lyapunov functions. Our method enlarges the ROA of LQR by three times. (c) Validation of ROAs. Stars represent initial states. Trajectories starting near the border of the ROA defined by the learned neural Lyapunov function are safely bounded within the green region. On the contrary, many trajectories (red) starting inside the SOS region can escape, and thus that region fails to satisfy the ROA properties.

Caltech ducted fan in hover mode. The system describes the motion of a landing aircraft in hover mode with two forces u1 and u2. The state variables x, y, θ denote the position and orientation of the centre of the fan. There are six state variables [x, y, θ, ẋ, ẏ, θ̇]. The dynamics, the neural Lyapunov function with two layers of tanh activation functions, and the control policy are given in the Appendix. In Figure 4(a), we show that the ROA is significantly larger than what can be obtained from LQR.

Figure 4: (a) Comparison of ROAs for the Caltech ducted fan. (b) Comparison of ROAs for path following. (c) Schematic diagram of the wheeled vehicle to show the nonlinear dynamics.

Wheeled vehicle path following. We consider path tracking control using the kinematic bicycle model (see Figure 4(c)). We take the angle error θe and the distance error de as state variables. Assuming the target path is a unit circle, we obtain the Lyapunov function within ‖x‖ ≤ 0.8.

Humanoid balancing. The task of balancing a humanoid robot can be modelled as maintaining an n-link pendulum in a vertical posture.
The n-link pendulum system has n control inputs and 2n state variables [θ1, θ2, . . . , θn, θ̇1, θ̇2, . . . , θ̇n], representing the n link angles and n angular velocities. Each link has mass mi and length ℓi, and the moments of inertia Ii are computed from the link pivots, where i = 1, 2, . . . , n. We find a neural Lyapunov function for the 3-link pendulum system within ‖x‖ ≤ 0.5. In Figure 5, we show the shape of the neural Lyapunov function on two of the dimensions, and the ROA that the control design achieves. We also provide a video of the control on the 3-link model.

Figure 5: Results of humanoid balancing. (a) Schematic diagram. (b) Learned Lyapunov function. (c) Lie derivative of the Lyapunov function. (d) Comparison of the region of attraction.

5 Conclusion

We proposed new methods to learn control policies and neural network Lyapunov functions for highly nonlinear systems with a provable guarantee of stability. The approach significantly simplifies the process of nonlinear control design, provides an end-to-end provable correctness guarantee, and can obtain much larger regions of attraction compared to existing control methods. We show experiments on challenging nonlinear problems central to various robotics applications. The proposed methods demonstrate a clear advantage over existing methods. We envision that neural networks and deep learning will provide immediate solutions to many core problems in robot control design.

References
[1] Amir A. Ahmadi and Raphaël M. Jungers. Lower bounds on complexity of lyapunov functions for switched linear systems. CoRR, abs/1504.03761, 2015.
[2] Amir A. Ahmadi, M. Krstic, and P. A. Parrilo. A globally asymptotically stable polynomial vector field with no polynomial lyapunov function. In 2011 50th IEEE Conference on Decision and Control and European Control Conference.
[3] Amir A. 
Ahmadi and Pablo A. Parrilo. Stability of polynomial differential equations: Complexity and converse Lyapunov questions. CoRR, abs/1308.6833, 2013.

[4] Amir Ali Ahmadi. On the difficulty of deciding asymptotic stability of cubic homogeneous vector fields. In American Control Conference, ACC 2012, Montreal, QC, Canada, June 27-29, 2012, pages 3334–3339, 2012.

[5] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul F. Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. CoRR, abs/1606.06565, 2016.

[6] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 4661–4666, Dec 2016.

[7] Felix Berkenkamp, Matteo Turchetta, Angela Schoellig, and Andreas Krause. Safe model-based reinforcement learning with stability guarantees. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 908–918. Curran Associates, Inc., 2017.

[8] Ya-Chien Chang, Nima Roohi, and Sicun Gao. Neural Lyapunov control (project website), https://yachienchang.github.io/NeurIPS2019.

[9] G. Chesi and D. Henrion. Guest editorial: Special issue on positive polynomials in control. IEEE Transactions on Automatic Control, 54(5):935–936, May 2009.

[10] Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. A Lyapunov-based approach to safe reinforcement learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 8092–8101. Curran Associates, Inc., 2018.

[11] Sicun Gao, Jeremy Avigad, and Edmund M. Clarke. Delta-complete decision procedures for satisfiability over the reals.
In Automated Reasoning - 6th International Joint Conference, IJCAR 2012, Manchester, UK, June 26-29, 2012. Proceedings, pages 286–300, 2012.

[12] Sicun Gao, Soonho Kong, and Edmund M. Clarke. dReal: An SMT solver for nonlinear theories over the reals. In Automated Deduction - CADE-24 - 24th International Conference on Automated Deduction, Lake Placid, NY, USA, June 9-14, 2013. Proceedings, pages 208–214, 2013.

[13] Wassim Haddad and Vijaysekhar Chellaboina. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press, 2008.

[14] D. Henrion and A. Garulli. Positive Polynomials in Control, volume 312 of Lecture Notes in Control and Information Sciences. Springer Berlin Heidelberg, 2005.

[15] Z. Jarvis-Wloszek, R. Feeley, Weehong Tan, Kunpeng Sun, and A. Packard. Some controls applications of sum of squares programming. In 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475), volume 5, pages 4676–4681, Dec 2003.

[16] James Kapinski, Jyotirmoy V. Deshmukh, Sriram Sankaranarayanan, and Nikos Arechiga. Simulation-guided Lyapunov analysis for hybrid dynamical systems. In Proceedings of the 17th International Conference on Hybrid Systems: Computation and Control, HSCC '14, pages 133–142. ACM, 2014.

[17] Huibert Kwakernaak. Linear Optimal Control Systems. John Wiley & Sons, Inc., New York, NY, USA, 1972.

[18] Johan Löfberg. Pre- and post-processing sum-of-squares programs in practice. IEEE Transactions on Automatic Control, 54(5):1007–1011, 2009.

[19] Anirudha Majumdar and Russ Tedrake. Funnel libraries for real-time robust feedback motion planning. The International Journal of Robotics Research, 36(8):947–982, 2017.

[20] Horia Mania, Aurelia Guy, and Benjamin Recht. Simple random search of static linear policies is competitive for reinforcement learning. In S.
Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1805–1814. Curran Associates, Inc., 2018.

[21] Pablo A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.

[22] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning series. University Press Group Limited, 2006.

[23] Hadi Ravanbakhsh and Sriram Sankaranarayanan. Learning control Lyapunov functions from counterexamples and demonstrations. Autonomous Robots, 43(2):275–307, 2019.

[24] Spencer M. Richards, Felix Berkenkamp, and Andreas Krause. The Lyapunov neural network: Adaptive stability certification for safe learning of dynamical systems. In Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 466–476, 29–31 Oct 2018.

[25] Russ Tedrake. Underactuated Robotics: Algorithms for Walking, Running, Swimming, Flying, and Manipulation (Course Notes for MIT 6.832). 2019.