{"title": "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 1329, "page_last": 1336, "abstract": null, "full_text": "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems\n\nSajid M. Siddiqi\nRobotics Institute\nCarnegie-Mellon University\nPittsburgh, PA 15213\nsiddiqi@cs.cmu.edu\n\nByron Boots\nComputer Science Department\nCarnegie-Mellon University\nPittsburgh, PA 15213\nbeb@cs.cmu.edu\n\nGeoffrey J. Gordon\nMachine Learning Department\nCarnegie-Mellon University\nPittsburgh, PA 15213\nggordon@cs.cmu.edu\n\nAbstract\n\nStability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. 
The constraint generation\napproach leads to noticeable improvement in the quality of simulated sequences.\nWe compare our method to those of Lacy and Bernstein [1, 2], with positive results\nin terms of accuracy, quality of simulated sequences, and ef\ufb01ciency.\n\n1 Introduction\nMany problems in machine learning involve sequences of real-valued multivariate observations.\nTo model the statistical properties of such data, it is often sensible to assume each observation to be\ncorrelated to the value of an underlying latent variable, or state, that is evolving over the course of the\nsequence. In the case where the state is real-valued and the noise terms are assumed to be Gaussian,\nthe resulting model is called a linear dynamical system (LDS), also known as a Kalman Filter [3].\nLDSs are an important tool for modeling time series in engineering, controls and economics as well\nas the physical and social sciences.\nLet {\u03bbi(M)}n\ni=1 denote the eigenvalues of an n \u00d7 n matrix M in decreasing order of mag-\nnitude, {\u03bdi(M)}n\ni=1 the corresponding unit-length eigenvectors, and de\ufb01ne its spectral radius\n\u03c1(M) \u2261 |\u03bb1(M)|. An LDS with dynamics matrix A is stable if all of A\u2019s eigenvalues have mag-\nnitude at most 1, i.e., \u03c1(A) \u2264 1. Standard algorithms for learning LDS parameters do not enforce\nthis stability criterion, learning locally optimal values for LDS parameters by gradient descent [4],\nExpectation Maximization (EM) [5] or least squares on a state sequence estimate obtained by sub-\nspace identi\ufb01cation methods, as described in Section 3.1. However, when learning from \ufb01nite data\nsamples, the least squares solution may be unstable even if the system is stable [6]. 
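The stability criterion and the unconstrained least-squares dynamics estimate can be sketched in a few lines of NumPy (a minimal illustration; the paper's own experiments use Matlab, and `spectral_radius` and `least_squares_dynamics` are hypothetical helper names, not the authors' code):

```python
import numpy as np

def spectral_radius(A):
    """rho(A) = |lambda_1(A)|: the largest eigenvalue magnitude of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

def least_squares_dynamics(X):
    """Least-squares dynamics estimate from a state sequence X = [x_1 ... x_tau]:
    minimizes sum_t ||A x_t - x_{t+1}||^2.  Stability is NOT enforced."""
    return X[:, 1:] @ np.linalg.pinv(X[:, :-1])
```

An estimate is stable exactly when `spectral_radius(A_hat) <= 1`; as noted above, on finite noisy data this check can fail even when the true system is stable.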
The drawback\nof ignoring stability is most apparent when simulating long sequences from the system in order to\ngenerate representative data or infer stretches of missing values.\n\nWe propose a convex optimization algorithm for learning the dynamics matrix while guaranteeing\nstability. An estimate of the underlying state sequence is \ufb01rst obtained using subspace identi\ufb01ca-\ntion. We then formulate the least-squares problem for the dynamics matrix as a quadratic program\n(QP) [7], initially without constraints. When this QP is solved, the estimate \u02c6A obtained may be\nunstable. However, any unstable solution allows us to derive a linear constraint which we then add\n\n\fto our original QP and re-solve. The above two steps are iterated until we reach a stable solution,\nwhich is then re\ufb01ned by a simple interpolation to obtain the best possible stable estimate.\n\nOur method can be viewed as constraint generation for an underlying convex program with a feasi-\nble set of all matrices with singular values at most 1, similar to work in control systems [1]. However,\nwe terminate before reaching feasibility in the convex program, by checking for matrix stability after\neach new constraint. This makes our algorithm less conservative than previous methods for enforc-\ning stability since it chooses the best of a larger set of stable dynamics matrices. The difference in\nthe resulting stable systems is noticeable when simulating data. The constraint generation approach\nalso achieves much greater ef\ufb01ciency than previous methods in our experiments.\n\nOne application of LDSs in computer vision is learning dynamic textures from video data [8]. An\nadvantage of learning dynamic textures is the ability to play back a realistic-looking generated se-\nquence of any desired duration.\nIn practice, however, videos synthesized from dynamic texture\nmodels can quickly degenerate because of instability in the underlying LDS. 
In contrast, sequences generated from dynamic textures learned by our method remain \u201csane\u201d even after arbitrarily long durations. We also apply our algorithm to learning baseline dynamic models of over-the-counter (OTC) drug sales for biosurveillance, and sunspot numbers from the UCR archive [9]. Comparison to the best alternative methods [1, 2] on these problems yields positive results.\n\n2 Related Work\nLinear system identification is a well-studied subject [4]. Within this area, subspace identification methods [10] have been very successful. These techniques first estimate the model dimensionality and the underlying state sequence, and then derive parameter estimates using least squares. Within subspace methods, techniques have been developed to enforce stability by augmenting the extended observability matrix with zeros [6] or by adding a regularization term to the least squares objective [11].\n\nAll previous methods were outperformed by Lacy and Bernstein [1], henceforth referred to as LB-1. They formulate the problem as a semidefinite program (SDP) whose objective minimizes the state sequence reconstruction error, and whose constraint bounds the largest singular value by 1. This convex constraint is obtained by rewriting the nonlinear matrix inequality In \u2212 AAT \u227d 0 as a linear matrix inequality [12], where In is the n \u00d7 n identity matrix. Here, \u227b 0 (\u227d 0) denotes positive (semi-) definiteness. The existence of this constraint also proves the convexity of the \u03c31 \u2264 1 region. A follow-up to this work by the same authors [2], which we will call LB-2, attempts to overcome the conservativeness of LB-1 by approximating the Lyapunov inequalities P \u2212 APAT \u227b 0, P \u227b 0 with the inequalities P \u2212 APAT \u2212 \u03b4In \u227d 0, P \u2212 \u03b4In \u227d 0, \u03b4 > 0. These inequalities hold iff the spectral radius is less than 1. 
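To make the conservativeness of the \u03c31 \u2264 1 region concrete: a stable matrix can have a largest singular value far greater than 1. This hedged NumPy check uses the 2 \u00d7 2 family E_{\u03b1,\u03b2} that appears again in Section 4.2 (an illustration I added, not part of the paper's experiments):

```python
import numpy as np

# E_{10,0} = [0.3 10; 0 0.3]: triangular, so both eigenvalues are 0.3 and the
# matrix is stable, yet its largest singular value exceeds 10, placing it far
# outside the sigma_1 <= 1 feasible region that LB-1 optimizes over.
E = np.array([[0.3, 10.0],
              [0.0, 0.3]])
rho = np.max(np.abs(np.linalg.eigvals(E)))   # spectral radius: 0.3
sigma1 = np.linalg.norm(E, 2)                # largest singular value: > 10
assert rho <= 1 < sigma1                     # stable, but infeasible for LB-1
```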
However, the approximation is achieved only at the cost of inducing a nonlinear distortion of the objective function by a problem-dependent reweighting matrix involving P, which is a variable to be optimized. In our experiments, this causes LB-2 to perform worse than LB-1 (for any \u03b4) in terms of the state sequence reconstruction error, even while obtaining solutions outside the feasible region of LB-1. Consequently, we focus on LB-1 in our conceptual and qualitative comparisons as it is the strongest baseline available. However, LB-2 is more scalable than LB-1, so quantitative results are presented for both.\n\nTo summarize the distinction between constraint generation, LB-1 and LB-2: it is hard to have both the right objective function (reconstruction error) and the right feasible region (the set of stable matrices). LB-1 optimizes the right objective but over the wrong feasible region (the set of matrices with \u03c31 \u2264 1). LB-2 has a feasible region close to the right one, but at the cost of distorting its objective function to an extent that it fares worse than LB-1 in nearly all cases. In contrast, our method optimizes the right objective over a less conservative feasible region than that of any previous algorithm with the right objective, and this combination is shown to work the best in practice.\n\n3 Linear Dynamical Systems\nThe evolution of a linear dynamical system can be described by the following two equations:\n\nxt+1 = Axt + wt,    yt = Cxt + vt    (1)\n\nTime is indexed by the discrete variable t. Here xt denotes the hidden states in Rn, yt the observations in Rm, and wt and vt are zero-mean normally distributed state and observation noise variables.\n\n\fFigure 1: A. Sunspot data, sampled monthly for 200 years. Each curve is a month, the x-axis is over years. B. First two principal components of a 1-observation Hankel matrix. C. 
First two principal\ncomponents of a 12-observation Hankel matrix, which better re\ufb02ect temporal patterns in the data.\n\nAssume some initial state x0. The parameters of the system are the dynamics matrix A \u2208 Rn\u00d7n, the\nobservation model C \u2208 Rm\u00d7n, and the noise covariance matrices Q and R. Note that we are learn-\ning uncontrolled linear dynamical systems, though, as in previous work, control inputs can easily be\nincorporated into the objective function and convex program.\n\nLinear dynamical systems can also be viewed as probabilistic graphical models. The standard LDS\n\ufb01ltering and smoothing inference algorithms [3, 13] are instantiations of the junction tree algorithm\nfor Bayesian Networks (see, for example, [14]).\n\nWe follow the subspace identi\ufb01cation literature in estimating all parameters other than the dynamics\nmatrix. A clear and concise exposition of the required techniques is presented in Soatto et al. [8],\nwhich we summarize below. We use subspace identi\ufb01cation methods in our experiments for unifor-\nmity with previous work we are building on (in the control systems literature) and with work we are\ncomparing to ([8] on the dynamic textures data).\n\n3.1 Learning Model Parameters by Subspace Methods\nSubspace methods calculate LDS parameters by \ufb01rst decomposing a matrix of observations to yield\nan estimate of the underlying state sequence. The most straightforward such technique is used here,\nwhich relies on the singular value decomposition (SVD) [15]. See [10] for variations.\nLet Y1:\u03c4 = [y1 y2 . . . y\u03c4 ] \u2208 Rm\u00d7\u03c4 and X1:\u03c4 = [x1 x2 . . . x\u03c4 ] \u2208 Rn\u00d7\u03c4 . D denotes the matrix of\nobservations which is the input to SVD. One typical choice for D is D = Y1:\u03c4 ; we will discuss others\nbelow. SVD yields D \u2248 U\u03a3V T where U \u2208 Rm\u00d7n and V \u2208 R\u03c4\u00d7n have orthonormal columns {ui}\nand {vi}, and \u03a3 = diag{\u03c31, . . . 
, \u03c3n} contains the singular values. The model dimension n is determined by keeping all singular values of D above a threshold. We obtain estimates of C and X:\n\n\u02c6C = U,    \u02c6X = \u03a3V T    (2)\n\nSee [8] for an explanation of why these estimates satisfy certain canonical model assumptions. \u02c6X is referred to as the extended observability matrix in the control systems literature; the tth column of \u02c6X represents an estimate of the state of our LDS at time t. The least squares estimate of A is:\n\n\u02c6A = arg minA J\u00b2(A) = arg minA \u2016AX0:\u03c4\u22121 \u2212 X1:\u03c4\u2016\u00b2F = X1:\u03c4 X0:\u03c4\u22121\u2020    (3)\n\nwhere \u2016\u00b7\u2016F denotes the Frobenius norm and \u2020 denotes the Moore-Penrose inverse. Eq. (3) asks \u02c6A to minimize the error in predicting the state at time t + 1 from the state at time t. Given the above estimates \u02c6A and \u02c6C, the covariance matrices \u02c6Q and \u02c6R can be estimated directly from residuals.\n\n3.2 Designing the Observation Matrix\nIn the decomposition above, we chose each column of D to be the observation vector for a single time step. Suppose that instead we set D to be the md \u00d7 \u03c4 matrix\n\nD = [ y1 y2 y3 \u00b7\u00b7\u00b7 y\u03c4 ; y2 y3 y4 \u00b7\u00b7\u00b7 y\u03c4+1 ; \u00b7\u00b7\u00b7 ; yd yd+1 yd+2 \u00b7\u00b7\u00b7 yd+\u03c4\u22121 ]\n\nA matrix of this form, with each block of rows equal to the previous block but shifted by a constant number of columns, is called a block Hankel matrix [4]. We say \u201cd-observation Hankel matrix of size \u03c4\u201d to mean the data matrix D \u2208 Rmd\u00d7\u03c4 with d length-m observation vectors per column. Stacking observations causes each state to incorporate more information about the future, since \u02c6xt\n\n\fFigure 2: (A): Conceptual depiction of the space of n \u00d7 n matrices. 
The region of stability (S\u03bb) is non-convex while the smaller region of matrices with \u03c31 \u2264 1 (S\u03c3) is convex. The elliptical contours indicate level sets of the quadratic objective function of the QP. \u02c6A is the unconstrained least-squares solution to this objective. ALB-1 is the solution found by LB-1 [1]. One iteration of constraint generation yields the constraint indicated by the line labeled \u2018generated constraint\u2019, and (in this case) leads to a stable solution A\u2217. The final step of our algorithm improves on this solution by interpolating A\u2217 with the previous solution (in this case, \u02c6A) to obtain A\u2217final. (B): The actual stable and unstable regions for the space of 2 \u00d7 2 matrices E\u03b1,\u03b2 = [ 0.3 \u03b1 ; \u03b2 0.3 ], with \u03b1, \u03b2 \u2208 [\u221210, 10]. Constraint generation is able to learn a nearly optimal model from a noisy state sequence of length 7 simulated from E0,10, with better state reconstruction error than either LB-1 or LB-2.\n\nnow represents coefficients reconstructing yt as well as other observations in the future. However the observation model estimate must now be \u02c6C = U(1 : m, : ), i.e., the submatrix consisting of the first m rows of U, because U(1 : m, : )\u02c6xt = \u02c6yt for any t, where \u02c6yt denotes a reconstructed observation. Having multiple observations per column in D is particularly helpful when the underlying dynamical system is known to have periodicity. For example, see Figure 1(A). See [12] for details.\n\n4 The Algorithm\nThe estimation procedure in Section 3.1 does not enforce stability in \u02c6A. To account for stability, we first formulate the dynamics matrix learning problem as a quadratic program with a feasible set that includes the set of stable dynamics matrices. Then we demonstrate how instability in its solutions can be used to generate constraints that restrict this feasible set appropriately. 
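The subspace estimates of Sections 3.1 and 3.2 can be sketched in NumPy as follows (a simplified illustration of the SVD-based variant described above; `hankel_obs` and `subspace_id` are hypothetical helper names, and practical systems use richer subspace identification variants [10]):

```python
import numpy as np

def hankel_obs(Y, d):
    """d-observation block Hankel matrix: column t stacks y_t, ..., y_{t+d-1}."""
    m, T = Y.shape
    tau = T - d + 1
    return np.vstack([Y[:, i:i + tau] for i in range(d)])

def subspace_id(Y, n, d=1):
    """SVD-based subspace estimates: D ~= U Sigma V^T truncated to n components,
    C_hat = first m rows of U, X_hat = Sigma V^T (Eq. 2), and A_hat by least
    squares on the estimated state sequence (Eq. 3)."""
    m = Y.shape[0]
    D = hankel_obs(Y, d)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]
    C_hat = U[:m, :]                 # observation model (first m rows when d > 1)
    X_hat = np.diag(s) @ Vt          # column t estimates the state at time t
    A_hat = X_hat[:, 1:] @ np.linalg.pinv(X_hat[:, :-1])
    return A_hat, C_hat, X_hat
```

Since X\u0302 is a similarity transform of the true state sequence in the noise-free case, A\u0302 recovers the true eigenvalues even though the learned states have no a priori interpretation.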
As a final step, the solution is refined to be as close as possible to the least-squares estimate while remaining stable. The overall algorithm is illustrated in Figure 2(A). We now explain the algorithm in more detail.\n\n4.1 Formulating the Objective\nThe least squares problem in Eq. (3) can be written as follows (see [12] for the derivation):\n\n\u02c6A = arg minA \u2016AX0:\u03c4\u22121 \u2212 X1:\u03c4\u2016\u00b2F = arg mina { aTPa \u2212 2qTa + r }    (4)\n\nwhere a \u2208 Rn\u00b2\u00d71, q \u2208 Rn\u00b2\u00d71, P \u2208 Rn\u00b2\u00d7n\u00b2 and r \u2208 R are defined as:\n\na = vec(A) = [A11 A21 A31 \u00b7\u00b7\u00b7 Ann]T,    P = In \u2297 (X0:\u03c4\u22121 X0:\u03c4\u22121T),    q = vec(X0:\u03c4\u22121 X1:\u03c4T),    r = tr(X1:\u03c4T X1:\u03c4)    (5)\n\nIn is the n \u00d7 n identity matrix and \u2297 denotes the Kronecker product. Note that P is a symmetric nonnegative-definite matrix. The objective function in (4) is a quadratic function of a.\n\n4.2 Generating Constraints\nThe quadratic objective function above is equivalent to the least squares problem of Eq. (3). Its feasible set is the space of all n \u00d7 n matrices, regardless of their stability. When its solution yields an unstable matrix, the spectral radius of \u02c6A (i.e. |\u03bb1(\u02c6A)|) is greater than 1. Ideally we would like to use \u02c6A to calculate a convex constraint on the spectral radius. However, consider the class of 2 \u00d7 2 matrices [16]: E\u03b1,\u03b2 = [ 0.3 \u03b1 ; \u03b2 0.3 ]. The matrices E10,0 and E0,10 are stable with \u03bb1 = 0.3, but their convex combination \u03b3E10,0 + (1 \u2212 \u03b3)E0,10 is unstable for (e.g.) \u03b3 = 0.5 (Figure 2(B)). 
This shows that the set of stable matrices is non-convex for n = 2, and in fact this is true for all n > 1. We turn instead to the largest singular value, which is a closely related quantity since [15]:\n\n\u03c3min(\u02c6A) \u2264 |\u03bbi(\u02c6A)| \u2264 \u03c3max(\u02c6A)    \u2200i = 1, . . . , n\n\nTherefore every unstable matrix has a singular value greater than one, but the converse is not necessarily true. Moreover, the set of matrices with \u03c31 \u2264 1 is convex. Figure 2(A) conceptually depicts the non-convex region of stability S\u03bb and the convex region S\u03c3 with \u03c31 \u2264 1 in the space of all n \u00d7 n matrices for some fixed n. The difference between S\u03c3 and S\u03bb can be significant. Figure 2(B) depicts these regions for E\u03b1,\u03b2 with \u03b1, \u03b2 \u2208 [\u221210, 10]. The stable matrices E10,0 and E0,10 reside at the edges of the figure. While results for this class of matrices vary, the constraint generation algorithm described below is able to learn a nearly optimal model from a noisy state sequence of \u03c4 = 7 simulated from E0,10, with better state reconstruction error than LB-1 and LB-2.\n\nLet \u02c6A = \u02dcU\u02dc\u03a3\u02dcV T by SVD, where \u02dcU = [\u02dcu1 \u00b7\u00b7\u00b7 \u02dcun] and \u02dcV = [\u02dcv1 \u00b7\u00b7\u00b7 \u02dcvn] have the singular vectors as columns, and \u02dc\u03a3 = diag{\u02dc\u03c31, . . . , \u02dc\u03c3n}. Then:\n\n\u02c6A = \u02dcU\u02dc\u03a3\u02dcV T \u21d2 \u02dc\u03a3 = \u02dcUT\u02c6A\u02dcV \u21d2 \u02dc\u03c31(\u02c6A) = \u02dcu1T\u02c6A\u02dcv1    (6)\n\nTherefore, instability of \u02c6A implies that:\n\n\u02dc\u03c31 > 1 \u21d2 \u02dcu1T\u02c6A\u02dcv1 > 1 \u21d2 tr(\u02dcu1T\u02c6A\u02dcv1) > 1 \u21d2 tr(\u02dcv1\u02dcu1T\u02c6A) > 1 \u21d2 gT\u02c6a > 1    (7)\n\nHere g = vec(\u02dcu1\u02dcv1T). Since Eq. (7) arose from an unstable solution of Eq. 
(4), g is a hyperplane separating \u02c6a from the space of matrices with \u03c31 \u2264 1. We use the negation of Eq. (7) as a constraint:\n\ngT\u02c6a \u2264 1    (8)\n\n4.3 Computing the Solution\nThe overall quadratic program can be stated as:\n\nminimize    aTPa \u2212 2qTa + r\nsubject to    Ga \u2264 h    (9)\n\nwith a, P, q and r as defined in Eqs. (5). {G, h} define the set of constraints, and are initially empty. The QP is invoked repeatedly until the stable region, i.e. S\u03bb, is reached. At each iteration, we calculate a linear constraint of the form in Eq. (8), add the corresponding gT as a row in G, and augment h with 1. Note that we will almost always stop before reaching the feasible region S\u03c3. Once a stable matrix is obtained, it is possible to refine this solution. We know that the last constraint caused our solution to cross the boundary of S\u03bb, so we interpolate between the last solution and the previous iteration\u2019s solution using binary search to look for a boundary of the stable region, in order to obtain a better objective value while remaining stable. An interpolation could be attempted between the least squares solution and any stable solution. However, the stable region can be highly complex, and there may be several folds and boundaries of the stable region in the interpolated area. In our experiments (not shown), interpolating from the LB-1 solution yielded worse results.\n\n5 Experiments\nFor learning the dynamics matrix, we implemented1 least squares, constraint generation (using quadprog), LB-1 [1] and LB-2 [2] (using CVX with SeDuMi) in Matlab on a 3.2 GHz Pentium with 2 GB RAM. Note that these algorithms give a different result from the basic least-squares system identification algorithm only in situations where the least-squares model is unstable. 
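The loop of Sections 4.2 and 4.3 (solve, derive a constraint from the top singular vectors, re-solve, then refine by binary-search interpolation) can be sketched as follows. This is a hedged re-implementation in NumPy/SciPy rather than the authors' Matlab code: SLSQP merely stands in for a QP solver such as quadprog, and the flattening here stores A row-major, which is consistent as long as it is used for both the objective and the constraints.

```python
import numpy as np
from scipy.optimize import minimize

def spectral_radius(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

def learn_stable_dynamics(X0, X1, tol=1e-7, max_iter=50):
    """Constraint generation for min ||A X0 - X1||_F^2 subject to stability.
    Each unstable iterate contributes one linear constraint g.a <= 1 (Eq. 8)."""
    n = X0.shape[0]
    G = []                                          # accumulated constraint rows

    def objective(a):
        resid = a.reshape(n, n) @ X0 - X1
        return np.sum(resid * resid)

    a = (X1 @ np.linalg.pinv(X0)).ravel()           # unconstrained least squares
    a_prev = None
    for _ in range(max_iter):
        if spectral_radius(a.reshape(n, n)) <= 1 + tol:
            break                                   # stable: stop before S_sigma
        U, _, Vt = np.linalg.svd(a.reshape(n, n))
        G.append(np.outer(U[:, 0], Vt[0]).ravel())  # g from u1, v1 (Eq. 7)
        cons = [{'type': 'ineq', 'fun': lambda a, g=g: 1.0 - g @ a} for g in G]
        a_prev = a
        a = minimize(objective, a, constraints=cons, method='SLSQP').x
    if a_prev is not None:                          # refinement (Sec. 4.3):
        lo, hi = 0.0, 1.0                           # fraction of step back toward
        for _ in range(30):                         # the previous (unstable) iterate
            mid = 0.5 * (lo + hi)
            cand = (1 - mid) * a + mid * a_prev
            if spectral_radius(cand.reshape(n, n)) <= 1 + tol:
                lo = mid
            else:
                hi = mid
        a = (1 - lo) * a + lo * a_prev
    return a.reshape(n, n)
```

Because the algorithm stops as soon as the iterate is stable, the returned matrix generally lies outside S\u03c3, which is exactly what makes it less conservative than LB-1.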
However, least-squares LDSs trained in scarce-data scenarios are unstable for almost any domain, and some domains lead to unstable models up to the limit of available data (e.g. the steam dynamic textures in Section 5.1). The goals of our experiments are to: (1) examine the state evolution and simulated observations of models learned using our method, and compare them to previous work; and (2) compare the algorithms in terms of reconstruction error and efficiency. The error metric used for the quantitative experiments when evaluating matrix A\u2217 is\n\nex(A\u2217) = 100 \u00d7 (J\u00b2(A\u2217) \u2212 J\u00b2(\u02c6A)) / J\u00b2(\u02c6A)    (10)\n\ni.e. percent increase in squared reconstruction error compared to least squares, with J(\u00b7) as defined in Eq. (4). We apply these algorithms to learning dynamic textures from the vision domain (Section 5.1), as well as OTC drug sales counts and sunspot numbers (Section 5.2).\n\n1Source code is available at http://www.select.cs.cmu.edu/projects/stableLDS\n\n\fFigure 3: Dynamic textures. A. Samples from the original steam sequence and the fountain sequence. B. State evolution of synthesized sequences over 1000 frames (steam top, fountain bottom). The least squares solutions display instability as time progresses. The solutions obtained using LB-1 remain stable for the full 1000 frame image sequence. The constraint generation solutions, however, yield state sequences that are stable over the full 1000 frame image sequence without significant damping. C. Samples drawn from a least-squares synthesized sequence (top), and samples drawn from a constraint generation synthesized sequence (bottom). Images for LB-1 are not shown. 
The constraint generation synthesized steam sequence is qualitatively better looking than\nthe steam sequence generated by LB-1, although there is little qualitative difference between the\ntwo synthesized fountain sequences.\nLB-1\u2217\nLB-1\nsteam (n = 10)\n0.993\n1.000\n103.3\n95.87\nsteam (n = 20)\n\nLB-1\nfountain (n = 10)\n0.987\n1.000\n4.1\n15.43\nfountain (n = 20)\n\n1.000\n1.034\n546.9\n0.50\n\n0.993\n1.000\n103.3\n3.77\n\n0.987\n1.000\n4.1\n1.09\n\n0.999\n1.051\n0.1\n0.15\n\nLB-2\n\nCG\n\nCG\n\n1.000\n1.036\n45.2\n0.45\n\nLB-1\u2217\n\n|\u03bb1|\n\u03c31\nex(%)\ntime\n|\u03bb1|\n\u03c31\nex(%)\ntime\n|\u03bb1|\n\u03c31\nex(%)\ntime\n\n0.999 \u2014\n0.990\n1.037 \u2014\n1.000\n58.4\n\u2014\n154.7\n2.37\n\u2014\n1259.6\nsteam (n = 40)\n\n1.000 \u2014\n1.120 \u2014\n20.24 \u2014\n5.85\n\n0.989\n1.000\n282.7\n\n\u2014 79516.98\n\n0.999\n1.062\n294.8\n33.55\n\n1.000\n1.128\n768.5\n289.79\n\nLB-2\n\n0.997\n1.054\n3.0\n0.49\n\n0.996\n1.056\n22.3\n5.13\n\n0.999 \u2014\n1.054 \u2014\n1.2\n\u2014\n1.63\n\u2014\n\n0.988\n1.000\n5.0\n\n159.85\n\nfountain (n = 40)\n\n1.000 \u2014\n1.034 \u2014\n3.3\n\u2014\n61.9 \u2014 43457.77\n\n0.991\n1.000\n4.8\n\n1.000\n1.172\n21.5\n239.53\n\nTable 1: Quantitative results on the dynamic textures data for different numbers of states n. CG is our\nalgorithm, LB-1and LB-2 are competing algorithms, and LB-1\u2217 is a simulation of LB-1 using our\nalgorithm by generating constraints until we reach S\u03c3, since LB-1 failed for n > 10 due to memory\nlimits. ex is percent difference in squared reconstruction error as de\ufb01ned in Eq. (10). 
Constraint generation, in all cases, has lower error and faster runtime.\n\n\f5.1 Stable Dynamic Textures\nDynamic textures in vision can intuitively be described as models for sequences of images that exhibit some form of low-dimensional structure and recurrent (though not necessarily repeating) characteristics, e.g. fixed-background videos of rising smoke or flowing water. Treating each frame of a video as an observation vector of pixel values yt, we learned dynamic texture models of two video sequences: the steam sequence, composed of 120 \u00d7 170 pixel images, and the fountain sequence, composed of 150 \u00d7 90 pixel images, both of which originated from the MIT temporal texture database (Figure 3(A)). We use parameters \u03c4 = 80, n = 15, and d = 10. Note that the state sequence we learn has no a priori interpretation.\n\nAn LDS model of a dynamic texture may synthesize an \u201cinfinitely\u201d long sequence of images by driving the model with zero mean Gaussian noise. Each of our two models uses an 80 frame training sequence to generate 1000 sequential images in this way. To better visualize the difference between image sequences generated by least-squares, LB-1, and constraint generation, the evolution of each method\u2019s state is plotted over the course of the synthesized sequences (Figure 3(B)). Sequences generated by the least squares models appear to be unstable, and this was in fact the case; both the steam and the fountain sequences resulted in unstable dynamics matrices. 
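Synthesis by driving a learned model with zero-mean Gaussian noise can be sketched as follows (a minimal illustration with isotropic state noise; `synthesize` is a hypothetical helper name, and a full model would draw the noise from the estimated covariances Q\u0302 and R\u0302 instead):

```python
import numpy as np

def synthesize(A, C, x0, T, q_scale=0.01, rng=None):
    """Generate T observations from x_{t+1} = A x_t + w_t, y_t = C x_t,
    with isotropic Gaussian state noise w_t (an assumption of this sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = A.shape[0]
    x, frames = np.asarray(x0, dtype=float), []
    for _ in range(T):
        frames.append(C @ x)
        x = A @ x + q_scale * rng.standard_normal(n)
    return np.column_stack(frames)   # m x T: one synthesized frame per column
```

With a stable A the synthesized sequence stays bounded for arbitrary T; with an unstable A the frames blow up, which is the degeneration visible in the least-squares sequences of Figure 3.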
Conversely, the\nconstrained subspace identi\ufb01cation algorithms all produced well-behaved sequences of states and\nstable dynamics matrices (Table 1), although constraint generation demonstrates the fastest runtime,\nbest scalability, and lowest error of any stability-enforcing approach.\n\nA qualitative comparison of images generated by constraint generation and least squares (Fig-\nure 3(C)) indicates the effect of instability in synthesized sequences generated from dynamic texture\nmodels. While the unstable least-squares model demonstrates a dramatic increase in image contrast\nover time, the constraint generation model continues to generate qualitatively reasonable images.\nQualitative comparisons between constraint generation and LB-1 indicate that constraint generation\nlearns models that generate more natural-looking video sequences2 than LB-1.\nTable 1 demonstrates that constraint generation always has the lowest error as well as the fastest\nruntime. The running time of constraint generation depends on the number of constraints needed to\nreach a stable solution. Note that LB-1 is more ef\ufb01cient and scalable when simulated using constraint\ngeneration (by adding constraints until S\u03c3 is reached) than it is in its original SDP formulation.\n5.2 Stable Baseline Models for Biosurveillance\nWe examine daily counts of OTC drug sales in pharmacies, obtained from the National Data Retail\nMonitor (NDRM) collection [17]. The counts are divided into 23 different categories and are tracked\nseparately for each zipcode in the country. We focus on zipcodes from a particular American city.\nThe data exhibits 7-day periodicity due to differential buying patterns during the week. 
We isolate a 60-day subsequence where the data dynamics remain relatively stationary, and attempt to learn LDS parameters to be able to simulate sequences of baseline values for use in detecting anomalies.\n\nWe perform two experiments on different aggregations of the OTC data, with parameter values n = 7, d = 7 and \u03c4 = 14. Figure 4(A) plots 22 different drug categories aggregated over all zipcodes, and Figure 4(B) plots a single drug category (cough/cold) in 29 different zipcodes separately. In both cases, constraint generation is able to use very little training data to learn a stable model that captures the periodicity in the data, while the least squares model is unstable and its predictions diverge over time. LB-1 learns a model that is stable but overconstrained, and the simulated observations quickly drift from the correct magnitudes. We also tested the algorithms on the sunspots data (Figure 4(C)) with parameters n = 7, d = 18 and \u03c4 = 50, with similar results. Quantitative results on both these domains exhibit similar trends to those in Table 1.\n\n6 Discussion\nWe have introduced a novel method for learning stable linear dynamical systems. Our constraint generation algorithm is more powerful than previous methods in the sense of optimizing over a larger set of stable matrices with a suitable objective function. The constraint generation approach also has the benefit of being faster than previous methods in nearly all of our experiments. One possible extension is to modify the EM algorithm for LDSs to incorporate constraint generation into the M-step in order to learn stable systems that locally maximize the observed data likelihood. Stability could also be of advantage in planning applications.\n\n2See videos at http://www.select.cs.cmu.edu/projects/stableLDS\n\n\fFigure 4: (A): 60 days of data for 22 drug categories aggregated over all zipcodes in the city. 
(B): 60 days of data for a single drug category (cough/cold) for all 29 zipcodes in the city. (C): Sunspot numbers for 200 years, separately for each of the 12 months. The training data (top), simulated output from constraint generation, output from the unstable least squares model, and output from the over-damped LB-1 model (bottom).\n\nAcknowledgements\n\nThis paper is based on work supported by DARPA under the Computer Science Study Panel program (authors GJG and BEB), the NSF under Grant Nos. EEC-0540865 (author BEB) and IIS-0325581 (author SMS), and the CDC under award 8-R01-HK000020-02, \u201cEfficient, scalable multisource surveillance algorithms for Biosense\u201d (author SMS).\n\nReferences\n[1] Seth L. Lacy and Dennis S. Bernstein. Subspace identification with guaranteed stability using constrained optimization. In Proc. American Control Conference, 2002.\n[2] Seth L. Lacy and Dennis S. Bernstein. Subspace identification with guaranteed stability using constrained optimization. IEEE Transactions on Automatic Control, 48(7):1259\u20131263, July 2003.\n[3] R.E. Kalman. A new approach to linear filtering and prediction problems. Trans. ASME\u2013JBE, 1960.\n[4] L. Ljung. System Identification: Theory for the user. Prentice Hall, 2nd edition, 1999.\n[5] Zoubin Ghahramani and Geoffrey E. Hinton. Parameter estimation for Linear Dynamical Systems. Technical Report CRG-TR-96-2, U. of Toronto, Department of Comp. Sci., 1996.\n[6] N. L. C. Chui and J. M. Maciejowski. Realization of stable models with subspace methods. Automatica, 32(100):1587\u20131595, 1996.\n[7] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.\n[8] S. Soatto, G. Doretto, and Y. Wu. Dynamic Textures. Intl. Conf. on Computer Vision, 2001.\n[9] E. Keogh and T. Folias. The UCR Time Series Data Mining Archive, 2002.\n[10] P. Van Overschee and B. De Moor. 
Subspace Identification for Linear Systems: Theory, Implementation, Applications. Kluwer, 1996.\n[11] T. Van Gestel, J. A. K. Suykens, P. Van Dooren, and B. De Moor. Identification of stable models in subspace identification by using regularization. IEEE Transactions on Automatic Control, 2001.\n[12] Sajid M. Siddiqi, Byron Boots, and Geoffrey J. Gordon. A Constraint Generation Approach to Learning Stable Linear Dynamical Systems. Technical Report CMU-ML-08-101, CMU, 2008.\n[13] H. Rauch. Solutions to the linear smoothing problem. IEEE Transactions on Automatic Control, 1963.\n[14] Kevin Murphy. Dynamic Bayesian Networks. PhD thesis, UC Berkeley, 2002.\n[15] Roger Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.\n[16] Andrew Y. Ng and H. Jin Kim. Stable adaptive control with online learning. In Proc. NIPS, 2004.\n[17] M. Wagner. A national retail data monitor for public health surveillance. Morbidity and Mortality Weekly Report, 53:40\u201342, 2004.\n", "award": [], "sourceid": 346, "authors": [{"given_name": "Byron", "family_name": "Boots", "institution": null}, {"given_name": "Geoffrey", "family_name": "Gordon", "institution": null}, {"given_name": "Sajid", "family_name": "Siddiqi", "institution": null}]}