{"title": "A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 665, "page_last": 672, "abstract": null, "full_text": "A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems\n\nYoshinobu Kawahara Dept. of Aeronautics & Astronautics The University of Tokyo\n\nTakehisa Yairi Kazuo Machida Research Center for Advanced Science and Technology The University of Tokyo\n\nKomaba 4-6-1, Meguro-ku, Tokyo, 153-8904 JAPAN {kawahara,yairi,machida}@space.rcast.u-tokyo.ac.jp\n\nAbstract\nIn this paper, we present a subspace method for learning nonlinear dynamical systems based on stochastic realization, in which state vectors are chosen using kernel canonical correlation analysis, and state-space systems are then identified through regression with the state vectors. We construct the theoretical underpinning and derive a concrete algorithm for nonlinear identification. The obtained algorithm needs no iterative optimization procedure and can be implemented with fast and reliable numerical schemes. A simulation result shows that our algorithm can express dynamics with a high degree of accuracy.\n\n1 Introduction\n\nLearning dynamical systems is an important problem in several fields, including engineering, physical science and social science. The objectives range from controlling target systems to analyzing their dynamic characteristics, and system identification, which acquires mathematical models from observed input-output data, has been studied for several decades in numerous fields such as systems control. There are basically two approaches to learning dynamical systems. The first approach minimizes a suitable distance function between the data and a chosen model class. 
Well-known and widely accepted examples of such functions are likelihood functions [1] and the average squared prediction error of observed data. For multivariate models, however, this approach is known to have several drawbacks. First, the optimization tends to lead to an ill-conditioned estimation problem because of over-parameterization: no minimal parameterization (canonical form) covers all multivariate systems. Second, except in trivial cases, the minimization can only be carried out numerically with iterative algorithms, so there is no guarantee of reaching the global minimum, and the computational cost is often high. The second approach is the subspace method, which involves geometric operations on subspaces spanned by the column or row vectors of certain block Hankel matrices formed from input-output data [2,3]. It is well known that subspace methods require no a priori choice of identifiable parameterizations and can be implemented with fast and reliable numerical schemes. The subspace method has been actively researched over the last few decades, and several algorithms have been proposed. Representative examples are based on the orthogonal decomposition of input-output data [2,4] and on stochastic realization using canonical correlation analysis [5]. Recently, nonlinear extensions have begun to be discussed for learning systems that cannot be modeled sufficiently well with linear expressions. However, the nonlinear algorithms proposed to date either assume models with specific nonlinearities [6] or require complicated nonlinear regression [7,8].\n\nURL: www.space.rcast.u-tokyo.ac.jp/kawahara/index e.html\n\n
In this study, we extend the stochastic-realization-based subspace method [5] to the nonlinear regime by developing it on reproducing kernel Hilbert spaces [9]. We then derive a nonlinear subspace identification algorithm that can be executed by a procedure similar to that of the linear case. The outline of this paper is as follows. Section 2 gives the theoretical material for the subspace identification of dynamical systems with reproducing kernels. In Section 3, we introduce approximations for deriving a practical algorithm, which is described in detail in Section 4. Finally, an empirical result is presented in Section 5, and conclusions are given in Section 6.\n\nNotation Let $x$, $y$ and $z$ be random vectors. The covariance matrix of $x$ and $y$ is denoted by $\\Sigma_{xy}$, and the conditional covariance matrix of $x$ and $y$ conditioned on $z$ is denoted by $\\Sigma_{xy|z}$. Let $a$ be a vector in a Hilbert space, and let $\\mathcal{B}$ and $\\mathcal{C}$ be Hilbert spaces. The orthogonal projection of $a$ onto $\\mathcal{B}$ is denoted by $a/\\mathcal{B}$, and the oblique projection of $a$ onto $\\mathcal{B}$ along $\\mathcal{C}$ is denoted by $a/_{\\mathcal{C}}\\mathcal{B}$. For an $[m \\times n]$ matrix $A$, $\\mathcal{L}\\{A\\} := \\{A\\alpha \\mid \\alpha \\in \\mathbb{R}^n\\}$ is referred to as the column space and $\\mathcal{L}\\{A^\\top\\} := \\{A^\\top\\beta \\mid \\beta \\in \\mathbb{R}^m\\}$ as the row space of $A$. $(\\cdot)^\\top$ denotes the transpose of a matrix $(\\cdot)$, and $I_d \\in \\mathbb{R}^{d \\times d}$ is the identity matrix.\n\n2 Rationales\n\n2.1 Problem Description and Some Definitions\n\nConsider two discrete-time wide-sense stationary vector processes $\\{u(t), y(t), t = 0, \\pm 1, \\pm 2, \\dots\\}$ with dimensions $n_u$ and $n_y$, respectively. The first component $u(t)$ models the input signal, and the second component $y(t)$ models the output of the unknown stochastic system. We want to construct this system from observed input-output data as a nonlinear state-space system:\n$$\\begin{aligned} x(t+1) &= g(x(t), u(t)) + v, \\\\ y(t) &= h(x(t), u(t)) + w, \\end{aligned} \\tag{1}$$\nwhere $x(t) \\in \\mathbb{R}^n$ is the state vector, and $v$ and $w$ are the system and observation noises. Throughout this paper, we assume that the joint process $(u, y)$ is a stationary and purely nondeterministic full-rank process [3,5,10]. 
It is also assumed that the two processes are zero-mean and have finite joint covariance matrices. A basic step in solving this realization problem, which is also the core of the subspace identification algorithm presented later, is the construction of a state space of the system. In this paper, we derive a practical algorithm for this problem based on stochastic realization with reproducing kernel Hilbert spaces. We denote the joint input-output process by $w(t) = [y(t)^\\top, u(t)^\\top]^\\top \\in \\mathbb{R}^{n_w}$ ($n_w = n_u + n_y$), and the feature maps by $\\phi_u: \\mathcal{U}_t \\to \\mathcal{F}_u \\subset \\mathbb{R}^{n_{\\phi_u}}$, $\\phi_y: \\mathcal{Y}_t \\to \\mathcal{F}_y \\subset \\mathbb{R}^{n_{\\phi_y}}$ and $\\phi_w: \\mathcal{W}_t \\to \\mathcal{F}_w \\subset \\mathbb{R}^{n_{\\phi_w}}$, with the Mercer kernels $k_u$, $k_y$ and $k_w$. Here $\\mathcal{U}_t$, $\\mathcal{Y}_t$ and $\\mathcal{W}_t$ are the Hilbert spaces generated by the second-order random variables $u(t)$, $y(t)$ and $w(t)$, and $\\mathcal{F}_u$, $\\mathcal{F}_y$ and $\\mathcal{F}_w$ are the respective feature spaces. Moreover, we define the future output, the future input and the past input-output vectors in the feature spaces as\n$$\\begin{aligned} f(t) &:= [\\phi_y(y(t))^\\top, \\phi_y(y(t+1))^\\top, \\dots, \\phi_y(y(t+l-1))^\\top]^\\top \\in \\mathbb{R}^{l n_{\\phi_y}}, \\\\ u_+(t) &:= [\\phi_u(u(t))^\\top, \\phi_u(u(t+1))^\\top, \\dots, \\phi_u(u(t+l-1))^\\top]^\\top \\in \\mathbb{R}^{l n_{\\phi_u}}, \\\\ p(t) &:= [\\phi_w(w(t-1))^\\top, \\phi_w(w(t-2))^\\top, \\dots]^\\top \\in \\mathbb{R}^{\\infty}, \\end{aligned} \\tag{2}$$\nand the Hilbert spaces generated by these random variables as\n$$\\mathcal{P}_t = \\mathrm{span}\\{\\phi_w(w(\\tau)) \\mid \\tau < t\\}, \\quad \\mathcal{U}_t^{+} = \\mathrm{span}\\{\\phi_u(u(\\tau)) \\mid \\tau \\geq t\\}, \\quad \\mathcal{Y}_t^{+} = \\mathrm{span}\\{\\phi_y(y(\\tau)) \\mid \\tau \\geq t\\}. \\tag{3}$$\n$\\mathcal{U}_t^{-}$ and $\\mathcal{Y}_t^{-}$ are defined similarly. These spaces are assumed to be closed with respect to the root-mean-square norm $\\|\\xi\\| := [E\\{|\\xi|^2\\}]^{1/2}$, where $E\\{\\cdot\\}$ denotes the expectation. They can therefore be regarded as Hilbert subspaces of an ambient Hilbert space $\\mathcal{H} := \\mathcal{U} \\vee \\mathcal{Y}$ containing all linear functionals of the joint process in the feature spaces $(\\phi_u(u), \\phi_y(y))$.\n\n2.2 Optimal Predictor in Kernel Feature Space\n\nFirst, we require the following technical assumptions [3,5].\n\nFigure 1: Optimal predictor $\\hat f(t)$ of the future output in feature space based on $\\mathcal{P}_t \\vee \\mathcal{U}_t^{+}$ (diagram of $f(t)$, $\\hat f(t)$, $p(t)$, $\\mathcal{P}_t$ and $\\mathcal{U}_t^{+}$).\n\nAssumption 1. The input $u$ is 'exogenous', i.e., there is no feedback from the output $y$ to the input $u$.\n\nAssumption 2. 
The input process is 'sufficiently rich'. More precisely, at each time $t$, the input space $\\mathcal{U}_t$ has the direct sum decomposition $\\mathcal{U}_t = \\mathcal{U}_t^{-} + \\mathcal{U}_t^{+}$ ($\\mathcal{U}_t^{-} \\cap \\mathcal{U}_t^{+} = \\{0\\}$).\n\nNote that Assumption 2 implies that the input process is purely nondeterministic and admits a spectral density matrix without zeros on the unit circle (i.e., it is coercive). This is too restrictive in many practical situations. Instead, we can assume only a persistently exciting (PE) condition of sufficiently high order, together with finite dimensionality of an underlying \"true\" system, from the outset. We can then give the following proposition, which enables us to develop a subspace method in feature space as in the linear case.\n\nProposition 1. If Assumptions 1 and 2 are satisfied, then the corresponding conditions in the feature spaces are also fulfilled: (1) there is no feedback from $\\phi_y(y)$ to $\\phi_u(u)$; (2) $\\mathcal{U}_t$ has the direct sum decomposition $\\mathcal{U}_t = \\mathcal{U}_t^{-} + \\mathcal{U}_t^{+}$ ($\\mathcal{U}_t^{-} \\cap \\mathcal{U}_t^{+} = \\{0\\}$).\n\nProof. Condition (2) follows straightforwardly from Assumption 2 and the properties of reproducing kernel Hilbert spaces. The condition $\\mathcal{U}_t^{+} \\perp \\mathcal{Y}_t^{-} \\mid \\mathcal{U}_t^{-}$ (derived from Assumption 1) is equivalent to $\\mathcal{Y}_t^{-} /_{\\mathcal{U}_t^{+}} \\mathcal{U}_t^{-} = \\mathcal{Y}_t^{-} / \\mathcal{U}_t^{-}$. Denoting the orthogonal complement of $\\mathcal{U}_t$ by $\\mathcal{U}_t^{\\perp}$, we therefore obtain $\\mathcal{Y}_t^{-} = \\mathcal{U}_t^{-} + \\mathcal{U}_t^{\\perp}$. Now, when $\\mathcal{Y}_t^{-}$ is represented using the input space in feature space $\\mathcal{U}_t$ and its orthogonal complement $\\mathcal{U}_t^{\\perp}$, we can again write $\\mathcal{Y}_t^{-} = \\mathcal{U}_t^{-} + \\mathcal{U}_t^{\\perp}$. This holds because $\\mathcal{U}_t = \\mathcal{U}_t^{-} + \\mathcal{U}_t^{+}$ by condition (2), because $\\mathcal{U}_t^{+} \\perp \\mathcal{U}_t^{\\perp}$, and owing to the properties of reproducing kernel Hilbert spaces. Therefore, condition (1), i.e., $\\mathcal{U}_t^{+} \\perp \\mathcal{Y}_t^{-} \\mid \\mathcal{U}_t^{-}$ in feature space, follows by tracing the argument in reverse.\n\nUsing Proposition 1, we now obtain the following representation result.\n\nTheorem 1. 
Under Assumptions 1 and 2, the optimal predictor $\\hat f(t)$ of the future output vector in feature space $f(t)$ based on $\\mathcal{P}_t \\vee \\mathcal{U}_t^{+}$ is uniquely given by the sum of the oblique projections\n$$\\hat f(t) = f(t)/(\\mathcal{P}_t \\vee \\mathcal{U}_t^{+}) = \\Pi\\, p(t) + \\Lambda\\, u_+(t), \\tag{4}$$\nin which $\\Pi$ and $\\Lambda$ satisfy the discrete Wiener-Hopf-type equations\n$$\\Pi\\, \\Sigma_{pp|u} = \\Sigma_{fp|u}, \\qquad \\Lambda\\, \\Sigma_{uu|p} = \\Sigma_{fu|p}. \\tag{5}$$\n\nProof. By Proposition 1, the proof can be carried out as in the linear case (cf. [3,5]).\n\n2.3 Construction of State Vector\n\nLet $L_f$ and $L_p$ be square root matrices of $\\Sigma_{ff|u}$ and $\\Sigma_{pp|u}$, i.e., $\\Sigma_{ff|u} = L_f L_f^\\top$ and $\\Sigma_{pp|u} = L_p L_p^\\top$. Assume that the SVD of the normalized conditional covariance is given by\n$$L_f^{-1} \\Sigma_{fp|u} (L_p^{-1})^\\top = U S V^\\top, \\tag{6}$$\nwhere $U$ and $V$ are square orthogonal matrices, and $S \\in \\mathbb{R}^{l n_{\\phi_y} \\times n_p}$ is zero except on its leading diagonal. The diagonal entries $\\sigma_i$ satisfy $\\sigma_1 \\geq \\dots \\geq \\sigma_n > 0$ for $n = \\min(l n_{\\phi_y}, n_p)$.\n\nWe define the extended observability and controllability matrices\n$$\\mathcal{O} := L_f U S^{1/2}, \\qquad \\mathcal{C} := S^{1/2} V^\\top L_p^\\top, \\tag{7}$$\nwhere $\\mathrm{rank}(\\mathcal{O}) = \\mathrm{rank}(\\mathcal{C}) = n$. Then, from the SVD (6), the block Hankel matrix $\\Sigma_{fp|u}$ has the classical rank factorization $\\Sigma_{fp|u} = \\mathcal{O}\\mathcal{C}$. Now define a 'state vector' as the $n$-dimensional vector\n$$x(t) = \\mathcal{C}\\, \\Sigma_{pp|u}^{-1} p(t) = S^{1/2} V^\\top L_p^{-1} p(t). \\tag{8}$$\nIt is readily seen that $x(t)$ is a basis for the stationary oblique predictor space $\\mathcal{X}_t := \\mathcal{Y}_t^{+} /_{\\mathcal{U}_t^{+}} \\mathcal{P}_t$. On the basis of general geometric principles, this space can be shown to be a minimal state space for the process $\\phi_y(y)$, as in the linear case [3,5]. Two further facts confirm this. First, using Eqs. (5), (7) and (8), the oblique projection of $f(t)$ onto $\\mathcal{P}_t$ along $\\mathcal{U}_t^{+}$ can be expressed as\n$$f(t)/_{\\mathcal{U}_t^{+}} \\mathcal{P}_t = \\Pi\\, p(t) = \\Sigma_{fp|u} \\Sigma_{pp|u}^{-1} p(t) = \\mathcal{O} x(t), \\tag{9}$$\nwith $\\mathrm{rank}(\\mathcal{O}) = n$. Second, the covariance matrix of $x(t)$ is nonsingular. In terms of $x(t)$, the optimal predictor $\\hat f(t)$ in Eq. (4) takes the form\n$$\\hat f(t) = \\mathcal{O} x(t) + \\Lambda\\, u_+(t). \\tag{10}$$\n\nThus, $x(t)$ is a conditional minimal sufficient statistic carrying exactly the information contained in $\\mathcal{P}_t$ that is needed to estimate the future outputs, given the future inputs. In analogy with the linear case [3,5], the output process in feature space $\\phi_y(y(t))$ now admits a minimal stochastic realization with the state vector $x(t)$ of the form\n$$\\begin{aligned} x(t+1) &= A x(t) + B \\phi_u(u(t)) + K e(t), \\\\ \\phi_y(y(t)) &= C x(t) + D \\phi_u(u(t)) + e(t), \\end{aligned} \\tag{11}$$\nwhere $A \\in \\mathbb{R}^{n \\times n}$, $B \\in \\mathbb{R}^{n \\times n_{\\phi_u}}$, $C \\in \\mathbb{R}^{n_{\\phi_y} \\times n}$, $D \\in \\mathbb{R}^{n_{\\phi_y} \\times n_{\\phi_u}}$ and $K \\in \\mathbb{R}^{n \\times n_{\\phi_y}}$ are constant matrices, and $e(t) := \\phi_y(y(t)) - (\\phi_y(y(t)) \\mid \\mathcal{P}_t \\vee \\mathcal{U}_t)$ is the prediction error.\n\n2.4 Preimage\n\nIn this section, we describe a state-space model for the output $y(t)$ itself. The model (11), derived in the previous section, instead represents the output in feature space, $\\phi_y(y(t))$. First, we define the feature maps $\\phi_x: \\mathcal{X}_t \\to \\mathcal{F}_x \\subset \\mathbb{R}^{n_{\\phi_x}}$ and $\\phi_u: \\mathcal{U}_t \\to \\mathcal{F}_u \\subset \\mathbb{R}^{n_{\\phi_u}}$, and the linear spaces $\\mathcal{X}_t^{\\phi}$ and $\\mathcal{U}_t^{\\phi}$ generated by $\\phi_x(x(t))$ and $\\phi_u(u(t))$. These spaces intersect only in zero, $\\mathcal{X}_t^{\\phi} \\cap \\mathcal{U}_t^{\\phi} = 0$, because $\\mathcal{X}_t \\cap \\mathcal{U}_t = 0$ and $\\phi_x$, $\\phi_u$ are bijective. Therefore, the output $y(t)$ is represented as the direct sum of the oblique projections\n$$y(t)/(\\mathcal{X}_t^{\\phi} \\vee \\mathcal{U}_t^{\\phi}) = \\bar C \\phi_x(x(t)) + \\bar D \\phi_u(u(t)). \\tag{12}$$\nAs a result, we obtain the following theorem.\n\nTheorem 2. 
Under Assumptions 1 and 2, if $\\mathrm{rank}\\, \\Sigma_{fp|u} = n$, then the output $y$ can be represented by the following state-space model:\n$$\\begin{aligned} x(t+1) &= A x(t) + B \\phi_u(u(t)) + \\bar K \\phi_e(\\bar e(t)), \\\\ y(t) &= \\bar C \\phi_x(x(t)) + \\bar D \\phi_u(u(t)) + \\bar e(t), \\end{aligned} \\tag{13}$$\nwhere $\\bar e(t) := y(t) - y(t)/(\\mathcal{X}_t^{\\phi} \\vee \\mathcal{U}_t^{\\phi})$ is the prediction error and $\\bar K := K A_e$. Here $A_e$ is the coefficient matrix of the nonlinear regression from $\\bar e(t)$ to $e(t)$.¹\n\n¹ Let $f$ be a map from $\\bar e(t)$ to $e(t)$ that minimizes a regularized risk $c((\\bar e_1, e_1, f(\\bar e_1)), \\dots, (\\bar e_m, e_m, f(\\bar e_m))) + \\Omega(\\|f\\|_{\\mathcal{H}})$. Here $\\Omega: [0, \\infty) \\to \\mathbb{R}$ is a strictly monotonically increasing function, and $c: (\\bar{\\mathcal{E}} \\times \\mathbb{R}^2)^m \\to \\mathbb{R} \\cup \\{\\infty\\}$ ($\\bar{\\mathcal{E}} = \\mathrm{span}\\{\\bar e\\}$) is an arbitrary loss function. Then, by the representer theorem [9], $f \\in \\mathrm{span}\\{\\phi_e(\\bar e(t))\\}$, where $\\phi_e$ is a feature map with the associated Mercer kernel $k_e$. Therefore, we can represent the nonlinear regression from $\\bar e(t)$ to $e(t)$ as $A_e \\phi_e(\\bar e(t))$.\n\n3 Approximations\n\n3.1 Realization with Finite Data\n\nIn practice, the state vector and the associated state-space model must be constructed from the available finite data. Let the past vector be truncated to finite length, i.e., $p_T(t) := [\\phi_w(w(t-1))^\\top, \\phi_w(w(t-2))^\\top, \\dots, \\phi_w(w(t-T))^\\top]^\\top \\in \\mathbb{R}^{T n_{\\phi_w}}$, where $T > 0$, and define $\\mathcal{P}_{[t-T,t)} := \\mathrm{span}\\{p_T(\\tau) \\mid \\tau < t\\}$. The following theorem describes the construction of the state vector and of the corresponding state-space system, which together form the finite-memory predictor $\\hat f_T(t) := f(t)/(\\mathcal{U}_t^{+} \\vee \\mathcal{P}_{[t-T,t)})$.\n\nTheorem 3. Under Assumptions 1 and 2, if $\\mathrm{rank}(\\Sigma_{fp|u}) = n$, then the process $\\phi_y(y)$ is expressed by the following nonstationary state-space model:\n$$\\begin{aligned} \\hat x_T(t+1) &= A \\hat x_T(t) + B \\phi_u(u(t)) + K(t) e_T(t), \\\\ \\phi_y(y(t)) &= C \\hat x_T(t) + D \\phi_u(u(t)) + e_T(t), \\end{aligned} \\tag{14}$$\nwhere the state vector $\\hat x_T(t)$ is a basis of the finite-memory predictor space $\\mathcal{Y}_t^{+} /_{\\mathcal{U}_t^{+}} \\mathcal{P}_{[t-T,t)}$, and $e_T(t) := \\phi_y(y(t)) - (\\phi_y(y(t)) \\mid \\mathcal{P}_{[t-T,t)} \\vee \\mathcal{U}_t^{+})$ is the prediction error.\n\nThe proof can be carried out as in the linear case (cf. [3,5]). 
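Theorem 3 applies the SVD-based state construction of Section 2.3 (Eqs. (6) to (8)) to finite data. That construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Python with NumPy. It works in explicit finite-dimensional coordinates: F holds the future outputs, P the past, and U the future inputs (for example kernel PCA scores), with one column per time instant. Covariances are replaced by sample second moments, and the square roots L_f and L_p are taken to be Cholesky factors. The helper names conditional_cov and construct_state are hypothetical.

```python
import numpy as np


def conditional_cov(Xa, Xb, Z):
    # Sample estimate of Sigma_ab|z = Sigma_ab - Sigma_az Sigma_zz^{-1} Sigma_zb
    # (rows are coordinates, columns are time samples).
    m = Xa.shape[1]
    S_ab = Xa @ Xb.T / m
    S_az = Xa @ Z.T / m
    S_zb = Z @ Xb.T / m
    S_zz = Z @ Z.T / m
    return S_ab - S_az @ np.linalg.solve(S_zz, S_zb)


def construct_state(F, P, U, n):
    # Conditional covariances given the future inputs U.
    S_ff = conditional_cov(F, F, U)
    S_pp = conditional_cov(P, P, U)
    S_fp = conditional_cov(F, P, U)
    # Square roots S_ff = L_f L_f^T and S_pp = L_p L_p^T (Cholesky factors).
    L_f = np.linalg.cholesky(S_ff)
    L_p = np.linalg.cholesky(S_pp)
    # Normalized conditional covariance L_f^{-1} S_fp (L_p^{-1})^T, Eq. (6).
    M = np.linalg.solve(L_f, S_fp) @ np.linalg.inv(L_p).T
    Uo, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep the n leading singular triplets (state dimension n).
    sqrt_s = np.diag(np.sqrt(s[:n]))
    O = L_f @ Uo[:, :n] @ sqrt_s                      # extended observability matrix, Eq. (7)
    Cc = sqrt_s @ Vt[:n, :] @ L_p.T                   # extended controllability matrix, Eq. (7)
    X = sqrt_s @ Vt[:n, :] @ np.linalg.solve(L_p, P)  # state sequence x(t), Eq. (8)
    return O, Cc, X, S_fp, S_pp


# Usage with random stand-in data: 4 future, 3 past and 2 input coordinates.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 200))
P = rng.standard_normal((3, 200))
U = rng.standard_normal((2, 200))
O, Cc, X, S_fp, S_pp = construct_state(F, P, U, n=3)
```

When n equals the smaller of the two coordinate dimensions, the factorization is exact. In that case O @ Cc reproduces S_fp, and X coincides with Cc S_pp^{-1} P, the first form of Eq. (8). A smaller n truncates the small singular values, as Step 2 of Section 4 does.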
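By Theorem 3, the approximate state vector $\\hat x_T$ can be obtained by applying the results of Section 2 to finite data. This state vector differs from $x(t)$ in Eq. (8). However, as $T \\to \\infty$, the difference between $\\hat x_T(t)$ and $x(t)$ converges to zero, and the covariance matrix $P$ of the estimation error converges to the stabilizing solution of the following algebraic Riccati equation (ARE):\n$$P = A P A^\\top + \\Sigma_{ww} - (A P C^\\top + \\Sigma_{we})(C P C^\\top + \\Sigma_{ee})^{-1}(A P C^\\top + \\Sigma_{we})^\\top. \\tag{15}$$\nMoreover, the Kalman gain $K$ converges to\n$$K = (A P C^\\top + \\Sigma_{we})(C P C^\\top + \\Sigma_{ee})^{-1}, \\tag{16}$$\nwhere $\\Sigma_{ww}$ and $\\Sigma_{ee}$ are the covariance matrices of the errors in the state and observation equations, respectively, and $\\Sigma_{we}$ is their cross-covariance.\n\n3.2 Using Kernel Principal Components\n\nLet $z$ be a random variable and $k_z$ a Mercer kernel with feature map $\\phi_z$ and feature space $\\mathcal{F}_z$. Denote $\\Phi_z := [\\phi_z(z_1), \\dots, \\phi_z(z_m)]^\\top$ and the associated Gram matrix $G_z := \\Phi_z \\Phi_z^\\top$. The first $d_z$ principal components $u_{z,i} \\in \\mathcal{L}\\{\\Phi_z^\\top\\}$ ($i = 1, \\dots, d_z$), combined in a matrix $U_z = [u_{z,1}, \\dots, u_{z,d_z}]$, form an orthonormal basis of a $d_z$-dimensional subspace $\\mathcal{L}\\{U_z\\} \\subseteq \\mathcal{L}\\{\\Phi_z^\\top\\}$. They can therefore also be written as the linear combination $U_z = \\Phi_z^\\top A_z$, where the matrix $A_z \\in \\mathbb{R}^{m \\times d_z}$ holds the expansion coefficients. $A_z$ is found, for example, from the eigendecomposition $G_z = \\Gamma_z \\Lambda_z \\Gamma_z^\\top$, taking $A_z$ to be the first $d_z$ columns of $\\Gamma_z \\Lambda_z^{-1/2}$. Then, $\\Phi_z$ expressed in terms of the principal components is given by $C_z := \\Phi_z U_z = \\Phi_z \\Phi_z^\\top A_z = G_z A_z$ [11]. From the orthogonality of $\\Gamma_z$ (i.e., $\\Gamma_z \\Gamma_z^\\top = \\Gamma_z^\\top \\Gamma_z = I_m$), we can derive the following equation:\n$$(A_z^\\top G_z G_z A_z)^{-1} = \\left( (\\Gamma_z \\Lambda_{z,d}^{-1/2})^\\top (\\Gamma_z \\Lambda_z \\Gamma_z^\\top)(\\Gamma_z \\Lambda_z \\Gamma_z^\\top)(\\Gamma_z \\Lambda_{z,d}^{-1/2}) \\right)^{-1} = \\bar A_z^\\top G_z^{-1} G_z^{-1} \\bar A_z, \\tag{17}$$\nwhere $\\Lambda_{z,d}$ is the matrix consisting of the first $d_z$ columns of $\\Lambda_z$. The matrix $\\bar A_z := \\Gamma_z \\Lambda_{z,d}^{1/2}$ satisfies $\\bar A_z^\\top A_z = A_z^\\top \\bar A_z = I_{d_z}$ and $\\bar A_z A_z^\\top = A_z \\bar A_z^\\top = I_m$. The latter identity holds exactly only when $d_z = m$; otherwise both products equal the orthogonal projector onto the first $d_z$ eigenvectors, and the identity holds approximately.\n\nThis property of kernel principal components enables us to approximate the quantities described in the previous sections in computable forms. First, using Eq. 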
(17), the conditional covariance matrix $\\Sigma_{ff|u}$ can be expressed as\n$$\\begin{aligned} \\Sigma_{ff|u} &= \\Sigma_{ff} - \\Sigma_{fu} \\Sigma_{uu}^{-1} \\Sigma_{uf} \\\\ &\\approx A_f^\\top G_f G_f A_f - (A_f^\\top G_f G_u A_u)(A_u^\\top G_u G_u A_u)^{-1}(A_u^\\top G_u G_f A_f) \\\\ &\\approx A_f^\\top \\left( G_f G_f - G_f G_u (G_u G_u)^{-1} G_u G_f \\right) A_f =: A_f^\\top \\hat\\Sigma_{ff|u} A_f, \\end{aligned} \\tag{18}$$\nwhere $\\hat\\Sigma_{ff|u}$ may be called the empirical conditional covariance operator. A regularized variant is obtained by replacing $G_f G_f$ and $G_u G_u$ with $(G_f + \\kappa I_m)^2$ and $(G_u + \\kappa I_m)^2$ ($\\kappa > 0$) (cf. [12,13]). $\\Sigma_{pp|u}$ and $\\Sigma_{fp|u}$ can be approximated in the same way by $\\hat\\Sigma_{pp|u}$ and $\\hat\\Sigma_{fp|u}$. Moreover, we use $L_\\bullet^{-1} \\approx \\hat L_\\bullet^{-1} \\bar A_\\bullet$, where $\\hat L_\\bullet$ is the square root matrix of $\\hat\\Sigma_{\\bullet\\bullet|u}$ ($\\bullet = p, f$).² With this, Eqs. (6) and (8) can be represented approximately as\n$$L_f^{-1} \\Sigma_{fp|u} (L_p^{-1})^\\top \\approx (\\hat L_f^{-1} \\bar A_f)(A_f^\\top \\hat\\Sigma_{fp|u} A_p)(\\bar A_p^\\top (\\hat L_p^{-1})^\\top) \\approx \\hat L_f^{-1} \\hat\\Sigma_{fp|u} (\\hat L_p^{-1})^\\top = \\hat U \\hat S \\hat V^\\top, \\tag{19}$$\n$$x(t) = S^{1/2} V^\\top L_p^{-1} p(t) \\approx \\hat S^{1/2} \\hat V^\\top (\\hat L_p^{-1} \\bar A_p)(A_p^\\top k(p(t))) \\approx \\hat S^{1/2} \\hat V^\\top \\hat L_p^{-1} k(p(t)), \\tag{20}$$\nwhere $k(p(t)) := [k_p(p_1(t), p(t)), \\dots, k_p(p_m(t), p(t))]^\\top$.\n\nIn addition, we can apply this kernel PCA approximation to the state-space models derived in the previous sections. First, Eq. (11) can be approximated as\n$$\\begin{aligned} x(t+1) &= A x(t) + B A_u^\\top k_u(u(t)) + K e(t), \\\\ A_y^\\top k_y(y(t)) &= C x(t) + D A_u^\\top k_u(u(t)) + e(t), \\end{aligned} \\tag{21}$$\nwhere $A_u$ and $A_y$ are the expansion coefficient matrices found by the eigendecompositions of $G_u$ and $G_y$, respectively. Also, using the coefficient matrices $A_x$, $A_e$ and $A_u$, Eq. (13) can be written as\n$$\\begin{aligned} x(t+1) &= A x(t) + B A_u^\\top k_u(u(t)) + \\bar K A_e^\\top k_e(\\bar e(t)), \\\\ y(t) &= \\bar C A_x^\\top k_x(x(t)) + \\bar D A_u^\\top k_u(u(t)) + \\bar e(t). \\end{aligned} \\tag{22}$$\n\n4 Algorithm\n\nIn this section, we give a subspace identification algorithm based on the discussion in the previous sections. Denote the finite input-output data by $\\{u(t), y(t), t = 1, 2, \\dots, N + 2l - 1\\}$, where $l > 0$ is an integer larger than the system dimension $n$ and $N$ is a sufficiently large integer. Assume also that all data are centered. 
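The kernel PCA quantities of Section 3.2 can be illustrated with a minimal sketch. These are the coefficient matrices A_z and Ā_z and identity (17), together with the centering assumed above. The sketch assumes Python with NumPy and uses the Gaussian kernel with the width parameterization of Section 5. The function names are hypothetical. Centering the Gram matrix as H G H with H = I - 11^T/m is the standard kernel PCA recipe; the paper does not spell out how it centers the data, so this step is an assumption.

```python
import numpy as np


def rbf_gram(Z, sigma):
    # Gram matrix of k(z_i, z_j) = exp(-||z_i - z_j||^2 / (2 sigma)); rows of Z are samples.
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma))


def center_gram(G):
    # Center the mapped data in feature space: H G H with H = I - (1/m) 1 1^T.
    m = G.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ G @ H


def kpca_coefficients(G, d):
    # Eigendecomposition G = Gamma Lambda Gamma^T, keeping the d largest eigenpairs.
    lam, gam = np.linalg.eigh(G)
    order = np.argsort(lam)[::-1][:d]
    lam_d, gam_d = lam[order], gam[:, order]
    A = gam_d / np.sqrt(lam_d)       # first d columns of Gamma Lambda^{-1/2}
    A_bar = gam_d * np.sqrt(lam_d)   # first d columns of Gamma Lambda^{1/2}
    return A, A_bar


# Principal-component scores C_z = G_z A_z for a centered Gaussian Gram matrix.
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 2))
Gc = center_gram(rbf_gram(Z, sigma=2.5))
A, A_bar = kpca_coefficients(Gc, d=4)
scores = Gc @ A
```

The score matrix satisfies scores^T scores = A^T G G A = diag of the d leading eigenvalues. Consequently, the inverse in (17) is just the inverse of that diagonal matrix. With d smaller than m, Ā A^T is only the projector onto the leading d eigenvectors, which is why the corresponding identity in Section 3.2 holds approximately.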
First, using the Gram matrices $G_u$, $G_y$ and $G_w$ associated with the input, the output and the joint input-output, respectively, we calculate the Gram matrices $G_U$, $G_Y$ and $G_W$ corresponding to the future input, the future output and the past input-output. These are $N \\times N$ matrices whose $(j,k)$ entries are\n$$[G_U]_{jk} := \\sum_{i=l+1}^{2l} G_{u,(i+j-1)(i+k-1)}, \\qquad j, k = 1, \\dots, N, \\tag{23}$$\n$$[G_W]_{jk} := \\sum_{i=1}^{l} G_{w,(i+j-1)(i+k-1)}, \\qquad j, k = 1, \\dots, N, \\tag{24}$$\nwhere $G_{u,ab}$ denotes the $(a,b)$ entry of $G_u$ (likewise for $G_w$), and $G_Y$ is defined analogously to $G_U$. The procedure is then as follows.\n\nStep 1 Calculate the regularized empirical covariance operators and their square root matrices as\n$$\\begin{aligned} \\hat\\Sigma_{ff|u} &= (G_Y + \\kappa I_N)^2 - G_Y G_U (G_U + \\kappa I_N)^{-2} G_U G_Y = \\hat L_f \\hat L_f^\\top, \\\\ \\hat\\Sigma_{pp|u} &= (G_W + \\kappa I_N)^2 - G_W G_U (G_U + \\kappa I_N)^{-2} G_U G_W = \\hat L_p \\hat L_p^\\top, \\\\ \\hat\\Sigma_{fp|u} &= G_Y G_W - G_Y G_U (G_U + \\kappa I_N)^{-2} G_U G_W. \\end{aligned} \\tag{25}$$\n\n² This is given by $(L_\\bullet^{-1})^\\top L_\\bullet^{-1} = \\Sigma_{\\bullet\\bullet|u}^{-1} \\approx (A_\\bullet^\\top \\hat\\Sigma_{\\bullet\\bullet|u} A_\\bullet)^{-1} \\approx \\bar A_\\bullet^\\top \\hat\\Sigma_{\\bullet\\bullet|u}^{-1} \\bar A_\\bullet = \\bar A_\\bullet^\\top (\\hat L_\\bullet^{-1})^\\top \\hat L_\\bullet^{-1} \\bar A_\\bullet$.\n\nStep 2 Calculate the SVD of the normalized covariance matrix (cf. Eq. (19))\n$$\\hat L_f^{-1} \\hat\\Sigma_{fp|u} (\\hat L_p^{-1})^\\top = \\hat U \\hat S \\hat V^\\top \\approx U_1 S_1 V_1^\\top, \\tag{26}$$\nwhere $S_1$ is obtained by neglecting the small singular values, so that the dimension of the state vector $n$ equals the dimension of $S_1$.\n\nStep 3 Estimate the state sequence as (cf. Eq. (20))\n$$\\hat X_l := [x(l), x(l+1), \\dots, x(l+N-1)] = S_1^{1/2} V_1^\\top \\hat L_p^{-1} G_W, \\tag{27}$$\nand define the following matrices consisting of $N-1$ columns:\n$$\\bar X_{l+1} := \\hat X_l(:, 2:N), \\qquad \\bar X_l := \\hat X_l(:, 1:N-1). \\tag{28}$$\n\nStep 4 Calculate the eigendecompositions of the Gram matrices $G_u$, $G_y$ and $G_x$ and the corresponding expansion coefficient matrices $A_u$, $A_y$ and $A_x$. 
Then, determine the system matrices $A$, $B$, $C$, $D$, $\\bar C$ and $\\bar D$ by applying regularized least-squares regression to the following equations (cf. Eqs. (21) and (22)):\n$$\\begin{bmatrix} \\bar X_{l+1} \\\\ A_y^\\top G_y(:, 2:N) \\end{bmatrix} = \\begin{bmatrix} A & B \\\\ C & D \\end{bmatrix} \\begin{bmatrix} \\bar X_l \\\\ A_u^\\top G_u(:, 1:N-1) \\end{bmatrix} + \\begin{bmatrix} \\rho_w \\\\ \\rho_e \\end{bmatrix}, \\tag{29}$$\n$$Y_{l|l} = \\bar C (A_x^\\top G_x(:, 2:N)) + \\bar D (A_u^\\top G_u(:, 2:N)) + \\bar\\rho_e, \\tag{30}$$\nwhere the matrices $\\rho_w$, $\\rho_e$ and $\\bar\\rho_e$ are the residuals.\n\nStep 5 Calculate the covariance matrices of the residuals\n$$\\begin{bmatrix} \\Sigma_{ww} & \\Sigma_{we} \\\\ \\Sigma_{ew} & \\Sigma_{ee} \\end{bmatrix} = \\frac{1}{N-1} \\begin{bmatrix} \\rho_w \\\\ \\rho_e \\end{bmatrix} \\begin{bmatrix} \\rho_w^\\top & \\rho_e^\\top \\end{bmatrix}, \\tag{31}$$\nsolve ARE (15), and, using the stabilizing solution, calculate the Kalman gain $K$ in Eq. (16).\n\n5 Simulation Result\n\nIn this section, we illustrate the proposed algorithm for learning nonlinear dynamical systems with synthetic data. The data were generated by simulating the following system [8] using the 4th- and 5th-order Runge-Kutta method with a sampling time of 0.05 seconds:\n$$\\begin{aligned} \\dot x_1(t) &= x_2(t) - 0.1 \\cos(x_1(t)) (5 x_1(t) - 4 x_1^3(t) + x_1^5(t)) - 0.5 \\cos(x_1(t)) u(t), \\\\ \\dot x_2(t) &= -65 x_1(t) + 50 x_1^3(t) - 15 x_1^5(t) - x_2(t) - 100 u(t), \\\\ y(t) &= x_1(t), \\end{aligned} \\tag{32}$$\nwhere the input was a zero-order-hold white noise signal uniformly distributed between -0.5 and 0.5. We applied our algorithm to a set of 600 data points and then validated the obtained model on a fresh data set of 400 points. As the kernel function, we used the Gaussian RBF kernel $k(z_i, z_j) = \\exp(-\\|z_i - z_j\\|^2 / (2\\sigma_z))$. The parameters to be tuned for our method are thus the kernel widths $\\sigma$ for $u$, $y$, $w$ and $x$, the regularization degree $\\kappa$, and the row-block number $l$ of the Hankel matrix. In addition, we must select the order of the system and the number of kernel principal components $n_{pc}$ for $u$, $y$ and $e$.\n\nFigure 2 shows free-run simulation results of the model acquired by our algorithm, with the parameters set to $\\sigma_u = 2.5$, $\\sigma_y = 3.5$, $\\sigma_w = 4.5$, $\\sigma_x = 1.0$, $n_{pc}^x = n_{pc}^u = 4$, $n_{pc}^y = 9$ and $\\kappa = 0.05$. For comparison, it also shows the results of the model acquired by the linear subspace identification method [5]. The row-block number $l$ was set to 10 in both identifications. The simulation error [2] is\n$$\\varepsilon = \\frac{100}{n_y} \\sum_{c=1}^{n_y} \\sqrt{ \\frac{\\sum_{i=1}^{m} ((y_i^s)_c - (y_i)_c)^2}{\\sum_{i=1}^{m} ((y_i)_c)^2} }, \\tag{33}$$\nwhere the $y_i^s$ are the simulated values, and the initial state is a least-squares estimate from the first few points. The simulation error was reduced from 44.1 for the linear method to 40.2 for our algorithm, a relative improvement of about 9 percent. The system orders were 8 for our algorithm and 10 for the linear method. These results suggest that our method estimates a more informative state sequence and yields a model that captures the dynamics more precisely. However, tuning the parameters required considerable time and effort.\n\nFigure 2: Comparison of simulated outputs (output versus data point, 0 to 400; legend: Observation, Simulation). Left: kernel subspace identification method (proposed method). Right: linear subspace identification method [5]. The broken lines represent the observations and the solid lines represent the simulated values.\n\n6 Conclusion\n\nA new subspace method for learning nonlinear dynamical systems using reproducing kernel Hilbert spaces has been proposed. The approach is based on approximate solutions of two discrete Wiener-Hopf equations, obtained by covariance factorization in kernel feature spaces. The algorithm needs no iterative optimization procedure, and hence solutions can be obtained in a fast and reliable manner. In the comparative experiment, our method achieved a lower simulation error than the linear subspace method. However, tuning the parameters required considerable time and effort. In future work, we will extend the approach to closed-loop systems so that it can be used to identify more realistic applications. Moreover, it should be possible to extend other established subspace identification methods to nonlinear frameworks as well. 
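Steps 4 and 5 of the algorithm in Section 4 start from the state and kernel PCA score sequences. From there, they reduce to a regularized least-squares fit, a residual covariance and the ARE (15). The following minimal sketch illustrates only that linear-algebra part. It assumes Python with NumPy and score matrices that are already time-aligned, so that column k of each matrix refers to the same instant. The ridge parameter reg is an illustrative choice. The sketch solves the ARE by iterating the Riccati difference equation from P = 0 to a fixed point, which converges to the stabilizing solution under the usual detectability and stabilizability conditions; a dedicated discrete ARE solver could be used instead. The function names are hypothetical. The demonstration substitutes a small synthetic linear system with known states for the kernel scores.

```python
import numpy as np


def fit_system_matrices(X, Zu, Zy, reg=1e-6):
    # Ridge regression for [X(:,2:N); Zy(:,1:N-1)] = [[A, B], [C, D]] [X(:,1:N-1); Zu(:,1:N-1)] + residuals.
    n = X.shape[0]
    Phi = np.vstack([X[:, :-1], Zu[:, :-1]])   # regressors
    Tgt = np.vstack([X[:, 1:], Zy[:, :-1]])    # targets
    Theta = np.linalg.solve(Phi @ Phi.T + reg * np.eye(Phi.shape[0]), Phi @ Tgt.T).T
    A, B = Theta[:n, :n], Theta[:n, n:]
    C, D = Theta[n:, :n], Theta[n:, n:]
    R = Tgt - Theta @ Phi                      # residuals [rho_w; rho_e]
    Sigma = R @ R.T / R.shape[1]               # Eq. (31): divide by the N-1 residual columns
    return A, B, C, D, Sigma


def stabilizing_gain(A, C, Sww, Swe, See, max_iter=100000, tol=1e-12):
    # Iterate P <- A P A^T + Sww - G (C P C^T + See)^{-1} G^T with G = A P C^T + Swe (ARE (15)).
    P = np.zeros_like(A)
    for _ in range(max_iter):
        G = A @ P @ C.T + Swe
        P_next = A @ P @ A.T + Sww - G @ np.linalg.solve(C @ P @ C.T + See, G.T)
        converged = np.max(np.abs(P_next - P)) <= tol * max(1.0, np.max(np.abs(P_next)))
        P = P_next
        if converged:
            break
    K = (A @ P @ C.T + Swe) @ np.linalg.inv(C @ P @ C.T + See)   # Kalman gain, Eq. (16)
    return P, K


# Demonstration on a synthetic linear system whose states are known.
rng = np.random.default_rng(1)
A0 = np.array([[0.8, 0.1], [0.0, 0.5]])
B0 = np.array([[1.0], [0.5]])
C0 = np.array([[1.0, 0.0]])
D0 = np.array([[0.2]])
N = 5000
Uin = rng.standard_normal((1, N))
X = np.zeros((2, N))
Y = np.zeros((1, N))
for t in range(N):
    Y[:, t] = C0 @ X[:, t] + D0 @ Uin[:, t] + 0.1 * rng.standard_normal(1)
    if t + 1 < N:
        X[:, t + 1] = A0 @ X[:, t] + B0 @ Uin[:, t] + 0.1 * rng.standard_normal(2)

A, B, C, D, Sigma = fit_system_matrices(X, Uin, Y)
n = A.shape[0]
P, K = stabilizing_gain(A, C, Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:])
```

In the full method, X would be the estimated state sequence from Step 3. The rows of Zu and Zy would be the kernel PCA scores A_u^T G_u and A_y^T G_y, with the column ranges given in Eq. (29).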
Acknowledgments\n\nThe present research was supported in part by the 21st Century COE Program \"Mechanical Systems Innovation\" of the Ministry of Education, Culture, Sports, Science and Technology.\n\nReferences\n[1] Roweis, S. & Ghahramani, Z. (1999) \"A Unifying Review of Linear Gaussian Models\" Neural Computation, 11 (2) : 305-345.\n[2] Van Overschee, P. & De Moor, B. (1996) \"Subspace Identification for Linear Systems: Theory, Implementation, Applications\" Kluwer Academic Publishers, Dordrecht, Netherlands.\n[3] Katayama, T. (2005) \"Subspace Methods for System Identification: A Realization Approach\" Communications and Control Engineering, Springer Verlag.\n[4] Moonen, M. & De Moor, B. & Vandenberghe, L. & Vandewalle, J. (1989) \"On- and Off-line Identification of Linear State Space Models\" International Journal of Control, 49 (1) : 219-232.\n[5] Katayama, T. & Picci, G. (1999) \"Realization of Stochastic Systems with Exogenous Inputs and Subspace Identification Methods\" Automatica, 35 (10) : 1635-1652.\n[6] Goethals, I. & Pelckmans, K. & Suykens, J. A. K. & De Moor, B. (2005) \"Subspace Identification of Hammerstein Systems Using Least Squares Support Vector Machines\" IEEE Trans. on Automatic Control, 50 (10) : 1509-1519.\n[7] Ni, X. & Verhaegen, M. & Krijgsman, A. & Verbruggen, H. B. (1996) \"A New Method for Identification and Control of Nonlinear Dynamic Systems\" Engineering Applications of Artificial Intelligence, 9 (3) : 231-243.\n[8] Verdult, V. & Suykens, J. A. K. & Boets, J. & Goethals, I. & De Moor, B. (2004) \"Least Squares Support Vector Machines for Kernel CCA in Nonlinear State-Space Identification\" Proceedings of the 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS2004).\n[9] Schölkopf, B. & Smola, A. (2002) \"Learning with Kernels\" MIT Press.\n[10] Rozanov, N. I. (1963) \"Stationary Random Processes\" Holden-Day, San Francisco, CA.\n[11] Kuss, M. & Graepel, T. 
(2003) \"The Geometry of Kernel Canonical Correlation Analysis\" Technical Report 108, Max Planck Institute for Biological Cybernetics, Tübingen, Germany.\n[12] Bach, F. R. & Jordan, M. I. (2002) \"Kernel Independent Component Analysis\" Journal of Machine Learning Research (JMLR), 3 : 1-48.\n[13] Fukumizu, K. & Bach, F. R. & Jordan, M. I. (2004) \"Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces\" Journal of Machine Learning Research (JMLR), 5 : 73-99.\n", "award": [], "sourceid": 3103, "authors": [{"given_name": "Yoshinobu", "family_name": "Kawahara", "institution": null}, {"given_name": "Takehisa", "family_name": "Yairi", "institution": null}, {"given_name": "Kazuo", "family_name": "Machida", "institution": null}]}