{"title": "Blind Separation of Filtered Sources Using State-Space Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 648, "page_last": 656, "abstract": null, "full_text": "Blind Separation of Filtered Sources \n\nUsing State-Space Approach \n\nLiqing Zhang\u00b7 and Andrzej Cichockit \nLaboratory for Open Information Systems, \n\nBrain Science Institute, RIKEN \n\nSaitama 351-0198, Wako shi, JAPAN \n\nEmail: {zha.cia}@open.brain.riken.go.jp \n\nAbstract \n\nIn this paper we present a novel approach to multichannel blind \nseparation/generalized deconvolution, assuming that both mixing \nand demixing models are described by stable linear state-space sys(cid:173)\ntems. We decompose the blind separation problem into two pro(cid:173)\ncess: separation and state estimation. Based on the minimization \nof Kullback-Leibler Divergence, we develop a novel learning algo(cid:173)\nrithm to train the matrices in the output equation. To estimate the \nstate of the demixing model, we introduce a new concept, called \nhidden innovation, to numerically implement the Kalman filter. \nComputer simulations are given to show the validity and high ef(cid:173)\nfectiveness of the state-space approach. \n\n1 \n\nIntrod uction \n\nThe field of blind separation and deconvolution has grown dramatically during re(cid:173)\ncent years due to its similarity to the separation feature in human brain, as well as its \nrapidly growing applications in various fields, such as telecommunication systems, \nimage enhancement and biomedical signal processing. The blind source separation \nproblem is to recover independent sources from sensor outputs without assuming \nany priori knowledge of the original signals besides certain statistic features. Refer \nto review papers [lJ and [5J for the current state of theory and methods in the field. 
\nAlthough there exist a number of models and methods for blindly separating independent sources, such as the infomax, natural gradient and equivariant adaptive algorithms, several challenges remain in generalizing the mixture to dynamic and nonlinear systems, as well as in developing more rigorous and effective algorithms with general convergence [1-9], [11-13]. \n\n*On leave from South China University of Technology, China \n†On leave from Warsaw University of Technology, Poland \n\nThe state-space description of systems is a new model for blind separation and deconvolution [9,12]. There are several reasons why we use linear state-space systems as blind deconvolution models. Although transfer function models are equivalent to state-space ones, it is difficult with them to exploit common features that may be present in real dynamic systems. The main advantage of the state-space description for blind deconvolution is that it not only gives an internal description of a system, but also admits various equivalent state-space realizations of the same system, such as balanced realizations and observable canonical forms. In particular, it is known how to parameterize some specific classes of models which are of interest in applications. It is also much easier to tackle the stability problem of state-space systems using the Kalman filter. Moreover, the state-space model enables a much more general description than standard finite impulse response (FIR) convolutive filtering. All known filtering (dynamic) models, such as AR, MA, ARMA, ARMAX and Gamma filters, can be considered as special cases of flexible state-space models. \n\n2 Formulation of Problem \n\nAssume that the source signals are stationary zero-mean i.i.d. processes and mutually statistically independent. Let s(t) = (s_1(t), ..., s_n(t)) be an unknown vector of independent i.i.d. 
sources. Suppose that the mixing model is described by a stable linear discrete-time state-space system \n\nx(k+1) = A x(k) + B s(k) + L e_P(k),  (1) \nu(k) = C x(k) + D s(k) + ξ(k),  (2) \n\nwhere x(k) ∈ R^r is the state vector of the system, s(k) ∈ R^n is the vector of source signals and u(k) ∈ R^m is the vector of sensor signals. A, B, C and D are the mixing matrices of the state-space model with consistent dimensions; e_P(k) is the process noise and ξ(k) is the sensor noise of the mixing system. If we ignore the noise terms in the mixing model, its transfer function matrix is described by the m x n matrix \n\nH(z) = C(zI - A)^{-1} B + D,  (3) \n\nwhere z^{-1} is the delay operator. \n\nWe formulate the blind separation problem as the task of recovering the original signals from the observations u(t) without prior knowledge of the source signals or of the state-space matrices [A, B, C, D], besides certain statistical features of the source signals. We propose that the demixing model here is another linear state-space system, described as follows (see Fig. 1): \n\nx(k+1) = A x(k) + B u(k) + L e_R(k),  (4) \ny(k) = C x(k) + D u(k),  (5) \n\nwhere the input u(k) of the demixing model is just the output (sensor signals) of the mixing model and e_R(k) is the reference model noise. A, B, C and D are the demixing matrices of consistent dimensions. In general, the matrices W = [A, B, C, D, L] are the parameters to be determined in the learning process. \n\nFor simplicity, we do not consider, at this moment, the noise terms in either the mixing or the demixing model. The transfer function of the demixing model is W(z) = C(zI - A)^{-1} B + D. The output y(k) is designed to recover the source signals in the following sense \n\ny(k) = W(z)H(z)s(k) = P Λ(z) s(k),  (6) \n\nFigure 1: General state-space model for blind deconvolution \n\nwhere P is a permutation matrix and Λ(z) is a diagonal matrix with λ_i z^{-τ_i} in diagonal entry (i,i); here λ_i is a nonzero constant and τ_i is a nonnegative integer. It is easy to see that the linear state-space mixing model is an extension of the instantaneous mixture: when the matrices A, B, C in both the mixing model and the demixing model are null matrices, the problem simplifies to the standard ICA problem [1-8]. \n\nThe question here is whether there exist matrices [A, B, C, D] in the demixing model (4) and (5) such that its transfer function W(z) satisfies (6). It is proven [12] that if the matrix D in the mixing model is of full rank, rank(D) = n, then there exist matrices [A, B, C, D] such that the output signal y of the state-space system (4) and (5) recovers the independent source signals s in the sense of (6). \n\n3 Learning Algorithm \n\nAssume that p(y, W) and p_i(y_i, W) are the joint probability density function of y and the marginal pdf of y_i (i = 1, ..., n), respectively. We employ the mutual information of the output signals, which measures the mutual independence of the output signals y_i(k), as a risk function [1,2] \n\nl(W) = -H(y, W) + Σ_{i=1}^n H(y_i, W),  (7) \n\nwhere \n\nH(y, W) = -∫ p(y, W) log p(y, W) dy,  H(y_i, W) = -∫ p_i(y_i, W) log p_i(y_i, W) dy_i. \n\nIn this paper we do not directly develop learning algorithms to update all the parameters W = [A, B, C, D] of the demixing model. We separate the blind deconvolution problem into two procedures: separation and state estimation. In the separation procedure we develop a novel learning algorithm, using a new search direction, to update the matrices C and D in the output equation (5). Then we define a hidden innovation of the output and use the Kalman filter to estimate the state vector x(k). 
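The demixing model (4)-(5) above is straightforward to simulate directly. The following is a minimal sketch with the noise term omitted; the function name, dimensions and test signals are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def demix(u, A, B, C, D, x0=None):
    """Run the demixing state-space system of Eqs. (4)-(5), noise omitted:
    x(k+1) = A x(k) + B u(k),  y(k) = C x(k) + D u(k)."""
    x = np.zeros(A.shape[0]) if x0 is None else x0
    y = np.empty((len(u), D.shape[0]))
    for k, uk in enumerate(u):
        y[k] = C @ x + D @ uk   # output equation (5)
        x = A @ x + B @ uk      # state update (4)
    return y
```

When A, B and C are all null matrices the recursion reduces to y(k) = D u(k), i.e., the standard instantaneous ICA case mentioned above.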
\nFor simplicity we suppose that the matrix D in the demixing model (5) is a nonsingular n x n matrix. From the risk function (7), we can derive a cost function for on-line learning \n\nl(y, W) = -(1/2) log det(D^T D) - Σ_{i=1}^n log p_i(y_i, W),  (8) \n\nwhere det(D^T D) is the determinant of the symmetric positive definite matrix D^T D. For the gradient of l with respect to W, we calculate the total differential dl of l(y, W) when we take a differential dW on W \n\ndl(y, W) = l(y, W + dW) - l(y, W).  (9) \n\nFollowing Amari's derivation for natural gradient methods [1-3], we have \n\ndl(y, W) = -tr(dD D^{-1}) + φ^T(y) dy,  (10) \n\nwhere tr(·) is the trace of a matrix and φ(y) is a vector of nonlinear activation functions \n\nφ_i(y_i) = -d log p_i(y_i)/dy_i = -p_i'(y_i)/p_i(y_i).  (11) \n\nTaking the differential of equation (5), we have the following approximation \n\ndy = dC x(k) + dD u(k).  (12) \n\nOn the other hand, from (5) we have \n\nu(k) = D^{-1}(y(k) - C x(k)).  (13) \n\nSubstituting (13) into (12), we obtain \n\ndy = (dC - dD D^{-1} C) x + dD D^{-1} y.  (14) \n\nIn order to improve the computational efficiency of the learning algorithm, we introduce a new search direction \n\ndX_1 = dC - dD D^{-1} C,  (15) \ndX_2 = dD D^{-1}.  (16) \n\nThen the total differential dl can be expressed as \n\ndl = -tr(dX_2) + φ^T(y)(dX_1 x + dX_2 y).  (17) \n\nIt is easy to obtain the derivatives of the cost function l with respect to the matrices X_1 and X_2 as \n\n∂l/∂X_1 = φ(y(k)) x^T(k),  (18) \n∂l/∂X_2 = φ(y(k)) y^T(k) - I.  (19) \n\nFrom (15) and (16), we derive a novel learning algorithm to update the matrices C and D: \n\nΔC(k) = η(-φ(y(k)) x^T(k) + (I - φ(y(k)) y^T(k)) C(k)),  (20) \nΔD(k) = η(I - φ(y(k)) y^T(k)) D(k).  (21) \n\nThe equilibrium points of the learning algorithm satisfy the following equations \n\nE[φ(y(k)) x^T(k)] = 0,  (22) \nE[I - φ(y(k)) y^T(k)] = 0.  (23) \n\nThis means that the separated signals y can be made as mutually independent as possible, provided that the nonlinear activation functions φ(y) are suitably chosen and the state vector x(k) is well estimated. From (20) and (21) we see that the natural gradient learning algorithm [2] is covered as a special case of this learning algorithm when the mixture simplifies to the instantaneous case. \n\nThe learning algorithm derived above can solve the blind separation problem under the assumption that the state matrices A and B are known or appropriately designed. In the next section, instead of adjusting the state matrices A and B directly, we propose new approaches to estimate the state vector x. \n\n4 State Estimator \n\nFrom the output equation (5) it is observed that if we can accurately estimate the state vector x(k) of the system, then we can separate the mixed signals using the learning algorithm (20) and (21). \n\n4.1 Kalman Filter \n\nThe Kalman filter is a useful technique for estimating the state vector in state-space models. Its function is to generate on line an estimate of the state x(k). The Kalman filter dynamics are given as follows \n\nx(k+1) = A x(k) + B u(k) + K r(k) + e_R(k),  (24) \n\nwhere K is the Kalman filter gain matrix, and r(k) is the innovation or residual vector, which measures the error between the measured (or expected) output y(k) and the predicted output C x(k) + D u(k). There is a variety of algorithms to update the Kalman filter gain matrix K as well as the state x(k); refer to [10] for more details. \n\nHowever, in the blind deconvolution problem there exists no explicit residual r(k) with which to estimate the state vector x(k), because the expected output y(t) would be the unavailable source signals. In order to solve this problem, we present a new concept, called hidden innovation, to implement the Kalman filter in the blind deconvolution case. 
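The separation step that pairs with this state estimator, i.e. the updates (20) and (21), is simple to state in code. The sketch below performs one on-line step; the cubic activation φ(y) = y^3 is the one used later in the simulation section, while the function names and the step size η are illustrative assumptions.

```python
import numpy as np

def phi(y):
    # Cubic activation; the choice used in Example 1 of the paper.
    return y ** 3

def update_CD(C, D, x, y, eta=0.01):
    """One on-line step of the learning rules (20)-(21):
    Delta C = eta * (-phi(y) x^T + (I - phi(y) y^T) C)
    Delta D = eta * (I - phi(y) y^T) D
    """
    f = phi(y)
    M = np.eye(len(y)) - np.outer(f, y)          # I - phi(y) y^T
    C_new = C + eta * (-np.outer(f, x) + M @ C)  # rule (20)
    D_new = D + eta * (M @ D)                    # rule (21)
    return C_new, D_new
```

The increments C_new - C and D_new - D are exactly the ΔC and ΔD that define the hidden innovation introduced above.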
\nSince updating the matrices C and D produces an innovation at each learning step, we introduce a hidden innovation as follows \n\nr(k) = Δy(k) = ΔC x(k) + ΔD u(k),  (25) \n\nwhere ΔC = C(k+1) - C(k) and ΔD = D(k+1) - D(k). The hidden innovation represents the adjusting direction of the output of the demixing system and is used to generate an a posteriori state estimate. Once we define the hidden innovation, we can employ the commonly used Kalman filter to estimate the state vector x(k), as well as to update the Kalman gain matrix K. The updating rule in this paper is described as follows: \n\n(1) Compute the Kalman gain matrix \nK(k) = P(k) C^T(k) (C(k) P(k) C^T(k) + R(k))^{-1} \n\n(2) Update the state vector with the hidden innovation \nx(k) = x(k) + K(k) r(k) \n\n(3) Update the error covariance matrix \nP(k) = (I - K(k) C(k)) P(k) \n\n(4) Evaluate the state vector ahead \nx(k+1) = A(k) x(k) + B(k) u(k) \n\n(5) Evaluate the error covariance matrix ahead \nP(k+1) = A(k) P(k) A^T(k) + Q(k) \n\nwith the initial condition P(0) = I, where Q(k) and R(k) are the covariance matrices of the noise vector e_R and of the output measurement noise n_k. \n\nTheoretical problems, such as convergence and stability, remain to be elaborated. Simulation experiments show that the proposed algorithm, based on the Kalman filter, can separate the convolved signals well. \n\n4.2 Information Back-propagation \n\nAnother solution to estimating the state of the system is to propagate the mutual information backward. If we consider the cost function to be a function of the vector x as well, then the partial derivative of l(y, W) with respect to x is \n\n∂l(y, W)/∂x = C^T φ(y).  (26) \n\nWe then adjust the state vector x(k) according to the following rule \n\nx(k) = x(k) - η C^T(k) φ(y(k)),  (27) \n\nand the estimated state vector is used as the new state of the system. \n\n5 Numerical Implementation \n\nSeveral numerical simulations have been carried out to demonstrate the validity and effectiveness of the proposed algorithm. Here we give a typical example. \n\nExample 1. Consider the following MIMO mixing model \n\nu(k) + Σ_{i=1}^{10} A_i u(k-i) = s(k) + Σ_{i=1}^{10} B_i s(k-i) + v(k), \n\nwhere u, s, v ∈ R^3, the coefficient matrices A_2, A_8, A_10, B_2, B_8 and B_10 are fixed 3 x 3 constant matrices, and all the other coefficient matrices are set to the null matrix. The sources s are chosen to be i.i.d. signals uniformly distributed in the range (-1, 1), and v is Gaussian noise with zero mean and covariance matrix 0.1 I. We employ the state-space approach to separate the mixed signals. The nonlinear activation function is chosen as φ(y) = y^3. The initial values for the matrices A and B in the state equation are chosen as in the canonical controller form. The initial value of the matrix C is set to the null matrix or given randomly in the range (-1, 1), and D = I_3. A large number of simulations show that the state-space method can easily recover the source signals in the sense of W(z)H(z) = PΛ. Figure 2 illustrates the coefficients of the global transfer function G(z) = W(z)H(z) after 3000 iterations, where the (i,j)-th sub-figure plots the coefficients of the transfer function G_ij(z) = Σ_k g_ijk z^{-k} up to order 50. \n\nReferences \n\n[1] S. Amari and A. Cichocki, \"Adaptive blind signal processing: neural network approaches\", Proceedings of the IEEE, 86(10):2026-2048, 1998. \n\n[2] S. Amari, A. Cichocki, and H.H. 
Yang, \"A new learning algorithm for blind \nsignal separation\", Advances in Neural Information Processing Systems 1995 \n(Boston, MA: MIT Press, 1996), pp. 752- 763. \n\n\f654 \n\nL. Zhang and A. Cichocki \n\nG(z) 1 1 \n\nG(Z) 1 2 \n\nG(Z) '3 \n\no \n\no \n\n~ \n\na \n\n~ \n\n~ \n\na \n\n~ \n(3(:) 21 \n\n~ \nG(Z)22 \n\n00 \nG(Z) :!3 \n\n_~CJ _~CJ~Cl \n~CJ ~CJ ~CJ \n_~CJ }:~ _~CJ \n~CJ ~CJ ~CJ \n_~c:J r~~ _;CJ \n\n~CJ ~CJ ~CJ \n\n00 \n(3(Z) 3 1 \n\n00 \nG (Z')33 \n\n00 \nG(Z)32 \n\n40 \n\n00 \n\n~ \n\n0 \n\n00 \n\n~ \n\n~ \n\na \n\n~ \n\na \n\n00 \n\n~ \n\na \n\na \n\nFigure 2: The coefficients of global transfer function after 3000 iterations \n\n[3] S. Amari \"Natural gradient works efficiently in learning\", Neural Computation, \n\nVoLlO, pp251-276, 1998. \n\n[4] A. J. Bell and T. J. Sejnowski, \"An information-maximization approach to \nblind separation and blind deconvolution\", Neural Computation, Vol. 7, pp \n1129-1159, 1995. \n\n[5] J.-F Cardoso, \"Blind signal separation: statistical principles\", Proceedings \n\nof the IEEE, 86(10):2009-2025, 1998. \n\n[6] J.-F. Cardoso and B. Laheld, \"Equivariant adaptive source separation,\" IEEE \n\nTrans. Signal Processing, vol. SP-43, pp. 3017-3029, Dec. 1996. \n\n[7] A.Cichocki and R. Unbehauen, \"Robust neural networks with on-line learning \nfor blind identification and blind separation of sources\" IEEE Trans Circuits \nand Systems I : Fundamentals Theory and Applications, vol 43, No.Il, pp. \n894-906, Nov. 1996. \n\n[8] P. Comon, \"Independent component analysis: a new concept?\", Signal Pro(cid:173)\n\ncessing, vol.36, pp.287- 314, 1994. \n\n[9] A. Gharbi and F. Salam, \"Algorithm for blind signal separation and recov(cid:173)\n\nery in static and dynamics environments\", IEEE Symposium on Circuits and \nSystems, Hong Kong, June, 713-716, 1997. \n\n[10] O. L. R. Jacobs, \"Introduction to Control Theory\", Second Edition, Oxford \n\nUniversity Press, 1993. \n\n[11] T. W. Lee, A.J. Bell, and R. 
Lambert, \"Blind separation of delayed and \n\nconvolved sources\", NIPS 9, 1997, MIT Press, Cambridge MA, pp758-764. \n\n[12] L. -Q. Zhang and A. Cichocki, \"Blind deconvolution/equalization using state(cid:173)\nspace models\", Proc. '98 IEEE Signal Processing Society Workshop on NNSP, \nppI23-131 , Cambridge, 1998. \n\n[13] S. Choi, A. Cichocki and S. Amari, \"Blind equalization of simo channels via \nspatio-temporal anti-Hebbian learning rule\", Proc. '98 IEEE Signal Processing \nSociety Workshop on NNSP, pp93-102, Cambridge, 1998. \n\n\fPART V \n\nIMPLEMENTATION \n\n\f\f", "award": [], "sourceid": 1568, "authors": [{"given_name": "Liqing", "family_name": "Zhang", "institution": null}, {"given_name": "Andrzej", "family_name": "Cichocki", "institution": null}]}