{"title": "Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 177, "page_last": 184, "abstract": null, "full_text": "Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis\n\nTobias Blaschke\nInstitute for Theoretical Biology, Humboldt University Berlin\nInvalidenstra\u00dfe 43, D-10115 Berlin, Germany\nt.blaschke@biologie.hu-berlin.de\n\nLaurenz Wiskott\nInstitute for Theoretical Biology, Humboldt University Berlin\nInvalidenstra\u00dfe 43, D-10115 Berlin, Germany\nl.wiskott@biologie.hu-berlin.de\n\nAbstract\n\nIn contrast to the equivalence of linear blind source separation and linear independent component analysis, it is not possible to recover the original source signal from an unknown nonlinear transformation of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy and leads to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method for extracting slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data.\n\n1 Introduction\n\nUnlike in the linear case, the nonlinear Blind Source Separation (BSS) problem cannot be solved on the basis of statistical independence alone [1, 2]. Performing nonlinear BSS with Independent Component Analysis (ICA) requires additional information about the underlying sources or a regularization of the nonlinearities. Since source signal components usually vary more slowly than any nonlinear mixture of them, we require the estimated sources to be as slowly varying as possible. 
This can be achieved by incorporating ideas from Slow Feature Analysis (SFA) [3] into ICA.\n\nAfter a short introduction to linear BSS, nonlinear BSS, and SFA, we show how to combine SFA and ICA into an algorithm that solves the nonlinear BSS problem.\n\n2 Linear Blind Source Separation\n\nLet x(t) = [x_1(t), ..., x_N(t)]^T be a linear mixture of a source signal s(t) = [s_1(t), ..., s_N(t)]^T, defined by\n\nx(t) = A s(t),   (1)\n\nwith an invertible N × N mixing matrix A. Finding a mapping\n\nu(t) = Q W x(t)   (2)\n\nsuch that the components of u are mutually statistically independent is called Independent Component Analysis (ICA). The mapping is often divided into a whitening mapping W, resulting in uncorrelated signal components y_i with unit variance, and a successive orthogonal transformation Q, because one can show [4] that after whitening an orthogonal transformation is sufficient to obtain independence. It is well known that ICA solves the linear BSS problem [4]. There exists a variety of algorithms performing ICA and therefore BSS (see e.g. [5, 6, 7]). Here we focus on a method using only second-order statistics introduced by Molgedey and Schuster [8]. The method consists of optimizing an objective function subject to minimization, which can be written as\n\nΨ_ICA(Q) = Σ_{α,β=1; α≠β}^N ( C^(u)_αβ(τ) )² = Σ_{α,β=1; α≠β}^N ( Σ_{γ,δ=1}^N Q_αγ Q_βδ C^(y)_γδ(τ) )² ,   (3)\n\noperating on the already whitened signal y. C^(y)_γδ(τ) is an entry of a symmetrized time-delayed covariance matrix defined by\n\nC^(y)(τ) = ⟨ y(t) y(t+τ)^T + y(t+τ) y(t)^T ⟩ ,   (4)\n\nand C^(u)(τ) is defined correspondingly. Q_αβ denotes an entry of Q. 
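The two quantities in (3) and (4) can be sketched in a few lines of NumPy (a rough illustration in our own notation; the helper names `sym_delayed_cov` and `psi_ica` are ours, not the paper's):

```python
import numpy as np

def sym_delayed_cov(y, tau):
    """Symmetrized time-delayed covariance of eq. (4):
    C(tau) = <y(t) y(t+tau)^T + y(t+tau) y(t)^T>, samples in rows."""
    y0, y1 = y[:-tau], y[tau:]            # y(t) and y(t + tau)
    C = y0.T @ y1 / len(y0)
    return C + C.T                         # symmetrization

def psi_ica(Q, C_y):
    """Second-order ICA objective of eq. (3): the sum of squared
    off-diagonal entries of C(u) = Q C(y) Q^T."""
    C_u = Q @ C_y @ Q.T
    off = C_u - np.diag(np.diag(C_u))
    return np.sum(off ** 2)
```

Minimizing `psi_ica` over orthogonal Q thus amounts to approximately diagonalizing the delayed covariance matrix.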
Minimization of Ψ_ICA can be understood intuitively as finding an orthogonal matrix Q that diagonalizes the covariance matrix with time delay τ. Since, because of the whitening, the instantaneous covariance matrix is already diagonal, this results in signal components that are decorrelated instantaneously and at a given time delay τ. This can be sufficient to achieve statistical independence [9].\n\n2.1 Nonlinear BSS and ICA\n\nAn obvious extension of the linear mixing model (1) has the form\n\nx(t) = F(s(t)),   (5)\n\nwith a function F(·): R^N → R^M that maps N-dimensional source vectors s onto M-dimensional signal vectors x. The components x_i of the observable are a nonlinear mixture of the sources, and as in the linear case the source signal components s_i are assumed to be mutually statistically independent. Extracting the source signal is in general only possible if F(·) is an invertible function, which we will assume from now on.\n\nThe equivalence of BSS and ICA in the linear case does in general not hold for a nonlinear function F(·) [1, 2]. To solve the nonlinear BSS problem, additional constraints on the mixture or the estimated signals are needed to bridge the gap between ICA and BSS. Here we propose a new way to achieve this by adding a slowness objective to the independence objective of pure ICA. Assume, for example, a sinusoidal signal component x_i = sin(2πt) and a second component that is the square of the first, x_j = x_i² = 0.5 (1 − cos(4πt)). The second component varies more quickly due to the frequency doubling induced by the squaring. Typically, nonlinear mixtures of signal components vary more quickly than the original components. To extract the right source components one should therefore prefer the slowly varying ones. 
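The frequency-doubling argument is easy to check numerically. The following sketch (our own illustration, using the mean squared temporal difference as a crude slowness measure) confirms that the squared component varies faster:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
xi = np.sin(2 * np.pi * t)       # slowly varying source component
xj = xi ** 2                      # its square: 0.5 * (1 - cos(4*pi*t))

def slowness(u):
    """Mean squared temporal difference: small for slowly varying signals."""
    u = (u - u.mean()) / u.std()  # zero mean, unit variance for a fair comparison
    return np.mean(np.diff(u) ** 2)

# frequency doubling makes the squared component roughly 4x "faster"
assert slowness(xj) > slowness(xi)
```

For a pure sinusoid this measure scales with the squared frequency, so the squared component comes out about four times less slow.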
The concept of slowness enters our approach to nonlinear BSS by combining an ICA part, which provides the independence of the estimated source signal components, with a part that prefers slowly varying signals over more quickly varying ones. In the next section we give a short introduction to Slow Feature Analysis (SFA), which forms the basis of the second part of our method.\n\n3 Slow Feature Analysis\n\nAssume a vectorial input signal x(t) = [x_1(t), ..., x_M(t)]^T is given. The objective of SFA is to find an in general nonlinear input-output function u(t) = g(x(t)) with g(x(t)) = [g_1(x(t)), ..., g_R(x(t))]^T such that the u_i(t) vary as slowly as possible. This can be achieved by successively minimizing the objective function\n\nΔ(u_i) := ⟨u̇_i²⟩ ,   (6)\n\nwhere u̇_i denotes the time derivative of u_i, for each u_i under the constraints\n\n⟨u_i⟩ = 0   (zero mean),   (7)\n⟨u_i²⟩ = 1   (unit variance),   (8)\n⟨u_i u_j⟩ = 0 ∀ j < i   (decorrelation and order).   (9)\n\nConstraints (7) and (8) ensure that the solution is not the trivial solution u_i = const. Constraint (9) provides uncorrelated output signal components and thus guarantees that different components carry different information. Intuitively, we are searching for signal components u_i that have on average a small slope.\n\nInterestingly, Slow Feature Analysis can be reformulated with an objective function similar to second-order ICA, subject to maximization [10],\n\nΨ_SFA(Q) = Σ_{α=1}^M ( C^(u)_αα(τ) )² = Σ_{α=1}^M ( Σ_{β,γ=1}^M Q_αβ Q_αγ C^(y)_βγ(τ) )² .   (10)\n\nTo understand (10) intuitively, we note that slowly varying signal components are easier to predict and should therefore have strong autocorrelations in time. 
Thus, maximizing the time-delayed variances produces slowly varying signal components.\n\n4 Independent Slow Feature Analysis\n\nCombining ICA and SFA, we obtain a method we refer to as Independent Slow Feature Analysis (ISFA), which recovers independent components from a nonlinear mixture using a combination of SFA and second-order ICA. As already explained, second-order ICA tends to make the output components independent, and SFA tends to make them slow. Since we are dealing with a nonlinear mixture, we first compute a nonlinearly expanded signal z = h(x), with h(·): R^M → R^L typically consisting of monomials up to a given degree; e.g. an expansion with monomials up to second degree can be written as\n\nh(x(t)) = [x_1, ..., x_M, x_1 x_1, x_1 x_2, ..., x_M x_M]^T − h_0   (11)\n\nfor a given M-dimensional signal x. The constant vector h_0 is used to make the expanded signal mean-free. In a second step, z is whitened to obtain y = Wz. Thirdly, we apply linear ICA combined with linear SFA on y in order to find the estimated source signal u. Because of the whitening we know that ISFA, like ICA and SFA, is solved by finding an orthogonal L × L matrix Q. We write the estimated source signal u as\n\nv = [u; ũ] = Qy = QWz = QWh(x),   (12)\n\nwhere we introduce ũ because R, the dimension of the estimated source signal u, is usually much smaller than L, the dimension of the expanded signal. While the u_i are statistically independent and slowly varying, the components ũ_i vary more quickly and may be statistically dependent on each other as well as on the selected components.\n\nTo summarize, we have an M-dimensional input x, an L-dimensional nonlinearly expanded and whitened signal y, and an R-dimensional estimated source signal u. ISFA searches for an R-dimensional subspace such that the u_i are independent and slowly varying. 
This is achieved at the expense of all ũ_i.\n\n4.1 Objective function\n\nTo recover R source signal components u_i, i = 1, ..., R, out of an L-dimensional expanded and whitened signal y, the objective reads\n\nΨ_ISFA(u_1, ..., u_R; τ) = b_ICA Σ_{α,β=1; α≠β}^R ( C^(u)_αβ(τ) )² − b_SFA Σ_{α=1}^R ( C^(u)_αα(τ) )² ,   (13)\n\nwhere we simply combine the ICA objective (3) and the SFA objective (10), weighted by the factors b_ICA and b_SFA, respectively. Note that the ICA objective is usually applied in the linear case to unmix the linearly whitened mixture y, whereas here it is used on the nonlinearly expanded whitened signal y = Wz. ISFA tries to minimize Ψ_ISFA, which is why the SFA part enters with a negative sign.\n\n4.2 Optimization Procedure\n\nFrom (12) we know that C^(u)(τ) in (13) depends on the orthogonal matrix Q. There are several ways to find the orthogonal matrix that minimizes the objective function. Here we apply successive Givens rotations to obtain Q. A Givens rotation Q^{μν} is a rotation around the origin within the plane of two selected components μ and ν and has the matrix form\n\nQ^{μν}_αβ :=  cos(φ)  for (α, β) ∈ {(μ, μ), (ν, ν)},\n             −sin(φ)  for (α, β) = (μ, ν),\n              sin(φ)  for (α, β) = (ν, μ),\n              δ_αβ    otherwise,   (14)\n\nwith Kronecker symbol δ_αβ and rotation angle φ. Any orthogonal L × L matrix such as Q can be written as a product of L(L−1)/2 (or more) Givens rotation matrices Q^{μν} (for the rotation part) and a diagonal matrix with elements ±1 (for the reflection part). 
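For illustration, a Givens rotation matrix as in (14) can be constructed explicitly (a minimal sketch; the function name `givens` is our own):

```python
import numpy as np

def givens(L, mu, nu, phi):
    """Givens rotation Q^{mu,nu} of eq. (14): the identity matrix,
    except for a 2x2 rotation block acting on components mu and nu."""
    Q = np.eye(L)
    Q[mu, mu] = Q[nu, nu] = np.cos(phi)
    Q[mu, nu] = -np.sin(phi)   # entry (mu, nu)
    Q[nu, mu] = np.sin(phi)    # entry (nu, mu)
    return Q
```

A product of L(L−1)/2 such matrices can represent the rotation part of any orthogonal L × L matrix.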
Since reflections do not matter in our case, we consider only the Givens rotations, as is common in second-order ICA algorithms (see e.g. [11]).\n\nWe can therefore write the objective as a function of a single Givens rotation Q^{μν},\n\nΨ_ISFA(Q^{μν}) = b_ICA Σ_{α,β=1; α≠β}^R ( Σ_{γ,δ=1}^L Q^{μν}_αγ Q^{μν}_βδ C^(y)_γδ(τ) )² − b_SFA Σ_{α=1}^R ( Σ_{β,γ=1}^L Q^{μν}_αβ Q^{μν}_αγ C^(y)_βγ(τ) )² .   (15)\n\nAssume we want to minimize Ψ_ISFA for a given R, where R denotes the number of signal components we want to extract. Applying a Givens rotation Q^{μν}, we have to distinguish three cases.\n\n• Case 1: Both axes u_μ and u_ν lie inside the subspace spanned by the first R axes (μ, ν ≤ R). The sum over all squared cross-correlations of the signal components that lie outside the subspace is constant, as is that of the components inside the subspace. There is no interaction between inside and outside; in fact, the objective function is exactly the objective of an ICA algorithm based on second-order statistics, e.g. TDSEP or SOBI [12, 13]. In [10] it has been shown that this is equivalent to SFA in the case of a single time delay.\n\n• Case 2: Only one axis, w.l.o.g. u_μ, lies inside the subspace, the other, u_ν, outside (μ ≤ R < ν). 
Since one axis of the rotation plane lies outside the subspace, u_μ in the objective function can be optimized at the expense of ũ_ν outside the subspace. A rotation by π/2, for instance, would simply exchange the components u_μ and u_ν. This makes it possible to find the slowest and most independent components in the whole space spanned by all u_i and ũ_j (i = 1, ..., R; j = R+1, ..., L), in contrast to Case 1, where the minimum is searched only within the subspace spanned by the R components of the objective function.\n\n• Case 3: Both axes lie outside the subspace (R < μ, ν). A Givens rotation with both rotation axes outside the relevant subspace does not affect the objective function and can therefore be disregarded.\n\nIt can be shown, as in [14], that the objective function (15) as a function of φ can always be written in the form\n\nΨ^{μν}_ISFA(φ) = A_0 + A_2 cos(2φ + φ_2) + A_4 cos(4φ + φ_4),   (16)\n\nwhere the second term on the right-hand side vanishes for Case 1. There exists a single minimum (if w.l.o.g. φ ∈ [−π/2, π/2]) that can easily be calculated (see e.g. [14]). The derivation of (16) involves various trigonometric identities and, because of its length, is documented elsewhere¹.\n\nIt is important to notice that the rotation planes of the Givens rotations are selected from the whole L-dimensional space, whereas the objective function only uses information about correlations among the first R signal components u_i. Successive application of Givens rotations Q^{μν} leads to the final rotation matrix Q, which in the ideal case is such that Q^T C^(y)(τ) Q = C^(v)(τ) has a diagonal R × R submatrix C^(u)(τ); it is not clear, however, whether the final minimum is also the global one. 
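For orientation, the minimizing angle of (16) over φ ∈ [−π/2, π/2] can also be located by brute force (a sketch with arbitrary coefficients; in practice a closed-form solution along the lines of [14] is preferable, and the constant A_0 plays no role):

```python
import numpy as np

def best_angle(A2, phi2, A4, phi4, n=20001):
    """Minimize A2*cos(2*phi + phi2) + A4*cos(4*phi + phi4)
    over phi in [-pi/2, pi/2] by dense sampling."""
    phi = np.linspace(-np.pi / 2, np.pi / 2, n)
    val = A2 * np.cos(2 * phi + phi2) + A4 * np.cos(4 * phi + phi4)
    return phi[np.argmin(val)]
```

For example, with only the 2φ term present the minimum lies at |φ| = π/2, and with only the 4φ term at |φ| = π/4.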
However, in various simulations no local minima have been found.\n\n4.3 Incremental Extraction of Independent Components\n\nIt is possible to find the number R of independent source signal components by successively increasing the number of components to be extracted. In each step the objective function (13) is optimized for a fixed R. First a single signal component is extracted (R = 1), then an additional one (R = 2), and so on. The algorithm stops when no additional signal component can be extracted. As a stopping criterion, any suitable measure of independence can be applied; we used the sum over squared fourth-order cross-cumulants. In our artificial examples this value is typically small for independent components and increases by two orders of magnitude once the number of components to be extracted exceeds the number of original source signal components.\n\n¹ http://itb.biologie.hu-berlin.de/~blaschke\n\n5 Simulation\n\nHere we show a simple example with two nonlinearly mixed signal components, as shown in Figure 1. The mixture is defined by\n\nx_1(t) = (s_1(t) + 1) sin(π s_2(t)),\nx_2(t) = (s_1(t) + 1) cos(π s_2(t)).   (17)\n\nWe used the ISFA algorithm with different nonlinearities (see Tab. 1). Already a nonlinear expansion with monomials up to degree three was sufficient to give good results in extracting the original source signal (see Fig. 1). In all cases ISFA found exactly two independent signal components. A linear BSS method failed completely to find a good unmixing matrix.\n\n6 Conclusion\n\nWe have shown that combining the ideas of slow feature analysis and independent component analysis into ISFA is a viable way to solve the nonlinear blind source separation problem. SFA forces the independent components of ICA to be slowly varying, which appears to be a good way to discriminate between the original and nonlinearly distorted source signal components. 
A simple simulation showed that ISFA is able to extract the original source signal from a nonlinear mixture. Furthermore, ISFA can predict the number of source signal components via an incremental optimization scheme.\n\nAcknowledgments\n\nThis work has been supported by the Volkswagen Foundation through a grant to LW for a junior research group.\n\nReferences\n\n[1] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: existence and uniqueness results. Neural Networks, 12(3):429-439, 1999.\n\n[2] C. Jutten and J. Karhunen. Advances in nonlinear blind source separation. In Proc. of the 4th Int. Symposium on Independent Component Analysis and Blind Signal Separation (ICA 2003), Nara, Japan, pages 245-256, 2003.\n\n[3] L. Wiskott and T. Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715-770, 2002.\n\nTable 1: Correlation coefficients between extracted (u_1 and u_2) and original (s_1 and s_2) source signal components\n\n        linear            degree 2          degree 3          degree 4\n        u_1      u_2      u_1      u_2      u_1      u_2      u_1      u_2\ns_1    -0.803   -0.544   -0.001   -0.978    0.001    0.995    0.002    0.995\ns_2     0.332    0.517   -0.988   -0.001   -0.995    0.001   -0.996    0.000\n\nCorrelation coefficients between extracted (u_1 and u_2) and original (s_1 and s_2) source signal components for linear ICA (first column) and ISFA with different nonlinearities (monomials up to degree 2, 3, and 4). Using monomials up to degree 3 in the nonlinear expansion step already suffices to extract the original source signal. 
Note that the source signal can only be estimated up to permutation and scaling, resulting in different signs and orderings of the two estimated source signal components.\n\nFigure 1: Waveforms and scatter plots of (a) the original source signal components s_i, (b) the nonlinear mixture x_i, and (c) the components u_i recovered by nonlinear ISFA. As a nonlinearity we used all monomials up to degree 4.\n\n[4] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994. Special Issue on Higher-Order Statistics.\n\n[5] J.-F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proceedings-F, 140:362-370, 1993.\n\n[6] T.-W. Lee, M. Girolami, and T. J. Sejnowski. Independent component analysis using an extended Infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation, 11(2):409-433, 1999.\n\n[7] A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626-634, 1999.\n\n[8] L. Molgedey and G. Schuster. Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, 72(23):3634-3637, 1994.\n\n[9] L. Tong, R.-W. Liu, V. C. Soon, and Y.-F. Huang. Indeterminacy and identifiability of blind identification. IEEE Transactions on Circuits and Systems, 38(5):499-509, May 1991.\n\n[10] T. Blaschke, L. Wiskott, and P. Berkes. What is the relation between independent component analysis and slow feature analysis? (in preparation), 2004.\n\n[11] J.-F. Cardoso and A. Souloumiac. Jacobi angles for simultaneous diagonalization. SIAM J. Mat. Anal. Appl., 17(1):161-164, 1996.\n\n[12] A. Ziehe and K.-R. Müller. TDSEP – an efficient algorithm for blind separation using time structure. In Proc. of the 8th Int. 
Conference on Artificial Neural Networks (ICANN'98), pages 675-680, Berlin, 1998. Springer-Verlag.\n\n[13] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines. A blind source separation technique based on second order statistics. IEEE Transactions on Signal Processing, 45(2):434-444, 1997.\n\n[14] T. Blaschke and L. Wiskott. CuBICA: Independent component analysis by simultaneous third- and fourth-order cumulant diagonalization. IEEE Transactions on Signal Processing, 52(5):1250-1256, 2004.\n", "award": [], "sourceid": 2736, "authors": [{"given_name": "Tobias", "family_name": "Blaschke", "institution": null}, {"given_name": "Laurenz", "family_name": "Wiskott", "institution": null}]}