{"title": "Sparse Representation and Its Applications in Blind Source Separation", "book": "Advances in Neural Information Processing Systems", "page_first": 241, "page_last": 248, "abstract": "", "full_text": "Sparse Representation and Its Applications in Blind Source Separation

Yuanqing Li, Andrzej Cichocki, Shun-ichi Amari, Sergei Shishkin
RIKEN Brain Science Institute, Saitama 351-0198, Japan

Fanji Gu
Department of Physiology and Biophysics, Fudan University, Shanghai, China

Jianting Cao
Department of Electronic Engineering, Saitama Institute of Technology, Saitama 351-0198, Japan

Abstract

In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated using the K-means method. We prove that, for the estimated overcomplete basis matrix, the sparse solution (coefficient matrix) with minimum l1-norm is unique with probability one and can be obtained by a linear programming algorithm. A comparison of the l1-norm solution and the l0-norm solution is also presented, which can be used in the recoverability analysis of blind source separation (BSS). Next, we apply the sparse matrix factorization approach to BSS in the overcomplete case. Generally, if the sources are not sufficiently sparse, we perform blind separation in the time-frequency domain after preprocessing the observed data with a wavelet packets transformation. Third, an EEG data analysis example is presented to illustrate the usefulness of the proposed approach and demonstrate its performance. Two almost independent components obtained by the sparse representation method are selected for phase synchronization analysis, and their periods of significant phase synchronization are found to be related to the tasks.
Finally, concluding remarks review the approach and state areas that require further study.

1 Introduction

Sparse representation, or sparse coding, of signals has received a great deal of attention in recent years. For instance, sparse representation of signals by large-scale linear programming under given overcomplete bases (e.g., wavelets) was discussed in [1], and in [2] a sparse image coding approach using a wavelet pyramid architecture was presented. Sparse representation can also be used in blind source separation [3][4]. In [3], a two-stage approach was proposed: the first stage estimates the mixing matrix by a clustering algorithm, and the second estimates the source matrix. In our opinion, three fundamental problems related to sparse representation of signals and BSS still need further study: 1) a detailed recoverability analysis; 2) the high dimensionality of the observed data; 3) the overcomplete case in which the number of sources is unknown.

The present paper first considers sparse representation (factorization) of a data matrix based on the model

X = BS,   (1)

where X = [x(1), ..., x(N)] ∈ R^{n×N} (N ≫ 1) is a known data matrix, B = [b_1, ..., b_m] is an n×m basis matrix, and S = [s_1, ..., s_N] = [s_{ij}]_{m×N} is a coefficient matrix, also called a solution corresponding to the basis matrix B. Generally m > n, which implies that the basis is overcomplete.

The discussion in this paper rests on the following assumptions on (1).

Assumption 1: 1. The number of basis vectors m is fixed in advance and satisfies n ≤ m < N. 2. All basis vectors are normalized to unit 2-norm, and any n basis vectors are linearly independent.

The rest of this paper is organized as follows.
Section 2 analyzes the sparse representation of a data matrix. Section 3 compares the l0-norm solution with the l1-norm solution. Section 4 discusses blind source separation via sparse representation. An EEG data analysis example is given in Section 5. Concluding remarks in Section 6 summarize the advantages of the proposed approach.

2 Sparse representation of the data matrix

In this section, we discuss sparse representation of the data matrix X using the two-stage approach proposed in [3]. First, we apply an algorithm based on the K-means clustering method to find a suboptimal basis matrix composed of the cluster centers of the normalized data vectors, as in [3]. With this kind of cluster-center basis matrix, the corresponding coefficient matrix estimated by the linear programming algorithm presented in this section can become very sparse.

Algorithm outline 1: Step 1. Normalize the data vectors. Step 2. Run a K-means clustering iteration, followed by normalization of the cluster centers, to estimate the suboptimal basis matrix. End

Now we discuss the estimation of the coefficient matrix.
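Before that, the clustering stage (Algorithm outline 1) can be sketched in code. The following is a minimal sketch with NumPy; the inner-product assignment rule, the random initialization, and the iteration count are our implementation assumptions, not details fixed by the paper:

```python
import numpy as np

def estimate_basis(X, m, n_iter=50, seed=0):
    """Sketch of Algorithm outline 1: K-means on the normalized data
    vectors; the re-normalized cluster centers form the basis matrix."""
    rng = np.random.default_rng(seed)
    # Step 1: keep nonzero columns and normalize them to unit 2-norm.
    norms = np.linalg.norm(X, axis=0)
    Xn = X[:, norms > 1e-12] / norms[norms > 1e-12]
    # Step 2: K-means iterations, re-normalizing centers on each pass.
    B = Xn[:, rng.choice(Xn.shape[1], size=m, replace=False)]
    for _ in range(n_iter):
        # Nearest center by inner product (all vectors are unit-norm;
        # the sign ambiguity of basis vectors is ignored in this sketch).
        labels = np.argmax(B.T @ Xn, axis=0)
        for k in range(m):
            members = Xn[:, labels == k]
            if members.shape[1] > 0:
                c = members.mean(axis=1)
                nc = np.linalg.norm(c)
                if nc > 1e-12:
                    B[:, k] = c / nc
    return B
```

The returned matrix has m unit-norm columns, as required by Assumption 1.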
For a given basis matrix B in (1), the coefficient matrix can be found by solving the following optimization problem, as in many existing references (e.g., [3, 5]):

min Σ_{i=1}^{m} Σ_{j=1}^{N} |s_{ij}|, subject to BS = X.   (2)

It is not difficult to prove that the linear programming problem (2) is equivalent to the following set of N smaller-scale linear programming problems:

min Σ_{i=1}^{m} |s_{ij}|, subject to B s_j = x(j), j = 1, ..., N.   (3)

By setting S = U − V, where U = [u_{ij}]_{m×N} ≥ 0 and V = [v_{ij}]_{m×N} ≥ 0, (3) can be converted to the following standard linear programming problems with non-negativity constraints:

min Σ_{i=1}^{m} (u_{ij} + v_{ij}), subject to [B, −B][u_j^T, v_j^T]^T = x(j), u_j ≥ 0, v_j ≥ 0,   (4)

where j = 1, ..., N.

Theorem 1 For almost all bases B ∈ R^{n×m}, the sparse solution (l1-norm solution) of (1) is unique; that is, the set of bases B under which the sparse solution of (1) is not unique has measure zero. Moreover, the solution has at most n nonzero entries.

It follows from Theorem 1 that for any given basis, there exists a unique sparse solution of (2) with probability one.

3 Comparison of the l0-norm solution and the l1-norm solution

Usually the l0 norm J_0(S) = Σ_{i=1}^{m} Σ_{j=1}^{N} |s_{ij}|^0 (the number of nonzero entries of S) is used as a sparsity measure of S, since it ensures the sparsest solution.
Under this measure, the sparse solution is obtained by solving the problem

min Σ_{i=1}^{m} Σ_{j=1}^{N} |s_{ij}|^0, subject to BS = X.   (5)

In [5], optimally sparse representation in general (non-orthogonal) dictionaries via l1-norm minimization is discussed, and two sufficient conditions on the number of nonzero entries of the l0-norm solution are proposed under which the equivalence between the l0-norm solution and the l1-norm solution holds exactly. However, these bounds are generally very small in real-world situations if the basis vectors are far from orthogonal. For instance, the bound is smaller than 1.5 in the simulation experiments shown in the next section, which implies that the l0-norm solution may contain only one nonzero entry for the equivalence to hold. In the following, we also discuss the equivalence of the l0-norm and l1-norm solutions, but from the viewpoint of probability.

First, we introduce two optimization problems:

(P0) min Σ_{i=1}^{m} |s_i|^0, subject to As = x;

(P1) min Σ_{i=1}^{m} |s_i|, subject to As = x,

where A ∈ R^{n×m} and x ∈ R^n are a known basis matrix and a data vector, respectively, s ∈ R^m, and n ≤ m. Suppose that s0* is a solution of (P0) and s1* is a solution of (P1).

Theorem 2 The solution of (P0) is not robust to additive noise in the model, while the solution of (P1) is robust to additive noise, at least to some degree.

Although problem (P0) provides the sparsest solution, solving (P0) is not an efficient way to find it, for three reasons: 1) if ||s0*||_0 = n, the solution of (P0) is generally not unique; 2) no effective algorithm for solving (P0) exists so far (the problem has been proved to be NP-hard); 3) the solution of (P0) is not robust to noise.
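Computationally, (P1) (equivalently, one subproblem of the form (4)) is an ordinary linear program. A minimal sketch using SciPy's linprog (the solver choice is ours, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def l1_solve(B, x):
    """Solve (P1): min ||s||_1 subject to B s = x, via the split
    s = u - v, u >= 0, v >= 0, i.e. the standard-form LP (4)."""
    n, m = B.shape
    c = np.ones(2 * m)                    # objective: sum(u) + sum(v)
    A_eq = np.hstack([B, -B])             # [B, -B][u; v] = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * m))
    u, v = res.x[:m], res.x[m:]
    return u - v
```

For example, with two sensors and three unit-norm basis vectors, a data vector generated by the third basis vector alone is represented by that single column, since this has the smallest l1 norm among all exact representations.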
In contrast, the solution of (P1) is unique with probability one according to Theorem 1, and it is well known that many efficient optimization tools exist for solving (P1). From these facts a question naturally arises: under what condition is the solution of (P1) one of the sparsest solutions, that is, under what condition does it have the same number of nonzero entries as the solution of (P0)? We discuss this question in the following.

Lemma 1 Suppose that x ∈ R^n and A ∈ R^{n×m} are selected randomly. If x is represented by a linear combination of k column vectors of A, then k ≥ n generally; that is, the probability that k < n is zero.

Theorem 3 For the optimization problems (P0) and (P1), suppose that A ∈ R^{n×m} is selected randomly, x ∈ R^n is generated by As*, l = ||s*||_0 < n, and all nonzero entries of s* are also selected randomly. Then:

1. s* is the unique solution of (P0) with probability one, that is, s0* = s*; and if ||s1*||_0 < n, then s1* = s* with probability one. 2. The probability P(s1* = s*) ≥ (P(1, l, n, m))^l, where P(1, l, n, m) (1 ≤ l ≤ n) are n probabilities satisfying 1 = P(1, 1, n, m) ≥ P(1, 2, n, m) ≥ ... ≥ P(1, n, n, m) (their definitions are omitted here due to space limitations). 3. For given positive integers l0 and n0, if l ≤ l0 and m − n ≤ n0, then lim_{n→+∞} P(s1* = s*) = 1.

Remarks 1: 1. From Theorem 3, if n and m are fixed and l is sufficiently small, then s1* = s* with high probability. 2. For fixed l and m − n, if n is sufficiently large, then s1* = s* with high probability.
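The equivalence probability in Theorem 3 can be probed empirically, in the spirit of the simulations reported in Section 4. The following Monte Carlo sketch is our own setup (Gaussian mixing matrix, uniform nonzero entries, SciPy's linprog as the LP solver):

```python
import numpy as np
from scipy.optimize import linprog

def recovery_rate(n=10, m=15, l=3, trials=200, seed=0):
    """Monte Carlo estimate of P(s1* = s*): draw a random A and an
    l-sparse s*, set x = A s*, solve (P1), and count exact recoveries."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((n, m))
        A /= np.linalg.norm(A, axis=0)          # unit-norm columns
        s_true = np.zeros(m)
        idx = rng.choice(m, size=l, replace=False)
        s_true[idx] = rng.uniform(-1.0, 1.0, size=l)
        x = A @ s_true
        # (P1) as the standard-form LP (4): s = u - v, u, v >= 0.
        res = linprog(np.ones(2 * m), A_eq=np.hstack([A, -A]), b_eq=x,
                      bounds=[(0, None)] * (2 * m))
        s_hat = res.x[:m] - res.x[m:]
        hits += np.allclose(s_hat, s_true, atol=1e-6)
    return hits / trials
```

For very small l the empirical rate is close to one, consistent with Remarks 1; it decays as l approaches n.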
Theorem 3 will be used in the recoverability analysis of BSS.

4 Blind source separation based on sparse representation

In this section, we discuss blind source separation based on sparse representation of the mixture signals. The proposed approach is suitable for the case in which the number of sensors is less than or equal to the number of sources, while the number of sources is unknown. We consider the following noise-free model:

x_i = A s_i, i = 1, ..., N,   (6)

where the mixing matrix A ∈ R^{n×m} is unknown, the matrix S = [s_1, ..., s_N] ∈ R^{m×N} is composed of the m unknown sources, and the only observed data matrix X = [x_1, ..., x_N] ∈ R^{n×N} has rows containing mixtures of the sources, n ≤ m. The task of blind source separation is to recover the sources using only the observable data matrix X.

We again use the two-step approach presented in [3] for BSS. The first step is to estimate the mixing matrix using clustering Algorithm 1. If the mixing matrix is estimated correctly and a source vector s* satisfies ||s*||_0 = l < n, then by Theorem 3, s* is the l0-norm solution of (6) with probability one. And if the source vector is sufficiently sparse, e.g., l is sufficiently small compared with n, then it can be recovered with high probability by solving the linear programming problem (P1). Considering that the number of sources is generally unknown, we denote the estimated mixing matrix Ā = [Ã, ΔA] ∈ R^{n×m0} (m0 > m).
We introduce the following optimization problem (P1') and denote its solution s̄ = [s̃^T, Δs^T]^T ∈ R^{m0}:

(P1') min Σ_{i=1}^{m0} |s_i|, subject to Ā s = x.

We can prove the following recoverability result.

Theorem 4 Suppose that the sub-matrix Ã (of the estimated mixing matrix Ā) is sufficiently close to the true mixing matrix A, neglecting scaling and permutation ambiguities, and that a source vector is sufficiently sparse. Then the source vector can be recovered with high probability (close to one) by solving (P1'). That is, s̃ is sufficiently close to the original source vector, and Δs is close to the zero vector.

To illustrate Theorem 4 partially, we performed two simulation experiments in which the mixing matrix is assumed to be estimated correctly. Fig. 1 shows the probabilities that a source vector can be recovered correctly in different cases, estimated in the two simulations. In the first simulation, n and m are fixed at 10 and 15, respectively, and l, the number of nonzero entries of the source vector, varies from 1 to 15. For every fixed l, the probability that the source vector is recovered correctly is estimated from 3000 independent repeated stochastic experiments, in which the mixing matrix A and all nonzero entries of the source vector s0 are drawn randomly from the uniform distribution. Fig. 1 (a) shows the probability curve. We can see that the source is always estimated correctly when l = 1, 2, and the probability is greater than 0.95 when l ≤ 5.

In the second simulation experiment, all original source vectors have 5 nonzero entries (l = 5) and m = 15, while the dimension n of the mixture vectors varies from 5 to 15. As in the first simulation, the probabilities of correctly estimated source vectors are estimated from 3000 stochastic experiments and shown in Fig. 1 (b).
It is evident that when n ≥ 10, the source can be estimated correctly with probability higher than 0.95.

Figure 1: (a) the probability that the source vectors are estimated correctly as a function of l, obtained in the first simulation; (b) the probability that the source vectors are estimated correctly as a function of n, obtained in the second simulation.

In order to estimate the mixing matrix correctly, the sources should be sufficiently sparse. Thus sparseness of the sources plays an important role not only in estimating the sources but also in estimating the mixing matrix. If the sources are not sufficiently sparse in reality, however, we can apply a wavelet packets transformation as preprocessing. In the following, a blind separation algorithm based on this preprocessing is presented for dense sources.

Algorithm outline 2:

Step 1. Transform the n time-domain signals (the n rows of X) into time-frequency signals by a wavelet packets transformation, making sure that the n wavelet packets trees have the same structure.

Step 2. Select those nodes of the wavelet packets trees whose coefficients are as sparse as possible; the selected nodes of different trees should have the same indices. Based on these coefficient vectors, estimate the mixing matrix Ā ∈ R^{n×m0} using Algorithm 1 presented in Section 2.

Step 3. Based on the estimated mixing matrix Ā and the coefficients of all nodes obtained in Step 1, estimate the coefficients of all nodes of the wavelet packets trees of the sources by solving the set of linear programming problems (4).

Step 4. Reconstruct the sources using the inverse wavelet packets transformation.
End

We have successfully separated speech sources in a number of simulations in the overcomplete case (e.g., 8 sources, 4 sensors) using Algorithm 2. In the next section, we present an EEG data analysis example.

Remark 2: A challenging problem in the algorithm above is to estimate the mixing matrix as precisely as possible. In our many simulations on BSS of speech mixtures, we use a 7-level wavelet packets transformation for preprocessing. When the K-means clustering method is used for estimating the mixing matrix, the number of clusters (the number of columns of the estimated mixing matrix) should be set greater than the number of sources even if the source number is known. In this way, the estimated matrix will contain a submatrix very close to the original mixing matrix. By Theorem 4, we can then estimate the sources using the overestimated mixing matrix.

5 An example in EEG data analysis

The electroencephalogram (EEG) is a mixture of electrical signals coming from multiple brain sources. This is why the application of ICA to EEG has recently become popular, yielding promising new results (e.g., [6]). However, compared with ICA, sparse representation has two important advantages: 1) the sources are not assumed to be mutually independent as in ICA, and need not even be stationary; 2) the number of sources can be larger than the number of sensors. We believe that sparse representation is a complementary and very promising approach in the analysis of EEG.

Here we present the results of testing the usefulness of sparse representation in the analysis of EEG data, based on temporal synchronization between components. The analyzed 14-channel EEG was recorded in an experiment based on a modified Sternberg memory task. Subjects were asked to memorize numbers successively presented at random positions on a computer monitor. After a 2.5 s pause followed by a warning signal, a “test number” was presented.
If it was the same as one of the numbers in the memorized set, the subject had to press a button. This cycle, which also included a resting (waiting) period, was repeated 160 times (about 24 min). The EEG was sampled at a 256 Hz rate. Here we mainly describe the analysis results for one subject's data.

The EEG was filtered off-line in the 1-70 Hz range, trials with artifacts were rejected by visual inspection, and a data set including 20 trials with correct responses and 20 trials with incorrect responses was selected for analysis (1 trial = 2176 points). Thus we obtain a 14 × 87040 dimensional data matrix, denoted by X. Using the sparse representation algorithm proposed in this paper, we decomposed the EEG signals X into 20 components, forming a 20 × 87040 dimensional component matrix S, which again contains the 20 trials for correct responses and the 20 trials for incorrect responses.

First, we calculated the correlation coefficient matrices of X and S, denoted by Rx and Rs, respectively. We found that Rx_{i,j} ∈ (0.18, 1] (the median of |Rx_{i,j}| is 0.5151). In the case of the components, the correlation coefficients were considerably lower (the median of |Rs_{i,j}| is 0.2597), and there exist many pairs of components with small correlation coefficients, e.g., Rs_{2,11} = 0.0471, Rs_{8,13} = 0.0023, etc. Furthermore, we found that the higher-order correlation coefficients of these pairs are also very small (e.g., the median absolute value of the 4th-order correlation is 0.1742). We would like to emphasize that, although the independence principle was not used, many pairs of components were almost independent.

According to modern brain theories, the dynamics of synchronization of rhythmic activities in distinct neural networks plays a very important role in the interactions between them.
Thus, phase synchronization in a pair of two almost independent components (s_1, s_14) (Rs_{1,14} = 0.0085; fourth-order correlation coefficient 0.0026) was analyzed using the method described in [7]. The synchronization index is defined by SI(f, t) = max(SPLV(f, t) − Ssur, 0), where SPLV(f, t) is a single-trial phase-locking value at frequency f and time t, smoothed by a window of length 99, and Ssur is the 0.95 percentile of the distribution of 200 surrogates (the 200 pairs of surrogate data are Gaussian distributed). Fig. 2 shows the phase synchrony analysis results. The phase synchrony is observed mainly in the low frequency band (1 Hz-15 Hz) and demonstrates a tendency for task-related variations. Though only ten of the 40 trials are presented due to page space, 32 of the 40 trials show similar characteristics.

In Fig. 3 (a), two averaged synchronization index curves are presented, obtained by averaging the synchronization index SI over the 1-15 Hz range and across 20 trials, separately for correct and incorrect responses. Note the time variations of the averaged synchronization index and its higher values for correct responses, especially at the beginning and the end of the trial (preparation and response periods). To test the significance of the time and correctness effects, the synchronization index was averaged again over each 128 time points (0.5 s), to remove artificial correlation between neighboring points, and submitted to a Friedman nonparametric ANOVA. The test showed significance of the time (p = 0.013) and correctness (p = 0.0017) effects. Thus, the phase synchronization between the two analyzed components was sensitive both to changes in brain activity induced by time-varying task demands and to correctness-related variations in the brain state.
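For illustration, an index of the form SI = max(SPLV − Ssur, 0) can be sketched for a single frequency band. This is a schematic sketch only: we assume Hilbert-transform phases and circular-shift surrogates, whereas the paper follows the wavelet-based method of [7] with Gaussian-distributed surrogates; the smoothing window and the 0.95 quantile mirror the text:

```python
import numpy as np
from scipy.signal import hilbert

def sync_index(a, b, win=99, n_surr=200, q=0.95, seed=0):
    """Schematic SI(t) = max(SPLV(t) - Ssur, 0) for one band-filtered
    signal pair.  SPLV: magnitude of the windowed mean of exp(i*dphi),
    dphi being the Hilbert phase difference; Ssur: the q-quantile of a
    surrogate distribution (circular phase shifts here, an assumption)."""
    rng = np.random.default_rng(seed)
    pa = np.angle(hilbert(a))
    pb = np.angle(hilbert(b))
    kernel = np.ones(win) / win

    def splv(d):
        # |windowed mean of exp(i*dphi)|, always in [0, 1]
        return np.abs(np.convolve(np.exp(1j * d), kernel, mode="same"))

    base = splv(pa - pb)
    surr = [splv(pa - np.roll(pb, rng.integers(1, len(b)))).max()
            for _ in range(n_surr)]
    return np.maximum(base - np.quantile(surr, q), 0.0)
```

The output is a nonnegative curve over time; applying it per frequency band yields time-frequency charts of the kind shown in Fig. 2.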
The higher synchronization for correct responses could be related to a higher integration of the brain systems required for effective information processing. The same phenomenon was also seen in the same analysis of EEG data from another subject (Fig. 3 (b)).

A substantial part of the synchronization between raw EEG channels can be explained by volume conduction effects. Large cortical areas may work as stable unified oscillating systems, and this may account for another large part of the synchronization in raw EEG. Such strong synchronization may mask synchronization appearing for brief periods, which is of special interest in brain research. To study temporally appearing synchronization, components related to the activity of more or less unified brain sources should be separated from the EEG. Our first results of applying sparse representation to real EEG data support the claim that it can help to reveal brief periods of synchronization between brain “sources”.

Figure 2: Time course of EEG synchrony in single trials.
1st row: time-frequency charts for 5 single trials with correct responses; synchronization index values are shown for every frequency and time sample point (f, k). 2nd row: mean synchronization index averaged across frequencies in the 1-15 Hz range, for the same trials as in the 1st row. 3rd and 4th rows: the same for five trials with incorrect responses. In each subplot, the first vertical line marks the beginning of the presentation of the numbers to be memorized, and the second line marks the end of the test number.

Figure 3: Time course of EEG synchrony, averaged across trials. Left: the same subject as in the previous figure; right: another subject. The curves show the mean values of the synchronization index averaged over the 1-15 Hz range and across 20 trials. Black curves are for trials with correct responses; red dotted curves are for trials with incorrect responses. Solid vertical lines: as in the previous figure.

6 Concluding remarks

Sparse representation of data matrices and its application to blind source separation were analyzed in this paper based on the two-step approach presented in [3]. The l1 norm is used as the sparsity measure, whereas the l0-norm sparsity measure is considered for comparison and for the recoverability analysis of BSS.
From the equivalence analysis of the l1-norm and l0-norm solutions presented in this paper, it is evident that if a data vector (observed vector) is generated from a sufficiently sparse source vector, then, with high probability, the l1-norm solution is equal to the l0-norm solution, which in turn is equal to the source vector; this can be used in the recoverability analysis of blind sparse source separation. This kind of construction employing sparse representation can be used in BSS as in [3], especially in cases where there are fewer sensors than sources, the number of sources is unknown, and the sources are not completely independent. Lastly, an application example analyzing phase synchrony in real EEG data supports the validity and performance of the proposed approach. Since the components separated by sparse representation are not constrained by the condition of complete independence, they may be used in the analysis of brain synchrony more effectively than components separated by general ICA algorithms based on the independence principle.

References

[1] Chen, S., Donoho, D.L. & Saunders, M.A. (1998) Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing 20(1):33-61.

[2] Olshausen, B.A., Sallee, P. & Lewicki, M.S. (2001) Learning sparse image codes using a wavelet pyramid architecture. Advances in Neural Information Processing Systems 13, pp. 887-893. Cambridge, MA: MIT Press.

[3] Zibulevsky, M., Pearlmutter, B.A., Bofill, P. & Kisilev, P. (2000) Blind source separation by sparse decomposition in a signal dictionary. In Roberts, S.J. and Everson, R.M. (Eds.), Independent Component Analysis: Principles and Practice. Cambridge University Press.

[4] Lee, T.W., Lewicki, M.S., Girolami, M. & Sejnowski, T.J. (1999) Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Processing Letters 6(4):87-90.

[5] Donoho, D.L. & Elad, M.
(2003) Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences 100:2197-2202.

[6] Makeig, S., Westerfield, M., Jung, T.P., Enghoff, S., Townsend, J., Courchesne, E. & Sejnowski, T.J. (2002) Dynamic brain sources of visual evoked responses. Science 295:690-694.

[7] Le Van Quyen, M., Foucher, J., Lachaux, J.P., Rodriguez, E., Lutz, A., Martinerie, J. & Varela, F.J. (2001) Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. Journal of Neuroscience Methods 111:83-98.", "award": [], "sourceid": 2379, "authors": [{"given_name": "Yuanqing", "family_name": "Li", "institution": null}, {"given_name": "Shun-ichi", "family_name": "Amari", "institution": null}, {"given_name": "Sergei", "family_name": "Shishkin", "institution": null}, {"given_name": "Jianting", "family_name": "Cao", "institution": null}, {"given_name": "Fanji", "family_name": "Gu", "institution": null}, {"given_name": "Andrzej", "family_name": "Cichocki", "institution": null}]}