{"title": "Blind Source Separation via Multinode Sparse Representation", "book": "Advances in Neural Information Processing Systems", "page_first": 1049, "page_last": 1056, "abstract": null, "full_text": "BLIND SOURCE SEPARATION VIA \n\nMULTINODE SPARSE REPRESENTATION \n\nMichael Zibulevsky \n\nPavel Kisilev \n\nDepartment of Electrical Engineering \n\nDepartment of Electrical Engineering \n\nTechnion, Haifa 32000, Israel \n\nmzib@ee.technion.ac. if \n\nTechnion, Haifa 32000, Israel \n\npaufk@tx.technion.ac. if \n\nYehoshua Y. Zeevi \n\nDepartment of Electrical Engineering \n\nTechnion, Haifa 32000, Israel \n\nzeevi@ee.technion.ac. if \n\nBarak Pearlmutter \n\nDepartment of Computer Science \n\nUniversity of New Mexico \n\nAlbuquerque, NM 87131 USA \n\nbap@cs. unm. edu \n\nAbstract \n\nWe consider a problem of blind source separation from a set of instan(cid:173)\ntaneous linear mixtures, where the mixing matrix is unknown. It was \ndiscovered recently, that exploiting the sparsity of sources in an appro(cid:173)\npriate representation according to some signal dictionary, dramatically \nimproves the quality of separation. In this work we use the property of \nmulti scale transforms, such as wavelet or wavelet packets, to decompose \nsignals into sets of local features with various degrees of sparsity. We \nuse this intrinsic property for selecting the best (most sparse) subsets of \nfeatures for further separation. The performance of the algorithm is ver(cid:173)\nified on noise-free and noisy data. Experiments with simulated signals, \nmusical sounds and images demonstrate significant improvement of sep(cid:173)\naration quality over previously reported results. \n\n1 Introduction \n\nIn the blind source separation problem an N-channel sensor signal x(~ ) is generated by \nM unknown scalar source signals srn(~) , linearly mixed together by an unknown N x M \nmixing, or crosstalk, matrix A , and possibly corrupted by additive noise n(~): \n\nx(~) = As(~) + n(~). \n\n(1) \n\nThe independent variable ~ is either time or spatial coordinates in the case of images. We \nwish to estimate the mixing matrix A and the M-dimensional source signal s(~). \nThe assumption of statistical independence of the source components Srn(~) , m = 1, ... , M \nleads to the Independent Component Analysis (lCA) [1], [2]. A stronger assumption is the \n\n\u00b0Supported in part by the Ollendorff Minerva Center, by the Israeli Ministry of Science, by NSF \n\nCAREER award 97-02-311 and by the National Foundation for Functional Brain Imaging \n\n\fsparsity of decomposition coefficients, when the sources are properly represented [3]. In \nparticular, let each 8 m (~) have a sparse representation obtained by means of its decompo(cid:173)\nsition coefficients Cmk according to a signal dictionary offunctions Y k (~): \n\n8m (~) = L Cmk Yk(~)' \n\nk \n\n(2) \n\nThe functions Yk (~ ) are called atoms or elements of the dictionary. These elements do \nnot have to be linearly independent, and instead may form an overcomplete dictionary, \ne.g. wavelet-related dictionaries (wavelet packets, stationary wavelets, etc., see for exam(cid:173)\nple [9]). Sparsity means that only a small number of coefficients Cmk differ significantly \nfrom zero. Then, unmixing of the sources is performed in the transform domain, i.e. in the \ndomain of these coefficients Cmk. The property of sparsity often yields much better source \nseparation than standard ICA, and can work well even with more sources than mixtures. 
In many cases there are distinct groups of coefficients, wherein the sources have different sparsity properties. The key idea in this study is to select only a subset of features (coefficients) which is best suited for separation, with respect to the following criteria: (1) sparsity of the coefficients; (2) separability of the sources' features. After this subset is formed, it is used in the separation process, which can be accomplished by standard ICA algorithms or by clustering. The performance of our approach is verified on noise-free and noisy data. Our experiments with 1D signals and images demonstrate that the proposed method further improves separation quality, as compared with results obtained by using the sparsity of all decomposition coefficients.

2 Two approaches to sparse source separation: InfoMax and Clustering

Sparse sources can be separated by any one of several techniques, e.g. the Bell-Sejnowski Information Maximization (BS InfoMax) approach [1], or approaches based on geometric considerations (see for example [8]). In the former case, the algorithm estimates the unmixing matrix W = A^{-1}, while in the latter case the output is the estimated mixing matrix. In both cases, these matrices can be estimated only up to a column permutation and a scaling factor [4].

InfoMax. Under the assumption of a noiseless system and a square mixing matrix in (1), BS InfoMax is equivalent to the maximum likelihood (ML) formulation of the problem [4], which is used in this section. For simplicity of presentation, let us consider the case where the dictionary of functions used in the source decomposition (2) is an orthonormal basis. (In this case, the corresponding coefficients are c_mk = <s_m, φ_k>, where <·,·> denotes the inner product.) From (1) and (2), the decomposition coefficients of the noiseless mixtures, according to the same signal dictionary of functions φ_k(ξ), are

    λ_k = A c_k,                                             (3)

where the M-dimensional vector c_k forms the k-th column of the matrix C = {c_mk}.

Let Y be the feature, or (new) data, matrix of dimension M x K, where K is the number of features. Its rows are either the samples of the sensor signals (mixtures) or their decomposition coefficients. In the latter case, the coefficients λ_k form the columns of Y. (In the following discussion we assume this setting for Y, unless stated otherwise.) We are interested in the maximum likelihood estimate of A given the data Y.

Let the corresponding coefficients c_mk be independent random variables with a probability density function (pdf) of an exponential type

    p(c_mk) ∝ exp{-ν(c_mk)},                                 (4)

where the scalar function ν(·) is a smooth approximation of the absolute value function. This kind of distribution is widely used for modeling sparsity [5]. In view of the independence of the c_mk, and (4), the prior pdf of C is

    p(C) ∝ Π_{m,k} exp{-ν(c_mk)}.                            (5)

Taking into account that Y = AC, the parametric model for the pdf of Y with respect to the parameters A is

    p_A(Y) ∝ |det A|^{-K} Π_{m,k} exp{-ν((A^{-1} Y)_mk)}.    (6)

Let W = A^{-1} be the unmixing matrix to be estimated. Then, substituting C = WY, combining (6) with (5) and taking the logarithm, we arrive at the log-likelihood function

    L_W(Y) = K log|det W| - Σ_{m=1}^{M} Σ_{k=1}^{K} ν((WY)_mk).   (7)

Maximization of L_W(Y) with respect to W is equivalent to BS InfoMax, and can be performed efficiently by the Natural Gradient algorithm [6]. We used this algorithm as implemented in the ICA/EEG Matlab toolbox [7].
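For illustration, the maximization of (7) by a natural gradient ascent can be sketched as follows. This is our sketch, not the toolbox implementation used in the paper [7]: tanh serves as a smooth stand-in for ν'(·) (it corresponds to ν(u) = log cosh u, a standard smooth approximation of |u|), and the step size and iteration count are arbitrary:

```python
import numpy as np

def natural_gradient_ica(Y, n_iter=500, lr=0.1):
    """Sketch of maximizing the log-likelihood (7) by natural gradient.

    Y : (M, K) feature matrix (mixture samples or their decomposition
    coefficients). tanh is a smooth surrogate for nu'(.) ~ sign(.).
    Illustrative only; the paper used the ICA/EEG Matlab toolbox [7].
    """
    M, K = Y.shape
    W = np.eye(M)                    # unmixing matrix estimate, W ~ A^{-1}
    for _ in range(n_iter):
        U = W @ Y                    # current coefficient estimates C = W Y
        # Natural gradient of (7): (I - nu'(U) U^T / K) W
        W += lr * (np.eye(M) - np.tanh(U) @ U.T / K) @ W
    return W
```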
Clustering. In the case of geometry-based methods, separation of sparse sources can be achieved by clustering along the orientations of data concentration in the N-dimensional space wherein each column y_k of the matrix Y represents a data point (N is the number of mixtures). Let us consider a two-dimensional noiseless case, wherein two source signals, s1(t) and s2(t), are mixed by a 2x2 matrix A, yielding two mixtures x1(t) and x2(t). (Here, the data matrix is constructed from these mixtures x1(t) and x2(t).) Typically, a scatter plot of two sparse mixtures, x1(t) versus x2(t), looks like the rightmost plot in Figure 2. If only one source, say s1(t), were present, the sensor signals would be x1(t) = a11 s1(t) and x2(t) = a21 s1(t), and the data points in the scatter diagram of x1(t) versus x2(t) would lie on the straight line along the vector [a11 a21]^T. The same thing happens when two sparse sources are present: at each particular index where a sample of the first source is large, there is a high probability that the corresponding sample of the second source is small, so the point in the scatter diagram still lies close to that straight line. The same argument holds for the second source. As a result, the data points are concentrated around two dominant orientations, which are directly related to the columns of A. Source signals are rarely sparse in their original domain. In contrast, their decomposition coefficients (2) usually show high sparsity. Therefore, we construct the data matrix Y from the decomposition coefficients of the mixtures (3), rather than from the mixtures themselves.

In order to determine the orientations of the scattered data, we project the data points onto the surface of a unit sphere by normalizing the corresponding vectors, and then apply a standard clustering algorithm. This clustering approach works efficiently even if the number of sources is greater than the number of sensors. Our clustering procedure can be summarized as follows (a code sketch follows the list):

1. Form the feature matrix Y by putting samples of the sensor signals, or a (subset of) their decomposition coefficients, into the corresponding rows of the matrix;

2. Normalize the feature vectors (columns of Y): y_k = y_k / ||y_k||_2, in order to project the data points onto the surface of a unit sphere, where ||·||_2 denotes the l2 norm. Before normalization, it is reasonable to remove data points with a very small norm, since these are very likely to be crosstalk-corrupted by small coefficients from the other sources;

3. Move the data points to a half-sphere, e.g. by forcing the sign of the first coordinate y_k^1 to be positive: IF y_k^1 < 0 THEN y_k = -y_k. Without this operation, each set of linearly (i.e., along a line) clustered data points would yield two clusters on opposite sides of the sphere;

4. Estimate the cluster centers by using a clustering algorithm. The coordinates of these centers form the columns of the estimated mixing matrix A. We used the Fuzzy C-Means (FCM) clustering algorithm as implemented in the Matlab Fuzzy Logic Toolbox.
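A compact sketch of steps 1-4, under assumptions of ours: scikit-learn's k-means stands in for the Fuzzy C-Means used in the paper (any centroid-based clustering works here), and the relative norm threshold is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans   # stand-in for the paper's Fuzzy C-Means

def estimate_mixing_by_clustering(Y, n_sources, norm_floor=1e-3):
    """Steps 1-4 above. Y is N x K with data points as columns; returns
    an N x n_sources estimate of A, up to column permutation and scale."""
    norms = np.linalg.norm(Y, axis=0)
    Yk = Y[:, norms > norm_floor * norms.max()]   # drop near-zero points
    Yk = Yk / np.linalg.norm(Yk, axis=0)          # step 2: unit sphere
    Yk = np.where(Yk[0] < 0, -Yk, Yk)             # step 3: half-sphere
    km = KMeans(n_clusters=n_sources, n_init=10).fit(Yk.T)
    return km.cluster_centers_.T                  # step 4: centers -> columns of A
```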
Source recovery. The estimated unmixing matrix A^{-1} is obtained by either BS InfoMax or the above clustering procedure, applied either to the complete data set or to some subsets of the data (to be explained in the next section). The sources are then recovered in their original domain by s(t) = A^{-1} x(t). We should stress here that if the clustering approach is used, the estimation of the sources is not restricted to the case of square mixing matrices, although source recovery is more complicated in the rectangular case (this topic is outside the scope of this paper).

[Figure 1: Random block signals (two upper) and their mixtures (two lower).]

3 Multinode based source separation

Motivating example: sparsity of random blocks in the Haar basis. To provide intuitive insight into the practical implications of our main idea, we first use 1D block functions, which are piecewise constant, with random amplitude and duration of each constant piece (Figure 1). It is known that the Haar wavelet basis provides a compact representation of such functions. Let us take a close look at the Haar wavelet coefficients at the different resolution levels j = 0, 1, ..., J. Wavelet basis functions at the finest resolution level j = J are obtained by translation of the Haar mother wavelet ψ(t):

    ψ_{J,i}(t) = ψ(t - i).
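The level-dependent sparsity is easy to observe numerically. In the sketch below, random_blocks is a hypothetical block-signal generator of ours, and PyWavelets stands in for the Matlab wavelet tools; for a piecewise-constant signal each jump touches only a few detail coefficients per level, so the fraction of significant coefficients shrinks toward the finest (and largest) levels:

```python
import numpy as np
import pywt   # PyWavelets, used here in place of the Matlab wavelet tools

rng = np.random.default_rng(1)

def random_blocks(T=1024, n_jumps=10):
    """Hypothetical generator of piecewise-constant signals as in Figure 1."""
    jumps = np.zeros(T)
    idx = rng.choice(T, size=n_jumps, replace=False)
    jumps[idx] = rng.standard_normal(n_jumps)
    return np.cumsum(jumps)

s = random_blocks()
details = pywt.wavedec(s, 'haar')[1:]   # detail coefficients, coarse -> fine
for j, c in enumerate(details, start=1):
    frac = np.mean(np.abs(c) > 1e-8 * np.abs(c).max())
    print(f"level {j}: {frac:.1%} of coefficients significantly nonzero")
```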
[Figure 2: Separation of block signals: scatter plots of the sensor signals (left) and of their wavelet coefficients (middle and right). The numbers below give the normalized mean-squared separation error (%) obtained with the Bell-Sejnowski InfoMax and with Fuzzy C-Means clustering, respectively:

             raw signals   all wavelet coefficients   finest-level coefficients
  InfoMax    1.93          0.183                      0.005
  FCM        1.78          0.058                      0.002  ]

Since the crosstalk matrix A is estimated only up to a column permutation and a scaling factor, in order to measure the separation accuracy we normalize the original sources s_m(t) and the corresponding estimated sources ŝ_m(t). The averaged (over sources) normalized squared error (NSE) is then computed as

    NSE = (1/M) Σ_{m=1}^{M} ||ŝ_m - s_m||_2^2 / ||s_m||_2^2.

The resulting separation errors for the block sources are presented in the lower part of Figure 2. The largest error (1.93%) is obtained on the raw data, and the smallest (0.005% and less) on the wavelet coefficients at the finest resolution, which have the best sparsity. Using all wavelet coefficients yields intermediate sparsity and performance.

Multinode representation. Our choice of a particular wavelet basis and of the sparsest subset of coefficients was obvious in the above example: it was based on knowledge of the structure of piecewise-constant signals. For sources having oscillatory components (like sounds, or images with textures), other systems of basis functions, such as wavelet packets and trigonometric function libraries [9], may be more appropriate. The wavelet packet library consists of the triple-indexed family of functions

    φ_{j,i,q}(t) = 2^{j/2} φ_q(2^j t - i),    j, i ∈ Z, q ∈ N,

where j and i are the scale and shift parameters, respectively, and q is the frequency parameter. [Roughly speaking, q is proportional to the number of oscillations of the mother wavelet φ_q(t).] These functions form a binary tree whose nodes are indexed by the depth of the level j and the node number q = 0, 1, 2, ..., 2^j - 1 at the specified level j. The same indexing is used for the corresponding subsets of wavelet packet coefficients (as well as in the scatter diagrams in the section on experimental results).

Adaptive selection of sparse subsets. When signals have a complex nature, it is difficult to decide in advance which nodes contain the sparsest sets of coefficients. We therefore use the following simple adaptive approach. First, for every node of the tree, we apply our clustering algorithm and compute a measure of the clusters' distortion. In our experiments we used a standard global distortion, the mean squared distance of the data points to the centers of their own (closest) clusters (here again, weights of the data points can be incorporated):

    d = Σ_{k=1}^{K} min_m ||u_m - y_k||,

where K is the number of data points, u_m is the m-th centroid, y_k is the k-th data point, and ||·|| is the sum-of-squares distance. Second, we choose a few best nodes with the minimal distortion, combine their coefficients into one data set, and apply a separation algorithm (clustering or InfoMax) to these data.
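A sketch of this selection loop, under the same stand-in assumptions as the earlier snippets (PyWavelets for the packet tree, k-means in place of FCM; scikit-learn's inertia_ is precisely the global distortion d). For brevity, only the nodes of a single decomposition level are scored here, whereas the method ranks all tree nodes:

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

def node_distortion(Y, n_sources):
    """Global distortion d = sum_k min_m ||u_m - y_k||^2 of the sphere-
    projected data points of one node (smaller = better clustered)."""
    norms = np.linalg.norm(Y, axis=0)
    Yk = Y[:, norms > 1e-3 * norms.max()]
    Yk = Yk / np.linalg.norm(Yk, axis=0)
    Yk = np.where(Yk[0] < 0, -Yk, Yk)
    return KMeans(n_clusters=n_sources, n_init=10).fit(Yk.T).inertia_

def best_nodes(x, n_sources, wavelet='db8', level=3, n_best=2):
    """Rank the wavelet-packet nodes of the mixtures x (N x T array)
    by clustering distortion and return the n_best node paths."""
    trees = [pywt.WaveletPacket(xi, wavelet, maxlevel=level) for xi in x]
    scores = []
    for node in trees[0].get_level(level, order='freq'):
        Y = np.vstack([t[node.path].data for t in trees])  # row per mixture
        scores.append((node_distortion(Y, n_sources), node.path))
    return [path for _, path in sorted(scores)[:n_best]]
```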
4 Experimental results

The proposed blind separation method, based on the wavelet-packet representation, was evaluated on several types of signals. We have already discussed the relatively simple example of a random block signal. The second type of signal is a frequency-modulated (FM) sinusoidal signal: the carrier frequency is modulated either by a sinusoidal function (FM signal) or by random blocks (BFM signal). The third type is a musical recording of flute sounds. Finally, we apply our algorithm to images. An example of such images is presented in the left part of Figure 3.

[Figure 3: Left: two source images (upper pair), their mixtures (middle pair) and the estimated images (lower pair). Right: scatter plots of the wavelet packet (WP) coefficients of the mixtures of images; subsets are indexed on the WP tree.]

In order to compare the accuracy of our adaptive best-nodes method with that attainable by standard methods, we form the following feature sets: (1) raw data; (2) Short Time Fourier Transform (STFT) coefficients (in the case of 1D signals); (3) wavelet transform coefficients; (4) wavelet packet coefficients at the best nodes found by our method, using various wavelet families with different degrees of smoothness (haar, db-4, db-8). In the case of image separation, we used the Discrete Cosine Transform (DCT) instead of the STFT, and the sym4 and sym8 mother wavelets instead of db-4 and db-8 for the wavelet transform and wavelet packets.

The right part of Figure 3 presents an example of scatter plots of the wavelet packet coefficients obtained at various nodes of the wavelet packet tree. The upper left scatter plot, marked with 'C', corresponds to the complete set of coefficients at all nodes. The rest are scatter plots of sets of coefficients indexed on the wavelet packet tree. Generally speaking, the more distinct the two dominant orientations appear on these plots, the more precise is the estimation of the mixing matrix, and, therefore, the better is the quality of separation. Note that only two nodes, C22 and C23, show clear orientations. These nodes will most likely be selected by the algorithm for the subsequent estimation process.

              raw data   STFT    WT db8   WT haar   WP db8   WP haar
  Blocks      10.16      2.669   0.174    0.037     0.073    0.002
  BFM sine    24.51      0.667   0.665    2.34      0.2      0.442
  FM sine     25.57      0.32    1.032    6.105     0.176    0.284
  Flutes      1.48       0.287   0.355    0.852     0.154    0.648

              raw data   DCT     WT sym8  WT haar   WP sym8  WP haar
  Images      4.88       1.164   3.651    1.114     0.365    0.687

Table 1: Experimental results: normalized mean-squared separation error (%) for noise-free signals and images, applying FCM separation to the raw data and to decomposition coefficients in various domains. In the case of the wavelet packets (WP), the best nodes selected by our algorithm were used.

Table 1 summarizes the results of experiments in which we applied our best-features-selection approach along with FCM separation to each noise-free feature set. In these experiments, we compared the quality of separation of deterministic signals by calculating NSEs (i.e., residual crosstalk errors).
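For reference, the error measure can be computed as below. The brute-force matching over permutations and signs is our reading of the normalization described in Section 3, not a transcription of the authors' code:

```python
import numpy as np
from itertools import permutations

def nse(S_true, S_est):
    """Normalized squared error of Section 3, averaged over sources.

    Both arrays are (M, T). Each signal is scale-normalized, and the
    column permutation / sign ambiguity of blind separation is resolved
    by brute force over permutations (fine for small M)."""
    S = S_true / np.linalg.norm(S_true, axis=1, keepdims=True)
    E = S_est / np.linalg.norm(S_est, axis=1, keepdims=True)
    best = np.inf
    for p in permutations(range(S.shape[0])):
        err = np.mean([min(np.sum((S[i] - E[j]) ** 2),
                           np.sum((S[i] + E[j]) ** 2))
                       for i, j in enumerate(p)])
        best = min(best, err)
    return best    # multiply by 100 for the % figures of Tables 1-2
```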
In the case of the random block and BFM signals, we performed 100 Monte Carlo simulations and calculated the normalized mean-squared errors (NMSE) for the above feature sets. From Table 1 it is clear that our adaptive best-nodes method outperforms all of the other feature sets (including the complete set of wavelet coefficients) for every type of signal. A similar improvement was achieved by using our method along with BS InfoMax separation, which provided even better results for images. In the case of the random block signals, using the Haar wavelet for the wavelet packet representation yields better separation than using a smooth wavelet, e.g. db-8. The reason is that these block signals, which are not natural signals, have a sparser representation in the Haar basis. In contrast, as expected, natural signals such as the flute recordings are better represented by smooth wavelets, which in turn provide better separation. This is another advantage of using sets of features at multiple nodes along with various families of 'mother' functions: one can choose the best nodes from several decomposition trees simultaneously.

In order to verify the performance of our method in the presence of noise, we added various types of noise (white Gaussian and salt & pepper) to three mixtures of three images at various signal-to-noise energy ratios (SNR). Table 2 summarizes these experiments, in which we applied our approach along with BS InfoMax separation. It turns out that the ideas used in wavelet-based signal denoising (see for example [10] and references therein) carry over to signal separation from noisy mixtures. In particular, in the case of white Gaussian noise, the noise energy is uniformly distributed over all wavelet coefficients at the various scales. Therefore, at sufficiently high SNRs, the large coefficients of the signals are only slightly distorted by the noise coefficients, and the estimation of the unmixing matrix is almost unaffected by the presence of the noise. (In contrast, BS InfoMax applied to the three noisy mixtures themselves failed completely, arriving at an NSE of 19% even in the case of SNR = 12 dB.) We should stress that, although our adaptive best-nodes method performs reasonably well in the presence of noise, it is not intended to further denoise the reconstructed images (this can be achieved by some denoising method after the source signals are separated). More experimental results, as well as the parameters of the simulations, can be found in [11].

[Table 2: Performance of the algorithm in the presence of various sources of noise in the mixtures of images: normalized mean-squared separation error (%) as a function of SNR [dB], for mixtures with white Gaussian noise and for mixtures with salt & pepper noise, applying our adaptive approach along with BS InfoMax separation.]
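The argument about white Gaussian noise is easy to verify numerically: an orthonormal wavelet transform keeps white noise white, so the noise floor spreads thinly over all coefficients, while the signal energy stays concentrated in a few large coefficients that are barely distorted. A small sketch under the same illustrative assumptions as the earlier snippets:

```python
import numpy as np
import pywt

rng = np.random.default_rng(2)
T = 4096
s = np.cumsum(rng.standard_normal(T) * (rng.random(T) < 0.01))  # block signal
noise = 0.1 * rng.standard_normal(T)                            # white Gaussian

cs = np.concatenate(pywt.wavedec(s, 'haar'))       # signal coefficients
cn = np.concatenate(pywt.wavedec(noise, 'haar'))   # noise coefficients

# The few coefficients standing far above the (uniform) noise floor
# carry nearly all of the signal energy.
big = np.abs(cs) > 10 * cn.std()
print(f"{big.mean():.2%} of coefficients hold "
      f"{(cs[big] ** 2).sum() / (cs ** 2).sum():.1%} of the signal energy")
```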
5 Conclusions

Experiments with both one- and two-dimensional, simulated and natural signals demonstrate that multinode sparse representations improve the efficiency of blind source separation. The proposed method improves the separation quality by utilizing the structure of signals, wherein several subsets of the wavelet packet coefficients have significantly better sparsity and separability than others. In this case, scatter plots of these coefficients show distinct orientations, each of which specifies a column of the mixing matrix. We choose the 'good' subsets according to the global distortion adopted as a measure of cluster quality. Finally, we combine the coefficients from the best chosen subsets into one data set, and restore the mixing matrix from this subset alone by the InfoMax algorithm or by clustering. This yields significantly better results than those obtained by applying the standard InfoMax and clustering approaches directly to the raw data. The advantage of our method is particularly noticeable in the case of noisy mixtures.

References

[1] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.

[2] A. Hyvarinen, "Survey on independent component analysis," Neural Computing Surveys, no. 2, pp. 94-128, 1999.

[3] M. Zibulevsky and B. A. Pearlmutter, "Blind separation of sources with sparse representations in a given signal dictionary," Neural Computation, vol. 13, no. 4, pp. 863-882, 2001.

[4] J.-F. Cardoso, "Infomax and maximum likelihood for blind separation," IEEE Signal Processing Letters, vol. 4, pp. 112-114, 1997.

[5] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337-365, 2000.

[6] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems 8, MIT Press, 1996.

[7] S. Makeig, ICA/EEG toolbox. Computational Neurobiology Laboratory, the Salk Institute, 1999. http://www.cnl.salk.edu/~tewon/ica_cnl.html

[8] A. Prieto, C. G. Puntonet, and B. Prieto, "A neural algorithm for blind separation of sources based on geometric properties," Signal Processing, vol. 64, no. 3, pp. 315-331, 1998.

[9] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998.

[10] D. L. Donoho, "De-noising by soft thresholding," IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613-627, 1995.

[11] P. Kisilev, M. Zibulevsky, Y. Y. Zeevi, and B. A. Pearlmutter, "Multiresolution framework for sparse blind source separation," CCIT Report no. 317, June 2000.

", "award": [], "sourceid": 1980, "authors": [{"given_name": "Michael", "family_name": "Zibulevsky", "institution": null}, {"given_name": "Pavel", "family_name": "Kisilev", "institution": null}, {"given_name": "Yehoshua", "family_name": "Zeevi", "institution": null}, {"given_name": "Barak", "family_name": "Pearlmutter", "institution": null}]}