{"title": "Broadband Direction-Of-Arrival Estimation Based on Second Order Statistics", "book": "Advances in Neural Information Processing Systems", "page_first": 775, "page_last": 781, "abstract": null, "full_text": "Broadband Direction-Of-Arrival Estimation\nBased On Second Order Statistics\n\nJustinian Rosca\nJoseph \u00d3 Ruanaidh\nAlexander Jourjine\nScott Rickard\n\n{rosca,oruanaidh,jourjine,rickard}@scr.siemens.com\n\nSiemens Corporate Research, Inc.\n755 College Rd E\nPrinceton, NJ 08540\n\nAbstract\n\nN wideband sources recorded using N closely spaced receivers can\nfeasibly be separated based only on second order statistics when using\na physical model of the mixing process. In this case we show that the\nparameter estimation problem can be essentially reduced to considering\ndirections of arrival and attenuations of each signal. The paper presents\ntwo demixing methods operating in the time and frequency domains and\nexperimentally shows that it is always possible to demix signals arriving at\ndifferent angles. Moreover, one can use spatial cues to solve the channel\nselection problem and a post-processing Wiener filter to ameliorate the\nartifacts caused by demixing.\n\n1 Introduction\n\nBlind source separation (BSS) is capable of dramatic results when used to separate mixtures\nof independent signals. The method relies on simultaneous recordings of signals from two\nor more input sensors and separates the original sources purely on the basis of statistical\nindependence between them. Unfortunately, BSS literature is primarily concerned with the\nidealistic instantaneous mixing model.\n\nIn this paper, we formulate a low dimensional and fast solution to the problem of separating\ntwo signals from a mixture recorded using two closely spaced receivers. 
Using a physical\nmodel of the mixing process reduces the complexity of the model and allows one to identify\nand to invert the mixing process using second order statistics only.\n\nWe describe the theoretical basis of the new approach, and then focus on two algorithms,\nwhich were implemented and successfully applied to extensive sets of real-world data. In\nessence, our separation architecture is a system of adaptive directional receivers designed\nusing the principles of BSS. The method resembles beamforming methods [8]\nin that it works by spatial filtering. Array processing techniques [2] reduce noise by\nseparating signal space from noise space, which necessitates more receivers than emitters.\nThe main difference is that standard beamforming and array processing techniques [8,\n2] are generally concerned strictly with processing directional narrowband signals. The\ndifference with BSS [7, 6] is that our approach is model-based and therefore the elements\nof the mixing matrix are highly constrained: a feature that aids in the robust and reliable\nidentification of the mixing process.\n\n\f776\n\nJ. Rosca, J. \u00d3 Ruanaidh, A. Jourjine and S. Rickard\n\nThe layout of the paper is as follows. Sections 2 and 3 describe the theoretical foundation of\nthe separation method that was pursued. Section 4 presents algorithms that were developed\nand experimental results. Finally we summarize and conclude this work.\n\n2 Theoretical foundation for the BSS solution\n\nAs a first approximation to the general multi-path model, we use the delay-mixing model.\nIn this model, only direct path signal components are considered. Signal components from\none source arrive with a fractional delay between the times of arrival at two receivers. By\nfractional delays, we mean that delays between receivers are not generally integer multiples\nof the sampling period. 
The delay depends on the position of the source with respect\nto the receiver axis and the distance between receivers. Our BSS algorithms demix by\ncompensating for the fractional delays. This, in effect, is a form of adaptive beamforming\nwith directional notches being placed in the direction of sources of interference [8]. A more\ndetailed account of the analytical structure of the solutions can be found in [1].\n\nBelow we address the case of two inputs and two outputs, but there is no reason why the\ndiscussion cannot be generalized to multiple inputs and multiple outputs. Assume a linear\nmixture of two sources, where source amplitude drops off in inverse proportion to distance:\n\n$$x_i(t) = \\frac{1}{R_{i1}} s_1\\left(t - \\frac{R_{i1}}{c}\\right) + \\frac{1}{R_{i2}} s_2\\left(t - \\frac{R_{i2}}{c}\\right) \\tag{1}$$\n\n$i = 1, 2$, where $c$ is the speed of wave propagation, and $R_{ij}$ indicates the distance from\nreceiver $i$ to source $j$. This describes signal propagation through a uniform non-dispersive\nmedium. In the Fourier domain, Equation 1 results in a mixing matrix $A(\\omega)$ given by:\n\n$$A(\\omega) = \\begin{bmatrix} \\frac{1}{R_{11}} e^{-j\\omega \\frac{R_{11}}{c}} & \\frac{1}{R_{12}} e^{-j\\omega \\frac{R_{12}}{c}} \\\\ \\frac{1}{R_{21}} e^{-j\\omega \\frac{R_{21}}{c}} & \\frac{1}{R_{22}} e^{-j\\omega \\frac{R_{22}}{c}} \\end{bmatrix} \\tag{2}$$\n\nIt is important to note that the columns can be scaled arbitrarily without affecting separation\nof sources because rescaling is absorbed into the sources. This implies that row scaling in\nthe demixing matrix (the inverse of $A(\\omega)$) is arbitrary.\n\nUsing the Cosine Rule, $R_{ij}$ can be expressed in terms of the distance $R_j$ of source $j$ to\nthe midpoint between two receivers, the direction of arrival $\\theta_j$ of source $j$, and the distance\nbetween receivers, $d$, as follows: 
\n\n$$R_{ij} = \\left[ R_j^2 + \\left(\\frac{d}{2}\\right)^2 + 2(-1)^i \\left(\\frac{d}{2}\\right) R_j \\cos\\theta_j \\right]^{\\frac{1}{2}} \\tag{3}$$\n\nExpanding the right-hand side above using the binomial expansion and preserving only zeroth\nand first order terms, we can express distance from the receivers to the sources as:\n\n$$R_{ij} = \\left( R_j + \\frac{d^2}{8R_j} \\right) + (-1)^i \\left(\\frac{d}{2}\\right) \\cos\\theta_j \\tag{4}$$\n\nThis approximation is valid within a 5% relative error when $d \\le$ ~. With the substitution\nfor $R_{ij}$ and with the redefinition of source $j$ to include the delay due to the term within\nbrackets in Equation 4 divided by $c$, Equation 1 becomes:\n\n$$x_i(t) = \\sum_j \\frac{1}{R_{ij}} \\cdot s_j\\left(t + (-1)^i \\cdot \\left(\\frac{d}{2c}\\right) \\cdot \\cos\\theta_j\\right), \\quad i = 1, 2 \\tag{5}$$\n\nIn the Fourier domain, Equation 5 results in the simplification to the mixing matrix $A(\\omega)$:\n\n$$A(\\omega) = \\begin{bmatrix} \\frac{1}{R_{11}} \\cdot e^{-j\\omega\\delta_1} & \\frac{1}{R_{12}} \\cdot e^{-j\\omega\\delta_2} \\\\ \\frac{1}{R_{21}} \\cdot e^{j\\omega\\delta_1} & \\frac{1}{R_{22}} \\cdot e^{j\\omega\\delta_2} \\end{bmatrix} \\tag{6}$$\n\n\fBroadband DOA Estimation Based on Second Order Statistics\n\n777\n\nHere phases are functions of the directions of arrival $\\theta_j$ (defined with respect to the midpoint\nbetween receivers), the distance between receivers $d$, and the speed of propagation $c$:\n$\\delta_i = \\frac{d}{2c} \\cos\\theta_i$, $i = 1, 2$. $R_{ij}$ are unknown, but we can again redefine sources so diagonal\nelements are unity:\n\n(7)\n\nwhere $c_1$, $c_2$ are two positive real numbers. In wireless communications sources are\ntypically distant compared to antenna distance. For distant sources and a well matched pair\nof receivers $c_1 \\approx c_2 \\approx 1$. Equation 7 describes the mixing matrix for the delay model in\nthe frequency domain, in terms of four parameters, $\\delta_1$, $\\delta_2$, $c_1$, $c_2$. 
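\n\nAs a rough numerical illustration of the delay parameter (the values below are assumptions chosen for illustration, not settings reported in the paper), take receiver spacing $d = 0.05$ m, sound speed $c = 343$ m/s, direction of arrival $\\theta = 60^\\circ$ and a hypothetical sampling rate $f_s = 16$ kHz:\n\n$$\\delta = \\frac{d}{2c}\\cos\\theta = \\frac{0.05}{2 \\times 343} \\times 0.5 \\approx 3.64 \\times 10^{-5}\\ \\text{s}, \\qquad 2\\delta f_s \\approx 7.29 \\times 10^{-5} \\times 16000 \\approx 1.17$$\n\nUnder these assumed values the relative delay between the two receivers, $2\\delta$, is about 1.17 sampling periods, which is a fractional delay in the sense of Section 2.\n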
\nThe corresponding ideal demixing matrix $W(\\omega)$, for each frequency $\\omega$, is given by:\n\n$$W(\\omega) = [A(\\omega)]^{-1} = \\frac{1}{\\det A(\\omega)} \\begin{bmatrix} e^{j\\omega\\delta_2} & -c_1 e^{-j\\omega\\delta_2} \\\\ -c_2 e^{j\\omega\\delta_1} & e^{-j\\omega\\delta_1} \\end{bmatrix} \\tag{8}$$\n\nThe outputs, estimating the sources, are:\n\n$$\\begin{bmatrix} z_1(\\omega) \\\\ z_2(\\omega) \\end{bmatrix} = W(\\omega) \\begin{bmatrix} x_1(\\omega) \\\\ x_2(\\omega) \\end{bmatrix} = \\frac{1}{\\det A(\\omega)} \\begin{bmatrix} e^{j\\omega\\delta_2} & -c_1 e^{-j\\omega\\delta_2} \\\\ -c_2 e^{j\\omega\\delta_1} & e^{-j\\omega\\delta_1} \\end{bmatrix} \\begin{bmatrix} x_1(\\omega) \\\\ x_2(\\omega) \\end{bmatrix} \\tag{9}$$\n\nMaking the transition back to the time domain results in the following estimate of the\noutputs:\n\n(10)\n\nwhere $\\otimes$ is convolution, and\n\n(11)\n\nFormulae 9 and 10 form the basis for two algorithms to be described next, in the time\nand frequency domains. The algorithms have the role of determining the four\nunknown parameters. Note that the filter corresponding to $H(\\omega, \\delta_1, \\delta_2, c_1, c_2)$ should be\napplied to the output estimates in order to map back to the original inputs.\n\n3 Delay and attenuation compensation algorithms\n\nThe estimation of the four unknown parameters $\\delta_1$, $\\delta_2$, $c_1$, $c_2$ can be carried out based on\nsecond order criteria that impose the constraint that outputs are decorrelated ([9, 4, 6, 5]).\n\n3.1 Time and frequency domain approaches\n\nThe time domain algorithm is based on the idea of imposing the decorrelation constraint\n$\\langle z_1(t), z_2(t) \\rangle = 0$ between the estimates of the outputs, as a function of the delays $D_1$ and\n$D_2$ and scalar coefficients $c_1$ and $c_2$. This is equivalent to the following criterion:\n\n(12)\n\nwhere $F(\\cdot)$ measures the cross-correlations between the signals given below, representing\nfiltered versions of the differences of fractionally delayed measurements:\n\n$$\\begin{aligned} z_1(t) &= h(t, D_1, D_2, c_1, c_2) \\otimes \\left( x_1(t + D_2) - c_1 x_2(t) \\right) \\\\ z_2(t) &= h(t, D_1, D_2, c_1, c_2) \\otimes \\left( c_2 x_1(t + D_2) - x_2(t) \\right) \\\\ F(D_1, D_2, c_1, c_2) &= \\langle z_1(t), z_2(t) \\rangle \\end{aligned} \\tag{13}$$\n\nIn the frequency domain, the cross-correlation of the inputs is expressed as follows:\n\n$$R^X(\\omega) = A(\\omega) R^S(\\omega) A^H(\\omega) \\tag{14}$$\n\nThe mixing matrix in the frequency domain has the form given in Equation 7. Inverting\nthis cross correlation equation yields four equations that are written in matrix form as:\n\n(15)\n\nSource orthogonality implies that the off-diagonal terms in the covariance matrix must be\nzero:\n\n$$R^S_{12}(\\omega) = 0, \\qquad R^S_{21}(\\omega) = 0 \\tag{16}$$\n\nFor far field conditions (i.e. the distance between the receivers is much less than the distance\nfrom sources) one obtains the following equations:\n\n(17)\n\nThe terms $a = e^{-j\\omega\\delta_1}$ and $b = e^{-j\\omega\\delta_2}$ are functions of the time delays. Note that there is\na pair of equations of this kind for each frequency. In practice, the unknowns should be\nestimated from data at all available frequencies to obtain a robust estimate.\n\n3.2 Channel selection\n\nUp to this point, there was no guarantee that estimated parameters would ensure source\nseparation in some specific order. We could not decide a priori whether estimated parameters\nfor the first output channel correspond to the first or second source. However, the dependence\nof the phase delays on the angles of arrival suggests a way to break the permutation symmetry\nin source estimation, that is, to decide precisely which estimate to present on the first channel\n(and hence on the second channel as well).\n\nThe core idea is that directionality and spatial cues provide the information required to\nbreak the symmetry. The criterion we use is to sort sources in order of increasing delay.\nNote that the correspondence between delays and sources is unique when sources are not\nsymmetrical with respect to the receiver axis. 
When sources are symmetric there is no way\nof distinguishing between their positions because the cosine of the angles of arrival, and\nhence the delay, is invariant to the sign of the angle.\n\n4 Experimental results\n\nA robust implementation of criterion 12 averages cross-correlations over a number of\nwindows of given size. More precisely, $F$ is defined as follows:\n\n$$F(\\delta_1, \\delta_2) = \\sum_{\\text{Blocks}} \\left| \\langle z_1(t), z_2(t) \\rangle \\right|^q \\tag{18}$$\n\nNormally $q = 1$ to obtain a robust estimate. Ngo and Bhadkamkar [5] suggest a similar\ncriterion using $q = 2$ without making use of the determinant of the mixing matrix.\n\nAfter taking into account all terms from Equation 18, including the determinant of the\nmixing matrix $A$, we obtain the function to be used for parameter estimation in the frequency\ndomain:\n\n$$F(\\delta_1, \\delta_2) = \\sum_{\\omega} \\frac{1}{\\{\\det A\\}^2 + \\eta} \\cdot \\left| \\frac{a}{b} R^X_{11}(\\omega) - \\frac{b}{a} R^X_{22}(\\omega) - ab\\, R^X_{21}(\\omega) - \\frac{1}{ab} R^X_{12}(\\omega) \\right|^q \\tag{19}$$\n\nwhere $\\eta$ is a (Wiener filter-like) constant that helps prevent singularities and $q$ is normally\nset to one.\n\nComputing the separated sources using only time differences leads to highpass filtered\noutputs. In order to implement exactly the theoretical demixing procedure presented one\nhas to divide by the determinant of the mixing matrix. Obviously one could filter using the\ninverse of the determinant to obtain optimal results. This can be implemented in the form\nof a Wiener filter. The Wiener filter requires knowledge of both the signal and noise power\nspectral densities. This information is not available to us but a reasonable approximation is\nto assume that the (wideband) sources have a flat spectral density and the noise corrupting\nthe mixtures is white. 
In this case, the Wiener filter becomes:\n\n$$H(\\omega) = \\left( \\frac{\\{\\det A(\\omega)\\}^2}{\\{\\det A(\\omega)\\}^2 + \\eta} \\right) \\frac{1}{\\det A(\\omega)} \\tag{20}$$\n\nwhere the parameter $\\eta$ has been empirically set to the variance of the mixture. Applying\nthis choice of filter usually dramatically improves the quality of the separated outputs.\n\nThe technique of postprocessing using the determinant of the mixing matrix is perfectly\ngeneral and applies equally well to demixtures computed using matrices of FIR filters.\nThe quality of the result depends primarily on the care with which the inverse filter is\nimplemented. It also depends on the accuracy of the estimate for the mixing parameters.\nOne should avoid using the Wiener filter for near-degenerate mixtures.\n\nThe proof of concept for the theory outlined above was obtained using speech signals, which,\nif anything, pose a greater challenge to separation algorithms because of the correlation\nstructure of speech. Two kinds of data are considered in this paper: synthetic direct\npropagation delay data and synthetic multi-path data. Data can be characterized along\ntwo dimensions of difficulty: synthetic vs. real-world, and direct path vs. multi-path.\nCombinations along these dimensions represented the main type of data we used.\n\nThe value of distance between receivers dictates the order of delays that can appear due\nto direct path propagation, which is used by the demixing algorithms. Data was generated\nsynthetically employing fractional delays corresponding to the various positions of the\nsources [3].\n\nWe modeled multi-path by taking into account the decay in signal amplitude due to\npropagation distance as well as the absorption of waves. Only the direct path and one additional\npath were considered. 
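\n\nOne way to write such a two-path mixture is sketched below. This is only an illustrative assumption, not the generator used for the experiments: it extends Equation 1 with a single reflected path per source-receiver pair, where $R'_{ij} > R_{ij}$ is a hypothetical reflected path length and $0 < \\alpha < 1$ is a hypothetical frequency-independent absorption factor.\n\n$$x_i(t) = \\sum_{j=1}^{2} \\left[ \\frac{1}{R_{ij}} s_j\\left(t - \\frac{R_{ij}}{c}\\right) + \\frac{\\alpha}{R'_{ij}} s_j\\left(t - \\frac{R'_{ij}}{c}\\right) \\right], \\quad i = 1, 2$$\n\nSetting $\\alpha = 0$ in this sketch recovers the direct-path model of Equation 1.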
\n\nThe algorithms developed proved successful for separation of two voices from direct path\nmixtures, even where the sources had very similar spectral power characteristics, and for\nseparation of one source for multi-path mixtures. Moreover, outputs were free from artifacts\nand were obtained with modest computational requirements.\n\nFigure 1 presents mean separation results of the first and second channels, which correspond\nto the first and second sources, for various synthetic data sets. Separation depends on the\nangles of arrival. Plots show no separation in the degenerate case of equal or close-by\nangles of arrival, but more than 10 dB mean separation in the anechoic case and 5 dB in the\nmulti-path case.\n\n[Plots for Figure 1; legible panel legends: Anechoic F Domain, Anechoic T Domain]\n\nFigure 1: Two sources were positioned at a relatively large distance from a pair of closely\nspaced receivers. The first source was always placed at zero degrees whilst the second\nsource was moved uniformly from 30 to 330 degrees in steps of 30 degrees. The above\nshows mean separation and standard deviation error bars of first and second sources for six\nsynthetic delay mixtures or synthetic multi-path data mixtures using the time and frequency\ndomain algorithms. 
\n\n5 Conclusions\n\nThe present source separation approach is based on minimization of cross-correlations of\nthe estimated sources, in the time or frequency domains, when using a delay model and\nexplicitly employing direction of arrival. The great advantage of this approach is that it\nreduces source separation to a decorrelation problem, which is theoretically solved by a\nsystem of equations. Although the delay model used generates essentially anechoic time\ndelay algorithms, the results of this work show systematic improvements even when the\nalgorithms are applied to real multi-path data. In all cases separation improvement is robust\nwith respect to the power ratios of sources.\n\nAcknowledgments\n\nWe thank Radu Balan and Frans Coetzee for useful discussions and proofreading various\nversions of this document and our collaborators within Siemens for providing extensive\ndata for testing.\n\nReferences\n\n[1] A. Jourjine, S. Rickard, J. \u00d3 Ruanaidh, and J. Rosca. Demixing of anechoic time delay\nmixtures using second order statistics. Technical Report SCR-99-TR-657, Siemens\nCorporate Research, 755 College Road East, Princeton, New Jersey, 1999.\n\n[2] Hamid Krim and Mats Viberg. Two decades of array signal processing research. IEEE\nSignal Processing Magazine, 13(4), 1996.\n\n[3] Tim Laakso, Vesa V\u00e4lim\u00e4ki, Matti Karjalainen, and Unto Laine. Splitting the unit delay.\nIEEE Signal Processing Magazine, pages 30-60, 1996.\n\n[4] L. Molgedey and H. G. Schuster. Separation of a mixture of independent signals using\ntime delayed correlations. Phys. Rev. Lett., 72(23):3634-3637, July 1994.\n\n[5] T. J. Ngo and N. A. Bhadkamkar. Adaptive blind separation of audio sources by a\nphysically compact device using second order statistics. 
In First International Workshop\non ICA and BSS, pages 257-260, Aussois, France, January 1999.\n\n[6] Lucas Parra, Clay Spence, and Bert De Vries. Convolutive blind source separation\nbased on multiple decorrelation. In NNSP98, 1998.\n\n[7] K. Torkkola. Blind separation for audio signals: Are we there yet? In First International\nWorkshop on Independent Component Analysis and Blind Source Separation, pages 239-244,\nAussois, France, January 1999.\n\n[8] B. Van Veen and Kevin M. Buckley. Beamforming: A versatile approach to spatial\nfiltering. IEEE ASSP Magazine, 5(2), 1988.\n\n[9] E. Weinstein, M. Feder, and A. Oppenheim. Multi-channel signal separation by\ndecorrelation. IEEE Trans. on Speech and Audio Processing, 1(4):405-413, 1993.\n", "award": [], "sourceid": 1680, "authors": [{"given_name": "Justinian", "family_name": "Rosca", "institution": null}, {"given_name": "Joseph", "family_name": "Ruanaidh", "institution": null}, {"given_name": "Alexander", "family_name": "Jourjine", "institution": null}, {"given_name": "Scott", "family_name": "Rickard", "institution": null}]}