{"title": "Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech", "book": "Advances in Neural Information Processing Systems", "page_first": 807, "page_last": 813, "abstract": null, "full_text": "Periodic Component Analysis: \n\nAn Eigenvalue Method for Representing \n\nPeriodic Structure in Speech \n\nLawrence K. Saul and Jont B. Allen \n\n{lsaul,jba}@research.att.com \n\nAT&T Labs, 180 Park Ave,  Florham Park, NJ 07932 \n\nAbstract \n\nAn  eigenvalue  method  is  developed  for  analyzing  periodic  structure in \nspeech.  Signals are analyzed by  a matrix diagonalization reminiscent of \nmethods  for principal component analysis  (PCA)  and  independent com(cid:173)\nponent analysis (ICA).  Our method-called periodic component analysis \n(1l\"CA)-uses  constructive interference to  enhance periodic  components \nof the frequency  spectrum and  destructive interference  to  cancel  noise. \nThe front end emulates important aspects of auditory processing, such as \ncochlear filtering, nonlinear compression, and insensitivity to phase, with \nthe  aim  of approaching the robustness of human  listeners.  The  method \navoids the inefficiencies of autocorrelation at the pitch period:  it does not \nrequire  long  delay  lines,  and  it correlates  signals  at  a clock rate  on  the \norder  of the  actual pitch,  as  opposed  to  the  original  sampling rate.  We \nderive its cost function and present some experimental results. \n\n1  Introduction \n\nPeriodic structure in the time waveform conveys important cues for recognizing and under(cid:173)\nstanding speech[I].  At the end of an  English  sentence,  for example, rising  versus falling \npitch indicates the asking of a question; in tonal  languages, such as  Chinese, it carries lin(cid:173)\nguistic information.  
In fact, early in the speech chain, prior to the recognition of words or the assignment of meaning, the auditory system divides the frequency spectrum into periodic and non-periodic components. This division is geared to the recognition of phonetic features[2]. Thus, a voiced fricative might be identified by the presence of periodicity in the lower part of the spectrum, but not the upper part. In complicated auditory scenes, periodic components of the spectrum are further segregated by their fundamental frequency[3]. This enables listeners to separate simultaneous speakers and explains the relative ease of separating male versus female speakers, as opposed to two recordings of the same voice[4].

The pitch and voicing of speech signals have been extensively studied[5]. The simplest method to analyze periodicity is to compute the autocorrelation function on sliding windows of the speech waveform. The peaks in the autocorrelation function provide estimates of the pitch and the degree of voicing. In clean wideband speech, the pitch of a speaker can be tracked by combining a peak-picking procedure on the autocorrelation function with some form of smoothing[6], such as dynamic programming. This method, however, does not approach the robustness of human listeners in noise, and at best, it provides an extremely gross picture of the periodic structure in speech. It cannot serve as a basis for attacking harder problems in computational auditory scene analysis, such as speaker separation[7], which require decomposing the frequency spectrum into its periodic and non-periodic components.

The correlogram is a more powerful method for analyzing periodic structure in speech. It looks for periodicity in narrow frequency bands.
Slaney and Lyon[8] proposed a perceptual pitch detector that autocorrelates multichannel output from a model of the auditory periphery. The auditory model includes a cochlear filterbank and periodicity-enhancing nonlinearities. The information in the correlogram is summed over channels to produce an estimate of the pitch. This method has two compelling features: (i) by measuring autocorrelation, it produces pitch estimates that are insensitive to phase changes across channels; (ii) by working in narrow frequency bands, it produces estimates that are robust to noise. This method, however, also has its drawbacks. Computing multiple autocorrelation functions is expensive. To avoid aliasing in upper frequency bands, signals must be correlated at clock rates much higher than the actual pitch. From a theoretical point of view, it is unsatisfying that the combination of information across channels is not derived from some principle of optimality. Finally, in the absence of conclusive evidence for long delay lines (\u223c10 ms) in the peripheral auditory system, it seems worthwhile, for both scientists and engineers, to study ways of detecting periodicity that do not depend on autocorrelation.

In this paper, we develop an eigenvalue method for analyzing periodic structure in speech. Our method emulates important aspects of auditory processing but avoids the inefficiencies of autocorrelation at the pitch period. At the same time, it is highly robust to narrowband noise and insensitive to phase changes across channels. Note that while certain aspects of the method are biologically inspired, its details are not intended to be biologically realistic.

2 Method

We develop the method in four stages.
These stages are designed to convey the main technical ideas of the paper: (i) an eigenvalue method for combining and enhancing weakly periodic signals; (ii) the use of Hilbert transforms to compensate for phase changes across channels; (iii) the measurement of periodicity by efficient sinusoidal fits; and (iv) the hierarchical analysis of information across different frequency bands.

2.1 Cross-correlation of critical bands

Consider the multichannel output of a cochlear filterbank. If the input to this filterbank consists of noisy voiced speech, the output will consist of weakly periodic signals from different critical bands. Can we combine these signals to enhance the periodic signature of the speaker's pitch? We begin by studying a mathematical idealization of the problem. Given n real-valued signals, \{x_i(t)\}_{i=1}^n, what linear combination s(t) = \sum_i w_i x_i(t) maximizes the periodic structure at some fundamental frequency f_0, or equivalently, at some pitch period \tau = 1/f_0? Ideally, the linear combination should use constructive interference to enhance periodic components of the spectrum and destructive interference to cancel noise. We measure the periodicity of the combined signal by the cost function:

c(w, \tau) = \frac{\sum_t |s(t+\tau) - s(t)|^2}{\sum_t |s(t)|^2}, \quad \text{with } s(t) = \sum_i w_i x_i(t).  (1)

Here, for simplicity, we have assumed that the signals are discretely sampled and that the period \tau is an integer multiple of the sampling interval. The cost function c(w, \tau) measures the normalized prediction error, with the period \tau acting as a prediction lag.
Expanding the right hand side in terms of the weights w_i gives:

c(w, \tau) = \frac{\sum_{ij} w_i w_j A_{ij}(\tau)}{\sum_{ij} w_i w_j B_{ij}},  (2)

where the matrix elements A_{ij}(\tau) are determined by the cross-correlations,

A_{ij}(\tau) = \sum_t \left[ x_i(t) x_j(t) + x_i(t+\tau) x_j(t+\tau) - x_i(t) x_j(t+\tau) - x_i(t+\tau) x_j(t) \right],

and the matrix elements B_{ij} are the equal-time cross-correlations, B_{ij} = \sum_t x_i(t) x_j(t). Note that the denominator and numerator of eq. (2) are both quadratic forms in the weights w_i. By the Rayleigh-Ritz theorem of linear algebra, the weights w_i minimizing eq. (2) are given by the eigenvector of the matrix B^{-1} A(\tau) with the smallest eigenvalue. For fixed \tau, this solution corresponds to the global minimum of the cost function c(w, \tau). Thus, matrix diagonalization (or simply computing the bottom eigenvector, which is often cheaper) provides a definitive answer to the above problem.

The matrix diagonalization which optimizes eq. (2) is reminiscent of methods for principal component analysis (PCA) and independent component analysis (ICA)[9]. Our method, which by analogy we call periodic component analysis (\u03c0CA), uses an eigenvalue principle to combine periodicity cues from different parts of the frequency spectrum.

2.2 Insensitivity to phase

The eigenvalue method in the previous section has one obvious shortcoming: it cannot compensate for phase changes across channels. In particular, the real-valued linear combination s(t) = \sum_i w_i x_i(t) cannot align the peaks of signals that are (say) \pi/2 radians out of phase, even though such an alignment, prior to combining the signals, would significantly reduce the normalized prediction error in eq. (1).

A simple extension of the method overcomes this shortcoming.
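Before turning to that extension, the real-valued eigenvalue problem of eq. (2) can be sketched in a few lines of numpy. This is our own minimal illustration, not the authors' implementation; the array X, the period tau, and the function name are our assumptions:

```python
import numpy as np

def pca_weights(X, tau):
    # X: (n_channels, n_samples) real critical-band signals; tau: period in samples
    D = X[:, tau:] - X[:, :-tau]            # per-channel prediction errors x(t + tau) - x(t)
    A = D @ D.T                             # quadratic form A(tau) from eq. (2)
    B = X @ X.T                             # equal-time cross-correlations B
    # weights = eigenvector of inv(B) A(tau) with the smallest eigenvalue
    evals, evecs = np.linalg.eig(np.linalg.solve(B, A))
    k = np.argmin(evals.real)
    return evecs[:, k].real, evals[k].real  # (combining weights, normalized error)
```

On a toy mixture of one clean periodic channel and one noise channel, the bottom eigenvector concentrates its weight on the periodic channel and drives the cost toward zero.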
Given real-valued signals, \{x_i(t)\}, we consider the analytic signals, \{\tilde{x}_i(t)\}, whose imaginary components are computed by Hilbert transforms[10]. The Fourier series of these signals are related by:

x_i(t) = \sum_k \alpha_k \cos(\omega_k t + \phi_k)  \Longleftrightarrow  \tilde{x}_i(t) = \sum_k \alpha_k e^{i(\omega_k t + \phi_k)}.  (3)

We now reconsider the problem of the previous section, looking for the linear combination of analytic signals, s(t) = \sum_i w_i \tilde{x}_i(t), that minimizes the cost function in eq. (1). In this setting, moreover, we allow the weights w_i to be complex so that they can compensate for phase changes across channels. Eq. (2) generalizes in a straightforward way to:

c(w, \tau) = \frac{\sum_{ij} w_i^* w_j A_{ij}(\tau)}{\sum_{ij} w_i^* w_j B_{ij}},  (4)

where A(\tau) and B are Hermitian matrices with matrix elements

A_{ij}(\tau) = \sum_t \left[ \tilde{x}_i^*(t) \tilde{x}_j(t) + \tilde{x}_i^*(t+\tau) \tilde{x}_j(t+\tau) - \tilde{x}_i^*(t) \tilde{x}_j(t+\tau) - \tilde{x}_i^*(t+\tau) \tilde{x}_j(t) \right]

and B_{ij} = \sum_t \tilde{x}_i^*(t) \tilde{x}_j(t). Again, the optimal weights w_i are given by the eigenvector corresponding to the smallest eigenvalue of the matrix B^{-1} A(\tau). (Note that all the eigenvalues of this matrix are real because A and B are Hermitian and B is positive definite.)

Our analysis so far suggests a simple-minded approach to investigating periodic structure in speech. In particular, consider the following algorithm for pitch tracking. The first step of the algorithm is to pass speech through a cochlear filterbank and compute analytic signals, \tilde{x}_i(t), via Hilbert transforms. The next step is to diagonalize the matrices B^{-1} A(\tau) on sliding windows of \tilde{x}_i(t) over a range of pitch periods, \tau \in [\tau_{min}, \tau_{max}]. The final step is to estimate the pitch periods by the values of \tau that minimize the cost function, eq. (1), for each sliding window.
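The analytic signals in eq. (3) are typically computed with a library routine (e.g. scipy.signal.hilbert); the short FFT-based stand-in below, with our own function name, shows what that step does. It keeps DC, doubles positive frequencies, and zeroes negative ones, so a real cosine becomes a complex exponential:

```python
import numpy as np

def analytic_signal(x):
    # analytic signal of a real sequence via the frequency domain:
    # keep DC (and Nyquist), double positive bins, zero negative bins
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return np.fft.ifft(spectrum * gain)
```

Applied to cos(\omega t), this returns a signal whose imaginary part is sin(\omega t), i.e. the complex exponential of eq. (3).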
One might expect such an algorithm to be relatively robust to noise (because it can zero the weights of corrupted channels), as well as insensitive to phase changes across channels (because it can absorb them with complex weights).

Despite these attractive features, the above algorithm has serious deficiencies. Its worst shortcoming is the amount of computation needed to estimate the pitch period, \tau. Note that the analysis step requires computing n^2 cross-correlation functions, \sum_t x_i^*(t) x_j(t+\tau), and diagonalizing the n \times n matrix, B^{-1} A(\tau). This step is unwieldy for three reasons: (i) the burden of recomputing cross-correlations for different values of \tau, (ii) the high sampling rates required to avoid aliasing in upper frequency bands, and (iii) the poor scaling with the number of channels, n. We address these concerns in the following sections.

2.3 Extracting the fundamental

Further signal processing is required to create multichannel output whose periodic structure can be analyzed more efficiently. Our front end, shown in Fig. 1, is designed to analyze voiced speech with fundamental frequencies in the range f_0 \in [f_{min}, f_{max}], where f_{max} < 2 f_{min}. The one-octave restriction on f_0 can be lifted by considering parallel, overlapping implementations of our front end for different frequency octaves.

The stages in our front end are inspired by important aspects of auditory processing[10]. Cochlear filtering is modeled by a Bark scale filterbank with contiguous passbands. Next, we compute narrowband envelopes by passing the outputs of these filters through two nonlinearities: half-wave rectification and cube-root compression. These operations are commonly used to model the compressive unidirectional response of inner hair cells to movement along the basilar membrane.
Evidence for comparison of envelopes in the peripheral auditory system comes from experiments on comodulation masking release[11]. Thus, the next stage of our front end creates a multichannel array of signals by pairwise multiplying envelopes from nearby parts of the frequency spectrum. Allowed pairs consist of any two envelopes, including an envelope with itself, that might in principle contain energy at two consecutive harmonics of the fundamental. Multiplying these harmonics, just like multiplying two sine waves, produces intermodulation distortion with energy at the sum and difference frequencies. The energy at the difference frequency creates a signature of \"residue\" pitch at f_0. The energy at the sum frequency is removed by bandpass filtering to frequencies [f_{min}, f_{max}] and aggressively downsampling to a sampling rate f_s = 4 f_{min}. Finally, we use Hilbert transforms to compute the analytic signal in each channel, which we call x_i(t).

In sum, the stages of the front end create an array of bandlimited analytic signals, x_i(t), that, while derived from different parts of the frequency spectrum, have energy concentrated at the fundamental frequency, f_0. Note that the bandlimiting of these channels to frequencies [f_{min}, f_{max}] where f_{max} < 2 f_{min} removes the possibility that a channel contains periodic energy at any harmonic other than the fundamental. In voiced speech, this has the effect that periodic channels contain noisy sine waves with frequency f_0.

Figure 1: Signal processing in the front end: speech waveform -> cochlear filterbank -> half-wave rectification and cube-root compression -> pairwise multiplication -> bandlimiting and downsampling -> compute analytic signals.
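The residue-pitch idea behind the pairwise multiplication can be checked in a toy numpy calculation (our own example, independent of the actual filterbank): multiplying two consecutive harmonics of f_0 = 100 Hz leaves spectral peaks at the difference frequency (100 Hz) and the sum frequency (900 Hz), and the front end keeps only the former.

```python
import numpy as np

fs = 8000.0                               # sampling rate, Hz
f0 = 100.0                                # fundamental, Hz
t = np.arange(4000) / fs                  # 0.5 s of signal; 2 Hz bin spacing
h4 = np.cos(2 * np.pi * 4 * f0 * t)       # 4th harmonic, 400 Hz
h5 = np.cos(2 * np.pi * 5 * f0 * t)       # 5th harmonic, 500 Hz
product = h4 * h5                         # intermodulation distortion
spectrum = np.abs(np.fft.rfft(product))
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
peaks = freqs[spectrum > 0.25 * spectrum.max()]   # -> 100 Hz and 900 Hz
```

Bandpass filtering the product to [f_{min}, f_{max}] would then retain only the 100 Hz component.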
\n\n\fHow  can  we  combine  these  \"baseband\"  signals  to  enhance  the  periodic  signature  of  a \nspeaker's  pitch?  The  nature  of these  signals  leads  to  an  important  simplification  of the \nproblem.  As  opposed to measuring the autocorrelation at lag T, as in eq.  (1), here we can \nmeasure the periodicity of the combined signal by a simple sinusoidalfit. Let ~ =  27r fo/ f. \ndenote the  phase  accumulated per sample by  a sine  wave  with  frequency  fo  at  sampling \nrate f., and let S (t)  =  I:i Wi Xi (t)  denote the combined signal.  We measure the periodic(cid:173)\nity of the combined signal by \n\n(\nC  w,u  -\n\nA)  _  I:t Is(t + 1) - s(t)ei~ 12  _  I:ij wiWjAij(~) \n, \n\n(5) \n\nI:t Is(t)12 \n\n-\n\nI:ij Wi WjBij \n\nwhere the matrix B  is  again formed  by  computing  equal-time cross-correlations,  and  the \nmatrix A(~) has elements \n\nAij(~) =  L  [x;(t)Xj(t)+X;(t+l)Xj(t+l)-e-i~x;(t)Xj(t+l)-ei~x;(t+l)xj(t)] . \n\nt \n\nFor  fixed  ~, the  optimal  weights  Wi  are  given  by  the  eigenvector  corresponding  to  the \nsmallest eigenvalue of the matrix B- 1 A( ~). \n\nNote  that optimizing  the  cost function  in  eq.  (5)  over  the phase,  ~, is equivalent to  opti(cid:173)\nmizing  over the  fundamental  frequency,  fo,  or the  pitch  period,  T.  The  structure  of this \ncost function  makes  it much easier to  optimize  than  the  earlier measure  of periodicity in \neq.  (1).  For instance, the matrix elements  Aij(~) depend only on the equal-time and one(cid:173)\nsample-lagged cross-correlations, which do not need to be recomputed for different values \nof ~. Also,  the channels  Xi(t)  appearing in this  cost function are sampled at a clock rate \non the order of fo,  as  opposed to the original sampling rate of the speech.  Thus, the  few \ncross-correlations  that are required can be computed with many  fewer  operations.  These \nproperties lead to  a more efficient algorithm than the one in the previous  section.  
The improved algorithm, working with baseband signals, estimates the pitch by optimizing eq. (5) over w and \Delta for sliding windows of x_i(t). One problem still remains, however: the need to invert and diagonalize large numbers of n \times n matrices, where the number of channels, n, may be prohibitively large. This final obstacle is removed in the next section.

2.4 Hierarchical analysis

We have developed a fast recursive algorithm to locate a good approximation to the minimum of eq. (5). The recursive algorithm works by constructing and diagonalizing 2 \times 2 matrices, as opposed to the n \times n matrices required for an exact solution. Our approximate algorithm also provides a hierarchical analysis of the frequency spectrum that is interesting in its own right. A sketch of the algorithm is given below.

The base step of the recursion estimates a value \Delta_i for each individual channel by minimizing the error of a sinusoidal fit:

c_i(\Delta_i) = \frac{\sum_t |x_i(t+1) - x_i(t) e^{i\Delta_i}|^2}{\sum_t |x_i(t)|^2}.  (6)

The minimum of the right hand side can be computed by setting its derivative to zero and solving a quadratic equation in the variable e^{i\Delta_i}. If this minimum does not correspond to a legitimate value of f_0 \in [f_{min}, f_{max}], the ith channel is discarded from future analysis, effectively setting its weight w_i to zero. Otherwise, the algorithm passes three arguments to a higher level of the recursion: the values of \Delta_i and c_i(\Delta_i), and the channel x_i(t) itself.

The recursive step of the algorithm takes as input two auditory \"substreams\", s_l(t) and s_u(t), derived from \"lower\" and \"upper\" parts of the frequency spectrum, and returns as output a single combined stream, s(t) = w_l s_l(t) + w_u s_u(t). In the first step

Figure 2: Measures of pitch (f_0) and periodicity in nested regions of the frequency spectrum.
The nodes in this tree describe periodic structure in the vowel /u/ from 400-1080 Hz. The nodes in the first (bottom) layer describe periodicity cues in individual channels; the nodes in the kth layer measure cues integrated across 2^{k-1} channels.

of the recursion, the substreams correspond to individual channels x_i(t), while in the kth step, they correspond to weighted combinations of 2^{k-1} channels. Associated with the substreams are phases, \Delta_l and \Delta_u, corresponding to estimates of f_0 from different parts of the frequency spectrum. The combined stream is formed by optimizing eq. (5) over the two-component weight vector, w = [w_l, w_u]. Note that the eigenvalue problem in this case involves only a 2 \times 2 matrix, as opposed to an n \times n matrix. The value of \Delta determines the period of the combined stream; in practice, we optimize it over the interval defined by \Delta_l and \Delta_u. Conveniently, this interval tends to shrink at each level of the recursion.

The algorithm works in a bottom-up fashion. Channels are combined pairwise to form streams, which are in turn combined pairwise to form new streams. Each stream has a pitch period and a measure of periodicity computed by optimizing eq. (5). We order the channels so that streams are derived from contiguous (or nearly contiguous) parts of the frequency spectrum. Fig. 2 shows partial output of this recursive procedure for a windowed segment of the vowel /u/. Note how as one ascends the tree, the combined streams have greater periodicity and less variance in their pitch estimates. This shows explicitly how the algorithm integrates information across narrow frequency bands of speech.
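The recursive step can be sketched as follows. This is our own hypothetical code: it searches delta over a grid spanning the substream estimates (whereas the paper solves for the optimal phase more directly) and diagonalizes the 2 x 2 generalized eigenproblem of eq. (5):

```python
import numpy as np

def combine_streams(s_lower, s_upper, delta_grid):
    # merge two complex substreams by minimizing eq. (5) with a 2 x 2 eigenproblem
    X = np.vstack([s_lower, s_upper])
    B = X.conj() @ X.T                        # 2 x 2 equal-time cross-correlations
    best_cost, best_delta, best_w = np.inf, None, None
    for delta in delta_grid:
        E = X[:, 1:] - np.exp(1j * delta) * X[:, :-1]
        A = E.conj() @ E.T                    # 2 x 2 Hermitian matrix A(delta)
        evals, evecs = np.linalg.eig(np.linalg.solve(B, A))
        k = np.argmin(evals.real)
        if evals[k].real < best_cost:
            best_cost, best_delta, best_w = evals[k].real, delta, evecs[:, k]
    combined = best_w[0] * s_lower + best_w[1] * s_upper
    return combined, best_delta, best_cost
```

Calling this bottom-up on pairs of channels, then on pairs of combined streams, produces the tree of Fig. 2.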
The recursive output also suggests a useful representation for studying problems, such as speaker separation, that depend on grouping different parts of the spectrum by their estimates of f_0.

3 Experiments

We investigated the performance of our algorithm in simple experiments on synthesized vowels. Fig. 3 shows results from experiments on the vowel /u/. The pitch contours in these plots were computed by the recursive algorithm in the previous section, with f_{min} = 80 Hz, f_{max} = 140 Hz, and 60 ms windows shifted in 10 ms intervals. The solid curves show the estimated pitch contour for the clean wideband waveform, sampled at 8 kHz. The left panel shows results for filtered versions of the vowel, bandlimited to four different frequency octaves. These plots show that the algorithm can extract the pitch from different parts of the frequency spectrum. The right panel shows the estimated pitch contours for the vowel in 0 dB white noise and four types of -20 dB bandlimited noise. The signal-to-noise ratios were computed from the ratio of (wideband) speech energy to noise energy. The white noise at 0 dB presents the most difficulty; by contrast, the bandlimited noise leads to relatively few failures, even at -20 dB. Overall, the algorithm is quite robust to noise and filtering. (Note that the particular frequency octaves used in these experiments had no special relation to the filters in our front end.) The pitch contours could be further improved by some form of smoothing, but this was not done for the plots shown.
\n\n\f130 l--~--~-r=======il \n\n1 30 L-~--r=========il \n\nbandhmlted speech \n\nnoisy speech \n\nclean \no dB, white noise \n\n-20 dB, 0250 - 0500  Hz \n-20 dB, 0500 - 1000 Hz \n-20 dB, 1000 - 2000  Hz \n-20 dB, 2000 - 4000  Hz \n\n125 \n\n120 \n\nwide band \n\n0250 - 0500  Hz \n0500 - 1000 Hz \n1000 - 2000  Hz \n2000 - 4000  Hz \n\n125 \n\n120 \n\n90 \n\n~-----=-0'-::-\n\n.2---=-0.':-4 ---=-0.':-6 ---::'0.-=-B -----: \n\n90 \n\nL----::'o .-=-2 ---::'o.~4 ---::'0.~6 --~0.B~-----: \n\ntime (sec) \n\ntime (sec) \n\nFigure 3:  Tracking the pitch of the vowel lui in corrupted speech. \n\n4  Discussion \n\nMany aspects of this work need refinement.  Perhaps the most important is the initial filter(cid:173)\ning into narrow frequency bands.  While narrow filters have the ability to resolve individual \nharmonics, overly narrow filters-which reduce all speech input to sine waves~o not ad(cid:173)\nequately differentiate periodic  versus noisy excitation.  We hope to replace the Bark scale \nfilterbank in Fig.  1 by one that optimizes this tradeoff.  We also want to incorporate adapta(cid:173)\ntion and gain control into the front end, so as  to improve the performance in non stationary \nlistening conditions.  Finally, beyond the  problem of pitch tracking,  we intend to  develop \nthe hierarchical representation shown in Fig. 2 for harder problems in phoneme recognition \nand speaker separation[7].  These harder problems seem to require a method, like ours, that \ndecomposes the frequency  spectrum into its periodic and non-periodic components. \n\nReferences \n\n[1]  Stevens, K.  N.  1999. Acoustic Phonetics. MIT Press:  Cambridge, MA. \n[2]  Miller, G. A. and Nicely, P. E.  1955. An analysis of perceptual confusions among some English \n\nconsonants. Journal of the Acoustical Society of America 27, 338- 352. \n\n[3]  Bregman,  A.  S.  1994. Auditory  Scene  Analysis:  the  Perceptual Organization  of Sound.  MIT \n\nPress: Cambridge, MA. \n\n[4]  Brokx, J. P. L. 
and Nooteboom, S. G. 1982. Intonation and the perceptual separation of simultaneous voices. J. Phonetics 10, 23-26.

[5] Hess, W. 1983. Pitch Determination of Speech Signals: Algorithms and Devices. Springer-Verlag.

[6] Talkin, D. 1995. A Robust Algorithm for Pitch Tracking (RAPT). In Kleijn, W. B. and Paliwal, K. K. (Eds.), Speech Coding and Synthesis, 497-518. Elsevier Science.

[7] Roweis, S. 2000. One microphone source separation. In Tresp, V., Dietterich, T., and Leen, T. (Eds.), Advances in Neural Information Processing Systems 13. MIT Press: Cambridge, MA.

[8] Slaney, M. and Lyon, R. F. 1990. A perceptual pitch detector. In Proc. ICASSP-90, 1, 357-360.

[9] Molgedey, L. and Schuster, H. G. 1994. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 72(23), 3634-3637.

[10] Hartmann, W. A. 1997. Signals, Sound, and Sensation. Springer-Verlag.

[11] Hall, J. W., Haggard, M. P., and Fernandes, M. A. 1984. Detection in noise by spectro-temporal pattern analysis. J. Acoust. Soc. Am. 76, 50-56.
", "award": [], "sourceid": 1939, "authors": [{"given_name": "Lawrence", "family_name": "Saul", "institution": null}, {"given_name": "Jont", "family_name": "Allen", "institution": null}]}