{"title": "Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement", "book": "Advances in Neural Information Processing Systems", "page_first": 1221, "page_last": 1228, "abstract": null, "full_text": "Bayesian Estimation of Time-Frequency\n\nCoef\ufb01cients for Audio Signal Enhancement\n\nPatrick J. Wolfe \n\nDepartment of Engineering\nUniversity of Cambridge\nCambridge CB2 1PZ, UK\npjw47@eng.cam.ac.uk\n\nSimon J. Godsill\n\nDepartment of Engineering\nUniversity of Cambridge\nCambridge CB2 1PZ, UK\nsjg@eng.cam.ac.uk\n\nAbstract\n\nThe Bayesian paradigm provides a natural and effective means of exploit-\ning prior knowledge concerning the time-frequency structure of sound\nsignals such as speech and music\u2014something which has often been over-\nlooked in traditional audio signal processing approaches. Here, after con-\nstructing a Bayesian model and prior distributions capable of taking into\naccount the time-frequency characteristics of typical audio waveforms,\nwe apply Markov chain Monte Carlo methods in order to sample from the\nresultant posterior distribution of interest. We present speech enhance-\nment results which compare favourably in objective terms with standard\ntime-varying \ufb01ltering techniques (and in several cases yield superior per-\nformance, both objectively and subjectively); moreover, in contrast to\nsuch methods, our results are obtained without an assumption of prior\nknowledge of the noise power.\n\n1 Introduction\n\nNatural sounds can be meaningfully represented as a superposition of translated and\nfrequency-modulated versions of simple functions (atoms). As a result, so-called time-\nfrequency representations are ubiquitous in audio signal processing. The focus of this\npaper is on signal enhancement via a regression in which time-frequency atoms form the\nregressors. 
This choice is motivated by the notion that an atomic time-frequency decomposition is the most natural way to split an audio waveform into its constituent parts, such as note attacks and steady pitches for music, voiced and unvoiced speech, and so on. Moreover, these features, along with prior knowledge concerning their generative mechanisms, are most easily described jointly in time and frequency through the use of Gabor frames.\n\n1.1 Gabor Frames\n\nWe begin by briefly reviewing the concept of Gabor systems; detailed results and proofs may be found in, for example, [1]. Consider a function g whose time-frequency support is centred about the origin, and let g_{m,n} denote a time-shifted (translation by na) and frequency-shifted (modulation by mb) version thereof; such a collection of shifts defines a sampling grid over the time-frequency plane. Then (roughly speaking) if g is reasonably well-behaved and the lattice (na, mb) is sufficiently dense, the Gabor system {g_{m,n}} provides a (possibly non-orthogonal, or even redundant) series expansion of any function in a Hilbert space, and is thus said to generate a frame.\n\nMore formally, a Gabor frame {g_{m,n}} is a dictionary of time-frequency shifted versions of a single basic window function g, having the additional property that there exist constants 0 < A ≤ B < ∞ (frame bounds) such that\n\n    A‖f‖² ≤ Σ_{m,n} |⟨f, g_{m,n}⟩|² ≤ B‖f‖²    for all f ∈ H,\n\nwhere H is the Hilbert space of functions of interest and ⟨·,·⟩ denotes the inner product. This property can be understood as an approximate Plancherel formula, guaranteeing completeness of the set of building blocks in the function space. That is, any signal f ∈ H can be represented as an absolutely convergent infinite series of the g_{m,n}, or in the finite case, a linear combination thereof. Such a representation is given by the following formula:\n\n    f = Σ_{m,n} ⟨f, γ_{m,n}⟩ g_{m,n},    (1)\n\nwhere {γ_{m,n}} is a dual frame for {g_{m,n}}. Dual frames exist for any frame; however, the canonical dual frame, guaranteeing minimal (two-)norm coefficients in the expansion of (1), is given by γ_{m,n} = S⁻¹ g_{m,n}, where S is the frame operator, defined by S f = Σ_{m,n} ⟨f, g_{m,n}⟩ g_{m,n}.\n\nThe notion of a frame thus incorporates bases as well as certain redundant representations; for example, an orthonormal basis is a tight frame (A = B) with A = B = 1; the union of two orthonormal bases yields a tight frame with frame bounds A = B = 2. Importantly, a key result in time-frequency theory (the Balian-Low Theorem) implies that redundancy is a necessary consequence of good time-frequency localisation.^1 However, even with redundancy, the frame operator may, in certain special cases, be diagonalised. If, furthermore, the g_{m,n} are normalised in such a case, then analysis and synthesis can take place using the same window and inversion of the frame operator is avoided completely. Accordingly, Daubechies et al. [2] term such cases 'painless nonorthogonal expansions'.\n\n* Audio examples described in this paper, as well as Matlab code allowing for their reproduction, may be found at the author's web page: http://www-sigproc.eng.cam.ac.uk/~pjw47.\n\n1.2 Short-Time Spectral Attenuation\n\nThe standard noise reduction method in engineering applications is actually such an expansion in disguise (see, e.g., [3]). 
In this method, known as short-time spectral attenuation, a time-varying filter is applied to the frequency-domain transform of a noisy signal, using the overlap-add method of short-time Fourier analysis and synthesis. The observed signal y is first divided into overlapping segments through multiplication by a smooth, 'sliding' window function, which is non-zero only for a duration on the order of tens of milliseconds. The Fourier transform is then taken on each length-L interval (possibly zero-padded to length M ≥ L), and the resultant N vectors of spectral values {Y_n} can be plotted side by side to yield a time-frequency representation known as the Gabor transform, or sub-sampled short-time Fourier transform, the modulus of which is the well-known spectrogram. The coefficients of this transform are attenuated to some degree in order to reduce the noise; as shown in Fig. 1, individual short-time intervals Y_n are then inverse-transformed, multiplied by a smoothing window, and added together in an appropriate manner to form a time-domain signal reconstruction x̂.\n\n^1 There is, however, an exception for real signals, which will be explored in more detail in §3.2.\n\nFigure 1: Short-time spectral attenuation\n\nThis method of noise reduction, while being relatively fast and easily understood, exhibits several shortcomings: in its most basic form it ignores dependencies between the time-domain data in adjacent short-time blocks, and it assumes knowledge of the noise variance. Moreover, 
previous approaches in this vein have relied (either explicitly or implicitly) on independence assumptions amongst the time-frequency coefficients; see, e.g., [4]. Thus, with the aim of improving upon this popular class of audio noise reduction techniques, we have used these approaches as a starting point from which to proceed with a fully Bayesian analysis. As a step in this direction, we propose a Gabor regression model as follows.\n\n2 Coefficient Shrinkage for Audio Signal Enhancement\n\n2.1 Gabor Regression\n\nLet x ∈ ℝ^N denote a sampled audio waveform, the observation of which has been corrupted by additive white Gaussian noise of variance σ_d², yielding the simple additive model y = x + d. We consider regression in this case using a design matrix obtained from a Gabor frame.^2\n\nIn our particular case, this choice of regressors is motivated by a desire for constant absolute bandwidth, as opposed to, e.g., the constant relative bandwidth of wavelets. We do not attempt to address here the relative merits of Gabor and wavelet frames per se; rather, we simply note that the changing frequency content of natural sound signals carries much of their information, and thus a time-frequency representation may well be more appropriate than a time-scale one. 
Moreover, audio signal enhancement results with wavelets have been for the most part disappointing (witness the dearth of literature in this area), whereas standard engineering practice has evolved to use time-varying filtering, which is inherently Gabor analysis.\n\nAlthough space does not permit a discussion of the relevance of Gabor-type transforms to auditory perception (see, e.g., [5]), as a final consideration it is interesting to note that Gabor's original formulations [6]–[7] were motivated by psychoacoustic as well as information theoretic considerations.\n\n^2 Technically, we consider the ring ℤ_N = ℤ mod Nℤ, under the assumption (without loss of generality) that the vector of sampled observations y has been extended to length N in a proper way at its boundary before being periodically extended on ℤ.\n\n2.2 Bayesian Model\n\nBy the completeness property of Gabor frames, any x ∈ ℝ^N can be represented as a linear combination of the elements of the frame. Thus, one has the model\n\n    y = Gc + d,\n\nwhere the columns of G form the Gabor synthesis atoms, and the elements of c represent the respective synthesis coefficients. To complete this model we assume an independent, identically distributed Gaussian noise vector, conditionally Gaussian coefficients, and inverted-Gamma conjugate priors:\n\n    d | σ_d² ~ N(0, σ_d² I),\n    c | σ_c² ~ N(0, diag(σ_c²)),\n    σ_d², σ_c,k² ~ IG(α, β),    (2)\n\nwhere diag(σ_c²) denotes a diagonal matrix, the individual elements of which are assumed to be distributed as in (2) above, and α and β are hyperparameters. 
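As a concrete illustration of such a design matrix (not the authors' implementation; indexing and normalisation conventions are our own), the columns of a small G can be built explicitly, and the 'painless' property of §1.1 checked by verifying that the frame operator G Gᴴ is a multiple of the identity:

```python
import numpy as np

def gabor_synthesis_matrix(N, L, hop):
    # Columns: circular time shifts (by multiples of hop) of a square-root
    # Hann window, modulated by the L complex exponentials of length L.
    n = np.arange(L)
    g = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / L))
    atoms = []
    for start in range(0, N, hop):           # time shifts
        for m in range(L):                   # frequency shifts (modulations)
            atom = np.zeros(N, dtype=complex)
            atom[(start + n) % N] = g * np.exp(2j * np.pi * m * n / L)
            atoms.append(atom)
    return np.column_stack(atoms)

N, L = 128, 16
G = gabor_synthesis_matrix(N, L, hop=L // 2)
print(G.shape)                                # (128, 256): twice as many atoms as samples

# 'Painless' case: the frame operator G G^H is a multiple of the identity,
# so analysis and synthesis need no matrix inversion.
S = G @ G.conj().T
print(np.allclose(S, L * np.eye(N)))          # True
```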
We note that it is possible to obtain vague priors through the choice of these hyperparameters; alternatively, one may wish to incorporate genuine prior knowledge about audio signal behaviour through them. In §3.2, we consider the case in which frequency-dependent coefficient priors are specified in order to exploit the time-frequency structure of natural sound signals.\n\nThe choice of an inverted-Gamma prior for σ_d² is justified by its flexibility; for instance, in many audio enhancement applications one may be able to obtain a good estimate of the noise variance, which may in turn be reflected in the choice of hyperparameters α and β. However, in order to demonstrate the performance of our model in the 'worst-case' scenario of little prior information, we assume here a diffuse prior for σ_d².\n\n2.3 Implementation\n\nAs a means of obtaining samples from the posterior distribution and hence the corresponding point estimates, we propose to sample from the posterior using Markov chain Monte Carlo (MCMC) methods [8]. By design, all model parameters may be easily sampled from their respective full conditional distributions, thus allowing the straightforward employment of a Gibbs sampler [9].\n\nIn all of the experiments described herein, a tight, normalised Hanning window was employed as the Gabor window function, and a regular time-frequency lattice was constructed to yield a redundancy of two (corresponding to the common practice of a 50% window overlap in the overlap-add method). 
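Under the conjugate priors of §2.2, each full conditional is standard, so one Gibbs sweep is only a few lines. The sketch below substitutes a small random dictionary for the true Gabor matrix and uses illustrative hyperparameter values and a Jeffreys-type diffuse noise prior; all of these choices are our own assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the Gabor regression model y = Gc + d: here G is a small
# random dictionary rather than a true Gabor synthesis matrix.
N, M = 64, 96
G = rng.standard_normal((N, M)) / np.sqrt(N)
c_true = rng.standard_normal(M) * (rng.random(M) < 0.2)     # sparse-ish truth
y = G @ c_true + 0.1 * rng.standard_normal(N)               # noise std 0.1

alpha, beta = 2.0, 0.1        # inverted-Gamma hyperparameters (assumed values)
c, s2_c, s2_d = np.zeros(M), np.ones(M), 1.0

for it in range(500):
    # c | rest ~ N(mu, Sigma) with Sigma = (G'G / s2_d + diag(1/s2_c))^{-1}
    Sigma = np.linalg.inv(G.T @ G / s2_d + np.diag(1.0 / s2_c))
    mu = Sigma @ (G.T @ y) / s2_d
    c = mu + np.linalg.cholesky(Sigma) @ rng.standard_normal(M)

    # s2_c[k] | c[k] ~ IG(alpha + 1/2, beta + c[k]^2 / 2)
    s2_c = 1.0 / rng.gamma(alpha + 0.5, 1.0 / (beta + 0.5 * c**2))

    # s2_d | rest ~ IG(N/2, ||y - Gc||^2 / 2) under a Jeffreys-type diffuse prior
    r = y - G @ c
    s2_d = 1.0 / rng.gamma(N / 2, 2.0 / (r @ r))

print(0.0 < s2_d < 0.5)   # noise variance estimate stays on a sensible scale
```

The per-coefficient variance draws are what produce the shrinkage: small sampled c[k] pull s2_c[k] down, which in turn shrinks c[k] further on the next sweep.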
The arithmetic mean of the signal reconstructions from 1000 iterations (following 1000 iterations of 'burn-in', by which time the sampler appeared to have reached a stationary regime in each case) was taken to be the final result. As a further note, colour plots and representative audio examples may be found at the URL specified on the title page of this paper.\n\nWhile here we show results from random initialisations, with no attempt made to optimise parameters, we note that in practice it may be most efficient to initialise the sampler with the Gabor expansion of the noisy observation vector (such an initialisation will indeed be possible without inversion of the frame operator in the cases we consider here, which correspond to the overlap-add method described in §1.2). It can also be expected that, where possible, convergence may be speeded by starting the sampler in regions of likely high posterior probability, via use of a preliminary noise reduction method to obtain a robust coefficient initialisation.\n\n3 Simulations\n\n3.1 Coefficient Shrinkage in the Overcomplete Case\n\nTo test the noise reduction capabilities of the Gabor regression model, a speech signal of the short utterance 'sound check', sampled at 11.025 kHz, was artificially degraded with white Gaussian noise to yield signal-to-noise ratios (SNR) between 0 and 20 dB. At each SNR, ten runs of the sampler, at different random initialisations and using different pseudo-random number sequences, were performed as specified above. 
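The degradation step above amounts to scaling a white noise realisation to a prescribed SNR; a minimal helper (the function name and test signal are our own, not taken from the paper):

```python
import numpy as np

def degrade(x, snr_db, rng):
    # Add white Gaussian noise scaled so 10*log10(P_signal / P_noise) = snr_db.
    d = rng.standard_normal(len(x))
    target_noise_power = np.mean(x**2) / 10 ** (snr_db / 10)
    d *= np.sqrt(target_noise_power / np.mean(d**2))
    return x + d, d

fs = 11025
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)    # 1 s, 440 Hz test tone
y, d = degrade(x, snr_db=10.0, rng=np.random.default_rng(0))
print(round(10 * np.log10(np.mean(x**2) / np.mean(d**2)), 6))   # 10.0
```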
By way of comparison, three standard methods of short-time spectral attenuation (the Wiener filter, magnitude spectral subtraction, and the Ephraim and Malah suppression rule (EMSR) [4]) were also tested on the same data (noise variances were estimated from 5 seconds of the noise realisation in these cases); the results are shown in Fig. 2, along with estimates of the noise variance averaged over each of the ten runs.\n\nFigure 2: Noise reduction results for the Gabor regression experiment of §3.1. (a) Output SNR versus input SNR for the Wiener filter rule, Gabor regression, magnitude spectral subtraction, and the Ephraim and Malah rule, with corresponding interpolants; individual realisations corresponding to the ten sampler runs are so closely spaced as to be indistinguishable. (b) True and estimated noise variances (each averaged over ten runs of the sampler).\n\nAs it is able to outperform many of the short-time methods over a wide range of SNR (despite its relative disadvantage of not being given the estimated noise variance), and is also able to accurately estimate the noise variance over this range, the results of Fig. 2 would seem to indicate the appropriateness of the Gabor regression scheme for audio signal enhancement. However, listening tests reveal that the algorithm, while improving upon the shortcomings of standard approaches discussed in §1.2, still suffers from the same 'musical' residual noise. The EMSR, on the other hand, is known for its more colourless residual noise (although as can be seen from Fig. 2, it tends to exhibit severe over-smoothing at higher SNR); we address this issue in the following section.\n\n3.2 Coefficient Shrinkage Using Wilson Bases\n\nIn the case of a real signal, it is still possible to obtain good time-frequency localisation without incurring the penalty of redundancy through the use of Wilson bases (also known in the engineering literature as lapped transforms; see, e.g., [1]).\n\nAs an example of incorporating basic prior knowledge about audio signal structure in a relatively simple and straightforward manner, now consider letting the scale factor β of (2) become an inverse function of frequency, so that the elements of the inverted-Gamma-distributed coefficient variance vector σ_c², although independent, are no longer identically distributed. To test the effects of such a frequency-dependent prior in the context of a Wilson regression model (in comparison with the diffuse priors employed in §3.1), the speech signal of the previous example was degraded with white Gaussian noise to yield an SNR of 10 dB. Once again, posterior mean estimates over the last 1000 iterations of a 2000-iteration Gibbs sampler run were taken as the final result. Figure 3 shows samples of the noise variance parameter in this case.\n\nFigure 3: Noise variance samples for the two Wilson regression schemes of §3.2 (sample paths for the identical-prior and frequency-dependent-prior cases, shown against the true noise variance).\n\nWhile both the diffuse and frequency-dependent prior schemes yield an estimate close to the true noise variance, and indeed give similar SNR gains of 3.07 and 2.85 dB, respectively, the corresponding restorations differ greatly in their perceptual quality. 
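The exact functional form of the frequency dependence is not specified above, so the following sketch assumes, purely for illustration, a 1/(1 + k) decay of the scale factor with frequency bin k (hyperparameter values are likewise our own):

```python
import numpy as np

rng = np.random.default_rng(0)

L = 64                         # frequency bins per frame
alpha, beta0 = 2.0, 0.1        # illustrative hyperparameter values
k = np.arange(L)
beta_k = beta0 / (1.0 + k)     # assumed: scale factor decays with frequency index

# Prior mean of each coefficient variance is beta_k / (alpha - 1), so high
# frequencies are shrunk more heavily a priori, matching the low-pass
# spectral tilt typical of speech and music.
prior_mean = beta_k / (alpha - 1.0)
s2_c = 1.0 / rng.gamma(alpha, 1.0 / beta_k)   # one IG(alpha, beta_k) draw per bin

print(prior_mean[0] / prior_mean[-1])   # 64.0: lowest bin allows 64x more prior variance
```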
Figure 4 shows spectrograms of the clean and noisy test signal, as well as the resultant restorations; whereas Fig. 5 shows waveform and spectrogram plots of the corresponding residuals (for greater clarity, colour plots are provided on-line).\n\nFigure 4: Spectrograms for the two Wilson regression schemes of §3.2 in the case of diffuse vs. frequency-dependent priors (grey scale is proportional to log-amplitude); panels show the original speech signal, the degraded speech signal, and the posterior mean reconstructions for the identical-prior and frequency-dependent-prior cases.\n\nIt may be seen from Figs. 4 and 5 that the residual noise in the case of the frequency-dependent priors appears less coloured, and in fact this restoration suffers much less from the so-called 'musical noise' artefact common to audio signal enhancement methods. It is well-known that a 'whiter-sounding' residual is perceptually preferable; in fact, some noise reduction methods have attempted this explicitly [10].\n\nFigure 5: Waveform and spectrogram plots of the Wilson regression residuals for the identical-prior and frequency-dependent-prior cases.\n\n4 Discussion\n\nHere we have presented a model for regression of audio signals, using elements of a Gabor frame as a design matrix. Note that in alternative contexts, others have also considered scale mixtures of normals as we do here (see, e.g., [11]–[12]); in fact, the priors discussed in [13] constitute special cases of those employed in the Gabor regression model. This model may also be extended to include indicator variables, thus allowing one to perform Bayesian model averaging [8]–[9]. In this case it may be desirable to employ an even larger 'dictionary' of regressors, in order to obtain the most parsimonious representation possible.^3 Multi-resolution wavelet-like schemes are one of many possibilities; for an example application in this vein we refer the reader to [14].\n\nThe strength of such a fully Bayesian approach lies largely in its extensibility to allow for more accurate signal and noise models; in this vein work is continuing on the development of appropriate conditional prior structures for audio signals, including the formulation of Markov random field models. The main weakness of this method at present lies in the computational intensity inherent in the sampling scheme; a comparison to more recent and sophisticated probabilistic methods (e.g., [15]–[16]) is now in order to determine whether the benefits to be gained from such an approach outweigh its computational drawbacks.\n\n^3 It remains an open question as to whether the resultant variable selection problem would be amenable to approaches other than MCMC; for instance, a perfect sampling scheme.\n\nReferences\n\n[1] Gröchenig, K. (2001). Foundations of Time-Frequency Analysis. Boston: Birkhäuser.\n\n[2] Daubechies, I., Grossmann, A., and Meyer, Y. (1986). Painless nonorthogonal expansions. J. Math. Phys. 27, 1271–1283.\n\n[3] Dörfler, M. (2001). Time-frequency analysis for music signals: a mathematical approach. J. New Mus. Res. 30, 3–12.\n\n[4] Ephraim, Y. and Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Processing ASSP-32, 1109–1121.\n\n[5] Wolfe, P. J. and Godsill, S. J. (2001). Perceptually motivated approaches to music restoration. J. New Mus. Res. 30, 83–92.\n\n[6] Gabor, D. (1946). Theory of communication. J. IEE 93, 429–457.\n\n[7] Gabor, D. (1947). Acoustical quanta and the theory of hearing. Nature 159, 591–594.\n\n[8] Robert, C. P. and Casella, G. (1999). Monte Carlo Statistical Methods. New York: Springer.\n\n[9] Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman & Hall.\n\n[10] Ephraim, Y. and Van Trees, H. L. (1995). A signal subspace approach for speech enhancement. IEEE Trans. Speech Audio Processing 3, 251–266.\n\n[11] Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81, 115–131.\n\n[12] Godsill, S. J. and Rayner, P. J. W. (1998). Digital Audio Restoration: A Statistical Model Based Approach. Berlin: Springer-Verlag.\n\n[13] Figueiredo, M. A. T. (2002). Adaptive sparseness using Jeffreys prior. In T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, pp. 697–704. Cambridge, MA: MIT Press.\n\n[14] Wolfe, P. J., Dörfler, M., and Godsill, S. J. (2001). Multi-Gabor dictionaries for audio time-frequency analysis. In Proc. IEEE Worksh. App. Signal Processing Audio Acoust., pp. 43–46.\n\n[15] Attias, H., Deng, L., Acero, A., and Platt, J. C. (2001). A new method for speech denoising and robust speech recognition using probabilistic models for clean speech and for noise. In Proc. Eurospeech 2001, vol. 3, pp. 1903–1906.\n\n[16] Attias, H., Platt, J. C., Acero, A., and Deng, L. (2001). Speech denoising and dereverberation using probabilistic models. In T. Leen (ed.), Advances in Neural Information Processing Systems 13, pp. 758–764. Cambridge, MA: MIT Press.\n", "award": [], "sourceid": 2186, "authors": [{"given_name": "Patrick", "family_name": "Wolfe", "institution": null}, {"given_name": "Simon", "family_name": "Godsill", "institution": null}]}