{"title": "Bayesian Nonparametric Spectral Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 10127, "page_last": 10137, "abstract": "Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. This can become particularly challenging when only partial and noisy observations of the signal are available, where current methods fail to handle uncertainty appropriately. In this context, we propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an inference problem. Assuming a Gaussian process prior over the signal, we apply Bayes' rule to find the analytic posterior distribution of the spectrum given a set of observations. Besides its expressiveness and natural account of spectral uncertainty, the proposed model also provides a functional-form representation of the power spectral density, which can be optimised efficiently. Comparison with previous approaches is addressed theoretically, showing that the proposed method is an infinite-dimensional variant of the Lomb-Scargle approach, and also empirically through three experiments.", "full_text": "Bayesian Nonparametric Spectral Estimation\n\nFelipe Tobar\n\nUniversidad de Chile\n\nftobar@dim.uchile.cl\n\nAbstract\n\nSpectral estimation (SE) aims to identify how the energy of a signal (e.g., a time\nseries) is distributed across different frequencies. This can become particularly\nchallenging when only partial and noisy observations of the signal are available,\nwhere current methods fail to handle uncertainty appropriately. In this context, we\npropose a joint probabilistic model for signals, observations and spectra, where SE\nis addressed as an exact inference problem. Assuming a Gaussian process prior\nover the signal, we apply Bayes\u2019 rule to \ufb01nd the analytic posterior distribution of\nthe spectrum given a set of observations. 
Besides its expressiveness and natural\naccount of spectral uncertainty, the proposed model also provides a functional-form\nrepresentation of the power spectral density, which can be optimised ef\ufb01ciently.\nComparison with previous approaches, in particular against Lomb-Scargle, is\naddressed theoretically and also experimentally in three different scenarios.\nCode and demo available at github.com/GAMES-UChile.\n\n1\n\nIntroduction\n\nThe need for frequency representation arises naturally in a number of disciplines such as natural\nsound processing [1, 2], astrophysics [3], biomedical engineering [4] and Doppler-radar data analysis\n[5]. When the signal of interest is known without uncertainty, the frequency representation can\nbe obtained by means of the Fourier transform [6]. However, real-world applications usually only\nprovide us with a limited number of observations corrupted by noise. In this sense, the main challenge\nin Spectral Estimation (SE) comes from the fact that, due to the convolutional structure of the Fourier\ntransform, the uncertainty related to missing, noisy and unevenly-sampled data propagates across\nthe entire frequency domain. In this article, we take a probabilistic perspective to SE, thus aiming to\nquantify uncertainty in a principled manner.\nClassical\u2014yet still widely used\u2014methods for spectral estimation can be divided in two categories.\nFirst, parametric models that impose a deterministic structure on the latent signal, which result in a\nparametric form for the spectrum [7\u20139]. 
Second, nonparametric models that do not assume structure\nin the data, such as the periodogram [10] computed through the Fast Fourier Transform (FFT) [11].\nUncertainty is not inherently accounted for in either of these approaches, although one can equip\nparameter estimates with error bars in the \ufb01rst case, or consider subsets of training data to then\naverage over the estimated spectra.\nDespite the key role of the frequency representation in various applications as well as recent advances\nin probabilistic modelling, the Bayesian machinery has not been fully exploited for the construction\nof rigorous and meaningful SE methods. In particular, our hypothesis is that Bayesian nonparametric\nmodels can greatly advance SE theory and practice by incorporating temporal-structure parameter-\nfree generative models, inherent uncertainty representation, and a natural treatment of missing and\nnoisy observations. Our main contribution is then to propose a nonparametric joint generative model\nfor a signal and its spectrum, where SE is addressed by solving an exact inference problem.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\f2 Background\n\n2.1 Prior art, current pitfalls and desiderata\n\nThe beginning of a principled probabilistic treatment of the spectral estimation problem can be\nattributed to E.T. Jaynes, who derived the discrete Fourier transform using Bayesian inference [12].\nThen, G.L. Bretthorst proposed to place a prior distribution over spectra and update it in the light of\nobserved temporal data, for different time series models [13]. This novel approach, in the words of\nP.C. Gregory, meant a Bayesian revolution in spectral analysis [14]. The so developed conceptual\nframework paved the way for a plethora of methods addressing spectral estimation as (parametric)\nBayesian inference. 
In this context, by choosing a parametric model for time series with closed-form\nFourier transform, a Bayesian treatment provides error bars on the parameters of such a model and,\nconsequently, error bars on the parametric spectral representation, e.g., [15\u201317].\nWithin Bayesian nonparametrics, the increasing popularity and ease of use of Gaussian processes (GP,\n[18]), enabled [19, 20] to detect periodicities in time series by (i) \ufb01tting a GP to the observed data, and\nthen (ii) analysing the so learnt covariance kernel, or equivalently, its power spectral density (PSD).\nAlthough meaningful and novel, this GP-based method has a conceptual limitation when it comes to\nnonparametric modelling: though a nonparametric model is chosen for the time series, the model for\nthe PSD (or kernel) is still only parametric. Bayesian nonparametric models for PSDs can be traced\nback to [21], which constructed a prior directly on PSDs using Bernstein polynomials and a Dirichlet\nprocess, and more recently to [22, 23], which placed a prior on covariance kernels by convolving a\nGP with itself. Yet novel, both these methodologies produced intractable posteriors for the PSDs,\nwhere the former relied on Monte Carlo methods and the latter on variational approximations.\nThe open literature is lacking a framework for spectral estimation that is:\n\n\u2022 Nonparametric, thus its complexity grows with the amount of data.\n\u2022 Bayesian, meaning that it accounts for its own uncertainty.\n\u2022 Tractable, providing exact solutions at low computational complexity.\n\nWe aim to ful\ufb01l these desiderata by modelling time series and their spectra, i.e., Fourier transform,\nusing Gaussian processes. 
A key consequence of using GPs is that missing/unevenly-sampled
observations are naturally handled.

2.2 The Fourier transform

Let us consider a signal, e.g., a time series or an image, defined by the function f : X → R, where
for simplicity we will assume X = R. The spectrum of f(t) is given by its Fourier transform [6]

F(ξ) = F{f}(ξ) ≜ ∫_X f(t) e^{-j2πξt} dt    (1)

where j is the imaginary unit and the frequency ξ is the argument of the function F(·). Notice that
for F(ξ) to exist, f(t) is required to be Lebesgue integrable, that is, ∫_X |f(t)| dt < ∞.
Observe that F(ξ) is the inner product between the signal f(t) and the Fourier operator e^{-j2πξt} =
cos(2πξt) - j sin(2πξt); therefore, the complex-valued function F(ξ) contains the frequency content
of the even part (cf. odd part) of f(t) in its real part (cf. imaginary part). We also refer to the squared
absolute value S(ξ) = |F(ξ)|², which comprises the total frequency content at frequency ξ, as the
power spectral density (PSD).
Calculating the integral in eq. (1) is far from trivial for general Lebesgue-integrable signals f(t).
This has motivated the construction of parametric models for SE that approximate f(·) by analytic
expressions that admit a closed-form Fourier transform, such as sums of sinusoids [8], autoregressive
processes [9] and Hermite polynomials. 
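As a quick numerical illustration of eq. (1) (a sketch of ours, not part of the paper's code; the grid, the Gaussian test signal and the Riemann-sum discretisation are all illustrative choices), the real part of F(ξ) carries the even content of f(t): for the even, Lebesgue-integrable signal f(t) = e^{-t²}, the transform is known to be √π e^{-π²ξ²}, which a discrete inner product with the Fourier operator reproduces:

```python
import numpy as np

# Riemann-sum approximation of eq. (1): F(xi) = <f, exp(-j 2 pi xi t)>.
# The grid and the test signal are illustrative; f(t) = exp(-t^2) is even
# and Lebesgue integrable, so F(xi) should be (numerically) real and
# equal to sqrt(pi) * exp(-(pi * xi)^2).
t = np.linspace(-20.0, 20.0, 4001)   # symmetric grid, dt = 0.01
dt = t[1] - t[0]
f = np.exp(-t**2)

def fourier(xi):
    """Discrete inner product between f and the Fourier operator at frequency xi."""
    return np.sum(f * np.exp(-2j * np.pi * xi * t)) * dt

F = fourier(0.3)
closed_form = np.sqrt(np.pi) * np.exp(-(np.pi * 0.3) ** 2)
# F.imag vanishes (an even signal has no odd content) and F.real matches closed_form.
```

The truncation at ±20 and the 0.01 step are far finer than this integrand needs, so the discrete sum agrees with the closed form to high precision.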
The proposed method is inspired by this rationale: we
will use a stochastic-process model for the signal (rather than a parametric function), then apply
the Fourier transform to that process, and finally obtain a stochastic representation of the spectrum.
A family of stochastic processes that admit a closed-form Fourier transform is presented next.

2.3 Gaussian process priors over functions

The Gaussian process (GP, [18]) is the infinite-dimensional generalisation of the multivariate normal
distribution. Formally, the stochastic process f(t) is a GP if and only if for any finite collection of
inputs {t_i}_{i=1}^N, N ∈ N, the scalar random variables {f(t_i)}_{i=1}^N are jointly Gaussian. A GP f(t) with
mean function m and covariance kernel K will be denoted as

f(t) ~ GP(m, K)    (2)

where we usually assume zero (or constant) mean, and a kernel function K(t, t') denoting the
covariance between f(t) and f(t'). The behaviour of the GP is encoded in its covariance function; in
particular, if the GP f(t) is stationary, we have K(t, t') = K(t - t') and the PSD of f(t) is given by
S(ξ) = F{K(t)}(ξ) [24]. The connection between temporal and frequency representations of GPs
has aided the design of GPs with specific (prior) harmonic content in both parametric [25–29]
and nonparametric [22, 23] ways.
GPs are flexible nonparametric models for functions, in particular for latent signals involved in SE
settings. Besides their strength as a generative model, there are two key properties that position
GPs as a sound prior within SE: first, as the Fourier transform is a linear operator, the Fourier
transform of a GP (if it exists) is also a (complex-valued) GP [30, 31] and, critically, the signal and
its spectrum are jointly Gaussian. 
Second, Gaussian random variables are closed under conditioning
and marginalisation, meaning that the exact posterior distribution of the spectrum conditioned on a set
of partial observations of the signal is also Gaussian. This turns the SE problem into an inference one
with two new challenges: to find the requirements for the existence of the spectrum of a GP, and to
calculate the statistics of the posterior spectrum given the (temporal) observations.

3 A joint generative model for signals and their spectra

The proposed model is presented through the following building blocks: (i) a GP model for the
latent signal, (ii) a windowed version of the signal for which the Fourier transform exists, (iii)
the closed-form posterior distribution of the windowed-signal spectrum, and (iv) the closed-form
posterior power spectral density.

3.1 Definition of the local spectrum

We place a stationary GP prior over f(t) ~ GP(0, K) and model the observations as evaluations of
f(t) corrupted by Gaussian noise, denoted by y = [y(t_i)]_{i=1}^N. This GP model follows the implicit
stationarity assumption adopted when computing the spectrum via the Fourier transform. However,
notice that the draws of a stationary GP are not Lebesgue integrable almost surely (a.s.) and therefore
their Fourier transforms do not exist a.s. [32]. We avoid referring to the spectrum of the complete
signal and only focus on the spectrum in the neighbourhood of a centre c. We can then choose
an arbitrarily-wide neighbourhood (as long as it is finite), or consider multiple centres {c_i}_{i=1}^{N_c} to
form a bank of filters. 
We refer to the spectrum in such a neighbourhood as the local spectrum and
define it through the Fourier transform as

F_c(ξ) ≜ F{f_c(t)}(ξ) = F{f(t - c) e^{-αt²}}(ξ)    (3)

where f_c(t) = f(t - c) e^{-αt²} is a windowed version of the signal f(t) centred at c with width 1/√(2α).
Observe that since f_c(t) decays exponentially for t → ±∞, it is in fact Lebesgue integrable:

∫_R |f_c(t)| dt = ∫_R |f(t - c) e^{-αt²}| dt < max(|f|) ∫_R e^{-αt²} dt = max(|f|) √(π/α) < ∞ a.s.    (4)

since max(|f|) is finite a.s. due to the GP prior. As a consequence, the local spectrum F_c{f(t)}
exists and is finite.
The use of windowed signals is commonplace in SE, either as a consequence of acquisition de-
vices or for algorithmic purposes (as in our case). In fact, windowing allows for a time-frequency
representation, meaning that the signal does not need to be stationary but only piecewise station-
ary, i.e., different centres c_i might have different spectra. Finally, we clarify that the choice of a
square-exponential window e^{-αt²} is due to the tractability of the statistics calculated in the next section.

A summary of the proposed generative model is given in eqs. (5)-(8) and a graphical-model
representation is shown in fig. 1.

latent signal:      f(t) ~ GP(0, K)    (5)
observations:       y(t_i) = f(t_i) + η_i,  η_i ~ N(0, σ_n²),  ∀i = 1, ..., N    (6)
windowed signal:    f_c(t) = e^{-αt²} f(t - c)    (7)
local spectrum:     F_c(ξ) ≜ F{f_c(t)}(ξ) = F{f(t - c) e^{-αt²}}(ξ) = ∫_R e^{-αt²} f(t - c) e^{-j2πξt} dt    (8)

Figure 1: Proposed model for a latent signal f(t), observations y(t), a windowed version f_c(t) and
local spectrum F_c(ξ). We have considered N observations and C centres.

3.2 The local-spectrum Gaussian process

As a complex-valued linear transformation of f(t) ~ GP, the local spectrum F_c(ξ) is a complex GP
[31, 30] and thus completely determined by its covariance and pseudocovariance [33], given by

K_F(ξ, ξ') = E[F_c(ξ) F_c*(ξ')] = E[F_c(ξ) F_c(-ξ')]    (9)
P_F(ξ, ξ') = E[F_c(ξ) F_c(ξ')] = K_F(ξ, -ξ')    (10)

where the last identities in each line are due to the fact that the latent function f(t) is real valued.
Recall that we are ultimately interested in the real and imaginary parts of the local spectrum (ℜF_c(ξ)
and ℑF_c(ξ) respectively), which are in fact real-valued GPs. However, we will calculate the statistics
of the complex-valued F_c(ξ) for notational simplicity, to then calculate the statistics of the real-valued
processes ℜF_c(ξ) and ℑF_c(ξ) according to:

covariance(ℜF_c(ξ)) = K_rr(ξ, ξ') = ½ (K_F(ξ, ξ') + K_F(ξ, -ξ'))    (11)
covariance(ℑF_c(ξ)) = K_ii(ξ, ξ') = ½ (K_F(ξ, ξ') - K_F(ξ, -ξ'))    (12)
covariance(ℜF_c(ξ), ℑF_c(ξ)) = K_ri(ξ, ξ') = K_ir(ξ, ξ') = 0.    (13)

The above expressions are due to the identity in eq. (10) and the fact that both K_F(ξ, ξ') and
K_F(ξ, -ξ') are real-valued. 
The relationship between the covariance of a GP and the covariance of
the spectrum of such a GP is given by the following proposition.

Proposition 1 The covariance of the local spectrum F_c(ξ) of a stationary signal f(t) ~ GP(0, K(t))
is given by

K_F(ξ, ξ') = √(π/(2α)) e^{-π²(ξ-ξ')²/(2α)} ( K̂(ρ) ∗ √(2π/α) e^{-2π²ρ²/α} ) |_{ρ=(ξ+ξ')/2}    (14)

where K̂(ξ) = F{K(t)}(ξ) = ∫_R K(t) e^{-j2πξt} dt is the Fourier transform of the kernel K. Equiva-
lently, as pointed out in eq. (10), the pseudocovariance is given by replacing the above expression in
P_F(ξ, ξ') = K_F(-ξ, ξ').

See the proof in Section 1.1 of the supplementary material. Notice that the covariance of the local
spectrum K_F is a sequence of linear transformations of the covariance of the signal K according to:
(i) the Fourier transform due to the domain change, (ii) convolution with e^{-2π²ρ²/α} due to the windowing
effect, and (iii) a smoothness factor e^{-π²(ξ-ξ')²/(2α)} that depends on the window width; this means that
for wider windows the values of the local spectrum at different frequencies become independent.
Critically, observe that each of the Gaussian functions in eq. 
(14) is divided by its normalising
constant; therefore the norm of K_F equals the norm of K̂, which is in turn equal to the norm of the
covariance of the signal K due to the unitary property of the Fourier transform.
For illustrative purposes, we evaluate K_F for the Q-component spectral mixture (SM) kernel [26]

K_SM(τ) = Σ_{q=1}^Q σ_q² exp(-γ_q τ²) cos(2π θ_q^⊤ τ)    (15)

the Fourier transform of which is known explicitly and given by

K̂_SM(ξ) = Σ_{q=1}^Q (σ_q²/2) √(π/γ_q) ( e^{-π²(ξ-θ_q)²/γ_q} + e^{-π²(ξ+θ_q)²/γ_q} ) = Σ_{q=1}^Q Σ_{θ=±θ_q} (σ_q²/2) √(π/γ_q) e^{-π²(ξ-θ)²/γ_q}.    (16)

For this SM kernel, the covariance kernel of the local-spectrum process is (see supp. mat., §1.2)

K_F(ξ, ξ') = Σ_{q=1}^Q Σ_{θ=±θ_q} ( σ_q² π / (2√(α(α+2γ_q))) ) e^{-π²(ξ-ξ')²/(2α)} e^{-2π²((ξ+ξ')/2 - θ)²/(α+2γ_q)}.    (17)

With the explicit expression for K_F in eq. (17) and the relationships in eqs. (9)-(13), we can compute
the statistics of the real and imaginary parts of the local spectrum and sample from it. Fig. 2 shows
these covariances and 3 sample paths revealing the odd and even properties of the covariances.

Figure 2: Covariance and sample paths of the local spectrum of an SM signal with Q = 1, σ_q =
1, γ_q = 5e-3, θ_q = 2.5, α = 5e-5. Real (cf. imaginary) part shown in the left (cf. 
right) half.

3.3 Joint samples and the conditional density p(F_c(ξ)|y)

Although the joint distribution over the signal f(t) and its local spectrum F_c(ξ) is Gaussian, sampling
directly from this joint distribution is problematic due to the deterministic relationship between the
(complete and noiseless) signal f and its local spectrum. We thus proceed hierarchically: we first
sample y ~ GP(t; 0, K), y ∈ R^N, and then F_c(ξ) ~ p(F_c|y), where the posterior is normally
distributed with mean and covariance given respectively by

E[F_c(ξ)|y] = K_yFc^⊤(t, ξ) K(t, t)^{-1} y    (18)
E[F_c*(ξ) F_c(ξ')|y] = K_F(ξ, ξ') - K_yFc^⊤(t, ξ) K(t, t)^{-1} K_yFc(t, ξ')    (19)

where K_yFc(t, ξ) is presented in the next proposition.

Proposition 2 The covariance K_yFc(t, ξ) between the observations y at times t coming from a
stationary signal f(t) ~ GP(0, K) and its local spectrum at frequency ξ is given by

K_yFc(t, ξ) = E[y_c*(t) F_c(ξ)] = K̂(ξ) e^{-j2πξt} ∗ √(π/α) e^{-π²ξ²/α}    (20)

where K̂(ξ) = F{K(t)}(ξ) = ∫_R K(t) e^{-j2πξt} dt is the Fourier transform of the kernel K.

See the proof in Section 1.3 of the supplementary material. Notice that the convolution against
e^{-π²ξ²/α} is also due to the windowing effect and that the norms of K_yFc and K are equal.
For the SM kernel, shown in eq. (15), K_yFc becomes (details in supp. mat., §1.4)

K_yF(t, ξ) = Σ_{q=1}^Q Σ_{θ=±θ_q} ( σ_q² / (2√(π(α̃+γ̃_q))) ) exp( -(ξ-θ)²/(α̃+γ̃_q) ) exp( -π²t²/L_q ) exp( -j (2πt/L_q)(θ/γ̃_q + ξ/α̃) )

where α̃ = α/π², γ̃_q = γ_q/π² and L_q = (α̃^{-1} + γ̃_q^{-1})^{-1}. Fig. 3 shows this covariance together with
joint samples of the signal and its spectrum (colour-coded).

Figure 3: Hierarchical sampling. From left to right: signal samples (solid) and window (dashed),
covariance K_yF for the SM kernel, real-part local-spectrum samples, imaginary-part local-spectrum
samples. Parameters were Q = 1, σ_q = 1, γ_q = 2, θ_q = 2.5, α = 1. Notice how K_yFc(t, ξ) vanishes as the
frequency ξ departs from θ_q.

We conclude this section with the following result.

Proposition 3 The power spectral density of a stationary signal f(t) ~ GP(0, K), conditional on a
set of observations y, is a χ²-distributed stochastic process and its mean is known in closed form.

This result follows from the fact that the (posterior) real and imaginary parts of the spectrum are inde-
pendent Gaussian processes with explicit mean and covariance. This is a critical contribution of the
proposed model, where the search for periodicities can be performed by optimising a closed-form
expression which has a linear evaluation cost.

4 Spectral estimation as Bayesian inference

Henceforth, the proposed method for Bayesian nonparametric spectral estimation will be referred to
as BNSE. 
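The closed-form posterior-mean PSD behind Proposition 3 is easy to state concretely: if at a given frequency the posterior real and imaginary parts of the spectrum are independent Gaussians with means m_r, m_i and variances v_r, v_i, then E[S(ξ)|y] = m_r² + v_r + m_i² + v_i. A minimal sketch (our illustration with made-up numbers, not the paper's code) checks this second-moment identity against Monte Carlo:

```python
import numpy as np

def posterior_mean_psd(m_r, v_r, m_i, v_i):
    """E[|F_c(xi)|^2 | y] when Re F_c and Im F_c are posterior-independent
    Gaussians: the second moment of each part is mean^2 + variance."""
    return m_r**2 + v_r + m_i**2 + v_i

# Monte Carlo sanity check at a single frequency (illustrative values).
rng = np.random.default_rng(0)
m_r, v_r, m_i, v_i = 1.0, 0.5, -0.3, 0.2
s = (rng.normal(m_r, np.sqrt(v_r), 200_000) ** 2
     + rng.normal(m_i, np.sqrt(v_i), 200_000) ** 2)
closed_form = posterior_mean_psd(m_r, v_r, m_i, v_i)   # 1.0 + 0.5 + 0.09 + 0.2
# s.mean() agrees with closed_form up to Monte Carlo error.
```

Evaluating this expression over a frequency grid is what makes the linear-cost periodicity search possible: no sampling or retraining is needed once the posterior statistics are available.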
This section analyses BNSE in terms of interpretability, implementation, and connection
with other methods.

4.1 Training and computational cost

BNSE can be interpreted as fitting a continuous-input interpolation to the observations, computing
the Fourier transform of the interpolation, and finally averaging over all the possibly infinitely-many
interpolations. Consequently, as our interpolation is a GP, both the Fourier transform and the infinite
average can be performed analytically. Within BNSE, finding the appropriate interpolation family
boils down to selecting the model hyperparameters, where the GP prior protects the model from
overfitting [18]. In this regard, the proposed BNSE can readily rely upon state-of-the-art training
procedures for GPs and benefit from sparse approximations for computationally-efficient training.
Finally, as the hyperparameters of the posterior spectrum are given by those of the GP in the time
domain, computing the posterior local spectrum poses no additional computational complexity.

4.2 Model consistency and interpretation

The problem of global (rather than local) SE can be addressed by choosing an arbitrarily-wide
window. However, as pointed out in Section 3.1, the local-spectrum process is not defined
for α → 0, since it turns into the sum of infinitely-many Gaussian RVs; in fact, note from eq. (14)
that lim_{α→0} K_F(ξ, ξ') = ∞. Despite the lack of convergence for the posterior law of the spectrum
when α → 0, let us consider the point estimate given by the posterior mean defined from eqs. 
(18) and
(20) as

E[F_c(ξ)|y] = ( K̂(ξ) e^{-j2πξt} ∗ √(π/α) e^{-π²ξ²/α} ) K(t, t)^{-1} y.    (21)

Observe that we can indeed apply the limit α → 0 above, where the second argument of the convolu-
tion converges to a (unit-norm) Dirac delta function. Additionally, let us consider an uninformative
prior over the latent signal by choosing K(t, t) = I, which implies K̂(ξ) = 1. Under these condi-
tions (infinitely-wide window and uninformative prior for temporal structure in the signal) the point
estimate of the proposed model becomes the discrete-time Fourier transform:

lim_{α→0} E[F_c(ξ)|y] = e^{-j2πξt} y = Σ_{i=1}^N e^{-j2πξt_i} y(t_i).    (22)

This reveals the consistency of the model and offers a clear interpretation of the functional form in
eq. (21): the posterior mean of the local spectrum is a linear transformation of a whitened version of
the observations that depends on the width of the window and the prior belief over frequencies.

4.3 Approximations for non-exponential covariances

Though Sec. 3 provides explicit expressions of the posterior local-spectrum statistics for the spectral
mixture kernel [26], the proposed method is independent of the stationary kernel considered. For
general kernels with known Fourier transform but for which the convolutions in eqs. 
(14) and
(20) are intractable, such as the Sinc, Laplace and Matérn kernels [34], we consider the following
approximation for α sufficiently small:

K_F(ξ, ξ') = √(π/(2α)) e^{-π²(ξ-ξ')²/(2α)} ( K̂(ρ) ∗ √(2π/α) e^{-2π²ρ²/α} ) |_{ρ=(ξ+ξ')/2} ≈ √(π/(2α)) e^{-π²(ξ-ξ')²/(2α)} K̂((ξ+ξ')/2)    (23)

K_{y_c F_c}(t, ξ) = K̂(ξ) e^{-j2πξt} ∗ √(π/α) e^{-π²ξ²/α} ≈ K̂(ξ) e^{-j2πξt}    (24)

where we approximated the second argument in both convolutions as a Dirac delta as in Sec. 4.2.
We did not approximate the term √(π/(2α)) e^{-π²(ξ-ξ')²/(2α)} in eq. (23), since placing a Dirac delta outside a
convolution would result in a degenerate covariance. We emphasise that this is an approximation for
numerical computation and not an application of the limit α → 0, in which case BNSE does not converge.

4.4 Proposed model as the limit of the Lomb-Scargle method

The Lomb-Scargle method (LS) [8] is the de facto approach for estimating the spectrum of
nonuniformly-sampled data. LS proceeds by fitting a set of sinusoids via least squares to the
observations and then reporting the estimated spectrum as the weights of the sinusoids. 
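For reference, such a baseline LS periodogram on unevenly-sampled data can be computed with SciPy's `scipy.signal.lombscargle` (a hedged sketch: the sampling scheme, the 0.5 Hz tone and the frequency grid below are illustrative choices of ours, and note that `lombscargle` expects angular frequencies):

```python
import numpy as np
from scipy.signal import lombscargle

# Unevenly-sampled noisy tone at 0.5 Hz (illustrative setup).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(-10.0, 10.0, 300))
y = np.cos(2 * np.pi * 0.5 * t) + 0.1 * rng.standard_normal(t.size)

# LS fits sinusoids by least squares at each trial frequency; the resulting
# periodogram plays the role of the estimated spectral weights.
freqs = np.linspace(0.05, 1.5, 1000)            # trial frequencies in Hz
pgram = lombscargle(t, y, 2 * np.pi * freqs)    # converted to rad/s
f_hat = freqs[np.argmax(pgram)]                 # peak near the 0.5 Hz tone
```

Note that adding a new trial frequency means re-solving the least-squares fit at that frequency, which is the retraining cost that BNSE's functional form avoids.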
The proposed
BNSE method is closely related to the LS method, with clear differences: (i) we assume a probabilistic
model (the GP) which allows for the spectrum to be stochastic, (ii) we assume a nonparametric model
whose expressiveness increases with the amount of data, (iii) BNSE is trained once and results in a
functional form for F_c(ξ), whereas LS needs to be retrained should new frequencies be considered,
and (iv) the functional form of F_c(ξ) allows for finding periodicities via optimisation, while LS can only do
so through exhaustive search and retraining at each step. In Section 2 of the supplementary material,
we show that the proposed BNSE model is the limit of the LS method when an infinite number of
components is considered with a Gaussian prior over the weights.

5 Simulations

This experimental section contains three parts focusing respectively on: (i) consistency of BNSE
in the classical sum-of-sinusoids setting, (ii) robustness of BNSE to overfitting and ability to handle
non-uniformly sampled noisy observations (heart-rate signal), and (iii) exploiting the functional form
of the PSD estimate of BNSE to find periodicities (astronomical signal).

5.1 Identifying line spectra

Signals composed of a sum of sinusoids have spectra given by Dirac delta functions (or vertical lines),
referred to as line spectra. We compared BNSE against classic line-spectra models such as MUSIC
[7], Lomb-Scargle [8] and the Periodogram [10]. We considered 240 evenly-sampled observations
of the signal f(t) = 10 cos(2π 0.5t) - 5 sin(2π 1.0t) in the domain t ∈ [-10, 10] corrupted by
zero-mean unit-variance Gaussian noise. 
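With amplitudes 10 and 5, the two tones should appear in the PSD with a 4:1 power ratio at 0.5 Hz and 1.0 Hz. A small sketch of this setup (ours, not the paper's code; the seed and the FFT periodogram used for the check are illustrative) confirms where the dominant lines fall:

```python
import numpy as np

# The two-tone test signal: 240 evenly-spaced samples on [-10, 10) of
# f(t) = 10 cos(2 pi 0.5 t) - 5 sin(2 pi 1.0 t), plus unit-variance noise.
rng = np.random.default_rng(0)
t = np.linspace(-10.0, 10.0, 240, endpoint=False)   # dt = 1/12 s
f = (10 * np.cos(2 * np.pi * 0.5 * t)
     - 5 * np.sin(2 * np.pi * 1.0 * t)
     + rng.standard_normal(t.size))

# FFT periodogram: bins fall every 0.05 Hz, so 0.5 and 1.0 Hz are exact bins
# and there is no spectral leakage for these tones.
xi = np.fft.rfftfreq(t.size, d=t[1] - t[0])
S = np.abs(np.fft.rfft(f)) ** 2 / t.size

top2 = np.sort(xi[np.argsort(S)[-2:]])   # the two dominant frequencies
```

With unit noise variance, the noise floor per bin is roughly three orders of magnitude below the 0.5 Hz line here, so the two tones dominate comfortably.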
The window parameter was set to α = 1/(2 · 50²) for
an observation neighbourhood much wider than the support of the observations, and we chose an
SM kernel with rather permissive hyperparameters: a rate γ = 1/(2 · 0.05²) and θ = 0 for a prior
over frequencies that is virtually uninformative. Fig. 4 shows the real and imaginary parts of the posterior
local spectrum and the sample PSD against LS, MUSIC, and the Periodogram. Notice how BNSE
recovered the spectrum with tight error bars and appropriate relative magnitudes. Additionally, from
the PSD estimates, notice how both BNSE and LS coincided with the periodogram and MUSIC at the
peaks of the PSD. Finally, observe that, in line with the structural similarities between BNSE and LS,
they both exhibit the same lobewidths and that LS falls within the error bars of BNSE.

Figure 4: Line-spectrum estimates: BNSE is shown in red and its PSD is computed by first sampling
from the real and imaginary parts of the posterior spectrum and then adding the squared samples (LS:
Lomb-Scargle and pgram: periodogram).

5.2 Discriminating between heart-rate signals

We next considered two heart-rate signals from http://ecg.mit.edu/time-series/. The first
one is known to have frequency components at the respiration rate of the subject, whereas the second
one exhibits low-frequency energy which may be attributed to congestive heart failure [35]. To show
that the proposed method does not overfit the spectrum of the training data, we used the first signal
to train BNSE and then used BNSE to analyse the posterior PSD of the second signal. To make
the experiment more realistic, we only used an unevenly-sampled 10% of the data from the second
(test) signal and considered the LS method applied to the entire (noiseless) signal as ground truth. Fig. 5
shows the PSDs for both signals and methods. 
Observe that in both cases BNSE's posterior PSD
distribution includes the ground truth (LS), even for the previously-unseen test signal. Crucially, this
reveals that BNSE can be used for SE beyond the training data to find critical harmonic features from
noisy and limited observations.

Figure 5: PSD estimates for heart-rate time series. Notice how BNSE recovered the spectral content
of the test signal from only a few noisy measurements.

5.3 Finding periodicities via efficient optimisation

Lastly, we considered the sunspots dataset, an astronomical time series that is known to have a
period of approximately 11 years, corresponding to a fundamental frequency of 1/11 ≈ 0.089.
Finding this period is challenging due to the nonstationarity of the signal. We implemented BNSE,
Lomb-Scargle and a GP with spectral mixture kernel [26] to find the fundamental frequency of the
series. Satisfactory training for Lomb-Scargle and the SM kernel was not possible via gradient-
based maximum likelihood (we used GPflow [36]), even when starting from the neighbourhood of the
true frequency (0.089) or using minibatches. Our conjecture is that this is due to the fact that the
sunspots series is neither strictly periodic nor Gaussian. We implemented BNSE with a lengthscale
equal to one and θ = 0 for a broad prior over frequencies, and α = 10⁻³ for a wide observation
neighbourhood. 
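To extract the fundamental frequency from the resulting PSD, a derivative-free peak search can be used; a minimal sketch (hedged: the single-bump surrogate below stands in for the closed-form posterior-mean PSD, and the starting point is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def psd_surrogate(xi):
    """Illustrative stand-in for the posterior-mean PSD: a single bump at
    the fundamental frequency 1/11 ~ 0.089 (not the actual BNSE posterior)."""
    return np.exp(-((xi - 0.089) ** 2) / (2 * 0.05 ** 2))

# Derivative-free maximisation: minimise the negative PSD with Powell,
# which only requires function evaluations (no gradients of the PSD).
res = minimize(lambda x: -psd_surrogate(x[0]), x0=[0.05], method="Powell")
f0 = float(res.x[0])   # recovered fundamental frequency, near 0.089
```

Each evaluation of the true posterior-mean PSD is a closed-form expression in the observations, which is what keeps this search cheap relative to refitting a model per candidate frequency.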
Finally, the posterior mean of the PSD reported by BNSE was maximised using the derivative-free Powell method [37] due to its non-convexity. Notice that optimising the PSD of BNSE with Powell has a computational cost that is linear in the number of observations and dimensions, whereas maximising the SM likelihood has a cubic cost in the number of observations. Fig. 6 shows the PSD estimates for BNSE and LS (recall that SM could not be trained satisfactorily) and their maxima; observe how the global maximum of the PSD estimated by BNSE coincides with the true fundamental frequency \u2248 0.089.\n\nFigure 6: Finding periodicities via optimisation. Left: sunspots data. Right: PSD estimates reported by BNSE (red) and LS (blue) with corresponding maxima as vertical dashed lines. The correct fundamental frequency of the series is approximately 1/11 \u2248 0.089.\n\n6 Discussion\n\nWe have proposed a nonparametric model for spectral estimation (SE), termed BNSE, and have shown that it admits exact Bayesian inference. BNSE builds on a Gaussian process (GP) prior over signals, and its relationship to existing methods in the SE and GP literature has been illuminated from both theoretical and experimental perspectives. To the best of our knowledge, BNSE is the first nonparametric approach to SE in which the representation of uncertainty due to missing and noisy observations is inherent, owing to its Bayesian nature. Another unique advantage of BNSE is a nonparametric functional form for the posterior power spectral density (PSD), meaning that periodicities can be found through linear-cost optimisation of the PSD rather than by exhaustive search or expensive non-convex optimisation routines. We have shown illustrative examples and results using time series and exponential kernels; however, the proposed BNSE is readily able to take full advantage of GP theory and consider arbitrary kernels in multi-input, multi-output, nonstationary and even non-Gaussian applications.
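The peak-finding step described above can be reproduced with any derivative-free optimiser; the following is a minimal sketch using SciPy's Powell method, with a hypothetical smooth two-peak function standing in for BNSE's closed-form posterior PSD mean (the peak locations and widths are illustrative, with the global maximum placed at the reported fundamental frequency of 0.089):

```python
import numpy as np
from scipy.optimize import minimize

F0 = 0.089  # illustrative fundamental frequency, as reported for the sunspots series

def psd_mean(f):
    """Hypothetical stand-in for BNSE's posterior PSD mean: smooth but
    non-convex, with its global maximum at F0 and a smaller peak at 0.25."""
    f = np.asarray(f, dtype=float)
    return (np.exp(-0.5 * ((f - F0) / 0.02) ** 2)
            + 0.5 * np.exp(-0.5 * ((f - 0.25) / 0.02) ** 2))

# Coarse grid to choose a starting point, then derivative-free Powell
# refinement; each step only re-evaluates the functional form of the PSD.
grid = np.linspace(0.0, 0.3, 31)
start = grid[np.argmax(psd_mean(grid))]
res = minimize(lambda f: -psd_mean(f[0]), x0=[start], method="Powell")

print(f"estimated fundamental frequency: {res.x[0]:.3f}")
```

Because the posterior PSD is available in functional form, each Powell iteration requires only cheap function evaluations, in contrast to the cubic-cost likelihood evaluations needed when training an SM kernel.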
The promising theoretical results also open new avenues in modern SE; these may include novel interpretations of the Nyquist frequency, band-pass filtering and time-frequency analysis.\n\nAcknowledgments\n\nThis work was funded by the projects Conicyt-PIA #AFB170001 Center for Mathematical Modeling and Fondecyt-Iniciaci\u00f3n #11171165.\n\nReferences\n\n[1] R. Turner. Statistical Models for Natural Sounds. PhD thesis, Gatsby Computational Neuroscience Unit, UCL, 2010.\n\n[2] A. Cuevas, A. Veragua, S. Espa\u00f1ol-Jim\u00e9nez, G. Chiang, and F. Tobar. Unsupervised blue whale call detection using multiple time-frequency features. In Proc. of CHILECON, pages 1\u20136, 2017.\n\n[3] P. Huijse, P. A. Estevez, P. Protopapas, P. Zegers, and J. C. Principe. An information theoretic algorithm for finding periodicities in stellar light curves. IEEE Transactions on Signal Processing, 60(10):5135\u20135145, 2012.\n\n[4] C. M. Ting, S. H. Salleh, Z. M. Zainuddin, and A. Bahar. Spectral estimation of nonstationary EEG using particle filtering with application to event-related desynchronization (ERD). IEEE Transactions on Biomedical Engineering, 58(2):321\u2013331, 2011.\n\n[5] A. Ahrabian, D. Looney, F. A. Tobar, J. Hallatt, and D. P. Mandic. Noise assisted multivariate empirical mode decomposition applied to Doppler radar data. In Proc. of the Sensor Signal Processing for Defence (SSPD), pages 1\u20134, 2012.\n\n[6] S. Kay. Modern Spectral Estimation: Theory and Application. Prentice Hall, 1988.\n\n[7] V. F. Pisarenko. The retrieval of harmonics from a covariance function. Geophysical Journal of the Royal Astronomical Society, 33(3):347\u2013366, 1973.\n\n[8] N. R. Lomb. Least-squares frequency analysis of unequally spaced data.
Astrophysics and Space Science, 39(2):447\u2013462, 1976.\n\n[9] G. Walker. On periodicity in series of related terms. Proc. of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 131(818):518\u2013532, 1931.\n\n[10] A. Schuster. On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terrestrial Magnetism, 3(1):13\u201341, 1898.\n\n[11] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297\u2013301, 1965.\n\n[12] E. T. Jaynes. Bayesian spectrum and chirp analysis. In Maximum-Entropy and Bayesian Spectral Analysis and Estimation Problems, pages 1\u201337. Springer, 1987.\n\n[13] G. L. Bretthorst. Bayesian Spectrum Analysis and Parameter Estimation. Lecture Notes in Statistics. Springer, 1988.\n\n[14] P. C. Gregory. A Bayesian revolution in spectral analysis. AIP Conference Proceedings, 568(1):557\u2013568, 2001.\n\n[15] P. M. Djuric and H.-T. Li. Bayesian spectrum estimation of harmonic signals. IEEE Signal Processing Letters, 2(11):213\u2013215, 1995.\n\n[16] R. Turner and M. Sahani. Time-frequency analysis as probabilistic inference. IEEE Transactions on Signal Processing, 62(23):6171\u20136183, 2014.\n\n[17] Y. Qi, T. P. Minka, and R. W. Picard. Bayesian spectrum estimation of unevenly sampled nonstationary data. In Proc. of ICASSP, volume 2, pages 1473\u20131476, 2002.\n\n[18] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.\n\n[19] Y. Wang, R. Khardon, and P. Protopapas. Nonparametric Bayesian estimation of periodic light curves. The Astrophysical Journal, 756(1):67, 2012.\n\n[20] N. Durrande, J. Hensman, M. Rattray, and N. D. Lawrence. Detecting periodicities with Gaussian processes. PeerJ Computer Science, 2:e50, 2016.\n\n[21] N. Choudhuri, S. Ghosal, and A. Roy.
Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association, 99(468):1050\u20131059, 2004.\n\n[22] F. Tobar, T. Bui, and R. Turner. Learning stationary time series using Gaussian processes with nonparametric kernels. In Advances in Neural Information Processing Systems 28, pages 3501\u20133509. Curran Associates, Inc., 2015.\n\n[23] F. Tobar, T. Bui, and R. Turner. Design of covariance functions using inter-domain inducing variables. In NIPS 2015 - Time Series Workshop, 2015.\n\n[24] S. Bochner, M. Tenenbaum, and H. Pollard. Lectures on Fourier Integrals. Princeton University Press, 1959.\n\n[25] M. L\u00e1zaro-Gredilla, J. Qui\u00f1onero Candela, C. E. Rasmussen, and A. R. Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research, 11(Jun):1865\u20131881, 2010.\n\n[26] A. G. Wilson and R. P. Adams. Gaussian process kernels for pattern discovery and extrapolation. In Proc. of ICML, pages 1067\u20131075, 2013.\n\n[27] K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for cross-spectrum analysis. In Advances in Neural Information Processing Systems 28, pages 1999\u20132007. Curran Associates, Inc., 2015.\n\n[28] G. Parra and F. Tobar. Spectral mixture kernels for multi-output Gaussian processes. In Advances in Neural Information Processing Systems 30, pages 6681\u20136690. Curran Associates, Inc., 2017.\n\n[29] J. Hensman, N. Durrande, and A. Solin. Variational Fourier features for Gaussian processes. Journal of Machine Learning Research, 18(151):1\u201352, 2018.\n\n[30] F. Tobar and R. Turner. Modelling of complex signals using Gaussian processes. In Proc. of IEEE ICASSP, pages 2209\u20132213, 2015.\n\n[31] R. Boloix-Tortosa, J. J. Murillo-Fuentes, F. J. Pay\u00e1n-Somet, and F. P\u00e9rez-Cruz. Complex Gaussian processes for regression.
IEEE Transactions on Neural Networks and Learning Systems, 29(11):5499\u20135511, 2018.\n\n[32] H. L. Royden and P. Fitzpatrick. Real Analysis, volume 2. Macmillan, 1968.\n\n[33] D. P. Mandic and S. L. Goh. Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. John Wiley & Sons, 2009.\n\n[34] D. Duvenaud. Automatic Model Construction with Gaussian Processes. PhD thesis, University of Cambridge, 2014.\n\n[35] A. L. Goldberger and D. R. Rigney. Theory of Heart: Biomechanics, Biophysics, and Nonlinear Dynamics of Cardiac Function, chapter Nonlinear dynamics at the bedside, pages 583\u2013605. Springer-Verlag, 1991.\n\n[36] A. G. de G. Matthews, M. van der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. Le\u00f3n-Villagr\u00e1, Z. Ghahramani, and J. Hensman. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research, 18(40):1\u20136, 2017.\n\n[37] M. J. D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155\u2013162, 1964.