{"title": "Bayesian Modelling of fMRI lime Series", "book": "Advances in Neural Information Processing Systems", "page_first": 754, "page_last": 760, "abstract": null, "full_text": "Bayesian modelling of tMRI time series \n\nPedro A. d. F. R. H~jen-S~rensen, Lars K. Hansen and Carl Edward Rasmussen \n\nDepartment of Mathematical Modelling, Building 321 \n\nTechnical University of Denmark \n\nDK-2800 Lyngby, Denmark \n\nphs,lkhansen,carl@imrn.dtu.dk \n\nAbstract \n\nWe present a Hidden Markov Model (HMM) for inferring the hidden \npsychological state (or neural activity) during single trial tMRI activa(cid:173)\ntion experiments with blocked task paradigms. Inference is based on \nBayesian methodology, using a combination of analytical and a variety \nof Markov Chain Monte Carlo (MCMC) sampling techniques. The ad(cid:173)\nvantage of this method is that detection of short time learning effects be(cid:173)\ntween repeated trials is possible since inference is based only on single \ntrial experiments. \n\n1 Introduction \n\nFunctional magnetic resonance imaging (tMRI) is a non-invasive technique that enables \nindirect measures of neuronal activity in the working human brain. The most common \ntMRI technique is based on an image contrast induced by temporal shifts in the relative \nconcentration of oxyhemoglobin and deoxyhemoglobin (BOLD contrast). Since neuronal \nactivation leads to an increased blood flow, the so-called hemodynamic response, the mea(cid:173)\nsured tMRI signal reflects neuronal activity. Hence, when analyzing the BOLD signal \nthere are two unknown factors to consider; the task dependent neuronal activation and the \nhemodynamic response. Bandettini et al. [1993] analyzed the correlation between a bi(cid:173)\nnary reference function (representing the stimulus/task sequence) and the BOLD signal. \nIn the following we will also make reference to the binary representation of the task as \nthe paradigm. 
Lange and Zeger [1997] discuss a parameterized hemodynamic response adapted by a least squares procedure. Multivariate strategies have been pursued in [Worsley et al. 1997, Hansen et al. 1999]. Several explorative strategies have been proposed for finding spatio-temporal activation patterns without explicit reference to the activation paradigm. McKeown et al. [1998] used independent component analysis and found several types of activations, including components with \"transient task related\" response, i.e., responses that could not simply be accounted for by the paradigm. The model presented in this paper draws on the experimental observation that the basic coupling between the net neural activity and the hemodynamic response is roughly linear, while the relation between neuronal response and stimulus/task parameters is often nonlinear [Dale 1997]. We will represent the neuronal activity (integrated over the voxel and sampling time interval) by a binary signal, while we will represent the hemodynamic response as a linear filter of unknown form and temporal extent. \n\n2 A Bayesian model of fMRI time series \n\nLet $S = \{s_t : t = 0, \ldots, T-1\}$ be a hidden sequence of binary state variables $s_t \in \{0, 1\}$, representing the state of a single voxel over time; the time variable, $t$, indexes the sequence of fMRI scans. Hence, $s_t$ is a binary representation of the neural state. The hidden sequence is governed by a symmetric first order Hidden Markov Model (HMM) with transition probability $a = P(s_{t+1} = j \mid s_t = j)$. We expect the activation to mimic the blocked structure of the experimental paradigm, so for this reason we restrict $a$ to be larger than one half. The predicted signal (noiseless signal) is given by $y_t = (h * s)_t + \theta_0 + \theta_1 t$, where $*$ denotes linear convolution and $h$ is the impulse response of a linear system of order $M_f$. 
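As an illustration of the generative model just described, the following sketch (illustrative Python, not code from the paper; all parameter values and the filter shape are arbitrary assumptions) simulates a state sequence from the symmetric HMM and forms the noisy observable:

```python
import numpy as np

rng = np.random.default_rng(0)

T, M_f = 100, 10            # number of scans and filter order (illustrative values)
a = 0.95                    # self-transition probability, restricted to a > 1/2
theta0, theta1 = 2.0, -1.0  # dc off-set and linear trend (assumed values)
sigma_n = 0.5               # noise standard deviation (assumed value)

# Hidden binary state sequence from the symmetric first-order HMM
s = np.empty(T, dtype=int)
s[0] = rng.integers(2)
for t in range(1, T):
    s[t] = s[t - 1] if rng.random() < a else 1 - s[t - 1]

# Hemodynamic response: a linear filter of order M_f (shape chosen arbitrarily here)
h = np.exp(-np.arange(M_f) / 3.0)

# Noiseless signal y_t = (h * s)_t + theta0 + theta1 * t, then additive Gaussian noise
y = np.convolve(s, h)[:T] + theta0 + theta1 * np.arange(1, T + 1) / T
z = y + sigma_n * rng.standard_normal(T)
```

The trend column uses $(1, \ldots, T)'/T$, matching the definition of $H_s$ given below.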
The dc off-set and linear trend which are typically seen in fMRI time series are given by $\theta_0$ and $\theta_1$, respectively. Finally, it is assumed that the observable is given by $z_t = y_t + \epsilon_t$, where $\epsilon_t$ is iid. Gaussian noise with variance $\sigma_n^2$. The generative model considered is therefore given by: \n\n$p(s_t \mid s_{t-1}, a) = a\,\delta_{s_t, s_{t-1}} + (1-a)(1 - \delta_{s_t, s_{t-1}})$, \n$p(z \mid s, \sigma_n, \theta, M_f) = N(y, \sigma_n^2 I)$, where $y = \{y_t\} = H_s \theta$ and $z = \{z_t\}$.     (*) \n\nFurthermore, $\delta_{s_t, s_{t-1}}$ is the usual Kronecker delta and $H_s = [\mathbf{1}, \mathbf{t}, \gamma_0 s, \gamma_1 s, \ldots, \gamma_{M_f-1} s]$, where $\mathbf{1} = (1, \ldots, 1)'$, $\mathbf{t} = (1, \ldots, T)'/T$ and $\gamma_i$ is an $i$-step shift operator, that is $\gamma_i s = (0, \ldots, 0, s_0, s_1, \ldots, s_{T-1-i})'$. The linear parameters are collected in $\theta = (\theta_0, \theta_1, h')'$. \n\n[Figure: The graphical model. The hidden states $x_t = (s_{t-1}, s_{t-2}, \ldots, s_{t-(M_f-1)})$ have been introduced to make the model first order.] \n\n3 Analytic integration and Monte Carlo sampling \n\nIn this section we introduce priors over the model parameters and show how inference may be performed. The filter coefficients and noise parameters may be handled analytically, whereas the remaining parameters are treated using sampling procedures (a combination of Gibbs and Metropolis sampling). As in the previous section, explicit reference to the filter order $M_f$ may be omitted to ease the notation. \n\nThe dc off-set $\theta_0$ and the linear trend $\theta_1$ are given (improper) uniform priors. The filter coefficients are given priors that are uniform on an interval of length $\beta$, independently for each coefficient: \n\n$p(h_i) = 1/\beta$ for $|h_i| < \beta/2$, and $0$ otherwise. \n\nAssuming that all the values of $\theta$ for which the associated likelihood has non-vanishing contributions lie inside the box where the prior for $\theta$ has support, we may integrate out the filter coefficients via a Gaussian integral, $p(z \mid \sigma_n, s, M_f) = \int p(z \mid \sigma_n, \theta, s, M_f)\, p(\theta)\, d\theta$. We have here defined the mean filter, $\hat{\theta}_s = (H_s' H_s)^{-1} H_s' z$, and mean predicted signal, $\hat{y}_s = H_s \hat{\theta}_s$, for given state and filter length. We set the interval length $\beta$ to be 4 times the standard deviation of the observed signal $z$. This is done since the response from the filter should be able to model the signal, for which it is thought to need an interval of plus/minus two standard deviations. \n\nWe now proceed to integrate over the noise parameter; using the (improper) non-informative Jeffreys prior, $p(\sigma_n) \propto \sigma_n^{-1}$, we get a Gamma integral: \n\n$p(z \mid s, M_f) = \int p(z \mid \sigma_n, s, M_f)\, p(\sigma_n)\, d\sigma_n = \frac{1}{2}\, \Gamma\!\left(\frac{T - M_f}{2} - 1\right) \frac{\left(\pi (z'z - \hat{y}_s' \hat{y}_s)\right)^{\frac{M_f - T}{2} + 1}}{\beta^{M_f} \sqrt{|H_s' H_s|}}$ \n\nThe remaining variables cannot be handled analytically, and will be treated using various forms of sampling as described in the following sections. \n\n3.1 Gibbs and Metropolis updates of the state sequence \n\nWe use a flat prior on the states, $p(s_t = 0) = p(s_t = 1)$, together with the first order Markov property for the hidden states and Bayes' rule to get the conditional posterior for the individual states: \n\n$p(s_t = j \mid s \backslash s_t, a, M_f) \propto p(s_t = j \mid s_{t-1}, a)\, p(s_{t+1} \mid s_t = j, a)\, p(z \mid s, M_f).$ \n\nThese probabilities may (in normalized form) be used to implement Gibbs updates for the hidden state variables, updating one variable at a time and sweeping through all variables. However, it turns out that there are significant correlations between states, which makes it difficult for the Markov Chain to move around in the hidden state-space using only Gibbs sampling (where a single state is updated at a time). To improve the situation we also perform global state updates, consisting of proposing to move the entire state sequence one step forward or backward (the direction being chosen at random) and accepting the proposed state using the Metropolis acceptance procedure. 
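A minimal sketch of this global shift update (illustrative Python, not code from the paper; `log_post` is a placeholder for the log of the unnormalized state posterior, $\log p(s \mid a) + \log p(z \mid s, M_f)$, which the model evaluates analytically):

```python
import numpy as np

def shift_update(s, log_post, rng):
    """One global Metropolis move: propose shifting the whole state sequence
    one step forward or backward (periodic boundary conditions) and accept
    with the usual Metropolis probability.

    `log_post(s)` is a user-supplied stand-in for the log unnormalized
    posterior over the state sequence.
    """
    step = rng.choice([-1, 1])       # direction chosen at random
    s_prop = np.roll(s, step)        # periodic boundary conditions
    log_ratio = log_post(s_prop) - log_post(s)
    if np.log(rng.random()) < log_ratio:
        return s_prop                # accept the shifted sequence
    return s                         # reject: keep the current sequence
```

Because the shift proposal is symmetric, the acceptance ratio reduces to the ratio of unnormalized posteriors.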
The proposed movements are made using periodic boundary conditions. The Gibbs sweep is computationally involved, since it requires computation of several matrix expressions for every state variable. \n\n3.2 Adaptive Rejection Sampling for the transition probability \n\nThe likelihood for the transition probability $a$ is derived from the Hidden Markov Model: \n\n$p(s \mid a) = p(s_0) \prod_{t=1}^{T-1} p(s_t \mid s_{t-1}, a) = \frac{1}{2}\, a^{E(s)} (1-a)^{T-1-E(s)},$ \n\nwhere $E(s) = \sum_{t=1}^{T-1} \delta_{s_t, s_{t-1}}$ is the number of neighboring states in $s$ with identical values. The prior on the transition probability is uniform, but restricted to be larger than one half, since we expect the activation to mimic the blocked structure of the experimental paradigm. It is readily seen that $p(a \mid s) \propto p(s \mid a)$, $a \in [\frac{1}{2}, 1]$, is log-concave. Hence, we may use the Adaptive Rejection Sampling algorithm [Gilks and Wild, 1992] to sample from the distribution for the transition probability. \n\n3.3 Metropolis updates for the filter length \n\nIn practical applications using real fMRI data, we typically do not know the necessary length of the filter. The problem of finding the \"right\" model order is difficult and has received a lot of attention. Here, we let the Markov Chain sample over different filter lengths, effectively integrating out the filter length rather than trying to optimize it. Although the value of $M_f$ determines the dimensionality of the parameter space, we do not need to use specialized sampling methodology (such as Reversible Jump MCMC [Green, 1995]), since those parameters are handled analytically in our model. We put a flat (improper) prior on $M_f$ and propose new filter lengths using a Gaussian proposal centered on the current value, with a standard deviation of 3 (non-positive proposed orders are rejected). 
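The filter-length move can be sketched as follows (illustrative Python, not code from the paper; `log_marginal` is a placeholder for $\log p(z \mid s, M_f)$ as obtained from the Gamma integral above, here left as a user-supplied callable):

```python
import numpy as np

def filter_length_update(M_f, log_marginal, rng, proposal_sd=3.0):
    """One Metropolis update of the filter length M_f.

    A new length is proposed from a Gaussian centred on the current value
    (rounded to an integer); non-positive proposals are rejected outright.
    Rounding a symmetric Gaussian keeps the integer proposal symmetric,
    so the acceptance ratio is just the marginal-likelihood ratio.
    """
    M_prop = int(round(rng.normal(M_f, proposal_sd)))
    if M_prop <= 0:
        return M_f                   # reject non-positive orders
    log_ratio = log_marginal(M_prop) - log_marginal(M_f)
    if np.log(rng.random()) < log_ratio:
        return M_prop
    return M_f
```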
This choice of the standard deviation only affects the mixing rate of the Markov chain and does not have any influence on the stationary distribution. The proposed values are accepted using the Metropolis algorithm, using $p(M_f \mid s, z) \propto p(z \mid s, M_f)$. \n\n3.4 The posterior mean and uncertainty of the predicted signal \n\nSince $\theta$ has a flat prior, the conditional probability for the filter coefficients is proportional to the likelihood $p(z \mid \theta, \cdot)$, and by (*) we get: \n\n$p(\theta \mid z, s, \sigma_n, M_f) \sim N(D_s z, \sigma_n^2 D_s D_s'), \quad D_s = (H_s' H_s)^{-1} H_s'.$ \n\nThe posterior mean of the predicted signal, $y$, is then readily computed as: \n\n$\hat{y} = \langle y \rangle_{\theta, \sigma_n, s, M_f} = \langle \hat{y}_s \rangle_{s, M_f} = \langle H_s \hat{\theta}_s \rangle_{s, M_f} = \langle F_s \rangle_{s, M_f}\, z,$ \n\nwhere $F_s = H_s D_s$. Here, the average over $\theta$ and $\sigma_n$ is done analytically, and the average over the state and filter length is done using Monte Carlo. The uncertainty in the posterior can also be estimated partly by analytical averaging, and partly by Monte Carlo: \n\n$\Sigma_y = \langle (y - \hat{y})(y - \hat{y})' \rangle_{\theta, \sigma_n, s, M_f} = \left\langle \frac{z'z - \hat{y}_s' \hat{y}_s}{T - M_f - 2}\, F_s F_s' \right\rangle_{s, M_f} + \langle F_s z z' F_s' \rangle_{s, M_f} - \hat{y} \hat{y}'.$ \n\n4 Example: synthetic data \n\nIn order to test the model, we first present some results on a synthetic data set. A signal $z$ of length 100 is generated using an $M_f = 10$ order filter, and a hidden state sequence $s$ consisting of two activation bursts (indicated by dotted bars in figure 1, top left). In this example, the hidden sequence is actually not generated from the generative model (*); however, it still exhibits the kind of block structure that we wish to be able to recover. The model is run for 10000 iterations, which is sufficient to generate 500 approximately independent samples from the posterior; figure 2 (right) shows the auto-covariance for $M_f$ as a function of the iteration lag. 
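The analytic part of the averages in section 3.4, $\hat{y}_s = H_s \hat{\theta}_s$ for a fixed state sequence and filter length, amounts to an ordinary least-squares fit. A minimal sketch (illustrative Python, not code from the paper; the design-matrix construction follows the definition of $H_s$ in section 2):

```python
import numpy as np

def design_matrix(s, M_f):
    """H_s = [1, t, g_0 s, ..., g_{M_f-1} s], with g_i the i-step shift."""
    T = len(s)
    cols = [np.ones(T), np.arange(1, T + 1) / T]   # dc off-set and linear trend
    for i in range(M_f):
        shifted = np.zeros(T)
        shifted[i:] = s[: T - i]                   # g_i s = (0,...,0, s_0,...,s_{T-1-i})'
        cols.append(shifted)
    return np.column_stack(cols)

def mean_predicted_signal(z, s, M_f):
    """Mean filter theta_s = (H'H)^{-1} H'z and mean signal y_s = H theta_s."""
    H = design_matrix(s, M_f)
    theta, *_ = np.linalg.lstsq(H, z, rcond=None)
    return H @ theta
```

Averaging such fits over MCMC samples of $(s, M_f)$ gives the posterior mean $\hat{y}$.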
It is thought that changes in $M_f$ are indicative of the correlation time of the overall system. \n\nThe correlation plot for the hidden states (figure 2, left) shows that the state activation onset correlates strongly with the second onset and negatively with the end of the activation (and vice versa). This indicates that the Metropolis updates described in section 3.1 may indeed be effective. Notice also that the very strong correlation among state variables does not strongly carry over to the predicted signal (figure 1, bottom right). \n\nTo verify that the model can reasonably recover the parameters used to generate the data, posterior samples from some of the model variables are shown in figure 3. For all these parameters the posterior density is large around the correct values. Notice that there is, in the original model (*), an indeterminacy in the simultaneous inference of the state sequence and the filter parameters (but no indeterminacy in the predicted signal); for example, the same signal is predicted by shifting the state sequence backward in time and introducing leading zero filter coefficients. However, the Bayesian methodology breaks this symmetry by penalizing complex models. \n\nFigure 1: Experiments with synthetic data. 
Top left, the measured response from a voxel is plotted for 100 consecutive scans. In the bottom left, the underlying signal is seen in thin, together with the posterior mean, $\hat{y}$ (thick), and two std. dev. error-bars in dotted. Top right, the posterior probabilities are shown as a grey level, for each scan. The true activated instances are indicated by the dotted bars and the pseudo MAP estimate of the activation sequence is given by the crossed bars. Bottom right shows the posterior uncertainty $\Sigma_y$. \n\nThe posterior mean and the two standard deviations are plotted in figure 1, bottom left. Notice, however, that the distribution of $y$ is not Gaussian, but rather a mixture of Gaussians, and is not necessarily well characterized by mean and variance alone. In figure 1 (top left), the distribution of $y_t$ is visualized using grey-scale to represent density. \n\n5 Simulations on real fMRI data and discussion \n\nIn figure 4 the model has been applied to two measurements in the same voxel in visual cortex. The fMRI scans were acquired every 330 ms. The experimental paradigm consisted of 30 scans of rest followed by 30 scans of activation and 40 scans of rest. Visual activation consisted of a flashing (8 Hz) annular checkerboard pattern. The model readily identifies the activation burst of somewhat longer duration than the visual stimulus and delayed around 2 seconds. The delay is in part caused by the delay in the hemodynamic response. \n\nThese results show that the integration procedure works in spite of the very limited data at hand. In figure 4 (top) the posterior model size suggests that (at least) two competing models can explain the signal from this trial. One of these models explains the measured signal as a simple square wave function, which seems reasonable by considering the signal. Conversely, figure 4 (bottom) suggests that the signal from the second trial can not be explained by a simple model. 
This too seems reasonable, because of the long signal rise interval suggested in the signal. \n\nFigure 2: The covariance of the hidden states based on a long run of the model is shown to the left. Notice that the states around the front (back) of the activity \"bumps\" are highly (anti-) correlated. Right: The auto-covariance for the filter length $M_f$ as a function of the lag time in iterations. The correlation length is about 20, computed as the sum of auto-covariance coefficients from lag -400 to 400. \n\nSince the posterior distribution of the filter length is very broad, it is questionable whether an optimization based procedure such as maximum likelihood estimation would be able to make useful inference in this case, where data is very limited. Also, it is not obvious how one may use cross-validation in this setting. One might expect such optimization based strategies to get trapped in suboptimal solutions. This, of course, remains to be investigated. \n\n6 Conclusion \n\nWe have presented a model for voxel based explorative data analysis of single trial fMRI signals during blocked task activation studies. The model is founded on the experimental observation that the basic coupling between the net neural activity and the hemodynamic response is roughly linear. The preliminary investigations reported here are encouraging in that the model reliably detects reasonable hidden states from the very noisy fMRI data. \n\nOne drawback of this method is that the Gibbs sampling step is computationally expensive. To improve on this step one could make use of the large class of variational/mean field methods known from the graphical models literature. Finally, current work is in progress for generalizing the model to multiple voxels, including spatial correlation due to e.g. spill-over effects. \n\nFigure 3: Posterior distributions of various model parameters. The parameters used to generate the data are: $a = 1.0$, DC off-set = 2, trend = -1 and filter order $M_f = 10$. \n\nFigure 4: Analysis of two experimental trials of the same voxel in visual cortex. The left hand plot shows the posterior inferred signal distribution superimposed by the measured signal. The dotted bar indicates the experimental paradigm and the crossed bar indicates the pseudo MAP estimate of the neural activity. To the right, the posterior noise level and inferred filter length are displayed. \n\nAcknowledgments \nThanks to Egill Rostrup for providing the fMRI data. This work is funded by the Danish Research Councils through the Computational Neural Network Center (CONNECT) and the THOR Center for Neuroinformatics. \n\nReferences \nBandettini, P. A. (1993). Processing strategies for time-course data sets in functional MRI of the human brain. Magnetic Resonance in Medicine 30, 161-173. \n\nDale, A. M. and R. L. Buckner (1997). Selective Averaging of Individual Trials Using fMRI. NeuroImage 5, Abstract S47. \n\nGreen, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. 
Biometrika 82, 711-732. \n\nGilks, W. R. and P. Wild (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41, 337-348. \n\nHansen, L. K. et al. (1999). Generalizable Patterns in Neuroimaging: How Many Principal Components? NeuroImage, to appear. \n\nLange, N. and S. L. Zeger (1997). Non-linear Fourier time series analysis for human brain mapping by functional magnetic resonance imaging. Journal of the Royal Statistical Society - Series C Applied Statistics 46, 1-30. \n\nMcKeown, M. J. et al. (1998). Spatially independent activity patterns in functional magnetic resonance imaging data during the Stroop color-naming task. Proc. Natl. Acad. Sci. USA 95, 803-810. \n\nWorsley, K. J. et al. (1997). Characterizing the Response of PET and fMRI Data Using Multivariate Linear Models (MLM). NeuroImage 6, 305-319. \n", "award": [], "sourceid": 1637, "authors": [{"given_name": "Pedro", "family_name": "H\u00f8jen-S\u00f8rensen", "institution": null}, {"given_name": "Lars", "family_name": "Hansen", "institution": null}, {"given_name": "Carl", "family_name": "Rasmussen", "institution": null}]}