{"title": "Poisson Process Jumping between an Unknown Number of Rates: Application to Neural Spike Data", "book": "Advances in Neural Information Processing Systems", "page_first": 730, "page_last": 738, "abstract": "We introduce a model where the rate of an inhomogeneous Poisson process is modified by a Chinese restaurant process. Applying a MCMC sampler to this model allows us to do posterior Bayesian inference about the number of states in Poisson-like data. Our sampler is shown to get accurate results for synthetic data and we apply it to V1 neuron spike data to find discrete firing rate states depending on the orientation of a stimulus.", "full_text": "Poisson Process Jumping between an Unknown\n\nNumber of Rates: Application to Neural Spike Data\n\nFlorian Stimberg\n\nComputer Science, TU Berlin\n\nAndreas Ruttor\n\nComputer Science, TU Berlin\n\nFlorian.Stimberg@tu-berlin.de\n\nAndreas.Ruttor@tu-berlin.de\n\nManfred Opper\n\nComputer Science, TU Berlin\n\nManfred.Opper@tu-berlin.de\n\nAbstract\n\nWe introduce a model where the rate of an inhomogeneous Poisson process is\nmodi\ufb01ed by a Chinese restaurant process. Applying a MCMC sampler to this\nmodel allows us to do posterior Bayesian inference about the number of states in\nPoisson-like data. Our sampler is shown to get accurate results for synthetic data\nand we apply it to V1 neuron spike data to \ufb01nd discrete \ufb01ring rate states depending\non the orientation of a stimulus.\n\n1\n\nIntroduction\n\nEvent time data is often modeled as an inhomogeneous Poisson process, whose rate \u03bb(t) as a func-\ntion of time t has to be learned from the data. Poisson processes have been used to model a wide\nvariety of data, ranging from network traf\ufb01c [25] to photon emission data [12]. Although neuronal\nspikes are in general not perfectly modeled by a Poisson process [17], there has been extensive work\nbased on the simpli\ufb01ed Poisson assumption [e.g. 19, 20]. 
Prior assumptions about the rate process strongly influence the result of inference. Some models assume that the rate \u03bb(t) changes continuously [1, 7, 22], but for certain applications it is more useful to model it as a piecewise constant function of time, which switches between a finite number of distinct states. Such an assumption can be of interest when one tries to relate changes of the rate to sudden changes of external experimental conditions, e.g. changes of neural spike activity when external stimuli are switched.\n\nAn example of a discrete state rate process is the Markov modulated Poisson process (MMPP) [10, 18], where changes between the states of the rate follow a continuous time Markov jump process (MJP). For the MMPP one has to specify the number of states beforehand, and it is often not clear how this number should be chosen. Comparing models with different numbers of states by computing Bayes factors can be cumbersome and time consuming. On the other hand, nonparametric Bayesian methods for models with an unknown number of parameters based on Dirichlet or Chinese restaurant processes have become highly popular in recent years [e.g. 24, 26].\n\nHowever, to our knowledge, such an idea has not yet been applied to the conceptually simpler Poisson process scenario. In this paper, we present a computationally efficient MCMC approach to this model, which exploits the fact that, given the jump process, the observed Poisson events are independent. This property makes computing the data likelihood very fast in each iteration of our sampler and leads to a highly efficient estimation of the rate. 
This allows us to apply our sampler to large data sets.\n\nFigure 1: Generative model.\n\n2 Model\n\nWe assume that the data comes from an inhomogeneous Poisson process, which has rate \u03bb(t) at time t. In our model \u03bb(t) is a latent, piecewise constant process. The likelihood of the data given a path \u03bb(0:T) with s distinct states then becomes [8]\n\nP(Y|\u03bb(0:T)) \u221d \u220f_{i=1}^{s} \u03bb_i^{n_i} e^{\u2212\u03c4_i \u03bb_i},  (1)\n\nwhere \u03c4_i is the overall time spent in state i, defined by \u03bb(t) = \u03bb_i, and n_i is the number of Poisson events in the data Y while the system is in this state. A trajectory of \u03bb(0:T) is generated by drawing c jump times from a Poisson process with rate f. This means \u03bb(0:T) is separated into c + 1 segments, during each of which it remains in one state \u03bb_i. To deal with an unknown number of discrete states and their unknown probability \u03c0 of being visited, we assume that the distribution \u03c0 is drawn from a Dirichlet process with concentration parameter \u03b1 and base distribution p\u03bb. By integrating out \u03c0 we get a Chinese restaurant process (CRP) with the same parameters as the Dirichlet process. For a derivation of this result see [27].\n\nLet us assume we already have i segments and draw the next jump time from an exponential distribution with rate f. The next segment gets a new \u03bb-value sampled from p\u03bb with probability \u03b1/(\u03b1 + i); otherwise one of the previous segments is chosen with equal probability and its \u03bb-value is also used for the new segment. This leads to the following prior probability of a path \u03bb(0:T):\n\nP(\u03bb(0:T)|f, \u03b1, p\u03bb) \u221d f^c e^{\u2212fT} \u03b1^s [\u220f_{j=1}^{s} p\u03bb(\u03bb_j)(#_j \u2212 1)!] / [\u220f_{i=0}^{c} (\u03b1 + i)],  (2)\n\nwhere s is the number of distinct values of \u03bb. 
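A path from this prior can be simulated directly from the generative description above (an illustrative sketch; the gamma base distribution is parameterized by placeholder shape `a` and scale `b`):

```python
import random

def sample_crp_path(f, alpha, a, b, t_max, seed=0):
    """Sample jump times and segment rates from the CRP-modulated prior.

    f: rate of the Poisson process generating jump times,
    alpha: CRP concentration, a/b: shape/scale of the gamma base distribution.
    Returns (jumps, rates) with len(rates) == len(jumps) + 1.
    """
    rng = random.Random(seed)
    jumps, t = [], 0.0
    while True:
        t += rng.expovariate(f)
        if t > t_max:
            break
        jumps.append(t)
    rates = []
    for i in range(len(jumps) + 1):  # i = number of existing segments so far
        if i == 0 or rng.random() < alpha / (alpha + i):
            rates.append(rng.gammavariate(a, b))  # new state drawn from p_lambda
        else:
            rates.append(rng.choice(rates))  # reuse the rate of a uniformly chosen segment
    return jumps, rates
```

The number of distinct values in `rates` plays the role of s in the path prior.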
To summarize, we have f as the rate of jumps, p\u03bb as a prior distribution over the values of \u03bb, #j as the number of segments assigned to state j, and \u03b1 as a hyperparameter which determines how likely a jump will lead to a completely new value for \u03bb. If there are c jumps in the path \u03bb(0:T), then a priori the expected number of distinct \u03bb-values is [28]\n\nE[s|c] = \u2211_{i=1}^{c+1} \u03b1/(\u03b1 + i \u2212 1).  (3)\n\nWe choose a gamma distribution for p\u03bb with shape a and scale b,\n\np\u03bb(\u03bb) = Gamma(\u03bb; a, b) \u221d \u03bb^{a\u22121} e^{\u2212\u03bb/b},  (4)\n\nwhich is conjugate to the likelihood (1). The generative model is visualized in figure 1.\n\n3 MCMC Sampler\n\nWe use a Metropolis-within-Gibbs sampler with two main steps: First, we change the path of the Chinese restaurant process conditioned on the current parameters with a Metropolis-Hastings random walk. In the second step, the times of the jumps and the states are held fixed, and we directly sample the \u03bb-values and f from their conditional posteriors.\n\n3.1 Random Walk on the many-state Markov jump process\n\nTo generate a proposal path \u03bb\u2217(0:T) (for the remainder of this paper \u2217 will always denote a variable concerning the proposal path) we manipulate the current path \u03bb(0:T) by one of the following actions: shifting one of the jumps in time, adding a jump, removing one of the existing jumps, switching the state of a segment, joining two states, or dividing one state into two. This is similar to the birth-death approach, which has been used before for other types of MJPs [e.g. 5].\n\nWe shift a jump by drawing the new time from a Gaussian distribution centered at the current time with standard deviation \u03c3t and truncated at the neighboring jumps. \u03c3t is a parameter of the sampler, which we chose by hand and which should be on the same scale as the typical time between Poisson events. 
If in doubt, a high value should be chosen, so that the truncated distribution becomes more uniform.\n\nWhen adding a jump, the time of the new jump is drawn from a uniform distribution over the whole time interval. With probability qn a new value of \u03bb is added; otherwise we reuse an old one. The parameter qn was chosen by hand to be 0.1, which worked well for all data sets we tested the sampler on.\n\nTo remove a jump, we choose one of the jumps with equal probability.\n\nSwitching the state of a segment is done by choosing one of the segments at random and either assigning it to an existing value or introducing a value which was not used before, again with probability qn.\n\nWhen adding a new value of \u03bb, both when adding a jump and when switching the state of a segment, we draw it from the conditional density\n\nP(\u03bb\u2217_{s+1}|Y, \u03bb(0:T)) \u221d Gamma(\u03bb\u2217_{s+1}; a, b) Gamma(\u03bb\u2217_{s+1}; n_{s+1} + 1, 1/\u03c4_{s+1}) \u221d Gamma(\u03bb\u2217_{s+1}; a + n_{s+1}, b/(\u03c4_{s+1}b + 1)).  (5)\n\nIf we instead reuse an already existing \u03bb, we choose which state to use by drawing it from a discrete distribution with probabilities proportional to (5), but with n and \u03c4 being the number of Poisson events and the time in this segment, respectively.\n\nChanging the number of states through adding and removing jumps or switching the states of segments is sufficient to guarantee that the sampler converges to the posterior density. However, the sampler is very unlikely to reduce the number of states through these actions if all states are used in multiple segments, so convergence might take a very long time in this case. Therefore, we introduce the option to join all segments assigned to a neighboring (when ordered by their \u03bb-values) pair of states into one state. 
Here the geometric mean \u03bb\u2217_j = \u221a(\u03bb_{i1}\u03bb_{i2}) of both \u03bb-values is used for the joined state.\n\nBecause we added the join action, we need an inverse action, which divides a state into two new ones, in order to guarantee reversibility and therefore fulfill detailed balance. The state to divide is randomly chosen among the states which have at least two segments assigned to them. Then a small factor \u01eb > 1 is drawn from a shifted exponential distribution, and the \u03bb-value of the chosen state is multiplied and divided by \u01eb, respectively, to get the \u03bb-values \u03bb\u2217_{j1} = \u03bb_i\u01eb and \u03bb\u2217_{j2} = \u03bb_i/\u01eb of the two new states. The distribution over \u01eb is bounded, so that the new \u03bb-values are assured to lie between the neighboring ones. After this, the segments of the old state are randomly assigned to the two new states with probability proportional to the data likelihood (1). If all segments up to the last one were assigned to the same state, the last segment is set to the other state. This method assures that every possible assignment (where both states are used) of the two states to the segments of the old state can occur. Additionally, there is exactly one way for each assignment to be drawn, which allows a simple calculation of the Metropolis-Hastings acceptance probability for both the join and the divide action. Figure 2 shows how these actions work on the path.\n\nA proposed path \u03bb\u2217(0:T) is accepted with probability\n\npMH = min(1, [P(Y|\u03bb\u2217(0:T))/P(Y|\u03bb(0:T))] \u00b7 [Q(\u03bb(0:T)|\u03bb\u2217(0:T))/Q(\u03bb\u2217(0:T)|\u03bb(0:T))] \u00b7 [P(\u03bb\u2217(0:T)|f, \u03b1, p\u03bb)/P(\u03bb(0:T)|f, \u03b1, p\u03bb)]).  (6)\n\nFigure 2: Example showing how the proposal actions (Switch, Shift, Remove, Add, Join, Divide) modify the path of the Chinese restaurant process. 
The new path is drawn in dark blue, the old one in light blue.\n\nWhile the data likelihood ratio is the same for all proposal actions and follows from (1), the proposal and prior ratios\n\n\u03a8 = [Q(\u03bb(0:T)|\u03bb\u2217(0:T))/Q(\u03bb\u2217(0:T)|\u03bb(0:T))] \u00b7 [P(\u03bb\u2217(0:T)|f, \u03b1, p\u03bb)/P(\u03bb(0:T)|f, \u03b1, p\u03bb)]  (7)\n\ndepend on the chosen proposal action. The acceptance probability for each action (provided in the supplementary material) can be calculated based on its description and the probability of a path (2).\n\nBecause our proposal process is a simple random walk, the major contribution to the computation time comes from calculating the data likelihood. Luckily, this can be done very efficiently, because we only need to know how many Poisson events occur during the segments of \u03bb\u2217(0:T) and \u03bb(0:T), how often the process changes state, and how much time it spends in each state. In order to avoid iterating over all the data for each proposal, we compute the index of the next event in the data for a fine time grid before the sampler starts. This ensures that the computation time is linear in the number of jumps in \u03bb(0:T), while the number of Poisson events in the data only introduces one-time costs for calculating the grid, which are negligible in practice. Additionally, we only need to compute the likelihood ratio over those segments which are changed in the proposal, because the unchanged parts cancel out.\n\n3.2 Sampling the parameters\n\nAs we use a gamma prior Gamma(\u03bbi; a, b) for each \u03bbi, it is easy to see from (1) that this leads to gamma posteriors\n\nGamma(\u03bb_i; a + n_i, b/(\u03c4_i b + 1))  (8)\n\nover \u03bbi. Thus a Gibbs sampling step is used to update each \u03bbi. 
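This conjugate update can be written in a few lines (a minimal sketch; `a` and `b` are the prior shape and scale, `n_i` the event count and `tau_i` the time spent in state i):

```python
import random

def gibbs_update_rate(a, b, n_i, tau_i, rng=None):
    """Draw lambda_i from its conjugate posterior Gamma(a + n_i, b / (tau_i * b + 1)).

    a, b: shape and scale of the gamma prior; n_i: number of Poisson events
    observed while in state i; tau_i: total time spent in state i.
    """
    rng = rng or random.Random(0)
    shape = a + n_i
    scale = b / (tau_i * b + 1.0)
    return rng.gammavariate(shape, scale)
```

For many events the posterior mean (a + n_i) b / (tau_i b + 1) approaches the empirical rate n_i / tau_i, as expected.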
As for the rate f of change points, if we assume a gamma prior f \u223c Gamma(a_f, b_f), the posterior becomes a gamma distribution, too:\n\nGamma(f; a_f + c, b_f/(T b_f + 1)).  (9)\n\n4 Experiments\n\nWe first validate our sampler on synthetic data sets; then we test our Chinese restaurant approach on neural spiking data from a cat\u2019s primary visual cortex.\n\n4.1 Synthetic Data\n\nWe sampled 100 data sets from the prior with f = 0.02 and \u03b1 = 3.0. Figure 3 compares the true values for the number of states and number of jumps with the posterior mean after 1.1 million samples, with the first 100,000 dropped as burn-in. On average the sampler took around 25 seconds to generate the samples on an Intel Xeon CPU with 2.40 GHz.\n\nThe numbers of both jumps and states seem to be captured well, but for a large number of distinct states the mean seems to underestimate the true value. This is not surprising, because the \u03bb parameters are drawn from the same base distribution. For a large number of states the probability that two\n\nFigure 3: Posterior mean vs. true number of states (left) and jumps (right) for 100 data sets drawn from the prior. The red line shows the identity function.\n\nFigure 4: Posterior of \u03bb over t for the first 4 toy data sets. 
The black line is the true path, while the posterior mean is drawn as a dashed green line surrounded by a 95% confidence interval.\n\nFigure 5: Stimulus and data for a part of the recordings from the first neuron. (top) Mean rates computed by using a moving triangle function. (middle) Spiking times. (bottom) Orientation of the stimulus.\n\nFigure 6: (left) Posterior mean number of states vs. number of spikes in the data for all neurons. (right) Posterior mean number of states over the posterior mean number of jumps.\n\nstates are very similar becomes high, which makes them indistinguishable without observing more data. For four of the 100 data sets the posterior distribution over \u03bb(t) is compared to the true path in figure 4. While we used the true value of \u03b1 for our simulations, the model seems to be robust against different choices of this parameter. This is shown in the supplementary material.\n\n4.2 Bursting of Cat V1 Neurons\n\nPoisson processes are not an ideal model for single neuron spiking times [3]. The two main reasons for this are the refractory period of neurons and bursting [14]. Despite this, Poisson processes have been used extensively to analyze spiking data [e.g. 19, 20]. Moreover, neither reason should be a problem for us. The refractory period is not as important for inference, since spiking during it will not be observed. 
Bursting, on the other hand, is exactly what models with jumping Poisson rates are made to explain: sudden changes in the spiking rate.\n\nThe data set used in this paper was obtained from multi-site silicon electrodes in the primary visual cortex of an anesthetized cat. For further information on the experimental setup see [4]. The data set contains spike trains from 10 different neurons, which were recorded while bars of varying orientation moved through the visual field of the cat. Since the stimulus is discrete (the orientation\n\nFigure 7: Detail of the results for one of the neurons. The black lines at the bottom represent the spike data, while the colors indicate the state with the highest posterior probability, which is represented by the height of the area. The states are ordered by increasing rate \u03bb.\n\nFigure 8: Probability distribution of the orientation of the stimulus conditioned on the active state. The states are ordered by increasing rate \u03bb, and the results are taken from samples at the MAP number of states.\n\nranges from 0\u00b0 to 340\u00b0 in steps of 20\u00b0), we expect to find discrete states in the response of the neurons. The recording lasted for 720 seconds and, while the orientation of the stimulus changed randomly, each orientation was shown 8 times for 5 seconds each over the whole experiment. 
In figure 5, a section of the spiking times of one neuron is shown together with the orientation of the stimulus. When computing a mean spiking rate by sliding a triangle function over the data, it is crucial to select a good width for the triangle function. A small width makes it possible to find short phases of very high spiking rate (so-called bursts), but also leads to jumps in the rate even for single spikes. A larger width, on the other hand, smoothes the bursts out. Using our sampler for Bayesian inference based on our model allows us to find bursts and cluster them by their spiking rate, while at the same time the spikes between bursts are explained by one of the ground states, which have lower rates but longer durations.\n\nWe used an exponential prior for f with mean rate 10^\u22124 and a low value of \u03b1 = 0.1 to prevent overfitting. A second simulation run with a ten times higher prior mean for f and \u03b1 = 0.5 led to almost the same posterior number of states and only a slightly higher number of jumps, of which a larger fraction had no impact because the state was not changed. The base distribution p\u03bb was chosen to be exponential with mean 10^6, which is a fairly uninformative prior, because the duration of a single spike is on the order of 1 ms [11], resulting in an upper bound for the rate at around 1000/s.\n\nThe posterior number of states for all of the 10 neurons is in the same region, as shown in figure 6, even though the number of spikes differs widely (from 725 to 13244). 
Although there seem to be more states if more jumps are found, the posterior differs strongly from the prior (a priori the expected number of states is under 2), indicating that the posterior is dominated by the data likelihood.\n\nFor a small time frame of the spiking data from one of the neurons, figure 7 shows which state had the highest posterior probability at each time and how high this probability was. It can be seen that the bursting states, which have high rates, are only active for a short time. Figure 8 shows that these burst states are clearly orientation dependent (see the supplementary material for results of all 10 neurons). Over the whole experiment all orientations were shown for exactly the same amount of time. While the highest state is always clearly concentrated on a range of about 60\u00b0, the lower bursting states cover neighboring orientations. Often a smaller reaction can be seen for bars rotated by 180\u00b0 from the favored angle. The lowest state might indicate inhibition, because it is mostly active between the favored state and the one rotated by 180\u00b0.\n\nAs we can see in figure 9, some of the rates of the states are quite similar over all the neurons, although it has to be noted that the orientation is probably not the only feature of the stimulus the neurons are receptive to. In particular, the position of the bar in the visual field should be important and could explain why only some of the neurons reach the highest burst rate.\n\nIt may seem that finding bursts is a simple task, but there has been extensive work in this field [e.g. 6, 13, 16], and naive approaches, like looking at the mean rate of events over time, fail easily if the time resolution is not chosen well (as seen in figure 5). 
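The width sensitivity of such sliding-window estimates is easy to demonstrate (a hypothetical rectangular-window smoother used for illustration, not the triangle function from figure 5):

```python
def windowed_rate(event_times, t, width):
    """Naive rate estimate: events in [t - width/2, t + width/2] divided by width."""
    lo, hi = t - width / 2.0, t + width / 2.0
    return sum(lo <= e <= hi for e in event_times) / width

# A single stray spike at t = 5 in otherwise silent data:
spikes = [5.0]
narrow = windowed_rate(spikes, 5.0, 0.1)  # ~10 events/s: one spike looks like a burst
wide = windowed_rate(spikes, 5.0, 10.0)   # ~0.1 events/s: a real burst would be smoothed away
```

A narrow window turns every isolated spike into an apparent burst, while a wide window averages real bursts away, which is exactly the dilemma described above.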
Additionally, our sampler not only distinguishes between burst and non-burst phases, but also uncovers discrete intensities, which are associated with features of the stimulus.\n\n4.3 Comparison to a continuous rate model\n\nWhile our model assumes that the Poisson rates are discrete values, there have been other approaches applying continuous functions to estimate the rate. [1] use a Gaussian process prior over \u03bb(t) and present a Markov chain Monte Carlo sampler to sample from the posterior. Since the sampler is very slow for our neuron data, we restricted the inference task to a small time window of the spike train from only one of the neurons.\n\nIn figure 10 the results from the Sigmoidal Gaussian Cox Process (SGCP) model of [1] are shown for different values of the length scale hyperparameter and contrasted with the results from our model. Similar to the naive approach of computing a moving average of the rate (as in figure 5), the GP seems either to smooth out the bursts or to become so sensitive that even single spikes change the rate function significantly, depending on the choice of the GP hyperparameters.\n\nOur neural data seems to be especially bad for the performance of this algorithm, because it is based on the principle of uniformization. Uniformization was introduced by [9] and makes it possible to sample from an inhomogeneous Poisson process by first sampling from a homogeneous one. If the rate of the homogeneous process is an upper bound of the rate function of the inhomogeneous Poisson process, then a sample of the latter can be generated by thinning out the events, where each event is omitted with a certain probability. 
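The thinning step can be sketched as follows (a generic implementation of the principle, with an arbitrary rate function standing in for the sampled intensity; not the SGCP sampler itself):

```python
import random

def thin_poisson(homog_times, rate_fn, rate_max, seed=0):
    """Convert events of a homogeneous Poisson process with rate rate_max into a
    sample of an inhomogeneous process with intensity rate_fn(t) <= rate_max:
    each event at time t is kept with probability rate_fn(t) / rate_max."""
    rng = random.Random(seed)
    return [t for t in homog_times if rng.random() < rate_fn(t) / rate_max]

# Example: a rate that is high only during a short "burst" around t = 5.
# (The homogeneous events would normally come from a Poisson sampler; a regular
# grid is used here only to illustrate the thinning probabilities.)
burst_rate = lambda t: 40.0 if 4.0 <= t <= 6.0 else 0.5
homog = [0.025 * i for i in range(1, 4000)]
inhomog = thin_poisson(homog, burst_rate, rate_max=40.0)
```

Note that almost all events outside the burst are generated only to be thinned away, which is the inefficiency discussed next.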
The sampler for the SGCP model performs inference using this method, so that events are sampled at the current estimate of the maximum rate for the whole data set and thinned out afterwards.\n\nFor our neural data the maximum rate would have to be the spiking rate during the strongest bursts, but this would lead to a very large number of (later thinned out) event times being sampled in the long periods between bursts, which slows down the algorithm severely. This problem only occurs if uniformization is applied to \u03bb(t), while other approaches, like [21], use it on the rate of an MJP with a fixed number of states.\n\nWhen we use a practically flat prior for the sampling of the maximum rate, it will be very low compared to the bursting rates our algorithm finds (see figure 10). On the other hand, if we use a very peaked prior around our burst rates, the algorithm becomes extremely slow (taking hours for just 100 samples) even when used on less than a tenth of the data for one neuron.\n\n5 Conclusion\n\nWe have introduced an inhomogeneous Poisson process model with a flexible number of states. Our inference is based on an MCMC sampler which detects recurring states in the data set and joins them in the posterior. Thus the number of distinct event rates is estimated directly during MCMC sampling.\n\nClearly, sampling the number of states together with the jump times and rates needs considerably more samples to fully converge compared to an MJP with a fixed number of states. For our application to neural data in section 4.2 we generated 110 million samples for each neuron, which took between 80 and 325 minutes on an Intel Xeon CPU with 2.4 GHz. For all neurons the posterior had converged at the latest after a tenth of that time. It has to be remembered that, to obtain similar results without the Chinese restaurant process, we would need to compute the Bayes factors for different numbers of states. 
This is a more complicated task than just doing posterior inference for a fixed number of states and would require more computationally demanding approaches, e.g. a bridge sampler, in order to get reasonably good estimates. Additionally, it would be hard to decide for what range of state dimensionality the samplers should be run. In contrast to this, our sampler typically gave a good estimate of the number of states in the data set already after just a few seconds of sampling.\n\nFigure 9: Posterior mean rates \u03bbi for the MAP number of states.\n\nFigure 10: Results of the SGCP sampler on a small part of the data of one neuron. The black dashed line shows the posterior mean from our sampler. The spiking times are drawn as black vertical lines below.\n\nLonger run times are only needed for a higher-accuracy estimate of the posterior distribution over the number of states.\n\nAlthough our prior for the transition rates of the MJP is state-independent, which facilitates the integration over the maximum number of states and gives rise to the Chinese restaurant process, this does not hold for the posterior. 
We can indeed compute the full posterior state transition matrix, with state-dependent jump rates, from the samples.\n\nA huge advantage of our algorithm is that its computation time scales linearly in the number of jumps in the hidden process, while the influence of the number of events can be neglected in practice. This has been shown to speed up inference for MMPPs [23], but our more flexible model makes it possible to find simple underlying structures in huge data sets (e.g. network access data with millions of events) in reasonable time, without the need to fix the number of states beforehand.\n\nIn contrast to other MCMC algorithms [2, 8, 15] for MMPPs, our sampler is very flexible and can be easily adapted to, e.g., Gamma processes generating the data or semi-Markov jump processes, which have non-exponentially distributed waiting times for the change of the rate. For Gamma process data the computation time to calculate the likelihood would no longer be independent of the number of events, but it might lead to better results for data which is strongly non-Poissonian.\n\nWe showed that our model can be applied to neural spike trains and that our MCMC sampler finds discrete states in the data, which are linked to the discreteness of the stimulus. In general, our model should yield the best results when applied to data with many events and a discrete structure of unknown dimensionality influencing the rate.\n\nAcknowledgments\n\nNeural data were recorded by Tim Blanche in the laboratory of Nicholas Swindale, University of British Columbia, and downloaded from the NSF-funded CRCNS Data Sharing website.\n\nReferences\n\n[1] Ryan Prescott Adams, Iain Murray, and David J. C. MacKay. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML \u201909, pages 9\u201316, New York, NY, USA, 2009. 
ACM.\n\n[2] Elja Arjas and Dario Gasbarra. Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statistica Sinica, 4:505\u2013524, 1994.\n\n[3] R. Barbieri, M. C. Quirk, L. M. Frank, M. A. Wilson, and E. N. Brown. Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. J. Neurosci. Methods, 105(1):25\u201337, January 2001.\n\n[4] Timothy J. Blanche, Martin A. Spacek, Jamille F. Hetke, and Nicholas V. Swindale. Polytrodes: High-Density Silicon Electrode Arrays for Large-Scale Multiunit Recording. Journal of Neurophysiology, 93(5):2987\u20133000, 2005.\n\n[5] R. J. Boys, D. J. Wilkinson, and T. B. Kirkwood. Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing, 18(2):125\u2013135, June 2008.\n\n[6] M. Chiappalone, A. Novellino, I. Vajda, A. Vato, S. Martinoia, and J. van Pelt. Burst detection algorithms for the analysis of spatio-temporal patterns in cortical networks of neurons. Neurocomputing, 65\u201366:653\u2013662, 2005.\n\n[7] John P. Cunningham, Vikash Gilja, Stephen I. Ryu, and Krishna V. Shenoy. Methods for estimating neural firing rates, and their application to brain-machine interfaces. Neural Networks, 22(9):1235\u20131246, November 2009.\n\n[8] Paul Fearnhead and Chris Sherlock. An exact Gibbs sampler for the Markov-modulated Poisson process. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(5):767\u2013784, November 2006.\n\n[9] W. K. Grassmann. Transient solutions in Markovian queueing systems. Computers & Operations Research, 4(1):47\u201353, 1977.\n\n[10] H. Heffes and D. Lucantoni. A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance. IEEE Journal on Selected Areas in Communications, 4(6):856\u2013868, 1986.\n\n[11] Peter R. Huttenlocher. 
Development of cortical neuronal activity in the neonatal cat. Experimental Neurology, 17(3):247\u2013262, 1967.\n\n[12] Mark J\u00e4ger, Alexander Kiel, Dirk-Peter Herten, and Fred A. Hamprecht. Analysis of single-molecule fluorescence spectroscopic data with a Markov-modulated Poisson process. ChemPhysChem, 10(14):2486\u20132495, 2009.\n\n[13] Y. Kaneoke and J. L. Vitek. Burst and oscillation as disparate neuronal properties. Journal of Neuroscience Methods, 68(2):211\u2013223, 1996.\n\n[14] R. E. Kass, V. Ventura, and E. N. Brown. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology, 94(1):8\u201325, July 2005.\n\n[15] S. C. Kou, X. Sunney Xie, and Jun S. Liu. Bayesian analysis of single-molecule experimental data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3):469\u2013506, June 2005.\n\n[16] C. R. Leg\u00e9ndy and M. Salcman. Bursts and recurrences of bursts in the spike trains of spontaneously active striate cortex neurons. Journal of Neurophysiology, 53(4):926\u2013939, April 1985.\n\n[17] Gaby Maimon and John A. Assad. Beyond Poisson: Increased spike-time regularity across primate parietal cortex. Neuron, 62(3):426\u2013440, 2009.\n\n[18] K. S. Meier-Hellstern. A fitting algorithm for Markov-modulated Poisson processes having two arrival rates. European Journal of Operational Research, 29(3):370\u2013377, 1987.\n\n[19] Martin Nawrot, Ad Aertsen, and Stefan Rotter. Single-trial estimation of neuronal firing rates: From single-neuron spike trains to population activity. Journal of Neuroscience Methods, 94:81\u201392, 1999.\n\n[20] D. H. Perkel, G. L. Gerstein, and G. P. Moore. Neuronal spike trains and stochastic point processes. I. The single spike train. Biophysical Journal, 7(4):391\u2013418, July 1967.\n\n[21] V. A. Rao. Markov chain Monte Carlo for continuous-time discrete-state systems. PhD thesis, University College London, 2012.\n\n[22] V. A. Rao and Y. 
W. Teh. Gaussian process modulated renewal processes. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2474\u20132482, 2011.\n\n[23] V. A. Rao and Y. W. Teh. Fast MCMC sampling for Markov jump processes and extensions. Journal of Machine Learning Research, 14:3207\u20133232, 2013.\n\n[24] Ardavan Saeedi and Alexandre Bouchard-C\u00f4t\u00e9. Priors over recurrent continuous time processes. In Advances in Neural Information Processing Systems (NIPS), volume 24, 2011.\n\n[25] K. Sriram and W. Whitt. Characterizing superposition arrival processes in packet multiplexers for voice and data. IEEE Journal on Selected Areas in Communications, 4(6):833\u2013846, September 1986.\n\n[26] Florian Stimberg, Andreas Ruttor, and Manfred Opper. Bayesian inference for change points in dynamical systems with reusable states: a Chinese restaurant process approach. Journal of Machine Learning Research, Proceedings Track, 22:1117\u20131124, 2012.\n\n[27] Yee Whye Teh. Dirichlet processes. In Encyclopedia of Machine Learning. Springer, 2010.\n\n[28] Xinhua Zhang. A very gentle note on the construction of Dirichlet process. Technical report, Canberra, Australia, September 2008.\n", "award": [], "sourceid": 507, "authors": [{"given_name": "Florian", "family_name": "Stimberg", "institution": "TU Berlin"}, {"given_name": "Andreas", "family_name": "Ruttor", "institution": "TU Berlin"}, {"given_name": "Manfred", "family_name": "Opper", "institution": "TU Berlin"}]}