{"title": "Bayesian Modeling and Classification of Neural Signals", "book": "Advances in Neural Information Processing Systems", "page_first": 590, "page_last": 597, "abstract": null, "full_text": "Bayesian Modeling and Classification of Neural Signals \n\nMichael S. Lewicki \nComputation and Neural Systems Program \nCalifornia Institute of Technology 216-76 \nPasadena, CA 91125 \nlewicki@cns.caltech.edu \n\nAbstract \n\nSignal processing and classification algorithms often have limited applicability resulting from an inaccurate model of the signal's underlying structure. We present here an efficient, Bayesian algorithm for modeling a signal composed of the superposition of brief, Poisson-distributed functions. This methodology is applied to the specific problem of modeling and classifying extracellular neural waveforms, which are composed of a superposition of an unknown number of action potentials (APs). Previous approaches have had limited success, due largely to the problems of determining the spike shapes, deciding how many shapes are distinct, and decomposing overlapping APs. A Bayesian solution to each of these problems is obtained by inferring a probabilistic model of the waveform. This approach quantifies the uncertainty of the form and number of the inferred AP shapes and is used to obtain an efficient method for decomposing complex overlaps. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits. \n\n1 INTRODUCTION \n\nExtracellular electrodes typically record the activity of several neurons in the vicinity of the electrode tip (figure 1). 
Most electrophysiological data is collected by isolating action potentials (APs) from a single neuron using a level detector or window discriminator. Methods for extracting APs from multiple neurons can, in addition to the obvious advantage of providing more data, provide the means to investigate local neuronal interactions and the response properties of neuronal populations. Determining from the voltage waveform which cell fired when is a difficult, ill-posed problem, compounded by the fact that cells frequently fire simultaneously, resulting in large variations in the observed shapes. \n\nThere are three major difficulties in identifying and classifying action potentials (APs) in a neural waveform. The first is determining the AP shapes, the second is deciding the number of distinct shapes, and the third is decomposing overlapping spikes into their component parts. In general, these problems cannot be solved independently, since the solution of one affects the solutions of the others. \n\nFigure 1: Each neuron generates a stereotyped action potential (AP) which is observed through the electrode as a voltage fluctuation. This shape is primarily a function of the position of the neuron relative to the tip. The extracellular waveform shows several different APs generated by an unknown number of neurons. Note the frequent presence of overlapping APs, which can completely obscure individual spikes. \n\nThe approach summarized here is to model the waveform directly to obtain a probabilistic description of each action potential and, in turn, of the whole waveform. This method allows us to compute the class-conditional probabilities of each AP. In addition, it is possible to quantify the certainty of both the form and number of spike shapes. Finally, we can use this description to decompose overlapping APs efficiently and assign probabilities to alternative spike sequences. 
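To make this modeling picture concrete, the kind of waveform described above can be simulated as a superposition of brief, stereotyped spike shapes occurring at Poisson-distributed times, plus Gaussian noise. A minimal sketch (all shapes, firing rates, and the noise level are invented for illustration, not taken from the paper):

```python
import numpy as np

# Sketch of the implied generative model: an extracellular trace is a sum of
# time-shifted, stereotyped spike shapes plus Gaussian noise. Shapes, rates,
# and noise level below are illustrative only.
rng = np.random.default_rng(0)
fs = 20000                                   # samples per second
dur = 1.0                                    # seconds of waveform
t = np.arange(int(0.002 * fs)) / fs          # 2 ms spike support
shapes = [np.sin(2 * np.pi * t / t[-1]) * np.exp(-t / 0.0005),
          0.6 * np.sin(np.pi * t / t[-1])]   # two hypothetical AP shapes

waveform = 0.05 * rng.standard_normal(int(dur * fs))   # noise floor
for shape, rate in zip(shapes, [30.0, 50.0]):          # firing rates in Hz
    n_spikes = rng.poisson(rate * dur)                 # Poisson event counts
    starts = rng.integers(0, len(waveform) - len(shape), n_spikes)
    for t0 in starts:
        waveform[t0:t0 + len(shape)] += shape          # overlaps superpose

print(waveform.shape)   # (20000,)
```

The inference problem addressed in the following sections is the inverse of this sketch: recover the shapes, their number, and the event times from the trace alone.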
\n\n2 MODELING SINGLE ACTION POTENTIALS \n\nThe data from the event observed (at time zero) is modeled as resulting from a \nfixed underlying spike function, s(t), plus noise: \n\n(1) \n\n\f592 \n\nLewicki \n\nwhere v is the parameter vector that defines the spike function. The noise, 1], is \nmodeled as Gaussian with zero mean and standard deviation u1]' \nFrom the Bayesian perspective, the task is to infer the posterior distribution of the \nspike function parameters (assuming, for the moment, that u1] and Uw are known): \n\nP( ID \n\nv \n\n'O\"1]'O\"w, \n\nM) - P(Dlv, 0\"'1' M) P(vluw, M) \n. \n\nP(DIO\"1],O\"w,M) \n\n-\n\n(2) \n\nThe two terms specifying the posterior distribution of v are 1) the probability of \nthe data given the model: \n\n(3) \n\nand 2) the prior assumptions of the structure of s(t) which are assumed to be of \nthe form: \n\n(4) \n\nThe superscript (m) denotes differentiation which for these demonstrations we as(cid:173)\nsumed to be m = 1 corresponding to linear splines. The smoothness of s(t) is \ncontrolled through Uw with small values of Uw penalizing large fluctuations. \nThe final step in determining the posterior distribution is to eliminate the depen(cid:173)\ndence of P(vID, 0\"1]' O\"w, M) on 0\"1] and O\"w. Here, we use the approximation: \n\n(5) \n\nThe most probable values of 0\"1] and O\"w were obtained using the methods of MacKay \n(1992) in which reestimation formulas are obtained from a Gaussian approximation \nof the posterior distribution for 0\"1] and O\"w, P(O\"1] , O\"wID, M). Correct inference of O\"w \nprevents the spike function from overfitting the data. \n\n3 MODELING MULTIPLE ACTION POTENTIALS \n\nWhen a waveform contains multiple types of APs, determining the component spike \nshapes is more difficult because the classes are not known a priori. The uncertainty \nof which class an event belongs to can be incorporated with a mixture distribution. 
\nThe probability of a particular event, D n , given all spike models, M 1 :K , is \n\nP(Dnlvl:K' 1r, 0\"1]' M1 :K) = L 1I\"k P(Dnlvk, 0\"'1' Mk), \n\nK \n\nk=l \n\n(6) \n\nwhere 1I\"k is the a priori probability that a spike will be an instance of Mk, and \nE 1I\"k = l. \nAs before, the objective is to determine the posterior distribution for the parameters \ndefining a set of spike models, P(V 1 :K, 1rID 1:N , 0\"1]1 trw, M 1 :K) which is obtained again \nusing Bayes' rule. \n\n\fBayesian Modeling and Classification of Neural Signals \n\n593 \n\nFinding the conditions satisfied at a posterior maximum leads to the equation: \n\n(7) \n\nwhere 'Tn is the inferred occurrence time (typically to sub-sample period accuracy) of \nthe event Dn. This equation is solved iteratively to obtain the most probable values \nof V l :K \u2022 Note that the error for each event, D n , is weighted by P(Mk IOn, Vk, 1r, 0''7) \nwhich is the probability that the event is an instance of the kth spike model. This is \na soft clustering procedure, since the events are not explicitly assigned to particular \nclasses. Maximizing the posterior yields accurate estimates of the spike functions \neven when the clusters are highly overlapping. \n\nThe techniques described in the previous section are used to determine the most \nprobable values for 0''7 and rTw and, in turn, the most probable values of V l :K and 1r. \n\n4 DETERMINING THE NUMBER OF SPIKE SHAPES \n\nChoosing a set of spike models that best fit the data, would result eventually in a \nmodel for each event in the waveform. Heuristics might indicate whether two spike \nmodels are identical or distinct, but ad hoc criteria are notoriously dependent on \nparticular circumstances, and it is difficult to state precisely what information the \nrules take into account. \n\nTo determine the most probable number of spike models, we apply probability theory. 
\nLet Sj = {MHJ} denote a set of spike models and H denote information known \na priori. The probability of Sj, conditioned only on H and the data, is obtained \nusing Bayes' rule: \n\n(8) \n\nThe only data-dependent term is P(OI:NISj, H) which is the evidence for Sj \n(MacKay, 1992). With the assumption that all hypotheses SI :3 are equally probable \na priori, P(D l :NISj, H) ranks alternative spike sets in terms of their probability. \nThe evidence term P(OI :N[Sj, H) is convenient because it is the normalizing con(cid:173)\nstant for the posterior distribution of the parameters defining the spike set. Al(cid:173)\nthough calculation of P(O I :N I Sj ,H) is analytically intractable, it is often well(cid:173)\napproximated with a Gaussian integral which was the approximation used for these \ndemonstrations. \n\nA convenient way of collapsing the spike set is to compare spike models pairwise. \nTwo models in the spike set are selected along with a sampled set of events fit by \neach model. We then evaluate P(DISl) and P(D[S2)' S1 is the hypothesis that \nthe data is modeled by a single spike shape, S2 says there are two spike shapes. If \nP(D[S1) > P(D[S2), we replace both models in S2 by the one in S1. The procedure \nterminates when no more pairs can be combined to increase the evidence. \n\n\f594 \n\nLewicki \n\n5 DECOMPOSING OVERLAPPING SPIKES \n\nOverlaps must be decomposed into their component spikes for accurate inference \nof the spike functions and accurate classification of the events. Determining the \nbest-fitting decomposition is difficult becaus(~ of the enormous number of possible \nspike sequences, not only all possible model combinations for each event but also \nall possible event times. \n\nA brute-force approach to this problem is to perform an exhaustive search of the \nspace of overlapping spike functions and event times to find the sequence with \nmaximum probability. 
This approach was used by Atiya (1992) in the case of two overlapping spikes, with the times optimized to one sample period. Unfortunately, this is often computationally too demanding even for off-line analysis. \n\nWe make this search efficient by utilizing dynamic programming and k-dimensional trees (Friedman et al., 1977). Even when the best-fitting decomposition can be obtained, however, it may not be optimal, since adding more spike shapes can overfit the data. This problem is minimized by evaluating the probability of alternative decompositions to determine the most probable spike sequence (figure 2). \n\nFigure 2: Many spike function sequences can account for the same region of data. The thick lines show the data; thin lines show individual spike functions. In this case, the best-fitting overlap solution is not the most probable: the sequence with 4 spike functions is more than 8 times more probable than the other solutions, even though these have smaller mean squared error. Using the best-fitting overlap solution may therefore increase the classification error; classification error is minimized by using the overlap solution that is most probable. \n\n6 PERFORMANCE \n\nThe algorithm was tested on 40 seconds of neurophysiological data. The task is to determine the form and number of spike shapes in the waveform and to infer the occurrence times of each spike shape. The output of the algorithm is shown in figure 3. The uniformity of the residual error indicates that the six inferred spike shapes account for the entire 40 seconds of data. The spike functions M2 and M3 appear similar by eye, but the probabilities calculated with the methods of section 4 indicate that the two functions are significantly different. When plotted against each 
\n\nFigure 3: The solid lines are the inferred spike models M1-M6, plotted against time in ms. The data overlying each model is a sample of at most 40 events with overlapping spikes subtracted out. The residual errors are plotted below each model. 
This spike set was obtained after three iterations of the algorithm, decomposing overlaps and determining the most probable number of spike functions after each iteration. The whole inference procedure used 3 minutes of CPU time on a Sparc IPX. Once the spike set is inferred, classification of the same 40-second waveform takes about 10 seconds. \n\nother, the two populations of APs are distinctly separated in the region around the peak, with M3 being wider than M2. \n\nThe accuracy of the algorithm was tested by generating an artificial data set composed of the six inferred shapes shown in figure 3. The event times were Poisson distributed, with frequency equal to the inferred firing rate of the real data set. Gaussian noise was then added with standard deviation equal to σ_η. The classification results are summarized in the tables below. \n\nTable 1: Results of the spike model inference algorithm on the synthesized data set. \n\nModel:      1     2     3     4     5     6 \nΔmax/σ_η:  0.44  0.36  1.07  0.78  0.84  0.40 \n\nThe number of spike models was correctly determined by the algorithm: the six-model spike set was preferred over the most probable five-model spike set by exp(34):1 and over the most probable seven-model spike set by exp(19):1. The inferred shapes were accurate to within a maximum error of 1.07 σ_η. The row elements show the maximum absolute difference, normalized by σ_η, between each true spike function and the corresponding inferred function. \n\nTable 2: Classification results for the synthesized data set (non-overlapping events). 
\n\nTrue \nModels \n\n1 \n2 \n3 \n4 \n5 \n6 \n\n1 \n17 \n0 \n0 \n0 \n0 \n0 \n\nInferred Models \n6 \n2 \n5 \n0 \n0 \n0 \n0 \n0 \n25 \n0 \n0 \n0 \n0 \n0 \n0 \n56 \n0 \n0 \n0 393 \n0 \n\n4 \n0 \n0 \n0 \n116 \n0 \n0 \n\n3 \n0 \n1 \n15 \n0 \n0 \n0 \n\nTotal \nMissed \nEvents Events \n17 \n26 \n15 \n117 \n73 \n647 \n\n0 \n0 \n0 \n1 \n17 \n254 \n\nTable 3: Classification results for the synthesized data set (overlapping events). \n\nTrue \nModels \n\n1 \n2 \n3 \n4 \n5 \n6 \n\nInferred Models \n6 \n2 \n5 \n1 \n0 \n0 \n22 \n0 \n0 \n0 \n0 36 \n0 \n0 \n0 \n0 \n1 \n0 \n1 \n0 \n61 \n0 \n1 \n0 \n2 243 \n0 \n0 \n\n4 \n0 \n0 \n0 \n116 \n1 \n3 \n\n3 \n0 \n1 \n20 \n0 \n0 \n0 \n\nMissed \nTotal \nEvents Events \n22 \n37 \n20 \n121 \n82 \n408 \n\n0 \n0 \n0 \n3 \n19 \n160 \n\nTables 2 and 3: Each matrix component indicates the number of times true model i was \nclassified as inferred model j. Events were missed if the true spikes were not detected \nin an overlap sequence or if all sample values for the spike fell below the event detection \nthreshold (417'1). There was 1 false positive for Ms and 7 for M 6 \u2022 \n\n\fBayesian Modeling and Classification of Neural Signals \n\n597 \n\n7 DISCUSSION \n\nFormulating the task as having to infer a probabilistic model made clear what was \nnecessary to obtain accurate spike models. The soft clustering procedure accurately \ndetermines the spike shapes even when the true underlying shapes are similar. U n(cid:173)\nless the spike shapes are well-separated, commonly used hard clustering procedures \nwill lead to inaccurate estimates. \n\nProbability theory also allowed for an objective means of determining the number of \nspike models which is an essential reason for the success of this algorithm. With the \nwrong number of spike models overlap decomposition becomes especially difficult . \nThe evidence has proved to be a sensitive indicator of when two classes are distinct . 
\n\nProbability theory is also essential to accurate overlap decomposition. Simply fit(cid:173)\nting data with compositions of spike models leads to the same overfitting problem \nencountered in determining the number of spike models and in determining the \nspike shapes. Previous approaches have been able to handle only a limited class of \noverlaps, mainly due to the difficultly in making the fit efficient. The algorithm used \nhere can fit an overlap sequence of virtually arbitrary complexity in milliseconds. \n\nIn practice, the algorithm extracts many times more information from a neural \nwaveform than previous methods. Moreover, this information is qualitatively dif(cid:173)\nferent from a simple list of spike times. Having reliable estimates of the action \npotential shapes makes it possible to study the properties of these classes, since \ndistinct neuronal types can have distinct neuronal spikes. Finally, accurate over(cid:173)\nlap decomposition makes it possible to investigate interactions among local neurons \nwhich were previously very difficult to observe. \n\nAcknowledgements \n\nI thank David MacKay for helpful discussions and Jamie Mazer for many conver(cid:173)\nsations and extensive help with the development of the software. This work was \nsupported by Caltech fellowships and an NIH Research Training Grant. \n\nReferences \n\nA.F. Atiya. (1992) Recognition of multiunit neural signals. IEEE Transactions on \nBiomedical Engineering 39(7):723-729. \n\nJ .H. Friedman, J.L. Bently, and R.A. Finkel. (1977) An algorithm for finding best \nmatches in logarithmic expected time. ACM Trans. Math. Software 3(3):209-226. \n\nD. J. C. MacKay. (1992) Bayesian interpolation. Neural Computation 4(3):415-445. \n\n\f", "award": [], "sourceid": 777, "authors": [{"given_name": "Michael", "family_name": "Lewicki", "institution": null}]}