{"title": "Spike Sorting: Bayesian Clustering of Non-Stationary Data", "book": "Advances in Neural Information Processing Systems", "page_first": 105, "page_last": 112, "abstract": null, "full_text": "            Spike Sorting: Bayesian Clustering of\n                            Non-Stationary Data\n\n\n\n             Aharon Bar-Hillel                                 Adam Spiro\n         Neural Computation Center              School of Computer Science and Engineering\n     The Hebrew University of Jerusalem             The Hebrew University of Jerusalem\n      aharonbh@cs.huji.ac.il                              adams@cs.huji.ac.il\n\n\n                                              Eran Stark\n                                   Department of Physiology\n                             The Hebrew University of Jerusalem\n                              eranstark@md.huji.ac.il\n\n\n\n\n                                              Abstract\n\n           Spike sorting involves clustering spike trains recorded by a micro-\n           electrode according to the source neuron. It is a complicated problem,\n           which requires a lot of human labor, partly due to the non-stationary na-\n           ture of the data. We propose an automated technique for the clustering\n           of non-stationary Gaussian sources in a Bayesian framework. At a first\n           search stage, data is divided into short time frames and candidate descrip-\n           tions of the data as a mixture of Gaussians are computed for each frame.\n           At a second stage transition probabilities between candidate mixtures are\n           computed, and a globally optimal clustering is found as the MAP so-\n           lution of the resulting probabilistic model. Transition probabilities are\n           computed using local stationarity assumptions and are based on a Gaus-\n           sian version of the Jensen-Shannon divergence. The method was applied\n           to several recordings. 
The performance appeared almost indistinguishable from humans in a wide range of scenarios, including movement, merges, and splits of clusters.

1 Introduction

Neural spike activity is recorded with a micro-electrode which normally picks up the activity of multiple neurons. Spike sorting seeks a segmentation of the spike data such that each cluster contains all the spikes generated by a different neuron. Currently, this task is mostly done manually. It is a tedious task, requiring many hours of human labor for each recording session. Several algorithms were proposed in order to help automate this process (see [7] for a review, [9],[10]) and some tools were implemented to assist in manual sorting [8]. However, the ability of the suggested algorithms to replace the human worker has been quite limited.

One of the main obstacles to a successful application is the non-stationary nature of the data [7]. The primary source of this non-stationarity is slight movement of the recording electrode. Slight drifts of the electrode's location, which are almost inevitable, cause changes in the typical shapes of recorded spikes over time. Other sources of non-stationarity include variable background noise and changes in the characteristic spike generated by a certain neuron. The increasing usage of multiple-electrode systems turns non-stationarity into an acute problem, as electrodes are placed in a single location for long durations.

Using the first 2 PCA coefficients to represent the data (which preserves up to 93% of the variance in the original recordings [1]), a human can cluster spikes by visual inspection. When dividing the data into small enough time frames, cluster density can be approximated by a multivariate Gaussian with a general covariance matrix without losing much accuracy [7].
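This per-frame representation can be sketched in a few lines of numpy. The waveform matrix and frame split below are illustrative stand-ins, not the authors' data or code: waveforms are projected onto their first two principal components, and one short frame is then summarized by an ML Gaussian with a full covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for recorded spike waveforms: 500 spikes x 64 samples.
waveforms = rng.normal(size=(500, 64))

# Project each waveform onto the first two principal components.
centered = waveforms - waveforms.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coeffs = centered @ vt[:2].T          # (500, 2): one point per spike

# Within one short time frame, fit a single Gaussian with a general
# (full) covariance matrix by maximum likelihood.
frame = coeffs[:100]
mu = frame.mean(axis=0)
sigma = np.cov(frame, rowvar=False, bias=True)   # ML estimate (divide by N)

print(mu.shape, sigma.shape)   # → (2,) (2, 2)
```

In the paper's actual pipeline the two projection directions are fixed library PCs (Section 4.1); here they are recomputed from the sample only to keep the sketch self-contained.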
Problematic scenarios which can appear due to non-stationarity are exemplified in Section 4.2 and include: (1) movements and considerable shape changes of the clusters over time; (2) two clusters which are reasonably well-separated may move until they converge and become indistinguishable. A split of a cluster is possible in the same manner.

Most spike sorting algorithms do not address the presented difficulties at all, as they assume full stationarity of the data. Some methods [4, 11] try to cope with the lack of stationarity by grouping data into many small clusters and identifying the clusters that can be combined to represent the activity of a single unit. In the second stage, [4] uses ISI information to determine which clusters cannot be combined, while [11] bases this decision on the density of points between clusters. In [3] a semi-automated method is suggested, in which each time frame is clustered manually, and then the correspondence between clusters in consecutive time frames is established automatically. The correspondence is determined by a heuristic score, and the algorithm does not handle merge or split scenarios.

In this paper we suggest a new, fully automated technique to solve the clustering problem for non-stationary Gaussian sources in a Bayesian framework. We divide the data into short time frames in which stationarity is a reasonable assumption. We then look for good mixture-of-Gaussians descriptions of the data in each time frame independently. Transition probabilities between local mixture solutions are introduced, and a globally optimal clustering solution is computed by finding the Maximum-A-Posteriori (MAP) solution of the resulting probabilistic model. The global optimization allows the algorithm to successfully disambiguate problematic time frames and exhibit close-to-human performance. We present the outline of the algorithm in Section 2.
The transition probabilities are computed by optimizing the Jensen-Shannon divergence for Gaussians, as described in Section 3. Empirical results and validation are presented in Section 4.

2 Clustering using a chain of Gaussian mixtures

Denote the observable spike data by $D = \{d\}$, where each spike $d \in \mathbb{R}^n$ is described by the vector of its PCA coefficients. We break the data into $T$ disjoint groups $\{D^t = \{d^t_i\}_{i=1}^{N^t}\}_{t=1}^{T}$. We assume that in each frame, the data can be well approximated by a mixture of Gaussians, where each Gaussian corresponds to a single neuron. Each Gaussian in the mixture may have a different covariance matrix. The number of components in the mixture is not known a priori, but is assumed to be within a certain range (we used 1-6).

In the search stage, we use a standard EM (Expectation-Maximization) algorithm to find a set of $M^t$ candidate mixture descriptions for each time frame $t$. We build the set of candidates using a three-step process. First, we run the EM algorithm with different numbers of clusters and different initial conditions. In a second step, we import to each time frame $t$ the best mixture solutions found in the neighboring time frames $[t-k, .., t+k]$ (we used $k = 2$). These solutions are also adapted by using them as the initial conditions for the EM and running a low number of EM rounds. This mixing of solutions between time frames is repeated several times. Finally, the solution list in each time frame is pruned to remove similar solutions. Solutions which do not comply with the assumption of well-shaped Gaussians are also removed.

In order to handle outliers, which are usually background spikes or non-spike events, each mixture candidate contains an additional 'background model' Gaussian. This model's parameters are set to $(0, K\Sigma^t)$, where $\Sigma^t$ is the covariance matrix of the data in frame $t$ and $K > 1$ is a constant.
Only the weight of this model is allowed to change during the EM process.

After the search stage, each time frame $t$ has a list of $M^t$ models $\{\Theta^t_i\}_{t=1,i=1}^{T,M^t}$. Each mixture model is described by a triplet $\Theta^t_i = \{\alpha^t_{i,l}, \mu^t_{i,l}, \Sigma^t_{i,l}\}_{l=1}^{K_{i,t}}$, denoting the Gaussian mixture's weights, means, and covariances respectively. Given these candidate models we define a discrete random vector $Z = \{z^t\}_{t=1}^{T}$ in which each component $z^t$ has a value range of $\{1, 2, .., M^t\}$. "$z^t = j$" has the semantics of "at time frame $t$ the data is distributed according to the candidate mixture $\Theta^t_j$". In addition we define for each spike $d^t_i$ a hidden discrete 'label' random variable $l^t_i$. This label indicates which Gaussian in the local mixture hypothesis is the source of the spike.
Denote by $L^t = \{l^t_i\}_{i=1}^{N^t}$ the vector of labels of time frame $t$, and by $L$ the vector of all the labels.

[Figure 1: two Bayesian network diagrams, panels (A) and (B); see caption.]
Figure 1: (A) A Bayesian network model of the data generation process. The network has an HMM structure, but unlike an HMM it does not have fixed states and transition probabilities over time. The variables and the CPDs are explained in Section 2. (B) A Bayesian network representation of the relations between the data D and the hidden labels H (see Section 3.1). The visible labels L and the sampled data points are independent given the hidden labels.

We describe the probabilistic relations between $D$, $L$, and $Z$ using a Bayesian network with the structure described in Figure 1A. Using the network structure and assuming i.i.d. samples, the joint log probability decomposes into

$$\log P(z^1) + \sum_{t=2}^{T} \log P(z^t|z^{t-1}) + \sum_{t=1}^{T} \sum_{i=1}^{N^t} \left[ \log P(l^t_i|z^t) + \log P(d^t_i|l^t_i, z^t) \right] \quad (1)$$

We wish to maximize this log-likelihood over all possible choices of $L, Z$. Notice that by maximizing the probability of both data and labels we avoid the tendency to prefer mixtures with many Gaussians, which appears when maximizing the probability of the data alone.
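Because the model is a chain, the MAP assignment to $Z$ can be found by Viterbi-style dynamic programming over the candidate mixtures. A minimal sketch with invented numbers: the per-frame scores stand in for the maximized bracketed data-and-label terms in (1), and the transition matrices stand in for $\log P(z^t|z^{t-1})$.

```python
import numpy as np

# Illustrative numbers: 3 time frames, each with 2 candidate mixtures.
# local[t][i] stands for the bracketed data+label term of frame t under
# candidate i; trans[t][i, j] stands for log P(z^{t+1}=j | z^t=i).
local = [np.array([-10.0, -12.0]),
         np.array([-11.0, -9.0]),
         np.array([-10.5, -10.0])]
trans = [np.array([[-0.1, -2.3], [-2.3, -0.1]]),
         np.array([[-0.1, -2.3], [-2.3, -0.1]])]

# Viterbi-style dynamic programming for the MAP candidate sequence.
T = len(local)
score = local[0].copy()                   # uniform prior on z^1 (constant dropped)
back = []
for t in range(1, T):
    cand = score[:, None] + trans[t - 1]  # best predecessor for each candidate
    back.append(np.argmax(cand, axis=0))
    score = np.max(cand, axis=0) + local[t]

# Backtrack the optimal assignment to Z.
z = [int(np.argmax(score))]
for bp in reversed(back):
    z.append(int(bp[z[-1]]))
z.reverse()
print(z)   # → [1, 1, 1]
```

With these numbers the strong self-transitions pull frame 1 (where candidate 0 is slightly better locally) onto the globally consistent sequence, which is exactly the disambiguation effect the global MAP is meant to provide.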
The conditional probability distributions (CPDs) of the points' labels and the points themselves, given an assignment to $Z$, are given by

$$\log P(l^t_k = j | z^t = i) = \log \alpha^t_{i,j} \quad (2)$$
$$\log P(d^t_k | l^t_k = j, z^t = i) = -\frac{1}{2}\left[ n \log 2\pi + \log|\Sigma^t_{i,j}| + (d^t_k - \mu^t_{i,j})^\top (\Sigma^t_{i,j})^{-1} (d^t_k - \mu^t_{i,j}) \right]$$

The transition CPDs $P(z^t|z^{t-1})$ are described in Section 3. For the first frame's prior we use a uniform CPD. The MAP solution for the model is found using the Viterbi algorithm. Labels are then unified using the correspondences established between the chosen mixtures in consecutive time frames. As a final adjustment step, we repeat the mixing process using only the mixtures of the found MAP solution. Using this set of new candidates, we calculate the final MAP solution in the same manner described above.

3 A statistical distance between mixtures

The transition CPDs of the form $P(z^t|z^{t-1})$ are based on the assumption that the Gaussian sources' distributions are approximately stationary in pairs of consecutive time frames. Under this assumption, two mixture candidates estimated at consecutive time frames are viewed as two samples from a single unknown Gaussian mixture.
We assume that each Gaussian component from either of the two mixtures arises from a single Gaussian component in the joint hidden mixture, and so the hidden mixture induces a partition of the set of visible components into clusters. Gaussian components in the same cluster are assumed to arise from the same hidden source. Our estimate of $p(z^t = j | z^{t-1} = i)$ is based on the probability of seeing two large samples with different empirical distributions ($\Theta^{t-1}_i$ and $\Theta^t_j$ respectively) under the assumption of such a single joint mixture. In Section 3.1, the estimation of the transition probability is formalized as an optimization of a Jensen-Shannon-based score over the possible partitions of the set of Gaussian components.

If the family of allowed hidden mixture models is not further constrained, the optimization problem derived in Section 3.1 is trivially solved by choosing the most detailed partition (each visible Gaussian component is a singleton). This happens because a richer partition, which does not merge many Gaussians, gets a higher score. In Section 3.2 we suggest natural constraints on the family of allowed partitions in the two cases of a constant and a variable number of clusters through time, and present algorithms for both cases.

3.1 A Jensen-Shannon based transition score

Assume that in two consecutive time frames we observed two labeled samples $(X^1, L^1), (X^2, L^2)$ of sizes $N^1, N^2$ with empirical distributions $\Theta^1, \Theta^2$ respectively. By 'empirical distribution', or 'type' in the notation of [2], we denote the ML parameters of the sample, for both the multinomial distribution of the mixture weights and the Gaussian distributions of the components.
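Computing such a 'type' from a labeled sample is straightforward; a minimal numpy sketch with invented data (not the authors' code): the ML estimate consists of the multinomial label frequencies together with the per-label sample mean and (biased) sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative labeled sample: 200 points in 2-D with labels in {0, 1}.
x = np.concatenate([rng.normal(0, 1, size=(120, 2)),
                    rng.normal(4, 1, size=(80, 2))])
labels = np.array([0] * 120 + [1] * 80)

# The 'type' of the sample: ML mixture weights plus per-component
# Gaussian ML parameters (mean and biased covariance).
theta = {}
for k in np.unique(labels):
    pts = x[labels == k]
    theta[k] = (len(pts) / len(x),                     # multinomial weight
                pts.mean(axis=0),                      # component mean
                np.cov(pts, rowvar=False, bias=True))  # ML covariance

print(theta[0][0], theta[1][0])   # → 0.6 0.4
```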
As stated above, we assume that the joint sample of size $N = N^1 + N^2$ is generated by a hidden Gaussian mixture $\Theta^H$ with $K^H$ components, and its components are determined by a partition of the set of all components in $\Theta^1, \Theta^2$. For convenience of notation, let us order this set of $K^1 + K^2$ Gaussians and refer to them (and to their parameters) using one index. We can define a function $R : \{1, .., K^1 + K^2\} \rightarrow \{1, .., K^H\}$ which matches each visible Gaussian component in $\Theta^1$ or $\Theta^2$ to its hidden source component in $\Theta^H$. Denote the labels of the sample points under the hidden mixture by $H^j = \{h^j_i\}_{i=1}^{N^j}$, $j = 1, 2$. The values of these variables are given by $h^j_i = R(l^j_i)$, where $l^j_i$ is the label index in the set of all components.

The probabilistic dependence between a data point, its visible label, and its hidden label is explained by the Bayesian network model in Figure 1B. We assume a data point is obtained by choosing a hidden label and then sampling the point from the relevant hidden component. The visible label is then sampled based on the hidden label using a multinomial distribution with parameters $\beta = \{\beta_q\}_{q=1}^{K^1+K^2}$, where $\beta_q = P(l = q | h = R(q))$, i.e., the probability of the visible label $q$ given the hidden label $R(q)$ (since $H$ is deterministic given $L$, $P(l = q | h) = 0$ for $h \neq R(q)$). Denote this model, which is fully determined by $R$, $\beta$, and $\Theta^H$, by $M^H$.

We wish to estimate $P((X^1, L^1) \sim \Theta^1 | (X^2, L^2) \sim \Theta^2, M^H)$. We use ML approximations and arguments based on the method of types [2] to approximate this probability and optimize it with respect to $\Theta^H$ and $\beta$.
The obtained result is (the derivation is omitted)

$$P((X^1, L^1) \sim \Theta^1 | (X^2, L^2) \sim \Theta^2, M^H) \cong \quad (3)$$
$$\max_R \exp\Big(-N \sum_{m=1}^{K^H} \alpha^H_m \sum_{\{q:R(q)=m\}} \beta_q D_{kl}\big(G(x|\mu_q, \Sigma_q) \,\|\, G(x|\mu^H_m, \Sigma^H_m)\big)\Big)$$

where $G(x|\mu, \Sigma)$ denotes a Gaussian distribution with the parameters $\mu, \Sigma$, and the optimized $\Theta^H, \beta$ appearing here are given as follows. Denote by $w_q$ ($q \in \{1, .., K^1 + K^2\}$) the weight of model $q$ in a naive joint mixture of $\Theta^1, \Theta^2$, i.e., $w_q = \frac{N^j}{N} \alpha^j_q$ where $j = 1$ if component $q$ is part of $\Theta^1$ and $j = 2$ otherwise. Then

$$\alpha^H_m = \sum_{\{q:R(q)=m\}} w_q, \qquad \beta_q = \frac{w_q}{\alpha^H_{R(q)}}, \qquad \mu^H_m = \sum_{\{q:R(q)=m\}} \beta_q \mu_q \quad (4)$$
$$\Sigma^H_m = \sum_{\{q:R(q)=m\}} \beta_q \big(\Sigma_q + (\mu_q - \mu^H_m)(\mu_q - \mu^H_m)^\top\big)$$

Notice that the parameters of a hidden Gaussian, $\mu^H_m$ and $\Sigma^H_m$, are just the mean and covariance of the mixture $\sum_{q:R(q)=m} \beta_q G(x|\mu_q, \Sigma_q)$. The summation over $q$ in expression (3) can be interpreted as the Jensen-Shannon (JS) divergence between the components assigned to the hidden source $m$, under Gaussian assumptions.

For a given parametric family, the JS divergence is a non-negative measurement which can be used to test whether several samples are derived from a single distribution from the family or from a mixture of different ones [6]. The JS divergence is computed for a mixture of $n$ empirical distributions $P_1, .., P_n$ with mixture weights $\pi_1, .., \pi_n$. In the Gaussian case, denote the means and covariances of the component distributions by $\{\mu_i, \Sigma_i\}_{i=1}^{n}$. The mean and covariance of the mixture distribution, $\mu, \Sigma$, are functions of the means and covariances of the components, with the formulae given in (4) for $\mu^H_m, \Sigma^H_m$.
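This moment matching is a two-line computation; a small numpy sketch with illustrative component parameters (normalized within one hidden cluster, not taken from the paper): the hidden Gaussian's mean is the weighted mean of the component means, and its covariance adds the within-component covariances to the spread of the means.

```python
import numpy as np

# Illustrative parameters of two visible components assigned to one
# hidden source m (weights beta already normalized within the cluster).
beta = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0], [2.0, 0.0]])
sigmas = np.array([np.eye(2), np.eye(2)])

# Moment matching as in (4): mean and covariance of the weighted mixture.
mu_h = (beta[:, None] * mus).sum(axis=0)
diffs = mus - mu_h
sigma_h = sum(b * (s + np.outer(d, d)) for b, s, d in zip(beta, sigmas, diffs))

print(mu_h)       # → [1.2 0. ]
print(sigma_h)    # identity plus the between-means spread on the first axis
```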
The Gaussian JS divergence is given by

$$JS^{G}_{\pi_1,..,\pi_n}(P_1, .., P_n) = \sum_{i=1}^{n} \pi_i D_{kl}\big(G(x|\mu_i, \Sigma_i) \,\|\, G(x|\mu, \Sigma)\big) \quad (5)$$
$$= H(G(x|\mu, \Sigma)) - \sum_{i=1}^{n} \pi_i H(G(x|\mu_i, \Sigma_i)) = \frac{1}{2}\Big(\log|\Sigma| - \sum_{i=1}^{n} \pi_i \log|\Sigma_i|\Big)$$

Using this identity in (3), and setting $\Theta^1 = \Theta^t_i$, $\Theta^2 = \Theta^{t-1}_j$, we finally get the following expression for the transition probability:

$$\log P(z^t = i | z^{t-1} = j) = -N \cdot \max_R \sum_{m=1}^{K^H} \alpha^H_m \, JS^{G}_{\{\beta_q : R(q)=m\}}\big(\{G(x|\mu_q, \Sigma_q) : R(q) = m\}\big) \quad (6)$$

3.2 Constrained optimization and algorithms

Consider first the case in which a one-to-one correspondence is assumed between clusters in two consecutive frames, and hence the number of Gaussian components $K$ is constant over all time frames.
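For this constant-$K$ case, the score reduces to a weighted bipartite matching over pairwise Gaussian JS scores. A sketch, assuming scipy is available and using invented mixture parameters: each pairwise score is the closed form in (5) applied to a two-component mixture, and the best one-to-one correspondence is found with scipy's Hungarian-method solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def js_pair(w1, w2, mu1, s1, mu2, s2):
    """Gaussian JS divergence (5) of two components with joint weights w1, w2."""
    b = np.array([w1, w2]) / (w1 + w2)
    mus = np.array([mu1, mu2], dtype=float)
    mu = b @ mus                              # moment matching as in (4)
    d = mus - mu
    sigma = sum(p * (s + np.outer(v, v)) for p, s, v in zip(b, (s1, s2), d))
    logdets = np.array([np.linalg.slogdet(s)[1] for s in (s1, s2)])
    return 0.5 * (np.linalg.slogdet(sigma)[1] - b @ logdets)

# Two illustrative mixtures with K=2 components each (equal weights).
eye = np.eye(2)
m1 = [([0.0, 0.0], eye), ([5.0, 5.0], eye)]
m2 = [([5.2, 5.0], eye), ([0.1, 0.0], eye)]

# Pairwise JS costs; minimizing total JS maximizes the transition score,
# and the optimal perfect matching is found by the Hungarian method.
cost = np.array([[js_pair(0.5, 0.5, a, sa, b2, sb) for (b2, sb) in m2]
                 for (a, sa) in m1])
rows, cols = linear_sum_assignment(cost)
print(list(cols))   # → [1, 0]: each component pairs with its near twin
```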
In this case, a mapping $R$ is allowed iff it maps to each hidden source $i$ a single Gaussian from mixture $\Theta^1$ and a single Gaussian from $\Theta^2$. Denoting the Gaussians matched to hidden source $i$ by $R^{-1}_1(i), R^{-1}_2(i)$, the transition score (6) takes the form $-N \cdot \max_R \sum_{i=1}^{K} S(R^{-1}_1(i), R^{-1}_2(i))$. Such an optimization of a pairwise matching score can be seen as a search for a maximal perfect matching in a weighted bipartite graph. The nodes of the graph are the Gaussian components of $\Theta^1, \Theta^2$ and the edges' weights are given by the scores $S(a, b)$. The global optimum of this problem can be efficiently found using the Hungarian algorithm [5] in $O(n^3)$, which is unproblematic in our case.

The one-to-one correspondence assumption is too strong for many data sets in the spike sorting application, as it ignores the phenomena of splits and merges of clusters. We wish to allow such phenomena, but nevertheless enforce strong (though not perfect) demands of correspondence between the Gaussians in two consecutive frames. In order to achieve such a balance, we place the following constraints on the allowed partitions $R$:

1. Each cluster of $R$ should contain exactly one Gaussian from $\Theta^1$ or exactly one Gaussian from $\Theta^2$. Hence assignment of different Gaussians from the same mixture to the same hidden source is limited to cases of a split or a merge.

2. The label entropy of the partition $R$ should satisfy

$$H(\alpha^H_1, .., \alpha^H_{K^H}) \le \frac{N^1}{N} H(\alpha^1_1, .., \alpha^1_{K^1}) + \frac{N^2}{N} H(\alpha^2_1, .., \alpha^2_{K^2}) \quad (7)$$

Intuitively, the second constraint limits the allowed partitions to ones which are not richer than the visible partition, i.e., do not have many more clusters. Note that the most detailed partition (the partition into singletons) has a label entropy given by the r.h.s. of inequality (7) plus $H(\frac{N^1}{N}, \frac{N^2}{N})$, which is one bit for $N^1 = N^2$. This extra bit is the price of using the concatenated 'rich' mixture, so we look for mixtures which do not pay such an extra price.

The optimization over this family of $R$ does not seem to have an efficient global optimization technique, and thus we resort to a greedy procedure. Specifically, we use a bottom-up agglomerative algorithm. We start from the most detailed partition (each Gaussian is a singleton) and merge two clusters of the partition at each round. Only merges that comply with the first constraint are considered. At each round we look for a merge which incurs a minimal loss to the accumulated Jensen-Shannon score (6) and a maximal loss to the mixture label entropy.
For two Gaussian clusters $(\beta_1, \mu_1, \Sigma_1), (\beta_2, \mu_2, \Sigma_2)$ these two quantities are given by

$$\Delta \log JS = -N(w_1 + w_2) \, JS^{G}_{\beta_1,\beta_2}\big(G(x|\mu_1, \Sigma_1), G(x|\mu_2, \Sigma_2)\big) \quad (8)$$
$$\Delta H = -N(w_1 + w_2) \, H(\beta_1, \beta_2)$$

where $\beta_1, \beta_2$ are $\frac{w_1}{w_1+w_2}, \frac{w_2}{w_1+w_2}$ and the $w_i$ are as in (4). We choose at each round the merge which minimizes the ratio between these two quantities. The algorithm terminates when the accumulated label entropy reduction is bigger than $H(\frac{N^1}{N}, \frac{N^2}{N})$ or when no allowed merges exist anymore. In the second case, it may happen that the partition $R$ found by the algorithm violates constraint (7). We nevertheless compute the score based on the $R$ found, since this partition obeys the first constraint and is usually not far from satisfying the second.

4 Empirical results

4.1 Experimental design and data acquisition

Neural data were acquired from the dorsal and ventral pre-motor (PMd, PMv) cortices of two Macaque monkeys performing a prehension (reaching and grasping) task. At the beginning of each trial, an object was presented in one of six locations. Following a delay period, a Go signal prompted the monkey to reach for, grasp, and hold the target object. A recording session typically lasted 2 hours during which monkeys completed 600 trials.
Table 1: Match scores between manual and automatic clustering. The rows list the appearance frequencies of different $f_{\frac{1}{2}}$ scores.

$f_{\frac{1}{2}}$ score | Number of frames (%) | Number of electrodes (%)
0.9-1.0 | 3386 (75%) | 13 (30%)
0.8-0.9 | 860 (19%) | 10 (23%)
0.7-0.8 | 243 (5%) | 10 (23%)
0.6-0.7 | 55 (1%) | 11 (25%)

During each session, 16 independently-movable glass-plated tungsten micro-electrodes were inserted through the dura, 8 into each area. Signals from these electrodes were amplified (10K), bandpass filtered (5-6000 Hz), sampled (25 kHz), stored on disk (Alpha-Map 5.4, Alpha-Omega Eng.), and subjected to 3-stage preprocessing. (1) Line influences were cleaned by pulse-triggered averaging: the signal following a pulse was averaged over many pulses and subtracted from the original in an adaptive manner. (2) Spikes were detected by a modified second-derivative algorithm (7 samples backwards and 11 forward), accentuating spiky features; segments that crossed an adaptive threshold were identified. Within each segment, a potential spike's peak was defined as the time of the maximal derivative. If a sharper spike was not encountered within 1.2 ms, 64 samples (10 before the peak and 53 after) were registered. (3) Waveforms were re-aligned s.t.
each started at the point of maximal fit with 2 library PCs (accounting, on average, for 82% and 11% of the variance, [1]). Aligned waveforms were projected onto the PCA basis to arrive at two coefficients.

4.2 Results and validation

[Figure 2: two rows of five scatter plots of spikes in the first two PCs; local match scores below the panels: 0.80, 0.77, 0.98, 0.95, 0.98.]

Figure 2: Frames 3, 12, 24, 34, and 47 from a 68-frame data set. Each frame contains 1000 spikes, plotted here (with random number assignments) according to their first two PCs. In this data one cluster moves constantly, another splits into distinguished clusters, and at the end two clusters are merged. The top and bottom rows show manual and automatic clustering solutions respectively. Notice that during the split process in the bottom-left area some ambiguous time frames exist in which 1, 2, or 3 cluster descriptions are reasonable.
This ambiguity can be resolved using global considerations of past and future time frames. By finding the MAP solution over all time frames, the algorithm manages such considerations. The numbers below the images show the f_{1/2} score of the local match between the manual and the automatic clustering solutions (see text).

We tested the algorithm using recordings of 44 electrodes containing a total of 4544 time frames. Spike trains were manually clustered by a skilled user in the environment of AlphaSort 4.0 (Alpha-Omega Eng.). The manual and automatic clustering results were compared using a combined measure of the precision P and recall R scores, f_{1/2} = 2PR/(R+P). Figure 2 demonstrates the performance of the algorithm on a particularly non-stationary data set.

Statistics on the match between manual and automated clustering are described in Table 1. To calibrate the score's scale, we note that random clustering (with the same label distribution as the manual clustering) gets an f_{1/2} score of 0.5. The trivial clustering which assigns all the points to the same label gets mean scores of 0.73 and 0.67 for single frame matching and whole electrode matching respectively. The scores of single frames are much higher than the full electrode scores, since the problem is much harder in the latter case. A single wrong correspondence between two consecutive frames may reduce the electrode's score dramatically, while going unnoticed by the single frame score. In most cases the algorithm gives a reasonably evolving clustering, even when it disagrees with the manual solution.
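The paper does not spell out how the precision P and recall R are computed between two clusterings, so the sketch below uses a standard pair-counting convention as an assumption rather than the authors' exact procedure: a pair of spikes is "positive" for a clustering if both spikes share a label, true positives are pairs co-clustered by both the automatic and the manual solution, and the combined score is f_{1/2} = 2PR/(R+P).

```python
import numpy as np
from itertools import combinations

def pair_precision_recall(auto_labels, manual_labels):
    """Pair-counting precision/recall of an automatic clustering
    against a manual one (an assumed convention, see lead-in)."""
    auto = np.asarray(auto_labels)
    manual = np.asarray(manual_labels)
    tp = fp = fn = 0
    for i, j in combinations(range(len(auto)), 2):
        same_auto = auto[i] == auto[j]        # co-clustered automatically
        same_manual = manual[i] == manual[j]  # co-clustered manually
        if same_auto and same_manual:
            tp += 1
        elif same_auto:
            fp += 1
        elif same_manual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def f_half(precision, recall):
    # combined score f_{1/2} = 2PR / (R + P)
    return 2 * precision * recall / (precision + recall)
```

Identical labelings score 1.0; over-merging clusters inflates fp and so lowers precision, while over-splitting inflates fn and lowers recall, matching the intuition that both the trivial one-cluster solution and a shattered solution are penalized.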
Examples can be seen at the authors' web site.1

Low matching scores between the manual and the automatic clustering may result from inherent ambiguity in the data. As a preliminary assessment of this hypothesis we obtained a second, independent, manual clustering for the data set for which we got the lowest match scores. The matching scores between the manual and automatic clusterings are presented in Figure 3A.

[Figure 3A shows a nearly equilateral triangle of pairwise scores between the automatic solution and the two human solutions H1 and H2 (0.68, 0.62, 0.68); panels B1-B4 show cluster scatter plots and directional tuning curves.]

Figure 3: (A) Comparison of our automatic clustering with 2 independent manual clustering solutions for our worst matched data points. Note that there is also a low match between the humans, forming a nearly equilateral triangle. (B) Functional validation of clustering results: (1) At the beginning of a recording session, three clusters were identified. (2) 107 minutes later, some shifted their position. They were tracked continuously. (3) The directional tuning of the top left cluster (number 3) during the delay periods of the first 100 trials (dashed lines are 99% confidence limits). (4) Although the cluster's position changed, its tuning curve's characteristics during the last 100 trials were similar.

In some cases, the validity of the automatic clustering can be assessed by checking functional properties associated with the underlying neurons. In Figure 3B we present such a validation for a successfully tracked cluster.

References

[1] Abeles M., Goldstein M.H. Multispike train analysis. Proc IEEE 65, pp. 762-773, 1977.
[2] Cover T., Thomas J. Elements of information theory.
John Wiley and Sons, New York, 1991.
[3] Emondi A.A., Rebrik S.P., Kurgansky A.V., Miller K.D. Tracking neurons recorded from tetrodes across time. J. of Neuroscience Methods, vol. 135:95-105, 2004.
[4] Fee M., Mitra P., Kleinfeld D. Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-Gaussian variability. J. of Neuroscience Methods, vol. 69:175-188, 1996.
[5] Kuhn H.W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, pp. 83-97, 1955.
[6] Lehmann E.L. Testing statistical hypotheses. John Wiley and Sons, New York, 1959.
[7] Lewicki M.S. A review of methods for spike sorting: the detection and classification of neural action potentials. Network: Computation in Neural Systems, 9(4):R53-R78, 1998.
[8] Lewicki's Bayesian spike sorter, sslib (ftp.etho.caltech.edu).
[9] Penev P., Dimitrov A., Miller J. Characterization of and compensation for the non-stationarity of spike shapes during physiological recordings. Neurocomputing 38-40:1695-1701, 2001.
[10] Shoham S., Fellows M.R., Normann R.A. Robust, automatic spike sorting using mixtures of multivariate t-distributions. J. of Neuroscience Methods, vol. 127(2):111-122, 2003.
[11] Snider R.K., Bonds A.B. Classification of non-stationary neural signals. J. of Neuroscience Methods, vol. 84(1-2):155-166, 1998.

1 http://www.cs.huji.ac.il/~aharonbh,~adams