{"title": "Fast and accurate spike sorting of high-channel count probes with KiloSort", "book": "Advances in Neural Information Processing Systems", "page_first": 4448, "page_last": 4456, "abstract": "New silicon technology is enabling large-scale electrophysiological recordings in vivo from hundreds to thousands of channels. Interpreting these recordings requires scalable and accurate automated methods for spike sorting, which should minimize the time required for manual curation of the results. Here we introduce KiloSort, a new integrated spike sorting framework that uses template matching both during spike detection and during spike clustering. KiloSort models the electrical voltage as a sum of template waveforms triggered on the spike times, which allows overlapping spikes to be identified and resolved. Unlike previous algorithms that compress the data with PCA, KiloSort operates on the raw data which allows it to construct a more accurate model of the waveforms. Processing times are faster than in previous algorithms thanks to batch-based optimization on GPUs. We compare KiloSort to an established algorithm and show favorable performance, at much reduced processing times. A novel post-clustering merging step based on the continuity of the templates further reduced substantially the number of manual operations required on this data, for the neurons with near-zero error rates, paving the way for fully automated spike sorting of multichannel electrode recordings.", "full_text": "Fast and accurate spike sorting of high-channel count\n\nprobes with KiloSort\n\nMarius Pachitariu1, Nick Steinmetz1, Shabnam Kadir1\n\nMatteo Carandini1 and Kenneth Harris1\n1 UCL, UK {ucgtmpa, }@ucl.ac.uk\n\nAbstract\n\nNew silicon technology is enabling large-scale electrophysiological recordings in\nvivo from hundreds to thousands of channels. 
Interpreting these recordings requires scalable and accurate automated methods for spike sorting, which should minimize the time required for manual curation of the results. Here we introduce KiloSort, a new integrated spike sorting framework that uses template matching both during spike detection and during spike clustering. KiloSort models the electrical voltage as a sum of template waveforms triggered on the spike times, which allows overlapping spikes to be identified and resolved. Unlike previous algorithms that compress the data with PCA, KiloSort operates on the raw data, which allows it to construct a more accurate model of the waveforms. Processing times are faster than in previous algorithms thanks to batch-based optimization on GPUs. We compare KiloSort to an established algorithm and show favorable performance, at much reduced processing times. A novel post-clustering merging step based on the continuity of the templates further substantially reduced the number of manual operations required on this data for the neurons with near-zero error rates, paving the way for fully automated spike sorting of multichannel electrode recordings.\n\n1 Introduction\n\nThe oldest and most reliable method for recording neural activity involves lowering an electrode into the brain and recording the local electrical activity around the electrode tip. Action potentials of single neurons can then be observed as a stereotypical temporal deflection of the voltage, called a spike waveform. When multiple neurons close to the electrode fire action potentials, their spikes must be identified and assigned to the correct cell, based on the features of the recorded waveforms, a process known as spike sorting [1, 2, 3, 4, 5, 6, 7]. 
Spike sorting is substantially helped by the ability to simultaneously measure the voltage at multiple closely-spaced sites in the extracellular medium. In this case, the recorded waveforms can be seen to have characteristic spatial shapes, determined by each cell’s location and physiological characteristics. Together, the spatial and temporal shape of the waveform provides all the information that can be used to assign a given spike to a cell.\nNew high-density electrodes, currently being tested, can record from several hundred closely-spaced recording sites. Fast algorithms are necessary to quickly and accurately spike sort tens of millions of spikes coming from 100 to 1,000 cells, from recordings performed with such next-generation electrodes in awake, behaving animals. Here we present a new algorithm which provides accurate spike sorting results, with run times that scale near-linearly with the number of recording channels. The algorithm takes advantage of the computing capabilities of low-cost commercially available graphics processing units (GPUs) to enable approximately realtime spike sorting from 384-channel probes.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fFigure 1: Data from high-channel count recordings. a, High-pass filtered and channel-whitened data. Negative peaks are action potentials. b, Example mean waveforms, centered on their peaks. c, Example cross-correlation matrix across channels (before whitening).\n\n1.1 High-density electrophysiology and structured sources of noise\n\nNext-generation high-density neural probes allow the spikes of most neurons to be recorded on 5 to 50 channels simultaneously (Fig. 1b). This provides a substantial amount of information per spike, but because other neurons also fire on the same channels, a clustering algorithm is still required to demix the signals and assign spikes to the correct cluster. 
Although the dense spacing of channels provides a large amount of information for each spike, structured sources of noise can still negatively impact the spike sorting problem. For example, the superimposed waveforms of neurons distant from the electrode (non-sortable units) add up and constitute a continuous random background (Fig. 1a) against which the features of sortable spikes (Fig. 1b) must be distinguished. In behaving animals, another major confound is the movement of the electrode relative to the tissue, which creates an apparent inverse movement of the waveform along the channels of the probe.\n\n1.2 Previous work\n\nA traditional approach to spike sorting divides the problem into several stages. In the first stage, spikes with maximum amplitudes above a pre-defined threshold are detected, and these spikes are projected into a common low-dimensional space, typically obtained by PCA. In the second stage, the spikes are clustered in this low-dimensional space using a variety of approaches, such as mixtures of Gaussians [8] or peak-density based approaches [9]. Some newer algorithms also include a third stage of template matching, in which overlapping spikes that may have been missed in the first detection phase are found in the raw data. Finally, for awake recordings, a manual stage in a GUI is required to perform merge and split operations on the imperfect automated results.\nHere instead we combine these steps into a single model with a cost function based on the error of reconstructing the entire raw voltage dataset with the templates of a set of candidate neurons. We derive approximate inference and learning algorithms that can be successfully applied to very large channel count data. 
This approach is related to a previous study [6], but whereas that work is impractically slow for recordings with large numbers of channels, our further modelling and algorithmic innovations have enabled the approach to be used quickly and accurately on real datasets. We improve the generative model of [6] from a spiking process with continuous L1-penalized traces to a model of spikes as discrete temporal events. The approach of [6] does not scale well to high channel count probes, as it requires the solution of a generic convex optimization problem in high dimensions.\n\n\f2 Model formulation\n\nWe start with a generative model of the raw electrical voltage. Unlike previous approaches, we do not pre-commit to the times of the spikes, nor do we project the waveforms of the spikes to a lower-dimensional PCA space. Both of these steps discard potentially useful information, as we show below.\n\n2.1 Pre-processing: common average referencing, temporal filtering and spatial whitening\n\nTo remove low-frequency fluctuations, such as the local field potential, we high-pass filter each channel of the raw data at 300 Hz. To diminish the effect of artifacts shared across all channels, we subtract at each timepoint the median of the signal across all recording sites, an operation known as common average referencing. This step is best performed after high-pass filtering, because the LFP magnitude is variable across channels but can be comparable in size to the artifacts.\nFinally, we whiten the data in space to remove noise that is correlated across channels (Fig. 1c). The correlated noise is mostly due to far neurons with small spikes [10], which have a large spatial spread over the surface of the probe. 
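The first two pre-processing steps above, high-pass filtering at 300 Hz and common (median) average referencing, can be sketched as follows. This is a minimal illustration rather than the actual implementation: the paper does not specify the filter design, so a hard FFT cutoff stands in for it, and the function name and toy data are ours.

```python
import numpy as np

def preprocess(raw, fs=25000.0, cutoff=300.0):
    """High-pass filter each channel, then subtract the across-channel
    median at every timepoint (common average referencing).
    raw: (n_channels, n_samples) array."""
    n = raw.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(raw, axis=1)
    spec[:, freqs < cutoff] = 0.0          # discard the LFP band
    hp = np.fft.irfft(spec, n=n, axis=1)
    return hp - np.median(hp, axis=0)      # remove shared artifacts

# toy data: a shared 5 Hz LFP-like component plus per-channel noise
rng = np.random.default_rng(0)
t = np.arange(25000) / 25000.0
lfp = 100.0 * np.sin(2 * np.pi * 5 * t)
raw = lfp + 0.1 * rng.standard_normal((4, t.size))
clean = preprocess(raw)
```

After these two steps, the slow shared component is gone and what remains is, approximately, the high-frequency, channel-specific part of the signal.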
Since many such neurons are present at all recording sites, their noise averages out to have normal statistics with a stereotypical cross-correlation pattern across channels (Fig. 1c). We distinguish the noise covariance from the covariance of the large, sortable spikes by removing the times of putative spikes (detected with a threshold criterion) from the calculation of the covariance matrix. We use a symmetrical whitening matrix that maintains the spatial structure of the data, known as ZCA, defined as WZCA = Σ^(-1/2) = E D^(-1/2) E^T, where E and D are the singular vectors and singular values of the estimated covariance matrix Σ. To regularize D, we add a small value to its diagonal. For very large channel counts, estimation of the full covariance matrix Σ is noisy, and we therefore compute the columns of the whitening matrix WZCA independently for each channel, based on its nearest 32 channels.\n\n2.2 Modelling mean spike waveforms with SVD\n\nWhen single spike waveforms are recorded across a large number of channels, most channels will have no signal and only noise. To prevent these channels from biasing the spike sorting problem, previous approaches estimate a mask over those channels with sufficient SNR to be included in a given spike. To further reduce noise and lower the dimensionality of the data for computational reasons, the spikes are usually projected into a small number of temporal principal components per channel, typically three. Here we suggest a different method for simultaneous spatial denoising/masking and for lowering the dimensionality of spikes, which is based on the observation that mean spike waveforms are very well explained by an SVD decomposition of their spatiotemporal waveform, with as few as three components (Fig. 2ab). 
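The low-rank observation above can be sketched directly with a truncated SVD. The waveform below is synthetic (a spatially decaying spike whose peak time drifts slightly across channels, so it has rank above 1 by construction); the sizes and the rank of 3 follow the text, everything else is our toy choice.

```python
import numpy as np

# Synthetic mean waveform: (channels x timepoints), spatial Gaussian
# envelope, spike time drifting across channels.
n_chan, n_t, rank = 32, 61, 3
t = np.arange(n_t)
c = np.arange(n_chan)[:, None]
wave = np.exp(-((c - 12) / 4.0) ** 2) * np.exp(-((t - 20 - 0.3 * c) / 5.0) ** 2)

# Keep the top `rank` spatiotemporal components, private to this neuron.
U, s, Vt = np.linalg.svd(wave, full_matrices=False)
approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Fraction of waveform variance left unexplained by the rank-3 model.
resid = np.linalg.norm(wave - approx) ** 2 / np.linalg.norm(wave) ** 2
```

Here `resid` is tiny: three spatiotemporal components fit to this one waveform capture essentially the whole template, which is the observation the section builds on.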
However, the spatial and temporal components of the SVD vary substantially from neuron to neuron, hence the same set of temporal basis functions per channel cannot be used to model all neurons (Fig. 2ab), as is typically done in standard approaches. We compared the ability of the classical and proposed methods for dimensionality reduction, and found that the proposed decomposition can reconstruct waveforms with ~5 times less residual variance than the classical approach. This allows it to capture small but very distinguishable features of the spikes, which ultimately can help distinguish between neurons with very similar waveforms.\n\n2.3 Integrated template matching framework\n\nTo define a generative model of the recorded electrical voltage, we take advantage of the approximately linear additivity of electrical potentials from different sources in the extracellular medium. We combine the spike times of all neurons into an Nspikes-dimensional vector s, such that the waveforms start at time samples s + 1. We define the cluster identity of spike k as σ(k), taking values in the set {1, 2, 3, ..., N}, where N is the total number of neurons. We define the unit-norm waveform of neuron n as the matrix Kn = UnWn, of size number of channels by number of sample timepoints ts (typically 61). The matrix Kn is defined by its low-dimensional decomposition into three pairs of spatial and temporal basis functions, Un and Wn, such that the norm of UnWn is 1. The value of\n\n\fFigure 2: Spike reconstruction from three private PCs. a, Four example average waveforms (black) with their reconstructions from three common temporal PCs per channel (blue) and from three private spatiotemporal PCs (red). The red traces mostly overlap the black traces. 
b, Summary of residual waveform variance for all neurons in one dataset.\n\nthe electrical voltage at time t on channel i is defined by\n\nV(i, t) = V0(i, t) + N(0, ε)\nV0(i, t) = Σ_{k : t − ts ≤ s(k) < t} xk Kσ(k)(i, t − s(k))\nxk ~ N(µσ(k), λ µσ(k)²)   (1)\n\nwhere xk > 0 is the amplitude of spike k. Spike amplitudes in the data can vary significantly even for spikes from the same neuron, due to factors like burst adaptation and drift. We modelled the mean and variance of the amplitude variability, with the variance of the distribution scaling with the square of the mean. λ and ε are hyperparameters that control the relative scaling, with respect to each other, of the reconstruction error and the prior on the amplitude. In practice we set these constant for all recordings.\nThis model formulation leads to the following cost function, which we minimize with respect to the spike times, cluster assignments, amplitudes and templates\n\nL(s, x, K, σ) = ‖V − V0‖² + (ε/λ) Σk (xk/µσ(k) − 1)²   (2)\n\n3 Learning and inference in the model\n\nTo optimize the cost function, we alternate between finding the best spike times s, cluster assignments σ and amplitudes x (template matching) and optimizing the template K parametrization with respect to s, σ, x (template optimization). 
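The generative model just described can be sketched as a small simulation: unit-norm templates are placed at the spike times, scaled by per-spike amplitudes drawn around the cluster mean, and Gaussian noise is added. All sizes and values here are toy choices of ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n_chan, n_t, T = 8, 61, 5000      # channels, template length ts, samples

# Two unit-norm templates K_n (channels x timepoints).
K = rng.standard_normal((2, n_chan, n_t))
K /= np.linalg.norm(K.reshape(2, -1), axis=1)[:, None, None]
mu = np.array([10.0, 15.0])       # mean amplitudes mu_n

s = np.array([100, 900, 2500])    # spike times s(k)
sigma = np.array([0, 1, 0])       # cluster identities sigma(k)
x = mu[sigma] * (1.0 + 0.1 * rng.standard_normal(s.size))  # amplitudes x_k

V0 = np.zeros((n_chan, T))
for k in range(s.size):           # waveform of spike k starts at s(k) + 1
    V0[:, s[k] + 1 : s[k] + 1 + n_t] += x[k] * K[sigma[k]]
V = V0 + 0.01 * rng.standard_normal((n_chan, T))   # additive noise
```

Because the templates are unit-norm and these toy spikes do not overlap, projecting the snippet starting at s(k) + 1 onto its template recovers the amplitude x_k exactly, which is the quantity the inference step will estimate.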
We initialize the templates using a simple scaled K-means clustering model, which we in turn initialize with prototypical spikes determined from the data. After the final spike times and amplitudes have been extracted, we run a final post-optimization merging algorithm which finds pairs of clusters whose spikes form a single continuous density. These steps are separately described in detail below.\n\n3.1 Stacked initializations with scaled K-means and prototypical spikes\n\nThe density of spikes can vary substantially across the probe, depending on the location of each recording site in the brain. Initializing the optimization in a density-dependent way can thus assign more clusters to regions that require more, relieving the main optimization from the local-minima prone problem of moving templates from one part of the probe to another. For the initialization, we thus start by detecting spikes using a threshold rule, and as we load more of the recording we keep a running subset of prototypical spikes that are sufficiently different from each other by an L2 norm criterion. We avoid counting overlapping spikes as prototypical spikes by enforcing\n\n\fa minimum spatiotemporal peak isolation criterion on the detected spikes. Out of the prototypical spikes thus detected, we consider a fixed number N which had the most matches to other spikes in the recording.\nWe then use this initial set of spikes to initialize a scaled K-means algorithm. This algorithm uses the same cost function described in equation 2, with spike times s fixed to those found by a threshold criterion. 
Unlike standard K-means, each spike is allowed to have a variable amplitude [11].\n\n3.2 Learning the templates via stochastic batch optimization\n\nThe main optimization re-estimates the spike times s at each iteration. The “online” nature of the optimization helps to accelerate the algorithm and to avoid local minima. For template optimization we use a simple running average update rule\n\nAn^new(i, t0) ← (1 − 1/p)^jn An^old(i, t0) + (1 − (1 − 1/p)^jn) · (1/jn) Σ_{k ∈ batch, σ(k)=n} V(i, s(k) + t0),   (3)\n\nwhere An is the running average waveform for cluster n, jn represents the number of spikes from cluster n identified in the current batch, and the running average weighs past samples exponentially with a forgetting constant p. Thus An approximately represents the average of the past p samples assigned to cluster n. Note that different clusters will therefore update their mean waveforms at different rates, depending on their number of spikes per batch. Since firing rates vary over two orders of magnitude in typical recordings (from < 0.5 to 50 spikes/s), the adaptive running average procedure allows clusters with rare spikes to nonetheless average enough of their spikes to generate a smooth average template.\nLike most clustering algorithms, the model we developed here is prone to non-optimal local minima. We used several techniques to ameliorate this problem. First, we annealed several parameters during learning, to encourage the exploration of the parameter space that stems from the randomness induced by the stochastic batches. We annealed the forgetting constant p from a small value (typically 20) at the beginning of the optimization to a large value at the end (typically several hundred). We also anneal from small to large the ratio ε/λ, which controls the relative impact of the reconstruction term and the amplitude bias term in equation 2. 
Therefore, at the beginning of the optimization, spikes assigned to the same cluster are allowed to have more variable amplitudes. Finally, we anneal the threshold for spike detection (see below), to allow a greater mismatch between spikes and the available templates at the beginning of the optimization. As the optimization progresses, the templates become more precise, and spikes increase their projections onto their preferred template, thus allowing higher thresholds to separate them from the noise.\n\n3.3 Inferring spike times and amplitudes via template matching\n\nThe inference step of the proposed model attempts to find the best spike times, cluster assignments and amplitudes, given a set of templates {Kn}n with low-rank decompositions Kn = UnWn and mean amplitudes µn. The templates are obtained from the running average waveform An, after an SVD decomposition to give An ≈ µnKn = µnUnWn, with ‖UnWn‖ = 1, Un orthonormal and Wn orthogonal. The primary roles of the low-rank representation are to guarantee fast inference and to regularize the waveform model.\nWe adopt a parallelized matching pursuit algorithm to iteratively estimate the best fitting templates and subtract them off from the raw data. In standard matching pursuit, the best fitting template is identified over the entire batch, its best reconstruction is subtracted from the raw data, and then the next best fitting template is identified, iteratively, until the amount of explained variance falls below a threshold, which constitutes the stopping criterion. To find the best fitting template, we estimate, for each time t and each template n, the decrease in the cost function obtained by introducing template n at location t with the best-fitting amplitude x. 
This is equivalent to minimizing a standard quadratic function of the form ax² − 2bx + c over the scalar variable x, with a, −2b and c derived as the coefficients of x², x and 1 from equation 2\n\na = 1 + ε/(λµn²); b = (Kn ⋆ V)(t) + ε/(λµn); c = ε/λ,   (4)\n\n\fwhere ⋆ represents the operation of temporal filtering (convolution with the time-reversed filter). Here the filtering is understood as channel-wise filtering followed by a summation of all filtered traces, which computes the dot product between the template and the voltage snippet starting at each timepoint t. The decrease in cost dC(n, t) that would occur if a spike of neuron n were added at time t, and the best x, are given by\n\nxbest = b/a\ndC(n, t) = b²/a − c   (5)\n\nComputing b requires filtering the data V with all the templates Kn, which amounts to a very large number of operations, particularly when the data has many channels. However, our low-rank decomposition allows us to reduce the number of operations by a factor of Nchan/Nrank, where Nchan is the number of channels (typically > 100) and Nrank is the rank of the decomposed template (typically 3). This follows from the observation that\n\nV ⋆ Kn = V ⋆ (UnWn) = Σj (Un(:, j)^T · V) ⋆ Wn(j, :),   (6)\n\nwhere Un(:, j) is understood as the j-th column of matrix Un and similarly Wn(j, :) is the j-th row of Wn. We have thus replaced the matrix convolution V ⋆ Kn with a matrix product Un^T V and Nrank one-dimensional convolutions. We implemented the matrix products and filtering operations efficiently using consumer GPU hardware. Iterative updates of dC after template subtraction can be obtained quickly using pre-computed cross-template products, as typically done in matching pursuit []. 
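The low-rank identity in equation 6 is easy to check numerically. Below is a small sketch (toy sizes, rank 3 as in the text): the direct dot product of a template with every voltage snippet is compared against the low-rank route, a spatial projection Un^T V followed by Nrank one-dimensional correlations.

```python
import numpy as np

rng = np.random.default_rng(2)
n_chan, n_t, T, rank = 16, 61, 2000, 3
Un = rng.standard_normal((n_chan, rank))   # spatial components
Wn = rng.standard_normal((rank, n_t))      # temporal components
Kn = Un @ Wn                               # full template
V = rng.standard_normal((n_chan, T))       # toy voltage data

# Direct: dot product of Kn with each voltage snippet (channel-wise
# correlation summed over channels), one value per start time t.
direct = np.array([np.sum(Kn * V[:, t:t + n_t]) for t in range(T - n_t)])

# Low-rank: rank x T spatial projection, then rank 1-D correlations.
proj = Un.T @ V
lowrank = sum(np.correlate(proj[j], Wn[j], mode="valid")
              for j in range(rank))[: T - n_t]
```

The two results agree to floating-point precision, but the low-rank route filters only Nrank projected traces instead of Nchan channels, which is the claimed Nchan/Nrank saving.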
The iterative optimization stops when all elements of dC fall below a pre-defined threshold.\nDue to its greedy nature, matching pursuit can perform poorly at reducing the cost function in certain problems. It is, however, appropriate to our problem, because spikes are very rare events, and overlaps are typically small, particularly in high dimensions over the entire probe. Furthermore, typical datasets contain millions of spikes, and only the simple form of matching pursuit can be efficiently employed. We implemented the simple matching pursuit formulation efficiently on consumer GPU hardware. Consider the cost improvement matrix dC(n, t). When the largest element of this matrix is found and the template subtracted, no values of dC need to change except those very close in time to the fitted template (ts samples away). Thus, instead of finding the global maximum of dC, we can find local maxima above the threshold criterion, and impose a minimal distance (ts) between such local maxima. The identified spikes can then be processed in parallel without affecting each other's representations.\nWe found it unnecessary to iterate the (relatively expensive) parallel matching pursuit algorithm during the optimization of the templates. We obtained similar templates when we aborted the parallel matching pursuit after the first parallel detection step, without detecting any further overlapping spikes. To improve the efficiency of the optimization, we therefore only apply the full parallel template matching algorithm on the final pass, thus obtaining the overlapping spikes.\n\n4 Benchmarks\n\nFirst, we timed the algorithm on several large-scale datasets. The average run times for 32, 128 and 384 channel recordings were 10, 29 and 140 minutes respectively, on a single GPU-equipped workstation. 
These were significant improvements over an established framework called KlustaKwik [8], which needed approximately 480 and 10-20 thousand minutes when run on 32 and 128 channel datasets on a standard CPU cluster (we did not attempt to run KlustaKwik on 384 channel recordings).\nThe significant improvements in speed could have come at the expense of accuracy losses. We compared KiloSort and KlustaKwik on 32 and 128 channel recordings, using a technique known as “hybrid ground truth” [8]. To create this data, we first selected all the clusters from a recording that had been previously analysed with KlustaKwik and curated by a human expert. For each\n\n\fFigure 3: Hybrid ground truth performance of proposed (KiloSort) versus established (KlustaKwik) algorithm. a, Distribution of false positive rates. b, Distribution of misses. c, Total score. def, Same as (abc) after greedy best possible merges. g, Number of merges required to reach best score.\n\ncluster, we extracted its raw waveform and denoised it with an SVD decomposition (keeping the top 7 dimensions of variability). We then added the de-noised waveforms at a different but nearby spatial location on the probe with a constant channel shift, randomly chosen for each neuron. To avoid increasing the spike density at any location on the probe, we also subtracted off the denoised waveform from its original location.\nFinally, we ran both KiloSort and KlustaKwik on 16 instantiations of the hybrid ground truth. We matched ground truth cells with clusters identified by the algorithms to find the maximizer of the score = 1 − false positive rate − miss rate, where the false positive rate was normalized by the number of spikes in the test cluster, and the miss rate was normalized by the number of spikes in the ground truth cluster. Values close to 1 indicate well-sorted units. 
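The per-unit score just defined can be sketched as follows. This is a bare-bones illustration of the formula (the function name is ours, and spikes are matched by exact sample time here, whereas a real evaluation would allow a small jitter window):

```python
def unit_score(gt_times, test_times):
    """score = 1 - false positive rate - miss rate, with false positives
    normalized by the test cluster size and misses by the ground-truth size."""
    gt, test = set(gt_times), set(test_times)
    hits = len(gt & test)
    fp_rate = (len(test) - hits) / len(test)
    miss_rate = (len(gt) - hits) / len(gt)
    return 1.0 - fp_rate - miss_rate
```

A perfectly recovered unit scores 1; a test cluster with one extra spike and two of four ground-truth spikes missing scores 1 − 1/3 − 2/4 ≈ 0.17.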
Both KiloSort and KlustaKwik performed well, with KiloSort producing significantly more cells with well-isolated clusters (53% vs 35% of units with scores above 0.9).\nWe also estimated the best achievable score following manual sorting of the automated results. To minimize human operator work, algorithms are typically biased towards producing more clusters than can be expected in the recording, because manually merging an over-split cluster is easier, less time-consuming, and less error-prone than splitting an over-merged cluster (the latter requires choosing a carefully defined separation surface). Both KiloSort and KlustaKwik had such a bias, producing between two and four times more clusters than the expected number of neurons.\nTo estimate the best achievable score after operator merges, we took advantage of the ground truth data, and automatically merged together candidate clusters so as to greedily maximize their score. Final best results, as well as the required number of merges, are shown in Figure 3defg (KiloSort vs KlustaKwik: 69% vs 60% of units with scores above 0.9). The relative performance improvement of KiloSort is clearly driven by fewer misses (Fig 3e), which are likely due to its ability to detect overlapping spikes.\n\n5 Extension: post-hoc template merging\n\nWe found that we can further reduce human operator work by performing most of the merges in an automated way. The most common oversplit clusters show remarkable continuity of their spike densities (Fig. 4). In other words, no discrimination boundary can be identified orthogonal to which the oversplit cluster appears bimodal. Instead, these clusters arise as a consequence of the algorithm partitioning clusters with large variance into multiple templates, so as to better explain their total variance. 
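The merge criterion described above amounts to projecting the spikes of two candidate clusters onto one axis and asking whether the pooled distribution is continuous or shows a gap between two modes. The paper does not spell out the exact continuity test, so the sketch below uses a simple histogram-valley check of our own devising, purely for illustration:

```python
import numpy as np

def looks_continuous(proj_a, proj_b, n_bins=40):
    """Pool 1-D projections of two clusters; report True when no histogram
    bin between the two cluster centers drops below half the smaller peak,
    i.e. the pooled density looks continuous (a merge candidate)."""
    pooled = np.concatenate([proj_a, proj_b])
    counts, edges = np.histogram(pooled, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    ia = np.argmin(np.abs(centers - np.median(proj_a)))
    ib = np.argmin(np.abs(centers - np.median(proj_b)))
    lo, hi = sorted((ia, ib))
    valley = counts[lo:hi + 1].min()
    peak = min(counts[ia], counts[ib])
    return bool(valley > 0.5 * peak)

rng = np.random.default_rng(3)
# Same neuron split in two: heavily overlapping projections.
same = looks_continuous(rng.normal(0.0, 1.0, 2000), rng.normal(0.8, 1.0, 2000))
# Two distinct neurons: a clear gap between the modes.
split = looks_continuous(rng.normal(0.0, 1.0, 2000), rng.normal(8.0, 1.0, 2000))
```

On these toy projections the first pair is flagged as continuous (merge) and the second is not, mirroring the bimodality criterion in the text.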
\fFigure 4: PC and feature-space projections of two pairs of clusters that should be merged. ae, Mean waveforms of merge candidates. bf, Spike projections into the top PCs of each candidate cluster. cg, Template feature projections for the templates corresponding to the candidate clusters. dh, Discriminant of the feature projections from (cg) (see main text for exact formula).\n\nIn KiloSort, we can exploit the fact that the decision boundaries between any two clusters are in fact planes (which we show below). If two clusters belong to the same neuron, their one-dimensional projections onto the direction orthogonal to the decision boundary will show a continuous distribution (Fig. 4cd and 4gh), and the clusters can be merged. We use this idea to sequentially merge any two clusters with continuous distributions in their 2D feature spaces. Note that the best principal components for each cluster's main channel are much less indicative of a potential merge (Fig 4b and 4f).\nTo see why the decision boundaries in KiloSort are linear, consider two templates Ki and Kj, and consider that we have arrived at the instance of template matching where a spike k needs to be assigned to one of these two templates. Their respective cost function improvements are dC(i, t) = bi²/ai − c and dC(j, t) = bj²/aj − c, using the convention from equation 4. The decision of assigning spike k to one or the other of these templates is then equivalent to determining the sign of dC(i, t) − dC(j, t), which is a linear discriminant of the feature projections\n\nsign(dC(i, t) − dC(j, t)) = sign(bi/√ai − bj/√aj)   (7)\n\nwhere ai and aj do not depend on the data and bi, bj are linear functions of the raw voltage, hence the decision boundary between any two templates is linear (Fig. 4).\n\n6 Discussion\n\nWe have demonstrated here a new framework for spike sorting of high-channel count electrophysiology data, which offers substantial accuracy and speed improvements over previous frameworks, while also reducing the amount of manual work required to isolate single units. KiloSort is currently enabling spike sorting of up to 1,000 neurons recorded simultaneously in awake animals and will help to enable the next generation of large-scale neuroscience. The code is available online at https://github.com/cortex-lab/KiloSort.\n\n\fReferences\n\n[1] Rodrigo Quian Quiroga. Spike sorting. Current Biology, 22(2):R45–R46, 2012.\n\n[2] Gaute T Einevoll, Felix Franke, Espen Hagen, Christophe Pouzat, and Kenneth D Harris. Towards reliable spike-train recordings from thousands of neurons with multielectrodes. Current Opinion in Neurobiology, 22(1):11–17, 2012.\n\n[3] Daniel N Hill, Samar B Mehta, and David Kleinfeld. Quality metrics to accompany spike sorting of extracellular signals. The Journal of Neuroscience, 31(24):8699–8705, 2011.\n\n[4] Kenneth D Harris, Darrell A Henze, Jozsef Csicsvari, Hajime Hirase, and György Buzsáki. Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. Journal of Neurophysiology, 84(1):401–414, 2000.\n\n[5] Jonathan W Pillow, Jonathon Shlens, EJ Chichilnisky, and Eero P Simoncelli. 
A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings. PLoS ONE, 8(5):e62123, 2013.\n\n[6] Chaitanya Ekanadham, Daniel Tranchina, and Eero P Simoncelli. A unified framework and method for automatic neural spike identification. Journal of Neuroscience Methods, 222:47–55, 2014.\n\n[7] Felix Franke, Robert Pröpper, Henrik Alle, Philipp Meier, Jörg RP Geiger, Klaus Obermayer, and Matthias HJ Munk. Spike sorting of synchronous spikes from local neuron ensembles. Journal of Neurophysiology, 114(4):2535–2549, 2015.\n\n[8] C Rossant, SN Kadir, DFM Goodman, J Schulman, MLD Hunter, AB Saleem, A Grosmark, M Belluscio, GH Denfield, AS Ecker, AS Tolias, S Solomon, G Buzsaki, M Carandini, and KD Harris. Spike sorting for large, dense electrode arrays. Nature Neuroscience, 19:634–641, 2016.\n\n[9] Alex Rodriguez and Alessandro Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.\n\n[10] Joana P Neto, Gonçalo Lopes, João Frazão, Joana Nogueira, Pedro Lacerda, Pedro Baião, Arno Aarts, Alexandru Andrei, Silke Musa, Elvira Fortunato, et al. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset. bioRxiv, page 037937, 2016.\n\n[11] Adam Coates, Andrew Y Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. 
In International Conference on Artificial Intelligence and Statistics, pages 215–223, 2011.\n\f", "award": [], "sourceid": 2199, "authors": [{"given_name": "Marius", "family_name": "Pachitariu", "institution": "Gatsby Unit, UCL"}, {"given_name": "Nicholas", "family_name": "Steinmetz", "institution": "UCL"}, {"given_name": "Shabnam", "family_name": "Kadir", "institution": "University College London"}, {"given_name": "Matteo", "family_name": "Carandini", "institution": "UCL"}, {"given_name": "Kenneth", "family_name": "Harris", "institution": "UCL"}]}