{"title": "YASS: Yet Another Spike Sorter", "book": "Advances in Neural Information Processing Systems", "page_first": 4002, "page_last": 4012, "abstract": "Spike sorting is a critical first step in extracting neural signals from large-scale electrophysiological data. This manuscript describes an efficient, reliable pipeline for spike sorting on dense multi-electrode arrays (MEAs), where neural signals appear across many electrodes and spike sorting currently represents a major computational bottleneck. We present several new techniques that make dense MEA spike sorting more robust and scalable. Our pipeline is based on an efficient multi-stage ''triage-then-cluster-then-pursuit'' approach that initially extracts only clean, high-quality waveforms from the electrophysiological time series by temporarily skipping noisy or ''collided'' events (representing two neurons firing synchronously). This is accomplished by developing a neural network detection method followed by efficient outlier triaging. The clean waveforms are then used to infer the set of neural spike waveform templates through nonparametric Bayesian clustering. Our clustering approach adapts a ''coreset'' approach for data reduction and uses efficient inference methods in a Dirichlet process mixture model framework to dramatically improve the scalability and reliability of the entire pipeline. The ''triaged'' waveforms are then finally recovered with matching-pursuit deconvolution techniques. The proposed methods improve on the state-of-the-art in terms of accuracy and stability on both real and biophysically-realistic simulated MEA data. 
Furthermore, the proposed pipeline is efficient, learning templates and clustering faster than real-time for a 500-electrode dataset, largely on a single CPU core.", "full_text": "YASS: Yet Another Spike Sorter\n\nJinHyung Lee1, David Carlson2, Hooshmand Shokri1, Weichi Yao1, Georges Goetz3, Espen Hagen4,\n\nEleanor Batty1, EJ Chichilnisky3, Gaute Einevoll5, and Liam Paninski1\n\n1Columbia University, 2Duke University, 3Stanford University, 4University of Oslo, 5Norwegian\n\nUniversity of Life Sciences\n\nAbstract\n\nSpike sorting is a critical \ufb01rst step in extracting neural signals from large-scale\nelectrophysiological data. This manuscript describes an ef\ufb01cient, reliable pipeline\nfor spike sorting on dense multi-electrode arrays (MEAs), where neural signals\nappear across many electrodes and spike sorting currently represents a major\ncomputational bottleneck. We present several new techniques that make dense MEA\nspike sorting more robust and scalable. Our pipeline is based on an ef\ufb01cient multi-\nstage \u201ctriage-then-cluster-then-pursuit\u201d approach that initially extracts only clean,\nhigh-quality waveforms from the electrophysiological time series by temporarily\nskipping noisy or \u201ccollided\u201d events (representing two neurons \ufb01ring synchronously).\nThis is accomplished by developing a neural network detection method followed\nby ef\ufb01cient outlier triaging. The clean waveforms are then used to infer the set\nof neural spike waveform templates through nonparametric Bayesian clustering.\nOur clustering approach adapts a \u201ccoreset\u201d approach for data reduction and uses\nef\ufb01cient inference methods in a Dirichlet process mixture model framework to\ndramatically improve the scalability and reliability of the entire pipeline. The\n\u201ctriaged\u201d waveforms are then \ufb01nally recovered with matching-pursuit deconvolution\ntechniques. 
The proposed methods improve on the state-of-the-art in terms of accuracy and stability on both real and biophysically-realistic simulated MEA data. Furthermore, the proposed pipeline is efficient, learning templates and clustering faster than real-time for a ~500-electrode dataset, largely on a single CPU core.\n\n1 Introduction\n\nThe analysis of large-scale multineuronal spike train data is crucial for current and future neuroscience research. These analyses are predicated on the existence of reliable and reproducible methods that feasibly scale to the increasing rate of data acquisition. A standard approach for collecting these data is to use dense multi-electrode array (MEA) recordings followed by “spike sorting” algorithms to turn the obtained raw electrical signals into spike trains.\nA crucial consideration going forward is the ability to scale to massive datasets: MEAs currently scale up to the order of 10^4 electrodes, but efforts are underway to increase this number to 10^6 electrodes^1. At this scale any manual processing of the obtained data is infeasible. Therefore, automatic spike sorting for dense MEAs has enjoyed significant recent attention [15, 9, 51, 24, 36, 20, 33, 12]. Despite these efforts, spike sorting remains the major computational bottleneck in the scientific pipeline when using dense MEAs, due both to the high computational cost of the algorithms and the human time spent on manual postprocessing.\nTo accelerate progress on this critical scientific problem, our proposed methodology is guided by several main principles. 
^1 DARPA Neural Engineering System Design program BAA-16-09\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nAlgorithm 1 Pseudocode for the complete proposed pipeline.\n\nInput: time-series of electrophysiological data V ∈ R^{T×C}, electrode locations ∈ R^3\n[waveforms, timestamps] ← Detection(V) % (Section 2.2)\n% “Triage” noisy waveforms and collisions (Section 2.4):\n[cleanWaveforms, cleanTimestamps] ← Triage(waveforms, timestamps)\n% Build a set of representative waveforms and summary statistics (Section 2.5)\n[representativeWaveforms, sufficientStatistics] ← coresetConstruction(cleanWaveforms)\n% DP-GMM clustering via divide-and-conquer (Sections 2.6 and 2.7)\n[{representativeWaveforms_i, sufficientStatistics_i}_{i=1,...}] ← splitIntoSpatialGroups(representativeWaveforms, sufficientStatistics, locations)\nfor i = 1, ... do % Run efficient inference for the DP-GMM\n  [clusterAssignments_i] ← SplitMergeDPMM(representativeWaveforms_i, sufficientStatistics_i)\nend for\n% Merge spatial neighborhoods and similar templates\n[allClusterAssignments, templates] ← mergeTemplates({clusterAssignments_i}_{i=1,...}, {representativeWaveforms_i}_{i=1,...}, locations)\n% Pursuit stage to recover collided and noisy waveforms\n[finalTimestamps, finalClusterAssignments] ← deconvolution(templates)\nreturn [finalTimestamps, finalClusterAssignments]\n\nFirst, robustness is critical, since hand-tuning and post-processing is not feasible at scale. Second, scalability is key. To feasibly process the oncoming data deluge, we use efficient data summarizations wherever possible and focus computational power on the “hard cases,” using cheap fast methods to handle easy cases. Next, the pipeline should be modular. Each stage in the pipeline has many potential feasible solutions, and the pipeline is improved by rapidly iterating and updating each stage as methodology develops further. 
Finally, prior information is leveraged\nas much as possible; we share information across neurons, electrodes, and experiments in order to\nextract information from the MEA datastream as ef\ufb01ciently as possible.\nWe will \ufb01rst outline the methodology that forms the core of our pipeline in Section 2.1, and then we\ndemonstrate the improvements in performance on simulated data and a 512-electrode recording in\nSection 3. Further supporting results appear in the appendix.\n\n2 Methods\n\n2.1 Overview\n\nThe inputs to the pipeline are the band-pass \ufb01ltered voltage recordings from all C electrodes and\ntheir spatial layout, and the end result of the pipeline is the set of K (where K is determined by\nthe algorithm) binary neural spike trains, where a \u201c1\u201d in the time series re\ufb02ects a neural action\npotential from the kth neuron at the corresponding time point. The voltage signals are spatially\nwhitened prior to processing and are modeled as the superposition of action potentials and background\nGaussian noise [12]. Spatial whitening is performed by removing potential spikes using amplitude\nthresholding and estimating the whitening \ufb01lter under a Gaussianity assumption. Succinctly, the\npipeline is a multistage procedure as follows: (i) detecting waveforms and extracting features, (ii)\nscreening outliers and collided waveforms, (iii) clustering, and (iv) inferring missed and collided\nspikes. Pseudocode for the \ufb02ow of the pipeline can be found in Algorithm 1. A brief overview is\nbelow, followed by additional details.\nOur overall strategy can be considered a hybrid of a matching pursuit approach (similar to that\nemployed by [36]) and a classical clustering approach, generalized and adapted to the large dense\nMEA setting. 
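As a minimal illustration of the spatial whitening step described above, the following numpy sketch estimates a whitening filter after excluding putative spikes by amplitude thresholding (the 3-SD threshold, MAD-based noise scale, and symmetric inverse-square-root filter are our own simplifying assumptions, not necessarily the pipeline's exact estimator):

```python
import numpy as np

def spatial_whiten(V, spike_thresh=3.0):
    """Estimate a spatial whitening filter from spike-free samples.

    V : (T, C) band-pass filtered voltage; rows are time samples.
    Samples where any channel exceeds `spike_thresh` robust SDs are
    excluded so the noise covariance is not biased by action potentials.
    """
    # Robust per-channel noise scale (median absolute deviation).
    sd = np.median(np.abs(V), axis=0) / 0.6745
    noise_mask = np.all(np.abs(V) < spike_thresh * sd, axis=1)
    cov = np.cov(V[noise_mask], rowvar=False)          # (C, C) noise covariance
    # Symmetric inverse square root: W = cov^{-1/2}.
    w, U = np.linalg.eigh(cov)
    W = U @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-9))) @ U.T
    return V @ W, W

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))                 # mixing matrix -> correlated noise
V = rng.normal(size=(20000, 4)) @ A.T       # 4 spatially correlated channels
Vw, W = spatial_whiten(V)
print(np.round(np.cov(Vw, rowvar=False), 2))  # approximately the identity
```

On real data the masked samples would also be the input to the whitening estimate, while the filter is applied to the full recording, as here.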
Our guiding philosophy is that it is essential to properly handle “collisions” between simultaneous spikes [37, 12], since collisions distort the extracted feature space and hinder clustering. A typical approach to this issue utilizes matching pursuit methods (or other sparse deconvolution strategies), but these methods are relatively computationally expensive compared to clustering primitives. This led us to a “triage-then-cluster-then-pursuit” approach: we “triage” collided or overly noisy waveforms, putting them aside during the feature extraction and clustering stages, and later recover these spikes during a final “pursuit” or deconvolution stage. The triaging begins during the detection stage in Section 2.2, where we develop a neural network based detection method that significantly improves sensitivity and selectivity. For example, on a simulated 30-electrode dataset with low SNR, the new approach reduces false positives and collisions by 90% for the same rate of true positives. Furthermore, the neural network is significantly better at the alignment of signals, which improves the feature space and signal-to-noise power. The detected waveforms are then projected to a feature space and restricted to a local spatial subset of electrodes as in [24] in Section 2.3. Next, in Section 2.4 an outlier detection method further “triages” the detected waveforms and reduces false positives and collisions by an additional 70% while removing only a small percentage of real detections. All of these steps are achievable in nearly linear time. Simulations demonstrate that this large reduction in false positives and collisions dramatically improves accuracy and stability.\nFollowing the above steps, the remaining waveforms are partitioned into distinct neurons via clustering. 
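The k-NN outlier triage referenced above (detailed in Section 2.4) can be sketched as follows: score each waveform by its distance to its k-th nearest neighbor in feature space and discard the highest-scoring tail. The brute-force distance computation, keep fraction, and toy data are ours for illustration; the pipeline uses a kd-tree for O(N log N) scaling:

```python
import numpy as np

def knn_triage(features, k=5, keep_frac=0.9):
    """Keep the `keep_frac` fraction of points with the smallest distance
    to their k-th nearest neighbor. Brute force O(N^2) for clarity; a
    kd-tree gives O(N log N) average time when the per-channel feature
    dimension P is small.
    """
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    kth = np.sort(d2, axis=1)[:, k]        # column 0 is the self-distance (0)
    cutoff = np.quantile(kth, keep_frac)
    return kth <= cutoff                   # boolean keep-mask

rng = np.random.default_rng(1)
cluster = rng.normal(0.0, 0.5, size=(200, 2))   # one tight "unit"
outliers = rng.uniform(-6, 6, size=(10, 2))     # scattered collisions/noise
X = np.vstack([cluster, outliers])
keep = knn_triage(X, k=5, keep_frac=0.9)
print(keep[:200].mean(), keep[200:].mean())  # most cluster points kept, most outliers triaged
```

Points in dense regions have small k-th-neighbor distances and survive; isolated collision or noise events are triaged and later recovered by the pursuit stage.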
Our clustering framework is based on the Dirichlet Process Gaussian Mixture Model (DP-GMM)\napproach [48, 9], and we modify existing inference techniques to improve scalability and performance.\nSuccinctly, each neuron is represented by a distinct Gaussian distribution in the feature space. Directly\ncalculating the clustering on all of the channels and all of the waveforms is computationally infeasible.\nInstead, the inference \ufb01rst utilizes the spatial locality via masking [24] from Section 2.3. Second, the\ninference procedure operates on a coreset of representative points [13] and the resulting approximate\nsuf\ufb01cient statistics are used in lieu of the full dataset (Section 2.5). Remarkably, we can reduce a\ndataset with 100k points to a coreset of ' 10k points with trivial accuracy loss. Next, split and merge\nmethods are adapted to ef\ufb01ciently explore the clustering space [21, 24] in Section 2.6. Using these\nmodern scalable inference techniques is crucial for robustness because they empirically \ufb01nd much\nmore sensible and accurate optima and permit Bayesian characterization of posterior uncertainty.\nFor very large arrays, instead of operating on all channels simultaneously, each distinct spatial\nneighborhood is processed by a separate clustering algorithm that may be run in parallel. This\nparallelization is crucial for processing very large arrays because it allows greater utilization of\ncomputer resources (or multiple machines). It also helps improve the ef\ufb01cacy of the split-merge\ninference by limiting the search space. This divide-and-conquer approach and the post-processing\nto stitch the results together is discussed in Section 2.7. 
The computational time required for the\nclustering algorithm scales nearly linearly with the number of electrodes C and the experiment time.\nAfter the clustering stage is completed, the means of clusters are used as templates and collided and\nmissed spikes are inferred using the deconvolution (or \u201cpursuit\u201d [37]) algorithm from Kilosort [36],\nwhich recovers the \ufb01nal set of binary spike trains. We limit this computationally expensive approach\nonly to sections of the data that are not well handled by the rest of the pipeline, and use the faster\nclustering approach to \ufb01ll in the well-explained (i.e. easy) sections.\nWe note \ufb01nally that when memory is limited compared to the size of the dataset, the preprocessing,\nspike detection, and \ufb01nal deconvolution steps are performed on temporal minibatches of data; the\nother stages operate on signi\ufb01cantly reduced data representations, so memory management issues\ntypically do not arise here. See Section B.4 for further details on memory management.\n\n2.2 Detection\n\nThe detection stage extracts temporal and spatial windows around action potentials from the noisy\nraw electrophysiological signal V to use as inputs in the following clustering stage. The number\nof clean waveform detections (true positives) should be maximized for a given level of detected\ncollision and noise events (false positives). Because collisions corrupt feature spaces [37, 12] and\nwill simply be recovered during pursuit stage, they are not included as true positives at this stage. In\ncontrast to the plethora of prior work on hand-designed detection rules (detailed in Section C.1), we\nuse a data-driven approach with neural networks to dramatically improve both detection ef\ufb01cacy and\nalignment quality. 
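To make the shape of such a single-electrode detector concrete, the sketch below scores every sliding window of the trace with a logistic unit and reports local maxima of the score; the matched-filter weights, bias, and toy spike shape are our own stand-ins for the trained network (whose actual architecture is described in the supplemental material):

```python
import numpy as np

def sliding_window_scores(v, w, b):
    """Score every length-R window of a single-electrode trace v with a
    logistic unit (R = len(w)); a trained detector stacks convolutional
    layers, but the sliding-window structure is the same."""
    windows = np.lib.stride_tricks.sliding_window_view(v, len(w))
    return 1.0 / (1.0 + np.exp(-(windows @ w + b)))

def detect(v, w, b, thresh=0.5):
    """Report a detection at each local maximum of the score above
    `thresh`, which also gives a (roughly) consistent temporal alignment."""
    s = sliding_window_scores(v, w, b)
    peaks = (s[1:-1] > thresh) & (s[1:-1] >= s[:-2]) & (s[1:-1] >= s[2:])
    return np.flatnonzero(peaks) + 1

rng = np.random.default_rng(2)
template = -5.0 * np.hanning(9)          # toy spike shape
v = 0.5 * rng.normal(size=400)           # single-electrode noise trace
for t0 in (100, 300):                    # embed two spikes
    v[t0:t0 + 9] += template
w, b = template, -40.0                   # matched filter standing in for trained weights
print(detect(v, w, b))                   # detections near samples 100 and 300
```

Taking the peak of the score, rather than a simple threshold crossing, is one way to obtain the improved alignment discussed above.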
The neural network is trained to return only clean waveforms on real data, not collisions, so it de facto performs a preliminary triage prior to the main triage stage in Section 2.4.\nThe crux of the data-driven approach is the availability of prior training data. We are targeting the typical case that an experimental lab performs repeated experiments using the same recording setup from day to day. In this setting hand-curated or otherwise validated prior sorts are saved, resulting in abundant training data for a given experimental preparation. In the supplemental material, we discuss the construction of a training set (including data augmentation approaches) in Section C.2, the architecture and training of the network in Section C.3, the detection using the network in Section C.4, and empirical performance and scalability in Section C.5. This strategy is effective when this training data exists; however, many research groups are currently using single electrodes and do not have dense MEA training data. Thus it is worth emphasizing that here we train the detector only on a single electrode. We have also experimented with training and evaluating on multiple electrodes with good success; however, these results are more specialized and are not shown here.\nA key result is that our neural network dramatically improves both the temporal and spatial alignment of detected waveforms. This improved alignment improves the fidelity of the feature space and the signal-to-noise power, and the result of the improved feature space can clearly be seen by comparing the detected waveform features from one standard detection approach (SpikeDetekt [24]) in Figure 1 (left) to the detected waveform features from our neural network in Figure 1 (middle). Note that the 
Note that the\noutput of the neural net detection is remarkably more Gaussian compared to SpikeDetekt.\n\n2.3 Feature Extraction and Mask Creation\nFollowing detection we have a collection of N events de\ufb01ned as Xn 2 RR\u21e5C for n = 1, . . . , N,\neach with an associated detection time tn. Recall that C is the total number of electrodes, and R is the\nnumber of time samples, in our case chosen to correspond to 1.5ms. Next features are extracted by\nusing uncentered Principal Components Analysis (PCA) on each channel separately with P features\nper channel. Each waveform Xn is transformed to the feature space Yn. To handle duplicate spikes,\nYn is kept only if cn = arg max{||ync||c2Ncn}, where Ncn contains all electrodes in the local\nneighborhood of electrode cn . To address the increasing dimensionality, spikes are localized by using\nthe sparse masking vector {mn}2 [0, 1]C method of [24], where the mask should be set to 1 only\nwhere the signal exists. The sparse vector reduces the dimensionality and facilitates sparse updates to\nimprove computational ef\ufb01ciency. We give additional mathematical details in Supplemental Section\nD. We have also experimented with an autoencoder framework to standardize the feature extraction\nacross channels and facilitate online inference. This approach performed similarly to PCA and is not\nshown here, but will be addressed in depth in future work.\n\n2.4 Collision Screening and Outlier Triaging\n\nMany collisions and outliers remain even after our improved detection algorithm. Because these\nevents destabilize the clustering algorithms, the pipeline bene\ufb01ts from a \u201ctriage\u201d stage to further\nscreen collisions and noise events. Note that triaging out a small fraction of true positives is a minor\nconcern at this stage because they will be recovered in the \ufb01nal deconvolution step.\nWe use a two-fold approach to perform this triaging. 
First, obvious collisions with nearly overlapping\nspike times and spatial locations are removed. Second, k-Nearest Neighbors (k-NN) is used to\ndetect outliers in the masked feature space based on [27]. To develop a computationally ef\ufb01cient and\neffective approach, waveforms are grouped based on their primary (highest-energy) channel, and then\nk-NN is run for each channel. Empirically, these approximations do not suffer in ef\ufb01cacy compared\nto using the full spatial area. When the dimensionality of P , the number of features per channel, is\nlow, a kd-tree can \ufb01nd neighbors in O(N log N ) average time. We demonstrate that this method is\neffective for triaging false positives and collisions in Figure 1 (middle).\n\n2.5 Coreset Construction\n\n\u201cBig data\u201d improves density estimates for clustering, but the cost per iteration naively scales with the\namount of data. However, often data has some redundant features, and we can take advantage of\nthese redundancies to create more ef\ufb01cient summarizations of the data. Then running the clustering\nalgorithm on the summarized data should scale only with the number of summary points. By choosing\nrepresentative points (or a \u201ccoreset\") carefully we can potentially describe huge datasets accurately\nwith a relatively small number of points [19, 13, 2].\nThere is a sizable literature on the construction of coresets for clustering problems; however, the\nnumber of required representative points to satisfy the theoretical guarantees is infeasible in this\nproblem domain. Instead, we propose a simple approach to build coresets that empirically outperforms\nexisting approaches in our experiments by forcing adequate coverage of the complete dataset. We\ndemonstrate in Supplemental Figure S6 that this approach can cover clusters completely missed by\nexisting approaches, and show the chosen representative points on data in Figure 1 (right). 
This algorithm is based on recursively performing k-means; we provide pseudocode and additional details in Supplemental Section E. The worst case time complexity is nearly linear with respect to the number of representative points, the number of detected spikes, and the number of channels. The algorithm ends by returning G representative points, their sufficient statistics, and masks.\n\n[Figure 1 panels, left to right: SpikeDetekt detections; neural-network detections (NN-triaged vs. NN-kept); coreset. Axes: PC 1 vs. PC 2.]\n\nFigure 1: Illustration of Neural Network Detection, Triage, and Coreset from a primate retinal ganglion cell recording. The first column shows spike waveforms from SpikeDetekt in their PCA space. Due to poor alignment, clusters have a non-Gaussian shape with many outliers. The second column shows spike waveforms from our proposed neural network detection in the PCA space. After triaging outliers, the clusters have cleaner Gaussian shapes in the recomputed feature space. The last column illustrates the coreset. The size of each coreset diamond represents its weight. For visibility, only 10% of data are plotted.\n\n2.6 Efficient Inference for the Dirichlet Process Gaussian Mixture Model\n\nFor the clustering step we use a Dirichlet Process Gaussian Mixture Model (DP-GMM) formulation, which has been previously used in spike sorting [48, 9], to adaptively choose the number of mixture components (visible neurons). In contrast to these prior approaches, here we adopt a Variational Bayesian split-merge approach to explore the clustering space [21] and to find a more robust and higher-likelihood optimum. We address the high computational cost of this approach with several key innovations. First, following [24], we fit a mixture model on the virtual masked data to exploit the localized nature of the data. Second, following [9, 24], the covariance structure is approximated as block-diagonal to reduce the parameter space and computation. 
Finally, we adapted the methodology to work with the representative points (coreset) rather than the raw data, resulting in a highly scalable algorithm. A more complete description of this stage can be found in Supplemental Section F, with pseudocode in Supplemental Algorithm S2.\nIn terms of computational costs, the dominant cost per iteration in the DPMM algorithm is the calculation of data-to-cluster assignments, which in our framework will scale as O(G m̄ P^2 K), where m̄ is the average number of channels maintained in the mask for each of the representative points, G is the number of representative points, and P is the number of features per channel. This is in stark contrast to a scaling of O(N C^2 P^2 K + P^3) without our above modifications. Both K and G are expected to scale linearly with the number of electrodes and sublinearly with the length of the recording. Without further modification, the time complexity in the above clustering algorithm would depend on the square of the number of electrodes for each iteration; fortunately, this can be reduced to a linear dependency based on a divide-and-conquer approach discussed below in Section 2.7.\n\n[Figure 2 panels: Stability and Accuracy on high-collision ViSAPy (left) and low-SNR ViSAPy (right); curves for YASS, KiloSort, MountainSort, and SpyKING Circus; x-axes: Stability % Threshold and True Positive % Threshold; y-axes: % of stable clusters and # of accurate clusters.]\n\nFigure 2: Simulation results on 30-channel ViSAPy datasets. Left panels show the result on ViSAPy with a high collision rate and right panels show the result on ViSAPy with a low SNR setting. (Top) Stability metric (following [5]) and percentage of total discovered clusters above a certain stability measure. The noticeable gap between the stability of YASS and the other methods results from a combination of a high number of stable clusters and a lower number of total clusters. (Bottom) These results show the number of clusters (out of a ground truth of 16 units) above a varying quality threshold for each pipeline. For each level of accuracy, the number of clusters that pass that threshold is calculated to demonstrate the relative quality of the competing algorithms on this dataset. Empirically, our pipeline (YASS) outperforms other methods.\n\n2.7 Divide and Conquer and Template Merging\n\nNeural action potentials have a finite spatial extent [6]. Therefore, the spikes can be divided into distinct groups based on the geometry of the MEA and the local position of each neuron, and each group is then processed independently. Thus, each group can be processed in parallel, allowing for high data throughput. This is crucial for exploiting parallel computer resources and limited memory structures. Second, the split-and-merge approach in a DP-GMM is greatly hindered when the number of clusters is very high [21]. The proposed divide-and-conquer approach addresses this problem by greatly reducing the number of clusters within each subproblem, allowing the split-and-merge algorithm to be targeted and effective.\nTo divide the data based on the spatial location of each spike, the primary channel c_n is determined for every point in the coreset based on the channel with maximum energy, and clustering is applied on each channel. 
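The division step just described can be sketched in a few lines: the primary channel of each event is the channel with maximum energy, and events are grouped by primary channel for independent (parallel) clustering. The array shapes and toy data below are our own illustration:

```python
import numpy as np

def primary_channels(waveforms):
    """waveforms: (N, R, C) spatio-temporal snippets. The primary channel
    of each event is the channel with maximum energy (sum of squares
    over the R time samples)."""
    energy = np.sum(waveforms ** 2, axis=1)     # (N, C)
    return np.argmax(energy, axis=1)

def split_by_channel(waveforms):
    """Group event indices by primary channel so that each group can be
    clustered independently, and in parallel."""
    cn = primary_channels(waveforms)
    return {int(c): np.flatnonzero(cn == c) for c in np.unique(cn)}

rng = np.random.default_rng(3)
X = 0.1 * rng.normal(size=(4, 30, 5))       # 4 events, 30 samples, 5 channels
X[0, :, 2] += 3.0; X[1, :, 2] += 3.0        # two events strongest on channel 2
X[2, :, 0] += 3.0; X[3, :, 4] += 3.0
groups = split_by_channel(X)
print({c: [int(i) for i in idx] for c, idx in groups.items()})  # {0: [2], 2: [0, 1], 4: [3]}
```

Each group would then be handed to a separate DP-GMM instance, with the cross-channel duplicates resolved by the template-merging step described next.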
Because neurons may now end up on multiple channels, it is necessary to merge templates from nearby channels as a post-clustering step. When the clustering is completed, the mean of each cluster is taken as a template. Because waveforms are limited to their primary channel, some neurons may have been “overclustered” into distinct mixture components on distinct channels. Overclustering can also occur from model mismatch (non-Gaussianity). Therefore, it is necessary to merge templates. Template merging is performed based on two criteria, the angle and the amplitude of templates, using the best alignment over all temporal shifts between two templates. The pseudocode to perform this merging is shown in Supplemental Algorithm S3. Additional details can be found in Supplemental Section G.\n\n[Figure 3 panels: Stability (left) and Accuracy (right) on primate retina data; curves for YASS, KiloSort, MountainSort, and SpyKING Circus; x-axes: Stability % Threshold and True Positive % Threshold.]\n\nFigure 3: Performance comparison of spike sorting pipelines on primate retina data. (Left) The same type of plot as in the top panels of Figure 2. (Right) The same type of plot as in the bottom panels of Figure 2 compared to the “gold standard” sort. YASS demonstrates both improved stability and also per-cluster accuracy.\n\n2.8 Recovering Triaged Waveforms and Collisions\n\nAfter the previous steps, we apply matching pursuit [36] to recover triaged waveforms and collisions. We detail the available choices for this stage in Supplemental Section I.\n\n3 Performance Comparison\n\nWe evaluate performance by comparing several algorithms (detailed in Section 3.1) to our proposed methodology on both synthetic (Section 3.2) and real (Section 3.3) dense MEA recordings. 
For each synthetic dataset we evaluate the ability to capture ground truth in addition to the per-cluster stability metrics. For the ground truth, inferred clusters are matched with ground truth clusters via the Hungarian algorithm, and then the per-cluster accuracy is calculated as the number of assignments shared between the inferred cluster and the ground truth cluster over the total number of waveforms in the inferred cluster. For the per-cluster stability metric, we use the method from Section 3.3 of [5] with the rate scaling parameter of the Poisson processes set to 0.25. This method evaluates how robust individual clusters are to perturbations of the dataset. In addition, we provide runtime information to empirically evaluate the computational scaling of each approach. The CPU runtime was calculated on a single core of a six-core i7 machine with 32 GB of RAM. GPU runtime is reported for an Nvidia Titan X within the same machine.\n\n3.1 Competing Algorithms\n\nWe compare our proposed pipeline to three recently proposed approaches for dense MEA spike sorting: KiloSort [36], Spyking Circus [51], and MountainSort [31]. KiloSort, Spyking Circus, and MountainSort were downloaded on January 30, 2017, May 26, 2017, and June 7, 2017, respectively. We dub our algorithm Yet Another Spike Sorter (YASS). We discuss additional details on the relationships between these approaches and our pipeline in Supplemental Section I. All results are shown with no manual post-processing.\n\n3.2 Synthetic Datasets\n\nFirst, we used the biophysics-based spike activity generator ViSAPy [18] to generate multiple 30-channel datasets with different noise levels and collision rates. The detection network was trained on the ground truth from a low signal-to-noise recording and then applied to all signal-to-noise levels. The neural network dramatically outperforms existing detection methodologies on these datasets. 
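The per-cluster accuracy metric described above can be sketched as follows. We brute-force the cluster matching over permutations for self-containment (the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment, does the same matching efficiently); the toy labels are our own example:

```python
import itertools
import numpy as np

def per_cluster_accuracy(inferred, truth):
    """Match inferred to ground-truth clusters by maximizing total overlap,
    then report (shared assignments) / (inferred cluster size) per inferred
    cluster. Assumes no more inferred clusters than ground-truth clusters."""
    ks, js = np.unique(inferred), np.unique(truth)
    # overlap[i, j] = number of waveforms assigned to inferred cluster
    # ks[i] and ground-truth cluster js[j].
    overlap = np.array([[np.sum((inferred == k) & (truth == j)) for j in js]
                        for k in ks])
    best = max(itertools.permutations(range(len(js)), len(ks)),
               key=lambda p: sum(overlap[i, p[i]] for i in range(len(ks))))
    return {int(k): float(overlap[i, best[i]] / np.sum(inferred == k))
            for i, k in enumerate(ks)}

truth    = np.array([0, 0, 0, 0, 1, 1])
inferred = np.array([0, 0, 0, 1, 1, 1])   # one waveform mis-assigned
acc = per_cluster_accuracy(inferred, truth)
print(acc)  # cluster 0 fully correct; cluster 1 accuracy 2/3
```

Sweeping a threshold over these per-cluster accuracies yields the "# of accurate clusters" curves plotted in Figures 2 and 3.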
For a given level of true positives, the number of false positives can be reduced by an order of magnitude. The properties of the learned network are shown in Supplemental Figures S4 and S5.\nPerformance is evaluated on the known ground truth. For each level of accuracy, the number of clusters that pass that threshold is calculated to demonstrate the relative quality of the competing algorithms on this dataset. Empirically, our pipeline (YASS) outperforms other methods. This is especially true in low SNR settings, as shown in Figure 2. The per-cluster stability metric is also shown in Figure 2. The stability result demonstrates that YASS has significantly fewer low-quality clusters than competing methods.\n\nDetection (GPU): 1m7s | Data Ext.: 42s | Triage: 11s | Coreset: 3m12s | Clustering: 34s | Template Ext.: 54s | Total: 6m40s\n\nTable 1: Running times of the main processes on a 512-channel primate retinal recording of 30 minutes duration. Results shown using a single CPU core, except for the detection step (2.2), which was run on GPU. We found that full accuracy was achieved after processing just one-fifth of this dataset, leading to significant speed gains. Data Extraction refers to waveform extraction and performing PCA (2.3). Triage, Coreset, and Clustering refer to 2.4, 2.5, and 2.6, respectively. Template Extraction describes revisiting the recording to estimate templates and merging them (2.7). Each step scales approximately linearly (Section B.3).\n\n3.3 Real Datasets\n\nTo examine real data, we focused on 30 minutes of extracellular recordings of the peripheral primate retina, obtained ex-vivo using a high-density 512-channel recording array [30]. The half-hour recording was taken while the retina was stimulated with spatiotemporal white noise. A “gold standard” sort was constructed for this dataset by extensive hand validation of automated techniques, as detailed in Supplemental Section H. 
Nonstationarity effects (time-evolution of waveform shapes)\nwere found to be minimal in this recording (data not shown).\nWe evaluate the performance of YASS and competing algorithms using 4 distinct sets of 49 spatially\ncontiguous electrodes. Note that the gold standard sort here uses the information from the full\n512-electrode array, while we examine the more dif\ufb01cult problem of sorting the 49-electrode data;\nwe have less information about the cells near the edges of this 49-electrode subset, allowing us to\nquantify the performance of the algorithms over a range of effective SNR levels. By comparing the\ninferred results to the gold standard, the cluster-speci\ufb01c true positives are determined in addition to\nthe stability metric. The results are shown in Figure 3 for one of the four sets of electrodes, and the\nremaining three sets are shown in Supplemental Section B.1. As in the simulated data, compared\nto KiloSort, which had the second-best overall performance on this dataset, YASS has dramatically\nfewer low-stability clusters.\nFinally, we evaluate the time required for each step in the YASS pipeline (Table 1). Importantly, we\nfound that YASS is highly robust to data limitations: as shown in Supplemental Figure S3 and Section\nB.3, using only a fraction of the 30 minute dataset has only a minor impact on performance. We\nexploit this to speed up the pipeline. Remarkably, running primarily on a single CPU core (only\nthe detect step utilizes a GPU here), YASS achieves a several-fold speedup in template and cluster\nestimation compared to the next fastest competitor2, Kilosort, which was run in full GPU mode and\nspent about 30 minutes on this dataset. 
We plan to further parallelize and GPU-accelerate the remaining steps in our pipeline, and expect to achieve significant further speedups.

4 Conclusion

YASS has demonstrated state-of-the-art performance in accuracy, stability, and computational efficiency; we believe the tools presented here will have a major practical and scientific impact in large-scale neuroscience. In future work, we plan to continue iteratively updating our modular pipeline to better handle template drift, refractory violations, and dense collisions.
Lastly, YASS is available online at https://github.com/paninski-lab/yass

²Spyking Circus took over a day to process this dataset. Assuming linear scaling based on smaller-scale experiments, Mountainsort is expected to take approximately 10 hours.

Acknowledgements
This work was partially supported by NSF grants IIS-1546296 and IIS-1430239, and DARPA Contract No. N66001-17-C-4002.

References
[1] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007.
[2] O. Bachem, M. Lucic, and A. Krause. Coresets for nonparametric estimation: the case of DP-means. In ICML, 2015.
[3] B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. Proceedings of the VLDB Endowment, 2012.
[4] I. N. Bankman, K. O. Johnson, and W. Schneider. Optimal detection, classification, and superposition resolution in neural waveform recordings. IEEE Trans. Biomed. Eng., 1993.
[5] A. H. Barnett, J. F. Magland, and L. F. Greengard. Validation of neural spike sorting algorithms without ground-truth information. J. Neuro. Methods, 2016.
[6] G. Buzsáki. Large-scale recording of neuronal ensembles. Nature Neuroscience, 2004.
[7] T. Campbell, J. Straub, J. W. Fisher III, and J. P. How. Streaming, Distributed Variational Inference for Bayesian Nonparametrics.
In NIPS, 2015.
[8] D. Carlson, V. Rao, J. Vogelstein, and L. Carin. Real-Time Inference for a Gamma Process Model of Neural Spiking. In NIPS, 2013.
[9] D. E. Carlson, J. T. Vogelstein, Q. Wu, W. Lian, M. Zhou, C. R. Stoetzner, D. Kipke, D. Weber, D. B. Dunson, and L. Carin. Multichannel electrophysiological spike sorting via joint dictionary learning and mixture modeling. IEEE TBME, 2014.
[10] B. Chen, D. E. Carlson, and L. Carin. On the analysis of multi-channel neural spike data. In NIPS, 2011.
[11] D. M. Dacey, B. B. Peterson, F. R. Robinson, and P. D. Gamlin. Fireworks in the primate retina: in vitro photodynamics reveals diverse LGN-projecting ganglion cell types. Neuron, 2003.
[12] C. Ekanadham, D. Tranchina, and E. P. Simoncelli. A unified framework and method for automatic neural spike identification. J. Neuro. Methods, 2014.
[13] D. Feldman, M. Faulkner, and A. Krause. Scalable training of mixture models via coresets. In NIPS, 2011.
[14] J. Fournier, C. M. Mueller, M. Shein-Idelson, M. Hemberger, and G. Laurent. Consensus-based sorting of neuronal spike waveforms. PLoS ONE, 2016.
[15] F. Franke, M. Natora, C. Boucsein, M. H. J. Munk, and K. Obermayer. An online spike detection and spike classification algorithm capable of instantaneous resolution of overlapping spikes. J. Comp. Neuro., 2010.
[16] S. Gibson, J. W. Judy, and D. Marković. Spike Sorting: The first step in decoding the brain. IEEE Signal Processing Magazine, 2012.
[17] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[18] E. Hagen, T. V. Ness, A. Khosrowshahi, C. Sørensen, M. Fyhn, T. Hafting, F. Franke, and G. T. Einevoll. ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms. J. Neuro. Methods, 2015.
[19] S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In ACM Symposium on Theory of Computing.
ACM, 2004.
[20] G. Hilgen, M. Sorbaro, S. Pirmoradian, J.-O. Muthmann, I. Kepiro, S. Ullo, C. J. Ramirez, A. Maccione, L. Berdondini, V. Murino, et al. Unsupervised spike sorting for large scale, high density multielectrode arrays. Cell Reports, 2017.
[21] M. C. Hughes and E. Sudderth. Memoized Online Variational Inference for Dirichlet Process Mixture Models. In NIPS, 2013.
[22] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. JASA, 2001.
[23] J. J. Jun, C. Mitelut, C. Lai, S. Gratiy, C. Anastassiou, and T. D. Harris. Real-time spike sorting platform for high-density extracellular probes with ground-truth validation and drift correction. bioRxiv, 2017.
[24] S. N. Kadir, D. F. M. Goodman, and K. D. Harris. High-dimensional cluster analysis with the masked EM algorithm. Neural Computation, 2014.
[25] K. H. Kim and S. J. Kim. Neural spike sorting under nearly 0-dB signal-to-noise ratio using nonlinear energy operator and artificial neural-network classifier. IEEE TBME, 2000.
[26] D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
[27] E. M. Knox and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, 1998.
[28] K. C. Knudson, J. Yates, A. Huk, and J. W. Pillow. Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit. In NIPS, 2014.
[29] M. S. Lewicki. A review of methods for spike sorting: the detection and classification of neural action potentials. Network: Computation in Neural Systems, 1998.
[30] A. Litke, N. Bezayiff, E. Chichilnisky, W. Cunningham, W. Dabrowski, A. Grillo, M. Grivich, P. Grybos, P. Hottowy, S. Kachiguine, et al. What does the eye tell the brain?: Development of a system for the large-scale recording of retinal output activity. IEEE Trans. Nuclear Science, 2004.
[31] J. F. Magland and A. H. Barnett.
Unimodal clustering using isotonic regression: Iso-split. arXiv preprint arXiv:1508.04841, 2015.
[32] S. Mukhopadhyay and G. C. Ray. A new interpretation of nonlinear energy operator and its efficacy in spike detection. IEEE TBME, 1998.
[33] J.-O. Muthmann, H. Amin, E. Sernagor, A. Maccione, D. Panas, L. Berdondini, U. S. Bhalla, and M. H. Hennig. Spike detection for large neural populations using high density multielectrode arrays. Frontiers in Neuroinformatics, 2015.
[34] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 2000.
[35] A. Y. Ng, M. I. Jordan, et al. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
[36] M. Pachitariu, N. A. Steinmetz, S. N. Kadir, M. Carandini, and K. D. Harris. Fast and accurate spike sorting of high-channel count probes with Kilosort. In NIPS, 2016.
[37] J. W. Pillow, J. Shlens, E. J. Chichilnisky, and E. P. Simoncelli. A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings. PLoS ONE, 2013.
[38] R. Q. Quiroga, Z. Nadasdy, and Y. Ben-Shaul. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Computation, 2004.
[39] H. G. Rey, C. Pedreira, and R. Q. Quiroga. Past, present and future of spike sorting techniques. Brain Research Bulletin, 2015.
[40] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 2014.
[41] E. M. Schmidt. Computer separation of multi-unit neuroelectric data: a review. J. Neuro. Methods, 1984.
[42] R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1972.
[43] P. T. Thorbergsson, M. Garwicz, J. Schouenborg, and A. J. Johansson. Statistical modelling of spike libraries for simulation of extracellular recordings in the cerebellum. In IEEE EMBC, 2010.
[44] V. Ventura.
Automatic Spike Sorting Using Tuning Information. Neural Computation, 2009.
[45] R. J. Vogelstein, K. Murari, P. H. Thakur, C. Diehl, S. Chakrabartty, and G. Cauwenberghs. Spike sorting with support vector machines. In IEEE EMBS, volume 1, 2004.
[46] L. Wang and D. B. Dunson. Fast Bayesian inference in Dirichlet process mixture models. J. Comp. and Graphical Stat., 2011.
[47] A. B. Wiltschko, G. J. Gage, and J. D. Berke. Wavelet filtering before spike detection preserves waveform shape and enhances single-unit discrimination. J. Neuro. Methods, 2008.
[48] F. Wood and M. J. Black. A nonparametric Bayesian alternative to spike sorting. J. Neuro. Methods, 2008.
[49] F. Wood, M. J. Black, C. Vargas-Irwin, M. Fellows, and J. P. Donoghue. On the variability of manual spike sorting. IEEE TBME, 2004.
[50] X. Yang and S. A. Shamma. A totally automated system for the detection and classification of neural spikes. IEEE Trans. Biomed. Eng., 1988.
[51] P. Yger, G. L. Spampinato, E. Esposito, B. Lefebvre, S. Deny, C. Gardella, M. Stimberg, F. Jetter, G. Zeck, S. Picaud, et al. Fast and accurate spike sorting in vitro and in vivo for up to thousands of electrodes. bioRxiv, 2016.
[52] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering.
In NIPS, volume 17, 2004.