{"title": "Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit", "book": "Advances in Neural Information Processing Systems", "page_first": 1215, "page_last": 1223, "abstract": "Many signals, such as spike trains recorded in multi-channel electrophysiological recordings, may be represented as the sparse sum of translated and scaled copies of waveforms whose timing and amplitudes are of interest. From the aggregate signal, one may seek to estimate the identities, amplitudes, and translations of the waveforms that compose the signal. Here we present a fast method for recovering these identities, amplitudes, and translations. The method involves greedily selecting component waveforms and then refining estimates of their amplitudes and translations, moving iteratively between these steps in a process analogous to the well-known Orthogonal Matching Pursuit (OMP) algorithm. Our approach for modeling translations borrows from Continuous Basis Pursuit (CBP), which we extend in several ways: by selecting a subspace that optimally captures translated copies of the waveforms, replacing the convex optimization problem with a greedy approach, and moving to the Fourier domain to more precisely estimate time shifts. We test the resulting method, which we call Continuous Orthogonal Matching Pursuit (COMP), on simulated and neural data, where it shows gains over CBP in both speed and accuracy.", "full_text": "Inferring sparse representations of continuous signals\n\nwith continuous orthogonal matching pursuit\n\nKarin C. Knudson\n\nDepartment of Mathematics\n\nThe University of Texas at Austin\nkknudson@math.utexas.edu\n\nJacob L. Yates\n\nDepartment of Neuroscience\n\nThe University of Texas at Austin\n\njlyates@utexas.edu\n\nAlexander C. Huk\n\nCenter for Perceptual Systems\n\nDepartments of Psychology & Neuroscience\n\nThe University of Texas at Austin\n\nhuk@utexas.edu\n\nJonathan W. 
Pillow\n\nPrinceton Neuroscience Institute and\n\nDepartment of Psychology\n\nPrinceton University\n\npillow@princeton.edu\n\nAbstract\n\nMany signals, such as spike trains recorded in multi-channel electrophysiological\nrecordings, may be represented as the sparse sum of translated and scaled copies\nof waveforms whose timing and amplitudes are of interest. From the aggregate\nsignal, one may seek to estimate the identities, amplitudes, and translations of the\nwaveforms that compose the signal. Here we present a fast method for recover-\ning these identities, amplitudes, and translations. The method involves greedily\nselecting component waveforms and then re\ufb01ning estimates of their amplitudes\nand translations, moving iteratively between these steps in a process analogous\nto the well-known Orthogonal Matching Pursuit (OMP) algorithm [11]. Our ap-\nproach for modeling translations borrows from Continuous Basis Pursuit (CBP)\n[4], which we extend in several ways: by selecting a subspace that optimally cap-\ntures translated copies of the waveforms, replacing the convex optimization prob-\nlem with a greedy approach, and moving to the Fourier domain to more precisely\nestimate time shifts. We test the resulting method, which we call Continuous Or-\nthogonal Matching Pursuit (COMP), on simulated and neural data, where it shows\ngains over CBP in both speed and accuracy.\n\n1\n\nIntroduction\n\nIt is often the case that an observed signal is a linear combination of some other target signals that\none wishes to resolve from each other and from background noise. For example, the voltage trace\nfrom an electrode (or array of electrodes) used to measure neural activity in vivo may be recording\nfrom a population of neurons, each of which produces many instances of its own stereotyped action\npotential waveform. 
One would like to decompose an analog voltage trace into a list of the timings and amplitudes of action potentials (spikes) for each neuron.\nMotivated in part by the spike-sorting problem, we consider the case where we are given a signal that is the sum of known waveforms whose timing and amplitude we seek to recover. Specifically, we suppose our signal can be modeled as:\n\ny(t) = \u03a3_{n=1}^{Nf} \u03a3_{j=1}^{J} a_{n,j} f_n(t \u2212 \u03c4_{n,j}),    (1)\n\nwhere the waveforms fn are known, and we seek to estimate positive amplitudes an,j and event times \u03c4n,j. Signals of this form have been studied extensively [12, 9, 4, 3].\nThis is a difficult problem in part because of the nonlinear dependence of y on \u03c4. Moreover, in most applications we do not have access to y(t) for arbitrary t, but rather have a vector of sampled (noisy) measurements on a grid of discrete time points. One way to simplify the problem is to discretize \u03c4, considering only a finite set of possible time shifts \u03c4n,j \u2208 {\u2206, 2\u2206, ..., N\u2206\u2206} and approximating the signal as\n\ny \u2248 \u03a3_{n=1}^{Nf} \u03a3_{j=1}^{J} a_{n,j} f_n(t \u2212 i_{n,j}\u2206), i_{n,j} \u2208 1, ..., N\u2206    (2)\n\nOnce discretized in this way, the problem is one of sparse recovery: we seek to represent the observed signal with a sparse linear combination of elements of a finite dictionary {fn,j(t) := fn(t \u2212 j\u2206), n \u2208 1, ..., Nf , j \u2208 1, ..., N\u2206}. Framing the problem as sparse recovery, one can bring tools from compressed sensing to bear. However, the discretization introduces several new difficulties. First, we can only approximate the translation \u03c4 by values on a discrete grid. 
Secondly, choosing small \u2206 allows us to more closely approximate \u03c4, but demands more computation, and such finely spaced dictionary elements yield a highly coherent dictionary, while sparse recovery algorithms generally have guarantees for low-coherence dictionaries.\nA previously introduced algorithm that uses techniques of sparse recovery and returns accurate and continuous-valued estimates of a and \u03c4 is Continuous Basis Pursuit (CBP) [4], which we describe below. CBP proceeds (roughly speaking) by augmenting the discrete dictionary fn,j(t) with other carefully chosen basis elements, and then solving a convex optimization problem inspired by basis pursuit denoising. We extend ideas introduced in CBP to present a new method for recovering the desired time shifts \u03c4 and amplitudes a that leverages the speed and tractability of solving the discretized problem while still ultimately producing continuous-valued estimates of \u03c4, and partially circumventing the problem of too much coherence.\nBasis pursuit denoising and other convex optimization or \u21131-minimization based methods have been effective in the realm of sparse recovery and compressed sensing. However, greedy methods have also been used with great success. Our approach begins with the augmented bases used in CBP, but adds basis vectors greedily, drawing on the well-known Orthogonal Matching Pursuit algorithm [11]. In the regimes considered, our greedy approach is faster and more accurate than CBP.\nBroadly speaking, our approach has three parts. First, we augment the discretized basis in one of several ways. We draw on [4] for two of these choices, but also present another choice of basis that is in some sense optimal. Second, we greedily select candidate time bins of size \u2206 in which we suspect an event has occurred. Finally, we move from this rough, discrete-valued estimate of timing \u03c4 to continuous-valued estimates of \u03c4 and a. 
We iterate the second and third steps, greedily adding candidate time bins and updating our estimates of \u03c4 and a until a stopping criterion is reached.\nThe structure of the paper is as follows. In Section 2 we describe the method of Continuous Basis Pursuit (CBP), which our method builds upon. In Section 3 we develop our method, which we call Continuous Orthogonal Matching Pursuit (COMP). In Section 4 we present the performance of our method on simulated and neural data.\n\n2 Continuous basis pursuit\n\nContinuous Basis Pursuit (CBP) [4, 3, 5] is a method for recovering the time shifts and amplitudes of waveforms present in a signal of the form (1). A key element of CBP is augmenting or replacing the set {fn,j(t)} with certain additional dictionary elements that are chosen to smoothly interpolate the one-dimensional manifold traced out by fn,j(t \u2212 \u03c4) as \u03c4 varies in (\u2212\u2206/2, \u2206/2).\nThe benefit of a dictionary that is expanded in this way is twofold. First, it increases the ability of the dictionary to represent shifted copies of the waveform fn(t \u2212 \u03c4) without introducing as much correlation as would be introduced by simply using a finer discretization (decreasing \u2206), which is an advantage because dictionaries with smaller coherence are generally better suited for sparse recovery techniques. Second, one can move from recovered coefficients in this augmented dictionary to estimates an,j and continuous-valued estimates of \u03c4n,j.\nIn general, there are three ingredients for CBP: basis elements, an interpolator with corresponding mapping function \u03a6, and a convex constraint set, C. There are K basis elements {gn,j,k(t) = gn,k(t \u2212 j\u2206)}, k = 1, ..., K, for each waveform and width-\u2206 time bin, which together can be used to linearly interpolate fn,j(t \u2212 \u03c4), |\u03c4| < \u2206/2. 
The function \u03a6 maps from amplitude a and time shift \u03c4 to K-tuples of coefficients \u03a6(a, \u03c4) = (c^{(1)}_{n,j}, ..., c^{(K)}_{n,j}), so that a fn,j(t \u2212 \u03c4) \u2248 \u03a3_{k=1}^{K} c^{(k)}_{n,j} gn,j,k(t). The convex constraint set C is for K-tuples of coefficients of {gn,j,k}, k = 1, ..., K, and corresponds to the requirement that a > 0 and |\u03c4| < \u2206/2. If the constraint region corresponding to these requirements is not convex (e.g. in the polar basis discussed below), its convex relaxation is used.\nAs a concrete example, let us first consider (as discussed in [4]) the dictionary augmented with shifted copies of each waveform's derivative: {f'n,j(t) := f'n(t \u2212 j\u2206)}. Assuming fn is sufficiently smooth, we have from the Taylor expansion that for small \u03c4, a fn,j(t \u2212 \u03c4) \u2248 a fn,j(t) \u2212 a\u03c4 f'n,j(t). If we recover a representation of y as c1 fn,j(t) + c2 f'n,j(t), then we can estimate the amplitude a of the waveform present in y as c1, and the time shift \u03c4 as \u2212c2/c1. Hence, we estimate y \u2248 c1 fn,j(t + c2/c1) = c1 fn(t \u2212 j\u2206 + c2/c1). Note that the estimate of the time shift \u03c4 varies continuously with c1, c2. In contrast, using shifted copies of the waveforms only as a basis would not allow for a time shift estimate off of the grid {j\u2206}, j = 1, ..., N\u2206.\nOnce a suitable dictionary is chosen, one must still recover coefficients (i.e. c1, c2 above). Motivated by the assumed sparsity of the signal (i.e. y is the sum of relatively few shifted copies of waveforms, so the coefficients of most dictionary elements will be zero), CBP draws on basis pursuit denoising, which has been effective in the compressive sensing setting and elsewhere [10], [1]. Specifically, CBP (with a Taylor basis) recovers coefficients using:\n\nargmin_c \u03a3_{n=1}^{Nf} ||(Fn c^{(1)}_n + F'n c^{(2)}_n) \u2212 y||_2^2 + \u03bb \u03a3_{n=1}^{Nf} ||c^{(1)}_n||_1 s.t. c^{(1)}_{n,i} \u2265 0, |c^{(2)}_{n,i}| \u2264 (\u2206/2) c^{(1)}_{n,i} \u2200 n, i    (3)\n\nHere we denote by Fn the matrix with columns {fn,j(t)} and F'n the matrix with columns {f'n,j(t)}. The \u21131 penalty encourages sparsity, pushing most of the estimated amplitudes to zero, with higher \u03bb encouraging greater sparsity. Then, for each (n, j) such that c^{(1)}_{n,j} \u2260 0, one estimates that there is a waveform in the shape of fn with amplitude \u02c6a = c^{(1)}_{n,j} and event time j\u2206 \u2212 \u02c6\u03c4 = j\u2206 \u2212 c^{(2)}_{n,j}/c^{(1)}_{n,j} present in the signal. The inequality constraints in the optimization problem ensure first that we only recover positive amplitudes \u02c6a, and second that estimates \u02c6\u03c4 satisfy |\u02c6\u03c4| < \u2206/2. Requiring \u02c6\u03c4 to fall in this range keeps the estimated \u03c4 in the time bin represented by fn,j and also in the regime where the Taylor approximation to fn,j(t \u2212 \u03c4) is accurate. Note that (3) is a convex optimization problem.\nBetter results in [4] are obtained for a second-order Taylor interpolation, and the best results come from a polar interpolator, which represents each manifold of time-shifted waveforms fn,j(t \u2212 \u03c4), |\u03c4| \u2264 \u2206/2, as an arc of the circle that is uniquely defined to pass through fn,j(t), fn,j(t \u2212 \u2206/2), and fn,j(t + \u2206/2). Letting the radius of the arc be r and its angle be 2\u03b8, one represents points on this arc by linear combinations of functions w, u, v: f(t \u2212 \u03c4) \u2248 w(t) + r cos(2\u03c4\u03b8/\u2206) u(t) + r sin(2\u03c4\u03b8/\u2206) v(t).\nThe Taylor and polar bases consist of shifted copies of elements chosen in order to linearly interpolate the curve in function space defined by fn(t \u2212 \u03c4) as \u03c4 varies from \u2212\u2206/2 to \u2206/2. Let Gn,k be the matrix whose columns are gn,j,k(t) for j \u2208 1, ..., N\u2206. 
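As a quick numerical sanity check of the first-order Taylor interpolation idea above (an illustrative numpy sketch, not the authors' implementation, which used MATLAB; the waveform, grid, and true parameters are chosen for illustration): fitting y by least squares against f and f' and reading off a = c1, \u03c4 = \u2212c2/c1 recovers a sub-grid time shift.

```python
import numpy as np

# Waveform f(t) = t*exp(-t^2) and its analytic derivative (illustrative choice).
t = np.linspace(-5, 5, 2001)
f = t * np.exp(-t**2)
fp = (1 - 2 * t**2) * np.exp(-t**2)          # f'(t)

a_true, tau_true = 1.3, 0.05                  # amplitude and sub-bin time shift
y = a_true * (t - tau_true) * np.exp(-(t - tau_true)**2)

# Least-squares fit y ~ c1*f + c2*f'; Taylor: a*f(t - tau) ~ a*f(t) - a*tau*f'(t).
A = np.column_stack([f, fp])
c1, c2 = np.linalg.lstsq(A, y, rcond=None)[0]

a_hat = c1          # estimated amplitude
tau_hat = -c2 / c1  # estimated continuous-valued time shift
```

For shifts small relative to the waveform width, the second-order Taylor error is negligible and the recovered (a, \u03c4) are accurate to well under a grid step.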
With choices of basis elements, interpolator, and corresponding convex constraint set C in place, one proceeds to estimate coefficients in the chosen basis by solving:\n\nargmin_c ||y \u2212 \u03a3_{n=1}^{Nf} \u03a3_{k=1}^{K} Gn,k c^{(k)}_n||_2^2 + \u03bb \u03a3_{n=1}^{Nf} ||c^{(1)}_n||_1 subject to (c^{(1)}_{n,j}, ..., c^{(K)}_{n,j}) \u2208 C \u2200 (n, j)    (4)\n\nOne then maps back from each nonzero K-tuple of recovered coefficients (c^{(1)}_{n,j}, ..., c^{(K)}_{n,j}) to corresponding \u02c6an,j, \u02c6\u03c4n,j that represent the amplitude and timing of the nth waveform present in the jth time bin. This can be done by inverting \u03a6, if possible, or estimating (\u02c6an,j, \u02c6\u03c4n,j) = argmin_{a,\u03c4} ||\u03a6(a, \u03c4) \u2212 (c^{(1)}_{n,j}, ..., c^{(K)}_{n,j})||_2^2.\n\nTable 1: Basis choices (see also [4], Table 1.)\n\nInterpolator | Basis Vectors | \u03a6(a, \u03c4) | C\nTaylor (K=3) | {fn,j(t)}, {f'n,j(t)}, {f''n,j(t)} | (a, \u2212a\u03c4, a\u03c4^2/2) | c^{(1)}, c^{(3)} > 0, |c^{(2)}| < c^{(1)}\u2206/2, |c^{(3)}| < c^{(1)}\u2206^2/8\nPolar | {wn,j}, {un,j}, {vn,j} | (a, ar cos(2\u03c4\u03b8/\u2206), ar sin(2\u03c4\u03b8/\u2206)) | c^{(1)} \u2265 0, sqrt((c^{(2)})^2 + (c^{(3)})^2) \u2264 rc^{(1)}, rc^{(1)} cos(\u03b8) \u2264 c^{(2)} \u2264 rc^{(1)}\nSVD | {u^1_{n,j}}, ..., {u^K_{n,j}} | (See Section 3.1) | (See Section 3.1)\n\n3 Continuous Orthogonal Matching Pursuit\n\nWe now present our method for recovery, which makes use of the idea of augmented bases presented above, but differs from CBP in several important ways. 
First, we introduce a different choice of basis that we find enables more accurate estimates. Second, we make use of a greedy method that iterates between choosing basis vectors and estimating time shifts and amplitudes, rather than proceeding via a single convex optimization problem as CBP does. Lastly, we introduce an alternative to the step of mapping back from recovered coefficients via \u03a6 that notably improves the accuracy of the recovered time estimates.\nGreedy methods such as Orthogonal Matching Pursuit (OMP) [11], Subspace Pursuit [2], and Compressive Sampling Matching Pursuit (CoSaMP) [8] have proven to be fast and effective in the realm of compressed sensing. Since the number of iterations of these greedy methods tends to scale with the sparsity (when the algorithms succeed), they tend to be extremely fast for very sparse signals. Moreover, our greedy method eliminates the need to choose a regularization constant \u03bb, a choice that can vastly alter the effectiveness of CBP. (We still need to choose K and \u2206.) Our method is most closely analogous to OMP, but recovers continuous time estimates, so we call it Continuous Orthogonal Matching Pursuit (COMP). However, the steps below could be adapted in a straightforward way to create analogs of other greedy methods.\n\n3.1 Choice of finite basis\n\nWe build upon [4], choosing as our basis N\u2206 shifted copies of a set of K basis vectors for each waveform in such a way that these K basis vectors can effectively linearly interpolate fn(t \u2212 \u03c4) for |\u03c4| < \u2206/2. In our method, as in Continuous Basis Pursuit, these basis vectors allow us to represent continuous time shifts instead of discrete time shifts, and expand the descriptive power of our dictionary without introducing undue amounts of coherence. 
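To make the coherence point concrete, a small numpy sketch (the waveform, grid, and spacings are illustrative choices, not the paper's settings) comparing the mutual coherence of a dictionary built from finely spaced shifted copies of a waveform against a coarser one:

```python
import numpy as np

# Illustrative waveform f(t) = t*exp(-t^2) on a fine time grid.
t = np.linspace(-8, 8, 4001)

def shifted(tau):
    """Unit-normalized copy of the waveform shifted by tau."""
    w = (t - tau) * np.exp(-(t - tau)**2)
    return w / np.linalg.norm(w)

def coherence(spacing, n_shifts=5):
    """Mutual coherence: max |inner product| between distinct columns."""
    cols = np.column_stack([shifted(i * spacing) for i in range(n_shifts)])
    G = np.abs(cols.T @ cols)
    np.fill_diagonal(G, 0.0)
    return G.max()

coh_fine = coherence(0.1)    # finely spaced shifts: nearly collinear columns
coh_coarse = coherence(1.0)  # coarser spacing: much lower coherence
```

For this waveform the finely spaced dictionary has coherence near 1 while the coarse one stays well below it, which is the tradeoff the augmented bases are designed to sidestep.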
While previous work introduced Taylor and polar bases, we obtain the best recovery from a different basis, which we describe now.\nThe basis comes from a singular value decomposition of a matrix whose columns correspond to discrete points on the curve in function space traced out by fn,j(t \u2212 \u03c4) as we vary \u03c4 for |\u03c4| < \u2206/2. Within one time bin of size \u2206, consider discretizing further into N\u03b4 = \u2206/\u03b4 time bins of size \u03b4 \u226a \u2206. Let F\u03b4 be the matrix whose columns are these (slightly) shifted copies of the waveform, so that the ith column of F\u03b4 is fn,j(t \u2212 i\u03b4 + \u2206/2) for a discrete vector of time points t. Each column of this matrix is a discrete point on the curve traced out by fn,j(t \u2212 \u03c4) as \u03c4 varies.\nIn choosing a basis, we seek the best choice of K vectors with which to linearly interpolate this curve. We might instead seek to solve the related problem of finding the best K vectors to represent these finely spaced points on the curve, in which case a clear choice for these K vectors is the first K left singular vectors of F\u03b4. This choice is optimal in the sense that the singular value decomposition yields the best rank-K approximation to a matrix. If F\u03b4 = U\u03a3V^T is the singular value decomposition, and u_k, v_k are the columns of U and V respectively, then ||F\u03b4 \u2212 \u03a3_{k=1}^{K} u_k \u03a3_{k,k} (v_k)^T|| \u2264 ||F\u03b4 \u2212 A|| for any rank-K matrix A and any unitarily invariant norm || \u00b7 ||.
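A numpy sketch of this construction (the waveform, bin width, and fine discretization are illustrative assumptions, not the paper's exact settings): build F_delta from finely shifted copies of a waveform, take its top K left singular vectors, and check the Eckart\u2013Young optimality numerically against projecting onto the span of {f, f', f''}, the subspace the Taylor basis uses.

```python
import numpy as np

t = np.linspace(-5, 5, 1001)
f = lambda s: (t - s) * np.exp(-(t - s)**2)   # waveform f(t) = t*exp(-t^2)

# F_delta: columns are copies of f shifted by tau in (-Delta/2, Delta/2), Delta = 1.
taus = np.linspace(-0.5, 0.5, 101)
F_delta = np.column_stack([f(s) for s in taus])

# Top K = 3 left singular vectors form the SVD basis for this waveform/bin.
K = 3
U, S, Vt = np.linalg.svd(F_delta, full_matrices=False)
svd_basis = U[:, :K]

def proj_err(B, X):
    """Frobenius error of projecting the columns of X onto span(B)."""
    Q, _ = np.linalg.qr(B)
    return np.linalg.norm(X - Q @ (Q.T @ X))

# Taylor subspace: span{f, f', f''} (derivatives via finite differences).
df = np.gradient(f(0.0), t)
d2f = np.gradient(df, t)
taylor_basis = np.column_stack([f(0.0), df, d2f])

err_svd = proj_err(svd_basis, F_delta)        # optimal rank-K error (Eckart-Young)
err_taylor = proj_err(taylor_basis, F_delta)  # error of a competing 3-dim subspace
```

By Eckart\u2013Young, `err_svd` can never exceed the error of any other K-dimensional subspace, including the Taylor one, which mirrors the Figure 1 comparison.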
Since afn,j(t\u2212 i\u03b4) =(cid:80)K\nsimple way to recover a and \u03c4 would to choose \u03c4 = i\u03b4 and a, i to minimize(cid:80)K\n\nIn order to use this SVD basis with CBP or COMP, one must specify a convex constraint set for the\ni a reasonable and simply enforced\nconstraint set would be to assume that the recovered coef\ufb01cients c(k) corresponding to each basis\nvector uk, when divided by c(1) to account for scaling, be between mini \u03a3k,kvk\ni . A\ni )2.\nIn \ufb01gure 3.1, we compare the error between shifted copies of a sample waveform f (t \u2212 \u03c4 ) for\n|\u03c4| < 0.5 and the best (least-squares) approximation of that waveform as a linear combination of\nK = 3 vectors from the Taylor, polar, and SVD bases. The structure of the error as a function of the\ntime shift \u03c4 re\ufb02ects the structure of these bases. The Taylor approximation is chosen to be exactly\naccurate at \u03c4 = 0 while the polar basis is chosen to be precisely accurate at \u03c4 = 0, \u2206/2,\u2212\u2206/2. The\nSVD basis gives the lowest mean error across time shifts.\n\ni and maxi \u03a3k,kvk\nk=1(c(k)\u2212a\u03a3k,kvk\n\nFigure 1: Using sample waveform f (t) \u221d t exp(\u2212t2) (left panel), we compare the error introduced\nby approximating f (t\u2212 \u03c4 ) for varying \u03c4 with a linear combination of K = 3 basis vectors, from the\nTaylor, polar or SVD bases. Basis vectors are shown in the middle three panels, and error in the far\nright panel. The SVD basis introduces the least error on average over the shift \u03c4. The average errors\nfor the Taylor, polar, and SVD bases are 0.026, 0.027, and 0.014 respectively.\n\n3.2 Greedy recovery\n\nHaving chosen our basis, we then greedily recover the time bins in which an occurrence of each\nwaveform appears to be present. We would like to build up a set of pairs (n, j) corresponding to\nan instance of the nth waveform in the jth time bin. 
(In our third step, we will refine the estimate within the chosen bins.)\nOur greedy method is motivated by Orthogonal Matching Pursuit (OMP), which is used to recover a sparse solution x from measurements y = Ax. In OMP [11], one greedily adds a single dictionary element to an estimated support set S at each iteration, and then projects orthogonally to adjust the coefficients of all chosen dictionary elements. After initializing with S = \u2205, x = 0, one iterates the following until a stopping criterion is met:\n\nr = y \u2212 Ax\nj = argmax_j {|\u27e8a_j, r\u27e9| s.t. j \u2208 {1, ..., J}\\S}\nS = S \u222a {j}\nx = argmin_z {||y \u2212 Az||_2 s.t. z_i = 0 \u2200 i \u2209 S}\n\nIf we knew the sparsity of the signal, we could use that as our stopping condition. Normally we do not know the sparsity a priori; we stop when changes in the residual become sufficiently small.\nWe adjust this method to choose at each step not a single additional element but rather a set of K associated basis vectors. S is again initialized to be empty, but at each step we add a time-bin/waveform pair (n, j), which is associated with K basis vectors. In this way, we are adding K vectors at each step, instead of one as in OMP. We greedily add the next index (n, j) according to:\n\n(n, j) = argmin_{(m,i) \u2208 S^c} min_{c_{m,i} \u2208 C} ||\u03a3_{k=1}^{K} c^{(k)}_{m,i} g^{(k)}_{m,i} \u2212 r||_2^2    (5)\n\nHere {g^{(k)}_{m,i}} are the chosen basis vectors (Taylor, polar, or SVD), and C is the corresponding constraint set, as in Section 2.\nIn comparison with the greedy step in OMP, choosing (n, j) as in (5) is more costly, because we need to perform a constrained optimization over a K-dimensional space for each candidate (n, j). 
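The OMP iteration above translates almost line for line into code; a minimal generic numpy version (standard OMP with a fixed iteration count on a synthetic orthonormal dictionary, not the full COMP variant; all names are illustrative):

```python
import numpy as np

def omp(A, y, n_iters):
    """Standard Orthogonal Matching Pursuit: greedily select columns of A by
    correlation with the residual, then refit all selected coefficients."""
    support, x = [], np.zeros(A.shape[1])
    for _ in range(n_iters):
        r = y - A @ x                              # current residual
        scores = np.abs(A.T @ r)
        scores[support] = -np.inf                  # exclude already-chosen columns
        support.append(int(np.argmax(scores)))
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        x = np.zeros(A.shape[1])
        x[support] = coef                          # orthogonal projection step
    return x, support

# Example: dictionary with orthonormal columns, 2-sparse signal.
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((50, 20)))
y = 2.0 * A[:, 3] + 1.0 * A[:, 7]
x_hat, support = omp(A, y, n_iters=2)
```

With orthonormal columns the correlations equal the true coefficients, so two iterations recover the support exactly; COMP replaces the single-column selection with the K-vector, constrained selection of (5).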
Fortunately, it is not necessary to repeat the optimization for each of the Nf \u00b7 N\u2206 possible indices each time we add an index. Assuming waveforms are localized in time, we need only update the results of the constrained optimization locally. When we update the residual r by subtracting the newly identified waveform n in the jth bin, the residual only changes in the bins at or near the jth bin, so we need only update the quantity min_{c_{n,j'} \u2208 C} ||\u03a3_{k=1}^{K} c^{(k)}_{n,j'} g^{(k)}_{n,j'} \u2212 r||_2^2 for j' neighboring j.\n\n3.3 Estimating time shifts\n\nHaving greedily added a new waveform/timebin index pair (n, j), we next define our update step, which will correspond to the orthogonal projection in OMP. We present two alternatives, one of which most closely mirrors the corresponding step in OMP, the other of which works within the Fourier domain to obtain more accurate recovery.\nTo most closely follow the steps of OMP, at each iteration after updating S we update coefficients c according to:\n\nargmin_c ||\u03a3_{(n,j) \u2208 S} \u03a3_{k=1}^{K} c^{(k)}_{n,j} g^{(k)}_{n,j} \u2212 y||_2^2 subject to c_{n,j} \u2208 C \u2200 (n, j) \u2208 S    (6)\n\nWe alternate between greedily updating S via (5) and updating c as in (6), at each iteration finding the new residual r = y \u2212 \u03a3_{(n,j) \u2208 S} \u03a3_{k=1}^{K} c^{(k)}_{n,j} g^{(k)}_{n,j}, until the \u21132 stopping criterion is reached. Then, one maps back from {c_{n,j}}_{(n,j) \u2208 S} to {a_{n,j}, \u03c4_{n,j}}_{(n,j) \u2208 S} as described in Section 2.\nAlternatively, we may replace the orthogonal projection step with a more accurate recovery of spike timings that involves working in the Fourier domain. We use the property of the Fourier transform with respect to translation that (f(t \u2212 \u03c4))^\u2227(\u03c9) = e^{2\u03c0i\u03c9\u03c4} \u02c6f(\u03c9). This allows us to estimate a, \u03c4 directly via:\n\nargmin_{a,\u03c4} ||\u03a3_{(n,j) \u2208 S} a_{n,j} e^{2\u03c0i\u03c9\u03c4_{n,j}} \u02c6f_{n,j}(\u03c9) \u2212 \u02c6y(\u03c9)||_2 subject to |\u03c4_{n,j}| < \u2206/2 \u2200 (n, j) \u2208 S    (7)\n\nThis is a nonlinear and non-convex constrained optimization problem. However, it can be solved reasonably quickly using, for example, trust region methods. The search space is dramatically reduced because \u03c4 has only |S| entries, each constrained to be small in absolute value. By searching directly for a, \u03c4 as in (7) we sacrifice convexity, but with the benefit of eliminating from this step the interpolation error introduced as we map back from c to a, \u03c4 using \u03a6^{\u22121} or a least squares estimate.\nIt is easy and often helpful to add inequality constraints to a as well, for example requiring a to be in some interval around 1, and we do impose this in our spike-sorting simulations and analysis in Section 4. Such a requirement effectively imposes a uniform prior on a over the chosen interval. It would be an interesting future project to explore imposing other priors on a.\n\n4 Results\n\nWe test COMP and CBP for each choice of basis on simulated and neural data. Here, COMP denotes the greedy method that includes direct estimation of a and \u03c4 during the update step as in (7). The convex optimization for CBP is implemented using the cvx package for MATLAB [7], [6].\n\n4.1 Simulated data\n\nWe simulate a signal y as the sum of time-shifted copies of two sample waveforms f1(t) \u221d t exp(\u2212t^2) and f2(t) \u221d e^{\u2212t^4/16} \u2212 e^{\u2212t^2} (Figure 2a). There are s1 = s2 = 5 shifted copies of f1 and f2, respectively. 
The time shifts are independently generated for each of the two waveforms using a Poisson process (truncated after 5 spikes), and independent Gaussian noise of variance \u03c3^2 is added at each time point.\n\nFigure 2: (a) Waveforms present in the signal. (b) A noiseless (top) and noisy (bottom) signal with \u03c3 = .2. (c) Recovery using CBP. (d) Recovery using COMP (with a, \u03c4 updated as in (7)). (e) For each recovery method over different values of the standard deviation of the noise \u03c3, misses plus false positives, divided by the total number of events present, s = s1 + s2. (f) Average distance between the true and estimated spike for each hit.\n\nFigures 2b,c show an example noise-free signal (\u03c3 = 0) and noisy signal (\u03c3 = .2) on which each recovery method will be run.\nWe run CBP with the Taylor and polar bases, but also with our SVD basis, and COMP with all three bases. Since COMP here imposes a lower bound on a, we also impose a thresholding step after recovery with CBP, discarding any recovered waveforms with amplitude less than .3. We find that this thresholding generally improved the performance of the CBP algorithm by pruning false positives. Throughout, we use K = 3, since the polar basis requires 3 basis vectors per bin.\nWe categorize hits, false positives, and misses based on whether a time shift estimate is within a threshold of \u03b5 = 1 of the true value. The \u201caverage hit error\u201d of Figures 2f and 3b is the average distance between the true and estimated event time for each estimate that is categorized as a hit. Results are averaged over 20 trials.\nWe compare CBP and COMP over different parameter regimes, varying the noise (\u03c3) and the bin size (\u2206). Figures 2e and 3a show misses plus false positives for each method, normalized by the total number of events present. Figures 2f and 3b show the average distance between the true and estimated spike for each estimate categorized as a hit. 
The best performance by both measures across nearly all parameter regimes considered is achieved by COMP using the SVD basis. COMP is more robust to noise (Figure 2e), and also to increases in bin width \u2206. Since both algorithms are faster for higher \u2206, robustness with respect to \u2206 is an advantage. We also note a significant increase in CBP's robustness to noise when we implement it with our SVD basis rather than with the Taylor or polar basis (Figure 2e).\nA significant advantage of COMP over CBP is its speed. In Figure 3c we compare the speed of the COMP (solid) and CBP (dashed) algorithms for each basis. COMP yields vast gains in speed. The comparison is especially dramatic for small \u2206, where results are most accurate across methods.\n\n4.2 Neural data\n\nWe now present recovery of spike times and identities from neural data. Recordings were made using glass-coated tungsten electrodes in the lateral intraparietal sulcus (LIP) of a macaque monkey performing a motion discrimination task. In addition to demonstrating the applicability of COMP to sorting spikes in neural data, this section also shows the resistance of COMP to a certain kind of error that recovery via CBP can systematically commit, and which is relevant to neural data.\n\nFigure 3: (a) Misses plus false positives, divided by the total number of events present, s = s1 + s2, over different values of bin width \u2206. (b) Average distance between the true and estimated spike for each hit for each recovery method. 
(c) Run time for COMP (solid) and CBP (dashed) for each basis.\n\nFigure 4: (a) Two neural waveforms; each is close to a scaled copy of the other. (b) Recovery of spikes via COMP (magenta) and CBP (cyan) using the SVD basis. CBP tends to recover small-amplitude instances of waveform one where COMP recovers large-amplitude instances of waveform two. (c) Top: recovered traces. Lower panel: zooming in on an area of disagreement between COMP and CBP. The large-amplitude copy of waveform two more closely matches the trace.\n\nIn the data, the waveform of one neuron resembles a scaled copy of another (Figure 4a). The similarity causes problems for CBP or any other \u21131-minimization based method that penalizes large amplitudes. When the second waveform is present with an amplitude of one, CBP is likely to incorrectly add a low-amplitude copy of the first waveform (to reduce the amplitude penalty), instead of correctly choosing the larger copy of the second waveform; the amplitude penalty for choosing the correct waveform can outweigh the higher \u21132 error caused by including the incorrect waveform.\nThis misassignment is exactly what we observe (Figure 4b). We see that CBP tends to report small-amplitude copies of waveform one where COMP reports large-amplitude copies of waveform two. Although we lack ground truth, the closer match of the recovered signal to the data (Figure 4c) indicates that the waveform identities and amplitudes identified via COMP better explain the observed signal.\n\n5 Discussion\n\nWe have presented a new greedy method called Continuous Orthogonal Matching Pursuit (COMP) for identifying the timings and amplitudes of waveforms from a signal that has the form of a (noisy) sum of shifted and scaled copies of several known waveforms. We draw upon the method of Continuous Basis Pursuit, and extend it in several ways. 
We leverage the success of Orthogonal Matching Pursuit in the realm of sparse recovery, use a different basis derived from a singular value decomposition, and also introduce a move to the Fourier domain to fine-tune the recovered time shifts. Our SVD basis can also be used with CBP, and in our simulations it increased the performance of CBP as compared to previously used bases. In our simulations COMP obtains increased accuracy as well as greatly increased speed over CBP across nearly all regimes tested. Our results suggest that greedy methods of the type introduced here may be quite promising for, among other applications, spike sorting during the processing of neural data.

Acknowledgments

This work was supported by the McKnight Foundation (JP), NSF CAREER Award IIS-1150186 (JP), and grants from the NIH (NEI grant EY017366 and NIMH grant MH099611 to AH & JP).

[Figure graphics for Figures 3 and 4: bin width Δ against (Misses + False Positives)/s, Average Hit Error, and Computing Time for each method and basis; neural waveforms, recovered spikes, and voltage traces for COMP-SVD and CBP-SVD.]

References

[1] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1998.

[2] Wei Dai and Olgica Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5):2230-2249, 2009.

[3] Chaitanya Ekanadham, Daniel Tranchina, and Eero P. Simoncelli. A blind deconvolution method for neural spike identification.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), volume 23, 2011.

[4] Chaitanya Ekanadham, Daniel Tranchina, and Eero P. Simoncelli. Recovery of sparse translation-invariant signals with continuous basis pursuit. IEEE Transactions on Signal Processing, 59(10):4735-4744, 2011.

[5] Chaitanya Ekanadham, Daniel Tranchina, and Eero P. Simoncelli. A unified framework and method for automatic neural spike identification. Journal of Neuroscience Methods, 222:47-55, 2014.

[6] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95-110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.

[7] CVX Research, Inc. CVX: Matlab software for disciplined convex programming, version 2.0. http://cvxr.com/cvx, August 2012.

[8] Deanna Needell and Joel A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301-321, 2009.

[9] Jonathan W. Pillow, Jonathon Shlens, E.J. Chichilnisky, and Eero P. Simoncelli. A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings. PLoS ONE, 8(5):e62123, 2013.

[10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.

[11] Joel A. Tropp and Anna C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655-4666, 2007.

[12] Martin Vetterli, Pina Marziliano, and Thierry Blu. Sampling signals with finite rate of innovation. IEEE Transactions on Signal Processing, 50(6):1417-1428, 2002.