{"title": "Accounting for network effects in neuronal responses using L1 regularized point process models", "book": "Advances in Neural Information Processing Systems", "page_first": 1099, "page_last": 1107, "abstract": "Activity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode Utah\" array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron's response, in addition to the neuron's receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions.\"", "full_text": "Accounting for network effects in neuronal responses\n\nusing L1 regularized point process models\n\nRyan C. Kelly\u2217\n\nComputer Science Department\n\nCenter for the Neural Basis of Cognition\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nrkelly@cs.cmu.edu\n\nRobert E. Kass\n\nDepartment of Statistics\n\nCenter for the Neural Basis of Cognition\n\nMachine Learning Department\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nkass@stat.cmu.edu\n\nMatthew A. 
Smith\nUniversity of Pittsburgh\nCenter for the Neural Basis of Cognition\nPittsburgh, PA 15213\nmasmith@cnbc.cmu.edu\n\nTai Sing Lee\nComputer Science Department\nCenter for the Neural Basis of Cognition\nCarnegie Mellon University\nPittsburgh, PA 15213\ntai@cnbc.cmu.edu\n\nAbstract\n\nActivity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode \u201cUtah\u201d array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron\u2019s response, in addition to the neuron\u2019s receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions.\n\n1 Introduction\n\nOne of the most striking features of spike trains is their variability \u2013 that is, the same visual stimulus does not elicit the same spike pattern on repeated presentations. This variability is often considered to be \u201cnoise,\u201d meaning that it is due to unknown factors. Identifying these unknowns should enable better characterization of neural responses. 
In the retina, it has recently become possible to record from a nearly complete population of certain types of ganglion cells in a region and identify the correlation structure of this population [1]. However, in cerebral cortex, recording a full population of individual neurons in a region is currently impossible, and large scale recordings in vivo have been rare. Cross-trial variability is often removed in order to better reveal the effect of a signal of interest. Classical methods attempt to explain the activity of neurons only in terms of stimulus filters or kernels, ignoring sources unrelated to the stimulus.\n\n\u2217Data was collected by RCK, MAS and Adam Kohn in his laboratory as a part of a collaborative effort between the Kohn laboratory at Albert Einstein College of Medicine and the Lee laboratory at Carnegie Mellon University. This work was supported by a National Science Foundation (NSF) Integrative Graduate Education and Research Traineeship to RCK (DGE-0549352), National Eye Institute (NEI) grant EY018894 to MAS, NSF 0635257 and NSF CISE IIS 0713206 to TSL, NIMH grant MH064537 to REK, and NEI grant EY016774 to Adam Kohn. We thank Adam Kohn for collaboration, and we are also grateful to Amin Zandvakili, Xiaoxuan Jia and Stephanie Wissig for assistance in data collection. We also thank Ben Poole for helpful comments.\n\nAn increasing number of groups have modeled spiking with point process models [2, 3, 4] to assess the relative contributions of specific sources. Pillow et al. [3] used these methods to model retinal ganglion cells, and they showed that the responses of cells could be predicted to a large extent using the activity of nearby cells. We apply this technique to model spike trains in macaque V1 in vivo using L1 regularized point process models, which for discrete time become Generalized Linear Models (GLMs) [5]. 
In addition to incorporating the spike trains of nearby cells, we incorporated a meaningful summary of local network activity, the local field potential (LFP), and show that it also can explain an important part of the neuronal variability.\n\n2 L1 regularized Poisson regression\n\nFitting an unregularized point process model or GLM is simple with any convex optimization method, but the kind of neural data we have collected typically has a likelihood function that is relatively flat near its maximum. This is a data constraint: there simply are not enough spikes to locate the true parameters. To solve this over-fitting problem, we take the approach of regularizing the GLMs with an L1 penalty (Lasso) on the log-likelihood function. Here we provide some details of how we fit L1-regularized GLMs using a Poisson noise assumption on data with large dimensionality. In general, a point process may be represented in terms of a conditional intensity function and, assuming the data (the spike times) are in sufficiently small time bins, the resulting likelihood function may be approximated by a Poisson regression likelihood function. For ease of notation we leave the spiking history and other covariates implicit and write the conditional intensity (firing rate) at time t as \u00b5(t). 
We then model the log of \u00b5(t) as a linear summation of other factors:\n\nlog \u00b5(t) = \u2211_{j=1}^{N} \u03b8_j v_j(t) = \u03b8V(t)    (1)\n\nwhere v_j is a feature of the data and \u03b8_j is the corresponding parameter to be fit, and \u03b8 = {\u03b8_1, .., \u03b8_N}. We define V to be a N \u00d7 T matrix (N parameters, T time steps) of variables we believe can impact the firing rate of a cell, where each column V(t) of V is v_1(t), ..., v_N(t), which are the collection of observables, including input stimulus and measured neural responses.\n\nWe define y = y_1...y_T, with y_t \u2208 {0, 1} as the observed binary spike train for the cell being modeled, and let \u00b5_t = \u00b5(t). The likelihood of the entire spike train is given by:\n\nP(Y = y_1...y_T) = \u220f_{t=1}^{T} (\u00b5_t)^{y_t} exp(\u2212\u00b5_t) / y_t!    (2)\n\nWe obtain the log-likelihood by substituting Equation 1 into Equation 2 and taking the log:\n\nL(\u03b8) = \u2211_{t=1}^{T} (y_t \u03b8V(t) \u2212 exp(\u03b8V(t)) \u2212 log y_t!)    (3)\n\nMaximizing the likelihood with L1 penalty is equivalent to finding the \u03b8 that minimizes the following cost function:\n\nR = \u2212L(\u03b8) + \u2211_{j=1}^{N} \u03bb_j |\u03b8_j|    (4)\n\nAn L1 penalty term drives many of the \u03b8_j coefficients to zero. Fitting this equation with an L1 constraint is computationally difficult, because many standard convex optimization algorithms are only guaranteed to converge for differentiable functions. Friedman et al. [5] discuss how coordinate descent can efficiently facilitate GLM fitting on functions with L1 penalties, and they provide a derivation for the logistic regression case. Here we show a derivation for the Poisson regression case.\n\nWe approximate L(\u03b8) with L_Q(\u03b8), a quadratic Taylor series expansion around the current estimate \u02dc\u03b8. Then we proceed to minimize R_Q = \u2212L_Q(\u03b8) + \u2211_{j=1}^{N} \u03bb_j |\u03b8_j|. Given \u02dc\u03b8, we can compute \u02dc\u00b5, the current estimate of \u00b5. A coordinate descent step for coordinate j amounts to the minimization of R_Q with respect to \u03b8_j, for j \u2208 1...N:\n\ndR_Q/d\u03b8_j = \u03c9_j + \u03b8_j \u2211_{t=1}^{T} \u02dc\u00b5_t (v_j(t))\u00b2 \u2212 \u03bb_j, for \u02dc\u03b8_j < 0\ndR_Q/d\u03b8_j = \u03c9_j + \u03b8_j \u2211_{t=1}^{T} \u02dc\u00b5_t (v_j(t))\u00b2 + \u03bb_j, for \u02dc\u03b8_j > 0\nwhere \u03c9_j = \u2211_{t=1}^{T} (\u2212y_t + \u02dc\u00b5_t \u2212 \u02dc\u00b5_t v_j(t) \u02dc\u03b8_j) v_j(t)    (5)\n\nThis is a linear function with positive slope, and a discontinuity at \u03b8_j = 0. If \u2212\u03bb_j < \u03c9_j < \u03bb_j, dR_Q/d\u03b8_j \u2260 0 and the minimum is at this discontinuity, \u03b8_j = 0. Otherwise, if |\u03c9_j| \u2265 \u03bb_j, dR_Q/d\u03b8_j = 0 when\n\n\u03b8_j = \u2212(\u03c9_j \u2212 \u03bb_j) / (\u2211_{t=1}^{T} \u02dc\u00b5_t (v_j(t))\u00b2), for \u03c9_j \u2265 \u03bb_j    (6)\n\u03b8_j = \u2212(\u03c9_j + \u03bb_j) / (\u2211_{t=1}^{T} \u02dc\u00b5_t (v_j(t))\u00b2), for \u03c9_j \u2264 \u2212\u03bb_j    (7)\n\nWe cyclically repeat these steps on all parameters until convergence.\n\n2.1 Regularization path\n\nTo efficiently choose a penalty that avoids over-fitting, we implement a regularization path algorithm [6, 5]. The algorithm proceeds by computing a sequence of solutions \u03b8(1), \u03b8(2) ... \u03b8(L) for \u03bb(1), \u03bb(2) ... \u03bb(L). We standardize V (i.e. make each observable have mean 0 and standard deviation 1) and include a constant term v_1, which is not penalized. 
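A minimal NumPy sketch of the coordinate-wise update in Equations (5)-(7); this is illustrative only, not the authors' code, and the function name, data layout, and cycling scheme are assumptions:

```python
import numpy as np

def cd_cycle(theta, V, y, lam):
    """One full cycle of coordinate descent for L1-penalized Poisson
    regression, following Equations (5)-(7): each theta_j minimizes the
    quadratic approximation R_Q, which reduces to a soft-threshold update.

    V   : N x T design matrix (rows = covariates, columns = time bins)
    y   : length-T binned spike train (counts per bin)
    lam : length-N penalty vector (0 for the unpenalized constant term)
    """
    N = V.shape[0]
    for j in range(N):
        mu = np.exp(theta @ V)            # current intensity estimate mu~
        vj = V[j]
        # omega_j = sum_t (-y_t + mu~_t - mu~_t v_j(t) theta~_j) v_j(t)
        omega = np.sum((-y + mu - mu * vj * theta[j]) * vj)
        slope = np.sum(mu * vj ** 2)      # positive slope of dR_Q/dtheta_j
        if abs(omega) < lam[j]:
            theta[j] = 0.0                # minimum sits at the discontinuity
        elif omega >= lam[j]:
            theta[j] = -(omega - lam[j]) / slope   # Equation (6)
        else:
            theta[j] = -(omega + lam[j]) / slope   # Equation (7)
    return theta
```

Cycling this update until the coefficients stop changing gives the fit for a single penalty value.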
With this normalization, we set all \u03bb_j equal to the same \u03bb, except there is no penalty for v_1. In the coordinate descent method, we start with \u03bb(1) = \u03bbmax = max_j |\u03c9_j|, which is large enough so that all coefficients are dominated by the regularization, and hence all coefficients are 0 for this heavy penalty. In determining \u03bbmax, \u03c9_j is computed based on the constant term v_1 only. Initially, the active set A(1) is empty, because \u03bb \u2265 \u03bbmax. The active set is the set of all coordinates with non-zero coefficients for which the coordinate descent is being performed. As \u03bb is reduced and becomes smaller than \u03bbmax, more and more non-zero terms will be included in the active set. For step i, we compute the solution \u03b8(i) using penalty \u03bb(i) and \u03b8(i\u22121) as a warm start. As the regularization parameter \u03bb is decreased, the fitted models begin by under-fitting the data (with large \u03bb) and progress through the regularization path to over-fitting (with small \u03bb). The above algorithm works much faster when the active set is smaller, and we can halt the algorithm before over-fitting occurs.\n\nThe purpose of this regularization path is to find the best \u03bb. To quantitatively assess the model fits, we employ an ROC procedure [7]. To compute the ROC curve based on the conditional intensity function \u00b5(t), we first create a thresholded version of \u00b5(t) which serves as the prediction of spiking:\n\n\u02c6r_c(t) = 1 if \u00b5(t) \u2265 c    (8)\n\u02c6r_c(t) = 0 if \u00b5(t) < c    (9)\n\nFor each fixed threshold c, a point on the ROC curve is the true positive rate (TPR) versus the false positive rate (FPR). At each \u03bb in the regularization path, we compute the area under the ROC curve (AUC) to assess the relative performance of the models fit below, using a 10-fold cross validation procedure. An alternative and natural metric is the likelihood value, and the peak of the regularization path was very similar between AUC and likelihood. We focus on AUC results because it was easier to relate the AUCs from different cells, some of which had very different likelihood values.\n\n3 Modeling neural data\n\nWe report results from the application of Eq. (4) to neural data. The models here contain combinations of stimulus effects (spatio-temporal receptive fields), coupling effects (history terms and past spikes from other cells), and network effects (given by the LFP). We find that cells had different degrees of contributions from the different terms, ranging from entirely stimulus-dependent cells to entirely network-dependent cells.\n\n3.1 Methods\n\nThe details of the array insertion have been described elsewhere [8]. Briefly, we inserted the array 0.6 mm into cortex using a pneumatic insertion device [9], which led to recordings confined mostly to layers 2\u20133 of parafoveal V1 (receptive fields within 5\u00b0 of the fovea) in an anesthetized and paralyzed macaque (sufentanil anesthesia). Signals from each microelectrode were amplified and bandpass filtered (250 Hz to 7.5 kHz) to acquire spiking data. Waveform segments that exceeded a threshold (set as a multiple of the rms noise on each channel) were digitized (30 kHz) and sorted off-line. We first performed a principal components analysis by waveform shape [10] and then refined the output by hand with custom time-amplitude window discrimination software (written in MATLAB; MathWorks). We studied the responses of cells to visual stimuli, presented on a computer screen. 
All stimuli were generated with custom software on a Silicon Graphics Octane2 Workstation and displayed at a resolution of 1024 \u00d7 768 pixels and frame rate of 100 Hz on a CRT monitor (stimulus intensities were linearized in luminance). We presented Gaussian white noise movies, with 8 pixel spatial blocks chosen independently from a Gaussian distribution. The movies were 5\u00b0 in width and height, 320 by 320 pixels. The stimuli were all surrounded by a gray field of average luminance. Frames lasted 4 monitor refreshes, so the duration of each frame of noise was 40 ms. The average noise correlation between pairs of cells was 0.256.\n\nThe biggest obstacle for fitting models is the huge dimensionality in the number of parameters and in the large number of observations. To reduce the problem size, we binned the spiking observations at 10 ms instead of 1 ms. The procedures we used to reduce the parameter sizes are given in the corresponding sections below. We used cross validation to estimate the performance of the models on 10 different test sets. Each test set consisted of 12,000 test observations and 180,000 training observations. The penalty in the regularization path with the largest average area across all the cross validation runs was considered the optimal penalty.\n\nThe full model, with log \u00b5(t) = log \u00b5STIM(t) + log \u00b5COUP(t) + log \u00b5LFP(t), has the following form:\n\nlog \u00b5(t) = \u2211_x \u2211_y \u2211_\u03c4 k_xy\u03c4 s_xy(t \u2212 \u03c4) + \u2211_{i=1}^{M} \u2211_{\u03c4=1}^{100} \u03b3_i r_i(t \u2212 \u03c4) + \u2211_{i=1}^{E} \u03b2_i x_i(t)    (10)\n\n3.2 Stimulus effects\n\nFor modeling the stimulus alone we used the form\n\nlog \u00b5STIM(t) = \u2211_x \u2211_y \u2211_\u03c4 k_xy\u03c4 s_xy(t \u2212 \u03c4)    (11)\n\nHere, s_xy(t \u2212 \u03c4) is an individual feature of the stimulus \u03c4 ms before the current observation (time t). If we were to use pixel intensities over the last 150 ms (15 observations), the 320 \u00d7 320 movie would have 1 536 000 parameters, a number far too large for the fitting method and data. We took the approach of first restricting the movie to a much smaller region (40x40 pixels) chosen using spike-triggered average (STA) maps of the neural responses. Then, we transformed the stimulus space with overlapping Gaussian bump filters, which serve as basis functions. The separation of the bump centers was 4 pixels spatially in the 40x40 pixel space, and 2 time points (20 ms). The total number of parameters was 10 \u00d7 10 \u00d7 7 = 700, which is 100 parameters for each of 7 distinct time points. Thus, s_xy(t \u2212 \u03c4) corresponds to the convolution of a small Gaussian bump indexed by x, y, \u03c4 with the recent stimulus frames. Figure 1 shows the regularization path for one example cell. For each model (11), we chose the \u03bb corresponding to the peak of the regularization path. Figure 2A shows the k parameters for some example cells transformed back to the original pixel space, with the corresponding STAs alongside for comparison. The models produce cleaner receptive fields, a consequence of the L1 regularization. Figure 2D shows the population results for these models. The distribution of AUC values is generally low, with many cells near chance (.5), and a smaller portion of cells climbing to 0.6 or higher. This suggests that a linear receptive field may not be appropriate for many of these cells. In addition, there is an effect of electrode location, with cells with the highest AUC located on the left side of the array.\n\nFigure 1: Example of fitting a GLM with stimulus terms for a single cell. A: For four L1 penalties (\u03bb), the corresponding {k_i} are shown, with the STA above for reference. For high \u03bb, the model is sparser. B: The regularization path for this same cell. \u03bb = 172 is the peak of the AUC curve and is thus the best model by this metric.\n\nFigure 2: Different GLM types. A: 4 example stimulus models, with the STAs shown for reference. These models correspond to the AUC peaks of their respective regularization paths. B: 3 example cells fit with spike coupling models. The coefficients are shown with respect to the cell location on the array. If multiple cells were isolated on the same electrode, the square is divided into 2 or 3 parts. Nearby electrodes tend to have more strength in their fitted coefficients. C: 3 example cells fit with LFP models. As in B, nearby electrodes carry more information about spiking. D-F: Population results for A-C. These are plots of the AUCs for the 57 cells modeled.\n\n3.3 Spike coupling effects\n\nFor the coupling terms, we used the history of firing for the other cells recorded in the array as well as the history for the cell being modeled. 
These take the form:\n\nlog \u00b5COUP(t) = \u2211_{i=1}^{M} \u2211_{\u03c4=1}^{100} \u03b3_i r_i(t \u2212 \u03c4)    (12)\n\nwith \u03b3_i being the coupling strength/coefficient, r_i(t \u2212 \u03c4) being the activity of the ith neuron \u03c4 ms earlier, and M being the number of neurons. Thus the influence from a surrounding neuron is computed based on its spike count in the last 100 ms. As expected, nearby cells generally had the largest coefficients (Figure 2B), indicating that cells in closer proximity tend to have more correlation in their spike trains. We observed a large range of AUC values for these fits (Figure 2E), from near chance levels up to .9. There was a significant (p < 10^\u22126) negative correlation between the AUC and the number of nonzero coefficients used in the model. Thus, the units which were well predicted by the firing of the rest of the population also did not require a large number of parameters to achieve the best AUC possible. Also apparent in the figure is that the relationship between spike train predictability and array location had the opposite pattern of the stimulus model results, with units toward the left side of the array generally having smaller AUCs based on the population activity than units on the right side.\n\nThe models described above had one parameter per cell in the population, with each parameter corresponding to the firing over a 100 ms past window. We also fit models with 3 parameters per cell in the population, corresponding to the spikes in three non-overlapping temporal epochs (1-20 ms, 21-50 ms, 51-100 ms). These were considered to be independent parameters, and thus the active set could contain none, some, or all of these 3 parameters for each cell. The mean AUC across the population was .01 larger with this increased parameter set, but also the mean active set size was 100 elements larger. 
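For illustration, trailing-window spike-count covariates of the kind just described (one 100 ms window per cell, or the three-epoch variant) can be assembled as follows. This is a sketch with assumed names and data layout, not the authors' pipeline; only the 10 ms bin size and the epoch edges come from the text:

```python
import numpy as np

def coupling_covariates(spikes, epochs=((1, 100),), bin_ms=10):
    """Trailing spike-count features for the coupling terms.

    spikes : M x T array of spike counts in bin_ms bins (M cells).
    epochs : trailing windows in ms; ((1, 100),) gives one covariate per
             cell, ((1, 20), (21, 50), (51, 100)) the 3-epoch variant.
    Returns an (M * len(epochs)) x T matrix whose row for (cell i,
    epoch (a, b)) holds cell i's spike count between a and b ms before
    each time bin.
    """
    M, T = spikes.shape
    feats = []
    for a, b in epochs:
        lo = -(-a // bin_ms)            # first lag, in bins (ceiling)
        hi = b // bin_ms                # last lag, in bins
        x = np.zeros((M, T))
        for lag in range(lo, hi + 1):
            if lag >= T:                # window extends past the data
                break
            x[:, lag:] += spikes[:, :T - lag]
        feats.append(x)
    return np.vstack(feats)
```

The resulting matrix can be stacked with the stimulus and LFP covariates to form the design matrix V.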
We did not attempt to model effects on very short timescales, since we binned the spikes at 10 ms.\n\n3.4 Network models\n\nThe spiking of cells in the population serves to predict spiking very well for many cells, but the cause of this relationship remains undetermined. The specific timing of spikes may play a large role in predicting spikes, but alternatively the general network fluctuations could be the primary cause. To disentangle these possibilities, we can model the network state using the LFP as an estimate:\n\nlog \u00b5LFP(t) = \u2211_{i=1}^{E} \u03b2_i x_i(t)    (13)\n\nHere, E is the number of surrounding electrodes, x_i is the LFP value from electrode i, and \u03b2_i is the coefficient of the LFP influence on the spiking activity of the neuron being considered. Figure 2C shows the model coefficients of several cells when {x_i} are the LFP values at time t. The variance in the coefficient values falls off with increasing distance, with distant electrodes providing relatively less information about spiking. Across the population, the AUC values for the cells are almost the same as in the spike coupling models (Figure 2F), and consequently the spatial pattern of AUC on the array is almost identical. We also investigated models built using the LFP power in different frequency bands, and we found that the LFP power in the gamma frequency range (30-80 Hz) produced similar results. With these models, the AUC distributions were remarkably similar to the models built with spike coupling terms (Figure 2E). The LFP reflects activity over a very broad region, and thus for these data the connectivity between most pairs in the population does not generally have much more predictive power than the broader network dynamics. 
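As an aside on the gamma-band models just mentioned: the text does not specify the power estimator, but a band-limited power covariate can be obtained, for example, with a windowed FFT. Everything here (function name, 1 kHz sampling rate, 100-sample windows, Hann taper) is an assumption for illustration:

```python
import numpy as np

def gamma_power(lfp, fs=1000.0, band=(30.0, 80.0), win=100):
    """Sliding-window band power of an LFP trace.

    lfp : 1-D voltage trace sampled at fs Hz. The trace is cut into
    non-overlapping windows of `win` samples; for each window we sum
    the FFT power of the frequency bins inside `band`, giving one
    gamma-power value per window.
    """
    lfp = np.asarray(lfp, dtype=float)
    n = (len(lfp) // win) * win
    segs = lfp[:n].reshape(-1, win)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    spec = np.abs(np.fft.rfft(segs * np.hanning(win), axis=1)) ** 2
    return spec[:, in_band].sum(axis=1)
```

Each windowed power value would then serve as one x_i(t) covariate in Equation (13)-style models.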
This suggests that much of the power of the spike coupling terms above is a direct result of both cells being driven by the underlying network dynamics, rather than by a direct connection between the two cells unrelated to the more global dynamics. Models of spike coupling with more precise timing (< 10 ms) may reflect information that these LFP terms would fail to capture.\n\nFigure 3: Scatter plots of the AUC values for the population under different models and conditions. A,B: The full model improves upon the individual LFP or stimulus models. C: For most cells, trial shuffling the spike trains destroys the effectiveness of the models. D: Taking the network state and cell spikes into account generally yields a larger AUC than \u00b5PSTH.\n\n4 Capturing variability and predicting the PSTH\n\nNeuronal firing has long been accepted to have sources of noise that have typically been ignored or removed. The simplest conception is that each of these cells has an independent source of intrinsic noise, and to recover the underlying firing rate function we can simply repeat a stimulus many times. We have shown above that for many cells, a portion of the noise is not independent from the rest of the network and is related to other cells and the LFP. The population included a distribution of cells, and the GLMs showed that some cells included mostly network terms, and other cells included mostly stimulus terms. For most cells, the models included significant contributions from both types of terms.\n\nFrom Figure 3A and 3B we can see that the inclusion of network terms does indeed explain more of the spikes than the stimulus model alone. It is theoretically possible that the LFP or spikes from other cells are reflecting higher order terms of the stimulus-response relationship that the linear model fails to capture, and the GLM is harnessing these effects to increase AUC. 
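The AUC used in these comparisons summarizes the full threshold sweep of Equations (8)-(9); equivalently, it is the probability that a randomly chosen spike bin receives a higher \u00b5(t) than a randomly chosen non-spike bin. A self-contained sketch of that rank-statistic computation (function name hypothetical):

```python
import numpy as np

def auc_from_intensity(mu, y):
    """Area under the ROC curve of the thresholded intensity.

    Sweeping c over r_hat_c(t) = 1[mu(t) >= c] traces out (FPR, TPR)
    pairs; the area under that curve equals the Mann-Whitney statistic:
    P(mu at a random spike bin > mu at a random non-spike bin),
    with ties counted as 1/2.
    """
    mu = np.asarray(mu, dtype=float)
    y = np.asarray(y).astype(bool)
    pos = mu[y][:, None]                 # intensities at spike bins
    neg = mu[~y][None, :]                # intensities at non-spike bins
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()
```

Chance level is 0.5; perfect separation of spike from non-spike bins gives 1.0.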
We performed an AUC analysis on test data from the same neurons: 120 trials of the same 30 second noise movie. Since the stimulus was repeated we were able to shuffle trials. Any stimulus information is present on every trial of this repeated stimulus, and so if the AUC improvement is entirely due to the network terms capturing stimulus information, there should be no decrease in AUC in the trial-shuffled condition. Figure 3C shows that this is not the case: trial shuffling reduces AUC values greatly across the population. This means that the network terms are not merely capturing extraneous stimulus effects.\n\nKelly et al. [11] show that when taking the network state into account with a very simple GLM, the signal to noise in the stimulus-response relationship was improved. The PSTH is typically used as a proxy for the stimulus effects. The idea is that any noise terms are averaged out after many trials to the same repeated stimulus. For the data set of a single repeated noise movie, we made a comparison of the AUC values computed from the PSTH to the AUC values due to the models. Recall that the AUC is computed from an ROC analysis on the thresholded \u00b5 function. Here, we define \u00b5PSTH to be the estimated firing rate given by the PSTH. Thus, it is the same function for every trial to the repeated stimulus. We compared the AUC values in the same manner as in the model procedure above, building the \u00b5PSTH function on 90% of the trials and holding out 10% of the trials for the ROC computation. Figure 3D shows the comparison: for almost every cell the full model is better at predicting the spikes than the PSTH itself, even though the stimulus component of the model is merely a linear filter.\n\nIf the extra-stimulus variability has truly been averaged out of the PSTH, the stimulus-only model should do equally well in modeling the PSTH as the full model. 
To compare the ability of different models to reconstruct the PSTH, we computed the predicted firing rates (\u00b5) for each of the 120 trials of the same white noise movie; the predicted PSTH is simply the average of these 120 temporal functions. We computed these model predictions for the LFP-only model, stimulus-only model, and full model. Figure 4A shows examples of these simulated PSTHs for these three conditions. Figure 4B shows the overall results for the population. The stimulus model predicted the PSTH well for some cells, but for most others the stimulus model alone cannot match the full model\u2019s performance, indicating a corruption of the PSTH by network effects.\n\nFigure 4: A: For an example cell, the ability for different models to predict the PSTH. Taking the network state into account yields a closer estimate to the PSTH, indicating that the PSTH contains effects unrelated to the stimulus. B: Population histograms of the PSTH variance explained. Including all the terms yields a dramatic increase in the variance explained across the population.\n\n5 Conclusions\n\nIn this paper we have implemented an L1 regularized point process model to account for stimulus effects, neuronal interactions and network state effects for explaining the spiking activity of V1 neurons. We have shown the derivation for a form of L1 regularized Poisson regression, and identified and implemented a number of computational approaches including coordinate descent and the regularization path. 
These are crucial for solving the point process model for in vivo V1 data, and to our knowledge have not been previously attempted on this scale.\n\nUsing this model, we have shown that activity of cells in the surrounding population can account for a significant amount of the variance in the firing of many neurons. We found that the LFP, a broad indicator of the synaptic activity of many cells across a large region (the network state), can account for a large share of these influences from the surrounding cells. This suggests that these spikes are due to the general network state rather than precise spike timing or individual true synaptic connections between a pair of cells. This is consistent with earlier observations that the spiking activity of a neuron is linked to ongoing population activity as measured with optical imaging [12] and LFP [13]. This link to the state of the local population is an influential force affecting the variability in a cell\u2019s spiking behavior. Indeed, groups of neurons transition between \u201cUp\u201d (depolarized) and \u201cDown\u201d (hyperpolarized) states, which leads to cycles of higher and lower than normal firing rates (for review, see [14]). These state transitions occur in sleeping and anesthetized animals, in cortical slices [15], as well as in awake animals [16, 17] and awake human patients [18, 19], and might be responsible for generating much of the slow time scale correlation. Our additional experiments showed that similar results are obtained with natural movie stimulation.\n\nBy directly modeling these sources of variability, this method begins to allow us to obtain better encoding models and more accurately isolate the elements of the stimulus that are truly driving the cells\u2019 responses. 
By attributing portions of firing to network state effects (as indicated by the LFP), this approach can obtain more accurate estimates of the underlying connectivity among neurons in cortical circuits.\n\n[Figure 4 panels, not reproduced: example cell R\u00b2 = 0.424 (full model), 0.276 (stimulus model), 0.058 (LFP model); population histograms annotated with R\u00b2 = 0.29 (full), 0.13 (stimulus), 0.10 (LFP).]\n\nReferences\n\n[1] Jonathon Shlens, Greg D Field, Jeffrey L Gauthier, Martin Greschner, Alexander Sher, Alan M Litke, and E J Chichilnisky. The structure of large-scale synchronized firing in primate retina. J Neurosci, 29(15):5022\u201331, Apr 2009.\n\n[2] Wilson Truccolo, Leigh R Hochberg, and John P Donoghue. Collective dynamics in human and monkey sensorimotor cortex: predicting single neuron spikes. Nat Neurosci, 13(1):105\u201311, Jan 2010.\n\n[3] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke, E J Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995\u20139, Aug 2008.\n\n[4] Robert E. Kass, Valerie Ventura, and Emory N. Brown. Statistical issues in the analysis of neuronal data. J Neurophysiol, 94:8\u201325, 2005.\n\n[5] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Regularization paths for generalized linear models via coordinate descent. Technical report, Department of Statistics, Stanford University, Jan 2008.\n\n[6] Mee Young Park and Trevor Hastie. L1 regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659\u2013677, 2007.\n\n[7] Nicholas G Hatsopoulos, Qingqing Xu, and Yali Amit. Encoding of movement fragments in the motor cortex. J Neurosci, 27(19):5105\u201314, May 2007.\n\n[8] Matthew A Smith and Adam Kohn. Spatial and temporal scales of neuronal correlation in primary visual cortex. 
J Neurosci, 28(48):12591\u2013603, Nov 2008.\n\n[9] P J Rousche and Richard A Normann. A method for pneumatically inserting an array of penetrating electrodes into cortical tissue. Ann Biomed Eng, 20(4):413\u201322, Jan 1992.\n\n[10] Shy Shoham, Matthew R Fellows, and Richard A Normann. Robust, automatic spike sorting using mixtures of multivariate t-distributions. J Neurosci Methods, 127(2):111\u201322, Aug 2003.\n\n[11] Ryan C Kelly, Matthew A Smith, Jason M Samonds, Adam Kohn, A B Bonds, J Anthony Movshon, and Tai Sing Lee. Comparison of recordings from microelectrode arrays and single electrodes in the visual cortex. J Neurosci, 27(2):261\u20134, Jan 2007.\n\n[12] M Tsodyks, Tal Kenet, Amiram Grinvald, and A Arieli. Linking spontaneous activity of single cortical neurons and the underlying functional architecture. Science, 286(5446):1943\u20136, Dec 1999.\n\n[13] Ian Nauhaus, Laura Busse, Matteo Carandini, and Dario L Ringach. Stimulus contrast modulates functional connectivity in visual cortex. Nat Neurosci, 12(1):70\u20136, Jan 2009.\n\n[14] Alain Destexhe and Diego Contreras. Neuronal computations with stochastic network states. Science, 314(5796):85\u201390, Oct 2006.\n\n[15] Hope A Johnson and Dean V Buonomano. Development and plasticity of spontaneous activity and up states in cortical organotypic slices. J Neurosci, 27(22):5915\u201325, May 2007.\n\n[16] David A Leopold, Yusuke Murayama, and Nikos K Logothetis. Very slow activity fluctuations in monkey visual cortex: implications for functional brain imaging. Cereb Cortex, 13(4):422\u201333, Apr 2003.\n\n[17] Artur Luczak, Peter Barth\u00f3, Stephan L Marguet, Gy\u00f6rgy Buzs\u00e1ki, and Kenneth D Harris. Sequential structure of neocortical spontaneous activity in vivo. 
Proc Natl Acad Sci USA, 104(1):347\u201352, Jan 2007.\n\n[18] Biyu J He, Abraham Z Snyder, John M Zempel, Matthew D Smyth, and Marcus E Raichle. Electrophysiological correlates of the brain\u2019s intrinsic large-scale functional architecture. Proc Natl Acad Sci USA, 105(41):16039\u201344, Oct 2008.\n\n[19] Yuval Nir, Roy Mukamel, Ilan Dinstein, Eran Privman, Michal Harel, Lior Fisch, Hagar Gelbard-Sagiv, Svetlana Kipervasser, Fani Andelman, Miri Y Neufeld, Uri Kramer, Amos Arieli, Itzhak Fried, and Rafael Malach. Interhemispheric correlations of slow spontaneous neuronal fluctuations revealed in human sensory cortex. Nat Neurosci, 11(9):1100\u20138, Sep 2008.\n", "award": [], "sourceid": 87, "authors": [{"given_name": "Ryan", "family_name": "Kelly", "institution": null}, {"given_name": "Matthew", "family_name": "Smith", "institution": null}, {"given_name": "Robert", "family_name": "Kass", "institution": null}, {"given_name": "Tai", "family_name": "Lee", "institution": null}]}