{"title": "Estimating Nonlinear Neural Response Functions using GP Priors and Kronecker Methods", "book": "Advances in Neural Information Processing Systems", "page_first": 3603, "page_last": 3611, "abstract": "Jointly characterizing neural responses in terms of several external variables promises novel insights into circuit function, but remains computationally prohibitive in practice. Here we use gaussian process (GP) priors and exploit recent advances in fast GP inference and learning based on Kronecker methods, to efficiently estimate multidimensional nonlinear tuning functions. Our estimator require considerably less data than traditional methods and further provides principled uncertainty estimates. We apply these tools to hippocampal recordings during open field exploration and use them to characterize the joint dependence of CA1 responses on the position of the animal and several other variables, including the animal's speed, direction of motion, and network oscillations.Our results provide an unprecedentedly detailed quantification of the tuning of hippocampal neurons. The model's generality suggests that our approach can be used to estimate neural response properties in other brain regions.", "full_text": "Estimating Nonlinear Neural Response Functions\n\nusing GP Priors and Kronecker Methods\n\nCristina Savin\nIST Austria\n\nKlosterneuburg, AT 3400\n\ncsavin@ist.ac.at\n\nGasper Tka\u02c7cik\n\nIST Austria\n\nKlosterneuburg, AT 3400\n\ntkacik@ist.ac.at\n\nAbstract\n\nJointly characterizing neural responses in terms of several external variables\npromises novel insights into circuit function, but remains computationally pro-\nhibitive in practice. Here we use gaussian process (GP) priors and exploit recent\nadvances in fast GP inference and learning based on Kronecker methods, to ef-\n\ufb01ciently estimate multidimensional nonlinear tuning functions. 
Our estimator requires considerably less data than traditional methods and further provides principled uncertainty estimates. We apply these tools to hippocampal recordings during open field exploration and use them to characterize the joint dependence of CA1 responses on the position of the animal and several other variables, including the animal's speed, direction of motion, and network oscillations. Our results provide an unprecedentedly detailed quantification of the tuning of hippocampal neurons. The model's generality suggests that our approach can be used to estimate neural response properties in other brain regions.\n\n1 Introduction\n\nAn important facet of neural data analysis concerns characterizing the tuning properties of neurons, defined as the average firing rate of a cell conditioned on the value of some external variables, for instance the orientation of an image patch for a V1 cell, or the position of the animal within an environment for hippocampal cells. As experiments become more complex and more naturalistic, the number of variables that modulate neural responses increases. These include not only experimentally targeted inputs but also variables that are no longer under the experimenter's control but which can be (to a certain extent) measured, either external (the behavior of the animal) or internal (attentional level, network oscillations, etc.). Characterizing these complex dependencies is very difficult, yet it could provide important insights into neural circuit computation and function.\nTraditional estimates of a cell's tuning properties often manipulate one variable at a time or consider simple dependencies between inputs and the neural responses (e.g. Generalized Linear Models, GLM [1, 2]). There is comparatively little work that allows for complex input-output functional relationships on multidimensional input spaces [3–5]. The reasons for this are twofold. 
On one hand, dealing with complex nonlinearities is computationally challenging; on the other, constraints on experimental duration lead to a potentially very sparse sampling of the stimulus space, requiring additional assumptions for a sensible interpolation. This problem is further exacerbated in experiments in awake animals, where the sampling of the stimulus space is driven by the animal's behavior. The few solutions for nonlinear tuning properties rely on spline-based approximations of one-dimensional functions (for position on a linear track) [6] or assume a log-Gaussian Cox process generative model as a way to enforce smoothness of 2D functional maps [3–5]. These methods are usually restricted to at most two input dimensions (but see [4]).\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nHere we take advantage of recent advances in scaling GP inference and learning using Kronecker methods [7] to extend the approach in [3] to the multidimensional setting, while keeping the computational and memory requirements almost linear in dataset size N: O(dN^((d+1)/d)) and O(dN^(2/d)), respectively (for d dimensions) [8]. Our formulation requires a discretization of the input space,1 but allows for a flexible selection of the kernels specifying different assumptions about the nature of the functional dependencies we are looking for in the data, with hyperparameters inferred by maximizing the marginal likelihood. We deal with the non-gaussian likelihood in the traditional way, by using a Laplace approximation of the posterior [8]. The critical ingredient for our approach is the particular form of the covariance matrix, which decomposes into a Kronecker product over covariances corresponding to individual input dimensions, dramatically simplifying computations. 
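To make the Kronecker point concrete, the key primitive is a matrix-vector product with K = K_1 ⊗ ... ⊗ K_d that never forms the full N × N matrix. A minimal NumPy sketch (our illustration, not the paper's gpml-based implementation), assuming an SE kernel per dimension on a regular grid:

```python
import numpy as np

def se_kernel(x, rho=1.0, sigma=0.2):
    """Squared-exponential kernel matrix on a 1D grid."""
    d = np.subtract.outer(x, x)
    return rho**2 * np.exp(-d**2 / (2 * sigma**2))

def kron_mv(Ks, v):
    """Compute (K1 kron K2 kron ... kron Kd) @ v using only the small
    per-dimension factors Ks, never materializing the full matrix."""
    shape = [K.shape[0] for K in Ks]
    x = v.reshape(shape)
    for axis, K in enumerate(Ks):
        # contract K with the matching tensor axis, then restore axis order
        x = np.moveaxis(np.tensordot(K, x, axes=(1, axis)), 0, axis)
    return x.reshape(-1)
```

For a 32 × 32 × 27 input grid this touches three small factor matrices instead of one dense 27648 × 27648 covariance, which is what makes the conjugate-gradient and eigendecomposition steps used later so cheap.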
The focus here is not on the methods per se but rather on their previously unacknowledged utility for estimating multidimensional nonlinear tuning functions.\nThe inferred tuning functions are probabilistic. The estimator is adaptive, in the sense that it relies strongly on the prior in regions of the input space where data is scarce, but can flexibly capture complex input-output relations where enough data is available. It naturally comes equipped with error bars, which can be used, for instance, for detecting shifts in receptive field properties due to learning. Using artificial data we show that inference and learning in our model can robustly recover the underlying structure of neural responses even in the experimentally realistic setting where the sampling of the input space is sparse and strongly non-uniform (due to stereotyped animal behavior). We further argue for the utility of spectral mixture kernels as a powerful tool for detecting complex functional relationships beyond simple smoothing/interpolation. We go beyond artificial data that follows the assumptions of the model exactly, and show robust estimation of tuning properties in several experimental recordings. For illustration purposes we focus here on data from the CA1 region of the hippocampus of rats, during an open field exploration task. We characterize several 3D tuning functions that depend on the animal's position together with additional internal (the overall activity in the network at the time) or external (speed or direction of motion, time within experiment) variables, and use these to derive new insights into the distribution of spatial and non-spatial information at the level of CA1 principal cell activity.\n\n2 Methods\n\nGenerative model\nGiven data in the form of spike count–input pairs D = {y^(i), x^(i)}_{i=1:N}, we model neural activity as an inhomogeneous Poisson process with input-dependent firing rate λ (as in [3], see Fig. 
1a):\n\nP(y|x) = ∏_i Poisson(y^(i); λ(x^(i))), where Poisson(y; λ) = (1/y!) λ^y e^(−λ). (1)\n\nThe inputs x are defined on a d-dimensional lattice and the spike counts are measured within a time window δt for which the input is roughly constant (25.6ms, given by the frequency of positional tracking).2 We formalize assumptions about neural tuning as a GP prior f ∼ GP(µ, k_β), with f = log λ(x), a constant mean µ_i = α (for the overall scale of neural responses), and a covariance function k(·, ·) with hyperparameters β. This covariance function defines our assumptions about what kind of functional dependencies are expected in the data (smoothness, periodicity, etc.). The exponential linking f to λ provides a mathematically convenient way to enforce positivity of the mean firing rate while keeping the posterior log-concave in f, justifying the use of Laplace methods for approximating the posterior (see also [3]).\nFor computational tractability we restrict our model to the class of product kernels k(x, x′) = ∏_d k_d(x_d, x′_d), for which the covariance matrix decomposes as a Kronecker product K = K_1 ⊗ K_2 ⊗ . . . ⊗ K_d, allowing for efficient computation of determinants, matrix multiplications and eigendecompositions in terms of the individual factors K_i (see Suppl. Info. and [7]).\nThe individual kernels can be tailored to the specific application, allowing for a flexible characterization of individual input dimensions (inputs need not live in the same space, e.g. space-time, or can be\n\n1 In practice many input dimensions are discrete to begin with (e.g. measurements of an animal's position), so this is a weak requirement. 
The coarseness of the discretization depends on the application.\n\n2 Input noise is ignored here, but could be explicitly incorporated in the generative model [9].\n\nFigure 1: Model overview and estimator validation. a) Generative model: spike counts arise as Poisson draws with an input-dependent mean, f(x), with an exponential link function. b) A GP prior specifies the assumptions concerning the properties of this function (smoothness, periodicity, etc). c) Place field estimates from artificial data; left to right: the position of the animal modelled as a bounded random walk, ground-truth, traditional estimate (without smoothing), posterior mean of the inferred functional. d) Vertical slice through the posterior with shaded area showing the 2 · sd confidence region. d) Estimates of place field selectivity in an example CA1 recording during open field exploration in a cross-shaped box; separate estimates for 6min subsets.\n\nperiodic, e.g. the phase of theta oscillations). Here we use a classic squared-exponential (SE) kernel for simple interpolation/smoothing tasks, k_d(x, x′) = ρ_d^2 exp(−(x − x′)^2 / (2σ_d^2)), with parameters β = {ρ, σ} specifying the output variance and lengthscale [9]. For tasks involving extrapolation or discovering complex patterns we use spectral mixture (SM) kernels, as a powerful and mathematically tractable route towards automated kernel design [10]. SMs are stationary kernels defined as a linear mixture of basis functions in the spectral domain:\n\nk_d(x, x′) = Σ_{q=1}^{Q} w_q exp(−2π^2 (x − x′)^2 v_q) cos(2π (x − x′) µ_q) (2)\n\nwith parameters β = {w, µ, v} defining the weights, spectral means and variances for each of the mixture components. 
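Eq. (2) is straightforward to evaluate directly; a minimal NumPy sketch (our illustration, with hypothetical parameter values):

```python
import numpy as np

def sm_kernel(x, xp, w, mu, v):
    """1D spectral mixture kernel, Eq. (2): a sum of Q Gaussian-windowed
    cosines with weights w, spectral means mu and spectral variances v."""
    tau = np.subtract.outer(np.asarray(x, float), np.asarray(xp, float))
    k = np.zeros_like(tau)
    for wq, mq, vq in zip(w, mu, v):
        k += wq * np.exp(-2.0 * np.pi**2 * tau**2 * vq) * np.cos(2.0 * np.pi * tau * mq)
    return k
```

With a single zero-mean component (μ₁ = 0) the cosine factor is 1 and the Gaussian window reduces to an SE kernel with σ² = 1/(4π²v₁), which is the special case discussed next.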
Assuming Q is large enough, such a spectral mixture can approximate an arbitrary stationary kernel (the same way Gaussian mixtures can be used to approximate an arbitrary density). Moreover, many traditional kernels can be recovered as special cases; for instance the SE kernel corresponds to a single-component spectral density with zero mean (see also [10]).\n\nInference and learning\n\nWe sketch the main steps of the derivation here and provide the details in the Suppl. Info. Our goal is to find the hyperparameters θ = {α, β} that maximize P(θ|y) ∝ P(y|θ) · P(θ). We follow common practice in using a point estimate θ∗ = argmax_θ P(θ|y) for the hyperparameters, and leave a fully probabilistic treatment to future work (e.g. using [11]). We use θ∗ to infer a predictive distribution P(f∗|D, x∗, θ∗) for a set of test inputs x∗. Because of the Poisson observation noise these quantities do not have simple closed-form solutions and some approximations are required. As is customary [9], we use the Laplace method to approximate the log posterior log P(f|D) = log P(y|f) + log P(f) + const with its second-order Taylor expansion around the maximum f̂. This results in a multivariate Gaussian approximate posterior, with mean f̂ and covariance (H + K^−1)^−1, where H = −∇∇ log P(y|f)|_f̂ is the Hessian of the negative log likelihood, and K is the covariance matrix. 
Substituting the approximate posterior, we obtain the Laplace approximate marginal likelihood of the form:\n\nlog P(y|θ) = log P(y|f̂) − 0.5 z′K z − 0.5 log |I + KH| (3)\n\nwith z = K^−1(f̂ − µ). The approximate predictive distribution for θ∗ is a multivariate Gaussian with mean k∗′ ∇ log P(y|f̂) and covariance k∗∗ − k∗′ (H^−1 + K)^−1 k∗, where k∗ and k∗∗ correspond to the test–data and test–test covariances, respectively [8]. Lastly, the predicted tuning function for an individual test point, λ∗ = exp(f∗), is log-normal, with closed-form expressions for its mean and variance (see Suppl. Info.).\nStandard methods for implementing these computations using the Cholesky decomposition require O(N^3) computations and O(N^2) memory, restricting their use to a few hundred data points. The efficient implementation proposed here relies on the Kronecker structure of the covariance matrix (which makes eigenvalue decomposition and matrix–vector products very fast, see Suppl. Info.), with linear conjugate gradients optimization and a lower bound on the marginal likelihood for hyperparameter learning. The predictive distribution can be efficiently evaluated in O(dN^((d+1)/d)) (with a hidden constant given by the number of Newton steps needed for convergence, cf. [8]). Our implementation is based on the gpml library [9] and the code is available online. A more detailed description of the algorithmic details is provided in the Suppl. Info. 
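The Laplace mode-finding step can be sketched for a small dense problem; a minimal illustration (ours, not the authors' Kronecker-structured implementation — dense solves stand in for the fast linear algebra):

```python
import numpy as np

def laplace_mode(K, y, mu, n_newton=30):
    """Newton ascent to the posterior mode f_hat of the Poisson-GP model with
    exponential link: log P(f|y) = y'f - sum(exp(f)) - 0.5 (f-mu)'K^-1(f-mu) + const."""
    f = mu.copy()
    n = len(y)
    for _ in range(n_newton):
        lam = np.exp(f)           # per-bin mean counts
        W = np.diag(lam)          # Hessian of the negative log likelihood
        b = W @ (f - mu) + (y - lam)
        # standard numerically friendly form of the Newton update
        f = mu + K @ np.linalg.solve(np.eye(n) + W @ K, b)
    return f
```

Because the Poisson log likelihood with an exponential link is concave in f, this iteration converges to the unique mode; in the paper's setting the solve is replaced by conjugate gradients with Kronecker matrix-vector products.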
In practice, this means that it takes minutes on a laptop to estimate a 3D field for a 30min dataset (2–5min depending on the coarseness of the grid), with a traditional 2D field estimated in 20–30sec.\n\n3 Results\n\nEstimator validation\n\nWe first validated our implementation on artificial data with known statistics.3 We defined a circular arena with 1m diameter and simulated the animal's behavior as a random walk with reflective bounds (Fig. 1c, left panel). This random process would eventually cover the space uniformly, but for short sessions it yields occupancy maps similar to those seen in real data. We calibrated diffusion parameters to roughly match CA1 statistics (average speed 5cm/sec, peak firing 5–10Hz, 10–30min long sessions). Inferring the underlying place field was already robust with 10min sessions, with the posterior mean f∗ close to the ground truth (SE kernel, see Fig. 1c). In comparison, the traditional histogram-based estimate is quite poor (Fig. 1c), though it can potentially be improved by gaussian smoothing at the right spatial scale (although not without caveats, see Suppl. Info.).\nIt is more difficult to quantify the effects of the various approximations on real data, where the assumptions of the model are not matched exactly. Our approach was to check the robustness of the GP-based estimates on subsets of the data constructed by combining every 5th data point (see left panel in Fig. 1d). This partitioning was designed to ensure that subsets are as statistically similar as possible, sharing slow fluctuations in responses (e.g. due to variations in attentional levels, or changes in behavior). An example cell's response is shown in Fig. 1d. 
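The artificial-data protocol described above (bounded random walk plus Poisson counts from a smooth field) can be sketched in a few lines; all parameter values below are illustrative choices, roughly matched to the statistics quoted in the text:

```python
import numpy as np

def simulate_session(T=20000, dt=0.0256, radius=0.5, step_sd=0.001, seed=0):
    """Bounded random walk in a circular 1 m arena plus Poisson spike counts
    from one Gaussian place field. step_sd ~ 0.001 m per 25.6 ms frame gives
    an average speed of about 5 cm/s."""
    rng = np.random.default_rng(seed)
    pos = np.zeros((T, 2))
    for t in range(1, T):
        cand = pos[t - 1] + rng.normal(scale=step_sd, size=2)
        # crude boundary handling: reject steps that would leave the arena
        pos[t] = cand if np.linalg.norm(cand) <= radius else pos[t - 1]
    center, width, peak = np.array([0.1, -0.1]), 0.12, 8.0  # peak rate ~8 Hz
    lam = peak * np.exp(-np.sum((pos - center) ** 2, axis=1) / (2 * width**2))
    y = rng.poisson(lam * dt)  # counts per 25.6 ms tracking frame
    return pos, y
```

Feeding the resulting (position, count) pairs to the estimator and comparing the posterior mean against the known Gaussian bump reproduces the kind of validation shown in Fig. 1c.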
Our analysis revealed robust field estimation in most cells, provided they were reasonably active during the session (with mean firing rates >0.1Hz; we discarded the non-responsive cells from subsequent analyses).\n\nFigure 2: Spectral mixture kernels for modelling complex structure. We use artificial data with hexagonal grid structure mimicking MEC responses. Extrapolation task: the animal's position is restricted to the orange-delimited region of the environment. Stereotyped behavior: the simulated animal performs a bounded random walk within an annulus. In both cases, we recover the full field beyond these borders (GP estimate) using a spectral mixture kernel (kSM).\n\n3 Here we show a 2D example for simplicity; we obtained very similar results with 3D artificial inputs.\n\nSpectral mixture kernels for complex functional dependencies\n\nPlace field estimation is relatively easy in a traditional open field exploration session (30min). The main challenge is getting robust estimates on the time scale of a few minutes (e.g. in order to be able to detect changes due to learning), which we have seen a GP-based estimator can do well. A much more difficult problem is detecting tuning properties in a cheeseboard memory task [12]. What distinguishes this setup is the fact that the animal quickly discovers the location of the wells containing rewards, after which its running patterns become highly stereotypical, close to the shortest path that traverses the reward locations. While it is hard to figure out place field selectivity for locations that the animal never visits, GP-based estimators may have an advantage compared to traditional methods when functional dependencies are structured, as is the case for grid cells in the medial entorhinal cortex (MEC) [13, 14]. 
When tuning properties are complex and structured we can exploit the expressive power of spectral mixture (SM) kernels to make the most of very limited data.\nWe simulated two versions of this scenario. First, we defined an extrapolation task in which the animal's behaviour is restricted to a subregion of the environment (marked by orange lines in the 2nd panel of Fig. 2) but we want to infer the spatial selectivity outside these borders. The second scenario attempts to mimic the animal's running patterns in a cheeseboard maze (after learning) by restricting the trajectory to a ring (random walk with reflective boundaries in both cases). Using a 5-component spectral mixture kernel we were able to fully reconstruct the hexagonal lattice structure of the true field, despite the size of the observed region covering only about 2 times the length scale of the periodic pattern. In contrast, traditional methods (including GP-based inference with standard SE kernels) would fail completely at such extrapolation. While such complex patterns of spatial dependence are restricted to MEC (and the estimator is probably best suited for ventral MEC, where grids have a small length scale [15]), it is conceivable that such extrapolation may also be useful in the temporal domain, or more generally for cortical responses in neurons which have so far eluded a simple functional characterization.\n\nSpatial and non-spatial modulation of CA1 responses\n\nTo explore the multidimensional characterization of principal cell responses in CA1 we constructed several 3D estimators where the input combines the position of the animal within a 2D environment with an additional non-spatial variable.4 The first non-spatial variable we considered is the network state, quantified as the population spike count, k = Σ_{i=1}^{N_neurons} y_i (naturally a discrete variable between 0 and some k_max). 
This quantity provides a computationally convenient proxy for network oscillations and has been recently used in a series of studies on the statistics of population activity in the retina and cortex [16–19]. Second, we considered the animal's speed and direction of motion (with a coarse discretization), motivated by past work on non-spatial modulation of place fields on linear tracks [20]. Third, we also considered an input variable t measuring time within a session (SE kernel; 3–5 min windows), as a way to examine the stability of spatial tuning over time. For all analyses, positional information was discretized on a 32 × 32 grid, corresponding to a spacing of 2.5cm, comparable to the binning resolution used in traditional place field estimates. The animal's speed (estimated from the positional information with 250ms temporal smoothing) varied between 0 and about 25cm/sec, with a very skewed distribution (not shown). Small to medium variations in the coarseness of the discretization did not qualitatively affect the results, although the choice of prior becomes more important on the tail of the speed distribution, where data is scarce.\nThe resulting 3D tuning functions are shown in Fig. 3 for a few example neurons. First, network state modulates the place field selectivity in most CA1 neurons in our recordings. The typical modulation pattern is a monotonic increase in firing with k (Fig. 3a, top), although we also found k-dependent flickering in a minority of the cells (Fig. 3a, middle), and very rarely k invariance (Fig. 3a, bottom). Rate remapping is also the dominant pattern of speed-dependent modulation in our data set (Fig. 3b). In terms of place field stability over time, about half the cells were stable during a 30min session in a familiar environment, with occasionally higher firing rates at the very beginning of the trial (Fig. 3c, top), while the rest showed fluctuations in representations (Fig. 
3c, bottom). Results are shown for 5min windows; results for 3min windows are very similar.\n\n4 We chose to estimate multiple 3D fields rather than jointly conditioning on all variables mainly for simplicity; this strategy has the added bonus of providing sanity checks for the quality of the different estimates.\n\nFigure 3: Estimating 3D response dependencies in CA1 cells. a) Conditional place fields when constraining the network state, defined by the average population activity k. b) Conditional place fields when constraining the animal's speed. c) Conditional place fields as a function of the time within a 30min session, used to assess the stability of the representation. In all cases, the rightmost field corresponds to the traditional place field ignoring the 3rd dimension. d) Sanity check: marginal statistics of the place field selectivity obtained independently from the 3D fields in 5 example cells. e) Population summary of the degree of modulation of spatial selectivity by non-spatial variables; see text for details. f) Within-cell comparison of cell properties during the exploration of a familiar vs. a novel environment.\n\nAs a sanity check of our 3D estimators' quality, we independently computed the traditional place field by marginalizing out the 3rd dimension for each of our 3D estimates. We used the empirical distribution as a prior for the non-spatial dimensions, and a uniform prior for space. Reassuringly, we find that the estimates computed after marginalization are very close to the simple 2D place field map in all but 2 cells, which we exclude from the next analysis (examples in Fig. 3d). This provides additional confidence in the robustness of the estimator in the multidimensional case.\nSince we have a closed-form expression for the map between stimulus dimensions and neural responses, we can estimate the mutual information between neural activity and various input variables as a way to dissect their contribution to coding. 
First, we visualize the modulation of spatial selectivity by the non-spatial variable as the spatial information conditioned on the 3rd variable, normalized by the marginal spatial information, MI(x, y|z)/MI(x, y), with z generically denoting any of the non-spatial variables (an approximate closed-form expression exists given f and Poisson observation noise). We see monotonic increases in spatial information with k and with speed (Fig. 3e) at the level of the population, and a weak decrease in spatial information over time (possibly due to higher speeds at the beginning of the session, combined with heightened attention/motivation levels). In terms of the division of spatial vs. non-spatial information across cells, we found that space-selective cells have weaker k-modulation (Spearman corr(MI(y, x), MI(y, k)) = −0.17). This however does not exclude the possibility that theta-coupled cells have additional spatial information at the fine temporal scale. Additionally, there is little correlation between the coding of position and speed (corr(MI(y, x), MI(y, speed)) = −0.03), suggesting that the encoding of the two is relatively orthogonal at the level of the population. 
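Such information estimates can be computed directly from a discretized tuning map under the Poisson observation model; a minimal numerical sketch (ours, not the paper's approximate closed-form expression from the Suppl. Info.), for MI(y, x) = Σ_x p(x) Σ_y P(y|x) log₂[P(y|x)/P(y)]:

```python
import numpy as np
from scipy.special import gammaln  # assumes SciPy is available

def mutual_info_bits(rates, p_x, dt=0.0256, y_max=30):
    """MI in bits between the spike count y in one time bin and the binned
    input x, given a tuning map `rates` (Hz) and occupancy distribution p_x."""
    lam = np.asarray(rates, float) * dt           # expected counts per bin
    y = np.arange(y_max + 1)
    # Poisson log pmf, rows index input bins, columns spike counts
    log_pmf = y * np.log(lam[:, None] + 1e-300) - lam[:, None] - gammaln(y + 1)
    p_y_x = np.exp(log_pmf)                       # P(y|x)
    p_y = p_x @ p_y_x                             # marginal P(y)
    ratio = p_y_x / (p_y + 1e-300)
    return float(np.sum(p_x[:, None] * p_y_x * np.log2(ratio + 1e-300)))
```

The conditioned quantity MI(x, y|z) follows by applying the same computation to each slice of the 3D map and averaging over the empirical distribution of z.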
Somewhat unexpectedly, we found a cell's temporal stability to be largely independent of its spatial selectivity (corr(MI(y, x), MI(y, t)) = −0.04).\n\nMotivated by recent observations that the overall excitability of cells may be predictive of both their spatial selectivity and of the rigidity of their representation [21], we compared the overall firing rate of the cells with their spatial and non-spatial selectivity. We found relatively strong dependencies, with positive correlations between firing rate and spatial information (cc = 0.21), network influence (cc = 0.43) and the cell's stability (cc = 0.38). When comparing these quantities in the same cells as the animal visits a familiar or a novel environment (93 cells, 20min in each environment) we found additional nontrivial dependencies between spatial and non-spatial tuning. Although the overall firing rates of the cells are remarkably preserved across conditions (reflecting general cell excitability, cc = 0.66), the subpopulation of cells with strong spatial selectivity is largely non-overlapping across environments (corr(MI_fam(y, x), MI_nov(y, x)) = 0.07). Moreover, the temporal stability of the representation is also environment-specific (corr(MI_fam(y, t), MI_nov(y, t)) = −0.04). Overall, these results paint a complex picture of hippocampal coding, the implications of which need further empirical and theoretical investigation.\nLastly, we studied the dependence of CA1 responses on the animal's direction of motion. 
Although directional selectivity is well documented on a linear track [20], it remains unclear if a similar behavior occurs in a 2D environment. The main challenge comes from the poor sampling of the position × direction-of-motion input space, something which our methods can handle readily. To construct directionally selective place field estimates in 2D we took inspiration from recent analyses of 2D phase precession [22], conditioning the responses on the main direction of motion within the place field. Specifically, we used our estimate of a traditional 2D place field to define a region of interest (ROI) that covers 90% of the field for each cell (Fig. 4a). We isolated all trajectory segments that traverse this ROI and classified them based on the primary direction of motion along the cardinal orientations. We then computed place field estimates for each direction, with data outside the ROI shared across conditions. To avoid artefacts due to the stereotypical pattern of running along the box borders, we restricted this analysis to cells with fields in the central part of the environment (10 cells). A set of representative examples of the resulting directional fields is shown in Fig. 4d. We found the fields to be largely invariant to direction of motion in our setup, with small displacements in peak firing possibly due to differences between the perceived vs. the camera-based measurements of position (see also [22]). Overall, these results suggest that, in contrast to linear track behavior, CA1 responses are largely invariant to the direction of motion in an open field exploration task.\n\nFigure 4: Directional selectivity in CA1 cells. a) Cell-specific ROI that covers the classic place field (example corresponding to cell 6). b) Classification of the traversals of the region of interest as a function of the primary direction of motion along the cardinal directions. 
Out-of-ROI data is shared across conditions. c) Traditional place field estimates for example CA1 cells and d) their corresponding direction-specific tuning.\n\n4 Discussion\n\nStrong constraints on experiment duration, poor sampling of the stimulus space and additional sources of variability that are not under direct experimental control make the estimation of tuning properties during awake behavior particularly challenging. Here we have shown that recent advances in fast GP inference based on Kronecker methods allow for a robust characterization of multidimensional nonlinear tuning functions, which was inaccessible to traditional methods. Furthermore, our estimators inherit all the advantages of a probabilistic approach, including a principled way of dealing with the non-uniform sampling of the input space and natural uncertainty estimates.\nOur methods can robustly estimate place fields with one order of magnitude fewer data points. Furthermore, they allow for more than two-dimensional inputs. While one could imagine it would suffice to estimate separate place fields conditioned on each value of the non-spatial dimension, z, the joint estimator has the advantage that it allows for smoothing across z values, borrowing strength from well-sampled regions of the z space to make better estimates for poorly sampled z values.\nSeveral related algorithms have been proposed in the literature [3–5], which vary primarily in how they handle the tradeoff between kernel flexibility and the computational time required for inference and learning (see Table 1). At one extreme, [3] strongly restricts the nature of the covariance matrix to nearest-neighbour interactions on a 2D grid (resulting in a band-diagonal inverse covariance matrix), which allows them to exploit sparse matrix techniques to estimate the posterior mean in linear time. At the other extreme, [4, 5] allow for an arbitrary covariance structure, but are computationally prohibitive, O(N^3). Our proposal sits between these extremes in that it achieves close-to-linear computational and memory costs without significantly restricting the flexibility of the covariance structure (for a better intuition of the effect of different covariances, see also Fig. S1). In particular, it can be combined with powerful spectral mixture kernels to extract complex functional dependencies that go beyond simple smoothing. This opens the door to a variety of previously inaccessible tasks, such as extrapolation. Moreover, it allows for an agnostic exploration of the functional space of neural responses, which could be used to discover novel tuning properties in cells for which coding is poorly understood.\nWhen applied to CA1 data, our multidimensional estimators revealed a complex picture of the modulation of neural responses by spatial and non-spatial inputs in the hippocampus. First we confirmed linear track results concerning the speed and oscillatory modulation of spatial tuning. Furthermore, we revealed additional insights into the interaction between the representation of space and these non-spatial dimensions, which go beyond the capabilities of traditional methods. Most notably, we found that 1) the representation of speed and position is mostly orthogonal, 2) place field stability cannot be easily explained in terms of cell excitability or spatial selectivity, although 3) it is environment-specific. 
Lastly, while we showed 2D place field maps to be direction-invariant in an open field exploration task, more interesting directional dependencies may be revealed in other 2D tasks where the direction of motion is behaviorally more relevant (e.g. the cheeseboard). Importantly, there is nothing hippocampus-specific in the methodology. Hence fast GP inference using Kronecker methods, combined with expressive kernels, may provide a general-purpose tool for characterizing neural responses across brain regions.\n\nTable 1: Summary comparison of different estimators.\n\nAlgorithm | Kernel function | Computing cost | Memory cost | Data size\nRad et al. 2010 [3] | sparse banded inverse covariance | O(N) | O(N) | 10^5\nPark et al. 2014 [4] | SE, any in principle | O(N^3) | O(N^2) | < 10^3\nSavin & Tkacik | SE and SM, works for any tensor-product kernel | O(dN^((d+1)/d)) | O(dN^(2/d)) | 10^5\n\nAcknowledgments\n\nWe thank Jozsef Csicsvari for kindly sharing the CA1 data. This work was supported by the People Programme (Marie Curie Actions) of the European Union\u2019s Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. 291734.\n\nReferences\n\n[1] Pillow, J.W. Likelihood-based approaches to modeling the neural code. in Bayesian brain: probabilistic approaches to neural coding, 1\u201321 (2006).\n\n[2] Pillow, J.W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995\u2013999 (2008).\n\n[3] Rad, K.R. & Paninski, L. Efficient, adaptive estimation of two-dimensional firing rate surfaces via Gaussian process methods. Network 21, 142\u2013168 (2010).\n\n[4] Park, M., Weller, J.P., Horwitz, G.D. & Pillow, J.W. Bayesian active learning of neural firing rate maps with transformed Gaussian process priors.
Neural Computation 26, 1519\u20131541 (2014).\n\n[5] Macke, J.H., Gerwinn, S., White, L.E., Kaschube, M. & Bethge, M. Gaussian process methods for estimating cortical maps. NeuroImage 56, 570\u2013581 (2011).\n\n[6] Frank, L.M., Eden, U.T., Solo, V., Wilson, M.A. & Brown, E.N. Contrasting patterns of receptive field plasticity in the hippocampus and the entorhinal cortex: an adaptive filtering approach. Journal of Neuroscience 22, 3817\u20133830 (2002).\n\n[7] Saat\u00e7i, Y. Scalable inference for structured Gaussian process models. PhD thesis, Cambridge University, UK (2012).\n\n[8] Flaxman, A., Wilson, A., Neill, D., Nickisch, H. & Smola, A. Fast Kronecker inference in Gaussian processes with non-Gaussian likelihoods. in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 607\u2013616 (2015).\n\n[9] Rasmussen, C.E. & Williams, C.K.I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) (The MIT Press, 2005).\n\n[10] Wilson, A. & Adams, R. Gaussian process kernels for pattern discovery and extrapolation. arXiv.org (2013).\n\n[11] Hensman, J., Matthews, A.G. & Filippone, M. MCMC for variationally sparse Gaussian processes. in Advances in Neural Information Processing Systems (The MIT Press, 2015).\n\n[12] Dupret, D., O\u2019Neill, J., Pleydell-Bouverie, B. & Csicsvari, J. The reorganization and reactivation of hippocampal maps predict spatial memory performance. Nature Neuroscience 13, 995\u20131002 (2010).\n\n[13] Moser, E.I., Kropff, E. & Moser, M.B. Place cells, grid cells, and the brain\u2019s spatial representation system. Annual Review of Neuroscience 31, 69\u201389 (2008).\n\n[14] Moser, E.I. et al. Grid cells and cortical representation. Nature Reviews Neuroscience 15, 466\u2013481 (2014).\n\n[15] Brun, V.H. et al. Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex. Hippocampus 18, 1200\u20131212 (2008).\n\n[16] Tka\u010dik, G.
et al. The simplest maximum entropy model for collective behavior in a neural network. Journal of Statistical Mechanics: Theory and Experiment 2013, P03011 (2013).\n\n[17] Tka\u010dik, G. et al. Searching for collective behavior in a large network of sensory neurons. PLoS Computational Biology 10, e1003408 (2014).\n\n[18] Fiser, J., Lengyel, M., Savin, C., Orban, G. & Berkes, P. How (not) to assess the importance of correlations for the matching of spontaneous and evoked activity. arXiv (2013).\n\n[19] Okun, M. et al. Diverse coupling of neurons to populations in sensory cortex. Nature (2015).\n\n[20] McNaughton, B.L., Barnes, C.A. & O\u2019Keefe, J. The contributions of position, direction, and velocity to single unit activity in the hippocampus of freely-moving rats. Experimental Brain Research 52, 41\u201349 (1983).\n\n[21] Grosmark, A.D. & Buzs\u00e1ki, G. Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences. Science, 1\u20135 (2016).\n\n[22] Huxter, J.R., Senior, T.J., Allen, K. & Csicsvari, J. Theta phase\u2013specific codes for two-dimensional position, trajectory and heading in the hippocampus. Nature Neuroscience 11, 587\u2013594 (2008).\n", "award": [], "sourceid": 1797, "authors": [{"given_name": "Cristina", "family_name": "Savin", "institution": "IST Austria"}, {"given_name": "Gasper", "family_name": "Tkacik", "institution": "Institute of Science and Technology Austria"}]}