{"title": "Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 149, "page_last": 156, "abstract": null, "full_text": "Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex\n\nChristian K. Machens, Michael Wehr, Anthony M. Zador\n\nCold Spring Harbor Laboratory\n\nOne Bungtown Rd\n\nCold Spring Harbor, NY 11724\n\n{machens, wehr, zador}@cshl.edu\n\nAbstract\n\nHow do cortical neurons represent the acoustic environment? This question is often addressed by probing with simple stimuli such as clicks or tone pips. Such stimuli have the advantage of yielding easily interpreted answers, but have the disadvantage that they may fail to uncover complex or higher-order neuronal response properties.\n\nHere we adopt an alternative approach, probing neuronal responses with complex acoustic stimuli, including animal vocalizations and music. We have used in vivo whole cell methods in the rat auditory cortex to record subthreshold membrane potential fluctuations elicited by these stimuli. Whole cell recording reveals the total synaptic input to a neuron from all the other neurons in the circuit, instead of just its output (a sparse binary spike train) as in conventional single unit physiological recordings. Whole cell recording thus provides a much richer source of information about the neuron\u2019s response.\n\nMany neurons responded robustly and reliably to the complex stimuli in our ensemble. Here we analyze the linear component, the spectro-temporal receptive field (STRF), of the transformation from the sound (as represented by its time-varying spectrogram) to the neuron\u2019s membrane potential. We find that the STRF has a rich dynamical structure, including excitatory regions positioned in general accord with the prediction of the simple tuning curve. 
We also find that in many cases, much of the neuron\u2019s response, although deterministically related to the stimulus, cannot be predicted by the linear component, indicating the presence of as-yet-uncharacterized nonlinear response properties.\n\n1 Introduction\n\nIn their natural environment, animals encounter highly complex, dynamically changing stimuli. The auditory cortex evolved to process such complex sounds. To investigate a system in its normal mode of operation, it therefore seems reasonable to use natural stimuli.\n\nThe linear response of an auditory neuron can be described in terms of its spectro-temporal receptive field (STRF). The cortical STRF has been estimated using a variety of stimulus ensembles (footnote 1), including tone pips [1] and dynamic ripples [2]. However, while natural stimuli have long been used to probe cortical responses [3, 4], and have been widely used in other preparations to compute STRFs [5], they have only rarely been used to compute STRFs from cortical neurons [6].\n\nHere we present estimates of the STRF using in vivo whole cell recording. Because whole cell recording measures the total synaptic input to a neuron, rather than just its output (a sparse binary spike train) as in conventional single unit physiological recordings, this technique provides a much richer source of information about the neuron\u2019s response.\n\nWhole cell recording also has a different sampling bias from conventional extracellular recording: instead of recording from active neurons with large action potentials (i.e. 
those that are most easily isolated on the electrode), whole cell recording selects for neurons solely on the basis of the experimenter\u2019s ability to form a gigaohm seal.\n\nUsing these novel methods, we investigated the computations performed by single neurons in the auditory cortex A1 of rats.\n\n2 Spike responses and subthreshold activity\n\nWe first used cell-attached methods to obtain well-isolated single unit recordings. We found that many cells in auditory cortex responded only very rarely to the natural stimulus ensemble, making it difficult to characterize the neuron\u2019s input-output relationship effectively. An example of this problem is shown in Fig. 1(b) where a natural stimulus (here, the call of a nightingale) leads to an average of about five spikes during the eight-second-long presentation. Such sparse responses are not surprising, since it is well known that many cortical neurons are selective for stimulus transients [7, 8].\n\nOne way to circumvent this difficulty is to present stimuli that elicit high firing rates. For example, using dynamic ripple stimuli, an STRF can be constructed with about [\u2026] spikes collected over [\u2026] minutes (average firing rate of approximately [\u2026] spikes/second, or about [\u2026]-fold higher than the rate elicited by the natural stimulus in Fig. 1(b)) [9]. However, such stimuli have, by design, a simple correlational structure, and therefore preclude the investigation of nonlinear response properties driven by higher-order stimulus characteristics.\n\nWe have therefore adopted an alternative approach based on in vivo whole cell recording, exploiting the fact that although these neurons spike only rarely, they feature strong subthreshold activity. 
A set of subthreshold voltage traces, obtained by a whole-cell recording where spikes were blocked (only in the neuron being recorded from) with the intracellular sodium channel blocker QX-314 (see Methods), is shown in Fig. 1(c). The responses feature robust stimulus-locked fluctuations of membrane potential, as well as some spontaneous activity. Both the spontaneous and stimulus-locked voltage fluctuations are due to the synchronous arrival of many excitatory postsynaptic potentials (EPSPs). (Note that if spikes had not been blocked pharmacologically, some of the larger EPSPs would have triggered spikes.) Not only do these whole cell recordings avoid the problem of sparse spiking responses, they also provide insight into the computations performed by the input to the neuron\u2019s spike generating mechanism.\n\nFootnote 1: Because cortical neurons respond poorly to white noise, this stimulus has not been used to estimate cortical STRFs.\n\n[Figure 1: panels (a), (b), (c); axes of (b) and (c): trial no. vs. time (sec)]\n\nFigure 1: (a) Spectrogram of the song of a nightingale. (b) Spike raster plots recorded in cell-attached mode during ten repetitions of the nightingale song from a single neuron in auditory cortex A1. (c) Voltage traces recorded in whole-cell mode during ten repetitions from another neuron in A1.\n\n3 Reliability of responses\n\nA key step in the characterization of the neuron\u2019s responses is the separation of the stimulus-locked activity from the stimulus-independent activity (\u201cbackground noise\u201d). A sample average trace is compared with a single trial in Fig. 2(a).\n\nTo quantify the amount of stimulus-locked activity, we computed the coherence function between a single response trace and the average over the remaining traces. 
The coherence measures the frequency-resolved correlation of two time series. This function is shown in Fig. 2(b) for responses to several natural stimuli from the same cell. The coherence function demonstrates that the stimulus-dependent activity is confined to lower frequencies ([\u2026] Hz). Note that the coherence function provides merely an average over the complete trace; in reality, the coherence can locally be much higher (when all traces feature the same stimulus-locked excursion in membrane potential) or much lower (for instance in the absence of stimulus-locked activity). On average, however, the coherence is approximately the same for all the natural stimuli presented, indicating that all stimuli feature approximately the same level of background activity.\n\n[Figure 2: (a) voltage (mV) vs. time (sec), traces: mean response, single trial; (b) coherence vs. frequency (Hz)]\n\nFigure 2: (a) Mean response compared to single trial for a natural stimulus (jaguar mating call). (b) Coherence functions between mean response and single trial for different stimuli. All natural stimuli yield approximately the same relation between signal and noise.\n\n4 Spectro-temporal receptive field\n\nHaving established the mean over trials as a reliable estimate of the stimulus-dependent activity, we next sought to understand the computations performed by the neurons.\n\nTo mimic the cochlear transform, it has proven useful to describe the stimulus in the time-frequency domain [2]. 
Discretizing both time and frequency, we describe the stimulus power in the $i$-th time bin $t_i$ and the $k$-th frequency bin $f_k$ by $S(t_i, f_k)$. To compute the time-frequency representation, we used the spectrogram method which requires a certain choice for the time-frequency tradeoff [10]; several choices were used independently of each other, essentially yielding the same results. In all cases, stimulus power is measured in logarithmic units.\n\nThe simplest and most widely used model is a linear transform between the stimulus (as represented by the spectrogram) and the response, given by the formula\n\n$$r_{\\mathrm{est}}(t_i) = r_0 + \\sum_{j,k} h(t_j, f_k) \\, S(t_i - t_j, f_k) \\quad (1)$$\n\nwhere $r_0$ is a constant offset and the parameters $h(t_j, f_k)$ represent the spectro-temporal receptive field (STRF) of the neuron. Note, though, that the response is usually taken to be the average firing rate [2, 11]; here the response is given by the subthreshold voltage trace. The parameters can be fitted by minimizing the mean-square error between the measured response $r(t_i)$ and the estimated response $r_{\\mathrm{est}}(t_i)$. This problem is solved by multi-dimensional linear regression.\n\nHowever, a direct, \u201cnaive\u201d estimate, as obtained by the solution to the regression equations, will usually fail since the stimulus does not properly sample all dimensions in stimulus space. In general, this leads to strong overfitting of the poorly sampled dimensions and poor predictive power of the model. The overfitting can be seen in the noisy structure of the STRF shown in Fig. 3(a).\n\nA simple alternative is to penalize the improperly sampled directions, which can be done using ridge regression [12]. Ridge regression minimizes the mean-square error between measured and estimated response while placing a constraint on the sum of the squared regression coefficients. Choosing the constraint such that the predictive power of the model is maximized, we obtained the STRF shown in Fig. 3(b). Note that ridge regression operates on all coefficients uniformly (i.e., the constraint is global), so that observed smoothness in the estimated STRF represents structure in the data; no local smoothness constraint was applied.\n\n[Figure 3: (a) naive estimate, (b) ridge estimate; axes: frequency (Hz), 100 to 12800, vs. time (sec), \u22120.3 to 0]\n\nFigure 3: (a) Naive estimate of the STRF via linear regression. Darker pixels denote time-frequency bins with higher power. (b) Estimate of the STRF via ridge regression.\n\nThe STRF displays the neuron\u2019s frequency sensitivity, centered around 800\u20131600 Hz. This range of frequencies matches the neuron\u2019s tuning curve, which is measured with short sine tones. The STRF suggests that the neuron essentially integrates frequencies within this range with a time constant of about 100 ms. These types of computations have been previously reported for neurons in auditory cortex [1, 2].\n\n4.1 Spectral analysis of error\n\nHow well does the simple linear model predict the subthreshold responses? 
To assess the predictive power of the model, the STRF was estimated from data obtained for ten different natural stimuli and then tested on an eleventh stimulus. A sample prediction is shown in Fig. 4(a). While the predicted trace roughly captures the occurrence of the EPSPs, it fails to predict their overall shape. This observation can be quantified by spectrally resolving the prediction success. For that purpose, we again used the coherence function which measures the correlation between the actual response and the predicted response at each frequency. This function is shown in Fig. 4(b). Clearly, the model fails to predict any response fluctuations faster than [\u2026] Hz. As a comparison, recall that the response is reliable up to about [\u2026] Hz (Fig. 2).\n\n[Figure 4: (a) voltage (mV) vs. time (sec), traces: mean response, prediction; (b) coherence vs. frequency (Hz)]\n\nFigure 4: (a) Mean response and prediction for a natural stimulus (jaguar mating call). The STRF captures the gross features of the response, but not the fine details. (b) Coherence function between measured and predicted response.\n\n[Figure 5: squared correlation coefficient vs. stimulus no.; stimulus labels: THT, BHW, SLB, JMC, HBW, KF, TF, JHP, SJF, CWM, BGC]\n\nFigure 5: Squared correlation coefficients between the mean of the measured responses and the predicted response. Linear prediction with the STRF is more effective for some stimuli than others.\n\n4.2 Errors across stimuli\n\nSome of the natural stimuli elicited highly reliable responses that were not at all predicted by the STRF, see Fig. 5. In fact, the example shown in Fig. 4 is one of the best predictions achieved by the model. 
The failure to predict the responses to some stimuli cannot be attributed to the absence of stimulus-locked activity; as the coherence functions in Fig. 2(b) have shown, all stimuli feature approximately the same proportion of stimulus-locked activity to noise. Rather, such responses indicate a high degree of nonlinearity that dominates the response to some stimuli. This observation is in accord with previous work on neurons in the auditory forebrain of zebra finches [11], where neurons show a high degree of feature selectivity.\n\nThe nonlinearities seen in subthreshold responses of A1 neurons can partly be attributed to adaptation, to interactions between frequencies [13, 14], and also to off-responses (footnote 2). In general, the linear model performs best if the stimuli are slowly modulated in both time and frequency.\n\n5 Discussion\n\nWe have used whole cell patch clamp methods in vivo to record subthreshold membrane potential fluctuations elicited by natural sounds. Subthreshold responses were reliable and (in contrast to the suprathreshold spiking responses) sufficiently rich and robust to permit rapid and efficient estimation of the linear predictor of the neuron\u2019s response (the STRF). The present manuscript represents the first analysis of subthreshold responses elicited by natural stimuli in the cortex, or to our knowledge in any system.\n\nSTRFs estimated from natural sounds were in general agreement, with respect to gross characteristics such as frequency tuning, with those obtained directly from pure tone pips. The STRFs from complex sounds, however, provided a much more complete view of the neuron\u2019s dynamics, so that it was possible to compare the predicted and experimentally measured responses.\n\nIn many cases the prediction was poor (cf. Fig. 6), indicating strong nonlinearities in the neuron\u2019s responses. 
These nonlinearities include adaptation, two-tone interactions, and off-responses. Explaining these nonlinearities represents an exciting challenge for future research.\n\nFootnote 2: Off-responses are excitatory responses that occur at the termination of stimuli in some neurons. Because they have the same sign as the on-response, they represent a form of rectifying nonlinearity. Further complications arise because on- and off-responses interact, depending on their spectro-temporal relations [14].\n\n[Figure 6: histogram; axes: number of cells vs. average over squared correlation coefficients]\n\nFigure 6: Summary figure. Altogether [\u2026] cells were recorded in whole cell mode. Shown are the squared correlation coefficients, averaged over all stimuli for a given cell. For many cells, the linear model worked rather poorly as indicated by low cross correlations.\n\n6 Methods\n\nSprague-Dawley rats (p18-21) were anesthetized with ketamine (30 mg/kg) and medetomidine (0.24 mg/kg). Whole cell recordings and single unit recordings were made with glass microelectrodes ([\u2026] M\u03a9) from primary auditory cortex (A1) using standard methods appropriately modified for the in vivo preparation. During whole cell recordings, sodium action potentials were blocked using the sodium channel blocker QX-314.\n\nAll natural sounds were taken from an audio CD, sampled at 44,100 Hz. Animal vocalizations were from \u201cThe Diversity of Animal Sounds,\u201d available from the Cornell Laboratory of Ornithology. Additional stimuli included pure tones and white noise bursts with 25 ms duration and 5 ms ramp (sampled at 97.656 kHz), and Purple Haze by Jimi Hendrix. Sounds were delivered by a TDT RP2 at 97.656 kHz to a calibrated TDT electrostatic speaker and presented free field in a double-walled sound booth.\n\nReferences\n\n[1] R. C. deCharms and M. M. Merzenich. 
Primary cortical representation of sounds by the coordination of action-potential timing. Nature, 381(6583):610\u20133, 1996.\n\n[2] D. J. Klein, D. A. Depireux, J. Z. Simon, and S. A. Shamma. Robust spectrotemporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci, 9(1):85\u2013111, 2000.\n\n[3] O. Creutzfeldt, F. C. Hellweg, and C. Schreiner. Thalamocortical transformation of responses to complex auditory stimuli. Exp Brain Res, 39(1):87\u2013104, 1980.\n\n[4] I. Nelken, Y. Rotman, and O. Bar Yosef. Responses of auditory-cortex neurons to structural features of natural sounds. Nature, 397:154\u2013157, 1999.\n\n[5] F. E. Theunissen, S. V. David, N. C. Singh, A. Hsu, W. E. Vinje, and J. L. Gallant. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network, 12(3):289\u2013316, 2001.\n\n[6] J. F. Linden, R. C. Liu, M. Kvale, C. E. Schreiner, and M. M. Merzenich. Reverse-correlation analysis of receptive fields in mouse and rat auditory cortex. Society for Neuroscience Abstracts, 27(2):1635, 2001.\n\n[7] P. Heil. Auditory cortical onset responses revisited. II. Response strength. J Neurophysiol, 77(5):2642\u201360, 1997.\n\n[8] S. L. Sally and J. B. Kelly. Organization of auditory cortex in the albino rat: sound frequency. J Neurophysiol, 59(5):1627\u201338, 1988.\n\n[9] D. A. Depireux, J. Z. Simon, D. J. Klein, and S. A. Shamma. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. J Neurophysiol, 85(3):1220\u201334, 2001.\n\n[10] L. Cohen. Time-frequency Analysis. Prentice Hall, 1995.\n\n[11] F. E. Theunissen, K. Sen, and A. J. Doupe. Spectral-temporal receptive fields of nonlinear auditory neurons obtained by using natural sounds. J. Neurosci., 20(6):2315\u20132331, 2000.\n\n[12] T. Hastie, R. Tibshirani, and J. Friedman. 
The Elements of Statistical Learning. Springer, 2001.\n\n[13] M. Brosch and C. E. Schreiner. Time course of forward masking tuning curves in cat primary auditory cortex. J Neurophysiol, 77(2):923\u201343, 1997.\n\n[14] L. Tai and A. Zador. In vivo whole cell recording of synaptic responses underlying two-tone interactions in rat auditory cortex. Society for Neuroscience Abstracts, 27(2):1634, 2001.\n", "award": [], "sourceid": 2334, "authors": [{"given_name": "Christian", "family_name": "Machens", "institution": null}, {"given_name": "Michael", "family_name": "Wehr", "institution": null}, {"given_name": "Anthony", "family_name": "Zador", "institution": null}]}