{"title": "Modeling Complex Cells in an Awake Macaque during Natural Image Viewing", "book": "Advances in Neural Information Processing Systems", "page_first": 236, "page_last": 242, "abstract": null, "full_text": "Modeling Complex Cells in an Awake Macaque During Natural Image Viewing\n\nWilliam E. Vinje\nvinje@socrates.berkeley.edu\nDepartment of Molecular and Cellular Biology, Neurobiology Division\nUniversity of California, Berkeley\nBerkeley, CA, 94720\n\nJack L. Gallant\ngallant@socrates.berkeley.edu\nDepartment of Psychology\nUniversity of California, Berkeley\nBerkeley, CA, 94720\n\nAbstract\n\nWe model the responses of cells in visual area V1 during natural vision. Our model consists of a classical energy mechanism whose output is divided by nonclassical gain control and texture contrast mechanisms. We apply this model to review movies, a stimulus sequence that replicates the stimulation a cell receives during free viewing of natural images. Data were collected from three cells using five different review movies, and the model was fit separately to the data from each movie. For the energy mechanism alone we find modest but significant correlations (r_E = 0.41, 0.43, 0.59, 0.35) between model and data. These correlations are improved somewhat when we allow for suppressive surround effects (r_{E+G} = 0.42, 0.56, 0.60, 0.37). In one case the inclusion of a delayed suppressive surround dramatically improves the fit to the data by modifying the time course of the model's response.\n\n1 INTRODUCTION\n\nComplex cells in the primary visual cortex (area V1 in primates) are tuned to localized visual patterns of a given spatial frequency, orientation, color, and drift direction (De Valois & De Valois, 1990). 
These cells have been modeled as linear spatio-temporal filters whose output is rectified by a static nonlinearity (Adelson & Bergen, 1985); more recent models have also included a divisive contrast gain control mechanism (Heeger, 1992; Wilson & Humanski, 1993; Geisler & Albrecht, 1997). We apply a modified form of these models to a stimulus that simulates natural vision. Our model uses relatively few parameters yet incorporates the cells' temporal response properties and suppressive influences from beyond the classical receptive field (CRF).\n\n2 METHODS\n\nData Collection: Data were collected from one awake, behaving macaque monkey using single unit recording techniques described elsewhere (Connor et al., 1997).1 First, the cell's receptive field size and location were estimated manually, and tuning curves were objectively characterized using two-dimensional sinusoidal gratings. Next, a static color image of a natural scene was presented to the animal and his eye position was recorded continuously as he freely scanned the image for 9 seconds (Gallant et al., 1998).2 Image patches centered on the position of the cell's CRF (and 2-4 times the CRF diameter) were then extracted using an automated procedure. The sequence of image patches formed a continuous 9 second review movie that simulated all of the stimulation that had occurred in and around the CRF during free viewing.3 Although the original image was static, the review movies contain the temporal dynamics of the saccadic eye movements made by the animal during free viewing. Finally, the review movies were played in and around the CRF while the animal performed a fixation task.\n\nDuring free viewing each eye position is unique, so each image patch is likely to enter the CRF only once. 
The review movies were therefore replayed several times and the cell's average response with respect to the movie timestream was computed from the peri-stimulus time histogram (PSTH). These review movies also form the model's stimulus input, while its output is relative spike probability versus time (the model cell's PSTH).\n\nBefore applying the model each review movie was preprocessed by converting to gray scale (since the model does not consider color tuning), setting the average luminance level to zero (on a frame by frame basis) and prefiltering with the human contrast sensitivity function to more accurately reflect the information reaching cells in V1.\n\nDivisive Normalization Model: The model consists of a classical receptive field energy mechanism, $E_{CRF}$, whose output is divided by two nonclassical suppressive mechanisms, a gain control field, $G$, and a texture contrast field, $T$.\n\n$$PSTH_{model}(t) \\propto \\frac{E_{CRF}(t)}{1 + \\alpha G(t - \\delta) + \\beta T(t - \\delta)} \\qquad (1)$$\n\nWe include a delay parameter for suppressive effects, consistent with the hypothesis that these effects may be mediated by local cortical interactions (Heeger, 1992; Wilson & Humanski, 1993). Any latency difference between the central energy mechanism and the suppressive surround will be reflected as a positive delay offset ($\\delta > 0$ in Equation 1).\n\nClassical Receptive Field Energy Mechanism: The energy mechanism, $E_{CRF}$, is composed of four phase-dependent subunits, $U^{\\phi}$. Each subunit computes an inner product in space and a convolution in time between the model cell's space-time classical receptive field, $CRF^{\\phi}(x, y, \\tau)$, and the image, $I(x, y, t)$.\n\n$$U^{\\phi}(t) = \\iiint CRF^{\\phi}(x, y, \\tau) \\cdot I(x, y, t - \\tau) \\, dx \\, dy \\, d\\tau \\qquad (2)$$\n\n1 Recording was performed under a university-approved protocol and conformed to all relevant NIH and USDA guidelines.\n\n2 Images were taken from a Corel Corporation photo-CD library at 1280x1024 resolution.\n\n3 Eye position data were collected at 1 kHz, whereas the monitor display rate was 72.5 Hz (14 ms per frame). Therefore each review movie frame was composed of the average stimulation occurring during the corresponding 13.8 ms of free viewing.\n\nThe model presented here incorporates the simplifying assumption of a space-time separable receptive field structure, $CRF^{\\phi}(x, y, \\tau) = CRF^{\\phi}(x, y) \\, CRF(\\tau)$.\n\n$$U^{\\phi}(t) = \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{x} \\sum_{y} CRF^{\\phi}(x, y) \\cdot I(x, y, t - \\tau) \\right) \\qquad (3)$$\n\nTime is discretized into frames and space is discretized into pixels that match the review movie input. $CRF^{\\phi}(x, y)$ is modeled as a sinusoidal grating that is spatially weighted by a Gaussian envelope (i.e. a Gabor function). In this paper $CRF(\\tau)$ is approximated as a delta function following a constant latency. This minimizes model parameters and highlights the model's responses to the stimulus present at each fixation. The latency, orientation and spatial frequency of the grating, and the size of the CRF envelope, are all determined empirically by maximizing the fit between model and data.4\n\nA static nonlinearity ensures that the model PSTH does not become negative. We have examined both half-wave rectification, $\\tilde{U}^{\\phi}(t) = \\max[U^{\\phi}(t), 0]$, and half-squaring, $\\tilde{U}^{\\phi}(t) = (\\max[U^{\\phi}(t), 0])^2$; here we present the results from half-wave rectification. Half-squaring produces small changes in the model PSTH but does not improve the fit to the data. 
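As a rough illustration of Equation 3 and the half-wave rectification described above, the following NumPy sketch computes the four rectified subunit responses for a movie array. It assumes a delta-function temporal kernel expressed as an integer frame latency. The function names (gabor, rectified_subunits), the pixel-based wavelength and sigma parameters, and the zero response assigned to the first few frames are illustrative choices of this sketch, not details taken from the original implementation.

```python
import numpy as np


def gabor(size, wavelength, theta, phase, sigma):
    # Sinusoidal grating weighted by a Gaussian envelope (a Gabor function).
    # wavelength and sigma are in pixels; theta and phase are in radians.
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength + phase)


def rectified_subunits(movie, latency, wavelength, theta, sigma):
    # movie: array of shape (frames, size, size), one square patch per frame.
    # latency: non-negative integer frame delay standing in for the
    # delta-function temporal kernel (an assumption of this sketch).
    # Returns shape (4, frames): half-wave rectified responses for
    # subunit phases 0, 90, 180 and 270 degrees.
    n_frames, size, _ = movie.shape
    phases = np.deg2rad([0.0, 90.0, 180.0, 270.0])
    out = np.zeros((len(phases), n_frames))
    for i, phase in enumerate(phases):
        rf = gabor(size, wavelength, theta, phase, sigma)
        # Spatial inner product between the receptive field and every frame.
        u = np.tensordot(movie, rf, axes=([1, 2], [0, 1]))
        # Delta-function kernel: the response at frame t uses frame t - latency.
        delayed = np.zeros(n_frames)
        if latency < n_frames:
            delayed[latency:] = u[:n_frames - latency]
        # Half-wave rectification keeps the output non-negative.
        out[i] = np.maximum(delayed, 0.0)
    return out
```

With a delta-function kernel the temporal sum in Equation 3 reduces to a shift by the latency, which is why the sketch needs no explicit convolution. Squaring the rectified output would give the half-squaring variant.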
\n\nThe energy mechanism is made phase invariant by averaging over the rectified phase-dependent subunits, $\\tilde{U}^{\\phi}(t)$:\n\n$$E_{CRF}(t) = \\frac{1}{4} \\sum_{\\phi = 0, 90, 180, 270} \\tilde{U}^{\\phi}(t) \\qquad (4)$$\n\nGain Control Field: Cells in V1 incorporate a contrast gain control mechanism that compensates for changes in local luminance. The gain control field, $G$, models this effect as the total image power in a region encompassing the CRF and surround.\n\n$$G(t - \\delta) = \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{k_x} \\sum_{k_y} \\sqrt{P(k_x, k_y, \\tau)} \\right) \\qquad (5)$$\n\n$$P(k_x, k_y, \\tau) = FFT[\\mu_G(x, y, \\tau)] \\, FFT^{*}[\\mu_G(x, y, \\tau)] \\qquad (6)$$\n\n$$\\mu_G(x, y, \\tau) = \\nu_G(x, y) \\, I(x, y, (t - \\delta) - \\tau) \\qquad (7)$$\n\n$P(k_x, k_y, \\tau)$ is the spatial Fourier power of $\\mu_G(x, y, \\tau)$ and $\\nu_G$ is a two-dimensional Gaussian weighting function whose width sets the size of the gain control field.\n\nHeeger's (1992) divisive gain control term sums over many discrete energy mechanisms that tile space in and around the area of the CRF. Equation 5 approximates Heeger's approach in the limiting case of dense tiling.\n\nTexture Contrast Field: Cells in area V1 can be affected by the image surrounding the region of the CRF (Knierim & Van Essen, 1992). The responses of many V1 cells are highest when the optimal stimulus is presented alone within the CRF, and lowest when that stimulus is surrounded with a texture of similar orientation and frequency. The texture contrast field, $T$, models this effect as the image power in the spatial region surrounding the CRF that matches the CRF's orientation and spatial frequency.\n\n$$T(t - \\delta) = \\frac{1}{4} \\sum_{\\phi = 0, 90, 180, 270} \\left[ \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{k_x} \\sum_{k_y} \\sqrt{P^{\\phi}(k_x, k_y, \\tau)} \\right) \\right] \\qquad (8)$$\n\n$$P^{\\phi}(k_x, k_y, \\tau) = FFT[\\mu_T^{\\phi}(x, y, \\tau)] \\, FFT^{*}[\\mu_T^{\\phi}(x, y, \\tau)] \\qquad (9)$$\n\n$$\\mu_T^{\\phi}(x, y, \\tau) = g^{\\phi}(x, y) \\, (1 - \\nu_{CRF}(x, y)) \\, I(x, y, (t - \\delta) - \\tau) \\qquad (10)$$\n\n$g^{\\phi}$ is a Gabor function whose orientation and spatial frequency match those of the best fit $CRF^{\\phi}(x, y)$. The envelope of $g^{\\phi}$ defines the size of the texture contrast field. $\\nu_{CRF}$ is a two-dimensional Gaussian weighting function whose width matches the CRF envelope, and which suppresses the image center. Thus the texture contrast term picks up oriented power from an annular region of the image surrounding the CRF envelope. $T$ is made phase invariant by averaging over phase.\n\n4 As a fit statistic we use the linear correlation coefficient (Pearson's r) between model and data. Fitting is done with a gradient ascent algorithm. Our choice of correlation as a statistic eliminates the need to explicitly consider model normalization as a variable, and is very sensitive to latency mismatches between model and data. However, linear correlation is more prone to noise contamination than is $\\chi^2$.\n\n3 RESULTS\n\nThus far our model has been evaluated on a small data set collected as part of a different study (Gallant et al., 1998). Two cells, 87A and 98C, were examined with one review movie each, while cell 97A was examined with three review movies. Using this data set we compare the model's response in two interesting situations: cell 97A, which had high orientation-selectivity, versus cell 87A, which had poor orientation-selectivity; and cell 98C, which was directionally-selective, versus cell 97A, which was not directionally-selective.\n\nCRF Energy Mechanism: We separately fit the energy mechanism parameters to each of the three different cells. For cell 97A the three review movies were fit independently to test for consistency of the best fit parameters.\n\nTable 1 shows the correlation between model and data using only the CRF energy mechanism ($\\alpha = \\beta = 0$ in Equation 1). 
The significance of the correlations was assessed via a permutation test. The correlation values for cells 97A and 98C, though modest, are significant (p < 0.01). For these cells the 95% confidence intervals on the best fit parameter values are consistent with estimates from the flashed grating tests. The best fit parameter values for cell 97A are also consistent across the three independently fit review movies.\n\nThe model best accounts for the data from cell 97A. This cell was highly selective for vertical gratings and was not directionally-selective. Figure 1 compares the PSTH obtained from cell 97A with movie B to the model PSTH. The model generally responds to the same features that drive the real cell, though the match is imperfect. Much of the discrepancy between the model and data arises from our approximation of CRF(t) as a delta function. The model's response is roughly constant during each fixation, which causes the model PSTH to appear stepped. In contrast the data PSTH shows a strong phasic response at the beginning of each fixation when a new stimulus patch enters the cell's CRF.\n\nCell         87A   97A   97A   97A   98C\nMovie        A     A     B     C     A\nOriented     No    Yes   Yes   Yes   Yes\nDirectional  No    No    No    No    Yes\nr_E          NA    0.41  0.43  0.59  0.35\n\nTable 1: Correlations between model and data PSTHs. Oriented cells showed orientation-selectivity in the flashed grating test while Directional cells showed directional-selectivity during manual characterization. r_E is the correlation between $E_{CRF}$ and the data. No fit was obtained for cell 87A.\n\n[Figure 1 plot: model and data PSTHs; x-axis: Time (seconds), 0 to 9.]\n\nFigure 1: CRF energy mechanism versus data (Cell 97A, Movie B). White indicates that the model response is greater than the data, while black indicates the data is greater than the model and gray indicates regions of overlap. A perfect match between model and data would result in the entire area under the curve being gray. Our approximation of CRF(t) leads to a relatively constant model PSTH during each fixation. In contrast the real cell generally gives a phasic response as each saccade brings a new stimulus into the CRF. In general the same movie features drive both model and cell.\n\nThe model is less successful at accounting for the responses of the directionally-selective cell, 98C. This is probably because the model's space-time separable receptive field misses motion energy cues that drive the cell. The model completely failed to fit the data from cell 87A. This cell was not orientation-selective, so the fitting procedure was unable to find an appropriate orientation for the $CRF^{\\phi}(x, y)$ Gabor function.5\n\n5 For cell 87A the correlation values in the orientation and spatial frequency parameter subspace contained three roughly equivalent maxima. Contamination by multiple cells was unlikely due to this cell's excellent isolation.\n\nCRF Energy Mechanism with Suppressive Surround: Table 2 lists the improvements in correlation obtained by adding the gain control term ($\\alpha > 0$, $\\beta = 0$ in Equation 1). For cell 97A (all three movies) the best correlations are obtained when the surround effects are delayed by 56 ms relative to the center. The best correlation for cell 98C is obtained when the surround is not delayed.\n\nIn three out of four cases the correlation values are barely improved when the surround effects are included, suggesting that the cells were not strongly surround-inhibited by these review movies. However, the improvement is quite striking in the 
case of cell 97A, movie B. Figure 2 compares the data with a model using both $E_{CRF}$ and $G$ in Equation 1. Here the delayed surround suppresses the sustained responses seen in Figure 1 and results in a more phasic model PSTH that closely matches the data.\n\nCell      97A    97A    97A    98C\nMovie     A      B      C      A\nr_{E+G}   0.42   0.56   0.60   0.37\n\u0394r        +0.01  +0.13  +0.01  +0.02\n\nTable 2: Correlation improvements due to surround gain control mechanism. r_{E+G} gives the correlation value between the best fit model and the data. \u0394r gives the improvement over r_E. Including $G$ in Equation 1 leads to a dramatic correlation increase for cell 97A, movie B, but not for the other review movies.\n\nWe consider $G$ and $T$ fields both independently and in combination. For each we independently fit for $\\alpha$, $\\beta$, $\\delta$, and the size of the suppressive fields. However, the oriented Fourier power correlates with the total Fourier power for our sample of natural images, so that $G$ and $T$ are highly correlated. Combined fitting of $G$ and $T$ terms leads to competition and dominance by $G$ (i.e. $\\beta \\to 0$). In this paper we only report the effects of the gain control mechanism; the texture contrast mechanism results in similar (though slightly degraded) results.\n\n[Figure 2 plot: model and data PSTHs; x-axis: Time (seconds), 0 to 9.]\n\nFigure 2: CRF energy mechanism with delayed surround gain control versus data (Cell 97A, Movie B). Color scheme as in Figure 1. The inclusion of the delayed $G$ term results in a more phasic model response which greatly improves the match between model and data.\n\n4 DISCUSSION\n\nThis preliminary study suggests that models of the form outlined here show great promise for describing the responses of area V1 cells during natural vision. For comparison consider the correlation values obtained from an earlier neural network model that attempted to reproduce V1 cells' responses to a variety of spatial patterns (Lehky et al., 1992). They report a median correlation value of 0.65 for complex stimuli, whereas the average correlation score from Table 2 is 0.49. This is remarkable considering that our model has only 7 free parameters, was fit to a very limited data set, does not yet consider color tuning or directional-selectivity, and must account for the response across time.\n\nFuture implementations of the model will use a more sophisticated energy mechanism that allows for nonseparable space-time receptive field structure and more realistic temporal response dynamics. We will also incorporate more detail into the surround mechanisms, such as asymmetric surround structure and a broadband texture contrast term.\n\nBy abstracting physiological observation into approximate functional forms our model balances explanatory power against parametric complexity. A cascaded series of these models may form the foundation for future modeling of cells in extrastriate areas V2 and V4. Natural image stimuli may provide an appropriate stimulus set for development and validation of these extrastriate models.\n\nAcknowledgements\n\nWe thank Joseph Rogers for assistance in this study, Maneesh Sahani for the extremely useful suggestion of fitting the CRF parameters, Charles Connor for help with data collection and David Van Essen for support of data collection.\n\nReferences\n\nAdelson, E. H. & Bergen, J. R. (1985) Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America, A, 2, 284-299.\n\nConnor, C. E., Preddie, D. C., Gallant, J. L. 
& Van Essen, D. C. (1997) Spatial attention effects in macaque area V4. Journal of Neuroscience, 17, 3201-3214.\n\nDe Valois, R. L. & De Valois, K. K. (1990) Spatial Vision. New York: Oxford University Press.\n\nGallant, J. L., Connor, C. E. & Van Essen, D. C. (1998) Neural Activity in Areas V1, V2 and V4 During Free Viewing of Natural Scenes Compared to Controlled Viewing. NeuroReport, 9.\n\nGeisler, W. S. & Albrecht, D. G. (1997) Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14, 897-919.\n\nHeeger, D. J. (1992) Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181-198.\n\nKnierim, J. J. & Van Essen, D. C. (1992) Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. Journal of Neurophysiology, 67, 961-980.\n\nLehky, S. R., Sejnowski, T. J. & Desimone, R. (1992) Predicting Responses of Nonlinear Neurons in Monkey Striate Cortex to Complex Patterns. Journal of Neuroscience, 12, 3568-3581.\n\nWilson, H. R. & Humanski, R. (1993) Spatial frequency adaptation and contrast gain control. Vision Research, 33, 1133-1149.\n", "award": [], "sourceid": 1403, "authors": [{"given_name": "William", "family_name": "Vinje", "institution": null}, {"given_name": "Jack", "family_name": "Gallant", "institution": null}]}