{"title": "Deep Learning Models of the Retinal Response to Natural Scenes", "book": "Advances in Neural Information Processing Systems", "page_first": 1369, "page_last": 1377, "abstract": "A central challenge in sensory neuroscience is to understand neural computations and circuit mechanisms that underlie the encoding of ethologically relevant, natural stimuli. In multilayered neural circuits, nonlinear processes such as synaptic transmission and spiking dynamics present a significant obstacle to the creation of accurate computational models of responses to natural stimuli. Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs). Moreover, we find two additional surprising properties of CNNs: they are less susceptible to overfitting than their LN counterparts when trained on small amounts of data, and generalize better when tested on stimuli drawn from a different distribution (e.g. between natural scenes and white noise). An examination of the learned CNNs reveals several properties.  First, a richer set of feature maps is necessary for predicting the responses to natural scenes compared to white noise.  Second, temporally precise responses to slowly varying inputs originate from feedforward inhibition, similar to known retinal mechanisms. Third, the injection of latent noise sources in intermediate layers enables our model to capture the sub-Poisson spiking variability observed in retinal ganglion cells.  Fourth, augmenting our CNNs with recurrent lateral connections enables them to capture contrast adaptation as an emergent property of accurately describing retinal responses to natural scenes.  These methods can be readily generalized to other sensory modalities and stimulus ensembles. 
Overall, this work demonstrates that CNNs not only accurately capture sensory circuit responses to natural scenes, but also can yield information about the circuit's internal structure and function.", "full_text": "Deep Learning Models of the Retinal Response to\n\nNatural Scenes\n\nLane T. McIntosh\u22171, Niru Maheswaranathan\u22171, Aran Nayebi1,\n\nSurya Ganguli2,3, Stephen A. Baccus3\n\n1Neurosciences PhD Program, 2Department of Applied Physics, 3Neurobiology Department\n\n{lmcintosh, nirum, anayebi, sganguli, baccus}@stanford.edu\n\nStanford University\n\nAbstract\n\nA central challenge in sensory neuroscience is to understand neural computations\nand circuit mechanisms that underlie the encoding of ethologically relevant, natu-\nral stimuli. In multilayered neural circuits, nonlinear processes such as synaptic\ntransmission and spiking dynamics present a signi\ufb01cant obstacle to the creation of\naccurate computational models of responses to natural stimuli. Here we demon-\nstrate that deep convolutional neural networks (CNNs) capture retinal responses to\nnatural scenes nearly to within the variability of a cell\u2019s response, and are markedly\nmore accurate than linear-nonlinear (LN) models and Generalized Linear Mod-\nels (GLMs). Moreover, we \ufb01nd two additional surprising properties of CNNs:\nthey are less susceptible to over\ufb01tting than their LN counterparts when trained\non small amounts of data, and generalize better when tested on stimuli drawn\nfrom a different distribution (e.g. between natural scenes and white noise). An\nexamination of the learned CNNs reveals several properties. First, a richer set\nof feature maps is necessary for predicting the responses to natural scenes com-\npared to white noise. 
Second, temporally precise responses to slowly varying\ninputs originate from feedforward inhibition, similar to known retinal mechanisms.\nThird, the injection of latent noise sources in intermediate layers enables our model\nto capture the sub-Poisson spiking variability observed in retinal ganglion cells.\nFourth, augmenting our CNNs with recurrent lateral connections enables them to\ncapture contrast adaptation as an emergent property of accurately describing retinal\nresponses to natural scenes. These methods can be readily generalized to other\nsensory modalities and stimulus ensembles. Overall, this work demonstrates that\nCNNs not only accurately capture sensory circuit responses to natural scenes, but\nalso can yield information about the circuit\u2019s internal structure and function.\n\n1\n\nIntroduction\n\nA fundamental goal of sensory neuroscience involves building accurate neural encoding models that\npredict the response of a sensory area to a stimulus of interest. These models have been used to shed\nlight on circuit computations [1, 2, 3, 4], uncover novel mechanisms [5, 6], highlight gaps in our\nunderstanding [7], and quantify theoretical predictions [8, 9].\nA commonly used model for retinal responses is a linear-nonlinear (LN) model that combines a linear\nspatiotemporal \ufb01lter with a single static nonlinearity. Although LN models have been used to describe\nresponses to arti\ufb01cial stimuli such as spatiotemporal white noise [10, 2], they fail to generalize to\nnatural stimuli [7]. Furthermore, the white noise stimuli used in previous studies are often low\nresolution or spatially uniform and therefore fail to differentially activate nonlinear subunits in the\n\n\u2217These authors contributed equally to this work.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fFigure 1: A schematic of the model architecture. 
The stimulus was convolved with 8 learned\nspatiotemporal \ufb01lters whose activations were recti\ufb01ed. The second convolutional layer then projected\nthe activity of these subunits through spatial \ufb01lters onto 16 subunit types, whose activity was linearly\ncombined and passed through a \ufb01nal soft rectifying nonlinearity to yield the predicted response.\n\nretina, potentially simplifying the retinal response to such stimuli [11, 12, 2, 10, 13]. In contrast to\nthe perceived linearity of the retinal response to coarse stimuli, the retina performs a wide variety of\nnonlinear computations including object motion detection [6], adaptation to complex spatiotemporal\npatterns [14], encoding spatial structure as spike latency [15], and anticipation of periodic stimuli\n[16], to name a few. However it is unclear what role these nonlinear computational mechanisms have\nin generating responses to more general natural stimuli.\nTo better understand the visual code for natural stimuli, we modeled retinal responses to natural image\nsequences with convolutional neural networks (CNNs). CNNs have been successful at many pattern\nrecognition and function approximation tasks [17]. In addition, these models cascade multiple layers\nof spatiotemporal \ufb01ltering and recti\ufb01cation\u2013exactly the elementary computational building blocks\nthought to underlie complex functional responses of sensory circuits. Previous work utilized CNNs\nto gain insight into the neural computations of inferotemporal cortex [18], but these models have\nnot been applied to early sensory areas where knowledge of neural circuitry can provide important\nvalidation for such models.\nWe \ufb01nd that deep neural network models markedly outperform previous models in predicting retinal\nresponses both for white noise and natural scenes. 
Moreover, these models generalize better to unseen\nstimulus classes, and learn internal features consistent with known retinal properties, including\nsub-Poisson variability, feedforward inhibition, and contrast adaptation. Our findings indicate that\nCNNs can reveal both neural computations and mechanisms within a multilayered neural circuit\nunder natural stimulation.\n\n2 Methods\n\nThe spiking activity of a population of tiger salamander retinal ganglion cells was recorded in response\nto both sequences of natural images jittered with the statistics of eye movements and high-resolution\nspatiotemporal white noise. Convolutional neural networks were trained to predict ganglion cell\nresponses to each stimulus class, simultaneously for all cells in the recorded population of a given\nretina. As a comparison baseline, we also trained linear-nonlinear models [19] and generalized\nlinear models (GLMs) with spike history feedback [2]. More details on the stimuli, retinal recordings,\nexperimental structure, and division of data for training, validation, and testing are given in the\nSupplemental Material.\n\n2.1 Architecture and optimization\n\nThe convolutional neural network architecture is shown in Figure 1. Model parameters were\noptimized to minimize a loss function corresponding to the negative log-likelihood under Poisson\nspike generation. Optimization was performed using ADAM [20] via the Keras and Theano software\nlibraries [21]. The networks were regularized with an \u21132 weight penalty at each layer and an \u21131\nactivity penalty at the final layer, which helped maintain a baseline firing rate near 0 Hz.\n\nWe explored a variety of architectures for the CNN, varying the number of layers, number of filters per\nlayer, the type of layer (convolutional or dense), and the size of the convolutional filters. 
Increasing\nthe number of layers increased prediction accuracy on held-out data up to three layers, after which\nperformance saturated. One implication of this architecture search is that LN-LN cascade models,\nwhich are equivalent to a 2-layer CNN, would also underperform 3-layer CNN models.\nContrary to the increasingly small filter sizes used by many state-of-the-art object recognition\nnetworks, our networks had better performance using filter sizes in excess of 15x15 checkers. Models\nwere trained over the course of 100 epochs, with early stopping guided by a validation set. See\nSupplementary Materials for details on the baseline models we used for comparison.\n\n3 Results\n\nWe found that convolutional neural networks were substantially better at predicting retinal responses\nthan either linear-nonlinear (LN) models or generalized linear models (GLMs) on both white noise\nand natural scene stimuli (Figure 2).\n\n3.1 Performance\n\nFigure 2: Model performance. (A,B) Correlation coefficients between the data and CNN, GLM or\nLN models for white noise and natural scenes. Dotted line indicates a measure of retinal reliability\n(see Methods). (C) Receiver Operating Characteristic (ROC) curve for spike events for CNN, GLM\nand LN models. (D) Spike rasters of one example retinal ganglion cell responding to 6 repeated\ntrials of the same randomly selected segment of the natural scenes stimulus (black) compared to the\npredictions of the LN (red), GLM (green), or CNN (blue) model with Poisson spike generation used\nto generate model rasters. (E) Peristimulus time histogram (PSTH) of the spike rasters in (D).\n\nLN models and GLMs failed to capture retinal responses to natural scenes (Figure 2B), consistent with\nprevious results [7]. 
In addition, we found that LN models captured only a small fraction of the\nresponse to high-resolution spatiotemporal white noise, presumably because of the finer resolution that\nwas used (Figure 2A). In contrast, CNNs approached the reliability of the retina for both white noise\nand natural scenes. Using other metrics, including fraction of explained variance, log-likelihood, and\nmean squared error, CNNs showed a robust gain in performance over previously described sensory\nencoding models.\nWe investigated how model performance varied as a function of the amount of training data, and found that LN\nmodels were more susceptible to overfitting than CNNs, despite having fewer parameters (Figure 4A).\nIn particular, a CNN model trained using just 25 minutes of data had better held-out performance\nthan an LN model fit using the full 60-minute recording. We expect that both depth and convolutional\nfilters act as implicit regularizers for CNN models, thereby increasing generalization performance.\n\n3.2 CNN model parameters\n\nFigure 3 shows a visualization of the model parameters learned when a convolutional network is\ntrained to predict responses to either white noise or natural scenes. We visualized the average feature\nrepresented by a model unit by computing a response-weighted average for that unit. Models trained\non white noise learned first layer features with small (\u223c200 \u00b5m) receptive field widths (top left box\nin Figure 3), whereas the natural scene model learned spatiotemporal filters with overall lower spatial\nand temporal frequencies. This is likely in part due to the abundance of low spatial frequencies\npresent in natural images [22]. We see a greater diversity of spatiotemporal features in the second\nlayer receptive fields compared to the first (bottom panels in Figure 3). 
Additionally, we see more\ndiversity in models trained on natural scenes, compared to white noise.\n\nFigure 3: Model parameters visualized by computing a response-weighted average for different\nmodel units, computed for models trained on spatiotemporal white noise stimuli (left) or natural\nimage sequences (right). Top panel (purple box): visualization of units in the \ufb01rst layer. Each 3D\nspatiotemporal receptive \ufb01eld is displayed via a rank-one decomposition consisting of a spatial \ufb01lter\n(top) and temporal kernel (black traces, bottom). Bottom panel (green box): receptive \ufb01elds for the\nsecond layer units, again visualized using a rank-one decomposition. Natural scenes models required\nmore active second layer units, displaying a greater diversity of spatiotemporal features. Receptive\n\ufb01elds are cropped to the region of space where the subunits have non-zero sensitivity.\n\n3.3 Generalization across stimulus distributions\n\nHistorically, much of our understanding of the retina comes from \ufb01tting models to responses to\narti\ufb01cial stimuli and then generalizing this understanding to cases where the stimulus distribution is\nmore natural. Due to the large difference between arti\ufb01cial and natural stimulus distributions, it is\nunknown what retinal encoding properties generalize to a new stimulus.\n\n4\n\n\fFigure 4: CNNs over\ufb01t less and generalize better across stimulus class as compared to simpler models.\n(A) Held-out performance curves for CNN (\u223c150,000 parameters) and GLM/LN models cropped\naround the cell\u2019s receptive \ufb01eld (\u223c4,000 parameters) as a function of the amount of training data. (B)\nCorrelation coef\ufb01cients between responses to natural scenes and models trained on white noise but\ntested on natural scenes. 
See text for discussion.\n\nWe explored what portion of CNN, GLM, and LN model performance is specific to a particular\nstimulus distribution (white noise or natural scenes), versus what portion describes characteristics\nof the retinal response that generalize to another stimulus class. We found that CNNs trained\non responses to one stimulus class generalized better than LN models and GLMs to a stimulus distribution that the model\nwas not trained on (Figure 4B). Despite LN models having fewer parameters, they nonetheless\nunderperformed larger convolutional neural network models when predicting responses to stimuli not\ndrawn from the training distribution. GLMs fared particularly poorly when generalizing to natural\nscene responses, likely because changes in mean luminance result in pathological firing rates after\nthe GLM\u2019s exponential nonlinearity. Compared to standard models, CNNs provide a more accurate\ndescription of sensory responses to natural stimuli even when trained on artificial stimuli (Figure 4B).\n\n3.4 Capturing uncertainty of the neural response\n\nIn addition to describing the average response to a particular stimulus, an accurate model should also\ncapture the variability about the mean response. Typical noise models assume i.i.d. Poisson noise\ndrawn from a deterministic mean firing rate. However, the variability in retinal spiking is actually\nsub-Poisson; that is, the variability scales with the mean but then increases sublinearly at higher mean\nrates [23, 24]. By training models with injected noise [25], we provided a latent noise source in the\nnetwork that models the unobserved internal variability in the retinal population. Surprisingly, the\nmodel learned to shape injected Gaussian noise to qualitatively match the shape of the true retinal\nnoise distribution, increasing with the mean response but growing sublinearly at higher mean rates\n(Figure 5). 
Notably, this relationship arose only when noise was injected during optimization; injecting\nGaussian noise in a pre-trained network simply produced a linear scaling of the noise variance as a\nfunction of the mean.\n\nFigure 5: Training with added noise recovers retinal sub-Poisson noise scaling property. (A) Variance\nversus mean spike count for CNNs with various strengths of injected noise (from 0.1 to 10 standard\ndeviations), as compared to retinal data (black) and a Poisson distribution (dotted red). (B) The same\nplot as A but with each curve normalized by the maximum variance. (C) Variance versus mean spike\ncount for CNN models with noise injection at test time but not during training.\n\n3.5 Feedforward inhibition shapes temporal responses in the model\n\nTo understand how a particular model response arises, we visualized the flow of signals through the\nnetwork. One prominent aspect of the difference between CNN and LN model responses is that CNNs\nbut not LN models captured the precise timing and short duration of firing events. By examining\nthe responses of the internal units of CNNs in time and averaged over space (Figure 6A-C), we\nfound that in both convolutional layers, different units had either positive or negative responses to the\nsame stimuli, equivalent to excitation and inhibition as found in the retina. Precise timing in CNNs\narises by a timed combination of positive and negative responses, analogous to feedforward inhibition\nthat is thought to generate precise timing in the retina [26, 27]. To examine network responses in\nspace, we selected a particular time in the experiment and visualized the activation maps in the first\n(purple) and second (green) convolutional layers (Figure 6D). A given image is shown decomposed\nthrough multiple parallel channels in this manner. Finally, Figure 6E highlights how the temporal\nautocorrelation in the signals at different layers varies. 
There is a progressive sharpening of the\nresponse, such that by the time it reaches the model output the predicted responses are able to mimic\nthe statistics of the real firing events (Figure 6C).\n\n3.6 Feedback over long timescales\n\nRetinal dynamics are known to exceed the duration of the filters that we used (400 ms). In particular,\nchanges in stimulus statistics such as luminance, contrast and spatiotemporal correlations can\ngenerate adaptation lasting seconds to tens of seconds [5, 28, 14]. Therefore, we additionally\nexplored adding feedback over longer timescales to the convolutional network.\nTo do this, we added a recurrent neural network (RNN) layer with a history of 10 s after the fully\nconnected layer prior to the output layer. We experimented with different recurrent architectures\n(LSTMs [29], GRUs [30], and MUTs [31]) and found that they all had similar performance to the\nCNN at predicting natural scene responses. Despite the similar performance, we found that the\nrecurrent network learned to adapt its response over the timescale of a few seconds in response to\nstep changes in stimulus contrast (Figure 7). This suggests that RNNs are a promising way forward\nto capture dynamical processes such as adaptation over longer timescales in an unbiased, data-driven\nmanner.\n\n4 Discussion\n\nSimple models of retinal responses to spatiotemporal white noise have greatly influenced\nour understanding of early sensory function. However, surprisingly few studies have addressed\nwhether or not these simple models can capture responses to natural stimuli. Our work brings\nmodels with rich computational capacity to bear on the problem of understanding natural scene\nresponses. We find that convolutional neural network (CNN) models, sometimes augmented with\nlateral recurrent connections, well exceed the performance of other standard retinal models including\nLN models and GLMs. 
In addition, CNNs are better at generalizing both to held-out stimuli and to entirely\ndifferent stimulus classes, indicating that they are learning general features of the retinal response.\nMoreover, CNNs capture several key features about retinal responses to natural stimuli where LN\nmodels fail. In particular, they capture: (1) the temporal precision of firing events despite employing\nfilters with slower temporal frequencies, (2) adaptive responses during changing stimulus statistics,\nand (3) biologically realistic sub-Poisson variability in retinal responses. In this fashion, this work\nprovides the first application of deep learning to understanding early sensory systems under natural\nconditions.\n\nFigure 6: Visualizing the internal activity of a CNN in response to a natural scene stimulus. (A-C)\nTime series of the CNN activity (averaged over space) for the first convolutional layer (8 units, A),\nthe second convolutional layer (16 units, B), and the final predicted response for an example cell (C,\ncyan trace). The recorded (true) response is shown below the model prediction (C, gray trace) for\ncomparison. (D) Spatial activation of example CNN filters at a particular time point. The selected\nstimulus frame (top, grayscale) is represented by parallel pathways encoding spatial information in\nthe first (purple) and second (green) convolutional layers (a subset of the activation maps is shown\nfor brevity). (E) Autocorrelation of the temporal activity in (A-C). The correlation in the recorded\nfiring rates is shown in gray.\n\nFigure 7: Recurrent neural network (RNN) layers capture response features occurring over multiple\nseconds. 
(A) A schematic of how the architecture from Figure 1 was modified to incorporate the\nRNN at the last layer of the CNN. (B) Response of an RNN trained on natural scenes, showing a\nslowly adapting firing rate in response to a step change in contrast.\n\nTo date, modeling efforts in sensory neuroscience have been most useful in the context of carefully\ndesigned parametric stimuli, chosen to illuminate a computation or mechanism of interest [32]. In\npart, this is due to the complexities of using generic natural stimuli. It is difficult both to describe\nthe distribution of natural stimuli mathematically (unlike white or pink noise) and to fit\nmodels to stimuli with non-stationary statistics when those statistics influence response properties.\nWe believe the approach taken in this paper provides a way forward for understanding general natural\nscene responses. We leverage the computational power and flexibility of CNNs to provide us with a\ntractable, accurate model that we can then dissect, probe, and analyze to understand what that model\ncaptures about the retinal response. This strategy of casting a wide computational net to capture\nneural circuit function and then constraining it to better understand that function will likely be useful\nin a variety of neural systems in response to many complex stimuli.\n\nAcknowledgments\n\nThe authors would like to thank Ben Poole and EJ Chichilnisky for helpful discussions related to this\nwork. Thanks also go to the following institutions for providing funding and hardware grants: LM:\nNSF, NVIDIA Titan X Award; NM: NSF; AN and SB: NEI grants; SG: Burroughs Wellcome, Sloan,\nMcKnight, Simons, James S. McDonnell Foundations and the ONR.\n\nReferences\n[1] Tim Gollisch and Markus Meister. Eye smarter than scientists believed: neural computations in\n\ncircuits of the retina. 
Neuron, 65(2):150\u2013164, 2010.\n\n[2] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke,\nEJ Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a\ncomplete neuronal population. Nature, 454(7207):995\u2013999, 2008.\n\n[3] Nicole C Rust, Odelia Schwartz, J Anthony Movshon, and Eero P Simoncelli. Spatiotemporal\n\nelements of macaque v1 receptive \ufb01elds. Neuron, 46(6):945\u2013956, 2005.\n\n[4] David B Kastner and Stephen A Baccus. Coordinated dynamic encoding in the retina using\n\nopposing forms of plasticity. Nature neuroscience, 14(10):1317\u20131322, 2011.\n\n[5] Stephen A Baccus and Markus Meister. Fast and slow contrast adaptation in retinal circuitry.\n\nNeuron, 36(5):909\u2013919, 2002.\n\n[6] Bence P \u00d6lveczky, Stephen A Baccus, and Markus Meister. Segregation of object and back-\n\nground motion in the retina. Nature, 423(6938):401\u2013408, 2003.\n\n[7] Alexander Heitman, Nora Brackbill, Martin Greschner, Alexander Sher, Alan M Litke, and\nEJ Chichilnisky. Testing pseudo-linear models of responses to natural scenes in primate retina.\nbioRxiv, page 045336, 2016.\n\n[8] Joseph J Atick and A Norman Redlich. Towards a theory of early visual processing. Neural\n\nComputation, 2(3):308\u2013320, 1990.\n\n[9] Xaq Pitkow and Markus Meister. Decorrelation and ef\ufb01cient coding by retinal ganglion cells.\n\nNature neuroscience, 15(4):628\u2013635, 2012.\n\n[10] Jonathan W Pillow, Liam Paninski, Valerie J Uzzell, Eero P Simoncelli, and EJ Chichilnisky.\nPrediction and decoding of retinal ganglion cell responses with a probabilistic spiking model.\nThe Journal of Neuroscience, 25(47):11003\u201311013, 2005.\n\n[11] S Hochstein and RM Shapley. Linear and nonlinear spatial subunits in y cat retinal ganglion\n\ncells. The Journal of Physiology, 262(2):265, 1976.\n\n[12] Tim Gollisch. 
Features and functions of nonlinear spatial integration by retinal ganglion cells.\n\nJournal of Physiology-Paris, 107(5):338\u2013348, 2013.\n\n[13] Adrienne L Fairhall, C Andrew Burlingame, Ramesh Narasimhan, Robert A Harris, Jason L\nPuchalla, and Michael J Berry. Selectivity for multiple stimulus features in retinal ganglion\ncells. Journal of neurophysiology, 96(5):2724\u20132738, 2006.\n\n[14] Toshihiko Hosoya, Stephen A Baccus, and Markus Meister. Dynamic predictive coding by the\n\nretina. Nature, 436(7047):71\u201377, 2005.\n\n[15] Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies.\n\nscience, 319(5866):1108\u20131111, 2008.\n\n8\n\n\f[16] Greg Schwartz, Rob Harris, David Shrom, and Michael J Berry. Detection and prediction of\n\nperiodic patterns by the retina. Nature neuroscience, 10(5):552\u2013554, 2007.\n\n[17] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436\u2013444,\n\n2015.\n\n[18] Daniel L Yamins, Ha Hong, Charles Cadieu, and James J DiCarlo. Hierarchical modular\noptimization of convolutional networks achieves representations similar to macaque it and\nhuman ventral stream. In Advances in neural information processing systems, pages 3093\u20133101,\n2013.\n\n[19] EJ Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computa-\n\ntion in Neural Systems, 12(2):199\u2013213, 2001.\n\n[20] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint\n\narXiv:1412.6980, 2014.\n\n[21] Fr\u00e9d\u00e9ric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud\nBergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. Theano: new features\nand speed improvements. arXiv preprint arXiv:1211.5590, 2012.\n\n[22] Aapo Hyv\u00e4rinen, Jarmo Hurri, and Patrick O Hoyer. Natural Image Statistics: A Probabilistic\nApproach to Early Computational Vision., volume 39. 
Springer Science & Business Media, 2009.\n\n[23] Michael J Berry, David K Warland, and Markus Meister. The structure and precision of retinal spike trains. Proceedings of the National Academy of Sciences, 94(10):5411\u20135416, 1997.\n\n[24] Rob R de Ruyter van Steveninck, Geoffrey D Lewen, Steven P Strong, Roland Koberle, and William Bialek. Reproducibility and variability in neural spike trains. Science, 275(5307):1805\u20131808, 1997.\n\n[25] Ben Poole, Jascha Sohl-Dickstein, and Surya Ganguli. Analyzing noise in autoencoders and deep networks. arXiv preprint arXiv:1406.1831, 2014.\n\n[26] Botond Roska and Frank Werblin. Vertical interactions across ten parallel, stacked representations in the mammalian retina. Nature, 410(6828):583\u2013587, 2001.\n\n[27] Botond Roska and Frank Werblin. Rapid global shifts in natural scenes block spiking in specific ganglion cell types. Nature neuroscience, 6(6):600\u2013608, 2003.\n\n[28] Peter D Calvert, Victor I Govardovskii, Vadim Y Arshavsky, and Clint L Makino. Two temporal phases of light adaptation in retinal rods. The Journal of general physiology, 119(2):129\u2013146, 2002.\n\n[29] Sepp Hochreiter and J\u00fcrgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735\u20131780, 1997.\n\n[30] Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder\u2013decoder approaches. Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), pages 103\u2013111, 2014.\n\n[31] Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network architectures. Proceedings of the 32nd International Conference on Machine Learning, 37:2342\u20132350, 2015.\n\n[32] Nicole C Rust and J Anthony Movshon. In praise of artifice. 
Nature neuroscience, 8(12):1647\u2013\n\n1650, 2005.\n\n9\n\n\f", "award": [], "sourceid": 752, "authors": [{"given_name": "Lane", "family_name": "McIntosh", "institution": "Stanford University"}, {"given_name": "Niru", "family_name": "Maheswaranathan", "institution": "Stanford University"}, {"given_name": "Aran", "family_name": "Nayebi", "institution": "Stanford University"}, {"given_name": "Surya", "family_name": "Ganguli", "institution": "Stanford"}, {"given_name": "Stephen", "family_name": "Baccus", "institution": "Stanford University"}]}