{"title": "A Theory of Retinal Population Coding", "book": "Advances in Neural Information Processing Systems", "page_first": 353, "page_last": 360, "abstract": null, "full_text": "A Theory of Retinal Population Coding\n\nEizaburo Doi Center for the Neural Basis of Cognition Carnegie Mellon University Pittsburgh, PA 15213 edoi@cnbc.cmu.edu\n\nMichael S. Lewicki Center for the Neural Basis of Cognition Carnegie Mellon University Pittsburgh, PA 15213 lewicki@cnbc.cmu.edu\n\nAbstract\nEfficient coding models predict that the optimal code for natural images is a population of oriented Gabor receptive fields. These results match response properties of neurons in primary visual cortex, but not those in the retina. Does the retina use an optimal code, and if so, what is it optimized for? Previous theories of retinal coding have assumed that the goal is to encode the maximal amount of information about the sensory signal. However, the image sampled by retinal photoreceptors is degraded both by the optics of the eye and by the photoreceptor noise. Therefore, de-blurring and de-noising of the retinal signal should be important aspects of retinal coding. Furthermore, the ideal retinal code should be robust to neural noise and make optimal use of all available neurons. Here we present a theoretical framework to derive codes that simultaneously satisfy all of these desiderata. When optimized for natural images, the model yields filters that show strong similarities to retinal ganglion cell (RGC) receptive fields. Importantly, the characteristics of receptive fields vary with retinal eccentricities where the optical blur and the number of RGCs are significantly different. The proposed model provides a unified account of retinal coding, and more generally, it may be viewed as an extension of the Wiener filter with an arbitrary number of noisy units.\n\n1 Introduction\nWhat are the computational goals of the retina? 
The retina has numerous specialized classes of retinal ganglion cells (RGCs) that are likely to subserve a variety of different tasks [1]. An important class directly subserving visual perception is the midget RGCs (mRGCs) which constitute 70% of RGCs with an even greater proportion at the fovea [1]. The problem that mRGCs face should be to maximally preserve signal information in spite of the limited representational capacity, which is imposed both by neural noise and the population size. This problem was recently addressed (although not specifically as a model of mRGCs) in [2], which derived the theoretically optimal linear coding method for a noisy neural population. This model is not appropriate, however, for the mRGCs, because it does not take into account the noise in the retinal image (Fig. 1). Before being projected on the retina, the visual stimulus is distorted by the optics of the eye in a manner that depends on eccentricity [3]. This retinal image is then sampled by cone photoreceptors whose sampling density also varies with eccentricity [1]. Finally, the sampled image is noisier in the dimmer illumination condition [4]. We conjecture that the computational goal of mRGCs is to represent the maximum amount of information about the underlying, non-degraded image signal subject to limited coding precision and neural population size. Here we propose a theoretical model that achieves this goal. This may be viewed as a generalization of both Wiener filtering [5] and robust coding [2]. One significant characteristic of the proposed model is that it can make optimal use of an arbitrary number of neurons in order to preserve the maximum amount of signal information. 
This allows the model to predict theoretically optimal representations at any retinal eccentricity, in contrast to the earlier studies [4, 6, 7, 8].

Figure 1: Simulation of retinal images at different retinal eccentricities. (a) Undistorted image signal. (b) The convolution kernel at the fovea [3] superimposed on the photoreceptor array, indicated by triangles under the x-axis [1]; the axes show intensity against visual angle [arc min]. (c) The same as in (b) but at 40 degrees of retinal eccentricity.

2 The model
First let us define the problem (Fig. 2). We assume that the data sampled by photoreceptors (referred to as the observation) $x \in \mathbb{R}^N$ are a blurred version of the underlying image signal $s \in \mathbb{R}^N$ with additive white noise $\delta \sim \mathcal{N}(0, \sigma_\delta^2 I_N)$,
$$x = Hs + \delta \quad (1)$$
where $H \in \mathbb{R}^{N \times N}$ implements the optical blur. To encode the image, we assume that the observation is linearly transformed into an $M$-dimensional representation. To model limited neural precision, it is assumed that the representation is subject to additive channel noise $\eta \sim \mathcal{N}(0, \sigma_\eta^2 I_M)$. The noisy neural representation is therefore expressed as
$$r = W(Hs + \delta) + \eta \quad (2)$$
where each row of $W \in \mathbb{R}^{M \times N}$ corresponds to a receptive field. To evaluate the amount of signal information preserved in the representation, we consider a linear reconstruction $\hat{s} = Ar$ where $A \in \mathbb{R}^{N \times M}$. The residual is given by $\epsilon = (I_N - AWH)s - AW\delta - A\eta$, where $I_N$ is the $N$-dimensional identity matrix, and the mean squared error (MSE) is
$$\mathcal{E} = \langle \mathrm{tr}[\epsilon \epsilon^T] \rangle \quad (3)$$
$$= \mathrm{tr}[\Sigma_s] - 2\,\mathrm{tr}[AWH\Sigma_s] + \mathrm{tr}[AW(H\Sigma_s H^T + \sigma_\delta^2 I_N)W^T A^T] + \sigma_\eta^2\,\mathrm{tr}[AA^T] \quad (4)$$
where $\langle \cdot \rangle$ denotes the average over samples and $\Sigma_s$ is the covariance matrix of the image signal $s$. The problem is to find $W$ and $A$ that minimize $\mathcal{E}$. To model limited neural capacity, the representation $r$ must have limited SNR.
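As a sketch of the generative model and error measure above (eqns. 1-4), the following Python/numpy fragment builds a toy 1-D version of the problem (the covariance, blur kernel, noise levels, and random encoder/decoder are illustrative choices, not values from the paper) and checks the closed-form MSE of eqn. 4 against a Monte-Carlo simulation of eqns. 1-2:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 8                       # signal dimension, number of model neurons
sigma_d, sigma_c = 0.2, 0.3       # sensory (delta) and channel (eta) noise std

# Illustrative stand-ins: an AR(1)-like signal covariance Sigma_s and a
# circulant low-pass blur H (both are diagonalized by Fourier vectors).
idx = np.arange(N)
Sigma_s = 0.5 ** np.abs(idx[:, None] - idx[None, :])
h = np.array([0.5, 0.25] + [0.0] * (N - 3) + [0.25])   # 1-D blur kernel
H = np.stack([np.roll(h, k) for k in range(N)])        # circulant blur matrix
Sigma_x = H @ Sigma_s @ H.T + sigma_d**2 * np.eye(N)   # observation covariance

W = 0.3 * rng.standard_normal((M, N))   # some encoder (rows = receptive fields)
A = 0.3 * rng.standard_normal((N, M))   # some linear decoder

def mse(W, A):
    """Closed-form MSE of eqn. 4."""
    return (np.trace(Sigma_s) - 2 * np.trace(A @ W @ H @ Sigma_s)
            + np.trace(A @ W @ Sigma_x @ W.T @ A.T)
            + sigma_c**2 * np.trace(A @ A.T))

# Monte-Carlo check: simulate s -> x -> r -> s_hat (eqns. 1-2) and average.
T = 100_000
S = np.linalg.cholesky(Sigma_s) @ rng.standard_normal((N, T))
X = H @ S + sigma_d * rng.standard_normal((N, T))          # eqn. 1
R = W @ X + sigma_c * rng.standard_normal((M, T))          # eqn. 2
emp = np.mean(np.sum((S - A @ R) ** 2, axis=0))            # empirical MSE
```

The empirical average matches the analytic expression to within sampling error, which is a useful check before optimizing $W$ and $A$.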
This constraint is equivalent to fixing the variance of each filter output $w_j^T x$ at $\sigma_u^2$, where $w_j^T$ is the $j$-th row of $W$ (here we assume all neurons have the same capacity). It is expressed in matrix form as
$$\mathrm{diag}[W \Sigma_x W^T] = \sigma_u^2 \mathbf{1}_M \quad (5)$$
where $\Sigma_x = H \Sigma_s H^T + \sigma_\delta^2 I_N$ is the covariance of the observation. It can further be simplified to
$$\mathrm{diag}[V V^T] = \mathbf{1}_M, \quad (6)$$
$$W = \sigma_u V S_x^{-1} E^T, \quad (7)$$
where $S_x = \mathrm{diag}\left(\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2}, \ldots, \sqrt{\kappa_N^2 \lambda_N + \sigma_\delta^2}\right)$ (the square root of $\Sigma_x$'s eigenvalues), $\kappa_k$ and $\lambda_k$ are respectively the eigenvalues of $H$ and $\Sigma_s$, and the columns of $E$ are their common eigenvectors (see footnote 1). Note that $\kappa_k$ defines the modulation transfer function of the optical blur $H$, i.e., the attenuation of the amplitude of the signal along the $k$-th eigenvector.

Figure 2: The model diagram. The image signal $s$ is blurred by the optics $H$ and corrupted by sensory noise $\delta$ to give the observation $x$; the encoder $W$ and channel noise $\eta$ give the representation $r$, from which the decoder $A$ forms the reconstruction $\hat{s}$. If there is no degradation of the image ($H = I$ and $\sigma_\delta^2 = 0$), the model reduces to the original robust coding model [2]. If the channel noise is zero as well ($\sigma_\eta^2 = 0$), it boils down to conventional block coding such as PCA, ICA, or wavelet transforms.

Now the problem is to find $V$ and $A$ that minimize $\mathcal{E}$. The optimal $A$ should satisfy $\partial \mathcal{E} / \partial A = O$, which yields
$$A = \Sigma_s H^T W^T [W(H \Sigma_s H^T + \sigma_\delta^2 I_N) W^T + \sigma_\eta^2 I_M]^{-1} \quad (8)$$
$$= \frac{\gamma^2}{\sigma_u} E S_s P [I_N + \gamma^2 V^T V]^{-1} V^T \quad (9)$$
where $\gamma^2 = \sigma_u^2 / \sigma_\eta^2$ (the neural SNR), $S_s = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_N})$, $P = \mathrm{diag}(\sqrt{\pi_1}, \ldots, \sqrt{\pi_N})$, and $\pi_k = \kappa_k^2 \lambda_k / (\kappa_k^2 \lambda_k + \sigma_\delta^2)$ (the power ratio between the attenuated signal and that signal plus sensory noise; as we will see below, $\pi_k$ characterizes the generalized solutions of robust coding, and if there is neither sensory noise nor optical blur, $\pi_k$ becomes 1, which reduces the solutions of the current model to those of the original robust coding model [2]). This implies that the optimal $A$ is determined once the optimal $V$ is found. With eqn.
7 and 9, $\mathcal{E}$ becomes
$$\mathcal{E} = \sum_{k=1}^{N} \lambda_k (1 - \pi_k) + \mathrm{tr}[S_s^2 P^2 (I_N + \gamma^2 V^T V)^{-1}]. \quad (10)$$
Finally, the problem is reduced to finding the $V$ that minimizes eqn. 10.

Solutions for 2-D data
In this section we present the explicit characterization of the optimal solutions for two-dimensional data. It covers under-complete, complete, and over-complete representations, and provides precise insights into the numerical solutions for the high-dimensional image data (Section 3). This is a generalization of the analysis in [2] with the addition of optical blur and additive sensory noise. From eqn. 6 we can parameterize $V$ with $\theta$:
$$V = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 \\ \vdots & \vdots \\ \cos\theta_M & \sin\theta_M \end{pmatrix} \quad (11)$$
where $\theta_j \in [0, 2\pi)$, $j = 1, \ldots, M$, which yields
$$\mathcal{E} = \sum_{k=1}^{2} \lambda_k (1 - \pi_k) + \frac{(\alpha_1^2 + \alpha_2^2)\left(\frac{M}{2}\gamma^2 + 1\right) - \frac{\gamma^2}{2}(\alpha_1^2 - \alpha_2^2)\,\mathrm{Re}(Z)}{\left(\frac{M}{2}\gamma^2 + 1\right)^2 - \frac{\gamma^4}{4}|Z|^2}, \quad (12)$$
with $\alpha_k \equiv \sqrt{\lambda_k \pi_k}$ and $Z \equiv \sum_j (\cos 2\theta_j + i \sin 2\theta_j)$. In the following we analyze the cases when $\alpha_1 = \alpha_2$ and when $\alpha_1 \neq \alpha_2$. Without loss of generality we consider $\alpha_1 > \alpha_2$ for the latter case. (In the previous analysis of robust coding [2], these cases depend only on the ratio between $\lambda_1$ and $\lambda_2$, i.e., the isotropy of the data. In the current, general model, they also depend on the isotropy of the optical blur ($\kappa_1$ and $\kappa_2$) and the variance of the sensory noise ($\sigma_\delta^2$), and no simple meaning is attached to the individual cases.)

1). If $\alpha_1 = \alpha_2$ ($\equiv \alpha$): $\mathcal{E}$ in eqn. 10 becomes
$$\mathcal{E} = \sum_{k=1}^{2} \lambda_k (1 - \pi_k) + \frac{2\alpha^2 \left(\frac{M}{2}\gamma^2 + 1\right)}{\left(\frac{M}{2}\gamma^2 + 1\right)^2 - \frac{\gamma^4}{4}|Z|^2}. \quad (13)$$
Therefore, $\mathcal{E}$ is minimized when $|Z|^2$ is minimized.

(Footnote 1: The eigenvectors of $\Sigma_s$ and $H$ are both Fourier basis functions, because we assume that $s$ are natural images [9] and $H$ is a circulant matrix [10].)

1-a). If $M = 1$ (single neuron case): By definition $|Z|^2 = 1$, implying that $\mathcal{E}$ is constant for any $\theta_1$,
$$\mathcal{E} = \lambda_1 (1 - \pi_1) + \frac{\alpha_1^2}{\gamma^2 + 1} + \lambda_2 = \lambda_2 (1 - \pi_2) + \frac{\alpha_2^2}{\gamma^2 + 1} + \lambda_1, \quad (14)$$
$$W = \sigma_u \begin{pmatrix} \cos\theta_1 & \sin\theta_1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2} & 0 \\ 0 & 1/\sqrt{\kappa_2^2 \lambda_2 + \sigma_\delta^2} \end{pmatrix} E^T. \quad (15)$$
Because there is only one neuron, only one direction in the two-dimensional space can be reconstructed, and eqn.
15 implies that any direction can be equally good. The first equality in eqn. 14 can be interpreted as the case where $W$ represents the direction along the first eigenvector; consequently, the whole data variance along the second eigenvector, $\lambda_2$, is left in the error $\mathcal{E}$.

1-b). If $M \geq 2$ (multiple neuron case): There always exists a $Z$ that satisfies $|Z| = 0$ if $M \geq 2$, with which $\mathcal{E}$ is minimized [2]. Accordingly,
$$\mathcal{E} = \sum_{k=1}^{2} \left[ \lambda_k (1 - \pi_k) + \frac{\alpha_k^2}{\frac{M}{2}\gamma^2 + 1} \right], \quad (16)$$
$$W = \sigma_u V \begin{pmatrix} 1/\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2} & 0 \\ 0 & 1/\sqrt{\kappa_2^2 \lambda_2 + \sigma_\delta^2} \end{pmatrix} E^T, \quad (17)$$
where $V$ is arbitrary as long as it satisfies $|Z| = 0$. Note that $W$ takes the same form as for $M = 1$ except that there are now two or more neurons. Also, eqn. 16 shares its second term with eqn. 14 except that the SNR of the representation, $\gamma^2$, is multiplied by $M/2$. This implies that having $n$ times the neurons is equivalent to increasing the representation SNR by a factor of $n$ (this relation holds generally in the multiple neuron cases below).

2). If $\alpha_1 > \alpha_2$: Eqn. 12 is minimized when $Z = \mathrm{Re}(Z) \geq 0$ for a fixed value of $|Z|^2$. Therefore, the problem is reduced to seeking a real value $Z = y \in [0, M]$ that minimizes
$$\mathcal{E} = \sum_{k=1}^{2} \lambda_k (1 - \pi_k) + \frac{(\alpha_1^2 + \alpha_2^2)\left(\frac{M}{2}\gamma^2 + 1\right) - \frac{\gamma^2}{2}(\alpha_1^2 - \alpha_2^2)\,y}{\left(\frac{M}{2}\gamma^2 + 1\right)^2 - \frac{\gamma^4}{4}y^2}. \quad (18)$$
2-a). If $M = 1$ (single neuron case): $Z = \mathrm{Re}(Z)$ holds iff $\theta_1 = 0$. Accordingly,
$$\mathcal{E} = \lambda_1 (1 - \pi_1) + \frac{\alpha_1^2}{\gamma^2 + 1} + \lambda_2, \quad (19)$$
$$W = \frac{\sigma_u}{\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2}}\, e_1^T. \quad (20)$$
These take the same form as in the case of $\alpha_1 = \alpha_2$ and $M = 1$ (eqns. 14-15) except that the direction of the representation is specified along the first eigenvector $e_1$, indicating that all the representational resources (namely, one neuron) are devoted to the direction of largest data variance.

2-b). If $M \geq 2$ (multiple neuron case): From eqn. 18, the necessary condition for the minimum, $d\mathcal{E}/dy = 0$, yields
$$\left[ \frac{\alpha_1 - \alpha_2}{\alpha_1 + \alpha_2}\left(M + \frac{2}{\gamma^2}\right) - y \right] \left[ \frac{\alpha_1 + \alpha_2}{\alpha_1 - \alpha_2}\left(M + \frac{2}{\gamma^2}\right) - y \right] = 0. \quad (21)$$
The existence of a root $y$ in the domain $[0, M]$ depends on how $\gamma^2$ compares to the following quantity, which is a generalized form of the critical point of neural precision [2]:
$$\gamma_c^2 = \frac{1}{M}\left(\frac{\alpha_1}{\alpha_2} - 1\right). \quad (22)$$
2-b-i).
If $\gamma^2 < \gamma_c^2$: $d\mathcal{E}/dy = 0$ does not have a root within the domain. Since $d\mathcal{E}/dy$ is always negative, $\mathcal{E}$ is minimized when $y = M$. Accordingly,
$$\mathcal{E} = \lambda_1 (1 - \pi_1) + \frac{\alpha_1^2}{M\gamma^2 + 1} + \lambda_2, \quad (23)$$
$$W = \frac{\sigma_u}{\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2}}\, \mathbf{1}_M e_1^T. \quad (24)$$
These solutions are the same as for $M = 1$ (eqns. 19-20) except that the neural SNR $\gamma^2$ is multiplied by $M$ to yield a smaller MSE.

2-b-ii). If $\gamma^2 \geq \gamma_c^2$: Eqn. 21 has a root within $[0, M]$,
$$y = \frac{\alpha_1 - \alpha_2}{\alpha_1 + \alpha_2}\left(M + \frac{2}{\gamma^2}\right), \quad (25)$$
with $y = M$ if $\gamma^2 = \gamma_c^2$. The optimal solutions are
$$\mathcal{E} = \sum_{k=1}^{2} \lambda_k (1 - \pi_k) + \frac{(\alpha_1 + \alpha_2)^2}{2\left(\frac{M}{2}\gamma^2 + 1\right)}, \quad (26)$$
$$W = \sigma_u V \begin{pmatrix} 1/\sqrt{\kappa_1^2 \lambda_1 + \sigma_\delta^2} & 0 \\ 0 & 1/\sqrt{\kappa_2^2 \lambda_2 + \sigma_\delta^2} \end{pmatrix} E^T, \quad (27)$$
where $V$ is arbitrary up to satisfying eqn. 25.

In Fig. 3 we illustrate some examples of explicit solutions for 2-D data with two neurons. The general strategy of the proposed model is to represent the principal axis of the signal $s$ more accurately as the signal is more degraded (by optical blur and/or sensory noise). Specifically, the two neurons come to represent the identical dimension when the degradation is sufficiently large.

Figure 3: Sensory noise changes the optimal linear filter. The gray (outside) and blue (inside) contours show the variance of the target and reconstructed signal, respectively, and the red (thick) bars show the optimal linear filters when there are two neurons. The SNR of the observation is varied from 20 to -10 dB (column-wise). The bottom row ('blur') is the case where the power of the signal's minor component is attenuated as by the optical blur (i.e., low-pass filtering), $(\kappa_1, \kappa_2) = (1, 0.1)$; the top row ('no-blur') is without the blur, $(\kappa_1, \kappa_2) = (1, 1)$. The neural SNR is fixed at 10 dB.

3 Optimal receptive field populations
We applied the proposed model to a natural image data set [11] to obtain the theoretically optimal population code for mRGCs. The optimal solutions were derived under the following biological constraints on the observation, or the photoreceptor response, $x$ (Fig. 2).
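As a numerical sanity check on the closed-form 2-D solutions above, the Python/numpy sketch below (with illustrative eigenvalues and noise levels, not values from the paper) minimizes eqn. 10 by brute force over the angle parameterization of eqn. 11 and compares the result with the closed-form error of eqns. 22-26:

```python
import numpy as np

# Illustrative 2-D parameters (assumed for demonstration)
lam = np.array([1.0, 0.3])      # eigenvalues lambda_k of Sigma_s
kap = np.array([1.0, 0.5])      # eigenvalues kappa_k of H (modulation transfer)
sigma_d2 = 0.05                 # sensory noise variance sigma_delta^2
M, gamma2 = 2, 10.0             # number of neurons, neural SNR gamma^2

pi = kap**2 * lam / (kap**2 * lam + sigma_d2)   # power ratio pi_k
alpha = np.sqrt(lam * pi)                       # alpha_k

def E_of(thetas):
    """MSE of eqn. 10 under the angle parameterization of eqn. 11."""
    V = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    inv = np.linalg.inv(np.eye(2) + gamma2 * (V.T @ V))
    return np.sum(lam * (1 - pi)) + np.trace(np.diag(alpha**2) @ inv)

# Brute-force minimum over the two angles
grid = np.linspace(0.0, np.pi, 200)
E_brute = min(E_of(np.array([t1, t2])) for t1 in grid for t2 in grid)

# Closed form: the regime is set by the critical precision of eqn. 22
gamma_c2 = (alpha[0] / alpha[1] - 1) / M
if gamma2 < gamma_c2:           # eqn. 23: every neuron aligns with e_1
    E_closed = lam[0] * (1 - pi[0]) + alpha[0]**2 / (M * gamma2 + 1) + lam[1]
else:                           # eqn. 26
    E_closed = (np.sum(lam * (1 - pi))
                + (alpha[0] + alpha[1])**2 / (2 * (M * gamma2 / 2 + 1)))
```

The grid minimum agrees with the closed form to within the grid resolution; sweeping `sigma_d2` or `gamma2` reproduces the regime change illustrated in Fig. 3.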
To model the retinal images at different retinal eccentricities, we used modulation transfer functions of the human eye [3] and cone photoreceptor densities of the human retina [1] (Fig. 1). The retinal image is further corrupted by additive Gaussian noise to model the photon transduction noise, by which the SNR of the observation becomes smaller under dimmer illumination levels [4]. This yields the observation at different retinal eccentricities. In the following, we present the optimal solutions for the fovea (where the most accurate visual information is represented, while the receptive field characteristics are difficult to measure experimentally) and those at 40 degrees retinal eccentricity (where we can compare the model to recent physiological measurements in the primate retina [12]).

The information capacity of neural representations is limited by both the number of neurons and the precision of neural codes. The ratio of cone photoreceptors to mRGCs in the human retina is 1 : 2 at the fovea and 23 : 2 at 40 degrees [13]. We did not model neural rectification (separate on and off channels) and thus assumed effective cell ratios of 1 : 1 and 23 : 1, respectively. We also fixed the neural SNR at 10 dB, equivalent to assuming 1.7 bits coding precision as in real neurons [14]. The optimal $W$ can be derived by gradient descent on $\mathcal{E}$, and $A$ can then be derived from $W$ using eqn. 8. As explained in Section 2, the solution must satisfy the variance constraint (eqn. 6). We formulate this as a constrained optimization problem [15]. The update rule for $W$ is given by
$$\Delta W \propto -A^T (AWH - I_N) \Sigma_s H^T - \sigma_\delta^2 A^T A W - \beta\, \mathrm{diag}\!\left[\frac{\ln(\mathrm{diag}[W \Sigma_x W^T] / \sigma_u^2)}{\mathrm{diag}[W \Sigma_x W^T]}\right] W \Sigma_x, \quad (28)$$
where $\beta$ is a positive constant that controls the strength of the variance constraint. Our initial results indicated that the optimal solutions are not unique, and that these solutions are equivalent in terms of MSE.
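A minimal sketch of this alternating procedure in Python/numpy: each iteration recomputes the optimal decoder from eqn. 8 and takes a gradient step on $W$ of the form of eqn. 28. The covariance, blur, step size, and constraint weight here are illustrative assumptions (toy dimensions, not natural-image statistics):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 4
sigma_d, sigma_c, sigma_u = 0.2, 0.3, 1.0
eta, beta = 0.01, 1.0            # step size and constraint weight (assumed)

idx = np.arange(N)
Sigma_s = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # toy signal covariance
h = np.array([0.5, 0.25] + [0.0] * (N - 3) + [0.25])
H = np.stack([np.roll(h, k) for k in range(N)])        # circulant optical blur
Sigma_x = H @ Sigma_s @ H.T + sigma_d**2 * np.eye(N)

def decoder(W):
    """Optimal linear decoder for a given encoder (eqn. 8)."""
    return Sigma_s @ H.T @ W.T @ np.linalg.inv(
        W @ Sigma_x @ W.T + sigma_c**2 * np.eye(M))

def mse(W, A):
    """Closed-form MSE of eqn. 4."""
    return (np.trace(Sigma_s) - 2 * np.trace(A @ W @ H @ Sigma_s)
            + np.trace(A @ W @ Sigma_x @ W.T @ A.T)
            + sigma_c**2 * np.trace(A @ A.T))

# Initialize W on the constraint surface: each row has output variance sigma_u^2
W = rng.standard_normal((M, N))
W *= sigma_u / np.sqrt(np.diag(W @ Sigma_x @ W.T))[:, None]
mse_init = mse(W, decoder(W))

for _ in range(500):
    A = decoder(W)
    v = np.diag(W @ Sigma_x @ W.T)           # per-neuron output variance
    dW = (-A.T @ (A @ W @ H - np.eye(N)) @ Sigma_s @ H.T
          - sigma_d**2 * A.T @ A @ W
          - beta * np.diag(np.log(v / sigma_u**2) / v) @ W @ Sigma_x)
    W = W + eta * dW                          # gradient step (eqn. 28)

v = np.diag(W @ Sigma_x @ W.T)
mse_final = mse(W, decoder(W))
```

The MSE decreases from the random initialization while the per-neuron output variances stay near $\sigma_u^2$, illustrating how the penalty term enforces the constraint of eqn. 5 during descent.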
We then imposed an additional neural resource constraint that penalizes the spatial extent of a receptive field: the constraint for the $k$-th neuron is defined by $\sum_j |W_{kj}| (\rho\, d_{kj}^2 + 1)$, where $d_{kj}$ is the spatial distance between the $j$-th weight and the center of mass of all weights, and $\rho$ is a positive constant defining the strength of the spatial constraint. This assumption is consistent with the spatially restricted computation in the retina. If $\rho = 0$, it imposes sparse weights [16], though not necessarily spatially localized ones. In our simulations we fixed $\rho = 0.5$.

For the fovea, we examined 15x15 pixel image patches sampled from a large set of natural images, where each pixel corresponds to a cone photoreceptor. Since the cell ratio is assumed to be 1 : 1, there were 225 model neurons in the population. As shown in Fig. 4, the optimal filters show a concentric center-surround organization that is well fit by a difference-of-Gaussians function (one major characteristic of mRGCs). The precise organization of the model receptive field changes with the SNR of the observation: as the SNR decreases, the surround inhibition gradually disappears and the center becomes larger, which serves to remove sensory noise by averaging. As a population, this yields a significant overlap among adjacent receptive fields. In terms of spatial frequency, this change corresponds to a shift from band-pass to low-pass filtering, which is consistent with psychophysical measurements in the human and the macaque [17].

Figure 4: The model receptive fields at the fovea under different SNRs of the observation (20, 10, 0, and -10 dB). (a) A cross-section of the two-dimensional receptive field. (b) Six examples of receptive fields. (c) The tiling of a population of receptive fields in the visual field. The ellipses show the contour of receptive fields at half the maximum.
One pair of adjacent filters is highlighted for clarity. The scale bar indicates an interval of three photoreceptors. (d) Spatial-frequency profiles (modulation transfer functions) of the receptive fields at different SNRs.

For 40 degrees retinal eccentricity, we examined a 35x35 photoreceptor array projected onto 53 model neurons (so that the cell ratio is 23 : 1). The general trend of the results is the same as at the fovea except that the receptive fields are much larger. This allows the fewer neurons in the population to completely tile the visual field. Furthermore, the change of the receptive field with the sensory noise level is not as significant as that predicted for the fovea, suggesting that the SNR is a less significant factor when the neuron number is severely limited. We also note that the elliptical shape of the extent of the receptive fields matches experimental observations [12].

Figure 5: The theoretically derived receptive fields at 40 degrees retinal eccentricity, shown for observation SNRs of 20 and -10 dB. Captions as in Fig. 4.

Finally, we demonstrate the performance of de-blurring, de-noising, and information preservation by these receptive fields (Fig. 6). The original image is well recovered in spite of both the noisy representation (10% of the code's variance is noise because of the 10 dB precision) and the noisy, degraded observation. Note that the 40 degrees eccentricity is subject to an additional, significant dimensionality reduction, which is why the reconstruction error (e.g., 34.8% at 20 dB) can be greater than the distortion in the observation (30.5%).

Figure 6: Reconstruction example. For both the fovea and 40 degrees retinal eccentricity, two sensory noise conditions are shown (20 and -10 dB). The percentages indicate the average distortion of the observation and the reconstruction error, respectively, over 60,000 samples: at the fovea, observation distortion 25.7% (20 dB) and 1024.7% (-10 dB), with reconstruction errors 10.1% and 57.5%; at 40 degrees, observation distortion 30.5% (20 dB) and 1029.5% (-10 dB), with reconstruction errors 34.8% and 53.5%.
The blocking effect is caused by implementing the optical blur on each image patch with a matrix $H$ instead of convolving the whole image.

4 Discussion
The proposed model is a generalization of the robust coding model [2] and allows a complete characterization of the optimal representation as a function of both image degradation (optical blur and additive sensory noise) and limited neural capacity (neural precision and population size). If there is no sensory noise ($\sigma_\delta^2 = 0$) and no optical blur ($H = I_N$), then $\pi_k = 1$ for all $k$, which reduces all the optimal solutions above to those reported in [2].

The proposed model may also be viewed as a generalization of the Wiener filter: if there is no channel noise ($\sigma_\eta^2 = 0$) and the cell ratio is 1 : 1, then, assuming $A = I_N$ without loss of generality, the problem is reformulated as finding the $W \in \mathbb{R}^{N \times N}$ that provides the best estimate of the original signal, $\hat{s} = W(Hs + \delta)$, in terms of the MSE. The optimal solution is given by the Wiener filter:
$$W = \Sigma_s H^T [H \Sigma_s H^T + \sigma_\delta^2 I_N]^{-1} = E\, \mathrm{diag}\!\left(\frac{\kappa_1 \lambda_1}{\kappa_1^2 \lambda_1 + \sigma_\delta^2}, \ldots, \frac{\kappa_N \lambda_N}{\kappa_N^2 \lambda_N + \sigma_\delta^2}\right) E^T, \quad (29)$$
$$\mathcal{E} = \mathrm{tr}[\Sigma_s] - \mathrm{tr}[W H \Sigma_s] = \sum_{k=1}^{N} \lambda_k (1 - \pi_k) \quad (30)$$
(note that the diagonal matrix in eqn. 29 corresponds to the Wiener filter formula in the frequency domain [5]). This also implies that the Wiener filter is optimal only in this limiting case of our setting.

Here we have treated the model primarily as a theory of retinal coding, but its generality would allow it to be applied to a wide range of problems in signal processing. We should also note several limitations. The model assumes Gaussian signal structure; modeling non-Gaussian signal distributions might better account for coding efficiency constraints on the retinal population. The model is linear, but the framework allows for the incorporation of non-linear encoding and decoding methods, at the expense of analytic tractability.
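The limiting-case relationship to the Wiener filter can be checked numerically. In this Python/numpy sketch (with an illustrative toy covariance and blur), we take $M = N$, a generic invertible encoder $W$, and shrink the channel noise; the combined linear map $AW$ from eqn. 8 then approaches the Wiener filter of eqn. 29:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
sigma_d = 0.2

idx = np.arange(N)
Sigma_s = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # toy signal covariance
h = np.array([0.5, 0.25] + [0.0] * (N - 3) + [0.25])
H = np.stack([np.roll(h, k) for k in range(N)])        # circulant optical blur
Sigma_x = H @ Sigma_s @ H.T + sigma_d**2 * np.eye(N)

W_wiener = Sigma_s @ H.T @ np.linalg.inv(Sigma_x)      # eqn. 29

# Model with M = N, a generic invertible encoder, and vanishing channel noise
W = rng.standard_normal((N, N))
sigma_c = 1e-6
A = Sigma_s @ H.T @ W.T @ np.linalg.inv(
    W @ Sigma_x @ W.T + sigma_c**2 * np.eye(N))        # eqn. 8

# A @ W is then (numerically) the Wiener filter; the residual MSE is
# tr[Sigma_s] - tr[W_wiener H Sigma_s], as in eqn. 30.
E_wiener = np.trace(Sigma_s) - np.trace(W_wiener @ H @ Sigma_s)
```

With non-zero channel noise or $M \neq N$, the product $AW$ departs from the Wiener solution, which is the sense in which the Wiener filter is only a limiting case of the model.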
There have been earlier approaches to theoretically characterizing the retinal code [4, 6, 7, 8]. Our approach differs from these in several respects. First, it is not restricted to the so-called complete representation ($M = N$) and can predict properties of mRGCs at any retinal eccentricity. Second, we do not assume a single, translation-invariant filter and can derive the optimal receptive fields for a neural population. Third, we accurately model optical blur, retinal sampling, cell ratio, and neural precision. Finally, we assumed, as in [4, 8], that the objective of retinal coding is to form the neural code that yields the minimum MSE under linear decoding, while others have assumed the objective is to form the neural code that maximally preserves information about the signal [6, 7]. It is not known a priori which objective is the appropriate one for retinal coding. As suggested earlier [8], this issue could be resolved by comparing the different theoretical predictions to physiological data.

References
[1] R. W. Rodieck. The First Steps in Seeing. Sinauer, MA, 1998.
[2] E. Doi, D. C. Balcan, and M. S. Lewicki. A theoretical analysis of robust coding over noisy overcomplete channels. In Advances in Neural Information Processing Systems, volume 18. MIT Press, 2006.
[3] R. Navarro, P. Artal, and D. R. Williams. Modulation transfer of the human eye as a function of retinal eccentricity. Journal of the Optical Society of America A, 10:201-212, 1993.
[4] M. V. Srinivasan, S. B. Laughlin, and A. Dubs. Predictive coding: a fresh view of inhibition in the retina. Proc. R. Soc. Lond. B, 216:427-459, 1982.
[5] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2nd edition, 2002.
[6] J. J. Atick and A. N. Redlich. Towards a theory of early visual processing. Neural Computation, 2:308-320, 1990.
[7] J. H. van Hateren. Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. J. Comp. Physiol.
A, 171:157-170, 1992.
[8] D. L. Ruderman. Designing receptive fields for highest fidelity. Network, 5:147-155, 1994.
[9] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4:2379-2394, 1987.
[10] R. M. Gray. Toeplitz and circulant matrices: A review. Foundations and Trends in Communications and Information Theory, 2:155-239, 2006.
[11] E. Doi, T. Inui, T.-W. Lee, T. Wachtler, and T. J. Sejnowski. Spatiochromatic receptive field properties derived from information-theoretic analyses of cone mosaic responses to natural scenes. Neural Computation, 15:397-417, 2003.
[12] E. S. Frechette, A. Sher, M. I. Grivich, D. Petrusca, A. M. Litke, and E. J. Chichilnisky. Fidelity of the ensemble code for visual motion in primate retina. Journal of Neurophysiology, 94:119-135, 2005.
[13] C. A. Curcio and K. A. Allen. Topography of ganglion cells in human retina. Journal of Comparative Neurology, 300:5-25, 1990.
[14] A. Borst and F. E. Theunissen. Information theory and neural coding. Nature Neuroscience, 2:947-957, 1999.
[15] E. Doi and M. S. Lewicki. Sparse coding of natural images using an overcomplete set of limited capacity units. In Advances in Neural Information Processing Systems, volume 17. MIT Press, 2005.
[16] B. T. Vincent and R. J. Baddeley. Synaptic energy efficiency in retinal processing. Vision Research, 43:1283-1290, 2003.
[17] R. L. De Valois, H. Morgan, and D. M. Snodderly. Psychophysical studies of monkey vision - III. Spatial luminance contrast sensitivity tests of macaque and human observers. Vision Research, 14:75-81, 1974.", "award": [], "sourceid": 3114, "authors": [{"given_name": "Eizaburo", "family_name": "Doi", "institution": null}, {"given_name": "Michael", "family_name": "Lewicki", "institution": null}]}