Odelia Schwartz, Eero Simoncelli
We explore the statistical properties of natural sound stimuli pre(cid:173) processed with a bank of linear filters. The responses of such filters exhibit a striking form of statistical dependency, in which the response variance of each filter grows with the response amplitude of filters tuned for nearby frequencies. These dependencies may be substantially re(cid:173) duced using an operation known as divisive normalization, in which the response of each filter is divided by a weighted sum of the recti(cid:173) fied responses of other filters. The weights may be chosen to maximize the independence of the normalized responses for an ensemble of natu(cid:173) ral sounds. We demonstrate that the resulting model accounts for non(cid:173) linearities in the response characteristics of the auditory nerve, by com(cid:173) paring model simulations to electrophysiological recordings. In previous work (NIPS, 1998) we demonstrated that an analogous model derived from the statistics of natural images accounts for non-linear properties of neurons in primary visual cortex. Thus, divisive normalization appears to be a generic mechanism for eliminating a type of statistical dependency that is prevalent in natural signals of different modalities.
Signals in the real world are highly structured. For example, natural sounds typically con(cid:173) tain both harmonic and rythmic structure. It is reasonable to assume that biological auditory systems are designed to represent these structures in an efficient manner [e.g., 1,2]. Specif(cid:173) ically, Barlow hypothesized that a role of early sensory processing is to remove redundancy in the sensory input, resulting in a set of neural responses that are statistically independent.
Experimentally, one can test this hypothesis by examining the statistical properties of neural responses under natural stimulation conditions [e.g., 3,4], or the statistical dependency of pairs (or groups) of neural responses. Due to their technical difficulty, such multi-cellular experiments are only recently becoming possible, and the earliest reports in vision appear consistent with the hypothesis [e.g., 5]. An alternative approach, which we follow here, is to develop a neural model from the statistics of natural signals and show that response properties of this model are similar to those of biological sensory neurons.
A number of researchers have derived linear filter models using statistical criterion. For vi(cid:173) sual images, this results in linear filters localized in frequency, orientation and phase [6, 7].
Similar work in audition has yielded filters localized in frequency and phase . Although these linear models provide an important starting point for neural modeling, sensory neu(cid:173) rons are highly nonlinear. In addition, the statistical properties of natural signals are too complex to expect a linear transformation to result in an independent set of components.
Recent results indicate that nonlinear gain control plays an important role in neural pro(cid:173) cessing. Ruderman and Bialek  have shown that division by a local estimate of standard deviation can increase the entropy of responses of center-surround filters to natural images. Such a model is consistent with the properties of neurons in the retina and lateral genicu(cid:173) late nucleus. Heeger and colleagues have shown that the nonlinear behaviors of neurons in primary visual cortex may be described using a form of gain control known as divisive normalization , in which the response of a linear kernel is rectified and divided by the sum of other rectified kernel responses and a constant. We have recently shown that the responses of oriented linear filters exhibit nonlinear statistical dependencies that may be substantially reduced using a variant of this model, in which the normalization signal is computed from a weighted sum of other rectified kernel responses [11, 12]. The resulting model, with weighting parameters determined from image statistics, accounts qualitatively for physiological nonlinearities observed in primary visual cortex.
In this paper, we demonstrate that the responses of bandpass linear filters to natural sounds exhibit striking statistical dependencies, analogous to those found in visual images. A divisive normalization procedure can substantially remove these dependencies. We show that this model, with parameters optimized for a collection of natural sounds, can account for nonlinear behaviors of neurons at the level of the auditory nerve. Specifically, we show that: 1) the shape offrequency tuning curves varies with sound pressure level, even though the underlying linear filters are fixed; and 2) superposition of a non-optimal tone suppresses the response of a linear filter in a divisive fashion, and the amount of suppression depends on the distance between the frequency of the tone and the preferred frequency of the filter.
1 Empirical observations of natural sound statistics
The basic statistical properties of natural sounds, as observed through a linear filter, have been previously documented by Attias . In particular, he showed that, as with visual images, the spectral energy falls roughly according to a power law, and that the histograms of filter responses are more kurtotic than a Gaussian (i.e., they have a sharp peak at zero, and very long tails).
Here we examine the joint statistical properties of a pair of linear filters tuned for nearby temporal frequencies. We choose a fixed set of filters that have been widely used in mod(cid:173) eling the peripheral auditory system . Figure 1 shows joint histograms of the instanta(cid:173) neous responses of a particular pair of linear filters to five different types of natural sound, and white noise. First note that the responses are approximately decorrelated: the expected value of the y-axis value is roughly zero for all values of the x-axis variable. The responses are not, however, statistically independent: the width of the distribution of responses of one filter increases with the response amplitude of the other filter. If the two responses were statistically independent, then the response of the first filter should not provide any information about the distribution of responses of the other filter. We have found that this type of variance dependency (sometimes accompanied by linear correlation) occurs in a wide range of natural sounds, ranging from animal sounds to music. We emphasize that this dependency is a property of natural sounds, and is not due purely to our choice of lin(cid:173) ear filters. For example, no such dependency is observed when the input consists of white noise (see Fig. 1).
The strength of this dependency varies for different pairs of linear filters . In addition, we see this type of dependency between instantaneous responses of a single filter at two