{"title": "A Probabilistic Model of Auditory Space Representation in the Barn Owl", "book": "Advances in Neural Information Processing Systems", "page_first": 1351, "page_last": 1358, "abstract": "", "full_text": "A probabilistic model of auditory space representation in the barn owl\n\nBrian J. Fischer\nDept. of Electrical and Systems Eng.\nWashington University in St. Louis\nSt. Louis, MO 63110\nfischerb@pcg.wustl.edu\n\nCharles H. Anderson\nDepartment of Anatomy and Neurobiology\nWashington University in St. Louis\nSt. Louis, MO 63110\ncha@pcg.wustl.edu\n\nAbstract\n\nThe barn owl is a nocturnal hunter, capable of capturing prey using auditory information alone [1]. The neural basis for this localization behavior is the existence of auditory neurons with spatial receptive fields [2]. We provide a mathematical description of the operations performed on auditory input signals by the barn owl that facilitate the creation of a representation of auditory space. To develop our model, we first formulate the sound localization problem solved by the barn owl as a statistical estimation problem. The implementation of the solution is constrained by the known neurobiology.\n\n1 Introduction\n\nThe barn owl shows great accuracy in localizing sound sources using only auditory information [1]. The neural basis for this localization behavior is the existence of auditory neurons with spatial receptive fields called space specific neurons [2]. Experimental evidence supports the hypothesis that spatial selectivity in auditory neurons arises from tuning to a specific combination of the interaural time difference (ITD) and the interaural level difference (ILD) [3]. Still lacking, however, is a complete account of how ITD and ILD spectra are integrated across frequency to give rise to spatial selectivity. 
We describe a computational model of the operations performed on the auditory input signals leading to an initial representation of auditory space. We develop the model in the context of a statistical estimation formulation of the localization problem that the barn owl must solve. We use principles of signal processing and estimation theory to guide the construction of the model, but force the implementation to respect neurobiological constraints.\n\n2 The environment\n\nThe environment consists of N_s point sources and a source of ambient noise. Each point source is defined by a sound signal, s_i(t), and a direction (θ_i, φ_i) where θ_i is the azimuth and φ_i is the elevation of the source relative to the owl's head. In general, source location may change over time. For simplicity, however, we assume that source locations are fixed. Source signals can be broadband or narrowband. Signals with onsets are modeled as broadband noise signals modulated by a temporal envelope, s_i(t) = [Σ_{n=1}^{N_i} w_{in}(t)] n_i(t), where w_{in}(t) = A_{in} e^{−(t − c_{in})²/(2σ_{in}²)} and n_i(t) is Gaussian white noise bandlimited to 12 kHz (see figure (4A)). The ambient noise is described below.\n\n3 Virtual Auditory Space\n\nThe first step in the localization process is the location-dependent mapping of source signals to the received pressure waveforms at the eardrums. For a given source location, the system describing the transformation of a source signal to the waveform received at the eardrum is well approximated by a linear system. This system is characterized by its transfer function, called the head related transfer function (HRTF), or, equivalently, by its impulse response, the head related impulse response (HRIR). Additionally, when multiple sources are present the composite waveform at each ear is the sum of the waveforms received due to each source alone. 
Therefore, we model the received pressure waveforms at the ears as\n\nr_L(t) = Σ_{i=1}^{N_s} h_L^{(θ_i,φ_i)}(t) ∗ s_i(t) + n_L(t) and r_R(t) = Σ_{i=1}^{N_s} h_R^{(θ_i,φ_i)}(t) ∗ s_i(t) + n_R(t) (1)\n\nwhere h_L^{(θ,φ)}(t) and h_R^{(θ,φ)}(t) are the HRIRs for the left and right ears, respectively, when the source location is (θ, φ), [4], and n_L(t), n_R(t) are the ambient noises experienced by the left and right ears, respectively. For our simulations, the ambient noise for each ear is created using a sample of a natural sound recording of a stream, s_b(t) [5]. The sample is filtered by HRIRs for all locations in the frontal hemisphere, Ω, then averaged, so that n_L(t) = (1/|Ω|) Σ_{i∈Ω} h_L^{(θ_i,φ_i)}(t) ∗ s_b(t) and n_R(t) = (1/|Ω|) Σ_{i∈Ω} h_R^{(θ_i,φ_i)}(t) ∗ s_b(t).\n\n4 Cue Extraction\n\nIn our model, location information is not inferred directly from the received signals but is obtained from stimulus-independent binaural location cues extracted from the input signals [6],[7]. The operations used in our model to process the auditory input signals and extract cues are motivated by the known processing in the barn owl's auditory system and by the desire to extract stimulus-independent location cues from the auditory signals that can be used to infer the locations of sound sources.\n\n4.1 Cochlear processing\n\nIn the first stage of our model, input signals are filtered with a bank of linear band-pass filters. Following linear filtering, input signals undergo half-wave rectification. 
So, the input signals to the two ears, r_L(t) and r_R(t), are decomposed into a set of scalar valued functions u_L(t, ω_k) and u_R(t, ω_k) defined by\n\nu_L(t, ω_k) = [f_{ω_k} ⋆ r_L(t)]_+ and u_R(t, ω_k) = [f_{ω_k} ⋆ r_R(t)]_+ (2)\n\nwhere f_{ω_k}(t) is the linear bandpass filter for the channel with center frequency ω_k. Here we use the standard gammatone filter f_{ω_k}(t) = t^{γ−1} e^{−t/τ_k} cos(ω_k t) with γ = 4 [8]. Following rectification there is a gain control step that is a modified version of the divisive normalization model of Schwartz and Simoncelli [9]. We introduce intermediate variables γ_L(t, ω_k) and γ_R(t, ω_k) that dynamically compute the intensity of the signals within each frequency channel as\n\nγ̇_L(t, ω_k) = −γ_L(t, ω_k)/τ + u_L(t, ω_k)/(Σ_n a_{kn} γ(t, ω_n) + σ) (3)\n\nand\n\nγ̇_R(t, ω_k) = −γ_R(t, ω_k)/τ + u_R(t, ω_k)/(Σ_n a_{kn} γ(t, ω_n) + σ) (4)\n\nwhere γ(t, ω_n) = γ_L(t, ω_n) + γ_R(t, ω_n). We define the output of the cochlear filter in frequency channel k to be\n\nv_L(t, ω_k) = u_L(t, ω_k)/(Σ_n a_{kn} γ(t, ω_n) + σ) and v_R(t, ω_k) = u_R(t, ω_k)/(Σ_n a_{kn} γ(t, ω_n) + σ) (5)\n\nfor the left and right, respectively. Note that the rectified outputs from the left and right ears, u_L(t, ω_k) and u_R(t, ω_k), are normalized by the same term so that binaural disparities are not introduced by the gain control operation. Initial cue extraction operations are performed within distinct frequency channels established by this filtering process.\n\n4.2 Level difference cues\n\nThe level difference pathway has two stages. 
First, the outputs of the filter banks are integrated over time to obtain windowed intensity measures for the components of the left and right ear signals. Next, signals from the left and right ears are combined within each frequency channel to measure the location dependent level difference. We compute the intensity of the signal in each frequency channel over a small time window, w(t), as\n\ny_L(t, ω_k) = ∫_0^t v_L(σ, ω_k) w(t − σ) dσ and y_R(t, ω_k) = ∫_0^t v_R(σ, ω_k) w(t − σ) dσ. (6)\n\nWe use a simple exponential window w(t) = e^{−t/τ} H(t) where H(t) is the unit step function.\n\nThe magnitudes of y_L(t, ω_k) and y_R(t, ω_k) vary with both the signal intensity and the gain of the HRIR in the frequency band centered at ω_k. To compute the level difference between the input signals that is introduced by the HRIRs in a manner that is invariant to changes in the intensity of the source signal we compute\n\nz(t, ω_k) = log(y_R(t, ω_k)/y_L(t, ω_k)). (7)\n\n4.3 Temporal difference cues\n\nWe use a modified version of the standard windowed cross correlation operation to measure time differences. Our modifications incorporate three features that model processing in the barn owl's auditory system. First, signals are passed through a saturating nonlinearity to model the saturation of the nucleus magnocellularis (NM) inputs to the nucleus laminaris (NL) [10]. We define χ_L(t, ω_k) = F(v_L(t, ω_k)) and χ_R(t, ω_k) = F(v_R(t, ω_k)), where F(·) is a saturating nonlinearity. Let x(t, ω_k, m) denote the value of the cross correlation in frequency channel k at delay index m ∈ {0, . . . , N}, defined by\n\nẋ(t, ω_k, m) = −x(t, ω_k, m)/τ(y(t, ω_k)) + [χ_L(t − Δm, ω_k) + α][χ_R(t − Δ(N − m), ω_k) + β]. (8)\n\nHere, τ(y(t, ω_k)) is a time constant that varies with the intensity of the stimulus in the frequency channel, where y(t, ω_k) = y_L(t, ω_k) + y_R(t, ω_k). The time constant decreases as y(t, ω_k) increases, so that for more intense sounds information is integrated over a smaller time window. This operation functions as a gain control and models the inhibition of NL neurons by superior olive neurons [11]. The constants α, β > 0 are included to reflect the fact that NL neurons respond to monaural stimulation, [12], and are chosen so that at input levels above threshold (0–5 dB SPL) the cross correlation term dominates. We choose the delay increment Δ to satisfy ΔN = 200 µs so that the full range of possible delays is covered.\n\n5 Representing auditory space\n\nThe general localization problem that the barn owl must solve is that of localizing multiple objects in its environment using both auditory and visual cues. An abstract discussion of a possible solution to the localization problem will motivate our model of the owl's initial representation of auditory space. Let N_s(t) denote the number of sources at time t. Assume that each source is characterized by the direction pair (θ_i, φ_i) that obeys a dynamical system (θ̇_i, φ̇_i) = f(θ_i, φ_i, µ_i) where µ_i is a noise term and f : R³ → R² is a possibly nonlinear mapping. We assume that (θ_i(t), φ_i(t)) defines a stationary stochastic process with known density p(θ_i, φ_i) [6],[7]. At time t, let ξ_t^a denote a vector of cues computed from auditory input and let ξ_t^v denote a vector of cues computed from visual input. The problem is to estimate, at each time, the number and locations of sources in the environment using past measurements of the auditory and visual cues at a finite set of sample times. 
A simple Bayesian approach is to introduce a minimal state vector α_t = [θ(t) φ(t)]^T where α̇_t = f(α_t, µ_t) and compute the posterior density of α_t given the cue measurements. Here the number and locations of sources can be inferred from the existence and placement of multiple modes in the posterior. If we assume that the state sequence {α_{t_n}} is a Markov process and that the state is conditionally independent of past cue measurements given the present cue measurement, then we can recursively compute the posterior through a process of prediction and correction described by the equations\n\np(α_{t_n}|ξ_{t_1:t_{n−1}}) = ∫∫ p(α_{t_n}|α_{t_{n−1}}) p(α_{t_{n−1}}|ξ_{t_1:t_{n−1}}) dα_{t_{n−1}} (9)\n\np(α_{t_n}|ξ_{t_1:t_n}) ∝ p(ξ_{t_n}|α_{t_n}) p(α_{t_n}|ξ_{t_1:t_{n−1}}) = p(ξ_{t_n}^a|α_{t_n}) p(ξ_{t_n}^v|α_{t_n}) p(α_{t_n}|ξ_{t_1:t_{n−1}}) (10)\n\nwhere ξ_t = [ξ_t^a ξ_t^v]^T. This formulation suggests that at each time auditory space can be represented in terms of the likelihood function p(ξ_t^a|θ(t), φ(t)).\n\n6 Combining temporal and intensity difference signals\n\nTo facilitate the calculation of the likelihood function over the locations, we introduce compact notation for the cues derived from the auditory signals. Let x(t, ω_k) = [x(t, ω_k, 0), . . . , x(t, ω_k, N)]/‖[x(t, ω_k, 0), . . . , x(t, ω_k, N)]‖ be the normalized vector of cross correlations computed within frequency channel k. Let x(t) = [x(t, ω_1), . . . , x(t, ω_{N_F})] denote the spectrum of cross correlations and let z(t) = [z(t, ω_1), . . . , z(t, ω_{N_F})] denote the spectrum of level differences where N_F is the number of frequency channels. 
Let ξ_t^a = [x(t) z(t)]^T. We assume that ξ_t^a = [x̄(θ, φ) z̄(θ, φ)]^T + η(t) where x̄(θ, φ) and z̄(θ, φ) are the expected values of the cross correlation and level difference spectra, respectively, for a single source located at (θ, φ), and η(t) is Gaussian white noise [6],[7].\n\nExperimental evidence about the nature of auditory space maps in the barn owl suggests that spatial selectivity occurs after both the combination of temporal and level difference cues and the combination of information across frequency [3],[13]. The computational model specifies that the transformation from cues computed from the auditory input signals to a representation of space occurs by performing inference on the cues through the likelihood function\n\np(ξ_t^a|θ, φ) = p(x(t), z(t)|θ, φ) ∝ exp(−(1/2) ‖(x(t), z(t)) − (x̄(θ, φ), z̄(θ, φ))‖²_{Σ_n^{−1}}). (11)\n\nThe known physiology of the barn owl places constraints on how this likelihood function can be computed. First, the spatial tuning of auditory neurons in the optic tectum is consistent with a model where spatial selectivity arises from tuning to combinations of time difference and level difference cues within each frequency channel [14].\n\nFigure 1: Non-normalized likelihood functions at t = 26 ms with sources located at (−25°, 0°) and (0°, 25°). Source signals are s_1(t) = A Σ_i cos(ω_{i1} t) and s_2(t) = A Σ_j cos(ω_{j2} t) where ω_{i1} ≠ ω_{j2} for any i, j. Left: Linear model of frequency combination. Right: Multiplicative model of frequency combination.\n\nThis suggests that time and intensity information is initially combined multiplicatively within frequency channels.\n\nGiven this constraint we propose two models of the frequency combination step. In the first model of frequency integration we assume that the likelihood is a product of kernels\n\np(x(t), z(t)|θ, φ) ∝ Π_k K(x(t, ω_k), z(t, ω_k); θ, φ). (12)\n\nEach kernel is a product of a temporal difference function and a level difference function to respect the first constraint,\n\nK(x(t, ω_k), z(t, ω_k); θ, φ) = K_x(x(t, ω_k); θ, φ) K_z(z(t, ω_k); θ, φ). (13)\n\nIf we require that each kernel is normalized, ∫∫ K(x(t*, ω_k), z(t*, ω_k); θ, φ) dx(t*, ω_k) dz(t*, ω_k) = 1, for each t*, then the multiplicative model is a factorization of the likelihood into a product of the conditional probabilities p(x(t*, ω_k), z(t*, ω_k)|θ, φ). The second model is a linear model of frequency integration where the likelihood is approximated by a kernel estimate of the form\n\np(x(t), z(t)|θ, φ) ∝ Σ_k c_k(y(t, ω_k)) K(x(t, ω_k), z(t, ω_k); θ, φ) (14)\n\nwhere each kernel is of the above product form. We again assume that the kernels are normalized, but we weight each kernel by the intensity of the signal in that frequency channel.\n\nExperiments performed in multiple source environments by Takahashi et al. suggest that information is not multiplied across frequency channels [15]. Takahashi et al. measured the response of space specific neurons in the external nucleus of the inferior colliculus under conditions of two sound sources located on the horizontal plane with each signal consisting of a unique combination of sinusoids. 
Their results suggest that a bump of activity will be present at each source location in the space map. Using identical stimuli (see Table 1, columns A and C, in [15]) we compute the likelihood function using the linear model and the multiplicative model. The results shown in figure (1) demonstrate that with a linear model the likelihood function will display a peak corresponding to each source location, but with the multiplicative model only a spurious location that is consistent among the kernels remains and information about the two sources is lost. Therefore, we use a model in which time difference and level difference information is first combined multiplicatively within frequency channels and is then summed across frequency.\n\n7 Examples\n\n7.1 Parameters\n\nIn each example stimuli are presented for 100 ms and HRIRs for owl 884 recorded by Keller et al., [4], are used to generate the input signals.\n\nFigure 2: Non-normalized likelihood functions at t = 21.1 ms for a single source located at (−25°, −15°). Left: Broadband source signal at 50 dB SPL. Right: Source signal is a 7 kHz tone at 50 dB SPL.\n\nFigure 3: Non-normalized likelihood functions under conditions of summing localization. In each case sources are located at (−20°, 0°) and (20°, 0°) and produce scaled versions of the same waveform. Left: Left signal at 50 dB SPL, right signal at 40 dB SPL. Center: Left signal at 50 dB SPL, right signal at 50 dB SPL. Right: Left signal at 40 dB SPL, right signal at 50 dB SPL.\n\nWe use six gammatone filters for each ear with center frequencies {4.22, 5.14, 6.16, 7.26, 8.47, 9.76} kHz, and Q10 values chosen to match the auditory nerve fiber data of Köppl [16]. 
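A minimal sketch of the cochlear front end of section 4.1, using the six center frequencies above, is given below. The sampling rate, filter length, gain normalization, and the τ_k values are illustrative assumptions; the model itself chooses bandwidths to match the Q10 data of [16].

```python
import numpy as np

def gammatone(t, cf, tau, order=4):
    # Gammatone impulse response t^(order-1) e^(-t/tau) cos(2 pi cf t);
    # the text writes the carrier as cos(omega_k t) with gamma = 4.
    return t ** (order - 1) * np.exp(-t / tau) * np.cos(2.0 * np.pi * cf * t)

def cochlear_front_end(r, fs, cfs, taus):
    # Band-pass filter the ear signal in each channel, then half-wave
    # rectify to obtain u(t, omega_k) = [f_k conv r(t)]_+ as in equation (2).
    t = np.arange(int(0.01 * fs)) / fs           # 10 ms impulse responses (assumed)
    out = []
    for cf, tau in zip(cfs, taus):
        f = gammatone(t, cf, tau)
        f = f / np.sum(np.abs(f))                # crude gain normalization (assumed)
        u = np.convolve(r, f)[: r.size]          # linear filtering
        out.append(np.maximum(u, 0.0))           # half-wave rectification
    return np.stack(out)

fs = 48000                                       # assumed sampling rate
cfs = [4220, 5140, 6160, 7260, 8470, 9760]       # center frequencies from the text (Hz)
taus = [1e-3] * len(cfs)                         # illustrative time constants
rng = np.random.default_rng(0)
r = rng.standard_normal(fs // 10)                # 100 ms noise stimulus
u = cochlear_front_end(r, fs, cfs, taus)         # shape: (channels, samples)
```

The divisive gain control of equations (3)-(5) would then operate on these rectified channel outputs before cue extraction.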
In each example we use a Gaussian form for the temporal and level difference kernels, K_x(x(t, ω_k); θ, φ) ∝ exp(−(1/2)‖x(t, ω_k) − x̄(θ, φ)‖²/σ²) and K_z(z(t, ω_k); θ, φ) ∝ exp(−(1/2)‖z(t, ω_k) − z̄(θ, φ)‖²/σ²), where σ² = 0.1. The terms x̄(θ, φ) and z̄(θ, φ) correspond to the time average of the cross correlation and level difference cues for a broadband noise stimulus. Double polar coordinates are used to describe source locations. Only locations in the frontal hemisphere are considered. Ambient noise is present at 10 dB SPL.\n\n7.2 Single source\n\nIn figure (2) we show the approximate likelihood function of equation (14) at a single time during the presentation of a broadband noise stimulus and a 7 kHz tone from direction (−25°, −15°). In response to the broadband signal there is a peak at the source location. In response to the tone there is a peak at the true location and significant peaks near (60°, −5°) and (20°, −25°).\n\n7.3 Multiple sources\n\nIn figure (3) we show the response of our model under the condition of summing localization. The top signal shown in figure (4A) was presented from (−20°, 0°) and (20°, 0°) with no delay between the two sources, but with varied intensities for each signal. In each case there is a single phantom bump at an intermediate location that is biased toward the more intense source.\n\nIn figure (4) we simulate an echoic environment where the signal at the top of 4A is presented from (−20°, 0°) and a copy delayed by 2 ms, shown at the bottom of 4A, is presented from (20°, 0°). We plot the likelihood function at the three times indicated by vertical dotted lines in 4A. At the first time the initial signal dominates and there is a peak at the location of the leading source. 
At the second time, when both the leading and lagging sounds have similar envelope amplitudes, there is a phantom bump at an intermediate, although elevated, location. At the third time, where the lagging source dominates, there are peaks at both the leading and lagging locations.\n\nFigure 4: Non-normalized likelihoods under simulated echoic conditions. The leading signal is presented from (−20°, 0°) and the lagging source from (20°, 0°). Both signals are presented at 50 dB SPL. A: The top signal is the leading signal and the bottom is the lagging. Vertical lines show times at which the likelihood function is plotted in B, C, D. B: Likelihood at t = 14.3 ms. C: Likelihood at t = 21.1 ms. D: Likelihood at t = 30.6 ms.\n\n8 Discussion\n\nWe used a Bayesian approach to the localization problem faced by the barn owl to guide our modeling of the computational operations supporting sound localization in the barn owl. In the context of our computational model, auditory space is initially represented in terms of a likelihood function parameterized by time difference and level difference cues computed from the auditory input signals.\n\nIn transforming auditory cues to spatial locations, the model relies on stimulus invariance in the cue values achieved by normalizing the cross correlation vector and computing a ratio of the left and right signal intensities within each frequency channel. 
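This intensity invariance of the level difference cue can be checked directly: because z(t, ω_k) is a log ratio of the two windowed intensities of equations (6)-(7), a common scaling of both ear signals cancels. Below is a small sketch with an assumed sampling rate and window time constant, and with fixed per-ear gains standing in for HRIR gains.

```python
import numpy as np

def windowed_intensity(v, fs, tau=0.005):
    # y(t): integral of v(s) w(t - s) over [0, t] with the exponential
    # window w(t) = e^(-t/tau) H(t), discretized at sampling rate fs.
    n = np.arange(v.size)
    w = np.exp(-n / (tau * fs))
    return np.convolve(v, w)[: v.size] / fs

def level_difference(vL, vR, fs):
    # z(t) = log(yR(t) / yL(t)), the cue of equation (7).
    return np.log(windowed_intensity(vR, fs) / windowed_intensity(vL, fs))

fs = 48000                                      # assumed sampling rate
rng = np.random.default_rng(1)
s = np.abs(rng.standard_normal(4800)) + 0.1     # a rectified-like channel signal
vL, vR = 0.5 * s, 2.0 * s                       # fixed gains standing in for HRIR gains
z_soft = level_difference(vL, vR, fs)
z_loud = level_difference(10.0 * vL, 10.0 * vR, fs)
# The cue depends only on the gain ratio, not the overall source level,
# so z_soft and z_loud agree (both equal log 4 here).
```

The same construction, applied per frequency channel to the gain-controlled outputs v_L and v_R, yields the level difference spectrum z(t) used in the likelihood.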
It is not clear from existing experimental data where, or if, this invariance occurs in the barn owl's auditory system.\n\nIn constructing a model of the barn owl's solution to the estimation problem, the operations that we employ are constrained to be consistent with the known physiology. As stated above, physiological data is consistent with the multiplication of temporal difference and level difference cues in each frequency channel, but not with multiplication across frequency. This model does not explain, however, across-frequency nonlinearities that occur in the processing of temporal difference cues [17].\n\nThe likelihood function used in our model is a linear approximation to the likelihood specified in equation (11). The multiplicative model clearly does not explain the response of the space map to multiple sound sources producing spectrally nonoverlapping signals [15]. The linear approximation may reflect the requirement to function in a multiple source environment. We must more precisely define the multi-target tracking problem that the barn owl solves and include all relevant implementation constraints before interpreting the nature of the approximation.\n\nThe tuning of space specific neurons to combinations of ITD and ILD has been interpreted as a multiplication of ITD and ILD related signals [3]. Our model suggests that, to be consistent with known physiology, the multiplication of ITD and ILD signals occurs in the medial portion of the lateral shell of the central nucleus of the inferior colliculus before frequency convergence [13]. Further experiments must be done to determine if the multiplication is a network property of the first stage of lateral shell neurons or if multiplication occurs at the level of single neurons in the lateral shell.\n\nWe simulated the model's responses under conditions of summing localization and simulated echoes. 
The model performs as expected for two simultaneous sources, with a phantom bump occurring in the likelihood function at a location intermediate between the two source locations. Under simulated echoic conditions the likelihood shows evidence for both the leading and lagging sources, but only the leading source location appears alone. This suggests that with this instantaneous estimation procedure the lagging source would be perceptible as a source location, although possibly less so than the leading. It is likely that a feedback mechanism, such as the Bayesian filtering described in equations (9) and (10), will need to be included to explain the decreased perception of lagging sources.\n\nAcknowledgments\n\nWe thank Kip Keller, Klaus Hartung, and Terry Takahashi for providing the head related transfer functions. We thank Mike Lewicki for providing the natural sound recordings. This work was supported by the Mathers Foundation.\n\nReferences\n\n[1] Payne, R.S., “Acoustic location of prey by barn owls (Tyto alba).” J. Exp. 
Biol., 54: 535-573, 1971.\n\n[2] Knudsen, E.I., Konishi, M., “A neural map of auditory space in the owl.” Science, 200: 795-797, 1978.\n\n[3] Peña, J.L., Konishi, M., “Auditory receptive fields created by multiplication.” Science, 292: 249-252, 2001.\n\n[4] Keller, C.H., Hartung, K., Takahashi, T.T., “Head-related transfer functions of the barn owl: measurement and neural responses.” Hearing Research, 118: 13-34, 1998.\n\n[5] Lewicki, M.S., “Efficient coding of natural sounds.” Nature Neurosci., 5(4): 356-363, 2002.\n\n[6] Martin, K.D., “A computational model of spatial hearing.” Masters thesis, MIT, 1995.\n\n[7] Duda, R.O., “Elevation dependence of the interaural transfer function.” In Gilkey, R. and Anderson, T.R. (eds.), Binaural and Spatial Hearing, 49-75, 1994.\n\n[8] Slaney, M., “Auditory Toolbox.” Apple technical report 45, Apple Computer Inc., 1994.\n\n[9] Schwartz, O., Simoncelli, E.P., “Natural signal statistics and sensory gain control.” Nature Neurosci., 4(8): 819-825, 2001.\n\n[10] Sullivan, W.E., Konishi, M., “Segregation of stimulus phase and intensity coding in the cochlear nucleus of the barn owl.” J. Neurosci., 4(7): 1787-1799, 1984.\n\n[11] Yang, L., Monsivais, P., Rubel, E.W., “The superior olivary nucleus and its influence on nucleus laminaris: A source of inhibitory feedback for coincidence detection in the avian auditory brainstem.” J. Neurosci., 19(6): 2313-2325, 1999.\n\n[12] Carr, C.E., Konishi, M., “A circuit for detection of interaural time differences in the brain stem of the barn owl.” J. 
Neurosci., 10(10): 3227-3246, 1990.\n\n[13] Mazer, J.A., “Integration of parallel processing streams in the inferior colliculus of the barn owl.” Ph.D. thesis, Caltech, 1995.\n\n[14] Brainard, M.S., Knudsen, E.I., Esterly, S.D., “Neural derivation of sound source location: Resolution of spatial ambiguities in binaural cues.” J. Acoust. Soc. Am., 91(2): 1015-1026, 1992.\n\n[15] Takahashi, T.T., Keller, C.H., “Representation of multiple sources in the owl's auditory space map.” J. Neurosci., 14(8): 4780-4793, 1994.\n\n[16] Köppl, C., “Frequency tuning and spontaneous activity in the auditory nerve and cochlear nucleus magnocellularis of the barn owl Tyto alba.” J. Neurophys., 77: 364-377, 1997.\n\n[17] Takahashi, T.T., Konishi, M., “Selectivity for interaural time difference in the owl's midbrain.” J. Neurosci., 6(12): 3413-3422, 1986.\n", "award": [], "sourceid": 2401, "authors": [{"given_name": "Brian", "family_name": "Fischer", "institution": null}, {"given_name": "Charles", "family_name": "Anderson", "institution": null}]}