{"title": "Toward a Single-Cell Account for Binocular Disparity Tuning: An Energy Model May Be Hiding in Your Dendrites", "book": "Advances in Neural Information Processing Systems", "page_first": 208, "page_last": 214, "abstract": null, "full_text": "2D Observers for Human 3D Object Recognition? \n\nZili Liu \n\nNEC Research Institute \n\nDaniel Kersten \n\nUniversity of Minnesota \n\n. Abstract \n\nConverging evidence has shown that human object recognition \ndepends on familiarity with the images of an object. Further, \nthe greater the similarity between objects, the stronger is the \ndependence on object appearance, and the more important two(cid:173)\ndimensional (2D) image information becomes. These findings, how(cid:173)\never, do not rule out the use of 3D structural information in recog(cid:173)\nnition, and the degree to which 3D information is used in visual \nmemory is an important issue. Liu, Knill, & Kersten (1995) showed \nthat any model that is restricted to rotations in the image plane \nof independent 2D templates could not account for human perfor(cid:173)\nmance in discriminating novel object views. We now present results \nfrom models of generalized radial basis functions (GRBF), 2D near(cid:173)\nest neighbor matching that allows 2D affine transformations, and \na Bayesian statistical estimator that integrates over all possible 2D \naffine transformations. The performance of the human observers \nrelative to each of the models is better for the novel views than \nfor the familiar template views, suggesting that humans generalize \nbetter to novel views from template views. The Bayesian estima(cid:173)\ntor yields the optimal performance with 2D affine transformations \nand independent 2D templates. Therefore, models of 2D affine \nmatching operations with independent 2D templates are unlikely \nto account for human recognition performance. \n\n1 \n\nIntroduction \n\nObject recognition is one of the most important functions in human vision. 
To understand human object recognition, it is essential to understand how objects are represented in human visual memory. A central component in object recognition is the matching of the stored object representation with that derived from the image input. But the nature of the object representation has to be inferred from recognition performance, by taking into account the contribution from the image information. When evaluating human performance, how can one separate the contributions to performance of the image information from the representation? Ideal observer analysis provides a precise computational tool to answer this question. An ideal observer's recognition performance is restricted only by the available image information and is otherwise optimal, in the sense of statistical decision theory, irrespective of how the model is implemented. A comparison of human to ideal performance (often in terms of efficiency) serves to normalize performance with respect to the image information for the task. We consider the problem of viewpoint dependence in human recognition. \n\nA recent debate in human object recognition has focused on the dependence of recognition performance on viewpoint [1, 6]. Depending on the experimental conditions, an observer's ability to recognize a familiar object from novel viewpoints is impaired to varying degrees. A central assumption in the debate is the equivalence in viewpoint dependence and recognition performance. In other words, the assumption is that viewpoint dependent performance implies a viewpoint dependent representation, and that viewpoint independent performance implies a viewpoint independent representation. 
However, given that any recognition performance depends on the input image information, which is necessarily viewpoint dependent, the viewpoint dependence of the performance is neither necessary nor sufficient for the viewpoint dependence of the representation. Image information has to be factored out first, and the ideal observer provides the means to do this. \n\nThe second aspect of an ideal observer is that it is implementation free. Consider the GRBF model [5], as compared with human object recognition (see below). The model stores a number of 2D templates {T_i} of a 3D object O, and recognizes or rejects a stimulus image S by the following similarity measure: Σ_i c_i exp(−‖T_i − S‖²/2σ²), where c_i and σ are constants. The model's performance as a function of viewpoint parallels that of human observers. This observation has led to the conclusion that the human visual system may indeed, as does the model, use 2D stored views with GRBF interpolation to recognize 3D objects [2]. Such a conclusion, however, overlooks implementational constraints in the model, because the model's performance also depends on its implementations. Conceivably, a model with some 3D information of the objects can also mimic human performance, so long as it is appropriately implemented. There are typically too many possible models that can produce the same pattern of results. \n\nIn contrast, an ideal observer computes the optimal performance that is only limited by the stimulus information and the task. We can define constrained ideals that are also limited by explicitly specified assumptions (e.g., a class of matching operations). Such a model observer therefore yields the best possible performance among the class of models with the same stimulus input and assumptions. 
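As a concrete reading of this similarity measure, the short sketch below scores a stimulus against a stored template set. It is an illustration of the formula only, not the authors' implementation; the coefficients c_i and width sigma are free parameters here (in the GRBF model proper, the c_i come from training, as reviewed in Section 2.2).

```python
import numpy as np

def grbf_similarity(S, templates, c, sigma):
    """GRBF similarity: sum_i c_i * exp(-||T_i - S||^2 / (2 * sigma^2)).

    S         -- stimulus, flattened to a vector
    templates -- stored templates T_i, each the same shape as S
    c         -- per-template coefficients c_i
    sigma     -- width of the Gaussian basis function
    """
    S = np.asarray(S, dtype=float)
    score = 0.0
    for ci, Ti in zip(c, templates):
        d2 = np.sum((np.asarray(Ti, dtype=float) - S) ** 2)
        score += ci * np.exp(-d2 / (2.0 * sigma ** 2))
    return score
```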
\nIn this paper, we are particularly interested in constrained ideal observers that are restricted in functionally significant aspects (e.g., a 2D ideal observer that stores independent 2D templates and has access only to 2D affine transformations). The key idea is that a constrained ideal observer is the best in its class. So if humans outperform this ideal observer, they must have used more than what is available to the ideal. The conclusion that follows is strong: not only does the constrained ideal fail to account for human performance, but the whole class of its implementations is also falsified. \n\nA crucial question in object recognition is the extent to which human observers model the geometric variation in images due to the projection of a 3D object onto a 2D image. At one extreme, we have shown that any model that compares the image to independent views (even if we allow for 2D rigid transformations of the input image) is insufficient to account for human performance. At the other extreme, it is unlikely that variation is modeled in terms of rigid transformation of a 3D object template in memory. A possible intermediate solution is to match the input image to stored views, subject to 2D affine deformations. This is reasonable because 2D affine transformations approximate 3D variation over a limited range of viewpoint change. \n\nIn this study, we test whether any model limited to the independent comparison of 2D views, but with 2D affine flexibility, is sufficient to account for viewpoint dependence in human recognition. In the following section, we first define our experimental task, in which the computational models yield the provably best possible performance under their specified conditions. We then review the 2D ideal observer and GRBF model derived in [4], and the 2D affine nearest neighbor model in [8]. 
\nOur principal theoretical result is a closed-form solution of a Bayesian 2D affine ideal observer. We then compare human performance with the 2D affine ideal model, as well as the other three models. In particular, if humans can classify novel views of an object better than the 2D affine ideal, then our human observers must have used more information than that embodied by that ideal. \n\n2 The observers \n\nLet us first define the task. An observer looks at the 2D images of a 3D wire frame object from a number of viewpoints. These images will be called templates {T_i}. Then two distorted copies of the original 3D object are displayed. They are obtained by adding 3D Gaussian positional noise (i.i.d.) to the vertices of the original object. One distorted object is called the target, whose Gaussian noise has a constant variance. The other is the distractor, whose noise has a larger variance that can be adjusted to achieve a criterion level of performance. The two objects are displayed from the same viewpoint in parallel projection, which is either one of the template views, or a novel view due to 3D rotation. The task is to choose the one that is more similar to the original object. The observer's performance is measured by the variance (threshold) that gives rise to 75% correct performance. The optimal strategy is to choose the stimulus S with the larger probability p(O|S). From Bayes' rule, this is to choose the larger of p(S|O). \n\nAssume that the models are restricted to 2D transformations of the image, and cannot reconstruct the 3D structure of the object from its independent templates {T_i}. Assume also that the prior probability p(T_i) is constant. Let us represent S and T_i by their (x, y) vertex coordinates: (X Y)^T, where X = (x_1, x_2, ..., x_n), Y = (y_1, y_2, ..., y_n). 
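The threshold logic of this task can be illustrated with a small Monte Carlo sketch. The simulated observer below is not any of the model observers defined later: it is an oracle baseline that simply picks the stimulus closer to the true object, which suffices to show how the distractor variance controls percent correct.

```python
import numpy as np

def trial(vertices, sigma_target, sigma_distractor, rng):
    """One two-alternative trial: target and distractor are the object's
    vertices plus i.i.d. Gaussian noise of different standard deviations;
    this (oracle) observer picks the copy closer to the original."""
    target = vertices + rng.normal(0.0, sigma_target, vertices.shape)
    distractor = vertices + rng.normal(0.0, sigma_distractor, vertices.shape)
    d_t = np.sum((target - vertices) ** 2)
    d_d = np.sum((distractor - vertices) ** 2)
    return d_t < d_d  # correct: target judged more similar

def percent_correct(vertices, sigma_t, sigma_d, n_trials=2000, seed=0):
    """Estimate percent correct over repeated trials."""
    rng = np.random.default_rng(seed)
    hits = sum(trial(vertices, sigma_t, sigma_d, rng) for _ in range(n_trials))
    return hits / n_trials
```

Raising the distractor noise sigma_d drives percent correct toward 1; the threshold reported in the experiments is the sigma_d at which this curve crosses 75%.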
We assume that the correspondence between S and T_i is solved up to a reflection ambiguity, which is equivalent to an additional template: T_i^r = (X^r Y^r)^T, where X^r = (x_n, ..., x_2, x_1), Y^r = (y_n, ..., y_2, y_1). We still denote the template set as {T_i}. Therefore, \n\np(S|O) ∝ Σ_i p(S|T_i) p(T_i). (1) \n\nIn what follows, we will compute p(S|T_i) p(T_i), with the assumption that S = F(T_i) + N(0, σI_2n), where N is the Gaussian distribution, I_2n the 2n × 2n identity matrix, and F a 2D transformation. For the 2D ideal observer, F is a rigid 2D rotation. For the GRBF model, F assigns a linear coefficient to each template T_i, in addition to a 2D rotation. For the 2D affine nearest neighbor model, F represents the 2D affine transformation that minimizes ‖S − T_i‖², after S and T_i are normalized in size. For the 2D affine ideal observer, F represents all possible 2D affine transformations applicable to T_i. \n\n2.1 The 2D ideal observer \n\nThe templates are the original 2D images, their mirror reflections, and 2D rotations (in angle φ) in the image plane. Assuming that the stimulus S is generated by adding Gaussian noise to a template, the probability p(S|O) is an integration over all templates and their reflections and rotations. The detailed derivation for the 2D ideal and the GRBF model can be found in [4]: \n\nΣ_i p(S|T_i) p(T_i) ∝ Σ_i ∫ dφ exp(−‖S − T_i(φ)‖² / 2σ²). (2) \n\n2.2 The GRBF model \n\nThe model has the same template set as the 2D ideal observer does. Its training requires that Σ_i ∫_0^{2π} dφ c_i(φ) N(‖T_j − T_i(φ)‖, σ) = 1, j = 1, 2, ..., with which {c_i} can be obtained optimally using singular value decomposition. When a pair of new stimuli {S} are presented, the optimal decision is to choose the one that is closer to the learned prototype, in other words, the one with the smaller value of \n\n‖1 − Σ_i ∫_0^{2π} dφ c_i(φ) exp(−‖S − T_i(φ)‖² / 2σ²)‖. (3) \n\n2.3 The 2D affine nearest neighbor model \n\nIt has been proved in [8] that when T is allowed a 2D affine transformation, the smallest Euclidean distance D(S, T) between S and T is, with S → S/‖S‖ and T → T/‖T‖, \n\nD²(S, T) = 1 − tr(S⁺S · T^T T)/‖T‖², (4) \n\nwhere tr stands for trace, and S⁺ = S^T(SS^T)⁻¹. The optimal strategy, therefore, is to choose the S that gives rise to the larger of Σ_i exp(−D²(S, T_i)/2σ²), or the smaller of Σ_i D²(S, T_i). (Since no probability is defined in this model, both measures will be used and the results from the better one will be reported.) \n\n2.4 The 2D affine ideal observer \n\nWe now calculate the Bayesian probability by assuming that the prior probability distribution of the 2D affine transformation applied to the template T_i, A T_i + T_r = (a b; c d) T_i + (t_x ... t_x; t_y ... t_y), obeys a Gaussian distribution N(X_0, γI_6), where X_0 is the identity transformation X_0 = (a, b, c, d, t_x, t_y) = (1, 0, 0, 1, 0, 0). We have \n\nΣ_i p(S|T_i) = Σ_i ∫ dX exp(−‖A T_i + T_r − S‖² / 2σ²) (5) \n\n= Σ_i C(n, σ, γ) det⁻¹(Q'_i) exp(tr(K_i^T Q_i (Q'_i)⁻¹ Q_i K_i) / 2σ²), (6) \n\nwhere C(n, σ, γ) is a function of n, σ, γ; Q' = Q + γ⁻²I_2; and \n\nQ = (X_T·X_T  X_T·Y_T; Y_T·X_T  Y_T·Y_T), QK = (X_T·X_S  Y_T·X_S; X_T·Y_S  Y_T·Y_S). (7) \n\nThe free parameters are γ and the number of 2D rotated copies of each T_i (since a 2D affine transformation implicitly includes 2D rotations, and since a specific prior probability distribution N(X_0, γI) is assumed, both free parameters should be explored together to search for the optimal results). \n\nFigure 1: Stimulus classes with increasing structural regularity: Balls, Irregular, Symmetric, and V-Shaped. There were three objects in each class in the experiment. 
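Equation (4) can be checked numerically. The sketch below assumes S and T are 2 × n matrices of (x, y) coordinates and treats only the linear part of the affine map (translation assumed removed beforehand, e.g. by centering); it illustrates the formula rather than reproducing the implementation in [8].

```python
import numpy as np

def affine_nn_dist2(S, T):
    """D^2(S, T) = 1 - tr(S^+ S T^T T) / ||T||^2, with S and T
    size-normalized and S^+ = S^T (S S^T)^{-1}: the smallest squared
    distance between the point sets when T may undergo any 2D linear
    (translation-free affine) transformation."""
    S = S / np.linalg.norm(S)
    T = T / np.linalg.norm(T)
    S_pinv = S.T @ np.linalg.inv(S @ S.T)   # S^+ = S^T (S S^T)^{-1}
    P = S_pinv @ S                          # projector onto rowspace(S)
    return 1.0 - np.trace(P @ (T.T @ T)) / np.sum(T ** 2)
```

When T is an affine image of S the distance vanishes, since the rows of T then lie in the row space of S onto which P projects.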
\n\n2.5 The human observers \n\nThree naive subjects were tested with four classes of objects: Balls, Irregular, Symmetric, and V-Shaped (Fig. 1). There were three objects in each class. For each object, 11 template views were learned by rotating the object 60°/step, around the X- and Y-axis, respectively. The 2D images were generated by orthographic projection, and viewed monocularly. The viewing distance was 1.5 m. During the test, the standard deviation of the Gaussian noise added to the target object was σ_t = 0.254 cm. No feedback was provided. \n\nBecause the image information available to the humans was more than what was available to the models (shading and occlusion in addition to the (x, y) positions of the vertices), both learned and novel views were tested in a randomly interleaved fashion. Therefore, the strategy that humans used in the task for the learned and novel views should be the same. The number of self-occlusions, which in principle provided relative depth information, was counted and was about equal in both learned and novel view conditions. The shading information was also likely to be equal for the learned and novel views. Therefore, this additional information was about equal for the learned and novel views, and should not affect the comparison of the performance (humans relative to a model) between learned and novel views. We predict that if the humans used a 2D affine strategy, then their performance relative to the 2D affine ideal observer should not be higher for the novel views than for the learned views. One reason to use the four classes of objects with increasing structural regularity is that structural regularity is a 3D property (e.g., 3D Symmetric vs. Irregular), which the 2D models cannot capture. 
The exception is the planar V-Shaped objects, for which the 2D affine models completely capture 3D rotations, and are therefore the "correct" models. The V-Shaped objects were used in the 2D affine case as a benchmark. If human performance increases with increasing structural regularity of the objects, this would lend support to the hypothesis that humans have used 3D information in the task. \n\n2.6 Measuring performance \n\nA staircase procedure [7] was used to track the observers' performance at the 75% correct level for the learned and novel views, respectively. There were 120 trials for the humans, and 2000 trials for each of the models. For the GRBF model, the standard deviation of the Gaussian function was also sampled to search for the best result for the novel views for each of the 12 objects, and the result for the learned views was obtained accordingly. This resulted in a conservative test of the hypothesis of a GRBF model for human vision for the following reasons: (1) Since no feedback was provided in the human experiment and the learned and novel views were randomly intermixed, it is not straightforward for the model to find the best standard deviation for the novel views, particularly because the best standard deviation for the novel views was not the same as that for the learned ones. The performance for the novel views is therefore the upper limit of the model's performance. (2) The subjects' performance relative to the model will be defined as statistical efficiency (see below). The above method will yield the lowest possible efficiency for the novel views, and a higher efficiency for the learned views, since the best standard deviation for the novel views is different from that for the learned views. 
Because our hypothesis depends on a higher statistical efficiency for the novel views than for the learned views, this method will make such a putative difference even smaller. Likewise, for the 2D affine ideal, the number of 2D rotated copies of each template T_i and the value γ were both extensively sampled, and the best performance for the novel views was selected accordingly. The result for the learned views corresponding to the same parameters was selected. This choice also makes it a conservative hypothesis test. \n\n3 Results \n\nFigure 2: The threshold standard deviation of the Gaussian noise, added to the distractor in the test pair, that keeps an observer's performance at the 75% correct level, for the learned and novel views, respectively. The dotted line is the standard deviation of the Gaussian noise added to the target in the test pair. \n\nFig. 2 shows the threshold performance. We use statistical efficiency E to compare human to model performance. E is defined as the information used by humans relative to the ideal observer [3]: E = (d'_human / d'_ideal)², where d' is the discrimination index. We have shown in [4] that, in our task, E = ((σ_distractor^human)² − (σ_target)²) / ((σ_distractor^ideal)² − (σ_target)²), where σ is the threshold. Fig. 3 shows the statistical efficiency of the human observers relative to each of the four models. \n\nWe note in Fig. 3 that the efficiencies for the novel views are higher than those for the learned views (several of them even exceeded 100%), except for the planar V-Shaped objects. 
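The efficiency computation reduces to a one-liner. The sketch below is our reading of the threshold form of E (human distractor threshold in the numerator, model threshold in the denominator, consistent with efficiencies above 100% when humans out-tolerate the constrained model); the exact derivation is in [4].

```python
def statistical_efficiency(sigma_h, sigma_i, sigma_t):
    """E = (sigma_h^2 - sigma_t^2) / (sigma_i^2 - sigma_t^2): the squared-d'
    ratio expressed through the 75%-correct distractor noise thresholds of
    the human (sigma_h) and the model (sigma_i), with target noise sigma_t."""
    return (sigma_h ** 2 - sigma_t ** 2) / (sigma_i ** 2 - sigma_t ** 2)
```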
We are particularly interested in the Irregular and Symmetric objects in the 2D affine ideal case, in which the pairwise comparison between the learned and novel views across the six objects and three observers yielded a significant difference (binomial, p < 0.05). This suggests that the 2D affine ideal observer cannot account for the human performance, because if the humans used a 2D affine template matching strategy, their relative performance for the novel views cannot be better than for the learned views. We suggest therefore that 3D information was used by the human observers (e.g., 3D symmetry). This is supported in addition by the increasing efficiencies as the structural regularity increased from the Balls, Irregular, to Symmetric objects (except for the V-Shaped objects with 2D affine models). \n\nFigure 3: Statistical efficiencies of human observers relative to the 2D ideal observer, the GRBF model, the 2D affine nearest neighbor model, and the 2D affine ideal observer. \n\n4 Conclusions \n\nComputational models of visual cognition are subject to information theoretic as well as implementational constraints. When a model's performance mimics that of human observers, it is difficult to interpret which aspects of the model characterize the human visual system. 
For example, human object recognition could be simulated by both a GRBF model and a model with partial 3D information of the object. The approach we advocate here is that, instead of trying to mimic human performance by a computational model, one designs an implementation-free model for a specific recognition task that yields the best possible performance under explicitly specified computational constraints. This model provides a well-defined benchmark for performance, and if human observers outperform it, we can conclude firmly that the humans must have used better computational strategies than the model. We showed that models of independent 2D templates with 2D linear operations cannot account for human performance. This suggests that our human observers may have used the templates to reconstruct a representation of the object with some (possibly crude) 3D structural information. \n\nReferences \n\n[1] Biederman I and Gerhardstein P C. Viewpoint dependent mechanisms in visual object recognition: a critical analysis. J. Exp. Psych.: HPP, 21:1506-1514, 1995. \n[2] Bülthoff H H and Edelman S. Psychophysical support for a 2D view interpolation theory of object recognition. Proc. Natl. Acad. Sci., 89:60-64, 1992. \n[3] Fisher R A. Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh, 1925. \n[4] Liu Z, Knill D C, and Kersten D. Object classification for human and ideal observers. Vision Research, 35:549-568, 1995. \n[5] Poggio T and Edelman S. A network that learns to recognize three-dimensional objects. Nature, 343:263-266, 1990. \n[6] Tarr M J and Bülthoff H H. Is human object recognition better described by geon-structural-descriptions or by multiple-views? J. Exp. Psych.: HPP, 21:1494-1505, 1995. \n[7] Watson A B and Pelli D G. QUEST: A Bayesian adaptive psychometric method. Perception and Psychophysics, 33:113-120, 1983. 
\n[8] Werman M and Weinshall D. Similarity and affine invariant distances between 2D point sets. IEEE PAMI, 17:810-814, 1995. \n\nToward a Single-Cell Account for Binocular Disparity Tuning: An Energy Model May Be Hiding in Your Dendrites \n\nBartlett W. Mel \nDepartment of Biomedical Engineering \nUniversity of Southern California, MC 1451 \nLos Angeles, CA 90089 \nmel@quake.usc.edu \n\nDaniel L. Ruderman \nThe Salk Institute \n10010 N. Torrey Pines Road \nLa Jolla, CA 92037 \nruderman@salk.edu \n\nKevin A. Archie \nNeuroscience Program \nUniversity of Southern California \nLos Angeles, CA 90089 \nkarchie@quake.usc.edu \n\nAbstract \n\nHubel and Wiesel (1962) proposed that complex cells in visual cortex are driven by a pool of simple cells with the same preferred orientation but different spatial phases. However, a wide variety of experimental results over the past two decades have challenged the pure hierarchical model, primarily by demonstrating that many complex cells receive monosynaptic input from unoriented LGN cells, or do not depend on simple cell input. We recently showed using a detailed biophysical model that nonlinear interactions among synaptic inputs to an excitable dendritic tree could provide the nonlinear subunit computations that underlie complex cell responses (Mel, Ruderman, & Archie, 1997). This work extends the result to the case of complex cell binocular disparity tuning, by demonstrating in an isolated model pyramidal cell (1) disparity tuning at a resolution much finer than the overall dimensions of the cell's receptive field, and (2) systematically shifted optimal disparity values for rivalrous pairs of light and dark bars, both in good agreement with published reports (Ohzawa, DeAngelis, & Freeman, 1997). 
Our results reemphasize the potential importance of intradendritic computation for binocular visual processing in particular, and for cortical neurophysiology in general. \n\n1 Introduction \n\nBinocular disparity is a powerful cue for depth in vision. The neurophysiological basis for binocular disparity processing has been of interest for decades, spawned by the early studies of Hubel and Wiesel (1962) showing neurons in primary visual cortex which could be driven by both eyes. Early qualitative models for disparity tuning held that a binocularly driven neuron could represent a particular disparity (zero, near, or far) via a relative shift of receptive field (RF) centers in the right and left eyes. According to this model, a binocular cell fires maximally when an optimal stimulus, e.g. an edge of a particular orientation, is simultaneously centered in the left and right eye receptive fields, corresponding to a stimulus at a specific depth relative to the fixation point. An account of this kind is most relevant to the case of a cortical "simple" cell, whose phase-sensitivity enforces a preference for a particular absolute location and contrast polarity of a stimulus within its monocular receptive fields. \n\nThis global receptive field shift account leads to a conceptual puzzle, however, when binocular complex cell receptive fields are considered instead, since a complex cell can respond to an oriented feature nearly independent of position within its monocular receptive field. Since complex cell receptive field diameters in the cat lie in the range of 1-3 degrees, the excessive "play" in their monocular receptive fields would seem to render complex cells incapable of signaling disparity on the much finer scale needed for depth perception (measured in minutes). 
\n\nIntriguingly, various authors have reported that a substantial fraction of complex cells in cat visual cortex are in fact tuned to left-right disparities much finer than that suggested by the size of the monocular RF's. For such cells, a stimulus delivered at the proper disparity, regardless of absolute position in either eye, produces a neural response in excess of that predicted by the sum of the monocular responses (Pettigrew, Nikara, & Bishop, 1968; Ohzawa, DeAngelis, & Freeman, 1990; Ohzawa et al., 1997). Binocular responses of this type suggest that for these cells, the left and right RF's are combined via a correlation operation rather than a simple sum (Nishihara & Poggio, 1984; Koch & Poggio, 1987). This computation has also been formalized in terms of an "energy" model (Ohzawa et al., 1990, 1997), building on the earlier use of energy models to account for complex cell orientation tuning (Pollen & Ronner, 1983) and direction selectivity (Adelson & Bergen, 1985). In an energy model for binocular disparity tuning, sums of linear Gabor filter outputs representing left and right receptive fields are squared to produce the crucial multiplicative cross terms (Ohzawa et al., 1990, 1997). \n\nOur previous biophysical modeling work has shown that the dendritic tree of a cortical pyramidal cell is well suited to support an approximate high-dimensional quadratic input-output relation, where the second-order multiplicative cross terms arise from local interactions among synaptic inputs carried out in quasi-isolated dendritic "subunits" (Mel, 1992b, 1992a, 1993). We recently applied these ideas to show that the position-invariant orientation tuning of a monocular complex cell could be computed within the dendrites of a single cortical cell, based exclusively upon excitatory inputs from a uniform, overlapping population of unoriented ON and OFF cells (Mel et al., 1997). 
Given the similarity of the "energy" formulations previously proposed to account for orientation tuning and binocular disparity tuning, we hypothesized that a similar type of dendritic subunit computation could underlie disparity tuning in a binocularly driven complex cell. \n\nParameter                  Value \nR_m                        10 kΩ·cm² \nR_a                        200 Ω·cm \nC_m                        1.0 µF/cm² \nV_rest                     -70 mV \nCompartments               615 \nSomatic g_Na, g_DR         0.20, 0.12 S/cm² \nDendritic g_Na, g_DR       0.05, 0.03 S/cm² \nInput frequency            0-100 Hz \ng_AMPA                     0.027 nS - 0.295 nS \nτ_AMPA (on, off)           0.5 ms, 3 ms \ng_NMDA                     0.27 nS - 2.95 nS \nτ_NMDA (on, off)           0.5 ms, 50 ms \nE_syn                      0 mV \n\nTable 1: Biophysical simulation parameters. Details of HH channel implementation are given elsewhere (Mel, 1993); original HH channel implementation courtesy Öjvind Bernander and Rodney Douglas. In order that local EPSP size be held approximately constant across the dendritic arbor, peak synaptic conductance at dendritic location x was approximately scaled (inversely) to the local input resistance, given by g_syn(x) = c/R̂_in(x), where c was a constant, and R̂_in(x) = max(R_in(x), 200 MΩ). Input resistance R_in(x) was measured for a passive cell. Thus g_syn was identical for all dendritic sites with input resistance below 200 MΩ, and was given by the larger conductance value shown; roughly 50% of the tree fell within a factor of 2 of this value. Peak conductances at the finest distal tips were smaller by roughly a factor of 10 (smaller number shown). Somatic input resistance was near 24 MΩ. The peak synaptic conductance values used were such that the ratio of steady state current injection through NMDA vs. AMPA channels was 1.2 ± 0.4. Both AMPA and NMDA-type synaptic conductances were modeled using the kinetic scheme of Destexhe et al. 
(1994); synaptic activation and inactivation time constants are shown for each. \n\n2 Methods \n\nCompartmental simulations of a pyramidal cell from cat visual cortex (morphology courtesy of Rodney Douglas and Kevan Martin) were carried out in NEURON (Hines, 1989); simulation parameters are summarized in Table 1. The soma and dendritic membrane contained Hodgkin-Huxley-type (HH) voltage-dependent sodium and potassium channels. Following evidence for higher spike thresholds and decremental propagation in dendrites (Stuart & Sakmann, 1994), HH channel density was set to a uniform, 4-fold lower value in the dendritic membrane relative to that of the cell body. Excitatory synapses from LGN cells included both NMDA and AMPA-type synaptic conductances. Since the cell was considered to be isolated from the cortical network, inhibitory input was not modeled. Cortical cell responses were reported as average spike rate recorded at the cell body over the 500 ms stimulus period, excluding the 50 ms initial transient. \n\nThe binocular LGN consisted of two copies of the monocular LGN model used previously (Mel et al., 1997), each consisting of a superimposed pair of 64x64 ON and OFF subfields. LGN cells were modeled as linear, half-rectified center-surround filters with centers 7 pixels in width. We randomly subsampled the left and right LGN arrays by a factor of 16 to yield 1,024 total LGN inputs to the pyramidal cell. \n\nA developmental principle was used to determine the spatial arrangement of these 1,024 synaptic contacts onto the dendritic branches of the cortical cell, as follows. A virtual stimulus ensemble was defined for the cell, consisting of the complete set of single vertical light or dark bars presented binocularly at zero disparity within the cell's receptive field. 
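An LGN unit of the kind described (a linear center-surround filter followed by half-rectification) can be sketched as a difference-of-Gaussians; the kernel size and Gaussian widths below are illustrative assumptions, chosen only so the excitatory center is roughly 7 pixels wide.

```python
import numpy as np

def dog_kernel(size=15, sigma_c=1.5, sigma_s=3.0):
    """Difference-of-Gaussians kernel: excitatory center minus a broader
    inhibitory surround (widths are illustrative, not the paper's values)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return center - surround

def lgn_response(patch, kernel, on=True):
    """Linear filtering followed by half-rectification; an OFF cell sees
    the sign-inverted kernel."""
    drive = np.sum(patch * (kernel if on else -kernel))
    return max(drive, 0.0)
```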
Within this ensemble, strong pairwise correlations existed among cells falling into vertically aligned groups of the same (ON or OFF) type, and cells in the vertical column at zero horizontal disparity in the other eye. These binocular cohorts of highly correlated LGN cells were labeled mutual "friends". Progressing through the dendritic tree in depth-first order, a randomly chosen LGN cell was assigned to the first dendritic site. A randomly chosen "friend" of that cell was assigned to the second site, the third site was assigned to a friend of the site-2 input, and so on, until all friends in the available subsample were assigned (4 from each eye, on average). If the friends of the connection at site i were exhausted, a new LGN cell was chosen at random for site i + 1. In earlier work, this type of synaptic arrangement was shown to be the outcome of a Hebb-type correlational learning rule, in which random, activity-independent formation of synaptic contacts acted to slowly randomize the axo-dendritic interface, shaped by Hebbian stabilization of synaptic contacts based on their short-range correlations with other synapses.

3 Results

Model pyramidal cells configured in this way exhibited prominent phase-invariant orientation tuning, the hallmark response property of the visual complex cell. Multiple orientation tuning curves are shown, for example, for a monocular complex cell, exhibiting strong tuning for light and dark bars across the receptive field (fig. 1). The bold curve shows the average of all tuning curves for this cell; the half-width at half-max is 25°, in the normal range for complex cells in cat visual cortex (Orban, 1984).
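The "friends"-chaining layout rule described in the Methods above can be sketched as follows; the data structures (a precomputed friends map and site list) are hypothetical stand-ins for the simulation's LGN subsample and dendritic tree.

```python
import random

def assign_synapses(sites, lgn_cells, friends, seed=0):
    """Walk dendritic sites in depth-first order, assigning each LGN cell
    exactly once: each new site takes an unused 'friend' (highly correlated
    cell) of the previous site's input; when the chain's friends are
    exhausted, a fresh LGN cell is drawn at random and a new chain begins.
    `friends` maps each LGN cell id to the set of its correlated partners."""
    rng = random.Random(seed)
    unused = set(lgn_cells)
    assignment = {}
    prev = None
    for site in sites:  # sites assumed already in depth-first order
        pool = friends.get(prev, set()) & unused if prev is not None else set()
        if not pool:        # first site, or friends exhausted:
            pool = unused   # restart the chain with a random LGN cell
        cell = rng.choice(sorted(pool))
        unused.discard(cell)
        assignment[site] = cell
        prev = cell
    return assignment

# Two mutually correlated cohorts; chaining keeps each cohort contiguous
# along the dendrite, clustering correlated inputs as a Hebb rule would.
friends = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
layout = assign_synapses(sites=range(6), lgn_cells=range(6), friends=friends)
```

The net effect is the one the text describes: correlated binocular cohorts land on neighboring dendritic sites, setting up the local nonlinear interactions exploited in the Results.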
When the spatial arrangement of LGN synaptic contacts onto the pyramidal cell dendrites was randomly scrambled, leaving all other model parameters unchanged, orientation tuning was abolished in this cell (right frame), confirming the crucial role of spatially mediated nonlinear synaptic interactions (the average curve from the left frame is reproduced for comparison).

Disparity tuning in an orientation-tuned binocular model cell is shown in fig. 2, compared to data from a complex cell in cat visual cortex (adapted from Ohzawa et al. (1997)). Responses to contrast-matched (light-light) and contrast-non-matched (light-dark) bar pairs were subtracted to produce these plots. The strong diagonal structure indicates that both the model and real cells responded most vigorously when contrast-matched bars were presented at the same horizontal position in the left- and right-eye RFs (i.e. at zero disparity), whereas peak responses to contrast-non-matched bars occurred at symmetric near and far, non-zero disparities.

4 Discussion

The response pattern illustrated in fig. 2A is highly similar to the response generated by an analytical binocular energy model for a complex cell (Ohzawa et al., 1997):

{exp(-k x_L^2) cos(2πf x_L) + exp(-k x_R^2) cos(2πf x_R)}^2 +
{exp(-k x_L^2) sin(2πf x_L) + exp(-k x_R^2) sin(2πf x_R)}^2,        (1)

where x_L and x_R are the horizontal bar positions presented to the two eyes, k is the factor that determines the width of the subunit RFs, and f is the spatial frequency.

Figure 1: Orientation tuning curves are shown in the left frame for light and dark bars at 3 arbitrary positions. Essentially similar responses were seen at other receptive field positions, and for other complex cells. The bold trace indicates the average of tuning curves at positions 0, 1, 2, 4, 8, and 16 for light and dark bars. The similar form of the 6 curves shown reflects the translation invariance of the cell's response to oriented stimuli, and its symmetry with respect to ON and OFF input. Orientation tuning is eliminated when the spatial arrangement of LGN synapses onto the model cell dendrites is randomly scrambled (right frame).

Figure 2: Comparison of disparity tuning in the model complex cell to that of a binocular complex cell from cat visual cortex. Light or dark bars were presented simultaneously to the left and right eyes. Bars could be of the same polarity in both eyes (light, light) or of different polarity (light, dark); cell responses for these two cases were subtracted to produce the plot shown in the left frame. The right frame shows data similarly displayed for a binocular complex cell in cat visual cortex (adapted from Ohzawa et al. (1997)).
\nIn lieu of literal simple cell \"subunits\" , the present results indicate that the subunit \ncomputations associated with the terms of an energy model could derive largely \nfrom synaptic interactions within the dendrites of the individual cortical cell, driven \nexclusively by excitatory inputs from unoriented, monocular ON and OFF cells \ndrawn from a uniform overlapping spatial distribution. While lateral inhibition \nand excitation play numerous important roles in cortical computation, the present \nresults suggest they are not essential for the basic features of the nonlinear disparity \ntuned responses of cortical complex cells. Further, these results address the paradox \nas to how inputs from both unoriented LGN cells and oriented simple cells can \ncoexist without conflict within the dendrites of a single complex cell. \n\nA number of controls from previous work suggest that this type of subunit process(cid:173)\ning is very robustly computed in the dendrites of an individual neuron, with little \nsensitivity to biophysical parameters and modeling assumptions, including details of \nthe algorithm used to spatially organize the genicula-cortical projection, specifics of \ncell morphology, synaptic activation density across the dendritic tree, passive mem(cid:173)\nbrane and cytoplasmic parameters, and details of the kinetics, voltage-dependence, \nor spatial distribution of the voltage-dependent dendritic channels. \nOne important difference between a standard energy model and the intradendritic \nresponses generated in the present simulation experiments is that the energy model \nhas oriented RF structure at the linear (simple-cell-like) stage, giving rise to ori(cid:173)\nented, antagonistic ON-OFF subregions (Movshon, Thompson, & Tolhurst, 1978), \nwhereas the linear stage in our model gives rise to center-surround antagonism only \nwithin individual LGN receptive fields. 
Put another way, the LGN-derived subunits in the present model cannot provide all the negative cross-terms that appear in the energy model equations, specifically for pairs of pixels that fall outside the range of a single LGN receptive field.

While the present simulations involve numerous simplifications relative to the full complexity of the cortical microcircuit, the results nonetheless emphasize the potential importance of intradendritic computation in visual cortex.

Acknowledgements

Thanks to Ken Miller, Allan Dobbins, and Christof Koch for many helpful comments on this work. This work was funded by the National Science Foundation and the Office of Naval Research, and by a Sloan Foundation Fellowship (D.R.).

References

Adelson, E., & Bergen, J. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Amer., A 2, 284-299.

Hines, M. (1989). A program for simulation of nerve equations with branching geometries. Int. J. Biomed. Comput., 24, 55-68.

Hubel, D., & Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol., 160, 106-154.

Koch, C., & Poggio, T. (1987). Biophysics of computation: Neurons, synapses, and membranes. In Edelman, G., Gall, W., & Cowan, W. (Eds.), Synaptic function, pp. 637-697. Wiley, New York.

Mel, B. (1992a). The clusteron: Toward a simple abstraction for a complex neuron. In Moody, J., Hanson, S., & Lippmann, R. (Eds.), Advances in Neural Information Processing Systems, vol. 4, pp. 35-42. Morgan Kaufmann, San Mateo, CA.

Mel, B. (1992b). NMDA-based pattern discrimination in a modeled cortical neuron. Neural Computation, 4, 502-516.

Mel, B. (1993). Synaptic integration in an excitable dendritic tree. J. Neurophysiol., 70(3), 1086-1101.

Mel, B., Ruderman, D., & Archie, K.
(1997). Complex-cell responses derived from center-surround inputs: the surprising power of intradendritic computation. In Mozer, M., Jordan, M., & Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9, pp. 83-89. MIT Press, Cambridge, MA.

Movshon, J., Thompson, I., & Tolhurst, D. (1978). Receptive field organization of complex cells in the cat's striate cortex. J. Physiol., 283, 79-99.

Nishihara, H., & Poggio, T. (1984). Stereo vision for robotics. In Brady, & Paul (Eds.), Proceedings of the First International Symposium of Robotics Research, pp. 489-505. MIT Press, Cambridge, MA.

Ohzawa, I., DeAngelis, G., & Freeman, R. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249, 1037-1041.

Ohzawa, I., DeAngelis, G., & Freeman, R. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. J. Neurophysiol., June.

Orban, G. (1984). Neuronal operations in the visual cortex. Springer Verlag, New York.

Pettigrew, J., Nikara, T., & Bishop, P. (1968). Responses to moving slits by single units in cat striate cortex. Exp. Brain Res., 6, 373-390.

Pollen, D., & Ronner, S. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Trans. Sys. Man Cybern., 13, 907-916.

Stuart, G., & Sakmann, B. (1994). Active propagation of somatic action potentials into neocortical pyramidal cell dendrites. Nature, 367, 69-72.