λ_min) present in the data and the largest disparity which can be estimated reliably (Henkel, 1997):

    d < π / k_max = λ_min / 2 .    (1)

A Simple and Fast Neural Network Approach to Stereovision

A well-known example of the size-disparity scaling expressed in equation (1) is found in the context of the spatial frequency channels assumed to exist in the visual cortex. Cortical cells respond to spatial wavelengths down to about half their peak wavelength λ_opt; therefore, they can reliably estimate only disparities less than λ_opt/4. This is known as Marr's quarter-cycle limit (Blake, 1991).

Equation (1) immediately suggests a way to extend the limited working range of disparity estimators: spatially smoothing the image data before or during disparity calculation reduces k_max and in turn increases the disparity range. However, spatial smoothing also reduces the spatial resolution of the resulting disparity map. Another way of modifying the usable range of disparity estimators is the application of a fixed preshift to the input data before disparity calculation. This would require prior knowledge of the correct preshift to be applied, which is a nontrivial problem. One could resort to hierarchical coarse-to-fine schemes, but the difficulties with hierarchical schemes have already been elaborated.

The aliasing effects discussed are a general feature of sampling visual space with only two eyes; instead of counteracting them, one can exploit them in a simple coherence-detection scheme, where the multi-unit activity in stacks of disparity detectors tuned to a common view direction is analyzed.

Assuming that all disparity units i in a stack have random preshifts or presmoothing applied to their input data, these units will have different, but slightly overlapping, working ranges D_i = [d_i^min, d_i^max] for valid disparity estimates.
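As a concrete illustration of the size-disparity scaling in equation (1), the following sketch computes the reliable disparity range from the smallest wavelength present in the data; the function name and the numeric wavelength are hypothetical, chosen only for illustration.

```python
import math

def max_reliable_disparity(lambda_min):
    """Equation (1): d < pi / k_max = lambda_min / 2, where
    k_max = 2*pi / lambda_min is the largest wave number
    present in the image data."""
    k_max = 2.0 * math.pi / lambda_min
    return math.pi / k_max  # equals lambda_min / 2

# Marr's quarter-cycle limit: cortical cells respond to wavelengths
# down to about half their peak wavelength lambda_opt, so their
# reliable disparity range is lambda_opt / 4.
lambda_opt = 8.0  # peak wavelength in pixels (hypothetical value)
print(max_reliable_disparity(lambda_opt / 2.0))  # -> 2.0, i.e. lambda_opt / 4
```

Smoothing the input (raising lambda_min) widens this range at the cost of spatial resolution, which is exactly the trade-off discussed above.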
An object with true disparity d, seen in the common view direction of such a stack, will therefore split the stack into two disjoint classes: the class C of estimators with d ∈ D_i for all i ∈ C, and the rest of the stack, C̄, with d ∉ D_i. All disparity estimators in C will code more or less the true disparity, d_i ≈ d, but the estimates of units belonging to C̄ will be subject to the random aliasing effects discussed, depending in a complicated way on image content and on the disparity range D_i of the unit.

We will thus have d_i ≈ d ≈ d_j whenever units i and j belong to C, and random relationships otherwise. A simple coherence detection within each stack, i.e. searching for all units with d_i ≈ d_j and extracting the largest cluster found, will be sufficient to single out C. The true disparity d in the view direction of the stack can be simply estimated as an average over all coherently coding units:

    d ≈ 1/N(C) Σ_{i ∈ C} d_i ,

with N(C) the number of units in C.

3 Neural Network Implementation

Repeating this coherence detection scheme in every view direction results in a fully parallel network structure for disparity calculation. Neighboring disparity stacks responding to different view directions estimate disparity values independently from each other, and within each stack, disparity units operate independently from each other. Since coherence detection is an opportunistic scheme, extensions of the basic algorithm to multiple spatial scales and combinations of different types of disparity estimators are trivial. Additional units are simply included in the appropriate coherence stacks. The coherence scheme will combine only the information from the coherently coding units and ignore the rest of the data. For this reason, the scheme also turns out to be extremely robust against single-unit failures.

R. D. Henkel
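The coherence detection within a single stack can be sketched as follows. The mutual-agreement tolerance and the anchor-based cluster extraction are illustrative choices, not taken from the paper:

```python
import numpy as np

def coherence_detect(estimates, tol=0.5):
    """Single out the coherent class C in one disparity stack:
    units i, j are taken as coherent when |d_i - d_j| <= tol,
    the largest cluster of mutually agreeing units is extracted,
    and the true disparity is estimated as the average over C.
    Returns the estimate and the cluster size N(C)."""
    d = np.asarray(estimates, dtype=float)
    # pairwise agreement between all units of the stack
    agree = np.abs(d[:, None] - d[None, :]) <= tol
    # the unit agreeing with the most others anchors the cluster C
    anchor = int(np.argmax(agree.sum(axis=1)))
    C = agree[anchor]
    return float(d[C].mean()), int(C.sum())

# four units code the true disparity near 1.2; the rest are aliased
d_hat, n = coherence_detect([1.2, 1.3, 1.1, 1.25, -3.7, 0.1, 2.9])
```

Here d_hat is approximately 1.2125, the mean over the four coherent units, and n = 4; the aliased estimates are simply ignored, which is the source of the scheme's robustness against single-unit failures.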
Figure 2: The network structure for a single horizontal scan-line (left). The view directions of the disparity stacks split the angle between the left and right lines of sight in half, in the network as well as in 3D-space, therefore analyzing space along the cyclopean view directions (right).

In the current implementation (Fig. 2), disparity units at a single spatial scale are arranged into horizontal disparity layers. Left and right image data are fed into this network along diagonally running data lines. This causes every disparity layer to receive the stereo data with a certain fixed preshift applied, leading to the required, slightly different working ranges of neighboring layers. Disparity units stacked vertically above each other are collected into a single disparity stack, which is then analyzed for coherent activity.

4 Results

The new stereo network performs comparably on several standard test image sets (Fig. 3). The calculated disparity maps are similar to maps obtained by classical area-based approaches, but they display subpixel precision. Since no smoothing or regularization is performed by the coherence-based stereo algorithm, sharp disparity edges can be observed at object borders.

Figure 3: Disparity maps for some standard test images (small insets), calculated by the coherence-based stereo algorithm.

Figure 4: The performance of coherence-based stereo on a difficult scene with specular highlights, transparency and repetitive structures (left). The disparity map (middle) is dense and correct, except for a few structure-less image regions. These regions, as well as most object borders, are indicated in the validation map (right) with a low [dark] validation count.

Within the network, a simple validation map is available locally. A measure of local
coherence can be obtained by calculating the relative number of coherently acting disparity units in each stack, i.e. the ratio N(C)/N(C ∪ C̄), where N(C) is the number of units in class C. In most cases, this validation map clearly marks image areas where the disparity calculations failed (for various reasons, notably at occlusions caused by object borders, or in large structure-less image regions, where no reliable matching can be obtained; compare Fig. 4).

Close inspection of the disparity and validation maps reveals that these image maps are not aligned with the left or the right view of the scene. Instead, both maps are registered with the cyclopean view. This is caused by the structural arrangement of data lines and disparity stacks in the network. Reprojecting data lines and stacks back into 3D-space shows that the stacks analyze three-dimensional space along lines splitting the angle between the left and right view directions in half. This is the cyclopean view direction as defined by Hering (1879).

It is easy to obtain the cyclopean view of the scene itself. With I_i^l and I_i^r denoting the left and right input data at the position of disparity unit i, a summation over all coherently coding disparity units in a stack, i.e.,

    I^C = 1/N(C) Σ_{i ∈ C} (I_i^l + I_i^r)/2 ,

gives the image intensity I^C in the cyclopean view direction of this stack. Collecting I^C from all disparity stacks gives the complete cyclopean view as the third co-registered map of the network (Fig. 5).

Figure 5: A simple superposition of the left and right stereo images results in diplopia (left). By using a vergence system, the two stereo images can be aligned better (middle), but diplopia is still prominent in most areas of the visual field. The fused cyclopean view of the scene (right) was calculated by the coherence-based stereo network.

Acknowledgements

Thanks to Helmut Schwegler and Robert P.
O'Shea for interesting discussions. Image data courtesy of G. Medioni, USC Institute for Robotics & Intelligent Systems; B. Bolles, AIC, SRI International; and G. Sommer, Cognitive Systems Group, Christian-Albrechts-Universität Kiel. An internet-based implementation of the algorithm presented in this paper is available at http://axon.physik.uni-bremen.de/~rdh/online_calc/stereo/.

References

Adelson, E.H. & Bergen, J.R. (1985): Spatiotemporal Energy Models for the Perception of Motion. J. Opt. Soc. Am. A 2: 284-299.

Barron, J.L., Fleet, D.J. & Beauchemin, S.S. (1994): Performance of Optical Flow Techniques. Int. J. Comp. Vis. 12: 43-77.

Blake, R. & Wilson, H.R. (1991): Neural Models of Stereoscopic Vision. TINS 14: 445-452.

DeAngelis, G.C., Ohzawa, I. & Freeman, R.D. (1991): Depth is Encoded in the Visual Cortex by a Specialized Receptive Field Structure. Nature 352: 156-159.

Fleck, M.M. (1991): A Topological Stereo Matcher. Int. J. Comp. Vis. 6: 197-226.

Fleet, D.J. & Jepson, A.D. (1993): Stability of Phase Information. IEEE PAMI 15: 1253-1268.

Frisby, J.P. & Pollard, S.B. (1991): Computational Issues in Solving the Stereo Correspondence Problem. In: Landy, M.S. & Movshon, J.A. (eds.), Computational Models of Visual Processing, p. 331, MIT Press, Cambridge 1991.

Henkel, R.D. (1997): Fast Stereovision by Coherence Detection. In: Sommer, G., Daniilidis, K. & Pauli, J. (eds.), Proc. of CAIP'97, Kiel, LNCS 1296, p. 297, Springer, Heidelberg 1997.

Hering, E. (1879): Der Raumsinn und die Bewegung des Auges. In: Hermann, L. (ed.), Handbuch der Physiologie, Band 3, Teil 1, Vogel, Leipzig 1879.

Marr, D. & Poggio, T. (1979): A Computational Theory of Human Stereo Vision. Proc. R. Soc. Lond. B 204: 301-328.

Ohta, Y. & Kanade, T. (1985): Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming. IEEE PAMI 7: 139-154.

Qian, N. & Zhu, Y.
(1997): Physiological Computation of Binocular Disparity. To appear in Vision Research.

Yuille, A.L., Geiger, D. & Bülthoff, H.H. (1991): Stereo Integration, Mean Field Theory and Psychophysics. Network 2: 423-442.