{"title": "The Computation of Stereo Disparity for Transparent and for Opaque Surfaces", "book": "Advances in Neural Information Processing Systems", "page_first": 385, "page_last": 392, "abstract": null, "full_text": "The Computation of Stereo Disparity for Transparent and for Opaque Surfaces \n\nSuthep Madarasmi \n\nComputer Science Department \n\nUniversity of Minnesota \nMinneapolis, MN 55455 \n\nDaniel Kersten \n\nDepartment of Psychology \nUniversity of Minnesota \n\nTing-Chuen Pong \n\nComputer Science Department \n\nUniversity of Minnesota \n\nAbstract \n\nThe classical computational model for stereo vision incorporates a uniqueness inhibition constraint to enforce a one-to-one feature match, thereby sacrificing the ability to handle transparency. Critics of the model disregard the uniqueness constraint and argue that the smoothness constraint can provide the excitation support required for transparency computation. However, this modification fails in neighborhoods with sparse features. We propose a Bayesian approach to stereo vision with priors favoring cohesive over transparent surfaces. The disparity and its segmentation into a multi-layer \"depth planes\" representation are simultaneously computed. The smoothness constraint propagates support within each layer, providing mutual excitation for non-neighboring transparent or partially occluded regions. Test results for various random-dot and other stereograms are presented. \n\n1 INTRODUCTION \n\nThe horizontal disparity in the projection of a 3-D point in a parallel stereo imaging system can be used to compute depth through triangulation. As the number of points in the scene increases, the correspondence problem increases in complexity due to the matching ambiguity. Prior constraints on surfaces are needed to arrive at a correct solution. 
Marr and Poggio [1976] use the smoothness constraint to resolve matching ambiguity and the uniqueness constraint to enforce a 1-to-1 match. Their smoothness constraint tends to oversmooth at occluding boundaries and their uniqueness assumption discourages the computation of stereo transparency for two overlaid surfaces. Prazdny [1985] disregards the uniqueness inhibition term to enable transparency perception. However, his smoothness constraint is locally enforced and fails to provide excitation for spatially disjoint regions and for sparse transparency. \n\nMore recently, Bayesian approaches have been used to incorporate prior constraints (see [Clark and Yuille, 1990] for a review) for stereopsis while overcoming the problem of oversmoothing. Line processes are activated for disparity discontinuities to mark the smoothness boundaries while the disparity is simultaneously computed. A drawback of such methods is the lack of an explicit grouping of image sites into piece-wise smooth regions. In addition, when presented with a stereogram of overlaid (transparent) surfaces such as the random-dot stereogram in figure 5, multiple edges in the image are obtained, whereas we clearly perceive two distinct, overlaid surfaces. With edges as output, further grouping of overlapping surfaces is impossible using the edges as boundaries. This suggests that surface grouping should be performed simultaneously with disparity computation. \n\n2 THE MULTI-LAYER REPRESENTATION \n\nWe propose a Bayesian approach to computing disparity and its segmentation that uses a different output representation from the previous, edge-based methods. Our representation was inspired by the observations of Nakayama et al. [1989] that mid-level processing such as the grouping of objects behind occluders is performed for objects within the same \"depth plane\". 
\n\nAs an example, consider the stereogram of a floating square shown in figure 1a. The edge-based segmentation method computes the disparity and marks the disparity edges as shown in figure 1b. Our approach produces two types of output at each pixel: a layer (depth plane) number and a disparity value for that layer. The goal of the system is to place points that could have arisen from a single smooth surface in the scene into one distinct layer. The output for our multi-surface representation is shown in figure 1c. Note that the floating square has a unique layer label, namely layer 4, and the background has another label of 2. Layers 1 and 3 have no data support and are, therefore, inactive. \n\nThe rest of the pixels in each layer that have no data support obtain values by a membrane fitting process using the computed disparity as anchors. The occluded parts of surfaces are, thus, represented in each layer. In addition, disjoint regions of a single surface due to occlusion are represented in a single layer. This representation of occluded parts is an important difference between our representation and a similar representation for segmentation by Darrell and Pentland [1991]. \n\nFigure 1: a) A gray scale display of a noisy stereogram depicting a floating square. b) Edge-based method: disparity computed and disparity discontinuity computed. c) Multi-surface method: disparity computed, surface grouping performed by layer assignment, and disparity for each layer filled in. Panel labels: disp. = 0, disp. = 4; Layer 2, Layer 4. \n\n3 ALGORITHM AND SIMULATION METHOD 
\n\nWe use Bayes' [1783] rule to compute the scene attribute, namely disparity $u$ and its layer assignment $l$ for each layer: \n\n$$p(u, l \\mid d^L, d^R) = \\frac{p(d^L, d^R \\mid u, l)\\, p(u, l)}{p(d^L, d^R)}$$ \n\nwhere $d^L$ and $d^R$ are the left and right intensity image data. Each constraint is expressed as a local cost function using the Markov Random Field (MRF) assumption [Geman and Geman, 1984] that pixel values are conditional only on their nearest neighbors. Using the Gibbs-MRF equivalence, the energy function can be written as a probability function: \n\n$$p(x) = \\frac{1}{Z} e^{-E(x)/T}$$ \n\nwhere $Z$ is the normalizing constant, $T$ is the temperature, $E$ is the energy cost function, and $x$ is a random variable. \n\nOur energy constraints can be expressed as \n\n$$E = \\lambda_D V_D + \\lambda_S V_S + \\lambda_G V_G + \\lambda_E V_E + \\lambda_R V_R$$ \n\nwhere the $\\lambda$'s are the weighting factors and the functions $V_D$, $V_S$, $V_G$, $V_E$, $V_R$ are the data matching cost, the smoothness term, the gap term, the edge shape term, and the disparity versus intensity edge coupling term, respectively. \n\nThe data matching constraint prefers matches with similar intensity and contrast: \n\n$$V_D = \\sum_{i}^{M} \\Big[ |d_i^R - d_k^L| + \\gamma \\sum_{j \\in N_i} |(d_j^R - d_i^R) - (d_m^L - d_k^L)| \\Big]$$ \n\nwith the image indices $k$ and $m$ given by the ordered pairs $k = (\\mathrm{row}(i), \\mathrm{col}(i) + u_{c_i i})$ and $m = (\\mathrm{row}(j), \\mathrm{col}(j) + u_{c_j j})$, where $M$ is the number of pixels in the image, $c_i$ is the layer classification for site $i$, and $u_{li}$ is the disparity at layer $l$ for site $i$. The $\\gamma$ weighs absolute intensity versus contrast matching. \n\nThe $\\lambda_D$ is higher for points that belong to unambiguous features such as straight vertical contours, so that ambiguous pixels rely more on their prior constraints. \n\nFigure 2: Cost function $V_S$ (cost versus depth difference). a) The smoothness cost is quadratic until the disparity difference is high and an edge process is activated. 
b) In our simulations we use a threshold below which the smoothness cost is scaled down and above which a different layer assignment is accepted at a constant high cost. \n\nAlso, if neighboring pixels have a higher disparity than the current pixel and are in a different layer, the current pixel's $\\lambda_D$ is lowered since its corresponding point in the left image is likely to be occluded. \n\nThe equation for the smoothness term is given by: \n\n$$V_S = \\sum_{i}^{M} \\sum_{l}^{L} \\sum_{j \\in N_i} V_l(u_{li}, u_{lj})\\, a_l$$ \n\nwhere $N_i$ are the neighbors of $i$, $V_l$ is the local smoothness potential, $a_l$ is the activity level for layer $l$ defined by the percent of pixels belonging to layer $l$, and $L$ is the number of layers in the system. The local smoothness potential is given by: \n\n$$V_l(a, b) = \\begin{cases} \\ldots & \\text{if } (a - b)^2 < T \\\\ \\ldots & \\text{otherwise} \\end{cases}$$ \n\nwhere $\\mu$ is the weighting term between depth smoothness and directional derivative smoothness. The $\\Delta_k$ is the difference operation in various directions $k$, and $T$ is the threshold. Instead of the commonly used quadratic smoothness function graphed in figure 2a, we use the function graphed in figure 2b, which resembles the Ising potential. This allows for some flexibility since $\\lambda_S$ is set rather high in our simulations. \n\nThe $V_G$ term ensures a gap in the values of corresponding pixels between layers: \n\n$$V_G = \\ldots$$ \n\nThis ensures that if a site $i$ belongs to layer $c_i$, then all points $j$ neighboring $i$ for each layer $l$ must have disparity values $u_{lj}$ different from $u_{c_i i}$. \n\nThe edge or boundary shape constraint $V_E$ incorporates two types of constraints: a cohesive measure and a saliency measure. The costs for various neighborhood configurations are given in figure 3. \n\nThe constraint $V_R$ ensures that if there is no edge in intensity then there should be no edge in the disparity. This is particularly important to avoid local minima for gray scale images since there is so much ambiguity in the matching. 
\n\nFigure 3: Cost function $V_E$. The costs associated with nearest-neighborhood layer label configurations. a) Fully cohesive region (lowest cost). b) Two opaque regions with straight line boundary. c) Two opaque regions with diagonal line boundary. d) Opaque regions with no figural continuity. e) Transparent region with dense samplings. f) Transparent region with no other neighbors (highest cost). Legend: filled symbol = same layer label; open symbol = different layer label. Costs shown in the panels: 0, 0.2, 0.25, 0.5, 0.7, 1. \n\nFigure 4: Stereogram of floating cylinder shown in crossed and uncrossed disparity. Only disparity values in the active layers are shown. A wire-frame rendering for layer 3, which captures the cylinder, is shown. Panel titles: layer labels; Layer 3; wire-frame plot of Layer 3. \n\nThe Gibbs Sampler [Geman and Geman, 1984] with simulated annealing is used to compute the disparity and layer assignments. After each iteration of the Gibbs Sampler, the missing values within each layer are filled in using the disparity at the available sites. A quadratic energy functional enforces smoothness of disparity and of disparity difference in various directions. A gradient descent approach minimizes this energy and the missing values are filled in. \n\n4 SIMULATION RESULTS \n\nAfter normalizing each of the local costs to lie between 0 and 1, the values for the weighting parameters used in decreasing order are: $\\lambda_S$, $\\lambda_R$, $\\lambda_D$, $\\lambda_E$, $\\lambda_G$, with the $\\lambda_D$ value moved to follow $\\lambda_G$ if a pixel is partially occluded. The results for a random-dot stereogram with a floating half-cylinder are shown in figure 4. Note that for clarity only the visible pixels within each layer are displayed, though the remaining pixels are filled in. A wire-frame rendering for layer 3 is also provided. 
\n\nFigure 5 is a random-dot stereogram with features from two transparent fronto-parallel surfaces. The output consists primarily of two labels corresponding to the foreground and the background. Note that when the stereogram is fused, the percept is of two overlaid surfaces with various small, noisy regions of incorrect matches. \n\nFigure 6 is a random-dot stereogram depicting many planar-parallel surfaces. Note that there are two disjoint regions which are classified into the same layer since they form a single surface. \n\nFigure 5: Random-dot stereogram of two overlaid surfaces. Layers 1 and 4 are the most activated layers. Only 5 of the layers are shown here. \n\nFigure 6: Random-dot stereogram of multiple flat surfaces. Layer 4 captures two regions since they belong to the same surface (equal disparity). \n\nA gray-scale stereogram depicting a floating square occluding the letter 'C' also floating above the background is shown in figure 7. A feature-based matching scheme is bound to fail here since locally one cannot correctly attribute the computed disparity at a matched corner of the rectangle, for example, to either the rectangle, the background, or to both regions. Our $V_R$ constraint forces the system to attempt various matches until points with no intensity discontinuity have no disparity discontinuity. Another important feature is that the two ends of the letter 'C' are in the same \"depth plane\" [Nakayama et al., 1989] and may later be merged to complete the letter. 
\n\nFigure 8 is a gray scale stereogram depicting 4 distant surfaces with planar disparity. At occluding boundaries, the region corresponding to the further surface in the right image has no corresponding region in the left image. A high $\\lambda_D$ would only force these points to find an incorrect match and add to the system's errors. The $\\lambda_D$ reduction factor for partially occluded points reduces the data matching requirement for such points. This is crucial for obtaining correct matches, especially since the images are sparsely textured and the dependence on accurate information from the textured regions is high. \n\nA transparency example of a fence in front of a billboard is given in figure 9. Note \n\n\f", "award": [], "sourceid": 709, "authors": [{"given_name": "Suthep", "family_name": "Madarasmi", "institution": null}, {"given_name": "Daniel", "family_name": "Kersten", "institution": null}, {"given_name": "Ting-Chuen", "family_name": "Pong", "institution": null}]}