{"title": "Against Edges: Function Approximation with Multiple Support Maps", "book": "Advances in Neural Information Processing Systems", "page_first": 388, "page_last": 395, "abstract": null, "full_text": "Against Edges: Function Approximation with \n\nMultiple Support Maps \n\nTrevor Darrell and Alex Pentland \n\nVision and Modeling Group, The Media Lab \n\nMassachusetts Institute of Technology \n\nE15-388, 20 Ames Street \nCambridge MA, 02139 \n\nAbstract \n\nNetworks for reconstructing a sparse or noisy function often use an edge \nfield to segment the function into homogeneous regions, This approach \nassumes that these regions do not overlap or have disjoint parts, which is \noften false. For example, images which contain regions split by an occlud(cid:173)\ning object can't be properly reconstructed using this type of network. We \nhave developed a network that overcomes these limitations, using support \nmaps to represent the segmentation of a signal. In our approach, the sup(cid:173)\nport of each region in the signal is explicitly represented. Results from \nan initial implementation demonstrate that this method can reconstruct \nimages and motion sequences which contain complicated occlusion. \n\n1 \n\nIntroduction \n\nThe task of efficiently approximating a function is central to the solution of many \nimportant problems in perception and cognition. Many vision algorithms, for in(cid:173)\nstance, integrate depth or other scene attributes into a dense map useful for robotic \ntasks such as grasping and collision avoidance. Similarly, learning and memory are \noften posed as a problem of generalizing from stored observations to predict future \nbehavior, and are solved by interpolating a surface through the observations in an \nappropriate abstract space. Many control and planning problems can also be solved \nby finding an optimal trajectory given certain control points and optimization con(cid:173)\nstraints. 
\n\nIn general, of course, finding solutions to these approximation problems is an ill-posed problem, and no exact answer can be found without the application of some prior knowledge or assumptions. Typically, one assumes the surface to be fit is either locally smooth or has some particular parametric form or basis-function description. Many successful systems have been built to solve such problems in the cases where these assumptions are valid. However, in a wide range of interesting cases where there is no single global model or universal smoothness constraint, such systems have difficulty. These cases typically involve the approximation or estimation of a heterogeneous function whose typical local structure is known, but which also includes an unknown number of abrupt changes or discontinuities in shape. \n\n2 Approximation of Heterogeneous Functions \n\nIn order to accurately approximate a heterogeneous function with a minimum number of parameters or interpolation units, it is necessary to divide the function into homogeneous chunks which can be approximated parsimoniously. When there is more than one homogeneous chunk in the signal/function, the data must be segmented so that observations of one object do not intermingle with and corrupt the approximation of another region. \n\nOne simple approach is to estimate an edge map to denote the boundaries of homogeneous regions in the function, and then to regularize the function within such boundaries. This method was formalized by Geman and Geman (1984), who developed the \"line-process\" to insert discontinuities in a regularization network. A regularized solution can be efficiently computed by a neural network, either using discrete computational elements or analog circuitry (Poggio et al. 1985; Terzopoulos 1988). 
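To make the line-process idea concrete, here is a minimal 1-D sketch of our own devising (not the cited networks' circuitry): interpolation nodes are pulled toward their neighbors unless the "switch" between them has opened. The quadratic energy, the hard threshold, and plain gradient descent are all simplifying assumptions for illustration.

```python
import numpy as np

def membrane_with_line_process(d, lam=1.0, theta=1.0, iters=2000, step=0.05):
    """1-D membrane regularization with a boolean line process (sketch).

    Energy: sum_i (u_i - d_i)^2 + lam * sum_i l_i * (u_{i+1} - u_i)^2,
    where the switch l_i opens (l_i = 0) whenever the squared difference
    between neighboring interpolated values exceeds the threshold theta.
    """
    u = d.astype(float).copy()
    for _ in range(iters):
        diffs = np.diff(u)                       # u_{i+1} - u_i
        l = (diffs ** 2 < theta).astype(float)   # line process: 1 = switch closed
        grad = 2.0 * (u - d)                     # data-fidelity term
        grad[1:] += 2.0 * lam * l * diffs        # smoothness pull from left neighbor
        grad[:-1] -= 2.0 * lam * l * diffs       # smoothness pull from right neighbor
        u -= step * grad
    return u
```

Within each homogeneous run the values are averaged toward their neighbors, while the open switch at a large jump prevents any smoothing across the suspected discontinuity.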
In this context, the line-process can be thought of as an array of switches placed between interpolation nodes (Figure 1a). As the regularization proceeds in this type of network, the switches of the line process open and prevent smoothing across suspected discontinuities. Essentially, these switches are opened when the squared difference between neighboring interpolated values exceeds some threshold (Blake and Zisserman 1987; Geiger and Girosi 1991). In practice a continuation method is used to avoid problems with local minima, and a continuous non-linearity is used in place of a boolean discontinuity. The term \"resistive fuse\" is often used to describe these connections between interpolation sites (Harris et al. 1990). \n\n3 Limitations of Edge-based Segmentation \n\nAn edge-based representation assumes that homogeneous chunks of a function are completely connected and have no disjoint subregions. For the visual reconstruction task, this implies that the projection of an object onto the image plane will always yield a single connected region. While this may be a reasonable assumption for certain classes of synthetic images, it is not valid for realistic natural images which contain occlusion and/or transparent phenomena. \n\nWhile a human observer can integrate over gaps in a region split by occlusion, the line process will prevent any such smoothing, no matter how close the subregions are in the image plane. When these disjoint regions are small (as when viewing an object through branches or leaves), the interpolated values provided by such a \n\nFigure 1: (a) Regularization network with line-process. Shaded circles represent data nodes, while open circles represent interpolation nodes. Solid rectangles indicate resistors; slashed rectangles indicate \"resistive fuses\". 
(b) Regularization network with explicit support maps; the support process can be implemented by placing resistive fuses between data and interpolation nodes (other constraints on support are described in the text). \n\nnetwork will not be reliable, since observation noise cannot be averaged over a large number of samples. \n\nSimilarly, an edge-based approach cannot account for the perception of motion transparency, since these stimuli have no coherent local neighborhoods. Human observers can easily interpolate 3-D surfaces in transparent random-dot motion displays (Husain et al. 1989). In this type of display, points last only a few frames, and points from different surfaces are transparently intermingled. With a line-process, no smoothing or integration would be possible, since neighboring points in the image belong to different 3-D surfaces. To represent and process images containing this kind of transparent phenomena, we need a framework that does not rely on a global 2D edge map to make segmentation decisions. By generalizing the regularization/surface interpolation paradigm to use support maps rather than a line-process, we can overcome the limitations the discontinuity approach has with respect to transparency. \n\n4 Using Support Maps for Segmentation \n\nOur approach decomposes a heterogeneous function into a set of individual approximations corresponding to the homogeneous regions of the function. Each approximation covers a specific region, and uses a support map to indicate which points belong to that region. Unlike an edge-based representation, the support of an approximation need not be a connected region - in fact, the support can consist of a scattered collection of independent points! \n\nFor a single approximation, it is relatively straightforward to compute a support map. 
Given an approximation, we can find the support it has in the function by thresholding the residual error of that approximation. In terms of analog regularization, the support map (or support \"process\") can be implemented by placing a resistive fuse between the data and the interpolating units (Figure 1b). \n\nA single support map is limited in usefulness, since only one region can be approximated. In fact, it reduces to the \"outlier\" rejection paradigm of certain robust estimation methods, which are known to have severe theoretical limits on the amount of outlier contamination they can handle (Meer et al. 1991; Li 1985). To represent true heterogeneous stimuli, multiple support maps are needed, with one support map corresponding to each homogeneous (but not necessarily connected) region. \n\nWe have developed a method to estimate a set of these support maps, based on finding a minimal-length description of the function. We adopt a three-step approach: first, we generate a set of candidate support maps using simple thresholding techniques. Second, we find the subset of these maps which minimally describes the function, using a network optimization to find the smallest set of maps that covers all the observations. Finally, we re-allocate the support in this subset, such that only the approximation with the lowest residual error supports a particular point. \n\n4.1 Estimating Initial Support Fields \n\nIdeally, we would like to consider all possible support patterns of a given dimension as candidate support maps. Unfortunately, the combinatorics of the problem makes this impossible; instead, we attempt to find a manageable number of initial maps which will serve as a useful starting point. \n\nA set of candidate approximations can be obtained in many ways. 
In our work we have initialized their surfaces either using a table of typical values or by fitting a small fixed region of the function. We denote each approximation of a homogeneous region as a tuple (ai, si, ui, ri), where si = {sij} is a support map, ui = {uij} is the approximated surface, and ri = {rij} is the residual error computed by taking the difference of ui with the observed data. (The scalar ai is used in deciding which subset of approximations is used in the final representation.) The support fields are set by thresholding the residual field based on our expected (or assumed) observation variance theta: \n\nsij = 1 if (rij)^2 < theta, and sij = 0 otherwise. \n\n4.2 Estimating the Number of Regions \n\nPerhaps the most critical problem in recovering a good heterogeneous description is estimating how many regions are in the function. Our approach to this problem is based on finding a small set of approximations which constitutes a parsimonious description of the function. We attempt to find a subset of the candidate approximations whose support maps are a minimal covering of the function, i.e. the smallest subset whose combined support covers the entire function. In non-degenerate cases this will consist of one approximation for each real region in the function. \n\nThe quantity ai indicates whether approximation i is included in the final representation. A positive value indicates it is \"active\" in the representation; a negative value indicates it is excluded from the representation. Initially ai is set to zero for each approximation; to find a minimal covering, this quantity is dynamically updated as a function of the number of points uniquely supported by a particular support map. \n\nA point is uniquely supported in a support map if it is supported by that map and no other. 
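As a concrete illustration of the thresholding rule above (our own sketch, not the authors' code; the function name and array layout are assumptions):

```python
import numpy as np

def initial_support(data, surfaces, theta):
    """Set s_ij = 1 where the squared residual of approximation i at
    point j falls below the assumed observation variance theta."""
    # surfaces: (n_candidates, n_points); data: (n_points,)
    residuals = surfaces - data                 # r_ij = u_ij - (observed data)_j
    return (residuals ** 2 < theta).astype(int)
```

Each candidate thus claims exactly those observations it already fits well, before any competition between candidates is considered.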
Essentially, we find these points by modulating the support values of a particular approximation with shunting inhibition from all other active approximations. To compute cij, a flag that indicates whether or not point j of map i is uniquely supported, we multiply each support map with the product of the inverses of all other maps whose ak value indicates they are active: \n\ncij = sij * prod_{k != i} (1 - skj * sigma(ak)) \n\nwhere sigma() is a sigmoid function which converts the real-valued ak into a multiplicative factor in the range (0, 1). The quantity cij is close to one at uniquely supported points, and close to zero at all other points. \n\nIf there are a sufficient number of uniquely supported points in an approximation, we increase ai; otherwise it is decreased: \n\nd/dt ai = sum_j cij - alpha (1) \n\nwhere alpha specifies the penalty for adding another approximation region to the representation. This constant determines the smallest number of points we are willing to have constitute a distinct region in the function. The network defined by these equations has a corresponding Lyapunov function: \n\nE = sum_{i=1}^{N} ai * ( - sum_{j=1}^{M} sigma(sij) * prod_{k != i} (1 - sigma(skj) * sigma(ak)) + alpha ) \n\nso it is guaranteed to converge to a local minimum if we bound the values of ai (for fixed sij and alpha). After convergence, those approximations with positive ai are kept, and the rest are discarded. Empirically we have found that the local minima found by our network correspond to perceptually salient segmentations. \n\n4.3 Refining Support Fields \n\nOnce we have a set of approximations whose support maps minimally cover the function (and presumably correspond to the actual regions of the function), we can refine the support using a more powerful criterion than a local threshold. First, we interpolate the residual error values through unsampled points, so that support can be computed even where there are no observations. 
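Before turning to the refinement step, the covering dynamics of Eq. (1) can be sketched as a discrete-time iteration. This is our own illustrative code: the Euler step, the logistic sigmoid, the clipping bound, and all names (`minimal_covering`, `dt`, `bound`) are assumptions, not the authors' implementation.

```python
import numpy as np

def minimal_covering(support, alpha=10.0, steps=500, dt=0.05, bound=5.0):
    """Evolve one activation a_i per candidate map via Eq. (1); maps that
    uniquely support enough points (more than alpha) become active."""
    n_maps, _ = support.shape
    a = np.zeros(n_maps)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(steps):
        sig_a = sigmoid(a)
        # c_ij = s_ij * prod_{k != i} (1 - s_kj * sigma(a_k))
        c = np.empty(support.shape, dtype=float)
        for i in range(n_maps):
            others = np.delete(np.arange(n_maps), i)
            c[i] = support[i] * np.prod(1.0 - support[others] * sig_a[others, None], axis=0)
        a += dt * (c.sum(axis=1) - alpha)   # Euler step on Eq. (1)
        a = np.clip(a, -bound, bound)       # bound a_i, as the Lyapunov argument requires
    return a > 0                            # positive a_i are kept
```

A candidate that merely duplicates the support of a stronger candidate has almost no uniquely supported points, so its activation decays and it is discarded.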
Then we update the support maps based on which approximation has the lowest residual error for a given point: \n\nsij = 1 if (rij)^2 < theta and (rij)^2 = min_{k : ak > 0} (rkj)^2, and sij = 0 otherwise. \n\nFigure 2: (a) Function consisting of constant regions with added noise. (b) Same function sparsely sampled. (c) Support maps found to approximate the uniformly sampled function. (d) Support maps found for the sparsely sampled function. \n\n5 Results \n\nWe tested how well our network could reconstruct functions consisting of piecewise constant patches corrupted with random noise of known variance. Figure 2(a) shows the image containing the function used in this experiment. We initialized 256 candidate approximations, each with a different constant surface. Since the image consisted of piecewise constant regions, the interpolation performed by each approximation was to compute a weighted average of the data over the supported points. Other experiments have used more powerful shape models, such as thin-plate or membrane Markov random fields, as well as piecewise-quadratic polynomials (Darrell et al. 1990). \n\nUsing a penalty term which prevented approximations with 10 or fewer support points from being considered (alpha = 10.0), the network found 5 approximations which covered the entire image; their support maps are shown in Figure 2(c). The estimated surfaces corresponded closely to the values in the constant patches before noise was added. We ran the same experiment on a sparsely sampled version of this function, as shown in Figure 2(b) and (d), with similar results and only slightly reduced accuracy in the recovered shape of the support maps. \n\nFigure 3: (a) First frame from image sequence and (b) recovered regions. 
(c) First frame from the random dot sequence described in the text. (d) Recovered parameter values across frames for dots undergoing looming motion; the solid line plots Tz, the dotted line plots Tx, and circles plot Ty for each frame. \n\nWe have also applied our framework to the problem of motion segmentation. For homogeneous data, a simple \"direct\" method can be used to model image motion (Horn and Weldon 1988). Under this assumption, the image intensities for a region centered at the origin undergoing a translation (Tx, Ty, Tz) satisfy at each point \n\n0 = dI/dt + Tx dI/dx + Ty dI/dy + Tz (x dI/dx + y dI/dy) \n\nwhere I is the image function. Each approximation computes a motion estimate by selecting a T vector which minimizes the square of the right-hand side of this equation over its support map, using a weighted least-squares algorithm. The residual error at each point is then simply this constraint equation evaluated with the particular translation estimate. \n\nFigure 3(a) shows the first frame of one sequence, containing a person moving behind a stationary plant. Our network began with 64 candidate approximations, with the initial motion parameters in each distributed uniformly along the parameter axes. Figure 3(b) shows the segmentation provided by our method. Two regions were found to be needed, one for the person and one for the plant. Most of the person has been correctly grouped together despite the occlusion caused by the plant's leaves. Points that have no spatial or temporal variation in the image sequence are not attributed to any approximation, since they are invisible to our motion model. Note that there is a cast shadow moving in synchrony with the person in the scene, and it is thus grouped with that approximation. 
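The weighted least-squares step described above can be sketched as follows. This is our own illustration under stated assumptions: the spatial and temporal derivatives are taken as given per-point arrays, the support values serve as weights, and the use of NumPy's `lstsq` (and the function name) is our choice, not the authors' algorithm.

```python
import numpy as np

def direct_motion_estimate(Ix, Iy, It, x, y, w):
    """Weighted least-squares fit of a translation (Tx, Ty, Tz) to the
    direct-method constraint 0 = It + Tx*Ix + Ty*Iy + Tz*(x*Ix + y*Iy),
    weighting each point by its support value w."""
    A = np.stack([Ix, Iy, x * Ix + y * Iy], axis=1)  # one constraint row per point
    sw = np.sqrt(w)
    T, *_ = np.linalg.lstsq(A * sw[:, None], -It * sw, rcond=None)
    residual = It + A @ T          # constraint evaluated at the estimate
    return T, residual
```

The returned residual field is exactly the per-point quantity that drives the support thresholding and re-allocation steps of Section 4.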
\n\nFinally, we ran our system on the finite-lifetime, transparent random dot stimulus described in Section 3. Since our approach recovers a global motion estimate for each region in each frame, we do not need to build explicit pixel-to-pixel correspondences over long sequences. We used two populations of random dots, one undergoing a looming motion and one a rightward shift. After each frame 10% of the dots died off and randomly moved to a new point on the 3-D surface. Ten 128x128 frames were rendered using perspective projection; the first is shown in Figure 3(c). \n\nWe applied our method independently to each trio of successive frames, and in each case two approximations were found to account for the motion information in the scene. Figure 3(d) shows the parameters recovered for the looming motion. Similar results were found for the translating motion, except that the Tx parameter was nonzero rather than Tz. Since the recovered estimates were consistent, we would be able to decrease the overall uncertainty by averaging the parameter values over successive frames. \n\nReferences \n\nGeman, S., and Geman, D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6:721-741. \n\nPoggio, T., Torre, V., and Koch, C. (1985) Computational vision and regularization theory. Nature 317(26). \n\nTerzopoulos, D. (1988) The computation of visible-surface representations. IEEE Trans. Pattern Anal. Machine Intell. 10:4. \n\nGeiger, D., and Girosi, F. (1991) Parallel and deterministic algorithms from MRFs: surface reconstruction. IEEE Trans. Pattern Anal. Machine Intell. 13:401-412. \n\nBlake, A., and Zisserman, A. (1987) Visual Reconstruction. MIT Press, Cambridge, MA. \n\nHarris, J., Koch, C., Staats, E., and Luo, J. (1990) Analog hardware for detecting discontinuities in early vision. Intl. J. 
Computer Vision 4:211-233. \n\nHusain, M., Treue, S., and Andersen, R. A. (1989) Surface interpolation in three-dimensional structure-from-motion perception. Neural Computation 1:324-333. \n\nMeer, P., Mintz, D., and Rosenfeld, A. (1991) Robust regression methods for computer vision: A review. Intl. J. Computer Vision 6:60-70. \n\nLi, G. (1985) Robust regression. In D. C. Hoaglin, F. Mosteller, and J. W. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. John Wiley & Sons, New York. \n\nDarrell, T., Sclaroff, S., and Pentland, A. P. (1990) Segmentation by minimal description. Proc. 3rd Intl. Conf. Computer Vision, Osaka, Japan. \n\nHorn, B.K.P., and Weldon, E.J. (1988) Direct methods for recovering motion. Intl. J. Computer Vision 2:51-76. \n", "award": [], "sourceid": 462, "authors": [{"given_name": "Trevor", "family_name": "Darrell", "institution": null}, {"given_name": "Alex", "family_name": "Pentland", "institution": null}]}