{"title": "Grouping with Bias", "book": "Advances in Neural Information Processing Systems", "page_first": 1327, "page_last": 1334, "abstract": "", "full_text": "Grouping with Bias\n\nStella X. Yu\n\nRobotics Institute\n\nCarnegie Mellon University\n\nCenter for the Neural Basis of Cognition\n\nPittsburgh, PA 15213-3890\n\nstella.yu@cs.cmu.edu\n\nJianbo Shi\n\nRobotics Institute\n\nCarnegie Mellon University\n\n5000 Forbes Ave\n\nPittsburgh, PA 15213-3890\n\njshi@cs.cmu.edu\n\nAbstract\n\nWith the optimization of pattern discrimination as a goal, graph\npartitioning approaches often lack the capability to integrate prior\nknowledge to guide grouping.\nIn this paper, we consider priors\nfrom unitary generative models, partially labeled data and spatial\nattention. These priors are modelled as constraints in the solution\nspace. By imposing a uniformity condition on the constraints, we\nrestrict the feasible space to one of smooth solutions. A subspace\nprojection method is developed to solve this constrained eigenproblem.\nWe demonstrate that simple priors can greatly improve image\nsegmentation results.\n\n1 Introduction\n\nGrouping is often thought of as the process of finding intrinsic clusters or group\nstructures within a data set.\nIn image segmentation, it means finding objects or\nobject segments by clustering pixels and segregating them from background.\nIt\nis often considered a bottom-up process. Although never explicitly stated, higher-level\nknowledge, such as familiar object shapes, is to be used only in a separate\npost-processing step.\nThe need for the integration of prior knowledge arises in a number of applications. 
In\ncomputer vision, we would like image segmentation to correspond directly to object\nsegmentation.\nIn data clustering, if users provide a few examples of clusters, we\nwould like a system to adapt the grouping process to achieve the desired properties.\nIn this case, there is an intimate connection to learning classification with partially\nlabeled data.\nWe show in this paper that it is possible to integrate both bottom-up and top-down\ninformation in a single grouping process. In the proposed method, the bottom-up\ngrouping process is modelled as a graph partitioning [1, 4, 12, 11, 14, 15] problem,\nand the top-down knowledge is encoded as constraints on the solution space.\nThough we consider normalized cuts criteria in particular, similar derivations can be\ndeveloped for other graph partitioning criteria as well. We show that this leads to a\nconstrained eigenvalue problem, where the global optimal solution can be obtained\nby eigendecomposition. Our model is expanded in detail in Section 2. Results and\nconclusions are given in Section 3.\n\n2 Model\n\nIn graph theoretic methods for grouping, a relational graph $G = (V, E, W)$ is first\nconstructed based on pairwise similarity between two elements. Associated with\nthe graph edge between vertex i and j is weight $W_{ij}$, characterizing their likelihood\nof belonging to the same group.\nFor image segmentation, pixels are taken as graph nodes, and pairwise pixel similarity\ncan be evaluated based on a number of low level grouping cues. Fig. 1c shows\none possible definition, where the weight between two pixels is inversely proportional\nto the magnitude of the strongest intervening edge [9].\n\n[Figure 1 panels: a) Image. d) NCuts. e) Segmentation.]\n\nFigure 1: Segmentation by graph partitioning.\na) 200 x 129 image with a few pixels\nmarked (+). b) Edge map extracted using quadrature filters. c) Local affinity fields of\nmarked pixels superimposed together. 
For every marked pixel, we compute its affinity\nwith its neighbours within a radius of 10. The value is determined by a Gaussian function\nof the maximum magnitude of edges crossing the straight line connecting the two pixels\n[9]. When there is a strong edge separating the two, the affinity is low. Darker intensities\nmean larger values. d) Solution by graph partitioning. It is the second eigenvector from\nnormalized cuts [15] on the affinity matrix.\nIt assigns a value to each pixel. Pixels of\nsimilar values belong to the same group. e) Segmentation by thresholding the eigenvector\nat 0. This gives a bipartitioning of the image which corresponds to the best cuts that\nhave maximum within-region coherence and between-region distinction.\n\nAfter an image is transcribed into a graph, image segmentation becomes a vertex\npartitioning problem. Consider segmenting an image into foreground and background.\nThis corresponds to vertex bipartitioning $(V_1, V_2)$ on graph $G$, where\n$V = V_1 \\cup V_2$ and $V_1 \\cap V_2 = \\emptyset$. A good segmentation seeks a partitioning such\nthat nodes within partitions are tightly connected and nodes across partitions are\nloosely connected. A number of criteria have been proposed to achieve this goal.\nFor normalized cuts [15], the solution is given by some eigenvector of weight matrix\n$W$ (Fig. 1d). Thresholding on it leads to a discrete segmentation (Fig. 1e). While\nwe will focus on normalized cuts criteria [15], most of the following discussions apply\nto other criteria as well.\n\n2.1 Biased grouping as constrained optimization\n\nKnowledge other than the image itself can greatly change the segmentation we might\nobtain based on such low level cues. Rather than seeing boundaries between black\nand white regions, we see objects. The sources of priors we consider in this paper\nare: unitary generative models (Fig. 2a), which could arise from sensor models\nin MRF [5], partial grouping (Fig. 
2b), which could arise from human computer\ninteraction [8], and spatial attention (Fig. 2c). All of these provide additional, often\nlong-range, binding information for grouping.\nWe model such prior knowledge in the form of constraints on a valid grouping\nconfiguration. In particular, we see that all such prior knowledge defines a partial\n\n[Figure 2 panels: a) Bright foreground. b) Partial grouping. c) Spatial attention.]\n\nFigure 2: Examples of priors considered in this paper. a) Local constraints from unitary\ngenerative models. In this case, pixels of light (dark) intensities are likely to be the foreground\n(background). This prior knowledge is helpful not only for identifying the tiger as\nthe foreground, but also for perceiving the river as one piece. How can we incorporate\nthese unitary constraints into a graph that handles only pairwise relationships between\npixels? b) Global configuration constraints from partial grouping a priori.\nIn this case,\nwe have manually selected two sets of pixels to be grouped together in foreground (+)\nand background (\u0394) respectively. They are distributed across the image and often have\ndistinct local features. How can we force them to be in the same group and further bring\nsimilar pixels along and push dissimilar pixels apart? c) Global constraints from spatial\nattention. We move our eyes to the place of most interest and then devote our limited\nvisual processing to it. The complicated scene structures in the periphery can thus be\nignored while sparing the parts associated with the object at fovea. How can we use this\ninformation to facilitate figural popout in segmentation?\n\ngrouping solution, indicating which set of pixels should belong to one partition.\nLet $H_l$, $l = 1, \\ldots, n$, denote a partial grouping. $H_t$ has pixels known to be in\n$V_t$, $t = 1, 2$. 
These sets are derived as follows.\nUnitary generative models: $H_1$ and $H_2$ contain sets of pixels that satisfy the\nunitary generative models for foreground and background respectively. For example,\nin Fig. 2a, $H_1$ ($H_2$) contains pixels of brightest (darkest) intensities.\nPartial grouping: Each $H_l$, $l = 1, \\ldots, n$, contains a set of pixels that users specify to\nbelong together. The relationships between $H_l$, $l > 2$, and $V_t$, $t = 1, 2$, are indefinite.\nSpatial attention: $H_1 = \\emptyset$ and $H_2$ contains pixels randomly selected outside the\nvisual fovea, since we want to maintain maximum discrimination at the fovea while\nmerging pixels far away from the fovea into one group.\nTo formulate these constraints induced on the graph partitioning, we introduce\nbinary group indicators $X = [X_1, X_2]$. Let $N = |V|$ be the number of nodes in the\ngraph. For $t = 1, 2$, $X_t$ is an $N \\times 1$ vector where $X_t(k) = 1$ if vertex $k \\in V_t$ and 0\notherwise. The constrained grouping problem can be formally written as:\n\n$$\\min \\quad \\epsilon(X_1, X_2)$$\n$$\\text{s.t.} \\quad X_t(i) = X_t(j), \\quad i, j \\in H_l, \\quad l = 1, \\ldots, n, \\quad t = 1, 2,$$\n$$X_t(i) \\neq X_t(j), \\quad i \\in H_1, \\quad j \\in H_2, \\quad t = 1, 2,$$\n\nwhere $\\epsilon(X_1, X_2)$ is some graph partitioning cost function, such as minimum cuts\n[6], average cuts [7], or normalized cuts [15]. The first set of constraints can be\nre-written in matrix form: $U^T X = 0$, where, e.g., for some column $k$, $U_{ik} = 1$,\n$U_{jk} = -1$. 
We search for the optimal solution only in the feasible set determined\nby all the constraints.\n\n2.2 Conditions on grouping constraints\n\nThe above formulation can be implemented by the maximum-flow algorithm for\nminimum cuts criteria [6, 13, 3], where two special nodes called source and sink are\nintroduced, with infinite weights set up between nodes in $H_1$ ($H_2$) and source (sink).\nIn the context of learning from labeled and unlabeled data, the biased mincuts\nare linked to minimizing leave-one-out cross validation [2].\nIn the normalized cuts\nformulation, this leads to a constrained eigenvalue problem, as will be seen shortly.\nHowever, simply forcing a few nodes to be in the same group can produce some\nundesirable graph partitioning results, illustrated in Fig. 3. Without bias, the data\npoints are naturally first organized into top and bottom groups, and then subdivided\ninto left and right halves (Fig. 3a). When we assign points from top and bottom\nclusters to be together, we do not just want one of the groups to lose its labeled\npoint to the other group (Fig. 
3b), but rather we desire the biased grouping process\nto explore their neighbouring connections and change the organization to left and\nright division accordingly.\n\n[Figure 3 panels, with cuts labelled Larger Cut, Min Cut, Desired Cut and Perturbed Min Cut: a) No bias. b) With bias.]\n\nFigure 3: Undesired grouping caused by simple grouping constraints. a) Data points\nare distributed in four groups, with a larger spatial gap between top and bottom\ngroups than that between left and right groups. Defining weights based on proximity,\nwe find the top-bottom grouping as the optimal bisection. b) Introduce two\npairs of filled nodes to be together. Each pair has one point from the top and the\nother from the bottom group. The desired partitioning should now be the left-right\ndivision. 
However, perturbation on the unconstrained optimal cut can lead to a\npartitioning that satisfies the constraints while producing the smallest cut cost.\n\nThe desire to propagate partial grouping information on the constrained nodes\nis, however, not reflected in the constrained partitioning criterion itself. Often,\na slightly perturbed version of the optimal unbiased cut becomes the legitimate\noptimum. One reason for such a solution being undesirable is that some of the\n\"perturbed\" nodes are isolated from their close neighbours.\nTo fix this problem, we introduce the notion of uniformity of a graph partitioning.\nIntuitively, if two labeled nodes, i and j, have similar connections to their neighbours,\nwe desire a cut to treat them fairly so that if i gets grouped with i's friends,\nj also gets grouped with j's friends (Fig. 3b). This uniformity condition is one way\nto propagate prior grouping information from labeled nodes to their neighbours.\nFor normalized cuts criteria, we define the normalized cuts of a single node to be\n\n$$NCuts(i; X) = \\frac{\\sum_{X_t(k) \\neq X_t(i), \\forall t} W_{ik}}{D_{ii}}.$$\n\nThis value is high for a node isolated from its close neighbours in partitioning $X$.\nWe may not know in advance what this value is for the optimal partitioning, but\nwe desire this value to be the same for any pair of nodes preassigned together:\n\n$$NCuts(i; X) = NCuts(j; X), \\quad \\forall i, j \\in H_l, \\quad l = 1, \\ldots, n.$$\n\nWhile this condition does not force $NCuts(i; X)$ to be small for each labeled node,\nit is unlikely for all of them to have a large value while producing the minimum\nNCuts for the global bisection. Similar measures can be defined for other criteria.\nIn Fig.\n4, we show that the uniformity condition on the bias helps preserve\nthe smoothness of solutions at every labeled point. 
Such smoothing is necessary\nespecially when partially labeled data are scarce.\n\n[Figure 4 panels: a) Point set data. b) Simple bias. c) Conditioned bias. d) NCuts w/o bias. e) NCuts w/ bias b). f) NCuts w/ bias c).]\n\nFigure 4: Condition constraints with uniformity. a) Data consist of three strips, with\n100 points each, numbered from left to right. Two points from the side strips are randomly\nchosen to be pre-assigned together. b) Simple constraint $U^T X = 0$ forces any\nfeasible solution to have equal valuation on the two marked points. c) Conditioned\nconstraint $U^T P X = 0$. Note that now we cannot tell which points are biased. We\ncompute $W$ using a Gaussian function of distance with $\\sigma = 3$. d) Segmentation without\nbias gives three separate groups. e) Segmentation with simple bias not only fails\nto glue the two side strips into one, but also has two marked points isolated from\ntheir neighbours. f) Segmentation with conditioned bias brings two side strips into\none group. See the definition of $P$ below.\n\n2.3 Computation: subspace projection\n\nTo develop a computational solution for the constrained optimization problem, we\nintroduce some notation. Let the degree matrix $D$ be a diagonal matrix, $D_{ii} = \\sum_k W_{ik}$, $\\forall i$.\nLet $P = D^{-1} W$ be the normalized weight matrix. It is a transition\nprobability matrix for nonnegative weight matrix $W$ [10]. Let $\\alpha = \\frac{X_1^T D 1}{1^T D 1}$ be\nthe degree ratio of $V_1$, where 1 is the vector of ones. We define a new variable\n$x = (1 - \\alpha) X_1 - \\alpha X_2$. We can show that for normalized cuts, the biased grouping\nwith the uniformity condition is translated into:\n\n$$\\min \\quad \\epsilon(x) = \\frac{x^T (D - W) x}{x^T D x}, \\quad \\text{s.t.} \\quad U^T P x = 0.$$\n\nNote, we have dropped the constraint $X_t(i) \\neq X_t(j)$, $i \\in H_1$, $j \\in H_2$, $t = 1, 2$.\nUsing Lagrange multipliers, we find that the optimal solution $x^*$ satisfies:\n\n$$Q P x^* = \\lambda x^*, \\quad \\epsilon(x^*) = 1 - \\lambda,$$\n\nwhere $Q$ is a projector onto the feasible solution space:\n\n$$Q = I - D^{-1} V (V^T D^{-1} V)^{-1} V^T, \\quad V = P^T U.$$\n\nHere we assume that the conditioned constraint $V$ is of full rank, thus $V^T D^{-1} V$ is\ninvertible. Since 1 is still the trivial solution corresponding to the largest eigenvalue\nof 1, the second leading right eigenvector of the matrix $Q P$ is the solution we seek.\nTo summarize, given weight matrix $W$, partial grouping in matrix form $U^T x = 0$,\nwe do the following to find the optimal bipartitioning:\nStep 1: Compute degree matrix $D$, $D_{ii} = \\sum_j W_{ij}$, $\\forall i$.\nStep 2: Compute normalized matrix $P = D^{-1} W$.\nStep 3: Compute conditioned constraint $V = P^T U$.\nStep 4: Compute projected weight matrix $\\bar{W} = Q P = P - D^{-1} V (V^T D^{-1} V)^{-1} V^T P$.\nStep 5: Compute the eigenvector $x^*$ with the second largest eigenvalue: $\\bar{W} x^* = \\lambda x^*$.\nStep 6: Threshold $x^*$ to get a discrete segmentation.\n\n3 Results and conclusions\n\nWe apply our method to the images in Fig. 2. For all the examples, we compute\npixel affinity $W$ as in Fig. 1. All the segmentation results are obtained by thresholding\nthe eigenvectors using their mean values. The results without bias, with\nsimple bias $U^T x = 0$ and conditioned bias $U^T P x = 0$ are compared in Fig. 5, 6, 7.\n\n[Figure 5 panels: b) Prior. c) NCuts on W. d) Seg. w/o bias. e) Simple bias. f) Seg. on e). g) Conditioned bias. h) Seg. w/ bias.]\n\nFigure 5: Segmentation with bias from unitary generative models. a) Edge map of the\n100 x 150 image. N = 15000. b) We randomly sample 17 brightest pixels for $H_1$ (+), 48\ndarkest pixels for $H_2$ (\u0394), producing 63 constraints in total. c) and d) show the solution\nwithout bias.\nIt picks up the optimal bisection based on intensity distribution. e) and\nf) show the solution with simple bias. 
The labeled nodes have an uneven influence on\ngrouping. g) and h) show the solution with conditioned bias. It successfully breaks the\nimage into tiger and river, matching our general impression of this image. The computation for\nthe three cases takes 11, 9 and 91 ms respectively.\n\nPrior knowledge is particularly useful in supplying long-range grouping information\nwhich is often lacking in data grouping based on low level cues. With our model, the\npartial grouping prior can be integrated into the bottom-up grouping framework by\nseeking the optimal solution in a restricted domain. We show that the uniformity\nconstraint is effective in eliminating spurious solutions resulting from simple perturbation\non the optimal unbiased solutions. Segmentation from the discretization\nof the continuous eigenvectors also becomes trivial.\n\n[Figure 6 panels: e) Simple bias. f) Seg. on e). g) Conditioned bias. h) Seg. w/ bias.]\n\nFigure 6: Segmentation with bias from hand-labeled partial grouping. a) Edge map of\nthe 80 x 82 image. N = 6560. b) Hand-labeled partial grouping includes 21 pixels for $H_1$\n(+), 31 pixels for $H_2$ (\u0394), producing 50 constraints in total. c) and d) show the solution\nwithout bias. It favors a few largest nearby pieces of similar intensity. e) and f) show the\nsolution with simple bias. Labeled pixels in cluttered contexts are poor at binding their\nsegments together. g) and h) show the solution with conditioned bias. It successfully pops\nout the pumpkin made of many small intensity patches. The computation for the three\ncases takes 5, 5 and 71 ms respectively.\n\n[Figure 7 panels: f) 4th eig. b). g) 6th eig. b). h) 4th eig. d). i) 6th eig. d). j) 8th eig. d).]\n\nFigure 7: Segmentation with bias from spatial attention. N = 25800. a) We randomly\nchoose 86 pixels far away from the fovea (Fig. 2c) for $H_2$ (\u0394), producing 85 constraints.\nb) and c) show the solution with simple bias.\nIt is similar to the solution without bias\n(Fig. 1). 
d) and e) show the solution with conditioned bias.\nIt ignores the variation\nin the background scene, which includes not only large pieces of constant intensity, but\nalso many small segments of various intensities. The foreground successfully clips out the\nhuman figure.\nf) and g) are two subsequent eigenvectors with simple bias. h), i) and\nj) are those with conditioned bias. There is still a lot of structural organization in the\nformer, but almost none in the latter. This greatly simplifies the task we face when seeking\na segmentation from the continuous eigensolution. The computation for the three cases\ntakes 16, 25 and 220 ms respectively.\n\nAll these benefits come at a computational cost that is roughly 10 times that for the original\nunbiased grouping problem. We note that we can also impose both $U^T x = 0$ and\n$U^T P x = 0$, or even $U^T P^s x = 0$, $s > 1$. Little improvement is observed in our\nexamples. Since the projected weight matrix $\\bar{W} = Q P$ becomes denser, the computation slows\ndown. We hope that this problem can be alleviated by using multi-scale techniques.\nIt remains open for future research.\n\nAcknowledgements\n\nThis research is supported by (DARPA HumanID) ONR N00014-00-1-091~ and\nNSF IRI-9817496.\n\nReferences\n\n[1] A. Amir and M. Lindenbaum. Quantitative analysis of grouping process. In\nEuropean Conference on Computer Vision, pages 371-84, 1996.\n\n[2] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph\nmincuts, 2001.\n\n[3] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization\nvia graph cuts. In International Conference on Computer Vision, 1999.\n\n[4] Y. Gdalyahu, D. Weinshall, and M. Werman. A randomized algorithm for\npairwise clustering. In Neural Information Processing Systems, pages 424-30,\n1998.\n\n[5] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the\nBayesian restoration of images. 
IEEE Transactions on Pattern Analysis and\nMachine Intelligence, 6(6):721-41, 1984.\n\n[6] H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In IEEE\nConference on Computer Vision and Pattern Recognition, 1998.\n\n[7] I. H. Jermyn and H. Ishikawa. Globally optimal regions and boundaries. In\nInternational Conference on Computer Vision, 1999.\n\n[8] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models.\nInternational Journal of Computer Vision, pages 321-331, 1988.\n\n[9] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for\nimage segmentation. International Journal of Computer Vision, 2001.\n\n[10] M. Meila and J. Shi. Learning segmentation with random walk. In Neural\nInformation Processing Systems, 2001.\n\n[11] P. Perona and W. Freeman. A factorization approach to grouping. In European\nConference on Computer Vision, pages 655-70, 1998.\n\n[12] J. Puzicha, T. Hofmann, and J. Buhmann. Unsupervised texture segmentation\nin a deterministic annealing framework. IEEE Transactions on Pattern\nAnalysis and Machine Intelligence, 20(8):803-18, 1998.\n\n[13] S. Roy and I. J. Cox. A maximum-flow formulation of the n-camera stereo\ncorrespondence problem. In International Conference on Computer Vision,\n1998.\n\n[14] E. Sharon, A. Brandt, and R. Basri. Fast multiscale image segmentation. In\nIEEE Conference on Computer Vision and Pattern Recognition, pages 70-7,\n2000.\n\n[15] J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Conference\non Computer Vision and Pattern Recognition, pages 731-7, June 1997.\n", "award": [], "sourceid": 2013, "authors": [{"given_name": "Stella", "family_name": "Yu", "institution": null}, {"given_name": "Jianbo", "family_name": "Shi", "institution": null}]}