{"title": "Segmentation as Maximum-Weight Independent Set", "book": "Advances in Neural Information Processing Systems", "page_first": 307, "page_last": 315, "abstract": "Given an ensemble of distinct, low-level segmentations of an image, our goal is to identify visually \"meaningful\" segments in the ensemble. Knowledge about any specific objects and surfaces present in the image is not available. The selection of image regions occupied by objects is formalized as the maximum-weight independent set (MWIS) problem. MWIS is the heaviest subset of mutually non-adjacent nodes of an attributed graph. We construct such a graph from all segments in the ensemble. Then, MWIS selects maximally distinctive segments that together partition the image. A new MWIS algorithm is presented. The algorithm seeks a solution directly in the discrete domain, instead of relaxing MWIS to a continuous problem, as common in previous work. It iteratively finds a candidate discrete solution of the Taylor series expansion of the original MWIS objective function around the previous solution. The algorithm is shown to converge to a maximum. Our empirical evaluation on the benchmark Berkeley segmentation dataset shows that the new algorithm eliminates the need for hand-picking optimal input parameters of the state-of-the-art segmenters, and outperforms their best, manually optimized results.", "full_text": "Segmentation as Maximum-Weight Independent Set\n\nWilliam Brendel and Sinisa Todorovic\n\nSchool of Electrical Engineering and Computer Science\n\nOregon State University\n\nCorvallis, OR 97331\n\nbrendelw@onid.orst.edu, sinisa@eecs.oregonstate.edu\n\nAbstract\n\nGiven an ensemble of distinct, low-level segmentations of an image, our goal is to\nidentify visually \u201cmeaningful\u201d segments in the ensemble. Knowledge about any\nspecific objects and surfaces present in the image is not available. 
The selection of\nimage regions occupied by objects is formalized as the maximum-weight indepen-\ndent set (MWIS) problem. MWIS is the heaviest subset of mutually non-adjacent\nnodes of an attributed graph. We construct such a graph from all segments in\nthe ensemble. Then, MWIS selects maximally distinctive segments that together\npartition the image. A new MWIS algorithm is presented. The algorithm seeks a\nsolution directly in the discrete domain, instead of relaxing MWIS to a continu-\nous problem, as common in previous work. It iteratively \ufb01nds a candidate discrete\nsolution of the Taylor series expansion of the original MWIS objective function\naround the previous solution. The algorithm is shown to converge to an optimum.\nOur empirical evaluation on the benchmark Berkeley segmentation dataset shows\nthat the new algorithm eliminates the need for hand-picking optimal input pa-\nrameters of the state-of-the-art segmenters, and outperforms their best, manually\noptimized results.\n\n1 Introduction\n\nThis paper presents: (1) a new formulation of image segmentation as the maximum-weight indepen-\ndent set (MWIS) problem; and (2) a new algorithm for solving MWIS.\n\nImage segmentation is a fundamental problem, and an area of active research in computer vision\nand machine learning. It seeks to group image pixels into visually \u201cmeaningful\u201d segments, i.e.,\nthose segments that are occupied by objects and other surfaces occurring in the scene. The literature\nabounds with diverse formulations. For example, normalized-cut [1], and dominant set [2] formu-\nlate segmentation as a combinatorial optimization problem on a graph representing image pixels.\n\u201cMeaningful\u201d segments may give rise to modes of the pixels\u2019 probability distribution [3], or min-\nimize the Mumford-Shah energy [4]. 
Segmentation can also be done by: (i) integrating edge and\nregion detection [5], (ii) learning to detect and close object boundaries [6, 7], and (iii) identifying\nsegments which can be more easily described by their own parts than by other image parts [8, 9, 10].\n\nFrom prior work, we draw the following two hypotheses. First, surfaces of real-world objects are\ntypically made of a unique material, and thus their corresponding segments in the image are char-\nacterized by unique photometric properties, distinct from those of other regions. To capture this\ndistinctiveness, it seems bene\ufb01cial to use more expressive, mid-level image features (e.g., superpix-\nels, regions) which will provide richer visual information for segmentation, rather than start from\npixels. Second, it seems that none of a host of segmentation formulations are able to correctly de-\nlineate every object boundary present. However, an ensemble of distinct segmentations is likely to\ncontain a subset of segments that provides accurate spatial support of object occurrences. Based on\nthese two hypotheses, below, we present a new formulation of image segmentation.\n\n1\n\n\fGiven an ensemble of segments, extracted from the image by a number of different low-level seg-\nmenters, our goal is to select those segments from the ensemble that are distinct, and together par-\ntition the image area. Suppose all segments from the ensemble are represented as nodes of a graph,\nwhere node weights capture the distinctiveness of corresponding segments, and graph edges con-\nnect nodes whose corresponding segments overlap in the image. Then, the selection of maximally\ndistinctive and non-overlapping segments that will partition the image naturally lends itself to the\nmaximum-weight independent set (MWIS) formulation.\n\nThe MWIS problem is to \ufb01nd the heaviest subset of mutually non-adjacent nodes of an attributed\ngraph. 
It is a well-researched combinatorial optimization problem that arises in many applications.\nIt is known to be NP-hard, and hard to approximate [11]. Numerous heuristic approaches exist.\nFor example, iterated tabu search [12] and branch-and-price [13] use a trial-and-error, greedy search\nin the space of possible solutions, with an optimistic complexity estimate of O(n^3), where n is\nthe number of nodes in the graph. The message passing [14] relaxes MWIS into a linear program\n(LP), and solves it using loopy belief propagation with no guarantees of convergence for general\ngraphs; the \u201ctightness\u201d of this relaxation holds only for bipartite graphs [15]. The semi-definite\nprogramming formulation of MWIS [16] provides an upper bound of the sum of weights of all\nindependent nodes in MWIS. However, this is done by reformulating MWIS as a large LP of a new\ngraph with n^2 nodes, which is unsuitable for large-scale problems as ours. Finally, the replicator\ndynamics [17, 18] converts the original graph into its complement, and solves MWIS as a continuous\nrelaxation of the maximum weight clique (MWC) problem. But in some domains, including ours,\nimportant hard constraints captured by edges of the original graph may be lost in this conversion.\n\nIn this paper, we present a new MWIS algorithm, which represents a fixed-point iteration, guaranteed\nto converge to an optimum. It goes back and forth between the discrete and continuous domains.\nIt visits a sequence of points {y^{(t)}}_{t=1,2,...}, defined in the continuous domain, y^{(t)}\u2208[0, 1]^n. Around\neach of these points, the algorithm tries to maximize the objective function of MWIS in the discrete\ndomain. Each iteration consists of two steps. First, we use the Taylor expansion to approximate\nthe objective function around y^{(t)}. Maximization in the discrete domain of the approximation gives\na candidate discrete solution, \u02dcx\u2208{0, 1}^n. Second, if \u02dcx increases the original objective, then this\ncandidate is taken as the current solution \u02dcx, and the algorithm visits that point in the next iteration,\ny^{(t+1)}=\u02dcx; else, the algorithm visits the interpolation point, y^{(t+1)}=y^{(t)}+\u03b7(\u02dcx\u2212y^{(t)}), which can\nbe shown to be a local maximizer of the original objective for a suitably chosen \u03b7. The algorithm\nalways improves the objective, finally converging to a maximum. For non-convex objective functions,\nour method tends to pass either through or near discrete solutions, and the best discrete one\nx^\u2217 encountered along the path is returned. Our algorithm has relatively low complexity, O(|E|),\nwhere, in our case, |E| \u226a n^2 is the number of edges in the graph, and converges in only a few steps.\nContributions: To the best of our knowledge, this paper presents the first formulation of image\nsegmentation as MWIS. We derive a new MWIS algorithm that has low complexity, and prove that\nit converges to a maximum. Selecting segments from an ensemble so they cover the entire image\nand minimize a total energy has been used for supervised object segmentation [19]. They estimate\n\u201cgood\u201d segments by using classifiers of a pre-selected number of object classes. In contrast, our\ninput, and our approach are genuinely low-level, i.e., agnostic about any particular objects in the\nimage. Our MWIS algorithm has lower complexity, and is arguably easier to implement than the\ndual decomposition they use for energy minimization. Our segmentation outperforms the state of the\nart on the benchmark Berkeley segmentation dataset, and our MWIS algorithm runs faster and yields\non average more accurate solutions on benchmark datasets than other existing MWIS algorithms.\nOverview: Our approach consists of the following steps (see Fig.1). 
Step 1: The image is segmented\nusing a number of different, off-the-shelf, low-level segmenters, including meanshift [3], Ncuts [1],\nand gPb-OWT-UCM [7]. Since the right scale at which objects occur in the image is unknown, each\nof these segmentations is conducted at an exhaustive range of scales. Step 2: The resulting segments\nare represented as nodes of a graph whose edges connect only those segments that (partially) overlap\nin the image. A small overlap between two segments, relative to their area, may be ignored, for\nrobustness. A weight is associated with each node capturing the distinctiveness of the corresponding\nsegment from the others. Step 3: We find the MWIS of this graph. Step 4: The segments selected in\nthe MWIS may not be able to cover the entire image, or may slightly overlap (holes and overlaps are\nmarked red in Fig.1). The final segmentation is obtained by using standard morphological operators\non region boundaries to eliminate these holes and overlaps. Note that there is no need for Step 4 if\nthe input low-level segmentation is strictly hierarchical, as gPb-OWT-UCM [7]. The same holds if\nwe added the intersections of all input segments to the input ensemble, as in [19], because our MWIS\nalgorithm will continue selecting non-overlapping segments until the entire image is covered.\n\nFigure 1: Our main steps: (a) Input segments extracted at multiple scales by different segmentation\nalgorithms; (b) Constructing a graph of all segments, and finding its MWIS (marked green);\n(c) Segments selected by our MWIS algorithm (red areas indicate overlaps and holes); (d) Final\nsegmentation after region-boundary refinement (actual result using Meanshift and NCuts as input).\n\nPaper Organization: Sec. 2 formulates MWIS, and presents our MWIS algorithm and its theoretical\nanalysis. Sec. 3 formulates image segmentation as MWIS, and describes how to construct the\nsegmentation graph. Sec. 
4 and Sec. 5 present our experimental evaluation and conclusions.\n\n2 MWIS Formulation and Our Algorithm\n\nConsider a graph G = (V, E, \u03c9), where V and E are the sets of nodes and undirected edges, with\ncardinalities |V|=n and |E|, and \u03c9 : V \u2192 R+ associates positive weights w_i to every node i \u2208 V,\ni=1, ..., n. A subset of V can be represented by an indicator vector x=(x_i)\u2208{0, 1}^n, where x_i=1\nmeans that i is in the subset, and x_i=0 means that i is not in the subset. A subset x is called an\nindependent set if no two nodes in the subset are connected by an edge, \u2200(i, j)\u2208E : x_i x_j=0. We\nare interested in finding a maximum-weight independent set (MWIS), denoted as x^\u2217. MWIS can be\nnaturally posed as the following integer program (IP):\n\nIP: x^\u2217 = argmax_x w^T x, s.t. \u2200i \u2208 V : x_i \u2208 {0, 1}, and \u2200(i, j)\u2208E : x_i x_j = 0   (1)\n\nThe non-adjacency constraint in (1) can be equivalently formalized as \u2211_{(i,j)\u2208E} x_i x_j = 0. The latter\nexpression can be written as a quadratic constraint, x^T A x = 0, where A=(A_ij) is the adjacency\nmatrix, with A_ij=1 if (i, j)\u2208E, and A_ij=0 if (i, j)\u2209E. Consequently, IP can be reformulated as\nthe following integer quadratic program (IQP):\n\nIQP: x^\u2217 = argmax_x w^T x, s.t. \u2200i \u2208 V : x_i \u2208 {0, 1}, x^T A x = 0\n\u21d2 \u2203\u03b1\u2208R : x^\u2217 = argmax_x [w^T x \u2212 (1/2) \u03b1 x^T A x], s.t. \u2200i \u2208 V : x_i \u2208 {0, 1}   (2)\n\nwhere there exists a positive regularization parameter \u03b1>0 such that the implication\nin (2) holds. Next, we present our new algorithm for solving MWIS.\n\n2.1 The Algorithm\n\nAs reviewed in Sec. 1, to solve IQP in (2), the integer constraint is usually either ignored, or relaxed\nto a continuous QP, e.g., by \u2200i\u2208V : x_i\u22650 and \u2016x\u2016 = 1. 
For example, when \u2113_1 norm is used as\nrelaxation, the solution x^\u2217 of (2) can be found using the replicator dynamics in the continuous\ndomain [17]. Also, when only \u2200i\u2208V : x_i\u22650 is used as relaxation, then the IP of (1) can be solved via\nmessage passing [14]. Usually, the solution found in the continuous domain is binarized to obtain\na discrete solution. This may lead to errors, especially if the relaxed QP is nonconvex [20]. In this\npaper, we present a new MWIS algorithm that iteratively seeks a solution directly in the discrete\ndomain. A discrete solution is computed by maximizing the first-order Taylor series approximation\nof the quadratic objective in (2) around a solution found in the previous iteration. This is similar\nto the method of [20], which, however, makes the restrictive assumptions that the matrix of the\nquadratic term (analog of our A) is \u201cclose\u201d to positive-semi-definite (PSD), or that it is rank-1 with\nnon-negative elements. These assumptions are not suitable for image segmentation. Graduated\nassignment [21] also iteratively maximizes a Taylor series expansion of a continuous QP around the\nprevious solution; but this is done in the continuous domain. Since A in (2) is not PSD, our algorithm\nguarantees convergence only to a local maximum, as most state-of-the-art MWIS algorithms [12, 13,\n14, 17, 18]. Below, we describe the main steps of our MWIS algorithm.\n\nLet f(x) = w^T x \u2212 (1/2) \u03b1 x^T A x denote the objective function of IQP in (2). Also, in our notation,\nx, \u02dcx, x^\u2217 \u2208 {0, 1}^n denote a point, candidate solution, and solution, respectively, in the discrete\ndomain; and y \u2208 [0, 1]^n denotes a point in the continuous domain. Our algorithm is a fixed-point\niteration that solves a sequence of integer programs which are convex approximations of f, around\na solution found in the previous iteration. 
The key intuition is that the approximations are simpler\nfunctions than f, and thus facilitate computing the candidate discrete solutions in each iteration. The\nalgorithm increases f in every iteration until convergence.\n\nOur algorithm visits a sequence of continuous points {y^{(1)}, ..., y^{(t)}, ...}, y^{(t)} \u2208 [0, 1]^n, in iterations\nt = 1, 2, ..., and finds discrete candidate solutions \u02dcx \u2208 {0, 1}^n in their respective neighborhoods,\nuntil convergence. Each iteration t consists of two steps. First, for any point y \u2208 [0, 1]^n in\nthe neighborhood of y^{(t)}, we find the first-order Taylor series approximation of f(y) as\n\nf(y) \u2248 h(y, y^{(t)}) = f(y^{(t)}) + (y \u2212 y^{(t)})^T (w \u2212 \u03b1Ay^{(t)}) = y^T (w \u2212 \u03b1Ay^{(t)}) + const,   (3)\n\nwhere \u2018const\u2019 does not depend on y. Note that the approximation h(y, y^{(t)}) is convex in y, and\nsimpler than f(y), which allows us to easily compute a discrete maximizer of h(\u00b7) as\n\n\u02dcx = argmax_{x\u2208{0,1}^n} h(x, y^{(t)}) \u21d4 \u02dcx_i = 1 if (w \u2212 \u03b1Ay^{(t)})_i \u2265 0, and \u02dcx_i = 0 otherwise.   (4)\n\nTo avoid the trivial discrete solution, when \u02dcx = 0 we instead set \u02dcx = [0, ..., 0, 1, 0, ..., 0]^T, with\n\u02dcx_i = 1 where i is the index of the minimum element of (w \u2212 \u03b1Ay^{(t)}).\nIn the second step of iteration t, the algorithm verifies if \u02dcx can be accepted as a new, valid discrete\nsolution. This will be possible only if f is non-decreasing, i.e., if f(\u02dcx)\u2265f(y^{(t)}). In this case, the\nalgorithm visits point y^{(t+1)}=\u02dcx, in the next iteration. In case f(\u02dcx)<f(y^{(t)}), this means that there\nmust be a local maximum of f in the neighborhood of points y^{(t)} and \u02dcx. We estimate this local\nmaximizer of f in the continuous domain by linear interpolation, y^{(t+1)}=y^{(t)}+\u03b7(\u02dcx\u2212y^{(t)}). The\noptimal value of the interpolation parameter \u03b7\u2208[0, 1] is computed such that \u2202f(y^{(t+1)})/\u2202\u03b7 \u2265 0,\nwhich ensures that f is non-decreasing in the next iteration. As shown in Sec. 2.2, the optimal \u03b7 has\na closed-form solution:\n\n\u03b7 = min( max( [(w \u2212 \u03b1Ay^{(t)})^T (\u02dcx \u2212 y^{(t)})] / [\u03b1 (\u02dcx \u2212 y^{(t)})^T A (\u02dcx \u2212 y^{(t)})], 0 ), 1 ).   (5)\n\nHaving computed y^{(t+1)}, the algorithm starts the next iteration by finding a Taylor series approximation\nin the neighborhood of point y^{(t+1)}. After convergence, the latest discrete solution \u02dcx is taken\nto represent the final solution of MWIS, x^\u2217=\u02dcx. Our MWIS algorithm is summarized in Alg. 1.\n\n2.2 Theoretical Analysis\n\nThis section presents the proof that our MWIS algorithm converges to a maximum. We also show\nthat its complexity is O(|E|). We begin by stating a lemma that pertains to linear interpolation\ny^{(t+1)}=y^{(t)}+\u03b7(\u02dcx\u2212y^{(t)}) such that the IQP objective function f is non-decreasing at y^{(t+1)}.\n\nLemma 1 Suppose that the IQP objective function f is increasing at point y1 \u2208 [0, 1]^n, and decreasing\nat point y2 \u2208 [0, 1]^n, y1 \u2260 y2. Then, there exists a point, y = y1 + \u03b7(y2 \u2212 y1), and\ny \u2208 [0, 1]^n, such that f is increasing at y, where \u03b7 is an interpolation parameter, \u03b7 \u2208 [0, 1].\n\nProof: It is straightforward to show that if \u03b7 \u2208 [0, 1] \u21d2 y \u2208 [0, 1]^n. For \u03b7 = 0, we obtain\ny = y1, where f is said to be increasing. For \u03b7 \u2260 0, y can be found by estimating \u03b7 such\nthat \u2202f(y1+\u03b7(y2\u2212y1))/\u2202\u03b7 \u2265 0. 
It follows: (w\u2212\u03b1Ay1)^T (y2\u2212y1) \u2212 \u03b7\u03b1(y2\u2212y1)^T A (y2\u2212y1) \u2265 0.\nDefine auxiliary terms c = (w \u2212 \u03b1Ay1)^T (y2 \u2212 y1) and d = \u03b1(y2 \u2212 y1)^T A (y2 \u2212 y1). Since A\nis not PSD, we obtain \u03b7 \u2264 c/d, for d > 0, and \u03b7 \u2265 c/d, for d < 0. Since \u03b7 \u2208 [0, 1], we compute\n\u03b7 = min(max(c/d, 0), 1), which is equivalent to (5), for y1 = y^{(t)} and y2 = \u02dcx. \u25a1\n\nIn the following, we define the notion of maximum, and prove that Alg. 1 converges to a maximum.\n\nDefinition We refer to point y^\u2217 as a maximum of a real, differentiable function g(y), defined\nover domain D, g : D \u2192 R, if there exists a neighborhood of y^\u2217, N(y^\u2217) \u2286 D, such that\n\u2200y \u2208 N(y^\u2217) : g(y^\u2217) \u2265 g(y).\n\nProposition 1 Alg. 1 increases f in every iteration, and converges to a maximum.\n\nProof: In iteration t of Alg. 1, if f(\u02dcx) \u2265 f(y^{(t)}) then the next point visited by Alg. 1 is y^{(t+1)} = \u02dcx.\nThus, f increases in this case. Else, y^{(t+1)} = y^{(t)} + \u03b7(\u02dcx \u2212 y^{(t)}), yielding\n\nf(y^{(t+1)}) = f(y^{(t)}) + \u03b7 (w\u2212\u03b1Ay^{(t)})^T (\u02dcx\u2212y^{(t)}) \u2212 (1/2) \u03b7^2 \u03b1 (\u02dcx\u2212y^{(t)})^T A (\u02dcx\u2212y^{(t)}).   (6)\n\nSince \u02dcx maximizes h, given by (3), we have h(\u02dcx, y^{(t)})\u2212h(y^{(t)}, y^{(t)}) = (w\u2212\u03b1Ay^{(t)})^T (\u02dcx\u2212y^{(t)}) \u2265 0.\nAlso, from Lemma 1, \u03b7 is non-negative. Consequently, the second term in (6) is non-negative.\nRegarding the third term in (6), from (5) we have \u03b7\u03b1(\u02dcx\u2212y^{(t)})^T A (\u02dcx\u2212y^{(t)}) = (w\u2212\u03b1Ay^{(t)})^T (\u02dcx\u2212y^{(t)}),\nwhich we have already proved to be non-negative. The third term therefore cancels only half of the\nsecond term, leaving f(y^{(t+1)}) = f(y^{(t)}) + (1/2) \u03b7 (w\u2212\u03b1Ay^{(t)})^T (\u02dcx\u2212y^{(t)}) \u2265 f(y^{(t)}). Thus, f also\nincreases in this second case. Since f \u2264 w^T 1, and f increases in every iteration, then f converges\nto a maximum. \u25a1\n\nComplexity: Alg. 1 has complexity O(|E|) per iteration. Complexity depends only on a few matrix-vector\nmultiplications with A, where each takes O(|E|). This is because A is sparse and binary,\nwhere each element A_ij=1 iff (i, j) \u2208 E. Thus, any computation in Alg. 1 pertaining to particular\nnode i\u2208V depends on the number of positive elements in ith row A_{i\u00b7}, i.e., on the branching factor\nof i. Computing \u02dcx in (4) has complexity O(n), where n < |E|, and thus does not affect the final\ncomplexity. For the special case of balanced graphs, Alg. 1 has complexity O(|E|) = O(n log n).\nIn our experiments, Alg. 1 converges in 5-10 iterations on graphs with about 300 nodes.\n\n3 Formulating Segmentation as MWIS\n\nWe formulate image segmentation as the MWIS of a graph of image regions obtained from different\nsegmentations. Below, we explain how to construct this graph. Given a set of all segments, V,\nextracted from the image by a number of distinct segmenters, we construct a graph, G = (V, E, \u03c9),\nwhere V and E are the sets of nodes and undirected edges, and \u03c9 : V \u2192 R+ assigns positive weights\nw_i to every node i \u2208 V, i=1, ..., n. Two nodes i and j are adjacent, (i, j) \u2208 E, if their respective\nsegments S_i and S_j overlap in the image, S_i \u2229 S_j \u2260 \u2205. This can be conceptualized by the adjacency\nmatrix A = (A_ij), where A_ij = 1 iff S_i \u2229 S_j \u2260 \u2205, and A_ij = 0 iff S_i \u2229 S_j = \u2205. For robustness\nin our experiments, we tolerate a relatively small amount of overlap by setting a tolerance threshold\n\u03b8, such that A_ij = 1 if |S_i \u2229 S_j| / min(|S_i|, |S_j|) > \u03b8, and A_ij = 0 otherwise. (In our experiments we use\n\u03b8 = 0.2). Note that the IQP in (2) also permits a \u201csoft\u201d definition of A which is beyond our scope.\nThe weights w_i should be larger for more \u201cmeaningful\u201d segments S_i, so that these segments are\nmore likely included in the MWIS of G. 
Following the compositionality-based approaches of [8, 9],\nwe de\ufb01ne that a \u201cmeaningful\u201d segment can be easily described in terms of its own parts, but dif\ufb01cult\nto describe via other parts of the image. Note that this de\ufb01nition is suitable for identifying both:\n(i) distinct textures in the image, since texture can be de\ufb01ned as a spatial repetition of elementary\n2D patterns; and (ii) homogeneous regions with smooth variations of brightness. To de\ufb01ne wi,\nwe use the formalism of [8], where the easiness and dif\ufb01culty of describing Si is evaluated by its\ndescription length in terms of visual codewords. Speci\ufb01cally, given a dictionary of visual codewords,\nand the histogram of occurrence of the codewords in Si, we de\ufb01ne wi = |Si|KL(Si, \u00afSi), where KL\ndenotes the Kullback Leibler divergence, I is the input image, and \u00afSi = I\\Si. All the weights w\nare normalized by maxi wi. Below, we explain how to extract the dictionary of codewords.\nSimilar to [22], we describe every pixel with an 11-dimensional descriptor vector consisting of the\nLab colors and \ufb01lter responses of the rotationally invariant, nonlinear MR8 \ufb01lter bank, along with\n\n5\n\n\fthe Laplacian of Gaussian \ufb01lters. The pixel descriptors are then clustered using K-means (with\nK = 100). All pixels grouped within one cluster are labeled with a unique codeword id of that\ncluster. Then, the histogram of their occurrence in every region Si is estimated.\nGiven G, as described in this section, we use our MWIS algorithm to select \u201cmeaningful\u201d segments,\nand thus partition the image. 
Note that the selected segments will optimally cover the entire image,\notherwise any uncovered image areas will be immediately \ufb01lled out by available segments in V that\ndo not overlap with already selected ones, because this will increase the IQP objective function f .\nIn the case when the input segments do not form a strict hierarchy and intersections of the input\nsegments have not been added to V , we eliminate holes (or \u201csoft\u201d overlaps) between the selected\nsegments by applying the standard morphological operations (e.g., thinning and dilating of regions).\n\n4 Results\n\nThis section presents qualitative and quantitative evaluation of our segmentation on 200 images\nfrom the benchmark Berkeley segmentation dataset (BSD) [23]. BSD images are challenging for\nsegmentation, because they contain complex layouts of distinct textures (e.g., boundaries of several\nregions meet at one point), thin and elongated shapes, and relatively large illumination changes. We\nalso evaluate the generality and execution time of our MWIS algorithm on a synthetic graph from\nbenchmark OR-Library [24], and the problem sets from [12].\n\nOur MWIS algorithm is evaluated for the following three types of input segmentations. The \ufb01rst\ntype is a hierarchy of segments produced by the gPb-OWT-UCM method of [7]. gPb-OWT-UCM\nuses the perceptual signi\ufb01cance of a region boundary, Pb \u2208 [0, 100], as an input parameter. To\nobtain the hierarchy, we vary Pb = 20:5:70. The second type is a hierarchy of segments produced\nby the multiscale algorithm of [5]. This method uses pixel-intensity contrast, \u03c3 \u2208 [0, 255], as an\ninput parameter. To obtain the hierarchy, we vary \u03c3 = 30:20:120. Finally, the third type is a\nunion of NCut [1] and Meanshift [3] segments. Ncut uses one input parameter \u2013 namely, the total\nnumber of regions, N , in the image. 
Meanshift uses three input parameters: feature bandwidth bf ,\nspatial bandwidth bs, and minimum region area Smin. We vary these parameters as N = 10:10:100,\nbf = 5.5:0.5:8.5, bs = 4:2:10, and Smin = 100:200:900. The variants [7]+Ours and [5]+Ours\nserve to test whether our approach is capable of extracting \u201cmeaningful\u201d regions from a multiscale\nsegmentation. The variant ([3]+[1])+Ours evaluates our hypothesis that reasoning over an ensemble\nof distinct segmentations improves each individual one.\n\nSegmentation of BSD images is used for a comparison with replicator dynamics approach of [17],\nwhich transforms the MWIS problem into the maximum weight clique problem, and then relaxes it\ninto a continuous problem, denoted as MWC. In addition, we also use data from other domains \u2013\nspeci\ufb01cally, OR-Library [24] and the problem sets from [12] \u2013 for a comparison with other state-of-\nthe-art MWIS algorithms.\nQualitative evaluation: Fig. 3 and Fig. 4 show the performance of our variant [7]+Ours on ex-\nample images from BSD. Fig. 4 also shows the best segmentations of [7] and [25], obtained by an\nexhaustive search for the optimal values of their input parameters. As can be seen in Fig. 4, the\nmethod of [7] misses to segment the grass under the tiger, and oversegments the star\ufb01sh and the\ncamel, which we correct. Our approach eliminates the need for hand-picking the optimal input pa-\nrameters in [7], and yields results that are good even in cases when objects have complex textures\n(e.g. tiger and star\ufb01sh), or when the boundaries are blurred or jagged (e.g. camel).\nQuantitative evaluation: Table 1 presents segmentations of BSD images using our three variants:\n[7]+Ours, [5]+Ours, and ([3]+[1])+Ours. We consider the standard metrics: Probabilistic Rand\nIndex (P RI), and Variation of Information (V I) [26]. 
PRI between estimated and ground-truth\nsegmentations, S and G, is defined as the sum of the number of pairs of pixels that have the same\nlabel in S and G, and those that have different labels in both segmentations, divided by the total\nnumber of pairs of pixels. VI measures the distance between S and G in terms of their average\nconditional entropy. PRI should be large, and VI small. For all variants of our approach, we\nrun the MWIS algorithm 10 times, starting from different initial points, and report the average\nPRI and VI values. For [7], we report their best results obtained by an exhaustive search for the\noptimal value of their input parameter Pb. As can be seen, [7]+Ours does not hand-pick the optimal\ninput parameters, and outperforms the best results of original [7]. Surprisingly, when working with\nsegments generated by Meanshift, Ncuts, and [5], the performances of [5]+Ours and ([3]+[1])+Ours\ncome very close to those of [7]. This is unexpected, because Meanshift, Ncuts, and the method of\n[5] are known to produce poor performance in terms of PRI and VI values, relative to [7]. Also,\nnote that ([3]+[1])+Ours outperforms the relaxation-based method ([3]+[1])+MWC.\n\nAlgorithm 1: Our MWIS Algorithm\n\nInput: Graph G including w and A, convergence threshold \u03b4, regularization parameter \u03b1 = 2\nOutput: The MWIS of G denoted as x^\u2217\nDefine IQP objective: f(x) := w^T x \u2212 (1/2) \u03b1 x^T A x;\nInitialize t=0, and x^\u2217=0, y^{(0)}\u2208{0, 1}^n, y^{(0)}\u22600;\nrepeat\n    Find h(y, y^{(t)}) as in (3);\n    Use (4) for \u02dcx = argmax_{x\u2208{0,1}^n} h(x, y^{(t)});\n    if f(\u02dcx) \u2265 f(y^{(t)}) then\n        y^{(t+1)} = \u02dcx;\n    else\n        Use (5) for \u03b7 = argmax_{\u03b7\u2208[0,1]} f(y^{(t)}+\u03b7(\u02dcx\u2212y^{(t)}));\n        y^{(t+1)} = y^{(t)} + \u03b7(\u02dcx \u2212 y^{(t)});\n    end\n    if f(\u02dcx) \u2265 f(x^\u2217) then\n        x^\u2217 = \u02dcx;\n    end\nuntil \u2016y^{(t+1)} \u2212 y^{(t)}\u2016 < \u03b4;\n\nMethod | PRI | VI\nHuman | 0.87 | 1.16\n[7] | 0.81 | 1.68\n([3]+[1])+MWC | 0.78 | 1.75\n[5]+Ours | 0.79 | 1.69\n([3]+[1])+Ours | 0.80 | 1.71\n[7]+Ours | 0.83 | 1.59\n\nTable 1: A comparison on BSD. Probabilistic Rand Index (PRI) should be large, and Variation of\nInformation (VI) small. Input segments are generated by the methods of [7, 5, 3, 1], and then selected\nby the maximum weight clique formulation (MWC) of [17], or by our algorithm. For [7], we report\ntheir best results obtained by an exhaustive search for the optimal value of their input parameter Pb.\n\nFig. 2 shows the sensitivity of the convergence rate of our approach to a specific choice of \u03b1. The\npenalty term \u03b1y^T A y of the IQP objective function is averaged over all 200 graphs, each with about\n300 nodes, obtained from 200 BSD images. As can be seen, for \u03b1 \u2265 2, the penalty term \u03b1y^T A y\nconverges to 0 with some initial oscillations. Experimentally, the convergence rate is maximum\nwhen \u03b1 = 2. We use this value in all our experiments.\n\nFigure 2: Convergence rate vs. a specific choice of \u03b1, averaged over 200 BSD images: \u03b1 < 2 is\nmarked red, and \u03b1 \u2265 2 is marked blue.\n\nMethod | Measure | b2500 [24] | p3000-7000 [12]\n[12] | avg | 2 | 175\n[12] | sec | 74 | 1650\nOurs | avg | 0 | 62\nOurs | sec | 21 | 427\n\nTable 2: Average of solution difference, and computation time in seconds for problem sets\nfrom [24] and [12].\n\nMWIS performance: We also test our Alg. 1 on two sets of problems beyond image segmentation.\nAs input we use a graph constructed from data from the OR-Library [24], and from the problem sets\npresented in [12]. For the first set of problems (b2500), we only consider the largest graphs. 
We use ten instances, called b2500-1 to b2500-10, of size 2500 and with density 10%. For the second set of problems (p3000 to p7000), we take into account graphs of size 4000, 5000, 6000 and 7000. Five graph instances per size are used. Tab. 2 shows the average difference between the estimated and ground-truth solution, and computation time in seconds. The presented comparison with Iterated Tabu Search (ITS) [12] demonstrates that, on average, we achieve better performance, with much shorter running times.\n\nFigure 3: Segmentation of BSD images. (top) Original images. (bottom) Results using our variant [7]+Ours. Failures, such as the painters\u2019 shoulder, the bird\u2019s lower body part, and the top left fish, occur simply because these regions are not present in the input segmentations.\n\nFigure 4: Comparison with the state-of-the-art segmentation algorithms on BSD images. (top row) Original images. (middle row) The three left results are from [7], and the rightmost result is from [25]. (bottom row) Results of [7]+Ours. By extracting \u201cmeaningful\u201d segments from a segmentation hierarchy produced by [7], we correct the best, manually optimized results of [7].\n\n5 Conclusion\n\nTo our knowledge, this is the first attempt to formulate image segmentation as MWIS. Our empirical findings suggest that this is a powerful framework that permits good segmentation performance regardless of the particular MWIS algorithm used. We have presented a new fixed-point algorithm that efficiently solves MWIS, with complexity O(|E|), on a graph with |E| edges, and proved that the algorithm converges to a maximum. Our MWIS algorithm seeks a solution directly in the discrete domain, instead of resorting to relaxation, as is common in the literature. We have empirically observed that our algorithm runs faster and outperforms the other competing MWIS algorithms on benchmark datasets.
Also, we have shown a comparison with the state-of-the-art segmenter [7] on the benchmark Berkeley segmentation dataset. Our selection of \u201cmeaningful\u201d regions from a segmentation hierarchy produced by [7] outperforms the manually optimized best results of [7], in terms of Probabilistic Rand Index and Variation of Information.\n\nReferences\n\n[1] J. Shi and J. Malik, \u201cNormalized cuts and image segmentation,\u201d IEEE TPAMI, vol. 22, no. 8, pp. 888\u2013905, 2000.\n[2] M. Pavan and M. Pelillo, \u201cDominant sets and pairwise clustering,\u201d IEEE TPAMI, vol. 29, no. 1, pp. 167\u2013172, 2007.\n[3] D. Comaniciu and P. Meer, \u201cMeanshift: a robust approach toward feature space analysis,\u201d IEEE TPAMI, vol. 24, no. 5, pp. 603\u2013619, 2002.\n[4] M. Kass, A. Witkin, and D. Terzopoulos, \u201cSnakes: Active contour models,\u201d IJCV, vol. 1, no. 4, pp. 321\u2013331, 1988.\n[5] N. Ahuja, \u201cA transform for multiscale image segmentation by integrated edge and region detection,\u201d IEEE TPAMI, vol. 18, no. 12, pp. 1211\u20131235, 1996.\n[6] X. Ren, C. Fowlkes, and J. Malik, \u201cLearning probabilistic models for contour completion in natural images,\u201d IJCV, vol. 77, no. 1-3, pp. 47\u201363, 2008.\n[7] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, \u201cFrom contours to regions: An empirical evaluation,\u201d in CVPR, 2009.\n[8] S. Bagon, O. Boiman, and M. Irani, \u201cWhat is a good image segment? A unified approach to segment extraction,\u201d in ECCV, 2008.\n[9] S. Todorovic and N. Ahuja, \u201cTexel-based texture segmentation,\u201d in ICCV, 2009.\n[10] B. Russell, A. Efros, J. Sivic, B. Freeman, and A. Zisserman, \u201cSegmenting scenes by matching image composites,\u201d in NIPS, 2009.\n[11] L. Trevisan, \u201cInapproximability of combinatorial optimization problems,\u201d Electronic Colloquium on Computational Complexity, Tech. Rep. TR04065, 2004.\n[12] G. Palubeckis, \u201cIterated tabu search for the unconstrained binary quadratic optimization problem,\u201d Informatica, vol. 17, no. 2, pp. 279\u2013296, 2006.\n[13] D. Warrier, W. E. Wilhelm, J. S. Warren, and I. V. Hicks, \u201cA branch-and-price approach for the maximum weight independent set problem,\u201d Netw., vol. 46, no. 4, pp. 198\u2013209, 2005.\n[14] S. Sanghavi, D. Shah, and A. S. Willsky, \u201cMessage-passing for max-weight independent set,\u201d in NIPS, 2007.\n[15] M. Groetschel, L. Lovasz, and A. Schrijver, \u201cPolynomial algorithms for perfect graphs,\u201d in Topics on Perfect Graphs, C. Berge and V. Chvatal, Eds. North-Holland, 1984, vol. 88, pp. 325\u2013356.\n[16] M. Todd, \u201cSemidefinite optimization,\u201d Acta Numerica, vol. 10, pp. 515\u2013560, 2001.\n[17] I. M. Bomze, M. Pelillo, and V. Stix, \u201cApproximating the maximum weight clique using replicator dynamics,\u201d IEEE Trans. Neural Net., vol. 11, no. 6, pp. 1228\u20131241, 2000.\n[18] S. Busygin, C. Ag, S. Butenko, and P. M. Pardalos, \u201cA heuristic for the maximum independent set problem based on optimization of a quadratic over a sphere,\u201d Journal of Combinatorial Optimization, vol. 6, pp. 287\u2013297, 2002.\n[19] M. P. Kumar and D. Koller, \u201cEfficiently selecting regions for scene understanding,\u201d in CVPR, 2010.\n[20] M. Leordeanu, M. Hebert, and R. Sukthankar, \u201cAn integer projected fixed point method for graph matching and MAP inference,\u201d in NIPS, 2009.\n[21] S. Gold and A. Rangarajan, \u201cA graduated assignment algorithm for graph matching,\u201d IEEE TPAMI, vol. 18, no. 4, pp. 377\u2013388, 1996.\n[22] M. Varma and R. Garg, \u201cLocally invariant fractal features for statistical texture classification,\u201d in ICCV, 2007.\n[23] D. Martin, C. Fowlkes, D. Tal, and J. Malik, \u201cA database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,\u201d in ICCV, 2001.\n[24] J. E. Beasley, \u201cObtaining test problems via internet,\u201d Journal of Global Optimization, vol. 8, no. 4, pp. 429\u2013433, 1996.\n[25] M. Galun, E. Sharon, R. Basri, and A. Brandt, \u201cTexture segmentation by multiscale aggregation of filter responses and shape elements,\u201d in ICCV, 2003, pp. 716\u2013723.\n[26] R. Unnikrishnan, C. Pantofaru, and M. Hebert, \u201cToward objective evaluation of image segmentation algorithms,\u201d IEEE TPAMI, vol. 29, no. 6, pp. 929\u2013944, 2007.\n", "award": [], "sourceid": 122, "authors": [{"given_name": "William", "family_name": "Brendel", "institution": null}, {"given_name": "Sinisa", "family_name": "Todorovic", "institution": null}]}