{"title": "Higher-Order Correlation Clustering for Image Segmentation", "book": "Advances in Neural Information Processing Systems", "page_first": 1530, "page_last": 1538, "abstract": "For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.", "full_text": "Higher-Order Correlation Clustering for Image\n\nSegmentation\n\nSungwoong Kim\n\nDepartment of EE, KAIST\n\nDaejeon, South Korea\n\nSebastian Nowozin\n\nMicrosoft Research Cambridge\n\nCambridge, UK\n\nsungwoong.kim01@gmail.com\n\nSebastian.Nowozin@microsoft.com\n\nPushmeet Kohli\n\nMicrosoft Research Cambridge\n\nCambridge, UK\n\nChang D. 
Yoo\n\nDepartment of EE, KAIST\n\nDaejeon, South Korea\n\npkohli@microsoft.com\n\ncdyoo@ee.kaist.ac.kr\n\nAbstract\n\nFor many state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. Many image segmentation algorithms have therefore been proposed, but they are often adopted with reservation due to their high computational load and many hand-tuned parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships, which improves clustering in the presence of local boundary ambiguities. We first apply pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and effective parameter learning is possible by the structured support vector machine. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.\n\n1 Introduction\n\nImage segmentation, the partitioning of an image into disjoint regions such that each region is homogeneous, is an important preprocessing step for many state-of-the-art algorithms for high-level image/scene understanding, for three reasons. First, the coherent support of a region, commonly assumed to carry a single label, serves as a good prior for many labeling tasks. Second, these coherent regions allow more consistent feature extraction that can incorporate surrounding contextual information by pooling many feature responses over the region. 
Third, compared to pixels, a small number of larger homogeneous regions significantly reduces the computational cost of a subsequent labeling task.\nImage segmentation algorithms can be categorized into non-graph-based and graph-based algorithms. Well-known non-graph-based algorithms are mode-seeking algorithms such as K-means [1], mean-shift [2], and EM [3], while well-known graph-based algorithms include min-cuts [4], normalized cuts [5], and the Felzenszwalb-Huttenlocher (FH) segmentation algorithm [6]. In comparison to non-graph-based segmentations, graph-based segmentations have been shown to produce consistent segmentations by adaptively balancing local judgements of similarity [7]. Moreover, graph-based segmentation algorithms with global objective functions, such as min-cuts and normalized cuts, have been shown to perform better than the FH algorithm, which is based on a local objective function, since the global-objective algorithms benefit from the global nature of the information [7]. However, in contrast to min-cuts and normalized cuts, which are node-labeling algorithms, the FH algorithm benefits from edge-labeling in that it leads to faster inference and does not require a pre-specified number of segments in each image [7].\nCorrelation clustering is a graph-partitioning algorithm [8] that simultaneously maximizes intra-cluster similarity and inter-cluster dissimilarity by solving a global objective (discriminant) function. In comparison with previous image segmentation algorithms, correlation clustering is a graph-based, global-objective, edge-labeling algorithm and therefore has the potential to perform better for image segmentation. 
Furthermore, correlation clustering leads to a linear discriminant function which allows for approximate polynomial-time inference by linear programming (LP) and large margin training based on the structured support vector machine (S-SVM) [9]. A framework that uses the S-SVM for training the parameters in correlation clustering has been considered previously by Finley et al. [10]; however, that framework was applied to noun-phrase and news article clusterings. Taskar derived a max-margin formulation for learning the edge scores for correlation clustering [11]. However, his learning criterion is different from the S-SVM and is limited to applications involving two different segmentations of a single image. Furthermore, Taskar does not provide any experimental comparisons or quantitative results.\nEven though the previous (pairwise) correlation clustering can consider global aspects of an image using the discriminatively-trained discriminant function, it is restricted in resolving segment boundary ambiguities, since the pairwise graph presents only neighboring pairwise relations. Therefore, to capture long-range dependencies of distant nodes in a global context, this paper proposes a novel higher-order correlation clustering that incorporates higher-order relations. We first apply pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels.\nThe proposed higher-order correlation clustering is defined over a hypergraph in which an edge can connect two or more nodes [12]. Hypergraphs have been previously used to lift certain limitations of conventional pairwise graphs [13, 14, 15]. However, previously proposed hypergraphs for image segmentation are restricted to partitioning based on generalizations of the normalized cut framework, which suffer from a number of problems. 
First, inference is slow and difficult, especially with increasing graph size. A number of algorithms to approximate the inference process have been introduced based on the coarsening algorithm [14] and the hypergraph Laplacian matrices [13]; these are heuristic approaches and therefore sub-optimal. Second, incorporating a supervised learning algorithm for parameter estimation under the spectral hypergraph partitioning framework is difficult. This mirrors the difficulty of learning spectral graph partitioning, which requires a complex and unstable eigenvector approximation that must be differentiable [16, 17]. Third, the use of rich region-based features is restricted: almost all previous hypergraph-based image segmentation algorithms use only color variances as region features.\nThe proposed higher-order correlation clustering overcomes all of these problems, since it generalizes pairwise correlation clustering and makes it possible to take advantage of a hypergraph. The proposed higher-order correlation clustering algorithm uses a hypergraph as its input and leads to a linear discriminant function. A rich feature vector is defined based on several visual cues involving higher-order relations among superpixels. For fast inference, the LP relaxation is used to approximately solve the higher-order correlation clustering problem, and for supervised training of the parameter vector by the S-SVM, we apply a decomposable structured loss function to handle unbalanced classes. We incorporate this loss function into the cutting plane procedure for S-SVM training. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.\nThe rest of the paper is organized as follows. Section 2 presents the higher-order correlation clustering for image segmentation. 
Section 3 describes large margin training for supervised image segmentation based on the S-SVM and the cutting plane algorithm. A number of experimental and comparative results are presented and discussed in Section 4, followed by a conclusion in Section 5.\n\nFigure 1: Illustrations of a part of (a) the pairwise graph and (b) the triplet graph built on superpixels.\n\n2 Higher-order correlation clustering\n\nThe proposed image segmentation is based on superpixels, which are small coherent regions preserving almost all boundaries between different regions; superpixels significantly reduce the computational cost and allow feature extraction to be conducted from a larger homogeneous region. The proposed correlation clustering merges superpixels into disjoint homogeneous regions over a superpixel graph.\n\n2.1 Pairwise correlation clustering over pairwise superpixel graph\n\nDefine a pairwise undirected graph G = (V, E) where a node corresponds to a superpixel and a link between adjacent superpixels corresponds to an edge (see Figure 1(a)). A binary label y_{jk} for an edge (j, k) \in E between nodes j and k is defined such that\n\ny_{jk} = 1 if nodes j and k belong to the same region, and y_{jk} = 0 otherwise. (1)\n\nA discriminant function, which is the negative energy function, is defined over an image x and the labels y of all edges as\n\nF(x, y; w) = \sum_{(j,k) \in E} Sim_w(x, j, k) y_{jk} = \sum_{(j,k) \in E} \langle w, \phi_{jk}(x) \rangle y_{jk} = \langle w, \sum_{(j,k) \in E} \phi_{jk}(x) y_{jk} \rangle = \langle w, \Phi(x, y) \rangle (2)\n\nwhere the similarity measure between nodes j and k, Sim_w(x, j, k), is parameterized by w and takes values of both signs, such that a large positive value means strong similarity while a large negative value means a high degree of dissimilarity. 
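The linear discriminant function (2) is simply a label-weighted sum of per-edge scores. The following minimal Python sketch evaluates it on a toy graph; the features `phi`, weights `w`, and labeling `y` are illustrative assumptions, not values from the paper (which uses a 611-dimensional pairwise feature vector).

```python
# Sketch of the pairwise discriminant function in Eq. (2) on assumed toy data.
import numpy as np

def discriminant(w, phi, y):
    """F(x, y; w) = sum_{(j,k) in E} <w, phi_jk(x)> y_jk = <w, Phi(x, y)>."""
    # Joint feature map Phi(x, y): sum of edge features weighted by edge labels.
    Phi = sum(y[e] * phi[e] for e in phi)
    return float(np.dot(w, Phi))

# Toy graph with three superpixels and three edges; 3-dim features for brevity.
phi = {(0, 1): np.array([0.9, 0.1, 1.0]),
       (1, 2): np.array([0.2, 0.8, 1.0]),
       (0, 2): np.array([0.5, 0.5, 1.0])}
w = np.array([1.0, -1.0, 0.1])  # hypothetical learned parameters

# Label assignment: merge superpixels 0 and 1, keep 2 separate (a valid partition).
y = {(0, 1): 1, (1, 2): 0, (0, 2): 0}
print(discriminant(w, phi, y))  # ~0.9 for this toy labeling
```

Inference then searches for the labeling y, over all valid partitions, that maximizes this score.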
Note that the discriminant function F(x, y; w) is assumed to be linear in both the parameter vector w and the joint feature map \Phi(x, y), and \phi_{jk}(x) is a pairwise feature vector which reflects the correspondence between the jth and the kth superpixels. Image segmentation is then to infer the edge labeling \hat{y} over the pairwise superpixel graph G by maximizing F such that\n\n\hat{y} = argmax_{y \in Y} F(x, y; w) (3)\n\nwhere Y \subset \{0, 1\}^{|E|} is the set of edge labelings that correspond to a valid segmentation, the so-called multicut polytope. However, solving (3) over this Y is generally NP-hard. Therefore, we approximate Y by means of a common multicut LP relaxation [18] with the following two classes of constraints: (1) cycle inequalities and (2) odd-wheel inequalities. When producing segmentation results from the approximated LP solutions, we take the floor of the fractionally-predicted label of each edge independently, a simple way to obtain valid integer solutions that may be sub-optimal.\nEven though this pairwise correlation clustering takes a rich pairwise feature vector and a trained parameter vector (which will be presented later), it often produces incorrectly predicted segments due to segment boundary ambiguities caused by the limited pairwise relations of neighboring superpixels (see Figure 2). Therefore, to incorporate higher-order relations, we develop higher-order correlation clustering by generalizing correlation clustering over a hypergraph.\n\n2.2 Higher-order correlation clustering over hypergraph\n\nThe proposed higher-order correlation clustering is defined over a hypergraph in which an edge, called a hyperedge, can connect two or more nodes. For example, as shown in Figure 1(b), one\n\nFigure 2: Example of segmentation result by pairwise correlation clustering. (a) Original image. (b) Ground-truth. (c) Superpixels. 
(d) Segments obtained by pairwise correlation clustering.\n\ncan introduce binary labels for adjacent vertices forming a triplet such that y_{ijk} = 1 if all vertices in the triplet ({i, j, k}) are in the same cluster, and y_{ijk} = 0 otherwise. Define a hypergraph HG = (V, E) where V is a set of nodes (superpixels) and E is a set of hyperedges (subsets of V) such that \cup_{e \in E} e = V. Here, a hyperedge e has at least two nodes, i.e. |e| \geq 2. Therefore, the hyperedge set E can be divided into two disjoint subsets: the pairwise edge set E_p = {e \in E | |e| = 2} and the higher-order edge set E_h = {e \in E | |e| > 2} such that E_p \cup E_h = E. Note that in the proposed hypergraph for higher-order correlation clustering, all hyperedges containing just two nodes (\forall e_p \in E_p) are links between adjacent superpixels. The pairwise superpixel graph is a special hypergraph in which all hyperedges contain just two (neighboring) superpixels: E_p = E. A binary label y_e for a hyperedge e \in E is defined such that\n\ny_e = 1 if all nodes in e belong to the same region, and y_e = 0 otherwise. (4)\n\nSimilar to the pairwise correlation clustering, a linear discriminant function is defined over an image x and the labels y of all hyperedges as\n\nF(x, y; w) = \sum_{e \in E} Hom_w(x, e) y_e = \sum_{e \in E} \langle w, \phi_e(x) \rangle y_e = \sum_{e_p \in E_p} \langle w_p, \phi_{e_p}(x) \rangle y_{e_p} + \sum_{e_h \in E_h} \langle w_h, \phi_{e_h}(x) \rangle y_{e_h} = \langle w, \Phi(x, y) \rangle (5)\n\nwhere the homogeneity measure among the nodes in e, Hom_w(x, e), is also the inner product of the parameter vector w and the feature vector \phi_e(x) and takes values of both signs, such that a large positive value means strong homogeneity while a large negative value means a high degree of non-homogeneity. 
Note that the proposed discriminant function for higher-order correlation clustering is decomposed into two terms by assigning different parameter vectors to the pairwise edge set E_p and the higher-order edge set E_h, such that w = [w_p^T, w_h^T]^T. Thus, in addition to the pairwise similarity between neighboring superpixels, the proposed higher-order correlation clustering considers broad homogeneous regions reflecting higher-order relations among superpixels.\nNow the problem is how to build the hypergraph from a given image. Here, we use unsupervised multiple partitionings (quantizations) from baseline superpixels. We obtain the multiple partitionings by merging not pixels but superpixels under different image quantizations using the ultrametric contour maps [19]. For example, in Figure 3, there are three region layers, one superpixel (pairwise) layer and two higher-order layers, from which a hypergraph is constructed by defining hyperedges as follows: all edges (black line) in the pairwise superpixel graph from the first layer are incorporated into the pairwise edge set E_p, while hyperedges (yellow line) corresponding to regions (groups of superpixels) in the second and third layers are included in the higher-order edge set E_h. 
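The layered construction above can be sketched as follows; the input encoding (an adjacency list of baseline superpixels plus one superpixel-to-region map per coarser layer) is an assumed toy format, not the paper's data structures.

```python
# Sketch of hypergraph construction from multiple partitioning layers.

def build_hyperedges(adjacency, coarser_layers):
    """Return (E_p, E_h): pairwise edges between adjacent baseline superpixels,
    and higher-order hyperedges, one per region of each coarser layer."""
    E_p = {frozenset(pair) for pair in adjacency}
    E_h = set()
    for layer in coarser_layers:  # each layer: {superpixel_id: region_id}
        regions = {}
        for sp, region in layer.items():
            regions.setdefault(region, set()).add(sp)
        # Only groups of three or more superpixels count as higher-order edges.
        E_h |= {frozenset(m) for m in regions.values() if len(m) > 2}
    return E_p, E_h

# Toy example: 5 baseline superpixels, one coarser layer with two regions.
adjacency = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
layer2 = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}
E_p, E_h = build_hyperedges(adjacency, [layer2])
print(len(E_p), E_h)  # 5 pairwise edges; one hyperedge {0, 1, 2}
```

Region 'B' covers only two superpixels, so it contributes no higher-order edge; its pair is already represented in E_p if the superpixels are adjacent.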
Note that we can further decompose the higher-order term in (5) into two terms associated with the second layer and the third layer, respectively, by assigning different parameter vectors; however, for simplicity, this paper aggregates all higher-order edges from all higher-order layers into a single higher-order edge set with a single shared parameter vector.\n\n2.2.1 LP relaxation for inference\n\nImage segmentation is to infer the hyperedge labeling \hat{y} over the hypergraph HG by maximizing the discriminant function F such that\n\n\hat{y} = argmax_{y \in Y} F(x, y; w) (6)\n\nFigure 3: Hypergraph construction from multiple partitionings. (a) Multiple partitionings from baseline superpixels. (b) Hyperedge (yellow line) corresponding to a region in the second layer. (c) Hyperedge (yellow line) corresponding to a region in the third layer.\n\nwhere Y is again the subset of {0, 1}^{|E|} that corresponds to a valid segmentation. Since the inference problem (6) is also NP-hard, we relax Y by (facet-defining) linear inequalities. 
In addition to the constraints placed on the pairwise labels, namely the cycle inequalities and odd-wheel inequalities of pairwise correlation clustering, we augment constraints on the labels of the higher-order edges, called higher-order inequalities, for a valid segmentation: if a higher-order edge is labeled as zero (non-homogeneous region), the pairwise labels in that region cannot all be one, and when a region is labeled as one (homogeneous region), all pairwise labels in that region should be one. These higher-order inequalities can be formulated as\n\ny_{e_h} \leq y_{e_p}, \forall e_p \in E_p | e_p \subset e_h; \quad (1 - y_{e_h}) \leq \sum_{e_p \in E_p | e_p \subset e_h} (1 - y_{e_p}). (7)\n\nIndeed, the LP relaxation to approximately solve (6) is formulated as\n\nargmax_y \sum_{e_p \in E_p} \langle w_p, \phi_{e_p}(x) \rangle y_{e_p} + \sum_{e_h \in E_h} \langle w_h, \phi_{e_h}(x) \rangle y_{e_h}\ns.t. y_e \in [0, 1], \forall e \in E (= E_p \cup E_h); cycle inequalities, odd-wheel inequalities [18], \forall e_p \in E_p; higher-order inequalities (7), \forall e_h \in E_h. (8)\n\nNote that the proposed higher-order correlation clustering follows the concept of soft constraints: superpixels within a hyperedge are encouraged to merge if the hyperedge is highly homogeneous.\n\n2.2.2 Feature vector\n\nWe construct a 771-dimensional feature vector \phi_e(x) by concatenating several visual cues with different quantization levels and thresholds. The pairwise feature vector \phi_{e_p}(x) reflects the correspondence between neighboring superpixels, and the higher-order feature vector \phi_{e_h}(x) characterizes more complex relations among superpixels in a broader region to measure homogeneity. The magnitude of w determines the importance of each feature, and this importance is task-dependent. Thus, w is estimated by the supervised training described in Section 3.\n\n1. 
Pairwise feature vector (611-dim): \phi_{e_p} = [\phi^c_{e_p}; \phi^t_{e_p}; \phi^s_{e_p}; \phi^e_{e_p}; \phi^v_{e_p}; 1]:\n\n- Color difference \phi^c: The 26 RGB/HSV color distances (absolute differences, \chi^2-distances, earth mover's distances) between two adjacent superpixels.\n- Texture difference \phi^t: The 64 texture distances (absolute differences, \chi^2-distances, earth mover's distances) between two adjacent superpixels using 15 Leung-Malik (LM) filter banks [19].\n- Shape/location difference \phi^s: The 5-dimensional shape/location feature proposed in [20].\n- Edge strength \phi^e: The 1-of-15 coding of the quantized edge strength proposed in [19].\n- Joint visual word posterior \phi^v: The 100-dimensional vector holding the joint visual word posteriors for a pair of neighboring superpixels using 10 visual words, and the 400-dimensional vector holding the joint posteriors based on 20 visual words [21].\n\n2. Higher-order feature vector (160-dim): \phi_{e_h} = [\phi^{va}_{e_h}; \phi^e_{e_h}; \phi^{tm}_{e_h}; 1]:\n\n- Variance \phi^{va}: The 14 color variances and 30 texture variances among superpixels in a hyperedge.\n- Edge strength \phi^e: The 1-of-15 coding of the quantized edge strength proposed in [19].\n- Template matching score \phi^{tm}: The color/texture and shape/location features of all regions in the training images are clustered using k-means with k = 100 to obtain 100 representative templates of distinct regions. 
The 100-dimensional template matching feature vector is composed of the matching scores between a region defined by a hyperedge and the templates, using the Gaussian RBF kernel.\n\nIn each feature vector, the bias (= 1) is appended so that the similarity/homogeneity measure can be either positive or negative.\n\n3 Structural learning\n\nThe proposed discriminant function is defined over the superpixel graph, and therefore the ground-truth segmentation needs to be transformed into ground-truth edge labels on the superpixel graph. For this, we first assign a single dominant segment label to each superpixel by majority voting over the superpixel's constituent pixels and then obtain the ground-truth edge labels.\nUsing these ground-truth edge labels of the training data, the S-SVM [9] is used to estimate the parameter vector. Given N training samples {x^n, y^n}_{n=1}^N, where y^n is the ground-truth edge labeling for the nth training image, the S-SVM optimizes w by minimizing a quadratic objective function subject to a set of linear margin constraints:\n\nmin_{w, \xi} (1/2) ||w||^2 + C \sum_{n=1}^N \xi_n\ns.t. \langle w, \Delta\Phi(x^n, y) \rangle \geq \Delta(y^n, y) - \xi_n, \forall n, \forall y \in Y \setminus y^n; \quad \xi_n \geq 0, \forall n (9)\n\nwhere \Delta\Phi(x^n, y) = \Phi(x^n, y^n) - \Phi(x^n, y), and C > 0 is a constant that controls the trade-off between margin maximization and training error minimization. In the S-SVM, the margin is scaled with a loss \Delta(y^n, y), which is the difference measure between the prediction y and the ground-truth labeling y^n of the nth image. 
The S-SVM offers good generalization ability as well as the flexibility to choose any loss function [9].\nThe cutting plane algorithm [9, 18] with LP relaxation for loss-augmented inference is used to solve the optimization problem of the S-SVM, since the fast convergence and high robustness of the cutting plane algorithm in handling a large number of margin constraints are well-known [22, 23].\nA loss function is usually a non-negative function, and a decomposable loss function is preferred, since it enables the loss-augmented inference in the cutting plane algorithm to be performed efficiently. The most popular decomposable loss function is the Hamming distance, which in this correlation clustering is equivalent to the number of mismatches between y^n and y at the edge level. Unfortunately, in the proposed correlation clustering for image segmentation, the number of edges labeled 1 is considerably higher than the number of edges labeled 0. This imbalance makes other learning methods such as the perceptron algorithm inappropriate, since it leads to clustering the whole image as one segment. The same problem occurs when we use the Hamming loss in the S-SVM.\n\nFigure 4: Obtained evaluation measures from segmentation results on the SBD.\n\nTherefore, we use the following loss function:\n\n\Delta(y^n, y) = \sum_{e_p \in E_p} ( R_p y^n_{e_p} + y_{e_p} - (R_p + 1) y^n_{e_p} y_{e_p} ) + D \sum_{e_h \in E_h} ( R_h y^n_{e_h} + y_{e_h} - (R_h + 1) y^n_{e_h} y_{e_h} ) (10)\n\nwhere D is the relative weight of the loss at the higher-order edge level to that at the pairwise edge level. In addition, R_p and R_h control the relative importance between the incorrect merging of superpixels and the incorrect separation of superpixels by imposing different weights on the false negatives and the false positives. 
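As a sanity check of Eq. (10), a small Python sketch with illustrative toy labelings: a false negative on an edge (ground truth 1, prediction 0) costs R, a false positive costs 1, an agreeing label costs 0, and the higher-order terms are weighted by D.

```python
# Sketch of the decomposable loss in Eq. (10); the edge labelings are made up.

def correlation_loss(y_true, y_pred, pairwise, higher, Rp, Rh, D):
    """Delta(y^n, y) = sum_{ep} (Rp*yn + y - (Rp+1)*yn*y)
                     + D * sum_{eh} (Rh*yn + y - (Rh+1)*yn*y)."""
    def term(e, R):
        yn, y = y_true[e], y_pred[e]
        # 0 if the labels agree; R for a false negative (yn=1, y=0);
        # 1 for a false positive (yn=0, y=1).
        return R * yn + y - (R + 1) * yn * y
    return (sum(term(e, Rp) for e in pairwise)
            + D * sum(term(e, Rh) for e in higher))

pairwise = ['e1', 'e2']               # pairwise edges
higher = ['h1']                       # one higher-order edge
y_true = {'e1': 1, 'e2': 0, 'h1': 1}
y_pred = {'e1': 0, 'e2': 1, 'h1': 1}  # one false negative, one false positive
loss = correlation_loss(y_true, y_pred, pairwise, higher, Rp=0.1, Rh=0.5, D=10)
print(loss)  # 0.1 (missed merge) + 1 (wrong merge) + 0 = 1.1
```

Setting R below 1, as the paper does, makes wrong merges more costly than missed merges, counteracting the dominance of 1-labeled edges.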
Here, we set both R_p and R_h to be less than 1 to overcome this imbalance.\n\n4 Experiments\n\nTo evaluate the segmentations obtained by various algorithms against the ground-truth segmentation, we conducted image segmentation on three benchmark datasets: the Stanford background dataset (SBD) [24], the Berkeley segmentation dataset (BSDS) [25], and the MSRC dataset [26]. For image segmentation based on correlation clustering, we initially obtain baseline superpixels (438 superpixels per image on average) by the gPb contour detector and the oriented watershed transform [19] and then construct a hypergraph. The function parameters are initially set to zero, and then, based on the S-SVM, structured output learning is used to estimate the parameter vectors. Note that the relaxed solutions in loss-augmented inference are used during training, while in testing our simple rounding method is used to produce valid segmentation results. Rounding is only necessary in case we obtain fractional solutions from LP-relaxed correlation clustering.\nWe compared the proposed pairwise/higher-order correlation clustering to the following state-of-the-art image segmentation algorithms: multiscale NCut [27], gPb-owt-ucm [19], and gPb-Hoiem [20], which groups the same superpixels based on pairwise same-label likelihoods. The pairwise same-label likelihoods were independently learnt from the training data with the same 611-dimensional pairwise feature vector. We consider four performance measures: probabilistic Rand index (PRI) [28], variation of information (VOI) [29], segmentation covering (SCO) [19], and boundary displacement error (BDE) [30]. As the predicted segmentation approaches the ground-truth segmentation, PRI and SCO increase while VOI and BDE decrease.\n\n4.1 Stanford background dataset\n\nThe SBD consists of 715 outdoor images with corresponding pixel-wise annotations. 
We employed 5-fold cross-validation with the dataset randomly split into 572 training images and 143 test images for each fold. Figure 4 shows the four measures obtained from the segmentation results according to the average number of regions. Note that performance varies with different numbers of regions, and for this reason we designed each algorithm to produce multiple segmentations (20 to 40 regions). Specifically, multiple segmentations in the proposed algorithm were obtained by varying R_p (0.001~0.2) and R_h (0.1~1.0) in the loss function during training (D = 10). Irrespective of the measure, the proposed higher-order correlation clustering (Corr-Cluster-Higher) performed better than the other algorithms, including the pairwise correlation clustering (Corr-Cluster-Pairwise). Figure 5 shows some example segmentations. The proposed higher-order correlation clustering yielded the best segmentation results. In particular, segments incorrectly predicted by pairwise correlation clustering were reduced in the segmentation results obtained by higher-order correlation clustering\n\nFigure 5: 
Results of image segmentation.\n\nTable 1: Quantitative results on the BSDS test set and on the MSRC test set.\n\nTest set | BSDS: PRI / VOI / SCO / BDE | MSRC: PRI / VOI / SCO / BDE\nMulti-NCut | 0.728 / 3.043 / 0.315 / 14.257 | 0.628 / 2.765 / 0.341 / 11.941\ngPb-owt-ucm | 0.794 / 1.909 / 0.571 / 11.461 | 0.779 / 1.675 / 0.628 / 9.800\ngPb-Hoiem | 0.724 / 3.194 / 0.316 / 14.795 | 0.614 / 2.847 / 0.353 / 13.533\nCorr-Cluster-Pairwise | 0.806 / 1.829 / 0.585 / 11.194 | 0.773 / 1.648 / 0.632 / 9.194\nCorr-Cluster-Higher | 0.814 / 1.743 / 0.599 / 10.377 | 0.784 / 1.594 / 0.648 / 9.040\n\nowing to the consideration of higher-order relations in broad regions. Regarding the runtime of our algorithm, we observed that test-time inference took on average around 15 seconds (graph construction and feature extraction: 14 s; LP: 1 s) per image on a 2.67 GHz processor, whereas the overall training took 10 hours on the training set. Note that other segmentation algorithms such as the multiscale NCut and the gPb-owt-ucm took on average a few minutes per image.\n\n4.2 Berkeley segmentation dataset and MSRC dataset\n\nThe BSDS contains 300 natural images split into 200 training images and 100 test images. Since each image is segmented by multiple human subjects, we defined a single probabilistic (real-valued) ground-truth segmentation of each image for training in the proposed correlation clustering. The MSRC dataset is composed of 591 natural images. We split the data into 45% training, 10% validation, and 45% test sets, following [26]. The performance was evaluated using the clean ground-truth object instance labeling of [31]. On average, all segmentation algorithms were set to produce 30 disjoint regions per image on the BSDS and 15 disjoint regions per image on the MSRC dataset. As shown in Table 1, the proposed higher-order correlation clustering gave the best results on both datasets. 
In particular, the results obtained on the BSDS are similar to or even better than the best results previously reported on the BSDS [32, 19].\n\n5 Conclusion\n\nThis paper proposed higher-order correlation clustering over a hypergraph to merge superpixels into homogeneous regions. The LP relaxation was used to approximately solve the higher-order correlation clustering over a hypergraph, where a rich feature vector was defined based on several visual cues involving higher-order relations among superpixels. The S-SVM was used for supervised training of the parameters in correlation clustering, and the cutting plane algorithm with LP-relaxed inference was applied to solve the optimization problem of the S-SVM. Experimental results showed that the proposed higher-order correlation clustering outperformed other image segmentation algorithms on various datasets. The proposed framework is applicable to a variety of other areas.\n\nAcknowledgments\n\nThis work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0018249).\n\nReferences\n[1] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” PAMI, vol. 24, pp. 881–892, 2002.\n[2] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” PAMI, vol. 24, pp. 603–619, 2002.\n[3] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: image segmentation using expectation-maximization and its application to image querying,” PAMI, vol. 24, pp. 1026–1038, 2002.\n[4] F. 
Estrada and A. Jepson, \u201cSpectral embedding and mincut for image segmentation,\u201d in BMVC, 2004.\n[5] J. Shi and J. Malik, \u201cNormalized cuts and image segmentation,\u201d PAMI, vol. 22, pp. 888\u2013905, 2000.\n[6] P. Felzenszwalb and D. Huttenlocher, \u201cEf\ufb01cient graph-based image segmentation,\u201d IJCV, vol. 59, pp.\n\n167\u2013181, 2004.\n\n[7] F. Estrada and A. Jepson, \u201cBenchmarking image segmentation algorithms,\u201d IJCV, vol. 85, 2009.\n[8] N. Bansal, A. Blum, and S. Chawla, \u201cCorrelation clustering,\u201d Machine Learning, vol. 56, 2004.\n[9] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, \u201cLarge margin methods for structured and\n\nindependent output variables,\u201d JMLR, vol. 6, 2005.\n\n[10] T. Finley and T. Joachims, \u201cSupervised clustering with support vector machines,\u201d in ICML, 2005.\n[11] B. Taskar, \u201cLearning structured prediction models: a large margin approach,\u201d Ph.D. thesis, Stanford\n\nUniversity, 2004.\n\n[12] C. Berge, Hypergraphs, North-Holland, Amsterdam, 1989.\n[13] L. Ding and A. Yilmaz, \u201cImage segmentation as learning on hypergraphs,\u201d in Proc. ICMAL, 2008.\n[14] S. Rital, \u201cHypergraph cuts and unsupervised representation for image segmentation,\u201d Fundamenta Infor-\n\nmaticae, vol. 96, pp. 153\u2013179, 2009.\n\n[15] A. Ducournau, S. Rital, A. Bretto, and B. Laget, \u201cA multilevel spectral hypergraph partitioning approach\n\nfor color image segmentation,\u201d in Proc. ICSIPA, 2009.\n\n[16] F. Bach and M. I. Jordan, \u201cLearning spectral clustering,\u201d in NIPS, 2003.\n[17] T. Cour, N. Gogin, and J. Shi, \u201cLearning spectral graph segmentation,\u201d in AISTATS, 2005.\n[18] S. Nowozin and S. Jegelka, \u201cSolution stability in linear programming relaxations: Graph partitioning and\n\nunsupervised learning,\u201d in ICML, 2009.\n\n[19] P. Arbel\u00b4aez, M. Maire, C. Fowlkes, and J. 
Malik, \u201cContour detection and hierarchical image segmenta-\n\ntion,\u201d PAMI, vol. 33, pp. 898\u2013916, 2011.\n\n[20] D. Hoiem, A. A. Efros, and M. Hebert, \u201cRecovering surface layout from an image,\u201d IJCV, 2007.\n[21] D. Batra, R. Sukthankar, and T. Chen, \u201cLearning class-speci\ufb01c af\ufb01nities for image labelling,\u201d in CVPR,\n\n2008.\n\n[22] T. Finley and T. Joachims, \u201cTraining structural SVMs when exact inference is intractable,\u201d in ICML,\n\n2008.\n\n[23] A. Kulesza and F. Pereira, \u201cStructured learning with approximate inference,\u201d in NIPS, 2007.\n[24] S. Gould, R. Fulton, and D. Koller, \u201cDecomposing a scene into geometric and semantically consistent\n\nregions,\u201d in ICCV, 2009.\n\n[25] C. Fowlkes, D. Martin, and J. Malik, The Berkeley Segmentation Dataset and Benchmark (BSDB),\n\nhttp://www.cs.berkeley.edu/projects/vision/grouping/segbench/.\n\n[26] J. Shotton, J. Winn, C. Rother, and A. Criminisi, \u201cTextonboost: joint apprearence, shape and context\n\nmodeling for multi-class object recognition and segmentation,\u201d in ECCV, 2006.\n\n[27] T. Cour, F. Benezit, and J. Shi, \u201cSpectral segmentation with multiscale graph decomposition,\u201d in CVPR,\n\n2005.\n\n[28] W. M. Rand, \u201cObjective criteria for the evaluation of clustering methods,\u201d Journal of the American\n\nStatistical Association, vol. 66, pp. 846\u2013850, 1971.\n\n[29] M. Meila, \u201cComputing clusterings: An axiomatic view,\u201d in ICML, 2005.\n[30] J. Freixenet, X. Munoz, D. Raba, J. Marti, and X. Cu\ufb01, \u201cYet another survey on image segmentation:\n\nRegion and boundary information integration,\u201d in ECCV, 2002.\n\n[31] T. Malisiewicz and A. A. Efros, \u201cImproving spatial support for objects via multiple segmentations,\u201d in\n\nBMVC, 2007.\n\n[32] T. Kim, K. Lee, and S. 
Lee, \u201cLearning full pairwise af\ufb01nities for spectral segmentation,\u201d in CVPR, 2010.\n\n9\n\n\f", "award": [], "sourceid": 873, "authors": [{"given_name": "Sungwoong", "family_name": "Kim", "institution": null}, {"given_name": "Sebastian", "family_name": "Nowozin", "institution": null}, {"given_name": "Pushmeet", "family_name": "Kohli", "institution": null}, {"given_name": "Chang", "family_name": "Yoo", "institution": null}]}