{"title": "Maximin affinity learning of image segmentation", "book": "Advances in Neural Information Processing Systems", "page_first": 1865, "page_last": 1873, "abstract": "Images can be segmented by first using a classifier to predict an affinity graph that reflects the degree to which image pixels must be grouped together and then partitioning the graph to yield a segmentation. Machine learning has been applied to the affinity classifier to produce affinity graphs that are good in the sense of minimizing edge misclassification rates. However, this error measure is only indirectly related to the quality of segmentations produced by ultimately partitioning the affinity graph. We present the first machine learning algorithm for training a classifier to produce affinity graphs that are good in the sense of producing segmentations that directly minimize the Rand index, a well known segmentation performance measure. The Rand index measures segmentation performance by quantifying the classification of the connectivity of image pixel pairs after segmentation. By using the simple graph partitioning algorithm of finding the connected components of the thresholded affinity graph, we are able to train an affinity classifier to directly minimize the Rand index of segmentations resulting from the graph partitioning. Our learning algorithm corresponds to the learning of maximin affinities between image pixel pairs, which are predictive of the pixel-pair connectivity.", "full_text": "Maximin affinity learning of image segmentation\n\nSrinivas C. Turaga \u2217\n\nMIT\n\nKevin L. Briggman\n\nMax-Planck Institute for Medical Research\n\nMoritz Helmstaedter\n\nMax-Planck Institute for Medical Research\n\nWinfried Denk\n\nMax-Planck Institute for Medical Research\n\nH. 
Sebastian Seung\n\nMIT, HHMI\n\nAbstract\n\nImages can be segmented by first using a classifier to predict an affinity graph that reflects the degree to which image pixels must be grouped together and then partitioning the graph to yield a segmentation. Machine learning has been applied to the affinity classifier to produce affinity graphs that are good in the sense of minimizing edge misclassification rates. However, this error measure is only indirectly related to the quality of segmentations produced by ultimately partitioning the affinity graph. We present the first machine learning algorithm for training a classifier to produce affinity graphs that are good in the sense of producing segmentations that directly minimize the Rand index, a well known segmentation performance measure.\nThe Rand index measures segmentation performance by quantifying the classification of the connectivity of image pixel pairs after segmentation. By using the simple graph partitioning algorithm of finding the connected components of the thresholded affinity graph, we are able to train an affinity classifier to directly minimize the Rand index of segmentations resulting from the graph partitioning. Our learning algorithm corresponds to the learning of maximin affinities between image pixel pairs, which are predictive of the pixel-pair connectivity.\n\n1 Introduction\n\nSupervised learning has emerged as a serious contender in the field of image segmentation, ever since the creation of training sets of images with \u201cground truth\u201d segmentations provided by humans, such as the Berkeley Segmentation Dataset [15]. 
Supervised learning requires 1) a parametrized algorithm that maps images to segmentations, 2) an objective function that quantifies the performance of a segmentation algorithm relative to ground truth, and 3) a means of searching the parameter space of the segmentation algorithm for an optimum of the objective function.\nIn the supervised learning method presented here, the segmentation algorithm consists of a parametrized classifier that predicts the weights of a nearest neighbor affinity graph over image pixels, followed by a graph partitioner that thresholds the affinity graph and finds its connected components. Our objective function is the Rand index [18], which has recently been proposed as a quantitative measure of segmentation performance [23]. We \u201csoften\u201d the thresholding of the classifier output and adjust the parameters of the classifier by gradient learning based on the Rand index.\n\n\u2217sturaga@mit.edu\n\n[Figure 1 schematic: an image is mapped by affinity prediction to a weighted affinity graph, which is thresholded and partitioned into connected components to yield a segmentation; two hypothetical thresholded affinity graphs illustrate split and merge errors.]\n\nFigure 1: (left) Our segmentation algorithm. We first generate a nearest neighbor weighted affinity graph representing the degree to which nearest neighbor pixels should be grouped together. The segmentation is generated by finding the connected components of the thresholded affinity graph. (right) Affinity misclassification rates are a poor measure of segmentation performance. 
Affinity graph #1 makes only 1 error (dashed edge) but results in poor segmentations, while graph #2 generates a perfect segmentation despite making many affinity misclassifications (dashed edges).\n\nBecause maximin edges of the affinity graph play a key role in our learning method, we call it maximin affinity learning of image segmentation, or MALIS. The minimax path and edge are standard concepts in graph theory, and maximin is the opposite-sign sibling of minimax. Hence our work can be viewed as a machine learning application of these graph theoretic concepts. MALIS focuses on improving classifier output at maximin edges, because classifying these edges incorrectly leads to genuine segmentation errors, the splitting or merging of segments.\nTo the best of our knowledge, MALIS is the first supervised learning method that is based on optimizing a genuine measure of segmentation performance. The idea of training a classifier to predict the weights of an affinity graph is not novel. Affinity classifiers were previously trained to minimize the number of misclassified affinity edges [9, 16]. This is not the same as optimizing segmentations produced by partitioning the affinity graph. There have been attempts to train affinity classifiers to produce good segmentations when partitioned by normalized cuts [17, 2]. But these approaches do not optimize a genuine measure of segmentation performance such as the Rand index. The work of Bach and Jordan [2] is the closest to our work. However, they only minimize an upper bound to a renormalized version of the Rand index. Both approaches require many approximations to make the learning tractable.\nIn other related work, classifiers have been trained to optimize performance at detecting image pixels that belong to object boundaries [16, 6, 14]. 
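The maximin edges named above have a compact algorithmic characterization: if edges are processed in decreasing order of affinity with a union-find structure, the first edge to join the components containing a pair of pixels is that pair's maximin edge, and its affinity is the pair's maximin affinity. The following is a hypothetical Python sketch of this idea on a toy graph; the function and class names are ours, not the paper's.

```python
# Hypothetical sketch (not the paper's implementation): maximin affinities
# for all node pairs of a small weighted graph. The maximin affinity of a
# pair is the maximum over all connecting paths of the minimum affinity
# along the path.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

def maximin_affinities(n_nodes, edges):
    """edges: iterable of (affinity, u, v); returns {(u, v): maximin affinity}."""
    uf = UnionFind(n_nodes)
    members = {i: [i] for i in range(n_nodes)}  # nodes in each component
    maximin = {}
    for affinity, u, v in sorted(edges, reverse=True):
        ru, rv = uf.find(u), uf.find(v)
        if ru == rv:
            continue  # u and v were already connected at a higher affinity
        # this edge is the maximin edge for every pair straddling the components
        for a in members[ru]:
            for b in members[rv]:
                maximin[min(a, b), max(a, b)] = affinity
        merged = members.pop(ru) + members.pop(rv)
        uf.parent[ru] = rv
        members[rv] = merged
    return maximin
```

For example, on a 4-node path graph with affinities 0.9, 0.2, 0.8, the maximin affinity of the end nodes is 0.2, the weakest edge on the only path between them.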
Our classifier can also be viewed as a boundary detector, since a nearest neighbor affinity graph is essentially the same as a boundary map, up to a sign inversion. However, we combine our classifier with a graph partitioner to produce segmentations. The classifier parameters are not trained to optimize performance at boundary detection, but to optimize performance at segmentation as measured by the Rand index.\nThere are also methods for supervised learning of image labeling using Markov or conditional random fields [10]. But image labeling is more similar to multi-class pixel classification than to image segmentation, as the latter task may require distinguishing between multiple objects in a single image that all have the same label.\nIn the cases where probabilistic random field models have been used for image parsing and segmentation, the models have either been simplistic for tractability reasons [12] or have been trained piecemeal. For instance, Tu et al. [22] separately train low-level discriminative modules based on a boosting classifier, and train high-level modules of their algorithm to model the joint distribution of the image and the labeling. These models have never been trained to minimize the Rand index.\n\n2 Partitioning a thresholded affinity graph by connected components\n\nOur class of segmentation algorithms is constructed by combining a classifier and a graph partitioner (see Figure 1). The classifier is used to generate the weights of an affinity graph. The nodes of the graph are image pixels, and the edges are between nearest neighbor pairs of pixels. The weights of the edges are called affinities. A high affinity means that the two pixels tend to belong to the same segment. 
The classifier computes the affinity of each edge based on an image patch surrounding the edge.\nThe graph partitioner first thresholds the affinity graph by removing all edges with weights less than some threshold value \u03b8. The connected components of this thresholded affinity graph are the segments of the image.\nFor this class of segmentation algorithms, it\u2019s obvious that a single misclassified edge of the affinity graph can dramatically alter the resulting segmentation by splitting or merging two segments (see Fig. 1). This is why it is important to learn by optimizing a measure of segmentation performance rather than affinity prediction.\nWe are well aware that connected components is an exceedingly simple method of graph partitioning. More sophisticated algorithms, such as spectral clustering [20] or graph cuts [3], might be more robust to misclassifications of one or a few edges of the affinity graph. Why not use them instead? We have two replies to this question.\nFirst, because of the simplicity of our graph partitioning, we can derive a simple and direct method of supervised learning that optimizes a true measure of image segmentation performance. So far learning based on more sophisticated graph partitioning methods has fallen short of this goal [17, 2].\nSecond, even if it were possible to properly learn the affinities used by more sophisticated graph partitioning methods, we would still prefer our simple connected components. The classifier in our segmentation algorithm can also carry out sophisticated computations, if its representational power is sufficiently great. Putting the sophistication in the classifier has the advantage of making it learnable, rather than hand-designed.\nThe sophisticated partitioning methods clean up the affinity graph by using prior assumptions about the properties of image segmentations. But these prior assumptions could be incorrect. 
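The threshold-and-connected-components partitioner described in this section is simple enough to sketch in a few lines of Python. This is an illustrative toy with assumed array conventions (in a real system the affinities would come from the trained classifier, and the paper's nearest neighbor graph is over 2D or 3D pixels).

```python
# Illustrative sketch of the graph partitioner: threshold a nearest-neighbor
# affinity graph and label its connected components. aff_x[y][x] is the
# affinity of the edge between pixels (y, x) and (y, x+1); aff_y[y][x] is
# the affinity between (y, x) and (y+1, x). Edges with affinity below theta
# are removed; each remaining connected component becomes one segment.

def connected_components_segmentation(aff_x, aff_y, theta):
    h = len(aff_y) + 1        # image height: one more row than aff_y
    w = len(aff_x[0]) + 1     # image width: one more column than aff_x
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue
            next_label += 1
            labels[sy][sx] = next_label
            stack = [(sy, sx)]
            while stack:  # flood fill over edges that survive the threshold
                y, x = stack.pop()
                neighbors = []
                if x + 1 < w and aff_x[y][x] >= theta:
                    neighbors.append((y, x + 1))
                if x > 0 and aff_x[y][x - 1] >= theta:
                    neighbors.append((y, x - 1))
                if y + 1 < h and aff_y[y][x] >= theta:
                    neighbors.append((y + 1, x))
                if y > 0 and aff_y[y - 1][x] >= theta:
                    neighbors.append((y - 1, x))
                for ny, nx in neighbors:
                    if not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
    return labels
```

On a 1x4 image with horizontal affinities (0.9, 0.1, 0.8) and \u03b8 = 0.5, the weak middle edge is removed and the result is two segments: pixels {0, 1} and {2, 3}.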
The spirit of the machine learning approach is to use a large amount of training data and minimize the use of prior assumptions. If the sophisticated partitioning methods are indeed the best way of achieving good segmentation performance, we suspect that our classifier will learn them from the training data. If they are not the best way, we hope that our classifier will do even better.\n\n3 The Rand index quantifies segmentation performance\n\nImage segmentation can be viewed as a special case of the general problem of clustering, as image segments are clusters of image pixels. Long ago, Rand proposed an index of similarity between two clusterings [18]. Recently it has been proposed that the Rand index be applied to image segmentations [23]. Define a segmentation S as an assignment of a segment label si to each pixel i. The indicator function \u03b4(si, sj) is 1 if pixels i and j belong to the same segment (si = sj) and 0 otherwise. Given two segmentations S and \u02c6S of an image with N pixels, define the function\n\n1 \u2212 RI(\u02c6S, S) = \binom{N}{2}^{\u22121} \u2211_{i<j} |\u03b4(si, sj) \u2212 \u03b4(\u02c6si, \u02c6sj)|    (1)
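The quantity labeled (1) can be evaluated directly by brute force over all pixel pairs: count the pairs whose same-segment indicator disagrees between the two segmentations and normalize by the number of pairs. A short illustrative sketch (function name and O(N\u00b2) pair enumeration are ours; for large images one would use per-segment counts instead):

```python
# Illustrative sketch: 1 - RI between a segmentation seg_hat and a ground
# truth seg, both given as flat lists of segment labels, one per pixel.
# A pair of pixels disagrees when it is in the same segment in one
# segmentation but in different segments in the other.
from itertools import combinations

def rand_index_error(seg_hat, seg):
    """Fraction of the N(N-1)/2 pixel pairs whose connectivity differs."""
    n = len(seg)
    disagree = sum(
        (seg[i] == seg[j]) != (seg_hat[i] == seg_hat[j])
        for i, j in combinations(range(n), 2)
    )
    return disagree / (n * (n - 1) / 2)
```

Note that the measure depends only on which pixels are grouped together, not on the label values themselves: relabeling the segments of either segmentation leaves the error unchanged.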