{"title": "Learning to Detect Natural Image Boundaries Using Brightness and Texture", "book": "Advances in Neural Information Processing Systems", "page_first": 1279, "page_last": 1286, "abstract": null, "full_text": "Learning to Detect Natural Image Boundaries\n\nUsing Brightness and Texture\n\nDavid R. Martin Charless C. Fowlkes\n\nJitendra Malik\n\nComputer Science Division, EECS, U.C. Berkeley, Berkeley, CA 94720\n\n dmartin,fowlkes,malik\n\n@cs.berkeley.edu\n\nAbstract\n\nThe goal of this work is to accurately detect and localize boundaries in\nnatural scenes using local image measurements. We formulate features\nthat respond to characteristic changes in brightness and texture associated\nwith natural boundaries. In order to combine the information from these\nfeatures in an optimal way, a classi\ufb01er is trained using human labeled\nimages as ground truth. We present precision-recall curves showing that\nthe resulting detector outperforms existing approaches.\n\n1 Introduction\n\nConsider the image patches in Figure 1. Though they lack global context, it is clear which\ncontain boundaries and which do not. The goal of this paper is to use features extracted\nfrom the image patch to estimate the posterior probability of a boundary passing through\nthe center point. Such a local boundary model is integral to higher-level segmentation algo-\nrithms, whether based on grouping pixels into regions [21, 8] or grouping edge fragments\ninto contours [22, 16].\n\nThe traditional approach to this problem is to look for discontinuities in image brightness.\nFor example, the widely employed Canny detector [2] models boundaries as brightness step\nedges. The image patches show that this is an inadequate model for boundaries in natural\nimages, due to the ubiquitous phenomenon of texture. The Canny detector will \ufb01re wildly\ninside textured regions where high-contrast contours are present but no boundary exists. 
In addition, it is unable to detect the boundary between textured regions when there is only a subtle change in average image brightness.\n\nThese significant problems have led researchers to develop boundary detectors that explicitly model texture. While these work well on synthetic Brodatz mosaics, they have problems in the vicinity of brightness edges. Texture descriptors over local windows that straddle a boundary have different statistics from windows contained in either of the neighboring regions. This results in thin halo-like regions being detected around contours.\n\nClearly, boundaries in natural images are marked by changes in both texture and brightness. Evidence from psychophysics [18] suggests that humans make combined use of these two cues to improve detection and localization of boundaries. There has been limited work in computational vision on addressing the difficult problem of cue combination. For example, the authors of [8] associate a measure of texturedness with each point in an image in order to suppress contour processing in textured regions and vice versa. However, their solution is full of ad-hoc design decisions and hand-chosen parameters.\n\nThe main contribution of this paper is to provide a more principled approach to cue combination by framing the task as a supervised learning problem. A large dataset of natural images that have been manually segmented by multiple human subjects [10] provides the ground truth label for each pixel as being on- or off-boundary. The task is then to model the probability of a pixel being on-boundary conditioned on some set of locally measured image features. This sort of quantitative approach to learning and evaluating boundary detectors is similar to the work of Konishi et al. [7] using the Sowerby dataset of English countryside scenes. 
Our work is distinguished by an explicit treatment of texture and brightness, enabling superior performance on a more diverse collection of natural images.\n\nThe outline of the paper is as follows. In Section 2 we describe the oriented energy and texture gradient features used as input to our algorithm. Section 3 discusses the classifiers we use to combine the local features. Section 4 presents our evaluation methodology along with a quantitative comparison of our method to existing boundary detection methods. We conclude in Section 5.\n\n2 Image Features\n\n2.1 Oriented Energy\n\nIn natural images, brightness edges are more than simple steps. Phenomena such as specularities, mutual illumination, and shading result in composite intensity profiles consisting of steps, peaks, and roofs. The oriented energy (OE) approach [12] can be used to detect and localize these composite edges [14]. OE is defined as:\n\nOE_{θ,σ} = (I * f^e_{θ,σ})^2 + (I * f^o_{θ,σ})^2\n\nwhere f^e_{θ,σ} and f^o_{θ,σ} are a quadrature pair of even- and odd-symmetric filters at orientation θ and scale σ. Our even-symmetric filter is a Gaussian second-derivative, and the corresponding odd-symmetric filter is its Hilbert transform. OE_{θ,σ} has maximum response for contours at orientation θ. We compute OE at 3 half-octave scales starting at a scale σ that is a small fixed fraction of the image diagonal. The filters are elongated by a ratio of 3:1 along the putative boundary direction.\n\n2.2 Texture Gradient\n\nWe would like a directional operator that measures the degree to which texture varies at a location (x, y) in direction θ. 
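In one dimension, the oriented energy computation of Section 2.1 can be sketched as follows. This is a minimal NumPy sketch: the quadrature pair is a Gaussian second derivative and its Hilbert transform (computed here with an FFT sign trick), and the scale and support sizes are illustrative rather than the paper's settings.

```python
import numpy as np

def gaussian_second_deriv(sigma, half_width):
    """Even-symmetric filter: second derivative of a Gaussian."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return (x**2 / sigma**4 - 1 / sigma**2) * g

def hilbert_pair(f_even):
    """Odd-symmetric quadrature partner of an even filter,
    via the FFT sign trick: multiply spectrum by -i*sign(freq)."""
    F = np.fft.fft(f_even)
    sign = np.sign(np.fft.fftfreq(len(f_even)))
    return np.real(np.fft.ifft(F * (-1j) * sign))

def oriented_energy_1d(signal, sigma=2.0):
    """OE(x) = (I * f_e)^2 + (I * f_o)^2 along one scan line."""
    f_e = gaussian_second_deriv(sigma, half_width=int(4 * sigma))
    f_o = hilbert_pair(f_e)
    r_e = np.convolve(signal, f_e, mode="same")
    r_o = np.convolve(signal, f_o, mode="same")
    return r_e**2 + r_o**2  # quadrature energy: phase-independent
```

Because the pair is in quadrature, the energy response does not depend on the phase of the intensity profile, so steps, peaks, and roofs all produce a single energy peak at the feature location.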
A natural way to operationalize this is to consider a disc of radius σ centered on (x, y), and divided in two along a diameter at orientation θ. We can then compare the texture in the two half-discs with some texture dissimilarity measure. Oriented texture processing along these lines has been pursued by [19].\n\nWhat texture dissimilarity measure should one use? There is an emerging consensus that for texture analysis, an image should first be convolved with a bank of filters tuned to various orientations and spatial frequencies [4, 9]. After filtering, a texture descriptor is then constructed using the empirical distribution of filter responses in the neighborhood of a pixel. This approach has been shown to be very powerful both for texture synthesis [5] as well as texture discrimination [15].\n\nPuzicha et al. [15] evaluate a wide range of texture descriptors in this framework. We choose the approach developed in [8]. Convolution with a filter bank containing both even and odd filters at multiple orientations as well as a radially symmetric center-surround filter associates a vector of filter responses to every pixel. These vectors are clustered using k-means and each pixel is assigned to one of the cluster centers, or textons. Texture dissimilarities can then be computed by comparing the histograms of textons in the two disc halves. Let g_i and h_i count how many pixels of texton type i occur in each half disc. We define the texture gradient (TG) to be the χ² distance between these two histograms:\n\nTG(x, y, r, θ) = χ²(g, h) = (1/2) Σ_i (g_i − h_i)² / (g_i + h_i)\n\nThe texture gradient is computed at each pixel (x, y) over 12 orientations and 3 half-octave scales, with the smallest disc radius a small fixed fraction of the image diagonal.\n\n[Figure 1: rows of image patches (non-boundaries above, boundaries below), with panels labeled Image and Intensity followed by feature profiles.]\n\nFigure 1: Local image features. In each row, the first panel shows the image patch. The following panels show feature profiles along the line marked in each patch. The features are raw image intensity, raw oriented energy, localized oriented energy, raw texture gradient, and localized texture gradient. The vertical line in each profile marks the patch center. The challenge is to combine these features in order to detect and localize boundaries.\n\n2.3 Localization\n\nThe underlying function we are trying to learn is tightly peaked around the location of image boundaries marked by humans. In contrast, Figure 1 shows that the features we have discussed so far don't have this structure. By nature of the fact that they pool information over some support, they produce smooth, spatially extended outputs. The texture gradient is particularly prone to this effect, since the texture in a window straddling the boundary is distinctly different than the textures on either side of the boundary. This often results in a wide plateau or even double peaks in the texture gradient.\n\nSince each pixel is classified independently, these spatially extended features are particularly problematic as both on-boundary pixels and nearby off-boundary pixels will have large OE and TG. In order to make this spatial structure available to the classifier we transform the raw OE and TG signals in order to emphasize local maxima. 
Given a feature f(x) defined over the spatial coordinate x orthogonal to the edge orientation, consider the derived feature f̃(x) = f(x)/d(x), where d(x) = |f'(x)/f''(x)| is the first-order approximation of the distance to the nearest maximum of f(x). By incorporating the localization term d(x), f̃(x) will have narrower peaks than the raw f(x). We use the stabilized version\n\nf̃(x) = f(x) |f''(x)| / (|f'(x)| + ε)    (1)\n\nwith ε chosen to optimize the performance of the feature.\n\nTo robustly estimate the directional derivatives and localize the peaks, we fit a cylindrical parabola over a circular window of radius r centered at each pixel. The coefficients of the quadratic fit provide the signal derivatives directly, so the transform above requires only half-wave rectification of the fitted terms.1 This transformation is applied to the oriented energy and texture gradient signals at each orientation θ and scale σ separately. In order to set r and ε, we optimized the performance of each feature independently with respect to the training data.2\n\nColumns 4 and 6 in Figure 1 show the results of applying this transformation, which clearly has the effect of reducing noise and tightly localizing the boundaries. Our final feature set consists of these localized signals, each at three scales. 
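The localization transform can be sketched as follows. This sketch assumes the stabilized form f·|f''|/(|f'| + ε) and uses simple finite differences in place of the cylindrical parabola fit; the half-wave rectification keeps only the downward curvature that marks a peak.

```python
import numpy as np

def localize(f, dx=1.0, eps=0.1):
    """Sharpen a smooth feature profile by dividing out the
    stabilized first-order distance-to-maximum |f'| / |f''|."""
    f1 = np.gradient(f, dx)            # first derivative f'
    f2 = np.gradient(f1, dx)           # second derivative f''
    curvature = np.maximum(-f2, 0.0)   # half-wave rectify: peaks only
    return f * curvature / (np.abs(f1) + eps)
```

Applied to a broad bump, the output keeps its maximum at the same location but with a much narrower peak, which is the spatial structure the per-pixel classifier needs.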
This yields a 6-element feature vector at 12 orientations at each pixel.\n\n3 Cue Combination Using Classifiers\n\nWe would like to combine the cues given by the local feature vector in order to estimate the posterior probability of a boundary at each image location (x, y, θ). Previous work on learning boundary models includes [11, 7]. We consider several parametric and non-parametric models, covering a range of complexity and computational cost. The simplest are able to capture the complementary information in the 6 features. The more powerful classifiers have the potential to capture non-linear cue "gating" effects. For example, one may wish to ignore brightness edges inside high-contrast textures where OE is high and TG is low. These are the classifiers we use:\n\nDensity Estimation Adaptive bins are provided by vector quantization using k-means. Each centroid provides the density estimate of its Voronoi cell as the fraction of on-boundary samples in the cell. We use k=128 and average the estimates from 10 runs.\n\nClassification Trees The domain is partitioned hierarchically. Top-down axis-parallel splits are made so as to maximize the information gain. A 5% bound on the error of the density estimate is enforced by splitting cells only when both classes have at least 400 points present.\n\nLogistic Regression This is the simplest of our classifiers, and the one perhaps most easily replicated by neurons in the visual cortex. Initialization is random, and convergence is fast and reliable by maximizing the likelihood. We also consider two variants: quadratic combinations of features, and boosting using the confidence-rated generalization of AdaBoost by Schapire and Singer [20]. 
No more than 10 rounds of boosting are required for this problem.\n\nHierarchical Mixtures of Experts The HME model of Jordan and Jacobs [6] is a mixture model where both the components and mixing coefficients are fit by logistic functions. We consider small binary trees up to a depth of 3 (8 experts). The model is initialized in a greedy, top-down manner and fit with EM.\n\nSupport Vector Machines We use the SVM package libsvm [3] to do soft-margin classification using Gaussian kernels. The optimal values for both tunable parameters were 0.2.\n\n1 Windowed parabolic fitting is known as 2nd-order Savitsky-Golay filtering. We also considered Gaussian derivative filters to estimate the derivatives, with nearly identical results.\n\n2 The fitted values are ε = {0.1, 0.075, 0.013} and r = {2.1, 2.5, 3.1} for OE, and ε = {.057, .016, .005} and r = {6.66, 9.31, 11.72} for TG. r is measured in pixels.\n\n[Figure 2: precision-recall curves. Left, raw features: all F=.65, oe0 F=.59, oe1 F=.60, oe2 F=.61, tg0 F=.64, tg1 F=.64, tg2 F=.61. Right, localized features: all F=.67, oe0 F=.60, oe1 F=.62, oe2 F=.63, tg0 F=.65, tg1 F=.65, tg2 F=.63.]\n\nFigure 2: Performance of raw (left) and localized features (right). The precision and recall axes are described in Section 4. Curves towards the top (lower noise) and right (higher accuracy) are more desirable. Each curve is scored by the F-measure, the value of which is shown in the legend. In all the precision-recall graphs in this paper, the maximum F-measure occurs at a recall of approximately 75%. The left plot shows the performance of the raw OE and TG features using the logistic regression classifier. The right plot shows the performance of the features after applying the localization process of Equation 1. It is clear that the localization function greatly improves the quality of the individual features, especially the texture gradient. The top curve in each graph shows the performance of the features in combination. While tuning each feature's ε and r parameters individually is suboptimal, overall performance still improves.\n\nThe ground truth boundary data is based on the dataset of [10] which provides 5-6 human segmentations for each of 1000 natural images from the Corel image database. We used 200 images for training and algorithm development. The 100 test images were used only to generate the final results for this paper. The authors of [10] show that the segmentations of a single image by the different subjects are highly consistent, so we consider all human-marked boundaries valid. We declare an image location (x, y, θ) to be on-boundary if it is within Δx = 2 pixels and Δθ = 30 degrees of any human-marked boundary. The remainder are labeled off-boundary.\n\nThis classification task is characterized by relatively low dimension, a large amount of data (100M samples for our 240x160-pixel images), and poor separability. The maximum feasible amount of data, uniformly sampled, is given to each classifier. This varies from 50M samples for density estimation to 20K samples for the SVM. 
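The plain logistic model of Section 3 can be sketched as follows. This is a hypothetical NumPy sketch trained on synthetic two-feature data standing in for the sampled image features; the learning rate and step count are illustrative, not the paper's settings.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=500):
    """Fit P(boundary | features) = sigmoid(w.x + b) by batch
    gradient ascent on the log-likelihood."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted posterior
        w += lr * (X.T @ (y - p) / n)            # log-likelihood gradient
        b += lr * np.mean(y - p)
    return w, b

def predict_posterior(X, w, b):
    """Posterior probability of boundary for each feature vector."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

Because the model outputs a probability rather than a hard label, its response can be thresholded anywhere along the precision-recall curve, or passed on directly to higher-level algorithms.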
Note that a high degree of class overlap in any local feature space is inevitable because the human subjects make use of both global constraints and high-level information to resolve locally ambiguous boundaries.\n\n4 Results\n\nThe output of each classifier is a set of oriented P_b images, which provide the probability of a boundary at each image location (x, y, θ) based on local information. For several of the classifiers we consider, the P_b image provides actual posterior probabilities, which is particularly appropriate for the local measurement model in higher-level vision applications. For the purpose of evaluation, we take the maximum P_b over orientations.\n\n[Figure 3: precision-recall curves. (a) Feature combinations: all F=.67, oe2+tg1 F=.67, tg* F=.66, oe* F=.63. (b) Classifiers: Density Estimation F=.68, Classification Tree F=.68, Logistic Regression F=.67, Quadratic LR F=.68, Boosted LR F=.68, Hier. Mix. of Experts F=.68, Support Vector Machine F=.66.]\n\nFigure 3: Precision-recall curves for (a) different feature combinations, and (b) different classifiers. The left panel shows the performance of different combinations of the localized features using the logistic regression classifier: the 3 OE features (oe*), the 3 TG features (tg*), the best performing single OE and TG features (oe2+tg1), and all 6 features together. There is clearly independent information in each feature, but most of the information is captured by the combination of one OE and one TG feature. The right panel shows the performance of different classifiers using all 6 features. All the classifiers achieve similar performance, except for the SVM which suffers from the poor separation of the data. Classification trees perform the best by a slim margin. Based on performance, simplicity, and low computation cost, we favor the logistic regression and its variants.\n\nIn order to evaluate the boundary model against the human ground truth, we use the precision-recall framework, a standard evaluation technique in the information retrieval community [17]. It is closely related to the ROC curves used by [1] to evaluate boundary models. The precision-recall curve captures the trade-off between accuracy and noise as the detector threshold is varied. Precision is the fraction of detections which are true positives, while recall is the fraction of positives that are detected. These are computed using a distance tolerance of 2 pixels to allow for small localization errors in both the machine and human boundary maps.\n\nThe precision-recall curve is particularly meaningful in the context of boundary detection when we consider applications that make use of boundary maps, such as stereo or object recognition. It is reasonable to characterize higher level processing in terms of how much true signal is required to succeed, and how much noise can be tolerated. Recall provides the former and precision the latter. A particular application will define a relative cost α between these quantities, which focuses attention at a specific point on the precision-recall curve. The F-measure, defined as F = PR / (αR + (1 − α)P), captures this trade-off. The location of the maximum F-measure along the curve provides the optimal threshold given α, which we set to 0.5 in our experiments.\n\nFigure 2 shows the performance of the raw and localized features. 
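The precision, recall, and F-measure computations just defined can be sketched as follows. This is a simplified 1-D stand-in: the real evaluation matches 2-D machine and human boundary maps, but the tolerance-based matching and the F-measure formula are the same.

```python
import numpy as np

def precision_recall(detected, truth, tol=2.0):
    """Match 1-D detection coordinates to ground-truth coordinates
    within a distance tolerance. Assumes both lists are non-empty."""
    detected = np.asarray(detected, float)
    truth = np.asarray(truth, float)
    # fraction of detections within tol of some true boundary
    precision = np.mean([np.min(np.abs(truth - d)) <= tol for d in detected])
    # fraction of true boundaries within tol of some detection
    recall = np.mean([np.min(np.abs(detected - t)) <= tol for t in truth])
    return precision, recall

def f_measure(precision, recall, alpha=0.5):
    """F = PR / (alpha*R + (1 - alpha)*P); the harmonic mean of
    precision and recall when alpha = 0.5."""
    return precision * recall / (alpha * recall + (1 - alpha) * precision)
```

Sweeping the detector threshold and recomputing these quantities traces out the precision-recall curve; the maximum F along the curve picks the operating threshold.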
This provides a clear quantitative justification for the localization process described in Section 2.3. Figure 3a shows the performance of various linear combinations of the localized features. The combination of multiple scales improves performance, but the largest gain comes from using OE and TG together.\n\n[Figure 4: left, precision-recall curves (Human F=.75, Us F=.67, Nitzberg F=.65, Canny F=.57); right, F-measure versus distance tolerance in pixels for the same four.]\n\nFigure 4: The left panel shows precision-recall curves for a variety of boundary detection schemes, along with the precision and recall of the human segmentations when compared with each other. The right panel shows the F-measure of each detector as the distance tolerance for measuring precision and recall varies. We take the Canny detector as the baseline due to its widespread use. Our detector outperforms the learning-based Nitzberg detector proposed by Konishi et al. [7], but there is still a significant gap with respect to human performance.\n\nThe results presented so far use the logistic regression classifier. Figure 3b shows the performance of the 7 different classifiers on the complete feature set. The most obvious trend is that they all perform similarly. The simple non-parametric models – the classification tree and density estimation – perform the best, as they are most able to make use of the large quantity of training data to provide unbiased estimates of the posterior. The plain logistic regression model performs extremely well, with the variants of logistic regression – quadratic, boosted, and HME – performing only slightly better. 
The SVM is a disappointment because of its lower performance, high computational cost, and fragility. These problems result from the non-separability of the data, which requires about 20% of the training examples to be used as support vectors. Balancing considerations of performance, model complexity, and computational cost, we favor the logistic model and its variants.3\n\nFigure 4 shows the performance of our detector compared to two other approaches. Because of its widespread use, MATLAB's implementation of the classic Canny [2] detector forms the baseline. We also consider the Nitzberg detector [13, 7], since it is based on a similar supervised learning approach, and Konishi et al. [7] show that it outperforms previous methods. To make the comparisons fair, the parameters of both Canny and Nitzberg were optimized using the training data. For Canny, this amounts to choosing the optimal scale. The Nitzberg detector generates a feature vector containing eigenvalues of the 2nd moment matrix; we train a classifier on these 2 features using logistic regression.\n\nFigure 4 also shows the performance of the human data as an upper-bound for the algorithms. The human precision-recall points are computed for each segmentation by comparing it to the other segmentations of the same image. The approach of this paper is a clear improvement over the state of the art in boundary detection, but it will take the addition of high-level and global information to close the gap between the machine and human performance.\n\n3 The fitted coefficients for the logistic are {.088, -.029, .019} for OE and {.31, .26, .27} for TG, with an offset of -2.79. The features have been separately normalized to have unit variance.\n\n5 Conclusion\n\nWe have defined a novel set of brightness and texture cues appropriate for constructing a local boundary model. 
By using a very large dataset of human-labeled boundaries in natural images, we have formulated the task of cue combination for local boundary detection as a supervised learning problem. This approach models the true posterior probability of a boundary at every image location and orientation, which is particularly useful for higher-level algorithms. Based on a quantitative evaluation on 100 natural images, our detector outperforms existing methods.\n\nReferences\n\n[1] K. Bowyer, C. Kranenburg, and S. Dougherty. Edge detector evaluation using empirical ROC curves. Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 1999.\n\n[2] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 8:679–698, 1986.\n\n[3] C. Chang and C. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.\n\n[4] I. Fogel and D. Sagi. Gabor filters as texture discriminator. Bio. Cybernetics, 61:103–13, 1989.\n\n[5] D. J. Heeger and J. R. Bergen. Pyramid-based texture analysis/synthesis. In Proceedings of SIGGRAPH '95, pages 229–238, 1995.\n\n[6] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181–214, 1994.\n\n[7] S. Konishi, A. L. Yuille, J. Coughlan, and S. C. Zhu. Fundamental bounds on edge detection: an information theoretic evaluation of different edge cues. Proc. IEEE Conf. Comput. Vision and Pattern Recognition, pages 573–579, 1999.\n\n[8] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for image segmentation. Int'l. Journal of Computer Vision, 43(1):7–27, June 2001.\n\n[9] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Optical Society of America, 7(2):923–32, May 1990.\n\n[10] D. Martin, C. Fowlkes, D. Tal, and J. Malik. 
A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l. Conf. Computer Vision, volume 2, pages 416–423, July 2001.\n\n[11] M. Meilă and J. Shi. Learning segmentation by random walks. In NIPS, 2001.\n\n[12] M.C. Morrone and D.C. Burr. Feature detection in human vision: a phase dependent energy model. Proc. R. Soc. Lond. B, 235:221–45, 1988.\n\n[13] M. Nitzberg, D. Mumford, and T. Shiota. Filtering, Segmentation and Depth. Springer-Verlag, 1993.\n\n[14] P. Perona and J. Malik. Detecting and localizing edges composed of steps, peaks and roofs. In Proc. Int. Conf. Computer Vision, pages 52–7, Osaka, Japan, Dec 1990.\n\n[15] J. Puzicha, T. Hofmann, and J. Buhmann. Non-parametric similarity measures for unsupervised texture segmentation and image retrieval. In Computer Vision and Pattern Recognition, 1997.\n\n[16] X. Ren and J. Malik. A probabilistic multi-scale model for contour completion based on image statistics. Proc. 7th Europ. Conf. Comput. Vision, 2002.\n\n[17] C. Van Rijsbergen. Information Retrieval, 2nd ed. Dept. of Comp. Sci., Univ. of Glasgow, 1979.\n\n[18] J. Rivest and P. Cavanagh. Localizing contours defined by more than one attribute. Vision Research, 36(1):53–66, 1996.\n\n[19] Y. Rubner and C. Tomasi. Coalescing texture descriptors. ARPA Image Understanding Workshop, 1996.\n\n[20] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999.\n\n[21] Z. Tu, S. Zhu, and H. Shum. Image segmentation by data driven markov chain monte carlo. In Proc. 8th Int'l. Conf. Computer Vision, volume 2, pages 131–138, July 2001.\n\n[22] L.R. Williams and D.W. Jacobs. Stochastic completion fields: a neural model of illusory contour shape and salience. In Proc. 5th Int. Conf. 
Computer Vision, pages 408–15, June 1995.", "award": [], "sourceid": 2217, "authors": [{"given_name": "David", "family_name": "Martin", "institution": null}, {"given_name": "Charless", "family_name": "Fowlkes", "institution": null}, {"given_name": "Jitendra", "family_name": "Malik", "institution": null}]}