Discriminative Fields for Modeling Spatial Dependencies in Natural Images

Sanjiv Kumar and Martial Hebert
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{skumar,hebert}@ri.cmu.edu

Advances in Neural Information Processing Systems, pp. 1531-1538

Abstract

In this paper we present Discriminative Random Fields (DRFs), a discriminative framework for the classification of natural image regions that incorporates neighborhood spatial dependencies in the labels as well as in the observed data. The proposed model exploits local discriminative models and allows one to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using a penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows MAP inference for binary classification problems using graph min-cut algorithms. The performance of the model was verified on synthetic as well as real-world images. The DRF model outperforms the MRF model in the experiments.

1 Introduction

For the analysis of natural images, it is important to use contextual information in the form of spatial dependencies in images. In a probabilistic framework, this leads one to random field modeling of the images. In this paper we address the main challenge of such modeling: how to model arbitrarily complex dependencies in the observed image data, as well as in the labels, in a principled manner.

In the literature, the Markov Random Field (MRF) is a commonly used model for incorporating contextual information [1].
MRFs are generally used in a probabilistic generative framework that models the joint probability of the observed data and the corresponding labels. In other words, let y be the observed data from an input image, where y = {y_i}_{i∈S}, y_i is the data from the i-th site, and S is the set of sites. Let the corresponding labels at the image sites be given by x = {x_i}_{i∈S}. In the MRF framework, the posterior over the labels given the data is expressed using Bayes' rule as p(x|y) ∝ p(x, y) = p(x) p(y|x), where the prior over labels, p(x), is modeled as an MRF. For computational tractability, the observation or likelihood model p(y|x) is usually assumed to have a factorized form, i.e. p(y|x) = ∏_{i∈S} p(y_i|x_i) [1][2]. However, as noted by several researchers [3][4], this assumption is too restrictive for the analysis of natural images. For example, consider a class that contains man-made structures (e.g. buildings). The data belonging to such a class is highly dependent on its neighbors, since the lines or edges at spatially adjoining sites follow some underlying organizational rules rather than being random (see Fig. 2). This is also true for a large number of texture classes that are made of structured patterns.

Some efforts have been made in the past to model the dependencies in the data [3][4], but they make factored approximations of the actual likelihood for tractability. In addition, the simplistic forms of the factors preclude capturing stronger relationships in the observations in the form of arbitrarily complex features that might be desired to discriminate between different classes. Now considering a different point of view, for classification purposes we are interested in estimating the posterior over labels given the observations, i.e., p(x|y). In a generative framework, one expends effort modeling the joint distribution p(x, y), which involves implicit modeling of the observations.
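The factorized-likelihood assumption discussed above can be made concrete with a small numerical sketch. This is an illustration only: the Gaussian site likelihood, the Ising-style prior, the toy 1-D chain, and all parameter values are assumptions made for the sketch, not the models used in this paper.

```python
import numpy as np

def site_log_likelihood(y_i, x_i, mu=1.0, sigma=1.0):
    """log p(y_i | x_i): a Gaussian whose mean depends only on x_i (toy form)."""
    mean = mu if x_i == 1 else -mu
    return -0.5 * ((y_i - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def log_posterior_unnorm(x, y, beta=0.8):
    """log p(x|y) up to a constant: log p(x) + sum_i log p(y_i|x_i),
    with an Ising-style MRF prior log p(x) = beta * sum_{i~j} x_i x_j + const
    on a 1-D chain. Note each y_i depends only on x_i, never on neighbors."""
    prior = beta * sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    likelihood = sum(site_log_likelihood(y[i], x[i]) for i in range(len(x)))
    return prior + likelihood

y = np.array([0.9, 1.1, -0.2, -1.3])
# A spatially coherent labeling scores higher than an incoherent one:
print(log_posterior_unnorm([1, 1, -1, -1], y) > log_posterior_unnorm([1, -1, 1, -1], y))
```

The key restriction appears in `site_log_likelihood`: the observation at a site is scored against that site's label alone, which is exactly the conditional-independence assumption the paper argues is too weak for structured classes.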
In a discriminative framework, one models the distribution p(x|y) directly. As noted in [2], a potential advantage of the discriminative approach is that the true underlying generative model may be quite complex even though the class posterior is simple. This means that the generative approach may spend a lot of resources modeling generative details that are not particularly relevant to the task of inferring the class labels. Moreover, learning the class density models may become even harder when the training data is limited [5].

In this work we present a Discriminative Random Field (DRF) model based on the concept of the Conditional Random Field (CRF) proposed by Lafferty et al. [6] in the context of segmentation and labeling of 1-D text sequences. CRFs directly model the posterior distribution p(x|y) as a Gibbs field. This approach allows one to capture arbitrary dependencies between the observations without resorting to any model approximations. Our model further enhances CRFs by proposing the use of local discriminative models to capture the class associations at individual sites as well as the interactions between neighboring sites on 2-D grid lattices. The proposed model uses local discriminative models to achieve the site classification while permitting interactions in both the observed data and the label field in a principled manner. The research presented in this paper alleviates several problems with the previous version of the DRFs described in [7].

2 Discriminative Random Field

We first restate in our notation the definition of Conditional Random Fields as given by Lafferty et al. [6]. In this work we will be concerned with binary classification, i.e. x_i ∈ {−1, 1}. Let the observed data at site i be y_i. Then (x, y) is a conditional random field if, when conditioned on y, the variables x_i obey the Markov property with respect to the neighborhood system, i.e. p(x_i|y, x_{S−{i}}) = p(x_i|y, x_{N_i}), where N_i is the set of neighbors of site i; here positivity, p(x|y) > 0 ∀ x, has been assumed implicitly.
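The discriminative viewpoint, i.e. modeling p(x_i|y) directly with a classifier, can be illustrated with a minimal logistic sketch. The windowed feature function and the weight vector below are hypothetical choices made for this illustration, not the paper's learned model; the point is only that the features for site i may draw on observations beyond y_i alone.

```python
import numpy as np

def features(y, i, halfwidth=1):
    """Feature vector for site i computed from the whole observation y:
    a bias term plus a local window of observations (zero-padded).
    The window is a hypothetical feature choice for this sketch."""
    padded = np.pad(y, halfwidth)  # zero-pad both ends
    window = padded[i : i + 2 * halfwidth + 1]
    return np.concatenate(([1.0], window))

def posterior(x_i, y, i, w):
    """p(x_i | y) = sigma(x_i * w . f_i(y)) for x_i in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-x_i * (w @ features(y, i))))

y = np.array([0.2, 1.5, 1.4, -0.9])
w = np.array([0.0, 0.5, 1.0, 0.5])  # hypothetical learned weights
# The two class posteriors at a site sum to one by construction:
print(round(posterior(+1, y, 1, w) + posterior(-1, y, 1, w), 6))  # → 1.0
```

Because the model targets p(x_i|y) directly, nothing about the distribution of y itself is ever estimated, which is the resource-saving contrast with the generative route noted above.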
Now, using the Hammersley-Clifford theorem [1] and assuming only up to pairwise clique potentials to be nonzero, the joint distribution over the labels x given the observations y can be written as

p(x|y) = (1/Z) exp( Σ_{i∈S} A_i(x_i, y) + Σ_{i∈S} Σ_{j∈N_i} I_ij(x_i, x_j, y) )    (1)

where Z is a normalizing constant known as the partition function, and −A_i and −I_ij are the unary and pairwise potentials respectively. With a slight abuse of notation, in the rest of the paper we will call A_i the association potential and I_ij the interaction potential. Note that both terms explicitly depend on all the observations y. In the DRFs, the association potential is seen as a local decision term which decides the association of a given site with a certain class, ignoring its neighbors. The interaction potential is seen as a data-dependent smoothing function. For simplicity, in the rest of the paper we assume the random field given in (1) to be homogeneous and isotropic, i.e. the functional forms of A_i and I_ij are independent of the locations i and j. Henceforth we will drop the subscripts and simply write A and I. Note that the assumption of isotropy can easily be relaxed at the cost of a few additional parameters.

2.1 Association potential

In the DRF framework, A(x_i, y) is modeled using a local discriminative model that outputs the association of site i with class x_i. Generalized Linear Models (GLMs) are used extensively in statistics to model class posteriors given the observations [8]. For each site i, let f_i(y) be a function that maps the observations y to a feature vector, f_i : y →
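The posterior form in (1) can be sketched numerically on a small 4-connected lattice. The specific toy forms of A and I below are placeholder assumptions made for the sketch, not the GLM-based potentials developed in the paper; the sketch only demonstrates the structure of (1), with both terms free to depend on the full observation y.

```python
import numpy as np

def association(x_i, y, i):
    """A(x_i, y): local evidence for label x_i at site i (toy form)."""
    return x_i * y[i]

def interaction(x_i, x_j, y, i, j, beta=0.5):
    """I(x_i, x_j, y): data-dependent smoothing (toy form); favors equal
    labels more strongly when the neighboring observations agree."""
    return beta * x_i * x_j * np.exp(-abs(y[i] - y[j]))

def log_p_unnorm(x, y, shape):
    """The exponent of Eq. (1), i.e. log p(x|y) + log Z, on a 4-connected
    rows x cols lattice with sites flattened row-major."""
    rows, cols = shape
    total = sum(association(x[i], y, i) for i in range(rows * cols))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):  # enumerate each pair once
                if r + dr < rows and c + dc < cols:
                    j = (r + dr) * cols + (c + dc)
                    # Eq. (1) sums over ordered pairs; by isotropy
                    # I_ij = I_ji, so count each unordered pair twice.
                    total += 2 * interaction(x[i], x[j], y, i, j)
    return total

y = np.array([1.0, 0.9, -1.1, -1.0])  # 2x2 grid, flattened row-major
# A labeling consistent with the data and with its neighbors scores higher:
print(log_p_unnorm([1, 1, -1, -1], y, (2, 2)) > log_p_unnorm([1, -1, 1, -1], y, (2, 2)))
```

Only the difference of exponents matters for comparing labelings, so the partition function Z never needs to be computed in this sketch.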