{"title": "Using Combinatorial Optimization within Max-Product Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 369, "page_last": 376, "abstract": null, "full_text": "Using Combinatorial Optimization within Max-Product Belief Propagation\nJohn Duchi Daniel Tarlow Gal Elidan Daphne Koller Department of Computer Science Stanford University Stanford, CA 94305-9010 {jduchi,dtarlow,galel,koller}@cs.stanford.edu\n\nAbstract\nIn general, the problem of computing a maximum a posteriori (MAP) assignment in a Markov random field (MRF) is computationally intractable. However, in certain subclasses of MRF, an optimal or close-to-optimal assignment can be found very efficiently using combinatorial optimization algorithms: certain MRFs with mutual exclusion constraints can be solved using bipartite matching, and MRFs with regular potentials can be solved using minimum cut methods. However, these solutions do not apply to the many MRFs that contain such tractable components as sub-networks, but also other non-complying potentials. In this paper, we present a new method, called COMPOSE, for exploiting combinatorial optimization for sub-networks within the context of a max-product belief propagation algorithm. COMPOSE uses combinatorial optimization for computing exact max-marginals for an entire sub-network; these can then be used for inference in the context of the network as a whole. We describe highly efficient methods for computing max-marginals for subnetworks corresponding both to bipartite matchings and to regular networks. We present results on both synthetic and real networks encoding correspondence problems between images, which involve both matching constraints and pairwise geometric constraints. 
We compare to a range of current methods, showing that the ability of COMPOSE to transmit information globally across the network leads to improved convergence, decreased running time, and higher-scoring assignments.\n\n1 Introduction\nMarkov random fields (MRFs) [12] have been applied to a wide variety of real-world problems. However, the probabilistic inference task in MRFs -- computing the posterior distribution of one or more variables -- is tractable only in small tree-width networks, which are not often an appropriate model in practice. Thus, one typically must resort to the use of approximate inference methods, most commonly (in recent years) some variant of loopy belief propagation [11]. An alternative approach, whose popularity has grown in recent years, is based on the maximum a posteriori (MAP) inference problem -- computing the single most likely assignment relative to the distribution. Somewhat surprisingly, there are certain classes of networks where MAP inference can be performed very efficiently using combinatorial optimization algorithms, even though posterior probability inference is intractable. So far, two main such classes of networks have been studied. Regular (or associative) networks [18], where the potentials encode a preference for adjacent variables to take the same value, can be solved optimally or almost optimally using a minimum cut algorithm. Conversely, matching networks, where the potentials encode a type of mutual exclusion constraints between values of adjacent variables, can be solved using matching algorithms. These types of networks have been shown to be applicable in a variety of applications, such as stereo reconstruction [13] and segmentation for regular networks, and image correspondence [15] or word alignment for matching networks [19].\n\nIn many real-world applications, however, the problem formulation does not fall neatly into one of these tractable subclasses. 
The problem may well have a large component that can be well-modeled as regular or as a matching problem, but there may be additional constraints that take it outside this restricted scope. For example, in a task of registering features between two images or 3D scans, we may formulate the task as a matching problem, but may also want to encode constraints that enforce the preservation of local or global geometry [1]. Unfortunately, once the network contains some \"non-complying\" potentials, it is not clear if and how one can apply the combinatorial optimization algorithm, even if only as a subroutine. In practice, in such networks, one often simply resorts to applying standard inference methods, such as belief propagation. Unfortunately, belief propagation may be far from an ideal procedure for these types of networks. In many cases, the MRF structures associated with the tractable components are quite dense and contain many small loops, leading to convergence problems and bad approximations. Indeed, recent empirical studies [17] show that belief propagation methods perform considerably worse than min-cut-based methods when applied to a variety of (purely) regular MRFs. Thus, falling back on belief propagation methods for these MRFs may result in poor performance.\n\nThe main contribution of this paper is a message-passing scheme for max-product inference that can exploit combinatorial optimization algorithms for tractable subnetworks. The basic idea in our algorithm, called COMPOSE (Combinatorial Optimization for Max-Product on Subnetworks), is that the network can often be partitioned into a number of subnetworks whose union is equivalent to the original distribution. If we can efficiently solve the MAP problem for each of these subnetworks, we would like to combine these results in order to find an approximate MAP for the original problem. 
The obvious difficulty is that a MAP solution, by itself, provides only a single assignment, and one cannot simply combine different assignments. The key insight is that we can combine the information from the different sub-networks by computing max-marginals for each one. A max-marginal for an individual variable X is a vector that specifies, for each value x, the probability of the MAP assignment in which X = x. If we have a black box that computes a max-marginal for each variable X in a subnetwork, we can embed that black box as a subroutine in a max-product belief propagation algorithm, without changing the algorithm's basic properties.\n\nIn the remainder of this paper, we define the COMPOSE scheme, and show how combinatorial algorithms for both regular networks and matching networks can be embedded in this framework. In particular, we also describe efficient combinatorial optimization algorithms for both types of networks that can compute all the max-marginals in the network at a cost similar to that of finding the single MAP assignment. We evaluate the applicability of COMPOSE on synthetic networks and on an image registration task for scans of a cell obtained using an electron microscope, all of which are matching problems with additional pairwise constraints. We compare COMPOSE to variants of both max-product and sum-product belief propagation, as well as to straight matching. Our results demonstrate that the ability of COMPOSE to transmit information globally across the network leads to improved convergence, decreased running time, and higher-scoring assignments.\n\n2 Markov Random Fields\nIn this paper, for simplicity of presentation, we restrict our discussion to pairwise Markov networks (or Markov Random Fields) over discrete variables $X = \\{X_1, \\ldots, X_N\\}$. We emphasize that our results extend easily to the more general case of non-pairwise Markov networks. 
We denote an assignment of values to $X$ with $x$, and an assignment of a value to a single variable $X_i$ with $x_i$. A pairwise Markov network $M$ is defined as a graph $G = (V, E)$ and a set of potentials $F$ that include both node potentials $\\phi_i(x_i)$ and edge potentials $\\phi_{ij}(x_i, x_j)$. The network encodes a joint probability distribution via an unnormalized density $\\tilde{P}_F(x) = \\prod_{i=1}^{N} \\phi_i(x_i) \\prod_{(i,j) \\in E} \\phi_{ij}(x_i, x_j)$, defining the distribution as $P_F(x) = \\frac{1}{Z} \\tilde{P}_F(x)$, where $Z$ is the partition function given by $Z = \\sum_x \\tilde{P}_F(x)$. There are different types of queries that one may want to compute on a Markov network. Most common are (conditional) probability queries, where the task is to compute the marginal probability of one or more variables, possibly given some evidence. This type of inference is essentially equivalent to computing the partition function, which sums over exponentially many assignments, a computation which is currently intractable except in networks of low tree width. An alternative type of inference task is the maximum a posteriori (MAP) problem -- finding $\\arg\\max_x P_F(x) = \\arg\\max_x \\tilde{P}_F(x)$. In the MAP problem, we can avoid computing the partition function, so there are certain classes of networks for which the MAP assignment can be computed effectively, even though computing the partition function can be shown to be intractable; we describe two such important classes in Section 4.\n\nIn general, however, an exact solution to the MAP problem is also intractable. Max-product belief propagation (MPBP) [20] is a commonly-used method for finding an approximate solution. In this algorithm, each node $X_i$ passes to its neighboring nodes $N_i$ a message which is a vector defining a value for each value $x_i$:\n\n$$\\delta_{i \\to j}(x_j) := \\max_{x_i} \\phi_i(x_i) \\phi_{ij}(x_i, x_j) \\prod_{k \\in N_i - \\{j\\}} \\delta_{k \\to i}(x_i).$$\n\nAt convergence, each variable can compute its own local belief as $b_i(x_i) = \\phi_i(x_i) \\prod_{k \\in N_i} \\delta_{k \\to i}(x_i)$. In a tree structured MRF, if such messages are passed from the leaves towards a single root, the value of the message passed by $X_i$ towards the root encodes a partial max-marginal: the entry for $x_i$ is the probability of the most likely assignment, to the subnetwork emanating from $X_i$ away from the root, where we force $X_i = x_i$. At the root, we obtain exact max-marginals for the entire joint distribution. However, applied to a network with loops, MPBP often does not converge, even when combined with techniques such as smoothing and asynchronous message passing, and the answers obtained can be quite approximate.\n\n3 Composing Max-Product Inference on Subnetworks\nWe now describe the COMPOSE scheme for decomposing the network into hopefully more tractable components, and allowing approximate max-product computation over the network as a whole to be performed by iteratively computing max-product in one component and passing approximate max-marginals to the other(s). As the unnormalized probability of an assignment in a Markov network is a product of local potentials, we can partition the potentials in an MRF into an ensemble of $k$ subgraphs $G_1, \\ldots, G_k$ over the same set of nodes $V$, associated edges $E_1, \\ldots, E_k$ and sets of factors $F_1, \\ldots, F_k$. We require that the product of the potentials in these subnetworks maintain the same information as the original MRF. That is, if we originally have a factor $\\phi_i \\in F$ and associated factors $\\phi_i^{(1)} \\in F_1, \\ldots, \\phi_i^{(k)} \\in F_k$, we must have that $\\prod_{l=1}^{k} \\phi_i^{(l)}(X_i) = \\phi_i(X_i)$. One method of partitioning that achieves this equality is simply to select, for each potential $\\phi_i$, one subgraph in which it appears unchanged, and set all of the other $\\phi_i^{(l)}$ to be 1. Even if MAP inference in the original network is intractable, it may be tractable in each of the sub-networks in the ensemble. But how do we combine the results from MAP inference in an ensemble of networks over the same set of variables? 
Our approach draws its motivation from the MPBP algorithm, which computes messages that correspond to pseudo-max-marginals over single variables (approximate max-marginals that do not account for the loops in the network). We begin by conceptually reformulating the ensemble as a set of networks over disjoint sets of variables $\\{X_1^{(l)}, \\ldots, X_n^{(l)}\\}$ for $l = 1, \\ldots, k$; we enforce consistency of the joint assignment using a set of \"communicator\" variables $X_1, \\ldots, X_n$, such that each $X_i^{(l)}$ must take the same value as $X_i$. We assume that each subnetwork is associated with an algorithm that can \"read in\" pseudo-max-marginals over the communicator variables, and compute pseudo-max-marginals over these variables. More precisely, let $\\delta_{(l) \\to i}$ be the message sent from subnetwork $l$ to $X_i$ and $\\delta_{i \\to (l)}$ the opposite message. Then we define the COMPOSE message passing scheme as follows:\n\n$$\\delta_{(l) \\to i}(x_i) = \\max_{x^{(l)} : X_i^{(l)} = x_i} \\tilde{P}_{F_l}(x^{(l)}) \\prod_{j \\neq i} \\delta_{j \\to (l)}(X_j^{(l)}) \\qquad (1)$$\n\n$$\\delta_{i \\to (l)} = \\prod_{l' \\neq l} \\delta_{(l') \\to i}. \\qquad (2)$$\n\nHere $\\tilde{P}_{F_l}$ denotes the unnormalized density defined by the factors $F_l$ of subnetwork $l$. That is, each subnetwork computes its local pseudo-max-marginals over each of the individual variables, given, as input, the pseudo-max-marginals over the others. The separate pseudo-max-marginals are integrated via the communicator variables. It is not difficult to see that this message passing scheme is equivalent to a particular scheduling algorithm for max-product belief propagation over the ensemble of networks, assuming that the max-product computation in each of the subnetworks is computed exactly using a black-box subroutine. We note that this message passing scheme is somewhat related to the tree-reweighted max-product (TRW) method of Wainwright et al. 
[8], where the network distribution is partitioned as a weighted combination of trees, which also communicate pseudo-max-marginals with each other.\n\n4 Efficient Computation of Max-Marginals\nIn this section, we describe two important classes of networks where the MAP problem can be solved efficiently using combinatorial algorithms: matching networks, which can be solved using bipartite matching algorithms; and regular networks, which can be solved using (iterated application of) minimum cut algorithms. We show how the same algorithms can be adapted, at minimal computational cost, for computing not only the single MAP assignment, but also the set of max-marginals. This allows these algorithms to be used as one of our \"black boxes\" in the COMPOSE framework.\n\nBipartite matching. Many problems can be well-formulated as maximum-score (or minimum weight) bipartite matching: We are given a graph $G = (\\mathcal{A}, U)$, whose nodes are partitioned into disjoint sets $\\mathcal{A} = A \\cup B$. In G, each edge (a, b) has one endpoint in A and the other in B and an associated score c(a, b). A bipartite matching is a subset of the edges $W \\subseteq U$ such that each node appears in at most one edge. The notion of a matching can be relaxed to include other types of degree constraints, e.g., constraining certain nodes to appear in at most k edges. The score of the matching is simply the sum of the scores of the edges in W. The matching problem can also be formulated as an MRF, in several different ways. For example, in the degree-1 case (each node in A is matched to one node in B), we can have a variable $X_a$ for each $a \\in A$ whose possible values are all of the nodes in B. The edge scores in the matching graph are then simply singleton potentials in the MRF, where $\\phi_a(X_a = b) = \\exp(c(a, b))$. 
Unfortunately, while the costs can be easily encoded in an MRF, the degree constraints on the matching induce a set of pairwise mutual-exclusion potentials on all pairs of variables in the MRF, leading to a fully connected network. Thus, standard methods for MRF inference cannot handle the networks associated with matching problems. Nevertheless, finding the maximum score bipartite matching (with any set of degree constraints) can be accomplished easily using standard combinatorial optimization algorithms (e.g., [6]). However, we also need to find all the max-marginals. Fortunately, we can adapt the standard algorithm for finding a single best matching to also find all of the max-marginals. A standard solution to the max-matching problem reduces it to a max-weight flow problem, by introducing an additional \"source\" node that connects to all the nodes in A, and an additional \"sink\" node that connects to all the nodes in B ; the capacity of these edges is the degree constraint of the node (1 for a 1-to-1 matching). We now run a standard max-weight flow algorithm, and define an edge to be in the matching if it bears flow. Standard results show that, if the edge capacities are integers, then the flow too is integral, so that it defines a matching. Let w be the weight of the flow in the graph. A flow in the graph defines a residual graph, where there is an edge in the graph whose capacity is the amount of flow it can carry relative to the current flow. Thus, for example, if the current solution carries a unit of flow along a particular edge (a, b) in the original graph, the residual graph will have an edge with a unit capacity going in the reverse direction, corresponding to the fact that we can now choose to \"eliminate\" the flow from a to b. The scores in these inverse edges are also negative, corresponding to the fact that score is lost when we reduce the flow. 
Our goal now is to find, for each pair (a, b), the score of the optimal matching where we force this pair to be matched. If this pair is matched in the current solution, then the score is simply w. Otherwise, we simply find the highest scoring path from b to a in the residual graph. Any edges on this new path from A to B will be included in the new matching; any edges from B to A were included in the old matching, but are not in the new matching because of the augmenting path. This path is the best way of changing the flow so as to force flow from a to b. Letting $\\Delta$ be the weight of this augmenting path, the overall score of the new flow is $w + \\Delta$. It follows that the cost of this path is necessarily negative, for otherwise it would have been optimal to apply it to the original flow, improving its score. Thus, we can find the highest-scoring path by simply negating all edge costs and finding the shortest path in the graph. Thus, to compute all of the max-marginals, we simply need to find the shortest path from every node $a \\in A$ to every node $b \\in B$. We can find this using the Floyd-Warshall all-pairs-shortest-paths algorithm, which runs in $O((n_A + n_B)^3)$ time, for $n_A = |A|$ and $n_B = |B|$; or we can run a single-source shortest-path algorithm for each node in B, at a total cost of $O(n_B \\cdot n_A n_B \\log(n_A n_B))$. By comparison, the cost of solving the initial flow problem is $O(n_A^3 \\log(n_A))$.\n\nMinimum Cuts. A very different class of networks that admits an efficient solution is based on the application of a minimum cut algorithm to a graph. At a high level, these networks encode situations where adjacent variables like to take \"similar\" values. There are many variants of this condition. The simplest variant is applied to pairwise MRFs over binary-valued random variables. In this case, a potential is said to be regular if: $\\phi_{ij}(X_i = 1, X_j = 1) \\cdot \\phi_{ij}(X_i = 0, X_j = 0) \\geq \\phi_{ij}(X_i = 0, X_j = 1) \\cdot \\phi_{ij}(X_i = 1, X_j = 0)$. 
For MRFs with only regular potentials, the MAP solution can be found as the minimum cut of a weighted graph constructed from the MRF [9]. This construction can be extended in various ways (see [9] for a survey), including to the class of networks with non-binary variables whose negative-log-probability is a convex function [5]. Moreover, for a range of conditions on the potentials, an $\\alpha$-expansion procedure [2], which iteratively applies a min-cut to a series of graphs, can be used to find a solution with guaranteed approximation error relative to the optimal MAP assignment. As above, a single joint assignment does not suffice for our purposes. In recent work, Kohli and Torr [7], studying the problem of confidence estimation in MAP problems, showed how all of the max-marginals in a regular network can be computed using dynamic algorithms for flow computations. Their method also applies to non-binary networks with convex potentials (as in [5]), but not to networks for which $\\alpha$-expansion is used to find an approximate MAP assignment.\n\n5 Experimental Results\nWe evaluate COMPOSE on the image correspondence problem, which is characteristic of matching problems with geometric constraints. We compare it to both max-product tree-reparameterization (TRMP) [8] and asynchronous max-product (AMP). The axes along which we compare all algorithms are: the ability to achieve convergence, the time it takes to reach a solution, and the quality -- log of the unnormalized likelihood -- of the solution found, in the Markov network that defines the problem. We use standard message damping of 0.3 for the max-product algorithms and a convergence threshold of $10^{-3}$ for all propagation algorithms. All tests were run on a 3.4 GHz Pentium 4 processor with 2GB of memory. We focus our experiments on an image correspondence task, where the goal is to find a 1-to-1 mapping between landmarks in two images. Here, we have a set of template points S = {x1 , . . . 
, xn } and a set T of target points, {x'1, ..., x'n}. We encode our MRF with a variable Xi for each marker xi in the source image, whose value corresponds to its aligned candidate x'j in the target image. Our MRF contains singleton potentials $\\phi_i$, which may encode either local appearance information, so that a marker xi prefers to be aligned to a candidate x'j in the target image whose neighborhood looks similar to that of xi, or a distance potential, so that markers xi prefer to be aligned to candidates x'j in locations close to those in the source image. The MRF also contains pairwise potentials $\\{\\phi_{ij}\\}$ that can encode dependencies between the landmark assignments. In particular, we may want to encode geometric potentials, which enforce a preference for preservation of distance or orientation for pairs of markers xi, xj and their assigned targets x'k, x'l. Finally, as the goal is to find a 1-to-1 mapping between landmarks in the source and target images, we also encode a set of mutual exclusion potentials over pairs of variables, enforcing the constraint that no two markers are assigned to the same candidate x'k. Our task is to find the MAP solution in this MRF.\n\nSynthetic Networks. We first experimented with synthetically generated networks that follow the above form. To generate the networks, we first create a source \"image\" that contains a set of template points S = {x1, ..., xn}, chosen by uniformly sampling locations from a two-dimensional plane. Next, the target set of points T = {x'1, ..., x'n} is generated by generating one point from each template point xi, sampling from a Gaussian distribution with mean xi and a diagonal covariance matrix $\\sigma^2 I$. As there was no true local information, the matching (or singleton) potentials for both types of synthetic networks were generated uniformly at random on [0, 1). 
The 'correct' matching point, or the one the template variable generates, was given weight 0.7, ensuring that the correct matching gets a non-negligible weight without making the correspondence too obvious. We consider two different formulations for the geometric potentials. The first utilizes a minimum spanning tree connecting the points in S, and the second simply a chain. In both cases, we generate pairwise geometric potentials $\\phi_{ij}(X_i, X_j)$ that are Gaussian with mean $\\mu = (x_i - x_j)$, standard deviation proportional to the Euclidean distance between $x_i$ and $x_j$, and variance $\\sigma^2$. Results for the two constructions were similar, so, due to lack of space, we present results only for the line (chain) networks. Fig. 1(a) shows the cumulative percentage of convergent runs as a function of CPU time. COMPOSE converges significantly more often than either AMP or state-of-the-art TRMP. For TRMP, we created one tree over all the geometric and singleton potentials to quickly pass information through the entire graph; the rest of the trees chosen for TRMP were over a singleton potential, all the neighboring mutual exclusion potentials, and pairwise potentials neighboring the singleton, allowing us to maintain the mutual exclusion constraints during different reparameterization steps in TRMP.\n\nFigure 1: (a) Cumulative percentage of convergent runs versus CPU time on networks with 30 variables and sigma ranging from 3 to 9. (b) The effect of changing the number of variables on the log score. Shown is the difference between the log score of each algorithm and the score found by AMP. (c) Direct comparison of COMPOSE to TRMP on individual runs from the same set of networks as in (b), grouped by algorithm convergence. (d) Score of assignment based on intermediate beliefs versus time for COMPOSE, TRMP, and matching on 100 variable networks. All algorithms were allowed to run for 5000 seconds.\n\nSince sum-product algorithms are known in general to be less susceptible to oscillation than their max-product counterparts, we also compared against sum-product asynchronous belief propagation. In our experiments, however, sum-product BP did not achieve good scores even on runs in which it did converge, perhaps because the distribution was fairly diffuse, leading to an averaging of diverse solutions; we omit results for lack of space. Fig. 1(b) shows the average difference in log scores between each algorithm's result and the average log score of AMP as a function of the number of variables in the networks. COMPOSE clearly outperforms the other algorithms, gaining a larger score margin as the size of the problem increases. In the synthetic tests we ran for (b) and (c), COMPOSE achieved the best score in over 90% of cases. This difference was greatest in more difficult problems, where there is greater variance in the locations of candidates in the target image, leading to difficulty achieving a 1-to-1 correspondence. In Fig. 1(c), we further examine scores from individual runs, comparing COMPOSE directly to the strongest competitor, TRMP. COMPOSE consistently outperforms TRMP and never loses by more than a small margin; COMPOSE often achieves scores on the order of 2 40 times better than those achieved by TRMP. Interestingly, there appears not to be a strong correlation between relative performance and whether or not the algorithms converged. Fig. 1(d) examines the intermediate scores obtained by COMPOSE and TRMP on intermediate assignments reached during the inference process, for large (100 variable) problems. Though COMPOSE does not reach convergence in messages, it quickly takes large steps to a very good score on the large networks. TRMP also takes larger steps near the beginning, but it is less consistent and it never achieves a score as high as COMPOSE. 
This indicates that COMPOSE scales better than TRMP to larger problems. This behavior may also help to explain the results from (c), where we see that, even when COMPOSE does not converge in messages, it still is able to achieve good scores. Overall, these results indicate that we can use intermediate results for COMPOSE even before convergence.\n\nReal Networks. We now consider real networks generated for the task of electron microscope tomography: the three-dimensional reconstruction of cell and organelle structures based on a series of images obtained at different tilt angles. The problem is to localize and track markers in images across time, and it is a difficult one; traditional methods like cross correlation and graph matching often result in many errors. We can encode the problem, however, as an MRF, as described above. In this case, the geometric constraints were more elaborate, and it was not clear how to construct a good set of spanning trees. We therefore used a variant on AMP called residual max-product (RMP) [3] that schedules messages in an informed way over the network; in this work and others, we have found this variant to achieve better performance than TRMP on difficult networks. Fig. 2(a) shows a source set of markers in an electron tomography image; Fig. 2(b) shows the correspondence our algorithm achieves, and Fig. 2(c) shows the correspondence that RMP achieves. Note that, in Fig. 2(c), points from the source image are assigned to the same point in the target image, whereas COMPOSE does not have the same failing. Of the twelve pairs of images we tested, RMP failed to converge on 11 of the 12 within 20 minutes, whereas COMPOSE failed to converge on only two of the twelve. 
Because the network structure was difficult for loopy approximate methods, we ran experiments where we replaced mutual exclusion constraints with soft location constraints on individual landmarks; while convergence improved, actual performance was inferior. Fig. 2(d) shows the scores for the different methods we use to solve these problems. Using RMP as the baseline score, we see the difference in scores for the different methods. It is clear that, though RMP and TRMP run on a simpler network with soft mutual exclusion constraints are competitive with, and even very slightly better than, COMPOSE on simple problems, as problems become more difficult (more variance in target images), COMPOSE clearly dominates. We also compare COMPOSE to simply finding the best matching of markers to candidates without any geometric information; COMPOSE dominates this approach, never scoring worse than the matching.\n\nFigure 2: (a) Labeled markers in a source electron microscope image. (b) Candidates COMPOSE assigns in the target image. (c) Candidates RMP assigns in the target image (note the Xs through incorrect or duplicate assignments). (d) A score comparison of COMPOSE, matching, and RMP on the image correspondences.\n\n6 Discussion\n\nIn this paper, we have presented COMPOSE, an algorithm that exploits the presence of tractable substructures in MRFs within the context of max-product belief propagation. Motivated by the existence of very efficient algorithms to extract all max-marginals from combinatorial substructures, we presented a variation of belief propagation methods that used the max-marginals to take large steps in inference. We also demonstrated that COMPOSE significantly outperforms state-of-the-art methods on different challenging synthetic and real problems. 
We believe that one of the major reasons that belief propagation algorithms have difficulty with the augmented matching problems described above is that the mutual exclusion constraints create a phenomenon where small changes to local regions of the network can have strong effects on distant parts of the network, and it is difficult for belief propagation to adequately propagate information. Some existing variants of belief propagation (such as TRMP) attempt to speed the exchange of information across opposing sides of the network by means of intelligent message scheduling. Even intelligently-scheduled message passing is limited, however, as messages are inherently local. If there are oscillations across a wide diameter, due to global interactions in the network, they might contribute significantly to poor performance by BP algorithms. COMPOSE slices the network along a different axis, using subnetworks that are global in nature but that do not have all of the information about any subset of variables. If the component of the network that is difficult for belief propagation can be encoded in an efficient special-purpose subnetwork such as a matching, then we have a means of effectively propagating global information. We conjecture that COMPOSE's ability to globally pass information contributes both to its improved convergence and to the better results it obtains even without convergence.\n\nSome very recent work explores the case where a regular MRF contains terms that are not regular [14, 13], but this work is largely specific to certain types of \"close-to-regular\" MRFs. It would be interesting to compare COMPOSE and these methods on a range of networks containing regular subgraphs. Our work is also related to work trying to solve the quadratic assignment problem (QAP) [10], a class of problems of which our generalized matching networks are a special case. 
Standard algorithms for QAP include simulated annealing, tabu search, branch and bound, and ant algorithms [16]; the latter have some of the flavor of message passing, walking trails over the graph representing a QAP and iteratively updating scores of different assignments to the QAP. To the best of our knowledge, however, none of these previous methods attempts to use a combinatorial algorithm as a component in a general message-passing algorithm, thereby exploiting the structure of the pairwise constraints.\n\nThere are many interesting directions arising from this work. It would be interesting to perform a theoretical analysis of the COMPOSE approach, perhaps providing conditions under which it is guaranteed to provide a certain level of approximation. A second major direction is the identification of other tractable components within real-world MRFs that one can solve using combinatorial optimization methods, or other efficient approaches. For example, the constraint satisfaction community has studied several special-purpose constraint types that can be solved more efficiently than using generic methods [4]; it would be interesting to explore whether these constraints arise within MRFs, and, if so, whether the special-purpose procedures can be integrated into the COMPOSE framework. Overall, we believe that real-world MRFs often contain large structured sub-parts that can be solved efficiently with special-purpose algorithms; the combination of special-purpose solvers within a general inference scheme may allow us to solve problems that are intractable to any current method.\n\nAcknowledgments\n\nThis research was supported by the Defense Advanced Research Projects Agency (DARPA) under the Transfer Learning Program. We also thank David Karger for useful conversations and insights.\n\nReferences\n\n[1] D. Anguelov, D. Koller, P. Srinivasan, S. Thrun, H. Pang, and J. Davis. 
The correlated correspondence algorithm for unsupervised registration of nonrigid surfaces. In NIPS, 2004.\n[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In ICCV, 1999.\n[3] G. Elidan, I. McGraw, and D. Koller. Residual belief propagation. In UAI, 2006.\n[4] J. Hooker, G. Ottosson, E.S. Thorsteinsson, and H.J. Kim. A scheme for unifying optimization and constraint satisfaction methods. In Knowledge Engineering Review, 2000.\n[5] H. Ishikawa. Exact optimization for Markov random fields with convex priors. PAMI, 2003.\n[6] J. Kleinberg and E. Tardos. Algorithm Design. Addison-Wesley, 2005.\n[7] P. Kohli and P. Torr. Measuring uncertainty in graph cut solutions - efficiently computing min-marginal energies using dynamic graph cuts. In ECCV, 2006.\n[8] V. Kolmogorov and M. Wainwright. On the optimality of tree-reweighted max-product message-passing. In UAI, 2005.\n[9] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? In ECCV, 2002.\n[10] E. Lawler. The quadratic assignment problem. In Management Science, 1963.\n[11] K. Murphy and Y. Weiss. Loopy belief propagation for approximate inference: An empirical study. In UAI, 1999.\n[12] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.\n[13] A. Raj, G. Singh, and R. Zabih. MRF's for MRI's: Bayesian reconstruction of MR images via graph cuts. In CVPR, 2006. To appear.\n[14] C. Rother, S. Kumar, V. Kolmogorov, and A. Blake. Digital tapestry. In CVPR, 2005.\n[15] J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI, 2000.\n[16] T. Stützle and M. Dorigo. ACO algorithms for the quadratic assignment problem. In New Ideas in Optimization, 1999.\n[17] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields. In ECCV, 2006.\n
[18] B. Taskar, V. Chatalbashev, and D. Koller. Learning associative Markov networks. In ICML, 2004.\n[19] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin. Learning structured prediction models: a large margin approach. In ICML, 2005.\n[20] Y. Weiss and W. Freeman. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47, 2001.", "award": [], "sourceid": 3117, "authors": [{"given_name": "Daniel", "family_name": "Tarlow", "institution": null}, {"given_name": "Gal", "family_name": "Elidan", "institution": null}, {"given_name": "Daphne", "family_name": "Koller", "institution": null}, {"given_name": "John", "family_name": "Duchi", "institution": null}]}