{"title": "Worst-case bounds on the quality of max-product fixed-points", "book": "Advances in Neural Information Processing Systems", "page_first": 2325, "page_last": 2333, "abstract": "We study worst-case bounds on the quality of any fixed point assignment of the max-product algorithm for Markov Random Fields (MRF). We start proving a bound independent of the MRF structure and parameters. Afterwards, we show how this bound can be improved for MRFs with particular structures such as bipartite graphs or grids. Our results provide interesting insight into the behavior of max-product. For example, we prove that max-product provides very good results (at least 90% of the optimal) on MRFs with large variable-disjoint cycles (MRFs in which all cycles are variable-disjoint, namely that they do not share any edge and in which each cycle contains at least 20 variables).", "full_text": "Worst-case bounds on the quality of max-product\n\n\ufb01xed-points\n\nMeritxell Vinyals\n\nJes\u00b4us Cerquides\n\nArti\ufb01cial Intelligence Research Institute (IIIA)\nSpanish Scienti\ufb01c Research Council (CSIC)\n\nArti\ufb01cial Intelligence Research Institute (IIIA)\nSpanish Scienti\ufb01c Research Council (CSIC)\n\nCampus UAB, Bellaterra, Spain\nmeritxell@iiia.csic.es\n\nCampus UAB, Bellaterra, Spain\ncerquide@iiia.csic.es\n\nAlessandro Farinelli\n\nDepartment of Computer Science\n\nUniversity of Verona\n\nStrada le Grazie, 15,Verona, Italy\n\nalessandro.farinelli@univr.it\n\nJuan Antonio Rodr\u00b4\u0131guez-Aguilar\n\nArti\ufb01cial Intelligence Research Institute (IIIA)\nSpanish Scienti\ufb01c Research Council (CSIC)\n\nCampus UAB, Bellaterra, Spain\n\njar@iiia.csic.es\n\nAbstract\n\nWe study worst-case bounds on the quality of any \ufb01xed point assignment of the\nmax-product algorithm for Markov Random Fields (MRF). We start providing a\nbound independent of the MRF structure and parameters. 
Afterwards, we show\nhow this bound can be improved for MRFs with specific structures such as bipartite\ngraphs or grids. Our results provide interesting insight into the behavior of\nmax-product. For example, we prove that max-product provides very good results\n(at least 90% of the optimum) on MRFs with large variable-disjoint cycles\u00b9.\n\n1 Introduction\n\nGraphical models such as Markov Random Fields (MRFs) have been successfully applied to a wide\nvariety of applications such as image understanding [1], error correcting codes [2], protein folding\n[3] and multi-agent systems coordination [4]. Many of these practical problems can be formulated\nas finding the maximum a posteriori (MAP) assignment, namely the most likely joint variable\nassignment in an MRF. The MAP problem is NP-hard [5], thus requiring approximate methods.\nHere we focus on a particular approximate MAP method: (loopy) max-product belief\npropagation [6, 7]. Max-product\u2019s popularity stems from its very good empirical performance on general\nMRFs [8, 9, 10, 11], but it comes with few theoretical guarantees. Concretely, max-product is\nknown to be correct in acyclic and single-cycle MRFs [11], although convergence is only\nguaranteed in the acyclic case. Recently, some works have established that max-product is guaranteed\nto return the optimal solution, if it converges, on MRFs corresponding to some specific problems,\nnamely: (i) weighted b-matching problems [12, 13]; (ii) maximum weight independent set problems\n[14]; or (iii) problems whose equivalent nand Markov random field (NMRF) is a perfect graph [?].\nFor weighted b-matching problems with a bipartite structure, Huang and Jebara [15] establish that\nthe max-product algorithm always converges to the optimum.\nDespite the guarantees provided in these particular cases, little is known for arbitrary MRFs about\nthe quality of the max-product fixed-point assignments. 
To the best of our knowledge, the only\nresult in this line is the work of Wainwright et al. [16] where, given any arbitrary MRF, the authors\nderive an upper bound on the absolute error of the max-product fixed-point assignment. This bound\nis calculated after running the max-sum algorithm and depends on the particular MRF (structure\nand parameters), and therefore provides no guarantees on the quality of max-product assignments on\narbitrary MRFs with cycles.\n\n\u00b9 MRFs in which all cycles are variable-disjoint, namely that they do not share any edge, and in which each\ncycle contains at least 20 variables.\n\nIn this paper we provide quality guarantees for max-product fixed points in general settings that can\nbe calculated prior to the execution of the algorithm. To this end, we define worst-case bounds on the\nquality of any max-product fixed point for any MRF, independently of its structure and parameters.\nFurthermore, we show how tighter guarantees can be obtained for MRFs with specific structures.\nFor example, we prove that in 2-D grids max-product fixed-point assignments have at least 33% of\nthe quality of the optimum; and that for MRFs with large variable-disjoint cycles\u00b9 they have at least\n90% of the quality of the optimum. These results shed some light on the relationship between the\nquality of max-product assignments and the structure of MRFs.\nOur results build upon two main components: (i) the characterization of any fixed-point max-product\nassignment as a neighbourhood maximum in a specific region of the MRF [17]; and (ii) the worst-case\nbounds on the quality of a neighbourhood maximum obtained in the K-optimality framework\n[18, 19]. 
We combine these two results by: (i) generalising the worst-case bounds in [18, 19] to\nconsider any arbitrary region; and (ii) assessing worst-case bounds for the specific region presented\nin [17] (for which any fixed-point max-product assignment is known to be maximal).\n\n2 Overview\n\n2.1 The max-sum algorithm in Pairwise Markov Random Fields\n\nA discrete pairwise Markov Random Field (MRF) is an undirected graphical model where each\ninteraction is specified by a discrete potential function, defined on a single variable or a pair of variables. The\nstructure of an MRF defines a graph G = \u27e8V, E\u27e9, in which the nodes V represent discrete variables,\nand edges E represent interactions between nodes. Then, an MRF contains a unary potential\nfunction \u03a8_s for each node s \u2208 V and a pairwise potential function \u03a8_{st} for each edge (s, t) \u2208 E; the\njoint probability distribution of the MRF assumes the following form:\n\np(x) = (1/Z) \u220f_{s\u2208V} \u03a8_s(x_s) \u220f_{(s,t)\u2208E} \u03a8_{st}(x_s, x_t) = (1/Z) exp( \u2211_{s\u2208V} \u03b8_s(x_s) + \u2211_{(s,t)\u2208E} \u03b8_{st}(x_s, x_t) ) = (1/Z) exp(\u03b8(x)),    (1)\n\nwhere Z is a normalization constant and \u03b8_s(x_s), \u03b8_{st}(x_s, x_t) stand for the logarithm of\n\u03a8_s(x_s), \u03a8_{st}(x_s, x_t), which are well-defined if \u03a8_s(x_s), \u03a8_{st}(x_s, x_t) are strictly positive.\nWithin this setting, the classical problem of maximum a posteriori (MAP) estimation corresponds\nto finding the most likely configuration under distribution p(x) in equation 1. In more formal terms,\nthe MAP configuration x\u2217 = {x\u2217_s | s \u2208 V} is given by:\n\nx\u2217 \u225c arg max_{x\u2208X^N} [ \u220f_{s\u2208V} \u03a8_s(x_s) \u220f_{(s,t)\u2208E} \u03a8_{st}(x_s, x_t) ] = arg max_{x\u2208X^N} [ \u2211_{s\u2208V} \u03b8_s(x_s) + \u2211_{(s,t)\u2208E} \u03b8_{st}(x_s, x_t) ],    (2)\n\nwhere X^N is the Cartesian product space in which x = {x_s | s \u2208 V} takes values.\nNote that the MAP configuration may not be unique, that is, there may be multiple configurations\nthat attain the maximum in equation 2. In this work we assume that: (i) there is a unique MAP\nassignment (as assumed in [17]); and (ii) all potentials \u03b8_s and \u03b8_{st} are non-negative.\nThe max-product algorithm is an iterative, local, message-passing algorithm for finding the MAP\nassignment in a discrete MRF as specified by equation 2. The max-sum algorithm is the\ncounterpart of the max-product algorithm in the log-likelihood domain. The standard\nupdate rules for the max-sum algorithm are:\n\nm_{ij}(x_j) = \u03b1_{ij} + max_{x_i} [ \u03b8_i(x_i) + \u03b8_{ij}(x_i, x_j) + \u2211_{k\u2208N(i)\\j} m_{ki}(x_i) ]\n\nb_i(x_i) = \u03b8_i(x_i) + \u2211_{k\u2208N(i)} m_{ki}(x_i)\n\nwhere \u03b1_{ij} is a normalization constant and N(i) is the set of indices for variables that are connected\nto x_i. Here m_{ij}(x_j) represents the message that variable x_i sends to variable x_j. At the first iteration\nall messages are initialised to constant functions. At each following iteration, each variable x_i\naggregates all incoming messages and computes the belief b_i(x_i), which is then used to obtain the\nmax-sum assignment x^{MS}. 
Specifically, for every variable x_i \u2208 V we have x^{MS}_i = arg max_{x_i} b_i(x_i).\n\nFigure 1: (a) 4-complete graph and (b)-(e) sets of variables covered by the SLT-region.\n\nThe convergence of max-sum is usually characterized in terms of fixed points of the message\nupdate rules, i.e. when all the messages exchanged are equal to those of the previous iteration. Now, the max-sum\nalgorithm is known to be correct over acyclic and single-cycle graphs. Unfortunately, on general\ngraphs the aggregation of messages flowing into each variable only represents an approximate\nsolution to the maximization problem. Nonetheless, it is possible to characterise the solution obtained\nby max-sum as we discuss below.\n\n2.2 Neighborhood maximum characterisation of max-sum fixed points\n\nIn [17], Weiss et al. characterize how well max-sum approximates the MAP assignment. In\nparticular, they find the conditions for a fixed-point max-sum assignment x^{MS} to be a neighbourhood\nmaximum, namely greater than all other assignments in a specific large region around x^{MS}. Notice\nthat characterising an assignment as a neighbourhood maximum is weaker than a global maximum,\nbut stronger than a local maximum. Weiss et al. introduce the notion of Single Loops and Trees\n(SLT) region to characterise the assignments in such a region.\nDefinition 1 (SLT region). 
An SLT-region of x in G includes all assignments x\u2032 that can be obtained\nfrom x by: (i) choosing an arbitrary subset S \u2286 V such that its vertex-induced subgraph contains at\nmost one cycle per connected component; (ii) assigning arbitrary values to the variables in S while\nkeeping the assignment to the other variables as in x.\n\nHence, we say that an assignment x^{SLT} is SLT-optimal if its value is greater than that of any other assignment in\nits SLT-region. Finally, the main result in [17] is the characterisation of any max-sum fixed-point\nassignment as an SLT-optimum. Figures 1(b)-(e) illustrate examples of assignments in the\nSLT-region in the complete graph of figure 1(a); here boldfaced nodes stand for variables whose\nassignment varies with respect to x^{SLT}.\n\n3 Generalizing size and distance optimal bounds\n\nIn [18], Pearce et al. introduced worst-case bounds on the quality of a neighbourhood maximum in a\nregion characterized by its size. Similarly, Kiekintveld et al. introduced in [19] analogous worst-case\nbounds but using as a criterion the distance in the graph. In this section we generalize these bounds to\nuse them for any neighbourhood maximum in a region characterized by arbitrary criteria. Concretely,\nwe show that our generalization can be used for bounding the quality of max-sum assignments.\n\n3.1 C-optimal bounds\n\nHereafter we propose a general notion of region optimality, the so-called C-optimality, and describe\nhow to calculate bounds for a C-optimal assignment, namely an assignment that is a neighbourhood\nmaximum in a region characterized by an arbitrary criterion C. The concept of C-optimality requires\nthe introduction of several concepts.\nGiven A, B \u2286 V we say that B completely covers A if A \u2286 B. We say that B does not cover A\nat all if A \u2229 B = \u2205. Otherwise, we say that B covers A partially. A region C \u2282 P(V) is a set\ncomposed of subsets of V. 
We say that A \u2286 V is covered by C if there is a C^\u03b1 \u2208 C such that C^\u03b1\ncompletely covers A.\nGiven two assignments x^A and x^B, we define D(x^A, x^B) as the set containing the variables whose\nvalues in x^A and x^B differ. An assignment is C-optimal if it cannot be improved by changing the\nvalues in any group of variables covered by C. That is, an assignment x^A is C-optimal if for every\nassignment x^B s.t. D(x^A, x^B) is covered by C we have that \u03b8(x^A) \u2265 \u03b8(x^B).\nFor any S \u2208 E we define cc(S, C) = |{C^\u03b1 \u2208 C s.t. S \u2286 C^\u03b1}|, that is, the number of elements in C\nthat cover S completely. We also define nc(S, C) = |{C^\u03b1 \u2208 C s.t. S \u2229 C^\u03b1 = \u2205}|, that is, the number\nof elements in C that do not cover S at all.\n\nProposition 1. Let G = \u27e8V, E\u27e9 be a graphical model and C a region. If x^C is a C-optimum then\n\n\u03b8(x^C) \u2265 ( cc\u2217 / (|C| \u2212 nc\u2217) ) \u00b7 \u03b8(x\u2217)    (3)\n\nwhere cc\u2217 = min_{S\u2208E} cc(S, C), nc\u2217 = min_{S\u2208E} nc(S, C), and x\u2217 is the MAP assignment.\nProof. The proof is a generalization of the one in [20] for k-optimality. For every C^\u03b1 \u2208 C, consider\nan assignment x^\u03b1 such that x^\u03b1_i = x^C_i if x_i \u2209 C^\u03b1 and x^\u03b1_i = x\u2217_i if x_i \u2208 C^\u03b1. Since x^C is C-optimal,\nfor all C^\u03b1 \u2208 C, \u03b8(x^C) \u2265 \u03b8(x^\u03b1) holds, and hence:\n\n\u03b8(x^C) \u2265 ( \u2211_{C^\u03b1\u2208C} \u03b8(x^\u03b1) ) / |C|.    (4)\n\nNotice that although \u03b8(x^\u03b1) is defined as the sum of unary and pairwise potential values,\nwe can always get rid of unary potentials by combining them into pairwise potentials without changing\nthe structure of the MRF. In so doing, for each x^\u03b1, we have that \u03b8(x^\u03b1) = \u2211_{S\u2208E} \u03b8_S(x^\u03b1). We\nclassify each edge S \u2208 E into one of three disjoint groups, depending on whether C^\u03b1 covers S\ncompletely (T(C^\u03b1)), partially (P(C^\u03b1)), or not at all (N(C^\u03b1)), so that \u03b8(x^\u03b1) = \u2211_{S\u2208T(C^\u03b1)} \u03b8_S(x^\u03b1) +\n\u2211_{S\u2208P(C^\u03b1)} \u03b8_S(x^\u03b1) + \u2211_{S\u2208N(C^\u03b1)} \u03b8_S(x^\u03b1). Since all potentials are assumed non-negative, we can remove the partially covered potentials at the\ncost of obtaining a looser bound. Hence \u03b8(x^\u03b1) \u2265 \u2211_{S\u2208T(C^\u03b1)} \u03b8_S(x^\u03b1) + \u2211_{S\u2208N(C^\u03b1)} \u03b8_S(x^\u03b1). Now,\nby definition of x^\u03b1, for every variable x_i in a potential completely covered by C^\u03b1 we have that\nx^\u03b1_i = x\u2217_i, and for every variable x_i in a potential not covered at all by C^\u03b1 we have that x^\u03b1_i = x^C_i.\nHence, \u03b8(x^\u03b1) \u2265 \u2211_{S\u2208T(C^\u03b1)} \u03b8_S(x\u2217) + \u2211_{S\u2208N(C^\u03b1)} \u03b8_S(x^C). To assess a bound, after substituting this\ninequality in equation 4, we have that:\n\n\u03b8(x^C) \u2265 ( \u2211_{C^\u03b1\u2208C} \u2211_{S\u2208T(C^\u03b1)} \u03b8_S(x\u2217) + \u2211_{C^\u03b1\u2208C} \u2211_{S\u2208N(C^\u03b1)} \u03b8_S(x^C) ) / |C|.    (5)\n\nWe need to express the numerator in terms of \u03b8(x^C) and \u03b8(x\u2217). Here is where the previously\ndefined quantities cc(S, C) and nc(S, C) come into play. 
Grouping the sum by potentials and recalling that\ncc\u2217 = min_{S\u2208E} cc(S, C), the term on the left can be bounded as:\n\n\u2211_{C^\u03b1\u2208C} \u2211_{S\u2208T(C^\u03b1)} \u03b8_S(x\u2217) = \u2211_{S\u2208E} cc(S, C) \u00b7 \u03b8_S(x\u2217) \u2265 \u2211_{S\u2208E} cc\u2217 \u00b7 \u03b8_S(x\u2217) = cc\u2217 \u00b7 \u03b8(x\u2217).\n\nFurthermore, recalling that nc\u2217 = min_{S\u2208E} nc(S, C), we can do the same with the right term:\n\n\u2211_{C^\u03b1\u2208C} \u2211_{S\u2208N(C^\u03b1)} \u03b8_S(x^C) = \u2211_{S\u2208E} nc(S, C) \u00b7 \u03b8_S(x^C) \u2265 \u2211_{S\u2208E} nc\u2217 \u00b7 \u03b8_S(x^C) = nc\u2217 \u00b7 \u03b8(x^C).\n\nAfter substituting these two results in equation 5 and rearranging terms, we obtain equation 3.\n\n3.2 Size-optimal bounds as a specific case of C-optimal bounds\n\nNow we present the main result in [18] as a specific case of C-optimality. An assignment is\nk-size-optimal if it cannot be improved by changing the value of any group of k or fewer variables.\nProposition 2. For any MRF and for any k-optimal assignment x^k:\n\n\u03b8(x^k) \u2265 ( (k \u2212 1) / (2|V| \u2212 k \u2212 1) ) \u00b7 \u03b8(x\u2217)    (6)\n\nProof. This result is just a specific case of our general result where we take as a region all subsets of\nsize k, that is C = {C^\u03b1 \u2286 V | |C^\u03b1| = k}. The number of elements in the region is |C| = (|V| choose k). The\nnumber of elements in C that completely cover S is cc(S, C) = (|V| \u2212 2 choose k \u2212 2) (take the two variables in S\nplus k \u2212 2 variables out of the remaining |V| \u2212 2). The number of elements in C that do not cover\nS at all is nc(S, C) = (|V| \u2212 2 choose k) (take k variables out of the remaining |V| \u2212 2 variables). Finally, we\nobtain equation 6 by using |C|, cc\u2217 and nc\u2217 in equation 3, and simplifying.\n\nFigure 2: Percent optimal bounds for max-sum fixed-point assignments in specific MRF structures. (a) Bounds on complete, bipartite and 2-D\nstructures when varying the number of variables. (b) Bounds on MRFs with variable-disjoint cycles\nwhen varying the number of cycles and their size.\n\n4 Quality guarantees on max-sum fixed-point assignments\n\nIn this section we define quality guarantees for max-sum fixed-point assignments in MRFs with\narbitrary and specific structures. Our quality guarantees prove that the value of any max-sum\nfixed-point assignment cannot be less than a fraction of the optimum.\nThe main idea is that by virtue of the characterization of any max-sum fixed-point assignment as\nSLT-optimal, we can select any region C composed of a combination of single cycles and trees of\nour graph and use it for computing its corresponding C-optimal bound by means of proposition 1.\nWe start by proving that bounds for a given graph apply to its subgraphs. Then, we find that the\nbound for the complete graph applies to any MRF independently of its structure and parameters.\nAfterwards we provide tighter bounds for MRFs with specific structures.\n\n4.1 C-optimal bounds based on the SLT region\n\nIn this section we show that C-optimal bounds based on SLT-optimality for a given graph can be\napplied to any of its subgraphs.\nProposition 3. Let G = \u27e8V, E\u27e9 be a graphical model and C the SLT-region of G. Let G\u2032 = \u27e8V\u2032, E\u2032\u27e9\nbe a subgraph of G. Then the bound of equation 3 for G holds for any SLT-optimal assignment in G\u2032.\nSketch of the proof. We can compose a region C\u2032 containing the same elements as C but removing\nthose variables which are not contained in V\u2032. 
Note that SLT-optimality on G\u2032 guarantees optimality\nin each element of C\u2032. Observe that the bound obtained by applying equation 3 to C\u2032 is greater than or\nequal to the bound obtained for C. Hence, the bound for G applies also to G\u2032.\nA direct conclusion of proposition 3 is that any bound based on the SLT-region of a complete graph\nof n variables can be directly applied to any subgraph of n or fewer variables regardless of its\nstructure. In what follows we assess the bound for a complete graph.\nProposition 4. Let G = \u27e8V, E\u27e9 be a complete MRF. For any max-sum fixed-point assignment x^{MS},\n\n\u03b8(x^{MS}) \u2265 ( 1 / (|V| \u2212 2) ) \u00b7 \u03b8(x\u2217).    (7)\n\nProof. Let C be a region containing every possible combination of three variables in V. Every set of\nthree variables is part of the SLT-region because it can contain at most one cycle. The development\nin the proof of proposition 2 can be applied here for k = 3 to obtain equation 7.\nCorollary 5. For any MRF, any max-sum fixed-point assignment x^{MS} satisfies equation 7.\n\nSince any graph can be seen as a subgraph of the complete graph with the same number of\nvariables, the corollary is straightforward given propositions 3 and 4. Figure 2(a) plots this\nstructure-independent bound when varying the number of variables. 
Observe that it rapidly decreases with\nthe number of variables and it is only significant on very small MRFs. In the next section, we show\nhow to exploit the knowledge of the structure of an MRF to improve the bound\u2019s significance.\n\n[Figure 2 plots. Panel (a): percent optimal (\u03b8(x^{MS})/\u03b8(x\u2217) \u00b7 100) against the number of variables, with curves for 2D grid, Bipartite and Complete/Structure-independent. Panel (b): percent optimal against the minimum number of variables in each cycle, with curves for d = 2, 4, 8, 128 and 1024.]\n\nFigure 3: Example of (a) a 3-3 bipartite graph and (b)-(p) sets of variables covered by the SLT-region.\n\n4.2 SLT-bounds for specific MRF structures and independent of the MRF parameters\n\nIn this section we show that for MRFs with specific structures, it is possible to provide bounds much\ntighter than the structure-independent bound provided by corollary 5. These structures include, but\nare not limited to, bipartite graphs, 2-D grids, and variable-disjoint cycle graphs.\n\n4.2.1 Bipartite graphs\n\nIn this section we define the C-optimal bound of equation 3 for any max-sum fixed-point assignment\nin an n-m bipartite MRF. 
An n-m bipartite MRF is a graph whose vertices can be divided into two\ndisjoint sets, one with n variables and another one with m variables, such that the n variables in the\nfirst set are connected to the m variables in the second set. Figure 3(a) depicts a 3-3 bipartite MRF.\nProposition 6. For any MRF with n-m bipartite structure where m \u2265 n, and for any max-sum fixed-point\nassignment x^{MS} we have that:\n\n\u03b8(x^{MS}) \u2265 b(n, m) \u00b7 \u03b8(x\u2217), where b(n, m) = 1/n if m \u2265 n + 3, and b(n, m) = 2/(n + m \u2212 2) if m < n + 3.    (8)\n\nProof. Let C_A be a region in which each element includes one out of the n variables and all of the m variables (in figure\n3, elements (n)-(p)). Since the elements of this region are trees, we can guarantee optimality on\nthem. The number of elements of the region is |C_A| = n. It is clear that each edge in the graph\nis completely covered by one of the elements of C_A, and hence cc\u2217 = 1. Furthermore, every edge\nis at least partially covered, since all of the m variables are present in every element, and hence nc\u2217 = 0.\nApplying equation 3 gives the bound 1/n.\nAlternatively, we can define a region C_B formed by taking sets of four variables, two from each set.\nSince the elements of C_B are single-cycle graphs (in figure 3, elements (b)-(j)), we can guarantee\noptimality on them. Applying proposition 1, we obtain the bound 2/(n + m \u2212 2). Observe that\n2/(n + m \u2212 2) \u2265 1/n when m < n + 3, and so equation 8 holds (details can be found in the additional material).\nExample 1. Consider the 3-3 bipartite MRF of figure 3(a). Figures 3(b)-(j) show the elements in\nthe region C_B composed of sets of four variables, two from each side. Therefore |C_B| is 9. Then, for\nany edge S \u2208 E there are 4 sets in C_B that contain its two variables. For example, the edge that\nlinks the upper left variable (x0) and the upper right variable (x3) is included in the subgraphs of\nfigures 3(b), (c), (e) and (f). Moreover, for any edge S \u2208 E there is a single element in C_B that does\nnot cover it at all. For example, the only graph that includes neither x0 nor x3 is the graph\nof figure 3(j). Thus, the bound is 4/(9 \u2212 1) = 1/2.\nFigure 2(a) plots the bound of equation 8 for bipartite graphs when varying the number of variables.\nNote that although, also in this case, the value of the bound rapidly decreases with the number of\nvariables, it is twice the value of the structure-independent bound (see equation 7).\n\n4.2.2 Two-dimensional (2-D) grids\n\nIn this section we define the C-optimal bound of equation 3 for any max-sum fixed-point assignment\nin a two-dimensional grid MRF. An n-grid structure stands for a graph with n rows and n columns\nwhere each variable has 4 neighbours. Figure 4 (a) depicts a 4-grid MRF.\n\nFigure 4: Example of (a) a 4-grid graph and (b)-(e) sets of variables covered by the SLT-region.\n\nProposition 7. For any MRF with an n-grid structure where n is an even number, and for any max-sum\nfixed-point assignment x^{MS}, we have that\n\n\u03b8(x^{MS}) \u2265 ( n / (3n \u2212 4) ) \u00b7 \u03b8(x\u2217)    (9)\n\nProof. We can partition columns in pairs joining column 1 with column (n/2) + 1, column 2 with\ncolumn (n/2) + 2 and so on. We can partition rows in the same way. 
Let C be a region where\neach element contains the vertices in a pair of rows at distance n/2 together with those in a pair of\ncolumns at distance n/2. Note that optimality is guaranteed in each C^\u03b1 \u2208 C because variables in two\nnon-consecutive rows and two non-consecutive columns create a single-cycle graph. Since we take\nevery possible combination, |C| = (n/2)^2. Each edge is completely covered by n/2 elements and hence\ncc\u2217 = n/2. Finally\u00b2, for each edge S, there are nc\u2217 = (n/2 \u2212 1)(n/2 \u2212 2) elements of C that do not cover\nS at all. Substituting these values into equation 3 leads to equation 9.\n\nExample 2. Consider the 4-grid MRF of figure 4 (a). Figures 4 (b)-(e) show the vertex-induced\nsubgraphs for each set of vertices in the region C formed by the combination of any pair of rows in\n{(1, 3), (2, 4)} and any pair of columns in {(1, 3), (2, 4)}. Therefore |C| = 4. Then, for any edge S \u2208 E\nthere are 2 sets that contain its two variables. For example, the edge that links the two first variables\nin the first row, namely x0 and x1, is included in the subgraphs of figures (a) and (b). Moreover, for\nany edge S \u2208 E there is no set that contains no variable from S. Thus, the bound is 1/2.\n\nFigure 2(a) plots the bound for 2-D grids when varying the number of variables. Note that when\ncompared with the bound for complete and bipartite structures, the bound for 2-D grids decreases\nsmoothly and tends to stabilize as the number of variables increases. In fact, observe that by equation\n9, the bound for 2-D grids is never less than 1/3 independently of the grid size.\n\n4.2.3 MRFs that are a union of variable-disjoint cycles\n\nIn this section we assess a bound for MRFs composed of a set of variable-disjoint cycles, namely of\ncycles that do not share any variable.\nA common pattern shared by the bounds assessed so far is that they decrease as the number of\nvariables of an MRF grows. 
This section provides an example showing that there are specific structures\nfor which C-optimality obtains significant bounds for large MRFs.\nExample 3. Consider the MRF composed of two variable-disjoint cycles of size 4 depicted in\nfigure 5(a). To create the region, we remove each of the variables of the first cycle, one at a time (see\nfigures 5(b)-(e)). We act analogously with the second cycle. Hence, C is composed of 8 elements.\nJust by counting we observe that each edge is completely covered 6 times, so cc\u2217 = 6. Since we are\nremoving a single variable at a time, nc\u2217 = 0. Hence, the bound for a max-sum fixed point in this\nMRF structure is 6/8 = 3/4.\n\nThe following result generalizes the previous example to MRFs containing d variable-disjoint cycles\nof size larger than or equal to l.\nProposition 8. For any MRF such that every pair of cycles is variable-disjoint and where there are\nat most d cycles of size l or larger, and for any max-sum fixed-point assignment x^{MS}, we have that:\n\n\u03b8(x^{MS}) \u2265 ( 1 \u2212 2(d \u2212 1)/(d \u00b7 l) ) \u00b7 \u03b8(x\u2217) = ( ((l \u2212 2) \u00b7 d + 2) / (l \u00b7 d) ) \u00b7 \u03b8(x\u2217).    (10)\n\n\u00b2 Details can be found in the additional material.\n\nFigure 5: (a) 2 variable-disjoint cycles MRF of size 4 and (b)-(e) sets of variables covered by the SLT-region.\n\nThe proof generalizes the region explained in example 3 to any variable-disjoint cycle MRF by\ndefining a region that includes an element for every possible edge removal from every cycle but one.\nThe proof is omitted here due to 
lack of space but can be consulted in the additional material.\nEquation 10 shows that the bound: (i) decreases with the number of cycles; and (ii) increases as the\nminimum number of variables in each cycle grows. Figure 2(b) illustrates the relationship between\nthe bound, the number of cycles (d), and the minimum size of the cycles (l). The first thing we\nobserve is that the size of the cycles has more impact on the bound than the number of cycles. In\nfact, observe that by equation 10, the bound for a variable-disjoint cycle graph with a minimum\ncycle size of l is at least (l \u2212 2)/l, independently of the number of cycles. Thus, if the minimum size of\na cycle is 20, the quality of a fixed point is guaranteed to be at least 90%. Hence, quality guarantees\nfor max-sum fixed points are good whenever: (i) the cycles in the MRF do not share any variables;\nand (ii) the smallest cycle in the MRF is large. Therefore, our result confirms and refines the recent\nresults obtained for single-cycle MRFs [11].\n\n4.3 SLT-bounds for arbitrary MRF structures and independent of the MRF parameters\n\nIn this section we discuss how to assess tight SLT-bounds for any arbitrary MRF structure. Similarly\nto [18, 20], we can use linear fractional programming (LFP) to compute the structure-specific SLT\nbounds in any MRF with arbitrary structure. Let C be a region containing all subsets in the SLT-region\nof the graphical model G = \u27e8V, E\u27e9 of an MRF. For each S \u2208 E, the LFP contains two LFP\nvariables that represent the value of the edge S for the SLT-optimum, x^{MS}, and for the MAP\nassignment, x\u2217. The objective of the LFP is to minimize ( \u2211_{S\u2208E} \u03b8_S(x^{MS}) ) / ( \u2211_{S\u2208E} \u03b8_S(x\u2217) ) such that for all C^\u03b1 \u2208 C,\n\u03b8(x^{MS}) \u2212 \u03b8(x^\u03b1) \u2265 0. Following [18, 20], for each C^\u03b1 \u2208 C, \u03b8(x^\u03b1) can be expressed in terms of\nthe value of the potentials for x^{MS} and x\u2217. Then, the optimal value of this LFP is a tight bound\nfor any MRF with the given specific structure. Indeed, the solution of the LFP provides the values\nof potentials for x^{MS} and x\u2217 that produce the worst-case MRF whose SLT-optimum has the lowest\nvalue with respect to the optimum. However, because this method requires listing all the sets in\nthe SLT-region, the complexity of generating an LFP increases exponentially with the number of\nvariables in the MRF. Therefore, although this method provides more flexibility to deal with any\narbitrary structure, its computational cost does not scale with the size of MRFs, in contrast with the\nstructure-specific SLT-bounds of section 4.2, which are assessed in constant time.\n\n5 Conclusions\n\nWe provided worst-case bounds on the quality of any max-product fixed point. With this aim, we\nhave introduced C-optimality, which has proven a valuable tool to bound the quality of max-product\nfixed points. Concretely, we have proven that independently of an MRF structure, max-product has\na quality guarantee that decreases with the number of variables of the MRF. Furthermore, our results\nallow us to identify new classes of MRF structures, besides acyclic and single-cycle, for which we\ncan provide theoretical guarantees on the quality of max-product assignments. As an example, we\ndefined significant bounds for 2-D grids and MRFs with variable-disjoint cycles.\n\nAcknowledgments\n\nWork funded by projects EVE (TIN2009-14702-C02-01, TIN2009-14702-C02-02), AT (CONSOLIDER\nCSD2007-0022), and Generalitat de Catalunya (2009-SGR-1434). Vinyals is supported by the Ministry of\nEducation of Spain (FPU grant AP2006-04636).\n\nReferences\n\n[1] Marshall F. Tappen and William T. Freeman. Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters. 
In ICCV, pages 900\u2013907, 2003.\n\n[2] Jon Feldman, Martin J. Wainwright, and David R. Karger. Using linear programming to decode binary linear codes. IEEE Transactions on Information Theory, 51(3):954\u2013972, 2005.\n\n[3] Chen Yanover and Yair Weiss. Approximate inference and protein-folding. In Advances in Neural Information Processing Systems, pages 84\u201386. MIT Press, 2002.\n\n[4] Alessandro Farinelli, Alex Rogers, Adrian Petcu, and Nicholas R. Jennings. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In AAMAS, pages 639\u2013646, 2008.\n\n[5] Solomon Eyal Shimony. Finding MAPs for belief networks is NP-Hard. Artif. Intell., 68(2):399\u2013410, 1994.\n\n[6] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.\n\n[7] Srinivas M. Aji and Robert J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325\u2013343, 2000.\n\n[8] Srinivas Aji, Gavin Horn, Robert McEliece, and Meina Xu. Iterative min-sum decoding of tail-biting codes. In Proc. IEEE Information Theory Workshop, pages 68\u201369, 1998.\n\n[9] Brendan J. Frey, Ralf Koetter, G. David Forney Jr., Frank R. Kschischang, Robert J. McEliece, and Daniel A. Spielman. Introduction to the special issue on codes on graphs and iterative algorithms. IEEE Transactions on Information Theory, 47(2):493\u2013497, 2001.\n\n[10] Brendan J. Frey, Ralf Koetter, and Nemanja Petrovic. Very loopy belief propagation for unwrapping phase images. In NIPS, pages 737\u2013743, 2001.\n\n[11] Yair Weiss. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12(1):1\u201341, 2000.\n\n[12] Mohsen Bayati, Christian Borgs, Jennifer T. Chayes, and Riccardo Zecchina. Belief-propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions. 
CoRR, abs/0709.1190, 2007.\n\n[13] Sujay Sanghavi, Dmitry Malioutov, and Alan Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1273\u20131280. MIT Press, Cambridge, MA, 2008.\n\n[14] Sujay Sanghavi, Devavrat Shah, and Alan S. Willsky. Message-passing for maximum weight independent set. CoRR, abs/0807.5091, 2008.\n\n[15] Bert Huang and Tony Jebara. Loopy belief propagation for bipartite maximum weight b-matching. In Marina Meila and Xiaotong Shen, editors, Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, March 2007.\n\n[16] Martin J. Wainwright, Tommi Jaakkola, and Alan S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Statistics and Computing, 14(2):143\u2013166, 2004.\n\n[17] Yair Weiss and William T. Freeman. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2):736\u2013744, 2001.\n\n[18] Jonathan P. Pearce and Milind Tambe. Quality guarantees on k-optimal solutions for distributed constraint optimization problems. In IJCAI, pages 1446\u20131451, 2007.\n\n[19] Christopher Kiekintveld, Zhengyu Yin, Atul Kumar, and Milind Tambe. Asynchronous algorithms for approximate distributed constraint optimization with quality bounds. In AAMAS, pages 133\u2013140, 2010.\n\n[20] J. P. Pearce. Local Optimization in Cooperative Agent Networks. 
PhD thesis, University of Southern California, Los Angeles, CA, August 2007.", "award": [], "sourceid": 445, "authors": [{"given_name": "Meritxell", "family_name": "Vinyals", "institution": null}, {"given_name": "Jes\\'us", "family_name": "Cerquides", "institution": null}, {"given_name": "Alessandro", "family_name": "Farinelli", "institution": null}, {"given_name": "Juan", "family_name": "Rodr\u00edguez-aguilar", "institution": null}]}