{"title": "A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles", "book": "Advances in Neural Information Processing Systems", "page_first": 2022, "page_last": 2030, "abstract": "Max-product \u2018belief propagation\u2019 (BP) is a popular distributed heuristic for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM). It was recently shown that BP converges to the correct MAP assignment for a class of loopy GMs with the following common feature: the Linear Programming (LP) relaxation to the MAP problem is tight (has no integrality gap). Unfortunately, tightness of the LP relaxation does not, in general, guarantee convergence and correctness of the BP algorithm. The failure of BP in such cases motivates reverse engineering a solution \u2013 namely, given a tight LP, can we design a \u2018good\u2019 BP algorithm. In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. We prove that the algorithm converges to the correct optimum if the respective LP relaxation, which may include inequalities associated with non-intersecting odd-sized cycles, is tight. The most significant part of our approach is the introduction of a novel graph transformation designed to force convergence of BP. Our theoretical result suggests an efficient BP-based heuristic for the MWM problem, which consists of making sequential, \u201ccutting plane\u201d, modifications to the underlying GM. Our experiments show that this heuristic performs as well as traditional cutting-plane algorithms using LP solvers on MWM problems.", "full_text": "A Graphical Transformation for Belief Propagation:\nMaximum Weight Matchings and Odd-Sized Cycles\n\nJinwoo Shin\n\nDepartment of Electrical Engineering\n\nKorea Advanced Institute of Science and Technology\n\nDaejeon, 305-701, Republic of Korea\n\njinwoos@kaist.ac.kr\n\nAndrew E. 
Gelfand \u2217\nDepartment of Computer Science\nUniversity of California, Irvine\nIrvine, CA 92697-3435, USA\nagelfand@ics.uci.edu\n\nMichael Chertkov\nTheoretical Division &\n\nCenter for Nonlinear Studies\n\nLos Alamos National Laboratory\nLos Alamos, NM 87545, USA\n\nchertkov@lanl.gov\n\nAbstract\n\nMax-product \u2018belief propagation\u2019 (BP) is a popular distributed heuristic for \ufb01nd-\ning the Maximum A Posteriori (MAP) assignment in a joint probability distribu-\ntion represented by a Graphical Model (GM). It was recently shown that BP con-\nverges to the correct MAP assignment for a class of loopy GMs with the following\ncommon feature: the Linear Programming (LP) relaxation to the MAP problem is\ntight (has no integrality gap). Unfortunately, tightness of the LP relaxation does\nnot, in general, guarantee convergence and correctness of the BP algorithm. The\nfailure of BP in such cases motivates reverse engineering a solution \u2013 namely,\ngiven a tight LP, can we design a \u2018good\u2019 BP algorithm.\nIn this paper, we design a BP algorithm for the Maximum Weight Matching\n(MWM) problem over general graphs. We prove that the algorithm converges\nto the correct optimum if the respective LP relaxation, which may include in-\nequalities associated with non-intersecting odd-sized cycles, is tight. The most\nsigni\ufb01cant part of our approach is the introduction of a novel graph transformation\ndesigned to force convergence of BP. Our theoretical result suggests an ef\ufb01cient\nBP-based heuristic for the MWM problem, which consists of making sequential,\n\u201ccutting plane\u201d, modi\ufb01cations to the underlying GM. Our experiments show that\nthis heuristic performs as well as traditional cutting-plane algorithms using LP\nsolvers on MWM problems.\n\n1\n\nIntroduction\n\nGraphical Models (GMs) provide a useful representation for reasoning in a range of scienti\ufb01c \ufb01elds\n[1, 2, 3, 4]. 
Such models use a graph structure to encode the joint probability distribution, where\nvertices correspond to random variables and edges (or lack of thereof) specify conditional depen-\ndencies. An important inference task in many applications involving GMs is to \ufb01nd the most likely\nassignment to the variables in a GM - the maximum a posteriori (MAP) con\ufb01guration. Belief Prop-\nagation (BP) is a popular algorithm for approximately solving the MAP inference problem. BP is\nan iterative, message passing algorithm that is exact on tree structured GMs. However, BP often\nshows remarkably strong heuristic performance beyond trees, i.e. on GMs with loops. Distributed\nimplementation, associated ease of programming and strong parallelization potential are among the\nmain reasons for the popularity of the BP algorithm, e.g., see the parallel implementations of [5, 6].\nThe convergence and correctness of BP was recently established for a certain class of loopy GM\nformulations of several classic combinatorial optimization problems, including matchings [7, 8, 9],\nperfect matchings [10], independent sets [11] and network \ufb02ows [12]. The important common\n\n\u2217Also at Theoretical Division of Los Alamos National Lab.\n\n1\n\n\ffeature of these instances is that BP converges to a correct MAP assignment when the Linear Pro-\ngramming (LP) relaxation of the MAP inference problem is tight, i.e., it shows no integrality gap.\nWhile this demonstrates that LP tightness is necessary for the convergence and correctness of BP,\nit is unfortunately not suf\ufb01cient in general. In other words, BP may not work even when the corre-\nsponding LP relaxation to the MAP inference problem is tight. 
This motivates a quest for improving\nBP-based MAP solvers so that they work when the LP is tight.\nIn this paper, we consider a speci\ufb01c class of GMs corresponding to the Maximum Weight Matching\n(MWM) problem and study if BP can be used as an iterative, message passing-based LP solver\nwhen the MWM LP (relaxation) is tight. It was recently shown [15] that a MWM can be found in\npolynomial time by solving a carefully chosen sequence of LP relaxations, where the sequence of\nLPs are formed by adding and removing sets of so-called \u201cblossom\u201d inequalities [13] to the base\nLP relaxation. Utilizing successive LP relaxations to solve the MWM problem is an example of\nthe popular cutting plane method for solving combinatorial optimization problems [14]. While the\napproach in [15] is remarkable in that one needs only a polynomial number of \u201ccut\u201d inequalities,\nit unfortunately requires solving an emerging sequence of LPs via traditional, centralized methods\n(e.g., ellipsoid, interior-point or simplex) that may not be practical for large-scale problems. This\nmotivates our search for an ef\ufb01cient and distributed BP-based LP solver for this class of problems.\nOur work builds upon that of Sanghavi, Malioutov and Willsky [8], who studied BP for the GM\nformulation of the MWM problem on an arbitrary graph. The authors showed that max-product BP\nconverges to the correct, MAP solution if the base LP relaxation with no blossom - referred to herein\nas MWM-LP - is tight. Unfortunately, the tightness is not guaranteed in general, and the convergence\nand correctness for max-product BP do not readily extend to a GM with blossom constraints.\nTo resolve this issue, we propose a novel GM formulation of the MWM problem and show that max-\nproduct BP on this new GM converges to the MWM assignment as long as the MWM-LP relaxation\nwith blossom constraints is tight. 
The only restriction placed on our GM construction is that the\nset of blossom constraints added to the base MWM-LP be non-intersecting (in edges). Our GM\nconstruction is motivated by the so-called \u2018degree-two\u2019 (DT) condition, which requires that every\nvariable in the GM be associated to at most two factor functions. The DT condition is necessary\nfor analysis of BP using the computational tree technique, developed and advanced in [7, 8, 12, 16,\n18, 19]. Note, that the DT condition is not satis\ufb01ed by the standard MWM GM formulation, and\nhence, we design a new GM that satis\ufb01es the DT condition via a clever graphical transformation -\nnamely, collapsing odd-sized cycles and de\ufb01ning new weights on the contracted graph. Importantly,\nthe MAP assignments of the two GMs are in one-to-one correspondence guaranteeing that a solution\nto the original problem can be recovered.\nOur theoretical result suggests a cutting-plane approach to the MWM problem, where BP is used\nas the LP solver. In particular, we examine the BP solution to identify odd-sized cycle constraints\n- \u201ccuts\u201d - to add to the MWM-LP relaxation; then construct a new GM using our graphical trans-\nformation, run BP and repeat. We evaluate this heuristic empirically and show that its performance\nis close to a traditional cutting-plane approach employing an LP solver rather than BP. Finally, we\nnote that the DT condition may neither be suf\ufb01cient nor necessary for BP to work. It was necessary,\nhowever, to provide theoretical guarantees for the special class of GMs considered. To our knowl-\nedge, our result is the \ufb01rst to suggest how to \u201c\ufb01x\u201d BP via a graph transformation so that it works\nproperly, i.e., recovers the desired LP solution. We believe that our success in crafting a graphical\ntransformation will offer useful insight into the design and analysis of BP algorithms for a wider\nclass of problems.\nOrganization. 
In Section 2, we introduce a standard GM formulation of the MWM problem as well as the corresponding BP and LP. In Section 3, we introduce our new GM and describe performance guarantees of the respective BP algorithm. In Section 4, we describe a cutting-plane(-like) method using BP for the MWM problem and show its empirical performance for random MWM instances.

2 Preliminaries

2.1 Graphical Model for Maximum Weight Matchings

A joint distribution of n (discrete) random variables Z = [Z_i] ∈ Ω^n is called a Graphical Model (GM) if it factorizes as follows: for z = [z_i] ∈ Ω^n,

\[ \Pr[Z = z] \;\propto\; \prod_{\alpha \in F} \psi_{\alpha}(z_{\alpha}), \tag{1} \]

where F is a collection of subsets of Ω, z_α = [z_i : i ∈ α ⊂ Ω] is a subset of variables, and ψ_α is some (given) non-negative function. The function ψ_α is called a factor (variable) function if |α| ≥ 2 (|α| = 1). For variable functions ψ_α with α = {i}, we simply write ψ_α = ψ_i. One calls z a valid assignment if Pr[Z = z] > 0. The MAP assignment z* is defined as

\[ z^{*} = \arg\max_{z \in \Omega^{n}} \Pr[Z = z]. \]

Let us introduce the Maximum Weight Matching (MWM) problem and its related GM. Suppose we are given an undirected graph G = (V, E) with weights {w_e : e ∈ E} assigned to its edges. A matching is a set of edges without common vertices. The weight of a matching is the sum of the corresponding edge weights.
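The two notions just defined, a matching and its weight, can be checked on a tiny graph with a brute-force search. This sketch is purely illustrative (it is exponential in |E| and is not the paper's algorithm); the function name and the example graph are our own.

```python
# Illustrative brute-force maximum weight matching: a matching is a set
# of edges sharing no vertex; its weight is the sum of its edge weights.
from itertools import combinations

def max_weight_matching(edges, weights):
    """edges: list of (u, v) pairs; weights: parallel list of weights."""
    best_w, best_m = 0.0, []
    m = len(edges)
    for k in range(1, m + 1):
        for idxs in combinations(range(m), k):
            used, ok = set(), True
            for i in idxs:
                u, v = edges[i]
                if u in used or v in used:   # edges share a vertex: not a matching
                    ok = False
                    break
                used.update((u, v))
            if ok:
                wt = sum(weights[i] for i in idxs)
                if wt > best_w:
                    best_w, best_m = wt, [edges[i] for i in idxs]
    return best_w, best_m

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
weights = [2.0, 1.0, 1.0, 1.5]
print(max_weight_matching(edges, weights))  # weight 3.5 via {(0,1), (2,3)}
```

On this 4-edge graph the search confirms that the two disjoint edges (0,1) and (2,3) beat any single heavy edge.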
The MWM problem consists of finding a matching of maximum weight. Associate a binary random variable with each edge, X = [X_e] ∈ {0,1}^{|E|}, and consider the probability distribution: for x = [x_e] ∈ {0,1}^{|E|},

\[ \Pr[X = x] \;\propto\; \prod_{e \in E} e^{w_e x_e} \prod_{i \in V} \psi_i(x) \prod_{C \in \mathcal{C}} \psi_C(x), \tag{2} \]

where

\[ \psi_i(x) = \begin{cases} 1 & \text{if } \sum_{e \in \delta(i)} x_e \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad \psi_C(x) = \begin{cases} 1 & \text{if } \sum_{e \in E(C)} x_e \le \frac{|C|-1}{2} \\ 0 & \text{otherwise} \end{cases}. \]

Here C is a set of odd-sized cycles C ⊂ 2^V, δ(i) = {(i, j) ∈ E} and E(C) = {(i, j) ∈ E : i, j ∈ C}. Throughout the manuscript, we assume that cycles are non-intersecting in edges, i.e., E(C₁) ∩ E(C₂) = ∅ for all C₁, C₂ ∈ C. It is easy to see that a MAP assignment x* for the GM (2) induces a MWM in G. We also assume that the MAP assignment is unique.

2.2 Belief Propagation and Linear Programming for Maximum Weight Matchings

In this section, we introduce max-product Belief Propagation (BP) and the Linear Programming (LP) relaxation to computing the MAP assignment in (2). We first describe the BP algorithm for the general GM (1), then tailor the algorithm to the MWM GM (2).
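As a quick numerical aside (our own example, not from the paper): the constraints enforced by ψ_i and ψ_C in (2) correspond to degree inequalities and odd-cycle inequalities in the LP relaxation introduced below, and a unit-weight triangle already shows why the odd-cycle inequality matters. The sketch uses scipy as the LP solver, an assumed dependency; `linprog` minimizes, so the weights are negated.

```python
import numpy as np
from scipy.optimize import linprog

w = np.array([1.0, 1.0, 1.0])           # weights for edges (1,2), (2,3), (1,3)
A_deg = np.array([[1.0, 0.0, 1.0],      # vertex 1: x12 + x13 <= 1
                  [1.0, 1.0, 0.0],      # vertex 2: x12 + x23 <= 1
                  [0.0, 1.0, 1.0]])     # vertex 3: x23 + x13 <= 1
b_deg = np.ones(3)

# Degree constraints only (C empty): the optimum is fractional.
base = linprog(-w, A_ub=A_deg, b_ub=b_deg, bounds=[(0, 1)] * 3)
print(base.x, -base.fun)                # x = [0.5 0.5 0.5], value 1.5

# Adding the odd-cycle inequality sum_{e in E(C)} x_e <= (|C|-1)/2 = 1
# restores integrality: the optimum is a true matching of weight 1.
A_cut = np.vstack([A_deg, [1.0, 1.0, 1.0]])
b_cut = np.append(b_deg, 1.0)
tight = linprog(-w, A_ub=A_cut, b_ub=b_cut, bounds=[(0, 1)] * 3)
print(-tight.fun)                       # value 1.0
```

The fractional point x = (1/2, 1/2, 1/2) satisfies every degree constraint yet corresponds to no matching; the single blossom cut removes it.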
The BP algorithm updates the set of 2|Ω| messages {m^t_{α→i}(z_i), m^t_{i→α}(z_i) : z_i ∈ Ω} between every variable i and its associated factors α ∈ F_i = {α ∈ F : i ∈ α, |α| ≥ 2} using the following update rules:

\[ m^{t+1}_{\alpha\to i}(z_i) \;=\; \max_{z' :\, z'_i = z_i} \psi_{\alpha}(z') \prod_{j \in \alpha \setminus i} m^{t}_{j\to\alpha}(z'_j) \qquad \text{and} \qquad m^{t+1}_{i\to\alpha}(z_i) \;=\; \psi_i(z_i) \prod_{\alpha' \in F_i \setminus \alpha} m^{t}_{\alpha'\to i}(z_i). \]

Here t denotes time and initially m^0_{α→i}(·) = m^0_{i→α}(·) = 1. Given a set of messages {m_{i→α}(·), m_{α→i}(·)}, the BP (max-marginal) beliefs {n_i(z_i)} are defined as follows:

\[ n_i(z_i) \;=\; \psi_i(z_i) \prod_{\alpha \in F_i} m_{\alpha\to i}(z_i). \]

For the GM (2), we let n^t_e(·) denote the BP belief on edge e ∈ E at time t. The algorithm outputs the MAP estimate at time t, x^{BP}(t) = [x^{BP}_e(t)] ∈ {0, ?, 1}^{|E|}, using the beliefs and the rule:

\[ x^{BP}_e(t) \;=\; \begin{cases} 1 & \text{if } n^t_e(0) < n^t_e(1) \\ ? & \text{if } n^t_e(0) = n^t_e(1) \\ 0 & \text{if } n^t_e(0) > n^t_e(1) \end{cases}. \]

The LP relaxation to the MAP problem for the GM (2) is:

\[ \text{C-LP:} \quad \max \sum_{e \in E} w_e x_e \quad \text{s.t.} \quad \sum_{e \in \delta(i)} x_e \le 1 \;\; \forall i \in V, \qquad \sum_{e \in E(C)} x_e \le \frac{|C|-1}{2} \;\; \forall C \in \mathcal{C}, \qquad x_e \in [0, 1]. \]

Observe that if the solution x^{C-LP} to C-LP is integral, i.e., x^{C-LP} ∈ {0,1}^{|E|}, then it is a MAP assignment, i.e., x^{C-LP} = x*. Sanghavi, Malioutov and Willsky [8] proved the following theorem connecting the performance of BP and C-LP in a special case:
Theorem 2.1.
If C = ∅ and the solution of C-LP is integral and unique, then x^{BP}(t) under the GM (2) converges to the MWM assignment x*.

Adding a small random component to every weight guarantees the uniqueness condition required by Theorem 2.1. A natural hope is that Theorem 2.1 extends to a non-empty C, since adding more cycles can help to reduce the integrality gap of C-LP. However, the theorem does not hold when C ≠ ∅. For example, BP does not converge for a triangle graph with edge weights {2, 1, 1} and C consisting of the only cycle. This is true even though the solution to its C-LP is unique and integral.

3 A Graphical Transformation for Convergent & Correct BP

The loss of convergence and correctness of BP when the MWM LP is tight (and unique) but C ≠ ∅ motivates the work in this section. We resolve the issue by designing a new GM, equivalent to the original GM, such that when BP is run on this new GM it converges to the MAP/MWM assignment whenever the LP relaxation is tight and unique, even if C ≠ ∅. The new GM is defined on an auxiliary graph G′ = (V′, E′) with new weights {w′_e : e ∈ E′}, as follows:

\[ V' = V \cup \{i_C : C \in \mathcal{C}\}, \qquad E' = E \cup \{(i_C, j) : j \in V(C),\; C \in \mathcal{C}\} \setminus \{e : e \in \cup_{C \in \mathcal{C}} E(C)\}, \]
\[ w'_e = \begin{cases} \frac{1}{2} \sum_{e' \in E(C)} (-1)^{d_C(j, e')} w_{e'} & \text{if } e = (i_C, j) \text{ for some } C \in \mathcal{C} \\ w_e & \text{otherwise} \end{cases}. \]

Here d_C(j, e) is the graph distance of j and e in the cycle C = (j_1, j_2, ..., j_k), e.g., if e = (j_2, j_3), then d_C(j_1, e) = 1.

Figure 1: Example of original graph G (left) and new graph G′ (right) after collapsing cycle C = (1, 2, 3, 4, 5). In the new graph G′, edge weight w′_{1C} = (1/2)(w_{12} − w_{23} + w_{34} − w_{45} + w_{15}).

Associate a binary variable with each new edge and consider the new probability distribution on y = [y_e : e ∈ E′] ∈ {0,1}^{|E′|}:

\[ \Pr[Y = y] \;\propto\; \prod_{e \in E'} e^{w'_e y_e} \prod_{i \in V} \psi_i(y) \prod_{C \in \mathcal{C}} \psi_C(y), \tag{3} \]

where

\[ \psi_i(y) = \begin{cases} 1 & \text{if } \sum_{e \in \delta(i)} y_e \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad \psi_C(y) = \begin{cases} 0 & \text{if } \sum_{e \in \delta(i_C)} y_e > |C| - 1 \\ 0 & \text{if } \sum_{j \in V(C)} (-1)^{d_C(j, e)} y_{(i_C, j)} \notin \{0, 2\} \text{ for some } e \in E(C) \\ 1 & \text{otherwise} \end{cases}. \]

It is not hard to check that the number of operations required to update messages at each round of BP under the above GM is O(|V||E|): message updates involving factor ψ_C require solving a MWM problem on a simple cycle, which can be done efficiently via dynamic programming in time O(|C|), and the total number of edges of the non-intersecting cycles is at most |E|. We are now ready to state the main result of this paper.

Theorem 3.1. If the solution of C-LP is integral and unique, then the BP-MAP estimate y^{BP}(t) under the GM (3) converges to the corresponding MAP assignment y*. Furthermore, the MWM assignment x* is reconstructible from y* as:

\[ x^{*}_e = \begin{cases} \frac{1}{2} \sum_{j \in V(C)} (-1)^{d_C(j, e)} y^{*}_{(i_C, j)} & \text{if } e \in \bigcup_{C \in \mathcal{C}} E(C) \\ y^{*}_e & \text{otherwise} \end{cases}. \tag{4} \]

The proof of Theorem 3.1 is provided in the following sections. We also establish the convergence time of the BP algorithm under the GM (3) (see Lemma 3.2). We stress that the new GM (3) is designed so that each variable is associated to at most two factor nodes.
We call this condition, which did not hold for the original GM (2), the 'degree-two' (DT) condition. The DT condition will play a critical role in the proof of Theorem 3.1. We further remark that even under the DT condition and given tightness/uniqueness of the LP, proving correctness and convergence of BP is still highly non-trivial. In our case, it requires careful study of the computation tree induced by BP, with appropriate truncations at its leaves.

3.1 Main Lemma for Proof of Theorem 3.1

Let us introduce the following auxiliary LP over the new graph and weights:

\[ \text{C-LP}': \quad \max \sum_{e \in E'} w'_e y_e \]
\[ \text{s.t.} \quad \sum_{e \in \delta(i)} y_e \le 1 \;\; \forall i \in V, \qquad y_e \in [0, 1] \;\; \forall e \in E', \tag{5} \]
\[ \sum_{j \in V(C)} (-1)^{d_C(j, e)} y_{(i_C, j)} \in [0, 2] \;\; \forall e \in E(C), \qquad \sum_{e \in \delta(i_C)} y_e \le |C| - 1 \;\; \forall C \in \mathcal{C}. \tag{6} \]

Consider the following one-to-one linear mapping between x = [x_e : e ∈ E] and y = [y_e : e ∈ E′]:

\[ y_e = \begin{cases} \sum_{e' \in E(C) \cap \delta(i)} x_{e'} & \text{if } e = (i, i_C) \\ x_e & \text{otherwise} \end{cases} \qquad\qquad x_e = \begin{cases} \frac{1}{2} \sum_{j \in V(C)} (-1)^{d_C(j, e)} y_{(i_C, j)} & \text{if } e \in \bigcup_{C \in \mathcal{C}} E(C) \\ y_e & \text{otherwise} \end{cases}. \]

Under the mapping, one can check that C-LP = C-LP′, and if the solution x^{C-LP} of C-LP is unique and integral, the solution y^{C-LP′} of C-LP′ is as well, i.e., y^{C-LP′} = y*. Hence, (4) in Theorem 3.1 follows. Furthermore, since the solution y* = [y*_e] to C-LP′ is unique and integral, there exists c > 0 such that

\[ c \;=\; \inf_{y \neq y^{*} :\, y \text{ is feasible to C-LP}'} \frac{w' \cdot (y^{*} - y)}{|y^{*} - y|}, \]

where w′ = [w′_e].
Using this notation, we establish the following lemma characterizing performance\n\nwhere w(cid:48) = [w(cid:48)\nof the max-product BP over the new GM (3). Theorem 3.1 follows from this lemma directly.\nLemma 3.2. If the solution yC-LP(cid:48)\n\nof C-LP(cid:48) is integral and unique, i.e., yC-LP(cid:48)\n\n= y\u2217, then\n\ne[1] > nt\n\ne = 1, nt\n\n\u2022 If y\u2217\n\u2022 If y\u2217\ne[\u00b7] denotes the BP belief of edge e at time t under the GM (3) and w(cid:48)\n\ne[0] for all t > 6w(cid:48)\ne[0] for all t > 6w(cid:48)\n\nc + 6,\n\nc + 6,\n\ne = 0, nt\n\ne[1] < nt\n\nmax\n\nmax\n\nwhere nt\n\nmax = maxe\u2208E(cid:48) |w(cid:48)\ne|.\n\n3.2 Proof of Lemma 3.2\nThis section provides the complete proof of Lemma 3.2. We focus here on the case of y\u2217\ntranslation of the result to the opposite case of y\u2217\nassume that nt\nthe computational tree, using the following scheme:\n\ne = 1, while\ne = 0 is straightforward. To derive a contradiction,\ne[0] and construct a tree-structured GM Te(t) of depth t + 1, also known as\n\ne[1] \u2264 nt\n\n1. Add a copy of Ye \u2208 {0, 1} as the (root) variable (with variable function ew(cid:48)\n2. Repeat the following t times for each leaf variable Ye on the current tree-structured GM.\n\neYe).\n\n2-1. For each i \u2208 V such that e \u2208 \u03b4(i) and \u03c8i is not associated to Ye of the current model, add \u03c8i\nas a factor (function) with copies of {Ye(cid:48) \u2208 {0, 1} : e(cid:48) \u2208 \u03b4(i) \\ e} as child variables (with\ncorresponding variable functions, i.e., {ew(cid:48)\n2-2. 
For each C ∈ C such that e ∈ δ(i_C) and ψ_C is not associated to Y_e of the current model, add ψ_C as a factor (function) with copies of {Y_{e′} ∈ {0, 1} : e′ ∈ δ(i_C) \ e} as child variables (with corresponding variable functions, i.e., {e^{w′_{e′} Y_{e′}}}).

It is known from [17] that there exists a MAP configuration y^{TMAP} on T_e(t) with y^{TMAP}_e = 0 at the root variable. Now we construct a new assignment y^{NEW} on the computational tree T_e(t) as follows.

1. Initially, set y^{NEW} ← y^{TMAP}, with e the root of the tree.
2. y^{NEW} ← FLIP_e(y^{NEW}).
3. For each child factor ψ associated with e, which is equal to ψ_i (i.e., e ∈ δ(i)) or ψ_C (i.e., e ∈ δ(i_C)):
(a) If ψ is satisfied by y^{NEW} and FLIP_e(y*) (i.e., ψ(y^{NEW}) = ψ(FLIP_e(y*)) = 1), then do nothing.
(b) Else if there exists a child e′ of e through factor ψ such that y^{NEW}_{e′} ≠ y*_{e′} and ψ is satisfied by FLIP_{e′}(y^{NEW}) and FLIP_{e′}(FLIP_e(y*)), then go to step 2 with e ← e′.
(c) Otherwise, report ERROR.

To aid the reader's understanding, we provide a figure describing an example of the above construction in our technical report [21]. In the construction, FLIP_e(y) is the 0-1 vector obtained by flipping (i.e., changing from 0 to 1 or 1 to 0) the e-th position in y. We note that there exists exactly one child factor ψ in step 3 and we only choose one child e′ in step (b) (even though there may be many possible candidates). For this reason, the flip operations induce a path structure P in the tree T_e(t).¹ Now we state the following key lemma for the above construction of y^{NEW}.
Lemma 3.3.
ERROR is never reported in the construction described above.

Proof. The case ψ = ψ_i in step 3 is easy, and we only provide the proof for the case ψ = ψ_C. We also assume that y^{NEW}_e is flipped as 1 → 0 (i.e., y*_e = 0); the proof for the case 0 → 1 follows in a similar manner. First, one can observe that y satisfies ψ_C if and only if y is the 0-1 indicator vector of a union of disjoint even paths in the cycle C. Since y^{NEW}_e is flipped as 1 → 0, the even path including e is broken into an even (possibly empty) path and an odd (always non-empty) path. We consider two cases: (a) there exists e′ within the odd path (i.e., y^{NEW}_{e′} = 1) such that y*_{e′} = 0 and flipping y^{NEW}_{e′} as 1 → 0 breaks the odd path into two even (disjoint) paths; (b) there exists no such e′ within the odd path.

For the first case (a), it is easy to see that we can maintain the structure of disjoint even paths in y^{NEW} after flipping y^{NEW}_{e′} as 1 → 0, i.e., ψ is satisfied by FLIP_{e′}(y^{NEW}). For the second case (b), we choose e′ as a neighbor of the farthest end point (from e) in the odd path, i.e., y^{NEW}_{e′} = 0 (before flipping). Then y*_{e′} = 1, since y* satisfies factor ψ_C and induces a union of disjoint even paths in the cycle C. Therefore, if we flip y^{NEW}_{e′} as 0 → 1, we can still maintain the structure of disjoint even paths in y^{NEW}, and ψ is satisfied by FLIP_{e′}(y^{NEW}). The proof for the case of ψ satisfied by FLIP_{e′}(FLIP_e(y*)) is similar. This completes the proof of Lemma 3.3.

Due to the way it is constructed, y^{NEW} is a valid configuration, i.e., it satisfies all the factor functions in T_e(t).
Hence, it suffices to prove that w′(y^{NEW}) > w′(y^{TMAP}), which contradicts the assumption that y^{TMAP} is a MAP configuration on T_e(t). To this end, for (i, j) ∈ E′, let n^{0→1}_{ij} and n^{1→0}_{ij} be the numbers of flip operations 0 → 1 and 1 → 0 for copies of (i, j) in step 2 of the construction of y^{NEW} on T_e(t). Then one derives

\[ w'(y^{NEW}) \;=\; w'(y^{TMAP}) + w' \cdot n^{0\to1} - w' \cdot n^{1\to0}, \]

where n^{0→1} = [n^{0→1}_{ij}] and n^{1→0} = [n^{1→0}_{ij}]. We consider two cases: (i) the path P does not arrive at a leaf variable of T_e(t), and (ii) otherwise. Note that case (i) is possible only when the condition in step (a) holds during the construction of y^{NEW}.

Case (i). In this case, we define y†_{ij} := y*_{ij} + ε(n^{1→0}_{ij} − n^{0→1}_{ij}) and establish the following lemma.

Lemma 3.4. y† is feasible to C-LP′ for small enough ε > 0.

Proof. We have to show that y† satisfies (5) and (6). Here we prove that y† satisfies (6) for small enough ε > 0; the proof for (5) can be argued in a similar manner. To this end, for given C ∈ C, we consider the following polytope P_C:

\[ y_{(i_C, j)} \in [0, 1] \;\; \forall j \in C, \qquad \sum_{j \in V(C)} y_{(i_C, j)} \le |C| - 1, \qquad \sum_{j \in V(C)} (-1)^{d_C(j, e)} y_{(i_C, j)} \in [0, 2] \;\; \forall e \in E(C). \]

We have to show that y†_C = [y_e : e ∈ δ(i_C)] is within the polytope. It is easy to see that the condition of step (a) never holds if ψ = ψ_C in step 3. For the i-th copy of ψ_C in P ∩ T_e(t), we set y*_C(i) = FLIP_{e′}(FLIP_e(y*_C)) in step (b), where y*_C(i) ∈ P_C. Since the path P does not hit a leaf variable of T_e(t), we have

\[ y^{*}_C + \frac{1}{N}\left(n^{1\to0}_C - n^{0\to1}_C\right) \;=\; \frac{1}{N} \sum_{i=1}^{N} y^{*}_C(i), \]

where N is the number of copies of ψ_C in P ∩ T_e(t). Furthermore, (1/N) Σ_{i=1}^{N} y*_C(i) ∈ P_C due to y*_C(i) ∈ P_C. Therefore, y†_C ∈ P_C if ε ≤ 1/N. This completes the proof of Lemma 3.4.

¹ P may not have an alternating structure, since both y^{NEW}_e and its child y^{NEW}_{e′} can be flipped in the same way.

The above lemma, together with w′(y*) > w′(y†) (due to the uniqueness of y*), implies that w′ · n^{0→1} > w′ · n^{1→0}, which leads to w′(y^{NEW}) > w′(y^{TMAP}).

Case (ii). We consider the case when only one end of P hits a leaf variable Y_e of T_e(t); the proof for the other case follows in a similar manner. In this case, we define y‡_{ij} := y*_{ij} + ε(m^{1→0}_{ij} − m^{0→1}_{ij}), where m^{1→0} = [m^{1→0}_{ij}] and m^{0→1} = [m^{0→1}_{ij}] are constructed as follows:

1. Initially, set m^{1→0}, m^{0→1} to n^{1→0}, n^{0→1}.

2. If y^{NEW}_e is flipped as 1 → 0 and it is associated to a cycle parent factor ψ_C for some C ∈ C, then decrease m^{1→0}_e by 1 and
  2-1. If the parent y^{NEW}_{e′} is flipped as 1 → 0, then decrease m^{1→0}_{e′} by 1.
  2-2. Else if there exists a 'brother' edge e″ ∈ δ(i_C) of e such that y*_{e″} = 1 and ψ_C is satisfied by FLIP_{e″}(FLIP_{e′}(y*)), then increase m^{0→1}_{e″} by 1.
  2-3. Otherwise, report ERROR.

3. If y^{NEW}_e is flipped as 1 → 0 and it is associated to a vertex parent factor ψ_i for some i ∈ V, then decrease m^{1→0}_e by 1.

4. If y^{NEW}_e is flipped as 0 → 1 and it is associated to a vertex parent factor ψ_i for some i ∈ V, then decrease m^{0→1}_e and m^{1→0}_{e′} by 1, where e′ ∈ δ(i) is the 'parent' edge of e, and
  4-1. If the parent y^{NEW}_{e′} is associated to a cycle parent factor ψ_C,
    4-1-1. If the grand-parent y^{NEW}_{e″} is flipped as 1 → 0, then decrease m^{1→0}_{e″} by 1.
    4-1-2. Else if there exists a 'brother' edge e‴ ∈ δ(i_C) of e′ such that y*_{e‴} = 1 and ψ_C is satisfied by FLIP_{e‴}(FLIP_{e″}(y*)), then increase m^{0→1}_{e‴} by 1.
    4-1-3. Otherwise, report ERROR.
  4-2. Otherwise, do nothing.

We establish the following lemmas.

Lemma 3.5. ERROR is never reported in the above construction.

Lemma 3.6. y‡ is feasible to C-LP′ for small enough ε > 0.

Proofs of Lemma 3.5 and Lemma 3.6 are analogous to those of Lemma 3.3 and Lemma 3.4, respectively. From Lemma 3.6, we have

\[ c \;\le\; \frac{w' \cdot (y^{*} - y^{\ddagger})}{|y^{*} - y^{\ddagger}|} \;\le\; \frac{\varepsilon \left( w' \cdot (m^{0\to1} - m^{1\to0}) \right)}{\varepsilon (t - 3)} \;\le\; \frac{\varepsilon \left( w' \cdot (n^{0\to1} - n^{1\to0}) + 3 w'_{\max} \right)}{\varepsilon (t - 3)}, \]

where |y* − y‡| ≥ ε(t − 3) follows from the fact that P hits a leaf variable of T_e(t) and there are at most three increases or decreases in m^{0→1} and m^{1→0} in the above construction. Hence,

\[ w' \cdot (n^{0\to1} - n^{1\to0}) \;\ge\; c(t - 3) - 3 w'_{\max} \;>\; 0 \qquad \text{if} \qquad t > \frac{3 w'_{\max}}{c} + 3, \]

which implies w′(y^{NEW}) > w′(y^{TMAP}). If both ends of P hit leaf variables of T_e(t), we need t > 6 w′_max / c + 6.
This completes the proof of Lemma 3.2.

4 Cutting-Plane Algorithm using Belief Propagation

In the previous section, we established that BP on a carefully designed GM using non-intersecting odd-sized cycles solves the MWM problem when the corresponding MWM-LP relaxation is tight. However, finding a collection of odd-sized cycles to ensure tightness of the MWM-LP is a challenging task. In this section, we provide a heuristic algorithm for this task, which we call CP-BP (cutting-plane using BP). It consists of making sequential, "cutting plane", modifications to the underlying LP (and corresponding GM) using the output of the BP algorithm in the previous step. CP-BP is defined as follows:

1. Initialize C = ∅.
2. Run BP on the GM in (3) for T iterations.
3. For each edge e ∈ E, set

\[ y_e = \begin{cases} 1 & \text{if } n^{T}_e[1] > n^{T}_e[0] \text{ and } n^{T-1}_e[1] > n^{T-1}_e[0] \\ 0 & \text{if } n^{T}_e[1] < n^{T}_e[0] \text{ and } n^{T-1}_e[1] < n^{T-1}_e[0] \\ 1/2 & \text{otherwise} \end{cases}. \]

4. Compute x = [x_e] using y = [y_e] as per (4), and terminate if x ∉ {0, 1/2, 1}^{|E|}.
5. If there is no edge e with x_e = 1/2, return x. Otherwise, add a non-intersecting odd-sized cycle of edges {e : x_e = 1/2} to C and go to step 2; or terminate if no such cycle exists.

In the above procedure, BP can be replaced by an LP solver to directly obtain x in step 4. This results in a traditional cutting-plane LP (CP-LP) method for the MWM problem [20]. The primary reason why we design CP-BP to terminate when x ∉ {0, 1/2, 1}^{|E|} is that the solution x of C-LP is always half-integral.² Note that x ∉ {0, 1/2, 1}^{|E|} occurs when BP fails to find the solution to the current MWM-LP.

We compare CP-BP and CP-LP in order to gauge the effectiveness of BP as an LP solver for MWM problems.
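For reference, the CP-LP variant of the loop above can be sketched in a few lines: the BP steps are replaced by an exact LP solve, and each round adds one odd cycle found among the half-integral edges (by the half-integrality noted above, those edges form disjoint odd cycles, so a simple walk finds one). This is our own illustrative implementation using scipy, not the authors' code, and for brevity it does not enforce non-intersection of cuts across rounds.

```python
import numpy as np
from scipy.optimize import linprog

def walk_cycle(half_edges):
    """Half-integral edges of the matching LP form disjoint odd cycles,
    so following the degree-2 adjacency from any vertex traces one."""
    adj = {}
    for u, v in half_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    start = next(iter(adj))
    cycle, prev, cur = [start], None, start
    while True:
        nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
        if nxt == start:
            return cycle
        cycle.append(nxt)
        prev, cur = cur, nxt

def cp_lp(n, edges, w, max_rounds=10):
    """Cutting-plane loop: solve the LP, then cut one odd cycle found
    among the half-integral edges; stop when the solution is integral."""
    cuts = []
    for _ in range(max_rounds):
        A, b = [], []
        for i in range(n):                          # degree constraints
            A.append([1.0 if i in e else 0.0 for e in edges])
            b.append(1.0)
        for C in cuts:                              # odd-cycle ("blossom") cuts
            A.append([1.0 if set(e) <= set(C) else 0.0 for e in edges])
            b.append((len(C) - 1) / 2)
        res = linprog(-np.array(w), A_ub=np.array(A), b_ub=b,
                      bounds=[(0, 1)] * len(edges))
        half = [e for e, xe in zip(edges, res.x) if abs(xe - 0.5) < 1e-6]
        if not half:
            return res.x                            # integral: a MWM
        cuts.append(walk_cycle(half))
    return None

# Unit-weight triangle plus a light pendant edge: the bare LP is half-
# integral on the triangle; one odd-cycle cut makes it integral.
x = cp_lp(4, [(0, 1), (1, 2), (0, 2), (2, 3)], [1.0, 1.0, 1.0, 0.2])
print(x)  # ~[1, 0, 0, 1]: edge (0,1) plus edge (2,3)
```

In CP-BP, the LP solve inside the loop is replaced by the BP run and belief-thresholding of steps 2-4 above; everything else is identical.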
We conducted experiments on two types of synthetically generated problems: 1) sparse graph instances; and 2) triangulation instances. The sparse graph instances were generated by forming a complete graph on |V| = {50, 100} nodes and independently eliminating edges with probability p = {0.5, 0.9}. Integral weights, drawn uniformly in [1, 2^20], are assigned to the remaining edges. The triangulation instances were generated by randomly placing |V| = {100, 200} points in the 2^20 × 2^20 square and computing a Delaunay triangulation on this set of points. Edge weights were set to the rounded Euclidean distance between two points. A set of 100 instances was generated for each setting of |V|, and CP-BP was run for T = 100 iterations.

The results are summarized in Table 1 and show that: 1) CP-BP is almost as good as CP-LP for solving the MWM problem; and 2) our graphical transformation allows BP to solve significantly more MWM problems than are solvable by BP run on the ‘bare’ LP without odd-sized cycles.

               50% sparse graphs                               90% sparse graphs
|V| / |E|    # Tight LPs  # CP-BP  # CP-LP      |V| / |E|    # Tight LPs  # CP-BP  # CP-LP
50 / 490     65 %         94 %     98 %         50 / 121     59 %         90 %     91 %
100 / 1963   48 %         92 %     95 %         100 / 476    50 %         63 %     63 %

             Triangulation, |V| = 100, |E| = 285        Triangulation, |V| = 200, |E| = 583
Algorithm    # Correct / # Converged  Time (sec)        # Correct / # Converged  Time (sec)
CP-BP        33 / 36                  0.2 [0.0,0.4]     11 / 12                  0.9 [0.2,2.5]
CP-LP        34 / 100                 0.1 [0.0,0.3]     15 / 100                 0.8 [0.3,1.6]

Table 1: Evaluation of CP-BP and CP-LP on random MWM instances. Columns # CP-BP and # CP-LP indicate the percentage of instances in which the cutting plane methods found a MWM. The column # Tight LPs indicates the percentage for which the initial MWM-LP is tight (i.e., C = ∅).
# Correct and # Converged indicate the number of correct matchings and the number of instances in which CP-BP converged upon termination but we failed to find a non-intersecting odd-sized cycle. The Time column indicates the mean [min, max] time.

²A proof of 1/2-integrality, which we did not find in the literature, is presented in our technical report [21].

References

[1] J. Yedidia, W. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282–2312, 2005.

[2] T. J. Richardson and R. L. Urbanke, Modern Coding Theory. Cambridge University Press, 2008.

[3] M. Mezard and A. Montanari, Information, Physics, and Computation, ser. Oxford Graduate Texts. Oxford: Oxford Univ. Press, 2009.

[4] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1, pp. 1–305, 2008.

[5] J. Gonzalez, Y. Low, and C. Guestrin, “Residual splash for optimally parallelizing belief propagation,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

[6] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, “GraphLab: A new parallel framework for machine learning,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

[7] M. Bayati, D. Shah, and M. Sharma, “Max-product for maximum weight matching: Convergence, correctness, and LP duality,” IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241–1251, 2008.

[8] S. Sanghavi, D. Malioutov, and A. Willsky, “Linear programming analysis of loopy belief propagation for weighted matching,” in Neural Information Processing Systems (NIPS), 2007.

[9] B. Huang and T.
Jebara, \u201cLoopy belief propagation for bipartite maximum weight b-matching,\u201d\n\nin Arti\ufb01cial Intelligence and Statistics (AISTATS), 2007.\n\n[10] M. Bayati, C. Borgs, J. Chayes, R. Zecchina, \u201cBelief-Propagation for Weighted b-Matchings\non Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions,\u201d SIAM Journal\nin Discrete Math, vol. 25, pp. 989\u20131011, 2011.\n\n[11] S. Sanghavi, D. Shah, and A. Willsky, \u201cMessage-passing for max-weight independent set,\u201d in\n\nNeural Information Processing Systems (NIPS), 2007.\n\n[12] D. Gamarnik, D. Shah, and Y. Wei, \u201cBelief propagation for min-cost network \ufb02ow: conver-\n\ngence & correctness,\u201d in SODA, pp. 279\u2013292, 2010.\n\n[13] J. Edmonds, \u201cPaths, trees, and \ufb02owers\u201d, Canadian Journal of Mathematics, vol. 3, pp. 449\u2013\n\n467, 1965.\n\n[14] G. Dantzig, R. Fulkerson, and S. Johnson, \u201cSolution of a large-scale traveling-salesman prob-\n\nlem,\u201d Operations Research, vol. 2, no. 4, pp. 393\u2013410, 1954.\n\n[15] K. Chandrasekaran, L. A. Vegh, and S. Vempala. \u201cThe cutting plane method is polynomial for\n\nperfect matchings,\u201d in Foundations of Computer Science (FOCS), 2012\n\n[16] R. G. Gallager, \u201cLow Density Parity Check Codes,\u201d MIT Press, Cambridge, MA, 1963.\n[17] Y. Weiss, \u201cBelief propagation and revision in networks with loops,\u201d MIT AI Laboratory, Tech-\n\nnical Report 1616, 1997.\n\n[18] B. J. Frey, and R. Koetter, \u201cExact inference using the attenuated max-product algorithm,\u201d Ad-\nvanced Mean Field Methods: Theory and Practice, ed. Manfred Opper and David Saad, MIT\nPress, 2000.\n\n[19] Y. Weiss, and W. T. Freeman, \u201cOn the Optimality of Solutions of the MaxProduct BeliefProp-\nagation Algorithm in Arbitrary Graphs,\u201d IEEE Transactions on Information Theory, vol. 47,\nno. 2, pp. 736\u2013744. 2001.\n\n[20] M. Grotschel, and O. 
Holland, \u201cSolving matching problems with linear programming,\u201d Math-\n\nematical Programming, vol. 33, no. 3, pp. 243\u2013259. 1985.\n\n[21] J. Shin, A.E. Gelfand, and M. Chertkov, \u201cA Graphical Transformation for Belief Propagation:\nMaximum Weight Matchings and Odd-Sized Cycles,\u201d arXiv preprint arXiv:1306.1167 (2013).\n\n9\n\n\f", "award": [], "sourceid": 1021, "authors": [{"given_name": "Jinwoo", "family_name": "Shin", "institution": "KAIST"}, {"given_name": "Andrew", "family_name": "Gelfand", "institution": "UC Irvine"}, {"given_name": "Misha", "family_name": "Chertkov", "institution": "Los Alamos National Laboratory"}]}