{"title": "Message Passing for Max-weight Independent Set", "book": "Advances in Neural Information Processing Systems", "page_first": 1281, "page_last": 1288, "abstract": null, "full_text": "Message Passing for Max-weight Independent Set\n\nSujay Sanghavi\n\nLIDS, MIT\n\nsanghavi@mit.edu\n\nDevavrat Shah\n\nDept. of EECS, MIT\n\ndevavrat@mit.edu\n\nAlan Willsky\n\nDept. of EECS, MIT\n\nwillsky@mit.edu\n\nAbstract\n\nWe investigate the use of message-passing algorithms for the problem of finding the max-weight independent set (MWIS) in a graph. First, we study the performance of loopy max-product belief propagation. We show that, if it converges, the quality of the estimate is closely related to the tightness of an LP relaxation of the MWIS problem. We use this relationship to obtain sufficient conditions for correctness of the estimate. We then develop a modification of max-product \u2013 one that converges to an optimal solution of the dual of the MWIS problem. We also develop a simple iterative algorithm for estimating the max-weight independent set from this dual solution. We show that the MWIS estimate obtained using these two algorithms in conjunction is correct when the graph is bipartite and the MWIS is unique. Finally, we show that any problem of MAP estimation for probability distributions over finite domains can be reduced to an MWIS problem. We believe this reduction will yield new insights and algorithms for MAP estimation.\n\n1 Introduction\n\nThe max-weight independent set (MWIS) problem is the following: given a graph with positive weights on the nodes, find the heaviest set of mutually non-adjacent nodes. MWIS is a well studied combinatorial optimization problem that naturally arises in many applications. 
It is known to be NP-hard, and hard to approximate [6]. In this paper we investigate the use of message-passing algorithms, like loopy max-product belief propagation, as practical solutions for the MWIS problem. We now summarize our motivations for doing so, and then outline our contributions.\n\nOur primary motivation comes from applications. The MWIS problem arises naturally in many scenarios involving resource allocation in the presence of interference. It is often the case that large instances of the weighted independent set problem need to be (at least approximately) solved in a distributed manner using lightweight data structures. In Section 2.1 we describe one such application: scheduling channel access and transmissions in wireless networks. Message-passing algorithms provide a promising alternative to current scheduling algorithms.\n\nAnother, equally important, motivation is the potential for obtaining new insights into the performance of existing message-passing algorithms, especially on loopy graphs. Tantalizing connections have been established between such algorithms and more traditional approaches like linear programming (see [9] and references therein). The MWIS problem provides a rich, yet relatively tractable, first framework in which to investigate such connections.\n\n1.1 Our contributions\n\nIn Section 4 we construct a probability distribution whose MAP estimate corresponds to the MWIS of a given graph, and investigate the application of the loopy max-product algorithm to this distribution. We demonstrate that there is an intimate relationship between the max-product fixed points and the natural LP relaxation of the original independent set problem. We use this relationship to provide a certificate of correctness for the max-product fixed point in certain problem instances.\n\nIn Section 5 we develop two iterative message-passing algorithms. 
The first, obtained by a minor modification of max-product, calculates the optimal solution to the dual of the LP relaxation of the MWIS problem. The second algorithm uses this optimal dual solution to produce an estimate of the MWIS. This estimate is correct when the original graph is bipartite.\n\nIn Section 3 we show that any problem of MAP estimation in which all the random variables can take a finite number of values (and the probability distribution is positive over the entire domain) can be reduced to a max-weight independent set problem. This implies that any algorithm for solving the independent set problem immediately yields an algorithm for MAP estimation. We believe this reduction will prove useful from both practical and analytical perspectives.\n\n2 Max-weight Independent Set, and its LP Relaxation\n\nConsider a graph G = (V, E), with a set V of nodes and a set E of edges. Let N(i) = {j \u2208 V : (i, j) \u2208 E} be the neighbors of i \u2208 V . A positive weight wi is associated with each node i \u2208 V . A subset of V will be represented by a vector x = (xi) \u2208 {0, 1}^{|V|}, where xi = 1 means i is in the subset and xi = 0 means i is not in the subset. A subset x is called an independent set if no two nodes in the subset are connected by an edge: (xi, xj) \u2260 (1, 1) for all (i, j) \u2208 E. We are interested in finding a maximum weight independent set (MWIS) x\u2217. This can be naturally posed as an integer program, denoted below by IP. The linear programming relaxation of IP is obtained by replacing the integrality constraints xi \u2208 {0, 1} with the constraints xi \u2265 0. We will denote the corresponding linear program by LP. The dual of LP is denoted below by DUAL.\n\nIP : max \u2211_{i=1}^{n} wi xi, s.t. xi + xj \u2264 1 for all (i, j) \u2208 E, and xi \u2208 {0, 1}.\n\nDUAL : min \u2211_{(i,j)\u2208E} \u03bbij, s.t. \u2211_{j\u2208N(i)} \u03bbij \u2265 wi for all i \u2208 V , and \u03bbij \u2265 0 for all (i, j) \u2208 E.\n\nIt is well known that LP can be solved efficiently, and that if it has an integral optimal solution then this solution is an MWIS of G. If this is the case, we say that there is no integrality gap between LP and IP, or equivalently that the LP relaxation is tight. It is well known [3] that the LP relaxation is tight for bipartite graphs. More generally, for non-bipartite graphs, tightness will depend on the node weights. We will use the performance of LP as a benchmark against which to compare the performance of our message-passing algorithms.\n\nThe next lemma states the standard complementary slackness conditions of linear programming, specialized for LP above, and for the case when there is no integrality gap.\n\nLemma 2.1 When there is no integrality gap between IP and LP, there exists a pair of optimal solutions x = (xi), \u03bb = (\u03bbij) of LP and DUAL respectively, such that: (a) x \u2208 {0, 1}^n, (b) xi ( \u2211_{j\u2208N(i)} \u03bbij \u2212 wi ) = 0 for all i \u2208 V , (c) (xi + xj \u2212 1) \u03bbij = 0 for all (i, j) \u2208 E.\n\n2.1 Sample Application: Scheduling in Wireless Networks\n\nWe now briefly describe an important application that requires an efficient, distributed solution to the MWIS problem: transmission scheduling in wireless networks that lack a centralized infrastructure, and where nodes can only communicate with local neighbors (e.g. see [4]). 
Such networks are ubiquitous in the modern world: examples range from sensor networks that lack wired connections to a fusion center, to ad-hoc networks that can be quickly deployed in areas without coverage, to the 802.11 wi-fi networks that currently represent the most widely used method for wireless data access.\n\nFundamentally, any two wireless nodes that transmit at the same time and over the same frequencies will interfere with each other if they are located close by. Interference means that the intended receivers will not be able to decode the transmissions. Typically, in a network only certain pairs of nodes interfere. The scheduling problem is to decide which nodes should transmit at a given time over a given frequency, so that (a) there is no interference, and (b) nodes which have a large amount of data to send are given priority. In particular, it is well known that if each node is given a weight equal to the data it has to transmit, optimal network operation demands scheduling the set of nodes with highest total weight. If a \u201cconflict graph\u201d is made, with an edge between every pair of interfering nodes, the scheduling problem is exactly the problem of finding the MWIS of the conflict graph. The lack of an infrastructure, the fact that nodes often have limited capabilities, and the local nature of communication all necessitate a lightweight distributed algorithm for solving the MWIS problem.\n\n3 MAP Estimation as an MWIS Problem\n\nIn this section we show that any MAP estimation problem is equivalent to an MWIS problem on a suitably constructed graph with node weights. This construction is related to the \u201covercomplete basis\u201d representation [7]. Consider the following canonical MAP estimation problem: suppose we are given a distribution q(y) over vectors y = (y1, . . . , yM ) of variables ym, each of which takes values in a finite set. 
Suppose also that q factors into a product of strictly positive functions, which we find convenient to denote in exponential form:\n\nq(y) = (1/Z) \u220f_{\u03b1\u2208A} exp(\u03c6\u03b1(y\u03b1)) = (1/Z) exp( \u2211_{\u03b1\u2208A} \u03c6\u03b1(y\u03b1) ).\n\nHere \u03b1 specifies the domain of the function \u03c6\u03b1, and y\u03b1 is the vector of those variables that are in the domain of \u03c6\u03b1. The \u03b1's also serve as an index for the functions, and A is the set of functions. The MAP estimation problem is to find a maximizing assignment y\u2217 \u2208 arg max_y q(y).\n\nWe now build an auxiliary graph G\u0303, and assign weights to its nodes, such that the MAP estimation problem above is equivalent to finding the MWIS of G\u0303. There is one node in G\u0303 for each pair (\u03b1, y\u03b1), where y\u03b1 is an assignment (i.e. a set of values for the variables) of domain \u03b1. We will denote this node of G\u0303 by \u03b4(\u03b1, y\u03b1).\n\nThere is an edge in G\u0303 between any two nodes \u03b4(\u03b11, y\u03b11) and \u03b4(\u03b12, y\u03b12) if and only if there exists a variable index m such that (1) m is in both domains, i.e. m \u2208 \u03b11 and m \u2208 \u03b12, and (2) the corresponding variable assignments are different, i.e. y\u03b11 and y\u03b12 assign different values to ym. In other words, we put an edge between all pairs of nodes that correspond to inconsistent assignments.\n\nGiven this graph G\u0303, we now assign weights to the nodes. Let c > 0 be any number such that c + \u03c6\u03b1(y\u03b1) > 0 for all \u03b1 and y\u03b1. The existence of such a c follows from the fact that the set of assignments and domains is finite. Assign to each node \u03b4(\u03b1, y\u03b1) a weight of c + \u03c6\u03b1(y\u03b1).\n\nLemma 3.1 Suppose q and G\u0303 are as above. (a) If y\u2217 is a MAP estimate of q, let \u03b4\u2217 = {\u03b4(\u03b1, y\u2217\u03b1) | \u03b1 \u2208 A} be the set of nodes in G\u0303 that correspond to each domain being consistent with y\u2217. Then, \u03b4\u2217 is an MWIS of G\u0303. (b) Conversely, suppose \u03b4\u2217 is an MWIS of G\u0303. Then, for every domain \u03b1, there is exactly one node \u03b4(\u03b1, y\u2217\u03b1) included in \u03b4\u2217. Further, the corresponding domain assignments {y\u2217\u03b1 | \u03b1 \u2208 A} are consistent, and the resulting overall vector y\u2217 is a MAP estimate of q.\n\nExample. Let y1 and y2 be binary variables with joint distribution\n\nq(y1, y2) = (1/Z) exp(\u03b81 y1 + \u03b82 y2 + \u03b812 y1 y2),\n\nwhere the \u03b8 are any real numbers. The corresponding G\u0303 is shown to the right. [Figure: G\u0303 has nodes \u201c0\u201d and \u201c1\u201d for each of the two single-variable domains, and nodes \u201c00\u201d, \u201c01\u201d, \u201c10\u201d, \u201c11\u201d for the pairwise domain.] Let c be any number such that c + \u03b81, c + \u03b82 and c + \u03b812 are all greater than 0. The weights on the nodes in G\u0303 are: \u03b81 + c on node \u201c1\u201d on the left, \u03b82 + c for node \u201c1\u201d on the right, \u03b812 + c for the node \u201c11\u201d, and c for all the other nodes.\n\n4 Max-product for MWIS\n\nThe classical max-product algorithm is a heuristic that can be used to find the MAP assignment of a probability distribution. Now, given an MWIS problem on G = (V, E), associate a binary random variable Xi with each i \u2208 V and consider the following joint distribution: for x \u2208 {0, 1}^n,\n\np(x) = (1/Z) \u220f_{(i,j)\u2208E} 1{xi + xj \u2264 1} \u220f_{i\u2208V} exp(wi xi),   (1)\n\nwhere Z is the normalization constant and 1 is the standard indicator function: 1true = 1 and 1false = 0. It is easy to see that p(x) = (1/Z) exp( \u2211_i wi xi ) if x is an independent set, and p(x) = 0 otherwise. Thus, any MAP estimate arg max_x p(x) corresponds to a maximum weight independent set of G.\n\nThe update equations for max-product can be derived in a standard and straightforward fashion from the probability distribution. We now describe the max-product algorithm as derived from p. 
At every iteration t, each node i sends a message {m^t_{i\u2192j}(0), m^t_{i\u2192j}(1)} to each neighbor j \u2208 N(i). Each node i also maintains a belief vector {b^t_i(0), b^t_i(1)}. The message and belief updates, as well as the final output, are computed as follows.\n\nMax-product for MWIS\n\n(o) Initially, m^0_{i\u2192j}(0) = m^0_{i\u2192j}(1) = 1 for all (i, j) \u2208 E.\n\n(i) The messages are updated as follows:\n\nm^{t+1}_{i\u2192j}(0) = max{ \u220f_{k\u2208N(i), k\u2260j} m^t_{k\u2192i}(0) , e^{wi} \u220f_{k\u2208N(i), k\u2260j} m^t_{k\u2192i}(1) },\n\nm^{t+1}_{i\u2192j}(1) = \u220f_{k\u2208N(i), k\u2260j} m^t_{k\u2192i}(0).\n\n(ii) Each node i \u2208 V computes its beliefs as follows:\n\nb^{t+1}_i(0) = \u220f_{k\u2208N(i)} m^{t+1}_{k\u2192i}(0), b^{t+1}_i(1) = e^{wi} \u220f_{k\u2208N(i)} m^{t+1}_{k\u2192i}(1).\n\n(iii) Estimate the max-weight independent set x(b^{t+1}) as follows: xi(b^{t+1}_i) = 1{b^{t+1}_i(1) > b^{t+1}_i(0)}.\n\n(iv) Update t = t + 1; repeat from (i) till x(b^t) converges, and output the converged estimate.\n\nFor the purpose of analysis, we find it convenient to transform the messages by defining1 \u03b3^t_{i\u2192j} = log( m^t_{i\u2192j}(0) / m^t_{i\u2192j}(1) ). Step (i) of max-product now becomes\n\n\u03b3^{t+1}_{i\u2192j} = ( wi \u2212 \u2211_{k\u2208N(i), k\u2260j} \u03b3^t_{k\u2192i} )^+,   (2)\n\nwhere we use the notation (x)^+ = max{x, 0}. The estimation of step (iii) of max-product becomes: xi(\u03b3^{t+1}) = 1{wi \u2212 \u2211_{k\u2208N(i)} \u03b3_{k\u2192i} > 0}. This modification of max-product is often known as the \u201cmin-sum\u201d algorithm, and is just a reformulation of max-product. In the rest of the paper we refer to this as simply the max-product algorithm.\n\n1If the algorithm starts with all messages being strictly positive, the messages will remain strictly positive over any finite number of iterations. 
Thus taking logs is a valid operation.\n\n4.1 Fixed Points of Max-Product\n\nWhen applied to general graphs, max-product may either (a) not converge, (b) converge and yield the correct answer, or (c) converge but yield an incorrect answer. Characterizing when each of the three situations occurs is a challenging and important task. One approach to this task has been to look directly at the fixed points, if any, of the iterative procedure [8].\n\nProposition 4.1 Let \u03b3 represent a fixed point of the algorithm, and let x(\u03b3) = (xi(\u03b3)) be the corresponding estimate for the independent set. Then, the following properties hold:\n\n(a) Let i be a node with estimate xi(\u03b3) = 1, and let j \u2208 N(i) be any neighbor of i. Then, the messages on edge (i, j) satisfy \u03b3_{i\u2192j} > \u03b3_{j\u2192i}. Further, from this it can be deduced that x(\u03b3) represents an independent set in G.\n\n(b) Let j be a node with xj(\u03b3) = 0, which by definition means that wj \u2212 \u2211_{k\u2208N(j)} \u03b3_{k\u2192j} \u2264 0. Suppose now there exists a neighbor i \u2208 N(j) whose estimate is xi(\u03b3) = 1. Then it has to be that wj \u2212 \u2211_{k\u2208N(j)} \u03b3_{k\u2192j} < 0, i.e. the inequality is strict.\n\n(c) For any edge (j1, j2) \u2208 E, if the estimates of the endpoints are xj1(\u03b3) = xj2(\u03b3) = 0, then it has to be that \u03b3_{j1\u2192j2} = \u03b3_{j2\u2192j1}. In addition, if there exists a neighbor i1 \u2208 N(j1) of j1 whose estimate is xi1(\u03b3) = 1, then it has to be that \u03b3_{j1\u2192j2} = \u03b3_{j2\u2192j1} = 0 (and similarly for a neighbor i2 of j2).\n\nThe properties shown in Proposition 4.1 reveal striking similarities between the messages \u03b3 of fixed points of max-product, and the optimal \u03bb that solves the dual linear program DUAL. 
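Concretely, the min-sum form (2) together with the estimation rule of step (iii) can be run in a few lines. The sketch below uses a hypothetical toy instance (a 3-node path with weights 1, 3, 1, not an example from the paper), chosen so that the unique MWIS is the middle node:

```python
# A sketch of the min-sum update (2) for MWIS on a toy 3-node path
# 0 - 1 - 2 with weights (1, 3, 1); the unique MWIS is {1}.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
w = {0: 1.0, 1: 3.0, 2: 1.0}

# One message gamma_{i->j} per directed edge, initialized to 0
# (i.e. all original messages m equal to 1).
gamma = {(i, j): 0.0 for i in neighbors for j in neighbors[i]}
for _ in range(10):  # synchronous applications of update (2)
    gamma = {(i, j): max(0.0, w[i] - sum(gamma[(k, i)]
                                         for k in neighbors[i] if k != j))
             for (i, j) in gamma}

# Estimation rule: x_i = 1 iff w_i - sum_{k in N(i)} gamma_{k->i} > 0.
x = {i: int(w[i] - sum(gamma[(k, i)] for k in neighbors[i]) > 0)
     for i in neighbors}
print(x)  # {0: 0, 1: 1, 2: 0}: only the middle node is selected
```

On a tree such as this path the iteration reaches a fixed point after a few updates and the estimate is exact; on loopy graphs, as discussed above, convergence is not guaranteed.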
In particular, suppose that \u03b3 is a fixed point at which the corresponding estimate x(\u03b3) is a maximal independent set: for every j whose estimate is xj(\u03b3) = 0 there exists a neighbor i \u2208 N(j) whose estimate is xi(\u03b3) = 1. The MWIS, for example, is also maximal (if not, one could add a node to the MWIS and obtain a higher weight). For a maximal estimate, it is easy to see that\n\n\u2022 (xi(\u03b3) + xj(\u03b3) \u2212 1) \u03b3_{i\u2192j} = 0 for all edges (i, j) \u2208 E,\n\n\u2022 xi(\u03b3) ( \u03b3_{i\u2192j} + \u2211_{k\u2208N(i), k\u2260j} \u03b3_{k\u2192i} \u2212 wi ) = 0 for all i \u2208 V and j \u2208 N(i).\n\nAt least semantically, these relations share a close resemblance to the complementary slackness conditions of Lemma 2.1. In the following lemma we leverage this resemblance to derive a certificate of optimality of the max-product fixed-point estimate for certain problems.\n\nLemma 4.1 Let \u03b3 be a fixed point of max-product and x(\u03b3) the corresponding estimate of the independent set. Define G\u2032 = (V, E\u2032), where E\u2032 = E \\ {(i, j) \u2208 E : \u03b3_{i\u2192j} = \u03b3_{j\u2192i} = 0} is the set of edges with at least one non-zero message. Then, if G\u2032 is acyclic, we have that: (a) x(\u03b3) is a solution to the MWIS problem for G, and (b) there is no integrality gap between LP and IP, i.e. x(\u03b3) is an optimal solution to LP. Thus the lack of cycles in G\u2032 provides a certificate of optimality for the estimate x(\u03b3).\n\nMax-product vs. LP relaxation. The following general question has been of great recent interest: which of the two, max-product or LP relaxation, is more powerful? We now briefly investigate this question for MWIS. As presented below, we find that there are examples where one technique is better than the other; that is, neither technique clearly dominates the other.\n\nTo understand whether correctness of max-product (e.g. 
Lemma 4.1) provides information about LP relaxation, we consider the simplest loopy graph: a cycle. For bipartite graphs, we know that the LP relaxation is tight, i.e. it provides the answer to MWIS; hence, we consider an odd cycle. The following result shows that if max-product works then the LP relaxation must be tight (i.e. LP is no weaker than max-product for cycles).\n\nCorollary 4.1 Let G be an odd cycle, and \u03b3 a fixed point of max-product. If there exists at least one node i whose estimate is xi(\u03b3) = 1, then there is no integrality gap between LP and IP.\n\nNext, we present two examples which help us conclude that neither max-product nor LP relaxation dominates the other. The following figures present graphs and the corresponding fixed points of max-product. In each graph, numbers represent node weights, and an arrow from i to j represents a message value of \u03b3_{i\u2192j} = 2; all other messages \u03b3 are equal to 0. The boxed nodes indicate the ones for which the estimate is xi(\u03b3) = 1. It is easy to verify that both represent max-product fixed points.\n\n[Figure: two example graphs; every node of the left graph has weight 2, and every node of the right graph has weight 3.]\n\nFor the graph on the left, the max-product fixed point results in an incorrect estimate. However, the graph is bipartite, and hence LP will get the correct answer. In the graph on the right, there is an integrality gap between LP and IP: setting each xi = 1/2 yields an optimal value of 7.5, while the optimal solution to IP has value 6. However, the estimate at the fixed point of max-product is the correct MWIS. 
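The integrality-gap claim for the right-hand example can be checked by brute force. The sketch below assumes that graph is a 5-cycle with weight 3 on every node; this structure is an assumption, but it is consistent with the stated values, since the fractional point xi = 1/2 gives 15/2 = 7.5 while the best independent set has two nodes, for value 6:

```python
from itertools import combinations

# Assumed reconstruction of the right-hand example: a 5-cycle with
# weight 3 on every node.
n = 5
w = [3] * n
edges = [(i, (i + 1) % n) for i in range(n)]

def is_independent(s):
    # No edge may have both endpoints inside the set s.
    return all(not (i in s and j in s) for i, j in edges)

# Optimal IP value: enumerate all subsets (feasible for tiny graphs).
ip_opt = max(sum(w[i] for i in s)
             for r in range(n + 1)
             for s in map(set, combinations(range(n), r))
             if is_independent(s))

# The fractional point x_i = 1/2 satisfies x_i + x_j <= 1 on every
# edge, so it is LP-feasible; its value lower-bounds the LP optimum.
lp_lower = sum(w) / 2

print(ip_opt, lp_lower)  # 6 7.5 -> the LP optimum exceeds the IP optimum
```

Since the fractional value strictly exceeds every integral value, this exhibits the integrality gap directly, without solving the LP.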
In both of these examples, the fixed points lie in the strict interiors of non-trivial regions of attraction: starting the iterative procedure from within these regions will result in convergence to the fixed point.\n\nThese examples indicate that it may not be possible to resolve the question of the relative strength of the two procedures based solely on an analysis of the fixed points of max-product.\n\n5 A Convergent Message-passing Algorithm\n\nIn this section we present our algorithm for finding the MWIS of a graph. It is based on modifying max-product by drawing upon dual coordinate descent and a barrier method. Specifically, the algorithm is as follows: (1) for small enough parameters \u03b5, \u03b4, run the subroutine DESCENT(\u03b5, \u03b4) to (close to) convergence; this produces an output \u03bb^{\u03b5,\u03b4} = (\u03bb^{\u03b5,\u03b4}_{ij})_{(i,j)\u2208E}. (2) For a small enough parameter \u03b41, use the subroutine EST(\u03bb^{\u03b5,\u03b4}, \u03b41) to produce an estimate of the MWIS as the output of the algorithm. Both subroutines, DESCENT and EST, are iterative message-passing procedures. Before going into the details of the subroutines, we state the main result about the correctness and convergence of this algorithm.\n\nTheorem 5.1 The following properties hold for an arbitrary graph G and arbitrary weights: (a) For any choice of \u03b5, \u03b4, \u03b41 > 0, the algorithm always converges. (b) As \u03b5, \u03b4 \u2192 0, \u03bb^{\u03b5,\u03b4} \u2192 \u03bb\u2217, where \u03bb\u2217 is an optimal solution of DUAL. 
Further, if G is bipartite and the MWIS is unique, then the following holds: (c) For small enough \u03b5, \u03b4, \u03b41, the algorithm produces the MWIS as output.\n\n5.1 Subroutine: DESCENT\n\nConsider the standard coordinate descent algorithm for DUAL: the variables are {\u03bbij, (i, j) \u2208 E} (with the notation \u03bbij = \u03bbji), and at each iteration t one edge (i, j) \u2208 E is picked (a good policy for picking edges is round-robin or uniformly at random) and updated as\n\n\u03bb^{t+1}_{ij} = max{ 0, wi \u2212 \u2211_{k\u2208N(i), k\u2260j} \u03bb^t_{ik}, wj \u2212 \u2211_{k\u2208N(j), k\u2260i} \u03bb^t_{jk} }.   (3)\n\nThe \u03bb on all the other edges remain unchanged from t to t + 1. Notice the (at least syntactic) similarity between (3) and the max-product (min-sum) update (2): essentially, dual coordinate descent is a sequential bidirectional version of the max-product algorithm!\n\nIt is well known that coordinate descent always converges, in terms of cost, for linear programs. Further, it converges to an optimal solution if the constraint set is of the product type (see [2] for details). However, due to the constraints \u2211_{j\u2208N(i)} \u03bbij \u2265 wi in DUAL, the algorithm may not converge to an optimum of DUAL. Therefore, a direct adaptation of max-product to mimic dual coordinate descent is not good enough. We use a barrier (penalty) function based approach to overcome this difficulty. 
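Before turning to the barrier modification, the plain dual coordinate descent just described (update (3)) can be sketched in a few lines. The 3-node path and its weights below are a hypothetical toy instance, not an example from the paper:

```python
import random

# A sketch of plain dual coordinate descent, update (3), for DUAL on a
# toy 3-node path 0 - 1 - 2 with weights w = (1, 3, 1).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
w = {0: 1.0, 1: 3.0, 2: 1.0}
lam = {(0, 1): 0.0, (1, 2): 0.0}      # lambda is symmetric: lam_ij = lam_ji

def lam_get(i, j):
    # Look up the symmetric variable regardless of edge orientation.
    return lam[(i, j)] if (i, j) in lam else lam[(j, i)]

random.seed(0)
for _ in range(200):
    i, j = random.choice(list(lam))   # pick an edge uniformly at random
    a = w[i] - sum(lam_get(k, i) for k in neighbors[i] if k != j)
    b = w[j] - sum(lam_get(k, j) for k in neighbors[j] if k != i)
    lam[(i, j)] = max(0.0, a, b)      # update (3)

# On this instance the iteration stops at a DUAL-feasible point:
# sum_{j in N(i)} lambda_ij >= w_i holds at every node.
feasible = all(sum(lam_get(j, i) for j in neighbors[i]) >= w[i] for i in w)
print(feasible, sum(lam.values()))   # prints: True 3.0
```

Here the final cost 3 happens to be optimal for DUAL on this instance; the discussion above explains why such convergence to a DUAL optimum is not guaranteed in general, which motivates the barrier modification that follows.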
Consider the following convex optimization problem, obtained from DUAL by adding a logarithmic barrier for constraint violations, with \u03b5 \u2265 0 controlling the penalty due to violation:\n\nCP(\u03b5) : min \u2211_{(i,j)\u2208E} \u03bbij \u2212 \u03b5 \u2211_{i\u2208V} log( \u2211_{j\u2208N(i)} \u03bbij \u2212 wi ), subject to \u03bbij \u2265 0 for all (i, j) \u2208 E.\n\nThe following is a coordinate descent algorithm for CP(\u03b5).\n\nDESCENT(\u03b5, \u03b4)\n\n(o) The parameters are variables \u03bbij, one for each edge (i, j) \u2208 E, with the notation \u03bb^t_{ij} = \u03bb^t_{ji}. The vector \u03bb is iteratively updated, with t denoting the iteration number. Initially, set t = 0 and \u03bb^0_{ij} = max{wi, wj} for all (i, j) \u2208 E.\n\n(i) In iteration t + 1, update the parameters as follows:\n\n\u25e6 Pick an edge (i, j) \u2208 E. This edge selection is done so that each edge is chosen infinitely often as t \u2192 \u221e (for example, at each t choose an edge uniformly at random).\n\n\u25e6 For all (i\u2032, j\u2032) \u2208 E with (i\u2032, j\u2032) \u2260 (i, j), do nothing, i.e. \u03bb^{t+1}_{i\u2032j\u2032} = \u03bb^t_{i\u2032j\u2032}.\n\n\u25e6 For edge (i, j), nodes i and j exchange messages as follows:\n\n\u03b3^{t+1}_{i\u2192j} = ( wi \u2212 \u2211_{k\u2208N(i), k\u2260j} \u03bb^t_{ki} )^+, \u03b3^{t+1}_{j\u2192i} = ( wj \u2212 \u2211_{k\u2208N(j), k\u2260i} \u03bb^t_{kj} )^+.\n\n\u25e6 Update \u03bb^{t+1}_{ij} as follows: with a = \u03b3^{t+1}_{i\u2192j} and b = \u03b3^{t+1}_{j\u2192i},\n\n\u03bb^{t+1}_{ij} = ( a + b + 2\u03b5 + sqrt( (a \u2212 b)^2 + 4\u03b5^2 ) ) / 2.   (4)\n\n(ii) Update t = t + 1 and repeat till the algorithm converges to within \u03b4 in each component.\n\n(iii) Output \u03bb, the vector of parameters at convergence.\n\nRemark. 
The iterative step (4) can be rewritten as follows: for some \u03b2 \u2208 [1, 2],\n\n\u03bb^{t+1}_{ij} = \u03b2\u03b5 + max{ \u2212\u03b2\u03b5, wi \u2212 \u2211_{k\u2208N(i), k\u2260j} \u03bb^t_{ik}, wj \u2212 \u2211_{k\u2208N(j), k\u2260i} \u03bb^t_{kj} },\n\nwhere \u03b2 depends on the values of \u03b3^{t+1}_{i\u2192j} and \u03b3^{t+1}_{j\u2192i}. Thus the updates in DESCENT are obtained by a small but important perturbation of dual coordinate descent for DUAL, one that makes it convergent. The output of DESCENT(\u03b5, \u03b4), say \u03bb^{\u03b5,\u03b4}, satisfies \u03bb^{\u03b5,\u03b4} \u2192 \u03bb\u2217 as \u03b5, \u03b4 \u2192 0, where \u03bb\u2217 is an optimal solution of DUAL.\n\n5.2 Subroutine: EST\n\nDESCENT yields a good estimate of the optimal solution to DUAL for small values of \u03b5 and \u03b4. However, we are interested in the (integral) optimum of LP. In general, it is not possible to recover the solution of a linear program from a dual optimal solution. However, we show that such a recovery is possible, through the EST algorithm described below, for the MWIS problem when G is bipartite with a unique MWIS. This procedure is likely to extend to general G when the LP relaxation is tight and LP has a unique solution.\n\nEST(\u03bb, \u03b41)\n\n(o) The algorithm iteratively estimates x = (xi) given \u03bb.\n\n(i) Initially, color a node i gray and set xi = 0 if \u2211_{j\u2208N(i)} \u03bbij > wi. Color all other nodes green and leave their values unspecified. The condition \u2211_{j\u2208N(i)} \u03bbij > wi is checked as whether \u2211_{j\u2208N(i)} \u03bbij \u2265 wi + \u03b41 or not.\n\n(ii) Repeat the following steps (in any order) till no more changes can happen:\n\n\u25e6 if i is green and there exists a gray node j \u2208 N(i) with \u03bbij > 0, then set xi = 1 and color it orange. 
The condition \u03bbij > 0 is checked as whether \u03bbij \u2265 \u03b41 or not.\n\n\u25e6 if i is green and there exists an orange node j \u2208 N(i), then set xi = 0 and color it gray.\n\n(iii) If any node i is still green, set xi = 1 and color it red.\n\n(iv) Output x as the estimate.\n\n6 Discussion\n\nWe believe this paper opens several interesting directions for investigation. In general, the exact relationship between max-product and linear programming is not well understood. Their close similarity for the MWIS problem, along with the reduction of MAP estimation to an MWIS problem, suggests that the MWIS problem may provide a good first step in an investigation of this relationship.\n\nAlso, our novel message-passing algorithm, combined with the reduction of MAP estimation to an MWIS problem, immediately yields a new message-passing algorithm for MAP estimation. It would be interesting to investigate the power of this algorithm on more general discrete estimation problems.\n\nReferences\n\n[1] M. Bayati, D. Shah and M. Sharma, \u201cMax Weight Matching via Max Product Belief Propagation,\u201d IEEE ISIT, 2005.\n\n[2] D. Bertsekas, \u201cNonlinear Programming,\u201d Athena Scientific.\n\n[3] M. Gr\u00f6tschel, L. Lov\u00e1sz, and A. Schrijver, \u201cPolynomial algorithms for perfect graphs,\u201d in C. Berge and V. Chvatal (eds.), Topics on Perfect Graphs, Ann. Disc. Math. 21, North-Holland, Amsterdam (1984), 325-356.\n\n[4] K. Jung and D. Shah, \u201cLow Delay Scheduling in Wireless Networks,\u201d IEEE ISIT, 2007.\n\n[5] C. Moallemi and B. Van Roy, \u201cConvergence of the Min-Sum Message Passing Algorithm for Quadratic Optimization,\u201d Preprint, 2006, available at arXiv:cs/0603058.\n\n[6] L. Trevisan, \u201cInapproximability of combinatorial optimization problems,\u201d Technical Report TR04-065, Electronic Colloquium on Computational Complexity, 2004.\n\n[7] M. Wainwright and M. 
Jordan, \u201cGraphical models, exponential families, and variational inference,\u201d UC Berkeley, Dept. of Statistics, Technical Report 649, September 2003.\n\n[8] J. Yedidia, W. Freeman and Y. Weiss, \u201cGeneralized Belief Propagation,\u201d Mitsubishi Elect. Res. Lab., TR-2000-26, 2000.\n\n[9] Y. Weiss, C. Yanover and T. Meltzer, \u201cMAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies,\u201d UAI 2007.\n", "award": [], "sourceid": 923, "authors": [{"given_name": "Sujay", "family_name": "Sanghavi", "institution": null}, {"given_name": "Devavrat", "family_name": "Shah", "institution": null}, {"given_name": "Alan", "family_name": "Willsky", "institution": null}]}