{"title": "Augmentative Message Passing for Traveling Salesman Problem and Graph Partitioning", "book": "Advances in Neural Information Processing Systems", "page_first": 289, "page_last": 297, "abstract": "The cutting plane method is an augmentative constrained optimization procedure that is often used with continuous-domain optimization techniques such as linear and convex programs. We investigate the viability of a similar idea within message passing -- for integral solutions -- in the context of two combinatorial problems: 1) For Traveling Salesman Problem (TSP), we propose a factor-graph based on Held-Karp formulation, with an exponential number of constraint factors, each of which has an exponential but sparse tabular form. 2) For graph-partitioning (a.k.a. community mining) using modularity optimization, we introduce a binary variable model with a large number of constraints that enforce formation of cliques. In both cases we are able to derive surprisingly simple message updates that lead to competitive solutions on benchmark instances. In particular for TSP we are able to find near-optimal solutions in the time that empirically grows with $N^3$, demonstrating that augmentation is practical and efficient.", "full_text": "Augmentative Message Passing for Traveling\nSalesman Problem and Graph Partitioning\n\nSiamak Ravanbakhsh\n\nDepartment of Computing Science\n\nUniversity of Alberta\n\nEdmonton, AB T6G 2E8\n\nmravanba@ualberta.ca\n\nReihaneh Rabbany\n\nDepartment of Computing Science\n\nUniversity of Alberta\n\nEdmonton, AB T6G 2E8\n\nrabbanyk@ualberta.ca\n\nRussell Greiner\n\nDepartment of Computing Science\n\nUniversity of Alberta\n\nEdmonton, AB T6G 2E8\n\nrgreiner@ualberta.ca\n\nAbstract\n\nThe cutting plane method is an augmentative constrained optimization procedure\nthat is often used with continuous-domain optimization techniques such as linear\nand convex programs. 
We investigate the viability of a similar idea within message passing – for integral solutions – in the context of two combinatorial problems: 1) For Traveling Salesman Problem (TSP), we propose a factor-graph based on the Held-Karp formulation, with an exponential number of constraint factors, each of which has an exponential but sparse tabular form. 2) For graph-partitioning (a.k.a. community mining) using modularity optimization, we introduce a binary variable model with a large number of constraints that enforce formation of cliques. In both cases we are able to derive simple message updates that lead to competitive solutions on benchmark instances. In particular for TSP we are able to find near-optimal solutions in time that empirically grows with N^3, demonstrating that augmentation is practical and efficient.

1 Introduction

Probabilistic Graphical Models (PGMs) provide a principled approach to approximate constraint optimization for NP-hard problems. This involves a message passing procedure (such as max-product Belief Propagation; BP) to find an approximation to the maximum a posteriori (MAP) solution. Message passing methods are also attractive as they are easily mass-parallelized. This has contributed to their application in approximating many NP-hard problems, including constraint satisfaction [1, 2], constrained optimization [3, 4], min-max optimization [5], and integration [6].

The applicability of PGMs to discrete optimization problems is limited by the size and number of factors in the factor-graph. While many recent attempts have been made to reduce the complexity of message passing over high-order factors [7, 8, 9], to our knowledge no published result addresses the issue of dealing with a large number of factors. 
We consider a scenario where a large number of factors represent hard constraints, and ask whether it is possible to find a feasible solution by considering only a small fraction of these constraints. The idea is to start from a PGM corresponding to a tractable subset of constraints and, after obtaining an approximate MAP solution using min-sum BP, augment the PGM with the set of constraints that are violated in the current solution. This general idea has been extensively studied under the term cutting plane methods in different settings. Dantzig et al. [10] first investigated this idea in the context of TSP, and Gomory et al. [11] provided an elegant method to generate violated constraints in the context of finding integral solutions to linear programs (LP). It has since also been used to solve a variety of nonlinear optimization problems. In the context of PGMs, Sontag and Jaakkola use the cutting plane method to iteratively tighten the marginal polytope – which enforces the local consistency of marginals – in order to improve the variational approximation [12]. This differs from our approach, where the augmentation changes the factor-graph (i.e., the inference problem) rather than improving the approximation of inference.

Recent studies show that message passing can be much faster than LP in finding approximate MAP assignments for structured optimization problems [13]. This further motivates our inquiry regarding the viability of augmentation for message passing. We present an affirmative answer to this question in application to two combinatorial problems. Section 2 introduces our factor-graph formulations for Traveling Salesman Problem (TSP) and graph-partitioning. Section 3 derives simple message update equations for these factor-graphs and reviews our augmentation scheme. 
Finally, Section 4 presents experimental results for both applications.

2 Background and Representation

Let x = {x1, . . . , xD} ∈ X = X1 × X2 × . . . × XD denote an instance of a tuple of discrete variables. Let xI refer to a sub-tuple, where I ⊆ {1, . . . , D} indexes a subset of these variables. Define the energy function f(x) ≜ Σ_{I ∈ F} fI(xI), where F denotes the set of factors. Here the goal of inference is to find an assignment with minimum energy, x* = argmin_x f(x). This model can be conveniently represented using a bipartite graph, known as a factor-graph [14], where a factor node fI(xI) is connected to a variable node xi iff i ∈ I.

2.1 Traveling Salesman Problem

A Traveling Salesman Problem (TSP) seeks the minimum-length tour of N cities that visits each city exactly once. TSP is NP-hard, and for general distances, no constant-factor approximation to this problem is possible [15]. The best known exact solver, due to Held et al. [16], uses dynamic programming to reduce the cost of enumerating all orderings from O(N!) to O(N^2 2^N). The development of many (now) standard optimization techniques, such as simulated annealing, mixed integer linear programming, dynamic programming, and ant colony optimization, is closely linked with advances in solving TSP. Since Dantzig et al. [10] manually applied the cutting plane method to a 49-city problem, a combination of more sophisticated cuts, used with branch-and-bound techniques [17], has produced the state-of-the-art TSP solver, Concorde [18]. Other notable results on very large instances have been reported for the Lin-Kernighan heuristic [19], which continuously improves a solution by exchanging nodes in the tour. In a related work, Wang et al. [20] proposed a message passing solution to TSP; however, their method does not scale beyond small toy problems (the authors experimented with N = 5 cities). 
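The Held-Karp dynamic program mentioned above is compact enough to state directly. The following is our illustrative sketch (not code from the paper), memoizing the cheapest path that starts at city 0, visits exactly the subset S of the remaining cities, and ends at city j:

```python
from itertools import combinations

def held_karp(dist):
    """Exact TSP by the Held-Karp dynamic program in O(N^2 * 2^N) time.

    dist[i][j] is the distance from city i to city j (may be asymmetric).
    Returns the length of the shortest tour that starts and ends at city 0.
    """
    n = len(dist)
    # C[(S, j)]: cost of the cheapest path starting at 0, visiting every
    # city in frozenset S exactly once, and ending at j (0 not in S).
    C = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j] for k in S - {j})
    full = frozenset(range(1, n))
    # close the tour by returning to city 0
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```

On N cities this touches O(N 2^N) states with O(N) work each, which is where the O(N^2 2^N) bound quoted above comes from.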
For a readable historical background of the state-of-the-art in TSP and its various applications, see [21].

2.1.1 TSP Factor-Graph

Let G = (V, E) denote a graph, where V = {v1, . . . , vN} is the set of nodes and the set of edges E contains ei-j iff vi and vj are connected. Let x = {xe1, . . . , xeM} ∈ X = {0, 1}^M be a set of binary variables, one for each edge in the graph (i.e., M = |E|), where we will set xem = 1 iff em is in the tour. For each node vi, let ∂vi = {ei-j | ei-j ∈ E} denote the edges adjacent to vi. Given a distance function d : E → ℝ, define the local factor for each edge e ∈ E as fe(xe) = xe d(e) – so this is either d(e) or zero. Any valid tour satisfies the following necessary and sufficient constraints – a.k.a. Held-Karp constraints [22]:

1. Degree constraints: Exactly two edges that are adjacent to each vertex should be in the tour. Define the factor f∂vi(x∂vi) : {0, 1}^|∂vi| → {0, ∞} to enforce this constraint:

f∂vi(x∂vi) ≜ I∞( Σ_{e ∈ ∂vi} xe = 2 )    ∀ vi ∈ V

where I∞(condition) ≜ 0 iff the condition is satisfied and +∞ otherwise.

2. Subtour constraints: Ensure that there are no short-circuits – i.e., there are no loops that contain strict subsets of nodes. To enforce this, for each S ⊂ V, define δ(S) ≜ {ei-j ∈ E | vi ∈ S, vj ∉ S} to be the set of edges with one end in S and the other end in V \ S. We need to have at least two edges leaving each subset S. 
The following set of factors enforces these constraints:

fδ(S)(xδ(S)) = I∞( Σ_{e ∈ δ(S)} xe ≥ 2 )    ∀ S ⊂ V, S ≠ ∅

These three types of factors define a factor-graph, whose minimum-energy configuration is the smallest tour for TSP.

2.2 Graph Partitioning

Graph partitioning – a.k.a. community mining – is an active field of research that has recently produced a variety of community detection methods (e.g., see [23] and its references), a notable one of which is Modularity maximization [24]. However, exact optimization of Modularity is NP-hard [25]. Modularity is closely related to fully connected Potts graphical models [26]; however, due to the full connectivity of the PGM, message passing is not able to find good solutions. Many have proposed various other heuristics for modularity optimization [27, 28, 26, 29, 30]. We introduce a factor-graph representation of this problem that has a large number of factors. We then discuss a stochastic but sparse variation of modularity that enables us to efficiently partition relatively large sparse graphs.

2.2.1 Clustering Factor-Graph

Let G = (V, E) be a graph, with a weight function ω̃ : V × V → ℝ, where ω̃(vi, vj) ≠ 0 iff ei:j ∈ E. Let Z = Σ_{v1,v2 ∈ V} ω̃(v1, v2), and let ω(vi, vj) ≜ ω̃(vi, vj) / (2Z) be the normalized weights. Also let ω(∂vi) ≜ Σ_{vj} ω(vi, vj) denote the normalized degree of node vi. Graph clustering using modularity optimization seeks a partitioning of the nodes into an unspecified number of clusters C = {C1, . . . , CK}, maximizing

q(C) = Σ_{Ci ∈ C} Σ_{vi,vj ∈ Ci} ( ω(vi, vj) − ω(∂vi) ω(∂vj) )    (1)

The first term of modularity is proportional to the within-cluster edge-weights. The second term is proportional to the expected number of within-cluster edge-weights for a null model with the same weighted node degree for each node vi. Here the null model is a fully-connected graph. We generate a random sparse null model with Mnull ≤ αM weighted edges (Enull) by randomly sampling two nodes, each drawn independently from P(vi) ∝ √ω(∂vi), and connecting them with a weight proportional to ω̃null(vi, vj) ∝ √(ω(∂vi) ω(∂vj)). If they have already been connected, this weight is added to their current weight. We repeat this process αM times; however, since some of the edges are repeated, the total number of edges in the null model may be under αM. Finally, the normalized edge-weight in the sparse null model is ωnull(vi, vj) ≜ ω̃null(vi, vj) / (2 Σ_{vi,vj} ω̃null(vi, vj)). It is easy to see that this generative process in expectation produces the fully connected null model.1

Here we use the following binary-valued factor-graph formulation. Let x = {xi1:j1, . . . , xiL:jL} ∈ {0, 1}^L be a set of binary variables, one for each edge ei:j ∈ E ∪ Enull – i.e., |E ∪ Enull| = L. Define the local factor for each variable as fi:j(xi:j) = −xi:j (ω(vi, vj) − ωnull(vi, vj)). The idea is to enforce the formation of cliques, while minimizing the sum of local factors. By doing so the

1 The choice of using the square root of weighted degrees for both sampling and weighting is to reduce the variance. 
One may also use pure importance sampling (i.e., use the product of weighted degrees for sampling and set the edge-weights in the null model uniformly), or uniform sampling of edges, where the edge-weights of the null model are set to the product of weighted degrees.

negative sum of local factors evaluates to modularity (eq 1). For each three edges ei:j, ej:k, ei:k ∈ E ∪ Enull, i < j < k, that form a triangle, define a clique constraint as

f{i:j,j:k,i:k}(xi:j, xj:k, xi:k) ≜ I∞( xi:j + xj:k + xi:k ≠ 2 )

These factors ensure the formation of cliques – i.e., if the weights of two edges that are adjacent to the same node are non-zero, the third edge in the triangle should also have non-zero weight. The computational challenge here is the large number of clique constraints. Brandes et al. [25] use a similar LP formulation; however, since they include all the constraints from the beginning and their null model is fully connected, their method is only applied to small toy problems.

3 Message Passing

Min-sum belief propagation is an inference procedure in which a set of messages is exchanged between variables and factors. The factor-to-variable (νI→e) and variable-to-factor (νe→I) messages are defined as

νe→I(xe) ≜ Σ_{I' ∋ e, I' ≠ I} νI'→e(xe)    (2)

νI→e(xe) ≜ min_{xI\e} { fI(xI\e, xe) + Σ_{e' ∈ I\e} νe'→I(xe') }    (3)

where I ∋ e indexes all factors that are adjacent to the variable xe on the factor-graph. Starting from an initial set of messages, this recursive update is performed until convergence. This procedure is exact on trees, on factor-graphs with a single cycle, and in some other special settings [4]; however, it is found to produce good approximations in general loopy graphs. When BP is exact, the set of local beliefs µe(xe) ≜ Σ_{I ∋ e} νI→e(xe) indicates the minimum value that can be obtained for a particular assignment of xe. When there are no ties, the joint assignment x*, obtained by minimizing the individual local beliefs, is optimal.

When BP is not exact or the marginal beliefs are tied, a decimation procedure can improve the quality of the final assignment. Decimation involves fixing a subset of variables to their most biased values, and repeating the BP update. This process is repeated until all variables are fixed.

Another way to improve the performance of BP when applied to loopy graphs is to use damping, which often prevents oscillations: νI→e(xe) = λ ν̃I→e(xe) + (1 − λ) νI→e(xe), where ν̃I→e is the new message as calculated by eq 3 and λ ∈ (0, 1] is the damping parameter. Damping can also be applied to variable-to-factor messages.

When applying the BP equations (eqs 2, 3) to the TSP and clustering factor-graphs as defined above, we face two computational challenges: (a) Degree constraints for TSP can depend on N variables, resulting in O(2^N) time complexity for calculating factor-to-variable messages. For subtour constraints this is even more expensive, as fδ(S)(xδ(S)) depends on O(M) variables (recall that M = |E|, which can be O(N^2)). (b) The complete TSP factor-graph has O(2^N) subtour constraints. Similarly, the clustering factor-graph can contain a large number of clique constraints: for the fully connected null model, we need O(N^3) such factors, and even using the sparse null model – assuming a random edge probability, a.k.a. an Erdos-Renyi graph – there are O((L^3 / N^6) N^3) = O(L^3 / N^3) triangles in the graph (recall that L = |E ∪ Enull|). In the next section, we derive the compact form of BP messages for both problems. 
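To make eqs 2 and 3 and the damping rule concrete, here is a minimal min-sum implementation for tabular factors over binary variables (our own sketch and naming, not the authors' implementation). Messages are kept normalized so that ν(0) = 0, and only the scalar ν(1) − ν(0) is stored:

```python
import itertools

def min_sum(factors, num_vars, iters=100, damp=0.5):
    """Damped min-sum BP (eqs 2-3) over binary variables.

    factors: list of (scope, cost) pairs, where scope is a tuple of variable
    indices and cost maps each 0/1 assignment tuple of scope to an energy.
    Returns the bias mu_e = mu_e(1) - mu_e(0) for every variable.
    """
    # normalized messages, stored as nu(1) - nu(0); initialized to zero
    v2f = {(v, i): 0.0 for i, (scope, _) in enumerate(factors) for v in scope}
    f2v = dict(v2f)
    for _ in range(iters):
        # factor-to-variable updates (eq 3), with damping
        for i, (scope, cost) in enumerate(factors):
            for v in scope:
                best = {0: float('inf'), 1: float('inf')}
                for assign in itertools.product((0, 1), repeat=len(scope)):
                    # a normalized incoming message contributes only when x = 1
                    c = cost[assign] + sum(v2f[(u, i)] * assign[k]
                                           for k, u in enumerate(scope)
                                           if u != v)
                    xv = assign[scope.index(v)]
                    best[xv] = min(best[xv], c)
                new = best[1] - best[0]
                f2v[(v, i)] = damp * new + (1 - damp) * f2v[(v, i)]
        # variable-to-factor updates (eq 2)
        for (v, i) in v2f:
            v2f[(v, i)] = sum(f2v[(v, j)] for j, (s, _) in enumerate(factors)
                              if v in s and j != i)
    return [sum(f2v[(v, i)] for i, (s, _) in enumerate(factors) if v in s)
            for v in range(num_vars)]
```

On a tree this converges to the exact min-marginals, so the sign of the returned bias recovers the optimal assignment whenever there are no ties.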
In the case of TSP, we show how to exploit the sparsity of degree and subtour constraints to calculate the factor-to-variable messages in O(N) and O(M) respectively.

3.1 Closed Form of Messages

For simplicity we work with normalized messages νI→e ≜ νI→e(1) − νI→e(0), which is equivalent to assuming νI→e(0) = 0 ∀ I, e. The same notation is used for variable-to-factor messages and for marginal beliefs; we refer to the normalized marginal belief µe = µe(1) − µe(0) as the bias.

Despite their exponentially large tabular form, both degree and subtour constraint factors for TSP are sparse. Similar forms of factors are studied in several previous works [7, 8, 9].

Figure 1: (left) The message passing results after each augmentation step for the complete graph of the printed board instance from [31]. The blue lines in each figure show the selected edges at the end of message passing. The pale red lines show the edges whose bias, although negative (µe < 0), was close to zero. (middle) Clustering of the power network (N = 4941) by message passing. Different clusters have different colors and the nodes are scaled by their degree. (right) Clustering of the politician blogs network (N = 1490) by message passing and by meta-data – i.e., liberal or conservative.

By calculating the closed form of these messages for the TSP factor-graph, we observe that they have a surprisingly simple form. Rewriting eq 3 for the degree constraint factors, we get:

ν∂vi→e(1) = min{ νe'→∂vi }_{e' ∈ ∂vi\e}
ν∂vi→e(0) = min{ νe'→∂vi + νe''→∂vi }_{e' ≠ e'' ∈ ∂vi\e}    (4)

where we have dropped the summation and the factor from eq 3. 
For xe = 1, in order to have f∂vi(x∂vi) < ∞, exactly one other xe' ∈ x∂vi should be non-zero. On the other hand, we know that messages are normalized such that νe→∂vi(0) = 0 ∀ vi, e ∈ ∂vi, which means they can be ignored in the summation. For xe = 0, in order to satisfy the constraint factor, two of the adjacent variables should have a non-zero value; therefore we seek the two such incoming messages with minimum values. Let min[k] A denote the kth smallest value in the set A – i.e., min A ≡ min[1] A. We combine the updates above to get a "normalized message", ν∂vi→e, which is simply the negative of the second-smallest incoming message (excluding νe→∂vi) to the factor f∂vi:

ν∂vi→e = ν∂vi→e(1) − ν∂vi→e(0) = −min[2]{ νe'→∂vi }_{e' ∈ ∂vi\e}    (5)

Following a similar procedure, the factor-to-variable messages for subtour constraints are given by

νδ(S)→e = −max{ 0, min[2]{ νe'→δ(S) }_{e' ∈ δ(S)\e} }    (6)

Here, while searching for the minimum incoming messages, if we encounter two messages with negative or zero values, we can safely assume νδ(S)→e = 0 and stop the search. This results in a significant speedup in practice. Note that both eq 5 and eq 6 only need the second-smallest message in the set { νe'→δ(S) }_{e' ∈ δ(S)\e}. In an asynchronous calculation of messages, this minimization should be repeated for each outgoing message. 
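As a sanity check, the closed form of eq 5 can be compared against a brute-force evaluation of eq 3 for the degree factor. The sketch below (our own naming, not the authors' code) computes both:

```python
import itertools

def degree_message(incoming):
    """Closed form of eq 5: for each edge e, the outgoing normalized message
    is the negative of the second-smallest of the other incoming messages."""
    out = {}
    for e in incoming:
        others = sorted(v for u, v in incoming.items() if u != e)
        out[e] = -others[1]  # -min[2] of the other incoming messages
    return out

def degree_message_bruteforce(incoming):
    """Direct evaluation of eq 3 for the degree factor I_inf(sum_e x_e = 2),
    using normalized incoming messages (nu(0) = 0)."""
    out = {}
    edges = list(incoming)
    for e in edges:
        best = {0: float('inf'), 1: float('inf')}
        others = [u for u in edges if u != e]
        for assign in itertools.product((0, 1), repeat=len(others)):
            for xe in (0, 1):
                if sum(assign) + xe == 2:  # degree constraint satisfied
                    c = sum(incoming[u] * a for u, a in zip(others, assign))
                    best[xe] = min(best[xe], c)
        out[e] = best[1] - best[0]
    return out
```

The brute force enumerates only the sparse support of the factor (assignments summing to two), which is exactly the structure the closed form exploits.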
However, in a synchronous update, by finding the three smallest incoming messages to each factor we can calculate all the factor-to-variable messages at the same time.

For the clustering factor-graph, the clique factor is satisfied only if either zero, one, or all three of the variables in its domain are non-zero. The factor-to-variable messages are given by

ν{i:j,j:k,i:k}→i:j(0) = min{ 0, νj:k→{i:j,j:k,i:k}, νi:k→{i:j,j:k,i:k} }
ν{i:j,j:k,i:k}→i:j(1) = min{ 0, νj:k→{i:j,j:k,i:k} + νi:k→{i:j,j:k,i:k} }    (7)

For xi:j = 0, the minimization is over three feasible cases: (a) xj:k = xi:k = 0, (b) xj:k = 1, xi:k = 0 and (c) xj:k = 0, xi:k = 1. For xi:j = 1, there are two feasible cases: (a) xj:k = xi:k = 0 and (b) xj:k = xi:k = 1. Normalizing these messages, we have

ν{i:j,j:k,i:k}→i:j = min{ 0, νj:k→{i:j,j:k,i:k} + νi:k→{i:j,j:k,i:k} } − min{ 0, νj:k→{i:j,j:k,i:k}, νi:k→{i:j,j:k,i:k} }    (8)

3.2 Finding Violations

Due to the large number of factors, message passing for the full factor-graph in our applications is not practical. Our solution is to start with a minimal set of constraints: for TSP, we start with no subtour constraints, and for clustering, we start with no clique constraints. We then use message passing to find the marginal beliefs µe and select the edges with positive bias, µe > 0.

Figure 2: Results of message passing for TSP on different benchmark problems. From left to right, the plots show: (a) running time, (b) optimality ratio (compared to Concorde), (c) iterations of augmentation and (d) number of subtour constraints – all as a function of the number of nodes. The optimality is relative to the result reported by Concorde. 
Note that all plots except optimality are log-log plots, where a linear trend shows a monomial relation (y = ax^m) between the values on the x and y axes, and the slope shows the power m.

We then find the constraints that are violated. For TSP, this is achieved by finding the connected components C = {Si ⊂ V} of the solution in O(N) time and defining a new subtour constraint for each Si ∈ C (see Figure 1(left)). For graph partitioning, we simply look at pairs of positively fixed edges around each node, and if the third edge of the triangle is not positively fixed, we add the corresponding clique factor to the factor-graph; see Appendix A for more details.

4 Experiments

4.1 TSP

Here we evaluate our method over five benchmark datasets: (I) TSPLIB, which contains a variety of real-world benchmark instances, the majority of which are 2D or 3D Euclidean or geographic distances.2 (II) Euclidean distance between random points in 2D. (III) Random (symmetric) distance matrices. (IV) Hamming distance between random binary vectors with fixed length (20 bits); this appears in applications such as data compression [32] and radiation hybrid mapping in genomics [33]. (V) Correlation distance between random vectors with 5 random features (e.g., using TSP for gene co-clustering [34]). In producing random points and features, as well as random distances (in (III)), we used the uniform distribution over [0, 1].

For each of these cases, we report the (a) run-time, (b) optimality, (c) number of iterations of augmentation and (d) number of subtour factors at the final iteration. In all of the experiments, we use Concorde [18] with its default settings to obtain the optimal solution.3 Since there is a very large number of TSP solvers, comparison with any particular method is pointless; instead we evaluate the quality of message passing against the "optimal" solution.

Table 1: Comparison of different modularity optimization methods. For message passing, L is the number of binary variables |E ∪ Enull|, Cost is the percentage of all clique constraints used by augmentation, Mod is the modularity of the partitioning found, and Time is in seconds.

Problem         Weighted?  Nodes  Edges  |     message passing (full)  |   message passing (sparse)
                                         |     L    Cost    Mod   Time |      L    Cost    Mod   Time
polbooks        y          105    441    |  5461   5.68%  0.511    .07 |   3624  13.55%  0.506    .04
football        y          115    615    |  6554  27.85%  0.591   0.41 |   5635  17.12%  0.594   0.14
wkarate         n          34     78     |   562  12.34%  0.431   0    |    431  15.14%  0.401   0
netscience      n          1589   2742   |    NA      NA     NA     NA |  53027  .0004%  0.941   2.01
dolphins        y          62     159    |  1892  14.02%  0.508   0.01 |   1269   6.50%  0.521   0.01
lesmis          n          77     254    |  2927   5.14%  0.531   0    |   1601    1.7%  0.534   0.01
celegansneural  n          297    2359   | 43957  16.70%  0.391  10.89 |  21380   3.16%  0.404   2.82
polblogs        y          1490   19090  |    NA      NA     NA     NA | 156753    .14%  0.411  32.75
karate          y          34     78     |   562  14.32%  0.355   0    |    423  17.54%  0.390   0

Problem         | Spin-glass    | L-Eigenvector | FastGreedy    | Louvain
                |   Mod    Time |   Mod    Time |   Mod    Time |   Mod   Time
polbooks        | 0.525   1.648 | 0.467   0.179 | 0.501   0.643 | 0.489  0.03
football        | 0.601   0.87  | 0.487   0.151 | 0.548   0.08  | 0.602  0.019
wkarate         | 0.444   0.557 | 0.421   0.095 | 0.410   0.085 | 0.443  0.027
netscience      | 0.907   8.459 | 0.889   0.303 | 0.926   0.154 | 0.948  0.218
dolphins        | 0.523   0.728 | 0.491   0.109 | 0.495   0.107 | 0.517  0.011
lesmis          | 0.529   1.31  | 0.483   0.081 | 0.472   0.073 | 0.566  0.011
celegansneural  | 0.406   5.849 | 0.278   0.188 | 0.367   0.12  | 0.435  0.031
polblogs        | 0.427  67.674 | 0.425   0.33  | 0.427   0.305 | 0.426  0.099
karate          | 0.417   0.531 | 0.393   0.086 | 0.380   0.079 | 0.395  0.009
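The subtour-violation search of Section 3.2 – extracting the connected components of the currently selected edges and adding one subtour factor per component – can be sketched with a simple union-find (our own minimal illustration, not the authors' code):

```python
def violated_subtours(n, selected_edges):
    """Connected components of the edges with positive bias.

    If more than one component is found, each component Si violates the
    subtour constraint sum_{e in delta(Si)} x_e >= 2, so a factor
    f_{delta(Si)} is added for it.  Near-linear time via union-find.
    Isolated nodes appear as singleton components (in a BP solution every
    node has degree two, so this does not arise in practice).
    """
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in selected_edges:
        parent[find(i)] = find(j)

    comps = {}
    for v in range(n):
        comps.setdefault(find(v), set()).add(v)
    comps = list(comps.values())
    return comps if len(comps) > 1 else []  # a single tour: no violation
```

For example, a "solution" made of two disjoint triangles yields two components, each of which becomes a new subtour factor in the augmented factor-graph.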
The results in Figure 2 (2nd column from left) report the optimality ratio – i.e., the ratio of the length of the tour found by message passing to that of the optimal tour. To demonstrate the non-triviality of these instances, we also report the optimality ratio for two heuristics that have optimality guarantees for metric instances [35]: (a) the nearest-neighbour heuristic (O(N^2)), which incrementally adds to either end of the current path the closest city that does not form a loop; (b) the greedy algorithm (O(N^2 log(N))), which incrementally adds the lowest-cost edge to the current edge-set, while avoiding subtours.

In all experiments, we used the full graph G = (V, E), which means each iteration of message passing is O(N^2 τ), where τ is the number of subtour factors. All experiments use Tmax = 200 iterations, εmax = median{d(e)}_{e ∈ E} and damping with λ = .2. We used decimation, fixing 10% of the remaining variables (out of N) per iteration of decimation.4 This increases the cost of message passing by an O(log(N)) multiplicative factor; however, it often produces better results.

All the plots in Figure 2, except for the second column, are in log-log format. When using a log-log plot, a linear trend shows a monomial relation between the x and y axes – i.e., y = ax^m, where m is the slope of the line in the plot and the intercept corresponds to log(a). By studying the slope of the linear trend in the run-time (left column) in Figure 2, we observe that, for almost all instances, the run-time of message passing seems to grow with N^3 (i.e., a slope of ∼ 3). Exceptions are the TSPLIB instances, which seem to pose a greater challenge, and random distance matrices, which seem to be easier for message passing. A similar trend is suggested by the number of subtour constraints and the iterations of augmentation, which have a slope of ∼ 1, suggesting a linear dependence on N. 
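The nearest-neighbour baseline (the first of the two heuristics above) can be sketched as follows. This is our implementation of the textbook heuristic, not the authors' code; for simplicity it grows the path from one end only, a slight variation on the "either end" version described above:

```python
def nearest_neighbour_tour(dist):
    """O(N^2) nearest-neighbour heuristic: starting from city 0, repeatedly
    visit the closest unvisited city (grown from one end of the path)."""
    n = len(dist)
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(dist, tour):
    """Total length of a closed tour given as a sequence of cities."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))
```

The optimality ratio reported in Figure 2 is then simply the heuristic's tour length divided by the optimal tour length.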
Again, the exceptions are the TSPLIB instances, which grow faster than N, and random distance matrices, which seem to grow sub-linearly.5 Finally, the results in the second column suggest that message passing is able to find near-optimal (on average ∼1.1-optimal) solutions for almost all instances, and that the quality of tours does not degrade with an increasing number of nodes.

2 Geographic distance is the distance on the surface of the earth as a large sphere.
3 For many larger instances, Concorde (with default settings and using CPLEX as the LP solver) was not able to find the optimal solution. Nevertheless, we used the upper bound on the optimal produced by Concorde in evaluating our method.
4 Note that here we are only fixing the top N variables with positive bias. The remaining M − N variables are automatically clamped to zero.
5 Since we measured the time in milliseconds, the first column does not show the instances that had a running time of less than a millisecond.

4.2 Graph Partitioning

For graph partitioning, we experimented with a set of classic benchmarks.6 Since the optimization criterion is modularity, we compared our method only against the best known "modularity optimization" heuristics: (a) FastModularity [27], (b) Louvain [30], (c) Spin-glass [26] and (d) Leading eigenvector [28]. For message passing, we use λ = .1, εmax = median{|ω(e) − ωnull(e)|}_{e ∈ E ∪ Enull} and Tmax = 10. Here we do not perform any decimation and directly fix the variables based on their bias: µe > 0 ⇔ xe = 1.

Table 1 summarizes our results (see also Figure 1(middle, right)). Here, for each method and each data-set, we report the time (in seconds) and the modularity of the communities found by each method. The table includes the results of message passing for both the full and the sparse null models, where we used a constant α = 20 to generate our stochastic sparse null model. 
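The stochastic sparse null model of Section 2.2.1, used here with α = 20, can be sketched as follows. This is our own minimal implementation; the function name and the skipping of self-pairs are our assumptions, not details from the paper:

```python
import random
from collections import defaultdict

def sparse_null_model(omega_d, alpha_M, rng=None):
    """Sample the stochastic sparse null model of Section 2.2.1.

    omega_d[v] is the normalized degree omega(dv) of node v.  Each of the
    alpha_M rounds draws two nodes i.i.d. with P(v) ~ sqrt(omega(dv)) and
    gives the pair weight sqrt(omega(dv_i) * omega(dv_j)), accumulating
    repeats; the result is normalized by twice the total weight, mirroring
    omega = omega~ / (2Z).  Self-pairs are skipped (our simplification).
    """
    rng = rng or random.Random(0)
    nodes = list(omega_d)
    probs = [omega_d[v] ** 0.5 for v in nodes]
    raw = defaultdict(float)
    for _ in range(alpha_M):
        vi, vj = rng.choices(nodes, weights=probs, k=2)  # i.i.d. draws
        if vi == vj:
            continue
        e = (min(vi, vj), max(vi, vj))
        raw[e] += (omega_d[vi] * omega_d[vj]) ** 0.5
    total = 2 * sum(raw.values())
    return {e: w / total for e, w in raw.items()}
```

Because repeated draws accumulate on the same edge, the number of distinct null edges can be well under αM, which is what keeps the clustering factor-graph sparse.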
For message passing, we also include L = |E ∪ Enull| and the saving in cost due to augmentation: the Cost column shows the percentage of all the clique constraints that were considered by the augmentation. For example, the cost of .14% for the polblogs data-set shows that augmentation with the sparse null model used only .0014 of the clique factors of the full factor-graph.

Overall, the results suggest that our method is comparable to the state-of-the-art in terms of both time and quality of clustering. More importantly, it shows that augmentative message passing is able to find feasible solutions using a small portion of the constraints.

5 Conclusion

We investigated the possibility of using cutting-plane-like augmentation procedures with message passing, and used this procedure to solve two combinatorial problems: TSP and modularity optimization. In particular, our polynomial-time message passing solution to TSP often finds near-optimal solutions to a variety of benchmark instances.

Despite losing the guarantees that make the cutting plane method very powerful, our approach has several advantages. First, message passing is more efficient than LP for structured optimization [13], and it is highly parallelizable. Moreover, by directly obtaining integral solutions, it is much easier to find violated constraints. (Note that the cutting plane method for combinatorial problems operates on fractional solutions, whose rounding may eliminate its guarantees.) For example, for TSP, our method simply adds violated constraints by finding connected components; however, due to non-integral assignments, cutting plane methods require sophisticated tricks to find violations [21]. Although powerful branch-and-cut methods, such as Concorde, are able to exactly solve instances with a few thousand variables, their general run-time on benchmark instances remains exponential [18, p495], while our approximation appears to be O(N^3). 
Overall, our studies indicate that augmentative message passing is an efficient procedure for constraint optimization with a large number of constraints.

References

[1] M. Mezard, G. Parisi, and R. Zecchina, "Analytic and algorithmic solution of random satisfiability problems," Science, 2002.
[2] S. Ravanbakhsh and R. Greiner, "Perturbed message passing for constraint satisfaction problems," arXiv preprint arXiv:1401.6686, 2014.
[3] B. Frey and D. Dueck, "Clustering by passing messages between data points," Science, 2007.
[4] M. Bayati, D. Shah, and M. Sharma, "Maximum weight matching via max-product belief propagation," in ISIT, 2005.
[5] S. Ravanbakhsh, C. Srinivasa, B. Frey, and R. Greiner, "Min-max problems on factor-graphs," ICML, 2014.
[6] B. Huang and T. Jebara, "Approximating the permanent with belief propagation," arXiv preprint arXiv:0908.1769, 2009.

6 Obtained from Mark Newman's website: http://www-personal.umich.edu/~mejn/netdata/

[7] B. Potetz and T. S. Lee, "Efficient belief propagation for higher-order cliques using linear constraint nodes," Computer Vision and Image Understanding, vol. 112, no. 1, pp. 39–54, 2008.
[8] R. Gupta, A. A. Diwan, and S. Sarawagi, "Efficient inference with cardinality-based clique potentials," in ICML, 2007.
[9] D. Tarlow, I. E. Givoni, and R. S. Zemel, "Hop-map: Efficient message passing with high order potentials," in International Conference on Artificial Intelligence and Statistics, pp. 812–819, 2010.
[10] G. Dantzig, R. Fulkerson, and S. Johnson, "Solution of a large-scale traveling-salesman problem," J Operations Research Society of America, 1954.
[11] R. E. 
Gomory, "Outline of an algorithm for integer solutions to linear programs," Bulletin of the American Mathematical Society, vol. 64, no. 5, pp. 275-278, 1958.
[12] D. Sontag and T. S. Jaakkola, "New outer bounds on the marginal polytope," in Advances in Neural Information Processing Systems, pp. 1393-1400, 2007.
[13] C. Yanover, T. Meltzer, and Y. Weiss, "Linear programming relaxations and belief propagation: an empirical study," JMLR, 2006.
[14] F. Kschischang and B. Frey, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, 2001.
[15] C. H. Papadimitriou, "The Euclidean travelling salesman problem is NP-complete," Theoretical Computer Science, vol. 4, no. 3, pp. 237-244, 1977.
[16] M. Held and R. M. Karp, "A dynamic programming approach to sequencing problems," Journal of the Society for Industrial & Applied Mathematics, vol. 10, no. 1, pp. 196-210, 1962.
[17] M. Padberg and G. Rinaldi, "A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems," SIAM Review, vol. 33, no. 1, pp. 60-100, 1991.
[18] D. Applegate, R. Bixby, V. Chvatal, and W. Cook, "Concorde TSP solver," 2006.
[19] K. Helsgaun, "General k-opt submoves for the Lin-Kernighan TSP heuristic," Mathematical Programming Computation, 2009.
[20] C. Wang, J. Lai, and W. Zheng, "Message-passing for the traveling salesman problem."
[21] D. Applegate, The Traveling Salesman Problem: A Computational Study. Princeton, 2006.
[22] M. Held and R. Karp, "The traveling-salesman problem and minimum spanning trees," Operations Research, 1970.
[23] J. Leskovec, K. Lang, and M. Mahoney, "Empirical comparison of algorithms for network community detection," in WWW, 2010.
[24] M. Newman and M.
Girvan, "Finding and evaluating community structure in networks," Physical Review E, 2004.
[25] U. Brandes, D. Delling, et al., "On modularity clustering," IEEE Transactions on Knowledge and Data Engineering, 2008.
[26] J. Reichardt and S. Bornholdt, "Detecting fuzzy community structures in complex networks with a Potts model," Physical Review Letters, vol. 93, no. 21, p. 218701, 2004.
[27] A. Clauset, "Finding local community structure in networks," Physical Review E, 2005.
[28] M. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E, 2006.
[29] P. Ronhovde and Z. Nussinov, "Local resolution-limit-free Potts model for community detection," Physical Review E, vol. 81, no. 4, p. 046114, 2010.
[30] V. Blondel, J. Guillaume, et al., "Fast unfolding of communities in large networks," J Statistical Mechanics, 2008.
[31] G. Reinelt, "TSPLIB: a traveling salesman problem library," ORSA Journal on Computing, vol. 3, no. 4, pp. 376-384, 1991.
[32] D. Johnson, S. Krishnan, J. Chhugani, S. Kumar, and S. Venkatasubramanian, "Compressing large Boolean matrices using reordering techniques," in VLDB, 2004.
[33] A. Ben-Dor and B. Chor, "On constructing radiation hybrid maps," J Computational Biology, 1997.
[34] S. Climer and W. Zhang, "Take a walk and cluster genes: A TSP-based approach to optimal rearrangement clustering," in ICML, 2004.
[35] D. Johnson and L.
McGeoch, "The traveling salesman problem: A case study in local optimization," in Local Search in Combinatorial Optimization, 1997.