{"title": "Synthesis of MCMC and Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 1453, "page_last": 1461, "abstract": "Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method, which is typically fast, empirically very successful, however in general lacking control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus (LC) approach which allows to express the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial-time for planar pair-wise binary GMs and it also provides a highly accurate approximation empirically. Motivated by this, we, first, propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pair-wise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying our design is in utilizing the concept of cycle basis, which provides an efficient decomposition of the generalized loops. 
In essence, the proposed MCMC schemes run on transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.", "full_text": "Synthesis of MCMC and Belief Propagation\n\nSungsoo Ahn\u2217 Michael Chertkov\u2020\n\n\u2217School of Electrical Engineering,\n\nJinwoo Shin\u2217\n\nKorea Advanced Institute of Science and Technology, Daejeon, Korea\n\n\u20201 Theoretical Division, T-4 & Center for Nonlinear Studies,\n\nLos Alamos National Laboratory, Los Alamos, NM 87545, USA,\n\n\u20202Skolkovo Institute of Science and Technology, 143026 Moscow, Russia\n\n\u2217{sungsoo.ahn, jinwoos}@kaist.ac.kr\n\n\u2020chertkov@lanl.gov\n\nAbstract\n\nMarkov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most\npopular algorithms for computational inference in Graphical Models (GM). In\nprinciple, MCMC is an exact probabilistic method which, however, often suffers\nfrom exponentially slow mixing. In contrast, BP is a deterministic method, which is\ntypically fast, empirically very successful, however in general lacking control of ac-\ncuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting\nthe approximation error of BP, i.e., we provide a way to compensate for BP errors\nvia a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus\napproach which allows to express the BP error as a sum of weighted generalized\nloops. Although the full series is computationally intractable, it is known that a trun-\ncated series, summing up all 2-regular loops, is computable in polynomial-time for\nplanar pair-wise binary GMs and it also provides a highly accurate approximation\nempirically. Motivated by this, we \ufb01rst propose a polynomial-time approximation\nMCMC scheme for the truncated series of general (non-planar) pair-wise binary\nmodels. 
Our main idea here is to use the Worm algorithm, known to provide fast\nmixing in other (related) problems, and then design an appropriate rejection scheme\nto sample 2-regular loops. Furthermore, we also design an ef\ufb01cient rejection-free\nMCMC scheme for approximating the full series. The main novelty underlying\nour design is in utilizing the concept of cycle basis, which provides an ef\ufb01cient\ndecomposition of the generalized loops. In essence, the proposed MCMC schemes\nrun on transformed GM built upon the non-trivial BP solution, and our experiments\nshow that this synthesis of BP and MCMC outperforms both direct MCMC and\nbare BP schemes.\n\n1\n\nIntroduction\n\nGMs express factorization of the joint multivariate probability distributions in statistics via graph of\nrelations between variables. The concept of GM has been used successfully in information theory,\nphysics, arti\ufb01cial intelligence and machine learning [1, 2, 3, 4, 5, 6]. Of many inference problems\none can set with a GM, computing partition function (normalization), or equivalently marginalizing\nthe joint distribution, is the most general problem of interest. However, this paradigmatic inference\nproblem is known to be computationally intractable in general, i.e., formally it is #P-hard even to\napproximate [7, 8].\nTo address this obstacle, extensive efforts have been made to develop practical approximation methods,\namong which MCMC- [9] based and BP- [10] based algorithms are, arguably, the most popular\nand practically successful ones. MCMC is exact, i.e., it converges to the correct answer, but its\nconvergence/mixing is, in general, exponential in the system size. On the other hand, message\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fpassing implementations of BP typically demonstrate fast convergence, however in general lacking\napproximation guarantees for GM containing loops. 
Motivated by this complementarity of the MCMC and BP approaches, we aim here to synthesize a hybrid approach benefiting from a joint use of MCMC and BP.

At a high level, our proposed scheme uses BP as the first step and then runs MCMC to correct for the approximation error of BP. To design such an "error-correcting" MCMC, we utilize the Loop Calculus approach [11], which allows one, in a nutshell, to express the BP error as a sum (i.e., series) of weights of the so-called generalized loops (sub-graphs of a special structure). There are several challenges one needs to overcome. First, to build an efficient Markov Chain (MC) sampler, one needs a scheme which allows efficient transitions between the generalized loops. Second, even if one designs such an MC capable of accessing all the generalized loops, it may mix slowly. Finally, weights of generalized loops can be positive or negative, while an individual MCMC can only generate non-negative contributions.

Since approximating the full loop series (LS) is intractable in general, we first explore whether we can deal with these challenges at least in the case of the truncated LS corresponding to 2-regular loops. In fact, this problem has been analyzed in the case of planar pairwise binary GMs [12, 13], where it was shown that the 2-regular LS is computable exactly in polynomial-time through a reduction to a Pfaffian (or determinant) computation [14]. In particular, the partition function of the Ising model without external field (i.e., where only pair-wise factors are present) is computable exactly via the 2-regular LS. Furthermore, the authors show that, in the case of general planar pairwise binary GMs, the 2-regular LS provides a highly accurate approximation empirically. Motivated by these results, we address the same question in the general (i.e., non-planar) case of pairwise binary GMs via MCMC. 
For the choice of MC, we adopt the Worm algorithm [15]. We prove that, with some modifications including rejections, the algorithm allows sampling of 2-regular loops (with probabilities proportional to their respective weights) in polynomial-time. Then, we design a novel simulated annealing strategy using the sampler to estimate separately the positive and negative parts of the 2-regular LS. Given any ε > 0, this leads to an ε-approximation polynomial-time scheme for the 2-regular LS under a mild assumption.

We next turn to estimating the full LS. In this part, we set aside the theoretical question of establishing the polynomial mixing time of an MC, and instead focus on designing an empirically efficient MCMC scheme. We design an MC using a cycle basis of the graph [16] to sample generalized loops directly, without rejections. It transits from one generalized loop to another by adding or deleting a random element of the cycle basis. Using the MC sampler, we design a simulated annealing strategy for estimating the full LS, similar to the one used to estimate the 2-regular LS. Notice that even though the prime focus of this paper is on pairwise binary GMs, the proposed MCMC scheme allows straightforward generalization to general non-binary GMs.

In summary, we propose novel MCMC schemes to estimate the LS correction to the BP approximation of the partition function. Since the bare BP already provides a highly non-trivial estimate of the partition function, it is naturally expected, and confirmed in our experimental results, that the proposed algorithm outperforms other standard MCMC schemes (not related to BP) applied to the original GM. 
We believe that our approach provides a new angle for approximate inference on GMs and is of broader interest for various applications involving GMs.

1 Introduction (continued)

2 Preliminaries

2.1 Graphical models and belief propagation

Given an undirected graph G = (V, E) with |V| = n, |E| = m, a pairwise binary Markov Random Field (MRF) defines the following joint probability distribution on x = [x_v \in \{0, 1\} : v \in V]:

p(x) = \frac{1}{Z} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v), \qquad Z := \sum_{x \in \{0,1\}^n} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v),

where \psi_v, \psi_{u,v} are non-negative functions, called compatibility or factor functions, and the normalization constant Z is called the partition function. Without loss of generality, we assume G is connected. It is known that approximating the partition function is #P-hard in general [8]. Belief Propagation (BP) is a popular message-passing heuristic for approximating marginal distributions of the MRF. The BP algorithm iterates the following message updates for all (u, v) \in E:

m^{t+1}_{u \to v}(x_v) \propto \sum_{x_u \in \{0,1\}} \psi_{u,v}(x_u, x_v)\, \psi_u(x_u) \prod_{w \in N(u) \setminus v} m^{t}_{w \to u}(x_u),

where N(v) denotes the set of neighbors of v. In general BP may fail to converge; in that case one may substitute it with a more involved algorithm that provably converges to a BP fixed point [22, 23, 24]. 
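As a concrete illustration (not code from the paper; the graph and factor values below are made-up), the following sketch computes Z by brute force and runs the BP message updates above on a toy pairwise binary MRF:

```python
import itertools
import random

# Toy pairwise binary MRF on a 4-cycle (a loopy graph, so BP is approximate).
# All factor values here are arbitrary illustrative choices.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(0)
psi_v = {v: [1.0, rng.uniform(0.5, 2.0)] for v in V}
psi_uv = {e: [[rng.uniform(0.5, 2.0) for _ in (0, 1)] for _ in (0, 1)] for e in E}

def pair(u, v, xu, xv):
    """psi_{u,v}(x_u, x_v), looking the edge up in either orientation."""
    return psi_uv[(u, v)][xu][xv] if (u, v) in psi_uv else psi_uv[(v, u)][xv][xu]

def exact_Z():
    """Z = sum_x prod_v psi_v(x_v) prod_{(u,v) in E} psi_{u,v}(x_u, x_v)."""
    Z = 0.0
    for x in itertools.product((0, 1), repeat=len(V)):
        p = 1.0
        for v in V:
            p *= psi_v[v][x[v]]
        for (u, v) in E:
            p *= pair(u, v, x[u], x[v])
        Z += p
    return Z

def run_bp(iters=200):
    """Parallel BP: m_{u->v}(x_v) ~ sum_{x_u} psi_uv psi_u prod_{w in N(u)\\v} m_{w->u}(x_u)."""
    nbrs = {v: [] for v in V}
    for (u, v) in E:
        nbrs[u].append(v)
        nbrs[v].append(u)
    m = {(u, v): [0.5, 0.5] for u in V for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in m:
            msg = []
            for xv in (0, 1):
                s = 0.0
                for xu in (0, 1):
                    p = pair(u, v, xu, xv) * psi_v[u][xu]
                    for w in nbrs[u]:
                        if w != v:
                            p *= m[(w, u)][xu]
                    s += p
                msg.append(s)
            tot = sum(msg)
            new[(u, v)] = [x / tot for x in msg]
        m = new
    # BP marginal estimates: tau_v(x_v) ~ psi_v(x_v) prod_{u in N(v)} m_{u->v}(x_v)
    tau = {}
    for v in V:
        b = [psi_v[v][x] for x in (0, 1)]
        for u in nbrs[v]:
            for x in (0, 1):
                b[x] *= m[(u, v)][x]
        tot = sum(b)
        tau[v] = [x / tot for x in b]
    return tau

tau = run_bp()
Z = exact_Z()
```

On tree graphs the resulting beliefs would be exact marginals; on this 4-cycle they are only approximations, which is exactly the error the rest of the paper aims to correct.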
Estimates for the marginal probabilities are expressed via the fixed-point messages \{m_{u \to v} : (u, v) \in E\} as follows:

\tau_v(x_v) \propto \psi_v(x_v) \prod_{u \in N(v)} m_{u \to v}(x_v),

\tau_{u,v}(x_u, x_v) \propto \psi_u(x_u)\, \psi_v(x_v)\, \psi_{u,v}(x_u, x_v) \Big( \prod_{w \in N(u) \setminus v} m_{w \to u}(x_u) \Big) \Big( \prod_{w \in N(v) \setminus u} m_{w \to v}(x_v) \Big).

2.2 Bethe approximation and loop calculus

The BP marginals also result in the following Bethe approximation for the partition function Z:

\log Z_{\text{Bethe}} = \sum_{v \in V} \sum_{x_v} \tau_v(x_v) \log \psi_v(x_v) + \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \log \psi_{u,v}(x_u, x_v)
\quad - \sum_{v \in V} \sum_{x_v} \tau_v(x_v) \log \tau_v(x_v) - \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \log \frac{\tau_{u,v}(x_u, x_v)}{\tau_u(x_u)\, \tau_v(x_v)}.

If the graph G is a tree, the Bethe approximation is exact, i.e., Z_{Bethe} = Z. In general, however, i.e., for graphs with cycles, BP provides an often accurate but still approximate answer. The Loop Series (LS) [11] expresses the ratio Z/Z_{Bethe} as the following sum/series:

\frac{Z}{Z_{\text{Bethe}}} = Z_{\text{Loop}} := \sum_{F \in \mathcal{L}} w(F), \qquad w(\emptyset) = 1,

w(F) := \prod_{(u,v) \in E_F} \Big( \frac{\tau_{u,v}(1,1)}{\tau_u(1)\, \tau_v(1)} - 1 \Big) \prod_{v \in V_F} \tau_v(1) \Big( 1 + (-1)^{d_F(v)} \Big( \frac{\tau_v(1)}{1 - \tau_v(1)} \Big)^{d_F(v)-1} \Big),

where each term/weight is associated with the so-called generalized loop F, and \mathcal{L} denotes the set of all generalized loops in the graph G (including the empty subgraph \emptyset). Here, a subgraph F of G is called a generalized loop if all vertices v \in F have degree d_F(v) (in the subgraph) no smaller than 2. Since the number of generalized loops is exponentially large, computing Z_{Loop} is intractable in general. 
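For intuition, the loop weight w(F) can be evaluated directly from the BP pseudo-marginals. The sketch below is illustrative only: the tau values are made-up placeholders rather than BP fixed-point outputs, and the vertex factor is written in a form chosen to be consistent, for degree-2 vertices, with the edge factorization of |w(F)| used in Section 3:

```python
import math

# Illustrative BP pseudo-marginals on a 4-cycle; made-up placeholder values.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
tau = {0: 0.3, 1: 0.4, 2: 0.5, 3: 0.6}                    # tau_v(1)
tau_uv = {(u, v): 0.9 * tau[u] * tau[v] for (u, v) in E}  # tau_{u,v}(1,1)

def vertex_term(t, d):
    # tau_v(1) * (1 + (-1)^d * (tau_v(1) / (1 - tau_v(1)))^(d-1))
    return t * (1.0 + (-1.0) ** d * (t / (1.0 - t)) ** (d - 1))

def loop_weight(F):
    """w(F) for a generalized loop F, given as a collection of edges."""
    deg = {}
    for (u, v) in F:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    w = 1.0
    for (u, v) in F:                  # edge factors
        w *= tau_uv[(u, v)] / (tau[u] * tau[v]) - 1.0
    for v, d in deg.items():          # vertex factors (all degrees >= 2)
        w *= vertex_term(tau[v], d)
    return w

F = E                                 # the full 4-cycle: a 2-regular loop
w = loop_weight(F)

# Consistency check against the 2-regular edge factorization of Section 3:
# |w(F)| = prod_e |tau_uv(1,1) - tau_u tau_v| / sqrt(tau_u tau_v (1-tau_u)(1-tau_v))
w_edges = 1.0
for (u, v) in F:
    w_edges *= abs(tau_uv[(u, v)] - tau[u] * tau[v]) / math.sqrt(
        tau[u] * tau[v] * (1 - tau[u]) * (1 - tau[v]))
```

For a degree-2 vertex the vertex factor reduces to tau/(1-tau), which is exactly what makes the per-edge factorization of |w(F)| work out.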
However, the following truncated sum of Z_{Loop}, called the 2-regular loop series, is known to be computable in polynomial-time if G is planar [12]:

Z_{2\text{-Loop}} := \sum_{F \in \mathcal{L}_{2\text{-Loop}}} w(F),

where \mathcal{L}_{2\text{-Loop}} denotes the set of all 2-regular generalized loops, i.e., F \in \mathcal{L}_{2\text{-Loop}} if d_F(v) = 2 for every vertex v of F.¹ One can check that Z_{Loop} = Z_{2-Loop} for the Ising model without external fields. Furthermore, as stated in [12, 13] for the general case, Z_{2-Loop} provides a good empirical estimation for Z_{Loop}.

¹ Note that the number of 2-regular loops is exponentially large in general.

3 Estimating the 2-regular loop series via MCMC

In this section, we describe how the 2-regular loop series Z_{2-Loop} can be estimated in polynomial-time. To this end, we first assume that the maximum degree Δ of the graph G is at most 3. This degree-constrained assumption is not really restrictive, since any pairwise binary model can easily be expressed as an equivalent one with Δ ≤ 3; e.g., see the supplementary material.

The rest of this section consists of two parts. 
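As a sanity check of the definition (a brute-force illustration, not one of the paper's algorithms), one can enumerate the 2-regular generalized loops of a tiny graph and sum their weights; the signed per-edge factor below corresponds to the 2-regular factorization of w(F), with made-up tau values:

```python
import itertools
import math

# Complete graph K4; the tau values are illustrative placeholders.
V = [0, 1, 2, 3]
E = [(u, v) for u in V for v in V if u < v]               # 6 edges
tau = {0: 0.3, 1: 0.4, 2: 0.5, 3: 0.6}                    # tau_v(1)
tau_uv = {(u, v): 1.1 * tau[u] * tau[v] for (u, v) in E}  # tau_{u,v}(1,1)

def edge_factor(u, v):
    """Signed per-edge factor; for a 2-regular F, w(F) is the product of these."""
    cov = tau_uv[(u, v)] - tau[u] * tau[v]
    return cov / math.sqrt(tau[u] * tau[v] * (1 - tau[u]) * (1 - tau[v]))

def two_regular_loops():
    """All F subseteq E with every vertex of degree 0 or 2 (incl. the empty F)."""
    for r in range(len(E) + 1):
        for F in itertools.combinations(E, r):
            deg = {v: 0 for v in V}
            for (u, v) in F:
                deg[u] += 1
                deg[v] += 1
            if all(d in (0, 2) for d in deg.values()):
                yield F

loops = list(two_regular_loops())
# K4 contains 4 triangles and 3 Hamiltonian 4-cycles, plus the empty subgraph.
Z2 = sum(math.prod(edge_factor(u, v) for (u, v) in F) for F in loops)
```

The empty subgraph contributes w(∅) = 1 (an empty product), matching the convention of the loop series; on larger graphs this enumeration is exponential, which is precisely why the paper resorts to MCMC.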
We first propose an algorithm generating a 2-regular loop sample with probability proportional to the absolute value of its weight, i.e.,

\pi_{2\text{-Loop}}(F) := \frac{|w(F)|}{Z^{\dagger}_{2\text{-Loop}}}, \qquad \text{where } Z^{\dagger}_{2\text{-Loop}} = \sum_{F \in \mathcal{L}_{2\text{-Loop}}} |w(F)|.

Note that this 2-regular loop contribution allows the following factorization: for any F \in \mathcal{L}_{2\text{-Loop}},

|w(F)| = \prod_{e \in F} w(e), \qquad \text{where } w(e) := \left| \frac{\tau_{u,v}(1,1) - \tau_u(1)\, \tau_v(1)}{\sqrt{\tau_u(1)\, \tau_v(1)\, (1 - \tau_u(1))\, (1 - \tau_v(1))}} \right|. \qquad (1)

In the second part, we use the sampler constructed in the first part to design a simulated annealing scheme estimating Z_{2-Loop}.

3.1 Sampling 2-regular loops

We suggest sampling the 2-regular loops distributed according to \pi_{2\text{-Loop}} through a version of the Worm algorithm proposed by Prokof'ev and Svistunov [15]. It can be viewed as an MC exploring the set \mathcal{L}_{2\text{-Loop}} \cup \mathcal{L}_{2\text{-Odd}}, where \mathcal{L}_{2\text{-Odd}} is the set of all subgraphs of G with exactly two odd-degree vertices. Given the current state F \in \mathcal{L}_{2\text{-Loop}} \cup \mathcal{L}_{2\text{-Odd}}, it chooses the next state F′ as follows:

1. If F \in \mathcal{L}_{2\text{-Loop}}, pick a random vertex v (uniformly) from V. Otherwise, pick a random odd-degree vertex v (uniformly) from F.
2. Choose a random neighbor u of v (uniformly) within G, and set F′ ← F initially.
3. 
Update F′ ← F ⊕ \{u, v\} with probability

\begin{cases} \min\!\big( \frac{n}{2}\, \frac{|w(F \oplus \{u,v\})|}{|w(F)|},\, 1 \big) & \text{if } F \in \mathcal{L}_{2\text{-Loop}}, \\ \min\!\big( \frac{2}{n}\, \frac{|w(F \oplus \{u,v\})|}{|w(F)|},\, 1 \big) & \text{else if } F \oplus \{u,v\} \in \mathcal{L}_{2\text{-Loop}}, \\ \min\!\big( \frac{d(v)}{d(u)}\, \frac{|w(F \oplus \{u,v\})|}{|w(F)|},\, 1 \big) & \text{else if } F, F \oplus \{u,v\} \in \mathcal{L}_{2\text{-Odd}}, \end{cases}

i.e., the Metropolis–Hastings acceptance for the proposal of steps 1–2 with respect to the weights |w(\cdot)|. Here, ⊕ denotes the symmetric difference, d(\cdot) denotes the degree in G, and for F \in \mathcal{L}_{2\text{-Odd}} the weight is defined according to w(F) = \prod_{e \in F} w(e). In essence, the Worm algorithm either deletes or adds an edge to the current subgraph F. From the Worm algorithm, we transition to the following algorithm, which samples 2-regular loops with probability \pi_{2\text{-Loop}} simply by rejecting F if F \in \mathcal{L}_{2\text{-Odd}}.

Algorithm 1 Sampling 2-regular loops
1: Input: number of trials N; number of iterations T of the Worm algorithm
2: Output: 2-regular loop F
3: for i = 1 → N do
4:   Set F ← ∅ and update it T times by running the Worm algorithm
5:   if F is a 2-regular loop then
6:     BREAK and output F
7:   end if
8: end for
9: Output F = ∅.

The following theorem states that Algorithm 1 can generate a desired random sample in polynomial-time.

Theorem 1. 
Given δ > 0, choose the inputs of Algorithm 1 as

N \ge 1.2\, n \log(3 \delta^{-1}) \qquad \text{and} \qquad T \ge (m - n + 1) \log 2 + 4 \Delta m n^4 \log(3 n \delta^{-1}).

Then, it follows that

\frac{1}{2} \sum_{F \in \mathcal{L}_{2\text{-Loop}}} \Big| \mathbb{P}\big[ \text{Algorithm 1 outputs } F \big] - \pi_{2\text{-Loop}}(F) \Big| \le \delta,

namely, the total variation distance between \pi_{2\text{-Loop}} and the output distribution of Algorithm 1 is at most δ.

The proof of the above theorem is presented in the supplementary material due to the space constraint. In the proof, we first show that the MC induced by the Worm algorithm mixes in polynomial time, and then prove that acceptance of a 2-regular loop, i.e., line 6 of Algorithm 1, occurs with high probability. Notice that the uniform-weight version of the former result, i.e., fast mixing, was recently proven in [18]; for completeness of the exposition, we present a proof for the general case of interest to us. The latter part, i.e., high acceptance, requires bounding |\mathcal{L}_{2\text{-Loop}}| and |\mathcal{L}_{2\text{-Odd}}| to show that the probability of sampling a 2-regular loop under the Worm algorithm is 1/poly(n) for some polynomial function poly(n).

3.2 Simulated annealing for approximating the 2-regular loop series

Here we utilize Theorem 1 to describe an algorithm approximating the 2-regular LS Z_{2-Loop} in polynomial time. To achieve this goal, we rely on the simulated annealing strategy [19], which requires choosing a monotone cooling schedule β_0, β_1, . . . , β_{ℓ-1}, β_ℓ, where β_ℓ corresponds to the target counting problem and β_0 to its relaxed, easy version. Thus, designing an appropriate cooling strategy is the first challenge to address. 
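To make Algorithm 1 concrete, the sketch below implements Worm-style moves with rejection on a toy graph. Instead of transcribing the paper's case-by-case acceptance constants, it computes the Metropolis–Hastings ratio directly from the proposal probabilities of steps 1–2 (an assumption that preserves the intended stationary distribution over L2-Loop ∪ L2-Odd); the edge weights standing in for |w(e)| are made-up values:

```python
import math
import random

# Toy graph: K4. w_abs[e] plays the role of |w(e)| from (1); values are made up.
V = [0, 1, 2, 3]
E = [(u, v) for u in V for v in V if u < v]
nbrs = {v: [u for u in V if u != v] for v in V}
w_abs = {e: 0.3 + 0.1 * i for i, e in enumerate(E)}
rng = random.Random(1)

def canon(u, v):
    return (u, v) if u < v else (v, u)

def odd_vertices(F):
    deg = {v: 0 for v in V}
    for (u, v) in F:
        deg[u] += 1
        deg[v] += 1
    return [v for v in V if deg[v] % 2 == 1]

def weight(F):
    return math.prod(w_abs[e] for e in F)   # empty product = 1, i.e. w(empty) = 1

def proposal_prob(F, e):
    """q(F -> F xor {e}): pick v (from V if F is even, else from the odd pair), then u in N(v)."""
    odd = odd_vertices(F)
    q = 0.0
    for (v, u) in (e, e[::-1]):             # the edge can be proposed from either endpoint
        if not odd:                         # F in L2-Loop: v uniform over V
            q += (1.0 / len(V)) * (1.0 / len(nbrs[v]))
        elif v in odd:                      # F in L2-Odd: v uniform over the two odd vertices
            q += 0.5 * (1.0 / len(nbrs[v]))
    return q

def worm_step(F):
    odd = odd_vertices(F)
    v = rng.choice(V) if not odd else rng.choice(odd)
    u = rng.choice(nbrs[v])
    e = canon(u, v)
    Fp = F ^ {e}
    if len(odd_vertices(Fp)) > 2:           # defensive: reject moves leaving the state space
        return F
    ratio = (weight(Fp) * proposal_prob(Fp, e)) / (weight(F) * proposal_prob(F, e))
    return Fp if rng.random() < ratio else F

def sample_2regular(N=50, T=200):
    """Algorithm 1: restart up to N times; accept the final state if it is 2-regular."""
    for _ in range(N):
        F = frozenset()
        for _ in range(T):
            F = worm_step(F)
        if not odd_vertices(F):
            return F
    return frozenset()

F = sample_2regular()
```

Increasing N and T trades running time for acceptance probability and closeness to stationarity, mirroring the roles these parameters play in the bounds of Theorem 1.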
We will also describe how to deal with the issue that Z_{2-Loop} is a sum of positive and negative terms, while most simulated annealing strategies in the literature have studied sums of non-negative terms. This second challenge is related to the so-called 'fermion sign problem' common in statistical mechanics of quantum systems [25]. Before we describe the proposed algorithm in detail, let us provide an intuitive sketch.

The proposed algorithm consists of two parts: a) estimating Z^{\dagger}_{2\text{-Loop}} via a simulated annealing strategy, and b) estimating Z_{2\text{-Loop}}/Z^{\dagger}_{2\text{-Loop}} via counting samples corresponding to negative terms in the 2-regular loop series. First, consider the following β-parametrized auxiliary distribution over 2-regular loops:

\pi_{2\text{-Loop}}(F : \beta) = \frac{1}{Z^{\dagger}_{2\text{-Loop}}(\beta)} |w(F)|^{\beta}, \qquad \text{for } 0 \le \beta \le 1. \qquad (2)

Note that one can generate samples approximately with probability (2) in polynomial-time using Algorithm 1 by setting w ← w^β. Indeed, it follows that for β′ > β,

\frac{Z^{\dagger}_{2\text{-Loop}}(\beta')}{Z^{\dagger}_{2\text{-Loop}}(\beta)} = \sum_{F \in \mathcal{L}_{2\text{-Loop}}} |w(F)|^{\beta' - \beta}\, \frac{|w(F)|^{\beta}}{Z^{\dagger}_{2\text{-Loop}}(\beta)} = \mathbb{E}_{\pi_{2\text{-Loop}}(\beta)}\big[ |w(F)|^{\beta' - \beta} \big],

where the expectation can be estimated using O(1) samples if it is Θ(1), i.e., if β′ is sufficiently close to β. Then, for any increasing sequence β_0 = 0, β_1, . . .
, β_{n-1}, β_n = 1, we derive

Z^{\dagger}_{2\text{-Loop}} = \frac{Z^{\dagger}_{2\text{-Loop}}(\beta_n)}{Z^{\dagger}_{2\text{-Loop}}(\beta_{n-1})} \cdot \frac{Z^{\dagger}_{2\text{-Loop}}(\beta_{n-1})}{Z^{\dagger}_{2\text{-Loop}}(\beta_{n-2})} \cdots \frac{Z^{\dagger}_{2\text{-Loop}}(\beta_1)}{Z^{\dagger}_{2\text{-Loop}}(\beta_0)} \cdot Z^{\dagger}_{2\text{-Loop}}(0),

where it is known that Z^{\dagger}_{2\text{-Loop}}(0), i.e., the total number of 2-regular loops, is exactly 2^{m-n+1} [16]. This allows us to estimate Z^{\dagger}_{2\text{-Loop}} simply by estimating \mathbb{E}_{\pi_{2\text{-Loop}}(\beta_i)}\big[ |w(F)|^{\beta_{i+1} - \beta_i} \big] for all i.

Our next step is to estimate the ratio Z_{2\text{-Loop}}/Z^{\dagger}_{2\text{-Loop}}. Let \mathcal{L}^{-}_{2\text{-Loop}} denote the set of negative 2-regular loops, i.e.,

\mathcal{L}^{-}_{2\text{-Loop}} := \{ F : F \in \mathcal{L}_{2\text{-Loop}},\; w(F) < 0 \}.

Then, the 2-regular loop series can be expressed as

Z_{2\text{-Loop}} = \Big( 1 - 2\, \frac{\sum_{F \in \mathcal{L}^{-}_{2\text{-Loop}}} |w(F)|}{Z^{\dagger}_{2\text{-Loop}}} \Big) Z^{\dagger}_{2\text{-Loop}} = \Big( 1 - 2\, \mathbb{P}_{\pi_{2\text{-Loop}}}\big[ w(F) < 0 \big] \Big) Z^{\dagger}_{2\text{-Loop}},

where we estimate \mathbb{P}_{\pi_{2\text{-Loop}}}[w(F) < 0] again using samples generated by Algorithm 1. We provide the formal description of the proposed algorithm and its error bound as follows.

Algorithm 2 Approximation for Z_{2-Loop}
1: Input: increasing sequence β_0 = 0 < β_1 < ··· < β_{n-1} < β_n = 1; number of samples s_1, s_2; number of trials N_1; number of iterations T_1 for Algorithm 1.
2: for i = 0 → n - 1 do
3:   Generate 2-regular loops F_1, . . . , F_{s_1} for \pi_{2\text{-Loop}}(\beta_i) using Algorithm 1 with input N_1 and T_1, and set H_i ← \frac{1}{s_1} \sum_j |w(F_j)|^{\beta_{i+1} - \beta_i}.
4: end for
5: Generate 2-regular loops F_1, . . .
, F_{s_2} for \pi_{2\text{-Loop}} using Algorithm 1 with input N_2 and T_2, and set κ ← |\{F_j : w(F_j) < 0\}| / s_2.
6: Output: \hat{Z}_{2\text{-Loop}} ← (1 - 2κ)\, 2^{m-n+1} \prod_i H_i.

Theorem 2. Given ε, ν > 0, choose the inputs of Algorithm 2 as β_i = i/n for i = 1, 2, . . . , n - 1, and

N_1 \ge 1.2\, n \log(144\, n\, \varepsilon^{-1} w_{\min}^{-1}), \qquad s_1 \ge 18144\, n^2 \varepsilon^{-2} w_{\min}^{-1} \lceil \log(6 n \nu^{-1}) \rceil,
T_1 \ge (m - n + 1)\log 2 + 4 \Delta m n^4 \log(48\, n\, \varepsilon^{-1} w_{\min}^{-1}),
N_2 \ge 1.2\, n \log(144\, \varepsilon^{-1} (1 - 2\zeta)^{-1}), \qquad s_2 \ge 18144\, \zeta (1 - 2\zeta)^{-2} \varepsilon^{-2} \lceil \log(3 \nu^{-1}) \rceil,
T_2 \ge (m - n + 1)\log 2 + 4 \Delta m n^4 \log(48\, \varepsilon^{-1} (1 - 2\zeta)^{-1}),

where w_{\min} = \min_{e \in E} w(e) and \zeta = \mathbb{P}_{\pi_{2\text{-Loop}}}[w(F) < 0]. Then, the following statement holds:

\mathbb{P}\Big[ \frac{|\hat{Z}_{2\text{-Loop}} - Z_{2\text{-Loop}}|}{Z_{2\text{-Loop}}} \le \varepsilon \Big] \ge 1 - \nu,

which means that Algorithm 2 estimates Z_{2-Loop} within approximation ratio 1 ± ε with high probability.

The proof of the above theorem is presented in the supplementary material due to the space constraint. We note that the constants entering Theorem 2 were not optimized. Theorem 2 implies that the complexity of Algorithm 2 is polynomial with respect to n, 1/ε, 1/ν under the assumption that w_{\min}^{-1} and (1 - 2\, \mathbb{P}_{\pi_{2\text{-Loop}}}[w(F) < 0])^{-1} are polynomially bounded. Both quantities depend on the choice of BP fixed point; however, it is unlikely (unless there is a degeneracy) that they become large. In particular, \mathbb{P}_{\pi_{2\text{-Loop}}}[w(F) < 0] = 0 in the case of attractive models [20].

4 Estimating the full loop series via MCMC

In this section, we aim to estimate the full loop series Z_{Loop}. 
To this end, we design a novel MC sampler for generalized loops, which adds to (or removes from) the current generalized loop an element of a cycle basis or a path. Therefore, we naturally start this section by introducing the necessary background on cycle bases. Then, we turn to the design of the MC sampler for generalized loops. Finally, we describe a simulated annealing scheme similar to the one described in the preceding section, and report its experimental performance in comparison with other methods.

4.1 Sampling generalized loops with cycle basis

A cycle basis C of the graph G is a minimal set of cycles which allows one to represent every Eulerian subgraph of G (i.e., every subgraph containing no odd-degree vertex) as a symmetric difference of cycles in the set [16]. Let us characterize the combinatorial structure of generalized loops using the cycle basis. To this end, consider a set of paths between all pairs of vertices:

\mathcal{P} = \{ P_{u,v} : u \ne v,\; u, v \in V,\; P_{u,v} \text{ is a path from } u \text{ to } v \},

i.e., |\mathcal{P}| = \binom{n}{2}. Then, the following theorem allows us to decompose any generalized loop with respect to any selected C and \mathcal{P}.

Theorem 3. Consider any cycle basis C and path set \mathcal{P}. Then, for any generalized loop F, there exists a decomposition B ⊂ C ∪ \mathcal{P} such that F can be expressed as a symmetric difference of the elements of B, i.e., F = B_1 ⊕ B_2 ⊕ ··· ⊕ B_{k-1} ⊕ B_k for some B_i ∈ B.

The proof of the above theorem is given in the supplementary material due to the space constraint. Now, given any choice of C, \mathcal{P}, consider the following transition from F ∈ \mathcal{L} to the next state F′:

1. Choose, uniformly at random, an element B ∈ C ∪ \mathcal{P}, and set F′ ← F initially.
2. 
If F ⊕ B ∈ \mathcal{L}, update F′ ← F ⊕ B with probability \min\{1, |w(F ⊕ B)| / |w(F)|\}; otherwise, keep F′ = F.

Due to Theorem 3, it is easy to check that the proposed MC is irreducible and aperiodic, i.e., ergodic, and the distribution of its t-th state converges to the following stationary distribution as t → ∞:

\pi_{\text{Loop}}(F) = \frac{|w(F)|}{Z^{\dagger}_{\text{Loop}}}, \qquad \text{where } Z^{\dagger}_{\text{Loop}} = \sum_{F \in \mathcal{L}} |w(F)|.

One also has freedom in choosing C, \mathcal{P}. To accelerate mixing of the MC, we suggest choosing the minimum weighted cycle basis C and the shortest paths \mathcal{P} with respect to the edge weights \{\log w(e)\} defined in (1), which are computable using the algorithm in [16] and the Bellman–Ford algorithm [21], respectively. This encourages transitions between generalized loops with similar weights.

4.2 Simulated annealing for approximating the full loop series

Algorithm 3 Approximation for Z_{Loop}
1: Input: decreasing sequence β_0 > β_1 > ··· > β_{ℓ-1} > β_ℓ = 1; number of samples s_0, s_1, s_2; number of iterations T_0, T_1, T_2 for the MC described in Section 4.1.
2: Generate generalized loops F_1, ···, F_{s_0} by running T_0 iterations of the MC described in Section 4.1 for \pi_{\text{Loop}}(\beta_0), and set U ← \frac{s_0}{s^*} |w(F^*)|^{\beta_0}, where F^* = \arg\max_{F \in \{F_1, ···, F_{s_0}\}} |w(F)| and s^* is the number of times F^* was sampled.
3: for i = 0 → ℓ - 1 do
4:   Generate generalized loops F_1, ···, F_{s_1} by running T_1 iterations of the MC described in Section 4.1 for \pi_{\text{Loop}}(\beta_i), and set H_i ← \frac{1}{s_1} \sum_j |w(F_j)|^{\beta_{i+1} - \beta_i}.
5: end for
6: Generate generalized loops F_1, ···, F_{s_2} by running T_2 iterations of the MC described in Section 4.1 for \pi_{\text{Loop}}.
7: Output: 
\hat{Z}_{\text{Loop}} ← (1 - 2κ)\, U \prod_i H_i, where κ ← |\{F_j : w(F_j) < 0\}| / s_2 is computed from the samples generated in line 6.

Now we are ready to describe the simulated annealing scheme for estimating Z_{Loop}. It is similar, in principle, to that in Section 3.2. First, we again introduce a β-parametrized auxiliary probability distribution \pi_{\text{Loop}}(F : \beta) = |w(F)|^{\beta} / Z^{\dagger}_{\text{Loop}}(\beta). For any decreasing sequence of annealing parameters β_0, β_1, ···, β_{ℓ-1}, β_ℓ = 1, we derive

Z^{\dagger}_{\text{Loop}} = \frac{Z^{\dagger}_{\text{Loop}}(\beta_\ell)}{Z^{\dagger}_{\text{Loop}}(\beta_{\ell-1})} \cdot \frac{Z^{\dagger}_{\text{Loop}}(\beta_{\ell-1})}{Z^{\dagger}_{\text{Loop}}(\beta_{\ell-2})} \cdots \frac{Z^{\dagger}_{\text{Loop}}(\beta_1)}{Z^{\dagger}_{\text{Loop}}(\beta_0)} \cdot Z^{\dagger}_{\text{Loop}}(\beta_0).

Following procedures similar to those in Section 3.2, one can estimate Z^{\dagger}_{\text{Loop}}(\beta')/Z^{\dagger}_{\text{Loop}}(\beta) = \mathbb{E}_{\pi_{\text{Loop}}(\beta)}[|w(F)|^{\beta' - \beta}] using the sampler described in Section 4.1. Moreover, Z^{\dagger}_{\text{Loop}}(\beta_0) = |w(F^*)|^{\beta_0} / \mathbb{P}_{\pi_{\text{Loop}}(\beta_0)}(F^*) is estimated by sampling the generalized loop F^* with the highest probability \mathbb{P}_{\pi_{\text{Loop}}(\beta_0)}(F^*). For large enough β_0, the approximation error becomes relatively small, since \mathbb{P}_{\pi_{\text{Loop}}(\beta_0)}(F^*) ∝ |w(F^*)|^{\beta_0} dominates the distribution. In combination, this provides the desired approximation for Z_{Loop}. The result is stated formally in Algorithm 3.

Figure 1: Plots of the log-partition function approximation error with respect to (average) interaction strength: (a) Ising model with no external field, (b) Ising model with external fields, and (c) hard-core model. 
Each point is averaged over 20 (random) models.\n\n4.3 Experimental results\n\nIn this section, we report experimental results for computing partition function of the Ising model\nand the hard-core model. We compare Algorithm 2 in Section 3 (coined MCMC-BP-2reg) and\nAlgorithm 3 in Section 4.2 (coined MCMC-BP-whole), with the bare Bethe approximation (coined\nBP) and the popular Gibbs-sampler (coined MCMC-Gibbs). To make the comparison fair, we use the\nsame annealing scheme for all MCMC schemes, thus making their running times comparable. More\nspeci\ufb01cally, we generate each sample after running T1 = 1, 000 iterations of an MC and take s1 = 100\nsamples to compute each estimation (e.g., Hi) at intermediate steps. For performance measure, we\nuse the log-partition function approximation error de\ufb01ned as | log Z \u2212 log Zapprox|/| log Z|, where\nZapprox is the output of the respective algorithm. We conducted 3 experiments on the 4 \u00d7 4 grid\ngraph. In our \ufb01rst experimental setting, we consider the Ising model with varying interaction strength\nand no external (magnetic) \ufb01eld. To prepare the model of interest, we start from the Ising model\nwith uniform (ferromagnetic/attractive and anti-ferromagnetic/repulsive) interaction strength and\nthen add \u2018glassy\u2019 variability in the interaction strength modeled via i.i.d Gaussian random variables\nwith mean 0 and variance 0.52, i.e. N (0, 0.52). In other words, given average interaction strength\n0.3, each interaction strength in the model is independently chosen as N (0.3, 0.52). The second\nexperiment was conducted by adding N (0, 0.52) corrections to the external \ufb01elds under the same\ncondition as in the \ufb01rst experiment. In this case we observe that BP often fails to converge, and use\nthe Concave Convex Procedure (CCCP) [23] for \ufb01nding BP \ufb01xed points. 
Finally, we experiment with the hard-core model on the 4 × 4 grid graph with a varying positive parameter λ > 0, called the 'fugacity' [26]. As seen clearly in Figure 1, BP and MCMC-Gibbs are outperformed by MCMC-BP-2reg or MCMC-BP-whole in most tested regimes of the first experiment with no external field, in which case the 2-regular loop series (LS) is equal to the full one. Even in the regimes where MCMC-Gibbs outperforms BP, our schemes correct the error of BP and perform at least as well as MCMC-Gibbs. In the experiments, we observe that the advantage of our schemes over BP is more pronounced when the error of BP is large. A theoretical reasoning behind this observation is as follows: if the performance of BP is good, i.e., the loop series (LS) is close to 1, the contribution of the empty generalized loop, i.e., w(∅), is significant in the LS, and it becomes harder to sample the other generalized loops accurately.

5 Conclusion

In this paper, we propose new MCMC schemes for approximate inference in GMs. The main novelty of our approach is in designing BP-aware MCs utilizing the non-trivial BP solutions. In experiments, our BP-based MCMC scheme also outperforms other alternatives. We anticipate that this new technique will be of interest to many applications where GMs are used for statistical reasoning.

Acknowledgement

This work was supported by the National Research Council of Science & Technology (NST) grant by the Korea government (MSIP) (No. CRC-15-05-ETRI), and funding from the U.S. Department of Energy's Office of Electricity as part of the DOE Grid Modernization Initiative.

References

[1] J. Pearl, "Probabilistic reasoning in intelligent systems: networks of plausible inference," Morgan Kaufmann, 2014.

[2] R. G. Gallager, "Low-density parity-check codes," Information Theory, IRE Transactions 8(1): 21-28, 1962.

[3] R. F. Kschischang, and J. F. 
Brendan, \u201cIterative decoding of compound codes by probability propagation in\n\ngraphical models,\u201d Selected Areas in Communications, IEEE Journal 16(2): 219-230, 1998.\n\n[4] M. I. Jordan, ed. \u201cLearning in graphical models,\u201d Springer Science & Business Media 89, 1998.\n[5] R.J. Baxter, \u201cExactly solved models in statistical mechanics,\u201d Courier Corporation, 2007.\n[6] W.T. Freeman, C.P. Egon, and T.C. Owen, \u201cLearning low-level vision.\u201d International journal of computer\n\nvision 40(1): 25-47, 2000.\n\n[7] V. Chandrasekaran, S. Nathan, and H. Prahladh, \u201cComplexity of Inference in Graphical Models,\u201d Association\n\nfor Uncertainty and Arti\ufb01cial Intelligence, 2008\n\n[8] M. Jerrum, and A. Sinclair, \u201cPolynomial-time approximation algorithms for the Ising model,\u201d SIAM Journal\n\non computing 22(5): 1087-1116, 1993.\n\n[9] C. Andrieu, N. Freitas, A. Doucet, and M. I. Jordan, \u201cAn introduction to MCMC for machine learning,\u201d\n\nMachine learning 50(1-2), 5-43, 2003.\n\n[10] J. Pearl, \u201cReverend Bayes on inference engines: A distributed hierarchical approach,\u201d Association for the\n\nAdvancement of Arti\ufb01cial Intelligence, 1982.\n\n[11] M. Chertkov, and V. Y. Chernyak, \u201cLoop series for discrete statistical models on graphs,\u201d Journal of\n\nStatistical Mechanics: Theory and Experiment 2006(6): P06009, 2006.\n\n[12] M. Chertkov, V. Y. Chernyak, and R. Teodorescu, \u201cBelief propagation and loop series on planar graphs,\u201d\n\nJournal of Statistical Mechanics: Theory and Experiment 2008(5): P05003, 2008.\n\n[13] V. Gomez, J. K. Hilbert, and M. Chertkov, \u201cApproximate inference on planar graphs using Loop Calculus\n\nand Belief Propagation,\u201d The Journal of Machine Learning Research, 11: 1273-1296, 2010.\n\n[14] P. W. Kasteleyn, \u201cThe statistics of dimers on a lattice,\u201d Classic Papers in Combinatorics. Birkh\u00e4user\n\nBoston, 281-298, 2009.\n\n[15] N. 
Prokof\u2019ev, and B. Svistunov, \u201cWorm algorithms for classical statistical models,\u201d Physical review letters\n\n87(16): 160601, 2001.\n\n[16] J.D. Horton, \u201cA polynomial-time algorithm to \ufb01nd the shortest cycle basis of a graph.\u201d SIAM Journal on\n\nComputing 16(2): 358-366, 1987. APA\n\n[17] H. A. Kramers, and G. H. Wannier, \u201cStatistics of the two-dimensional ferromagnet. Part II,\u201d Physical\n\nReview 60(3): 263, 1941.\n\n[18] A. Collevecchio, T. M. Garoni, T.Hyndman, and D. Tokarev, \u201cThe worm process for the Ising model is\n\nrapidly mixing,\u201d arXiv preprint arXiv:1509.03201, 2015.\n\n[19] S. Kirkpatrick, \u201cOptimization by simulated annealing: Quantitative studies.\u201d Journal of statistical physics\n\n34(5-6): 975-986, 1984.\n\n[20] R. Nicholas, \u201cThe Bethe partition function of log-supermodular graphical models,\u201d Advances in Neural\n\nInformation Processing Systems. 2012.\n\n[21] J. Bang, J., and G. Z. Gutin. \u201cDigraphs: theory, algorithms and applications.\u201d Springer Science & Business\n\nMedia, 2008.\n\n[22] Y. W. Teh and M. Welling, \u201cBelief optimization for binary networks: a stable alternative to loopy belief\npropagation,\u201d Proceedings of the Eighteenth conference on Uncertainty in arti\ufb01cial intelligence, 493-500,\n2001.\n\n[23] A. L. Yuille, \u201cCCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives\n\nto belief propagation,\u201d Neural Computation, 14(7): 1691-1722, 2002.\n\n[24] J. Shin, \u201cThe complexity of approximating a Bethe equilibrium,\u201d Information Theory, IEEE Transactions\n\non, 60(7): 3959-3969, 2014.\n\n[25] https://www.quora.com/Statistical-Mechanics-What-is-the-fermion-sign-problem\n[26] Dyer, M., Frieze, A., and Jerrum, M. \u201cOn counting independent sets in sparse graphs,\u201d SIAM Journal on\n\nComputing 31(5): 1527-1541, 2002.\n\n[27] J. 
Schweinsberg, \u201cAn O(n2) bound for the relaxation time of a Markov chain on cladograms.\u201d Random\n\nStructures & Algorithms 20(1): 59-70, 2002.\n\n9\n\n\f", "award": [], "sourceid": 819, "authors": [{"given_name": "Sung-Soo", "family_name": "Ahn", "institution": "KAIST"}, {"given_name": "Michael", "family_name": "Chertkov", "institution": "Los Alamos National Laboratory"}, {"given_name": "Jinwoo", "family_name": "Shin", "institution": "KAIST"}]}