{"title": "Fractional Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 438, "page_last": 445, "abstract": null, "full_text": "Fractional Belief Propagation\n\nWim Wiegerinck and Tom Heskes\n\nSNN, University of Nijmegen\n\nGeert Grooteplein 21, 6525 EZ, Nijmegen, the Netherlands\n\n{wimw,tom}@snn.kun.nl\n\nAbstract\n\nWe consider loopy belief propagation for approximate inference in probabilistic graphical models. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate free energies, which includes the Bethe free energy and the naive mean-field free energy as special cases. Using the linear response correction of the clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.\n\n1 Introduction\n\nProbabilistic graphical models are powerful tools for learning and reasoning in domains with uncertainty. Unfortunately, inference in large, complex graphical models is computationally intractable. Therefore, approximate inference methods are needed. Basically, one can distinguish between two types of methods: stochastic sampling methods and deterministic methods. One of the methods in the latter class is Pearl's loopy belief propagation [1]. This method has gained increasing interest since its successful application to turbo-codes. Until recently, a disadvantage of the method was its heuristic character and the absence of a convergence guarantee. Often the algorithm gives good solutions, but sometimes it fails to converge. However, Yedidia et al. [2] showed that the fixed points of loopy belief propagation are actually stationary points of the Bethe free energy from statistical physics. 
This not only gives the algorithm a firm theoretical basis, but it also solves the convergence problem through the existence of an objective function which can be minimized directly [3]. Belief propagation has been generalized in several directions. Minka's expectation propagation [4] is a generalization that makes the method applicable to Bayesian learning. Yedidia et al. [2] introduced the Kikuchi free energy in the graphical models community, which can be considered as a higher order truncation of a systematic expansion of the exact free energy using larger clusters. They also developed an associated generalized belief propagation algorithm. In this paper, we propose another direction which yields possibilities to improve upon loopy belief propagation, without resorting to larger clusters.\n\nThis paper is organized as follows. In section 2 we define the inference problem. In section 3 we briefly review approximate inference by loopy belief propagation and discuss an inherent limitation of this method. This motivates us to generalize upon loopy belief propagation. We do so by formulating a new class of approximate free energies in section 4. In section 5 we consider the fixed point equations and formulate the fractional belief propagation algorithm. In section 6 we use linear response estimates to tune the parameters in the method. Simulation results are presented in section 7. In section 8 we end with the conclusion.\n\n2 Inference in graphical models\n\nOur starting point is a probabilistic model on a set of discrete variables x = (x_1, ..., x_N) in a finite domain. The joint distribution P(x) is assumed to be proportional to a product of clique potentials,\n\nP(x) ∝ ∏_α ψ_α(x_α),   (1)\n\nwhere each α refers to a subset of the nodes in the model. A typical example that we will consider later in the paper is the Boltzmann machine with binary units (x_i = ±1),\n\nP(x) ∝ exp( ∑_{(ij)} w_ij x_i x_j + ∑_i θ_i x_i ),   (2)\n\nwhere the sum is over connected pairs (i, j). The right hand side can be viewed as a product of potentials ψ_ij(x_i, x_j) = exp( w_ij x_i x_j + θ_i x_i / n_i + θ_j x_j / n_j ), where n_i is the number of edges that contain node i. The typical task that we try to perform is to compute the marginal single node distributions P(x_i). Basically, the computation requires the summation over all remaining variables x ∖ x_i. In small networks, this summation can be performed explicitly. In large networks, the complexity of the computation depends on the underlying graphical structure of the model, and is exponential in the maximal clique size of the triangulated moralized graph [5]. This may lead to intractable models, even if the clusters x_α are small. When the model is intractable, one has to resort to approximate methods.\n\n3 Loopy belief propagation in Boltzmann machines\n\nA nowadays popular approximate method is loopy belief propagation. In this section, we briefly review this method. Next we discuss one of its inherent limitations, which motivates us to propose a possible way to overcome this limitation. For simplicity, we restrict this section to Boltzmann machines.\n\nThe goal is to compute pair marginals P(x_i, x_j) of connected nodes. Loopy belief propagation computes approximating pair marginals Q(x_i, x_j) by applying the belief propagation algorithm for trees to loopy graphs, i.e., it computes messages according to\n\nμ_{j→i}(x_i) ∝ ∑_{x_j} ψ_ij(x_i, x_j) ν_{j∖i}(x_j),   (3)\n\nin which ν_{j∖i}(x_j) = ∏_{k∈N(j)∖i} μ_{k→j}(x_j) is the product of the incoming messages to node j except the one from node i. If the procedure converges (which is not guaranteed in loopy graphs), the resulting approximating pair marginals are\n\nQ(x_i, x_j) ∝ ψ_ij(x_i, x_j) ν_{i∖j}(x_i) ν_{j∖i}(x_j).   (4)\n\nIn general, the exact pair marginals will be of the form\n\nP(x_i, x_j) ∝ ψ^eff_ij(x_i, x_j) f_i(x_i) f_j(x_j),   (5)\n\nwhich has an effective interaction ψ^eff_ij. In the case of a tree, ψ^eff_ij = ψ_ij. With loops in the graph, however, the loops will contribute to ψ^eff_ij, and the result will in general be different from ψ_ij. Written in the same form, the approximating pair marginals (4) read\n\nQ(x_i, x_j) ∝ ψ_ij(x_i, x_j) f_i(x_i) f_j(x_j).   (6)\n\nIf we compare (6) with (5), we see that loopy belief propagation assumes ψ^eff_ij = ψ_ij, ignoring contributions from loops.\n\nNow suppose we would know ψ^eff_ij in advance; then a better approximation could be expected if we could model approximate pair marginals of the form\n\nQ(x_i, x_j) ∝ ψ^eff_ij(x_i, x_j) ν_{i∖j}(x_i) ν_{j∖i}(x_j),   (7)\n\nwhere the ν's are to be determined by some propagation algorithm. In the next sections, we generalize upon the above idea and introduce fractional belief propagation as a family of loopy belief propagation-like algorithms parameterized by scale parameters c. The resulting approximating clique marginals will be of the form\n\nQ_α(x_α) ∝ ψ_α(x_α)^{1/c_α} ∏_{i∈K_α} f_i(x_i),   (8)\n\nwhere K_α is the set of nodes in clique α. The issue of how to set the parameters c is the subject of section 6.\n\n4 A family of approximate free energies\n\nThe new class of approximating methods will be formulated via a new class of approximating free energies. 
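Before turning to the free energies, the limitation just discussed can be illustrated numerically. The following sketch (our illustrative code, not part of the original paper; the graphs, weights, and function names are invented for the example) compares exact pair marginals, computed by brute-force enumeration, with the loopy belief propagation marginals of (3)-(4) on a small Boltzmann machine without thresholds: on a chain the two coincide, while on a cycle the loop contributes to the effective interaction and BP deviates.

```python
# Illustration (not from the paper): exact vs. loopy-BP pair marginals
# for a Boltzmann machine P(x) ~ exp(sum_{(ij)} w_ij x_i x_j), x_i in {-1,+1}.
import itertools
import math

def exact_pair_marginal(n, edges, w, i, j):
    """Brute-force P(x_i, x_j) by enumerating all 2^n joint states."""
    p = {(a, b): 0.0 for a in (-1, 1) for b in (-1, 1)}
    z = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        e = math.exp(sum(w[a, b] * x[a] * x[b] for a, b in edges))
        p[x[i], x[j]] += e
        z += e
    return {k: v / z for k, v in p.items()}

def bp_pair_marginal(n, edges, w, i, j, iters=200):
    """Standard loopy BP: m[(a, b)] is the message from node a to node b."""
    nbr = {a: set() for a in range(n)}
    for a, b in edges:
        nbr[a].add(b)
        nbr[b].add(a)
    psi = lambda a, b, xa, xb: math.exp(w.get((a, b), w.get((b, a), 0.0)) * xa * xb)
    m = {(a, b): {-1: 0.5, 1: 0.5} for a in range(n) for b in nbr[a]}
    for _ in range(iters):
        new = {}
        for a, b in m:  # message a -> b, a function of x_b
            msg = {xb: sum(psi(a, b, xa, xb) *
                           math.prod(m[c, a][xa] for c in nbr[a] if c != b)
                           for xa in (-1, 1))
                   for xb in (-1, 1)}
            s = msg[-1] + msg[1]
            new[a, b] = {k: v / s for k, v in msg.items()}
        m = new
    # Pair belief in the form of (4): potential times incoming messages.
    q = {(xi, xj): psi(i, j, xi, xj) *
         math.prod(m[c, i][xi] for c in nbr[i] if c != j) *
         math.prod(m[c, j][xj] for c in nbr[j] if c != i)
         for xi in (-1, 1) for xj in (-1, 1)}
    s = sum(q.values())
    return {k: v / s for k, v in q.items()}
```

In a ferromagnetic cycle without thresholds, BP leaves the messages uniform and effectively reports the single-edge correlation determined by ψ_ij alone, underestimating the loop-enhanced exact correlation; on a chain, BP is exact.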
The exact free energy of a model with clique potentials ψ_α is\n\nF(Q) = −∑_α ∑_x Q(x) log ψ_α(x_α) + ∑_x Q(x) log Q(x).   (9)\n\nIt is well known that the joint distribution P(x) can be recovered by minimization of the free energy under the normalization constraint ∑_x Q(x) = 1,\n\nP = argmin_Q F(Q),   with minimum F(P) = −log Z.   (10)\n\nThe idea is now to construct an approximate free energy in terms of clique and single node marginals Q_α, Q_i and compute its minimum; the minimizer is then interpreted as an approximation of the corresponding marginals of P. A popular approximate free energy is based on the Bethe assumption, which basically states that Q(x) is approximately tree-like,\n\nQ(x) = ∏_α Q_α(x_α) ∏_i Q_i(x_i)^{1 − n_i},   (11)\n\nin which n_i is the number of cliques α that contain node i. This assumption is exact if the factor graph [6] of the model is a tree. Substitution of the tree assumption into the free energy leads to the well-known Bethe free energy\n\nF_Bethe = ∑_α ∑_{x_α} Q_α(x_α) log[ Q_α(x_α) / ψ_α(x_α) ] + ∑_i (1 − n_i) ∑_{x_i} Q_i(x_i) log Q_i(x_i),   (12)\n\nwhich is to be minimized under the normalization constraints ∑_{x_α} Q_α(x_α) = 1 and the marginalization constraints ∑_{x_α∖x_i} Q_α(x_α) = Q_i(x_i) for i ∈ K_α. It can be shown that minima of the Bethe free energy are fixed points of the loopy belief propagation algorithm [2].\n\nIn our proposal, we generalize upon the Bethe assumption, and make the parameterized assumption\n\nQ(x) = ∏_α Q_α(x_α)^{c_α} ∏_i Q_i(x_i)^{1 − c̄_i},   with c̄_i = ∑_{α∋i} c_α,   (13)\n\nin which the c_α are clique-dependent scale parameters. The intuition behind this assumption is that we replace each factor Q_α(x_α) in the tree assumption (11) by a scaled factor Q_α(x_α)^{c_α}. The term with single node marginals is constructed to deal with overcounted terms. Substitution of (13) into the free energy leads to the approximate free energy\n\nF_c = ∑_α ∑_{x_α} Q_α(x_α) [ c_α log Q_α(x_α) − log ψ_α(x_α) ] + ∑_i (1 − c̄_i) ∑_{x_i} Q_i(x_i) log Q_i(x_i),   (14)\n\nwhich is parameterized by the vector c of scale parameters. This class of free energies trivially contains the Bethe free energy (all c_α = 1). In addition, it includes the variational mean-field free energy, conventionally written as F_MF = −∑_α ∑_{x_α} [∏_{i∈K_α} Q_i(x_i)] log ψ_α(x_α) + ∑_i ∑_{x_i} Q_i(x_i) log Q_i(x_i), as a limiting case for c_α → ∞ (implying an effective interaction of strength zero). If this limit is taken in (14), terms linear in c_α will dominate and act as a penalty term for non-factorial entropies. Consequently, the distributions will be constrained to be completely factorized, Q_α(x_α) = ∏_{i∈K_α} Q_i(x_i). Under these constraints, the remaining terms reduce to the conventional representation of F_MF. Thirdly, it contains the recently derived free energy to upper bound the log partition function [7]. 
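The special cases above can be checked numerically for a single pairwise clique. The following sketch (our illustration, with invented function names and toy numbers) evaluates the one-clique version of (14): at c = 1 it is the Bethe free energy, which for this two-node tree equals −log Z when evaluated at the exact marginals, while at factorized marginals its value is independent of c and reproduces the mean-field free energy, an upper bound on −log Z.

```python
# Single pairwise clique (x1, x2): F_c of (14), one-clique version.
import math

def free_energy_c(psi, q12, q1, q2, c):
    """F_c = sum q12 (c log q12 - log psi) + (1-c) sum_i sum q_i log q_i."""
    f = sum(q12[x] * (c * math.log(q12[x]) - math.log(psi[x])) for x in q12)
    f += (1.0 - c) * sum(p * math.log(p) for p in q1.values())
    f += (1.0 - c) * sum(p * math.log(p) for p in q2.values())
    return f

# A small asymmetric clique potential (illustrative numbers only).
w, t1, t2 = 0.3, 0.2, -0.1
psi = {(a, b): math.exp(w * a * b + t1 * a + t2 * b)
       for a in (-1, 1) for b in (-1, 1)}
z = sum(psi.values())
q12 = {k: v / z for k, v in psi.items()}            # exact pair marginal
q1 = {a: q12[a, -1] + q12[a, 1] for a in (-1, 1)}   # exact single marginals
q2 = {b: q12[-1, b] + q12[1, b] for b in (-1, 1)}
fac = {(a, b): q1[a] * q2[b] for a in (-1, 1) for b in (-1, 1)}
```

At c = 1 and the exact marginals the value is −log Z; at the factorized beliefs `fac` the entropy terms rearrange so that the c-dependence cancels, leaving the mean-field expression for every c.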
This one is recovered if, for pair-wise cliques, the c_ij are set to the edge appearance probabilities in the so-called spanning tree polytope of the graph. These requirements imply that 0 < c_ij ≤ 1.\n\n5 Fractional belief propagation\n\nIn this section we use the fixed point equations to generalize Pearl's algorithm to fractional belief propagation as a heuristic to minimize F_c. Here, we do not worry too much about guaranteed convergence. If convergence is a problem, one can always resort to direct minimization of F_c using, e.g., Yuille's CCCP algorithm [3]. If standard belief propagation converges, its solution is guaranteed to be a local minimum of F_Bethe [8]. We expect a similar situation for F_c.\n\nFixed point equations from F_c are derived in the same way as in [2]. For the pair-wise cliques of the Boltzmann machine we obtain\n\nQ(x_i, x_j) ∝ ψ_ij(x_i, x_j)^{1/c_ij} ν_{i∖j}(x_i) ν_{j∖i}(x_j),   (15)\n\nQ(x_i) ∝ ∏_{k∈N(i)} μ_{k→i}(x_i)^{c_ki},   (16)\n\nwith messages updated according to\n\nμ_{j→i}(x_i) ∝ ∑_{x_j} ψ_ij(x_i, x_j)^{1/c_ij} ν_{j∖i}(x_j),   where now ν_{j∖i}(x_j) = [ ∏_{k∈N(j)∖i} μ_{k→j}(x_j)^{c_kj} ] μ_{i→j}(x_j)^{c_ij − 1},   (17)\n\nand we notice that Q(x_i, x_j) in (15) has indeed the desired functional dependency on ψ_ij^{1/c_ij}, as in (8); marginalizing (15) over x_j consistently recovers (16). Inspired by Pearl's loopy belief propagation algorithm, we use the above equations to formulate fractional belief propagation BP(c) (see Algorithm 1). BP(1), i.e., BP(c) with all c_ij = 1, is equivalent to standard loopy belief propagation.\n\nAlgorithm 1 Fractional Belief Propagation\n1: initialize(μ, Q)\n2: repeat\n3: for all α do\n4: update Q_α according to (15).\n5: update μ_{α→i}, i ∈ K_α, according to (17) using the new Q_α and the old Q_i.\n6: update Q_i, i ∈ K_α, by marginalization of Q_α.\n7: end for\n8: until convergence criterion is met (or maximum number of iterations is exceeded)\n9: return Q (or failure)\n\n
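A minimal pairwise realization of Algorithm 1 can be sketched as follows (our illustrative code with invented names, using the pairwise updates (15)-(17); with all c_ij = 1 it reduces to standard loopy belief propagation, which is exact on trees).

```python
# Sketch of BP(c) for a pairwise Boltzmann machine
# P(x) ~ exp(sum w_ij x_i x_j + sum theta_i x_i), x_i in {-1,+1}.
import itertools
import math

def fractional_bp(n, edges, w, th, c, iters=2000, tol=1e-12):
    """Single-node beliefs from messages
    mu[j->i](x_i) ~ sum_{x_j} psi_ij(x_i,x_j)**(1/c_ij) * nu[j-excl-i](x_j)."""
    nbr = {a: set() for a in range(n)}
    for a, b in edges:
        nbr[a].add(b)
        nbr[b].add(a)
    cij = lambda a, b: c.get((a, b), c.get((b, a)))
    wij = lambda a, b: w.get((a, b), w.get((b, a)))
    mu = {(a, b): {-1: 0.5, 1: 0.5} for a in range(n) for b in nbr[a]}
    for _ in range(iters):
        new, delta = {}, 0.0
        for j, i in mu:  # message j -> i, a function of x_i
            msg = {}
            for xi in (-1, 1):
                tot = 0.0
                for xj in (-1, 1):
                    nu = math.exp(th[j] * xj)
                    nu *= math.prod(mu[k, j][xj] ** cij(k, j)
                                    for k in nbr[j] if k != i)
                    nu *= mu[i, j][xj] ** (cij(i, j) - 1.0)
                    tot += math.exp(wij(i, j) * xi * xj / cij(i, j)) * nu
                msg[xi] = tot
            s = msg[-1] + msg[1]
            new[j, i] = {xv: mv / s for xv, mv in msg.items()}
            delta = max(delta, abs(new[j, i][1] - mu[j, i][1]))
        mu = new
        if delta < tol:
            break
    out = {}
    for a in range(n):
        b = {xa: math.exp(th[a] * xa) *
             math.prod(mu[k, a][xa] ** cij(k, a) for k in nbr[a])
             for xa in (-1, 1)}
        s = b[-1] + b[1]
        out[a] = {k: v / s for k, v in b.items()}
    return out

def exact_marginal(n, edges, w, th, i):
    """Brute-force single-node marginal for comparison."""
    p = {-1: 0.0, 1: 0.0}
    for x in itertools.product((-1, 1), repeat=n):
        p[x[i]] += math.exp(sum(w[a, b] * x[a] * x[b] for a, b in edges)
                            + sum(t * xv for t, xv in zip(th, x)))
    z = p[-1] + p[1]
    return {k: v / z for k, v in p.items()}
```

On a tree with c = 1 the returned beliefs match the exact marginals; for other choices of c the same loop runs with the exponents 1/c_ij and c_ij in place.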
As a theoretical footnote we mention a different (generally more greedy) BP(c)-like algorithm. This algorithm is similar to Algorithm 1, except that (1) the update of Q_α (in line 4) is to be taken with c_α = 1, as in standard belief propagation, and (2) the update of the marginals Q_i (in line 6) is to be performed by minimizing the divergence D^{1/c_α} between Q_α and its factorized counterpart, rather than by marginalization (which corresponds to minimizing D^1, which is equal to the usual KL divergence). Here\n\nD^α(p, q) = [ 1 / (α(1 − α)) ] ∑_x [ α p(x) + (1 − α) q(x) − p(x)^α q(x)^{1 − α} ],   (18)\n\nwith the limiting cases\n\nD^1(p, q) = ∑_x p(x) log[ p(x)/q(x) ],   D^0(p, q) = ∑_x q(x) log[ q(x)/p(x) ].   (19)\n\nThe D^α are known as the α-divergences [9]. The minimization of the Q_i using D^0 leads to the well-known mean field equations, and the resulting algorithm has the same fixed points as the naive mean-field free energy.\n\n6 Tuning c using linear response theory\n\n
Now the question is: how do we set the parameters c? The idea is as follows. If we could have access to the true marginals P(x_i, x_j), we could optimize c by minimizing, for example,\n\nE(c) = ∑_{(ij)} KL( P(x_i, x_j) || Q(x_i, x_j; c) ) = ∑_{(ij)} ∑_{x_i, x_j} P(x_i, x_j) log[ P(x_i, x_j) / Q(x_i, x_j; c) ],   (20)\n\nin which we labeled Q by c to emphasize its dependency on the scale parameters. Unfortunately, we do not have access to the true pair marginals, but if we would have estimates that improve upon Q(x_i, x_j; c), we can compute new parameters c′ such that Q(x_i, x_j; c′) is closer to these estimates. However, with the new parameters the estimates will be changed as well, and this procedure should be iterated.\n\nIn this paper, we use linear response theory [10] to improve upon Q(x_i, x_j; c). For simplicity, we restrict ourselves to Boltzmann machines with binary units. Applying linear response theory to BP(c) in Boltzmann machines yields the following linear response estimates for the pair marginals,\n\nQ^LR(x_i, x_j) = Q(x_i) Q(x_j) + (x_i x_j / 4) ∂m_i/∂θ_j,   (21)\n\nwhere m_i = ∑_{x_i} x_i Q(x_i) is the single-node mean computed by BP(c). In [10], it is argued that if Q(x_i, x_j) is correct up to O(ε), the error in the linear response estimate is O(ε²).\n\nAlgorithm 2 Tuning c by linear response\n1: initialize (c = 1)\n2: repeat\n3: set step-size η\n4: compute the linear response estimates Q^LR as in (21).\n5: compute the gradient with respect to c as in (22).\n6: update c with a gradient descent step.\n7: until convergence criterion is met\n8: return c\n\n
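The quantity ∂m_i/∂θ_j entering (21) can be obtained numerically, as in Algorithm 2, by rerunning inference with perturbed thresholds. The following sketch (our illustration; names and numbers invented) performs this finite-difference computation on the exact distribution of a tiny Boltzmann machine, where the result must equal the covariance ⟨x_i x_j⟩ − m_i m_j; in Algorithm 2 the same trick is applied to the BP(c) means instead of the exact ones.

```python
# Fluctuation-dissipation check: dm_i/dtheta_j = <x_i x_j> - m_i m_j
# for P(x) ~ exp(sum w_ij x_i x_j + sum theta_i x_i), x_i in {-1,+1}.
import itertools
import math

def moments(n, edges, w, th):
    """Exact means m_i and pair correlations chi_ij by enumeration."""
    z, m, chi = 0.0, [0.0] * n, {e: 0.0 for e in edges}
    for x in itertools.product((-1, 1), repeat=n):
        e = math.exp(sum(w[a, b] * x[a] * x[b] for a, b in edges)
                     + sum(t * xv for t, xv in zip(th, x)))
        z += e
        for i in range(n):
            m[i] += e * x[i]
        for a, b in edges:
            chi[a, b] += e * x[a] * x[b]
    return [v / z for v in m], {k: v / z for k, v in chi.items()}

def dm_dtheta(n, edges, w, th, i, j, eps=1e-5):
    """Central finite difference of the mean m_i w.r.t. threshold theta_j."""
    up = list(th)
    up[j] += eps
    dn = list(th)
    dn[j] -= eps
    mu_up, _ = moments(n, edges, w, up)
    mu_dn, _ = moments(n, edges, w, dn)
    return (mu_up[i] - mu_dn[i]) / (2.0 * eps)
```

The identity also implies the symmetry ∂m_i/∂θ_j = ∂m_j/∂θ_i, which is a useful sanity check on any numerical linear response implementation.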
Linear response theory has been applied previously to improve upon pair marginals (or correlations) in the naive mean-field approximation [11] and in loopy belief propagation [12].\n\nTo iteratively compute new scale parameters from the linear response corrections we use a gradient descent like algorithm,\n\nc_ij ← c_ij − η ∂/∂c_ij ∑_{(kl)} KL( Q^LR(x_k, x_l) || Q(x_k, x_l; c) ),   (22)\n\nwith a time dependent step-size parameter η. By iteratively computing the linear response marginals and adapting the scale parameters in the gradient descent direction, we can optimize c; see Algorithm 2. Each linear response estimate ∂m_i/∂θ_j, required for (21), can be computed numerically by rerunning BP(c) with slightly perturbed thresholds. Partial derivatives with respect to c_ij, required for the gradient in (22), can be computed numerically by rerunning fractional belief propagation with slightly perturbed scale parameters. In this procedure the computational cost of one update of c is of the order of the number of nodes plus the number of edges, times the cost of a single BP(c) run.\n\n7 Numerical results\n\nWe applied the method to a Boltzmann machine in which the nodes are connected according to a square grid with periodic boundary conditions. The weights in the model were drawn from a binary distribution, w_ij = ±w with equal probability, and the thresholds θ_i were drawn at random. We generated a number of such networks, and compared results of standard loopy belief propagation to results obtained by fractional belief propagation where the scale parameters were obtained by Algorithm 2. The iterations of Algorithm 2 were stopped if the maximum change in c_ij between successive updates was sufficiently small, or if a maximum number of iterations was exceeded. The (fractional) belief propagations were run with a convergence criterion on the maximal difference between messages in successive iterations (one iteration is one cycle over all weights). In our experiment, all (fractional) belief propagation runs converged. The number of updates of c ranged between 20 and 80. After optimization we found (inverse) scale parameters spread over a considerable range.\n\nResults are plotted in figure 1. In the left panel, it can be seen that the procedure can lead to significant improvements. In these experiments, the solutions obtained by optimized BP(c) are consistently 10 to 100 times better in averaged KL divergence than the ones obtained by BP(1).\n\nFigure 1: Left: Scatter plots of averaged KL divergence between exact and approximated pair marginals obtained by the optimized fractional belief propagation (BP(c)) versus the ones obtained by standard belief propagation (BP(1)). Each point in the plot is the result of one instantiation of the network. Right: approximated single-node means for BP(1) and optimized BP(c) against the exact single-node means. This plot is for the network where BP(1) had the worst performance (i.e., corresponding to the point in the left panel with the highest averaged KL divergence). 
The averaged KL divergence between exact and approximated pair marginals, used in figure 1, is defined as\n\n⟨ KL(P_ij || Q_ij) ⟩ = (1/|E|) ∑_{(ij)} ∑_{x_i, x_j} P(x_i, x_j) log[ P(x_i, x_j) / Q(x_i, x_j) ],   (23)\n\nwhere |E| is the number of connected pairs. In the right panel, approximations of single-node means are plotted for the case where BP(1) had the worst performance. Here we see that the procedure can lead to quite precise estimates of the means, even if the quality of the solutions obtained by BP(1) is very poor. It should be noticed that the linear response correction does not alter the estimated means [12]. In other words, the improvement in quality of the means is a result of the optimized c, and not of the linear response correction.\n\n8 Conclusions\n\nIn this paper, we introduced fractional belief propagation as a family of approximate inference methods that generalize upon loopy belief propagation without resorting to larger clusters. The approximations are parameterized by scale parameters c, which are motivated to better model the effective interactions due to the effect of loops in the graph. The approximations are formulated in terms of approximating free energies. This family of approximating free energies includes as special cases the Bethe free energy, the mean-field free energy, and also the free energy approximation that provides an upper bound on the log partition function, developed in [7].\n\nIn order to apply fractional belief propagation, the scale parameters have to be tuned. In this paper, we demonstrated in toy problems for Boltzmann machines that it is possible to tune the scale parameters using linear response theory. Results show that considerable improvements can be obtained, even if standard loopy belief propagation is of poor quality. In principle, the method is applicable to larger and more general graphical models. 
However, how to make the tuning of scale parameters practically feasible in such models is still to be explored.\n\nAcknowledgements\n\nWe thank Bert Kappen for helpful comments and the Dutch Technology Foundation STW for support.\n\nReferences\n\n[1] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., 1988.\n\n[2] J. Yedidia, W. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS 13.\n\n[3] A. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, July 2002.\n\n[4] T. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, MIT Media Lab, 2001.\n\n[5] S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society B, 50:154–227, 1988.\n\n[6] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.\n\n[7] M. Wainwright, T. Jaakkola, and A. Willsky. A new class of upper bounds on the log partition function. In UAI-2002, pages 536–543.\n\n[8] T. Heskes. Stable fixed points of loopy belief propagation are minima of the Bethe free energy. In NIPS 15.\n\n[9] S. Amari, S. Ikeda, and H. Shimokawa. Information geometry of α-projection in mean field approximation. In M. Opper and D. Saad, editors, Advanced Mean Field Methods, pages 241–258, Cambridge, MA, 2001. MIT Press.\n\n[10] G. Parisi. Statistical Field Theory. Addison-Wesley, Redwood City, CA, 1988.\n\n[11] H.J. Kappen and F.B. Rodríguez. Efficient learning in Boltzmann machines using linear response theory. 
Neural Computation, 10:1137–1156, 1998.\n\n[12] M. Welling and Y.W. Teh. Propagation rules for linear response estimates of joint pairwise probabilities. 2002. Submitted.\n", "award": [], "sourceid": 2339, "authors": [{"given_name": "Wim", "family_name": "Wiegerinck", "institution": null}, {"given_name": "Tom", "family_name": "Heskes", "institution": null}]}