{"title": "MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 873, "page_last": 880, "abstract": null, "full_text": "MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation\n\nAnand Rangarajan\nDept. of Computer and Information Science and Engineering\nUniversity of Florida\nGainesville, FL 32611-6120, US\nanand@cise.ufl.edu\n\nAlan L. Yuille\nSmith-Kettlewell Eye Research Institute\n2318 Fillmore St.\nSan Francisco, CA 94115, US\nyuille@ski.org\n\nAbstract\n\nBayesian belief propagation in graphical models has recently been shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille\u2019s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. 
Preliminary computer simulations are in agreement with this theoretical development.\n\n1 Introduction\n\nIn graphical models, Bayesian belief propagation (BBP) algorithms often (but not always) yield reasonable estimates of the marginal probabilities at each node [6]. Recently, Yedidia et al. [7] demonstrated an intriguing connection between BBP and certain inference methods based in statistical physics. Essentially, they demonstrated that traditional BBP algorithms can be shown to arise from approximations of the extrema of the Bethe and Kikuchi free energies. Next, Yuille [8] derived new double-loop algorithms which are guaranteed to minimize the Bethe and Kikuchi energy functions while continuing to have close ties to the original BBP algorithms. Yuille\u2019s approach relies on a certain decomposition of the Bethe and Kikuchi free energies. In the present work, we begin with a new principle, pairwise mutual information minimization and marginal entropy maximization (MIME), and derive a new energy function which is shown to be equivalent to the Bethe free energy. After demonstrating this connection, we derive a family of free energies closely related to the MIME principle which are also shown to be equivalent, when constraint satisfaction is exact, to the Bethe free energy. For each member in this family of energy functions, we derive a new algorithm that is guaranteed to converge to a local minimum. Moreover, the resulting form of the algorithm is very simple despite the somewhat unwieldy nature of the algebraic development. Preliminary comparisons of the new algorithm with BBP were carried out on spin glass-like problems and indicate that the new algorithm is convergent when BBP is not. 
However, the effectiveness of the new algorithms remains to be seen.\n\n2 Bethe free energy and the MIME principle\n\nIn this section, we show that the Bethe free energy can be interpreted as pairwise mutual information minimization and marginal entropy maximization.\n\nThe Bethe free energy for Bayesian belief propagation is written as\n\n$$\\begin{aligned}\nF_{\\mathrm{Bethe}}(\\{p_{ij}, p_i, \\gamma_{ij}, \\lambda_{ij}\\}) ={}& \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\frac{p_{ij}(x_i, x_j)}{\\phi_{ij}(x_i, x_j)} - \\sum_i (n_i - 1) \\sum_{x_i} p_i(x_i) \\log \\frac{p_i(x_i)}{\\psi_i(x_i)} \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_j} \\lambda_{ij}(x_j) \\Big[ \\sum_{x_i} p_{ij}(x_i, x_j) - p_j(x_j) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_i} \\lambda_{ji}(x_i) \\Big[ \\sum_{x_j} p_{ij}(x_i, x_j) - p_i(x_i) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\gamma_{ij} \\Big( \\sum_{x_i, x_j} p_{ij}(x_i, x_j) - 1 \\Big)\n\\end{aligned} \\tag{1}$$\n\nwhere $\\phi_{ij}(x_i, x_j) \\stackrel{\\mathrm{def}}{=} \\psi_{ij}(x_i, x_j) \\psi_i(x_i) \\psi_j(x_j)$ and $n_i$ is the number of neighbors of node $i$. Link functions $\\psi_{ij} > 0$ are available relational data between nodes $i$ and $j$. The singleton function $\\psi_i$ is also available at each node $i$. The double summation $\\sum_{ij:i>j}$ is carried out only over the nodes that are connected. The Lagrange parameters $\\{\\lambda_{ij}, \\gamma_{ij}\\}$ are needed in the Bethe free energy (1) to satisfy the following constraints relating the joint probabilities $\\{p_{ij}\\}$ with the marginals $\\{p_i\\}$:\n\n$$\\sum_{x_j} p_{ij}(x_i, x_j) = p_i(x_i), \\quad \\sum_{x_i} p_{ij}(x_i, x_j) = p_j(x_j), \\quad \\text{and} \\quad \\sum_{x_i, x_j} p_{ij}(x_i, x_j) = 1. \\tag{2}$$\n\nThe pairwise mutual information is defined as\n\n$$MI_{ij} = \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\frac{p_{ij}(x_i, x_j)}{p_i(x_i) p_j(x_j)}. \\tag{3}$$\n\nThe mutual information is minimized when the joint probability $p_{ij}(x_i, x_j) = p_i(x_i) p_j(x_j)$ or equivalently when nodes $i$ and $j$ are independent. 
When nodes $i$ and $j$ are connected via a non-separable link $\\psi_{ij}(x_i, x_j)$ they will not be independent. We now state the MIME principle.\n\nStatement of the MIME principle: Maximize the marginal entropy and minimize the pairwise mutual information using the available marginal and pairwise link function expectations while satisfying the joint probability constraints.\n\nThe pairwise MIME principle leads to the following free energy:\n\n$$\\begin{aligned}\nF_{\\mathrm{MIME}}(\\{p_{ij}, p_i, \\gamma_{ij}, \\lambda_{ij}\\}) ={}& \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\frac{p_{ij}(x_i, x_j)}{p_i(x_i) p_j(x_j)} + \\sum_i \\sum_{x_i} p_i(x_i) \\log p_i(x_i) \\\\\n&- \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\psi_{ij}(x_i, x_j) - \\sum_i \\sum_{x_i} p_i(x_i) \\log \\psi_i(x_i) \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_j} \\lambda_{ij}(x_j) \\Big[ \\sum_{x_i} p_{ij}(x_i, x_j) - p_j(x_j) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_i} \\lambda_{ji}(x_i) \\Big[ \\sum_{x_j} p_{ij}(x_i, x_j) - p_i(x_i) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\gamma_{ij} \\Big( \\sum_{x_i, x_j} p_{ij}(x_i, x_j) - 1 \\Big).\n\\end{aligned} \\tag{4}$$\n\nIn the above free energy, we minimize the pairwise mutual information and maximize the marginal entropies. The singleton and pairwise link functions are additional information which does not allow the system to reach its \"natural\" equilibrium, a uniform i.i.d. distribution on the nodes. The Lagrange parameters enforce the constraints between the pairwise and marginal probabilities. These constraints are the same as in the Bethe free energy (1). Note that the Lagrange parameter terms vanish if the constraints in (2) are exactly satisfied. 
This is an important point when considering equivalences between different energy functions.\n\nLemma 1 Provided the constraints in (2) are exactly satisfied, the MIME free energy in (4) is equivalent to the Bethe free energy in (1).\n\nProof: Using the fact that constraint satisfaction is exact and using the identity $p_{ij}(x_i, x_j) = p_{ji}(x_j, x_i)$, we may write\n\n$$\\begin{aligned}\n-\\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log [p_i(x_i) p_j(x_j)] &= -\\sum_{ij:i \\neq j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log p_i(x_i) = -\\sum_i n_i \\sum_{x_i} p_i(x_i) \\log p_i(x_i), \\\\\n\\text{and} \\quad \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log [\\psi_i(x_i) \\psi_j(x_j)] &= \\sum_i n_i \\sum_{x_i} p_i(x_i) \\log \\psi_i(x_i).\n\\end{aligned} \\tag{5}$$\n\nWe have shown that a marginal entropy term emerges from the mutual information term in (4) when constraint satisfaction is exact. Collecting the marginal entropy terms together and rearranging the MIME free energy in (4), we get the Bethe free energy in (1).\n\n3 A family of decompositions of the Bethe free energy\n\nRecall that the Bethe free energy and the energy function resulting from application of the MIME principle were shown to be equivalent. However, the MIME energy function is merely one particular decomposition of the Bethe free energy. As Yuille mentions [8], many decompositions are possible. The main motivation for considering alternative decompositions is for algorithmic reasons. We believe that certain decompositions may be more effective than others. This belief is based on our previous experience with closely related deterministic annealing algorithms [3, 2]. In this section, we derive a family of free energies that are equivalent to the Bethe free energy provided constraint satisfaction is exact. 
The family of free energies is inspired by and closely related to the MIME free energy in (4).\n\nLemma 2 The following family of energy functions indexed by the free parameters $\\delta > 0$ and $\\{\\xi_i\\}$ is equivalent to the original Bethe free energy (1) provided the constraints in (2) are exactly satisfied and the parameters $q$ and $r$ are set to $\\{q_i = (1 - \\delta) n_i\\}$ and $\\{r_i = 1 - n_i \\xi_i\\}$ respectively.\n\n$$\\begin{aligned}\nF_{\\mathrm{equiv}}(\\{p_{ij}, p_i, \\gamma_{ij}, \\lambda_{ij}\\}) ={}& \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\frac{p_{ij}(x_i, x_j)}{\\big[\\sum_{x_j} p_{ij}(x_i, x_j)\\big]^{\\delta} \\big[\\sum_{x_i} p_{ij}(x_i, x_j)\\big]^{\\delta}} \\\\\n&+ \\sum_i \\sum_{x_i} p_i(x_i) \\log p_i(x_i) - \\sum_i q_i \\sum_{x_i} p_i(x_i) \\log p_i(x_i) \\\\\n&- \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\big[ \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) \\big] - \\sum_i r_i \\sum_{x_i} p_i(x_i) \\log \\psi_i(x_i) \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_j} \\lambda_{ij}(x_j) \\Big[ \\sum_{x_i} p_{ij}(x_i, x_j) - p_j(x_j) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_i} \\lambda_{ji}(x_i) \\Big[ \\sum_{x_j} p_{ij}(x_i, x_j) - p_i(x_i) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\gamma_{ij} \\Big( \\sum_{x_i, x_j} p_{ij}(x_i, x_j) - 1 \\Big).\n\\end{aligned} \\tag{6}$$\n\nIn (6), the first term is no longer the pairwise mutual information as in (4). And unlike (4), $p_i(x_i)$ no longer appears in the pairwise mutual information-like term.\n\nProof: We selectively substitute $\\sum_{x_i} p_{ij}(x_i, x_j) = p_j(x_j)$ and $\\sum_{x_j} p_{ij}(x_i, x_j) = p_i(x_i)$ to show the equivalence. First,\n\n$$\\begin{aligned}\n\\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\Big( \\big[\\sum_{x_j} p_{ij}(x_i, x_j)\\big]^{\\delta} \\big[\\sum_{x_i} p_{ij}(x_i, x_j)\\big]^{\\delta} \\Big) &= \\delta \\sum_i n_i \\sum_{x_i} p_i(x_i) \\log p_i(x_i), \\\\\n\\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\big[ \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) \\big] &= \\sum_i n_i \\xi_i \\sum_{x_i} p_i(x_i) \\log \\psi_i(x_i).\n\\end{aligned} \\tag{7}$$\n\nSubstituting the identities in (7) into (6), we see that the free energies are algebraically equivalent.\n\n4 A family of algorithms for belief propagation\n\nWe now derive descent algorithms for the family of energy functions in (6). All the algorithms are guaranteed to converge to a local minimum of (6) under mild assumptions regarding the number of fixed points. 
For each member in the family of energy functions, there is a corresponding descent algorithm. Since the form of the free energy in (6) is complex and precludes easy minimization, we use algebraic (Legendre) transformations [1] to simplify the optimization.\n\n$$\\begin{aligned}\n-\\sum_{x_j} p_{ij}(x_i, x_j) \\log \\sum_{x_j} p_{ij}(x_i, x_j) &= \\min_{\\sigma_{ji}(x_i)} \\Big[ -\\sum_{x_j} p_{ij}(x_i, x_j) \\log \\sigma_{ji}(x_i) + \\sigma_{ji}(x_i) - \\sum_{x_j} p_{ij}(x_i, x_j) \\Big], \\\\\n-\\sum_{x_i} p_{ij}(x_i, x_j) \\log \\sum_{x_i} p_{ij}(x_i, x_j) &= \\min_{\\sigma_{ij}(x_j)} \\Big[ -\\sum_{x_i} p_{ij}(x_i, x_j) \\log \\sigma_{ij}(x_j) + \\sigma_{ij}(x_j) - \\sum_{x_i} p_{ij}(x_i, x_j) \\Big], \\\\\n-p_i(x_i) \\log p_i(x_i) &= \\min_{\\rho_i(x_i)} \\Big[ -p_i(x_i) \\log \\rho_i(x_i) + \\rho_i(x_i) - p_i(x_i) \\Big].\n\\end{aligned} \\tag{8}$$\n\nWe now apply the above algebraic transforms. The new free energy is (after some algebraic manipulations)\n\n$$\\begin{aligned}\nF_{\\mathrm{equiv}}(\\{p_{ij}, p_i, \\sigma_{ij}, \\rho_i, \\gamma_{ij}, \\lambda_{ij}\\}) ={}& \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\frac{p_{ij}(x_i, x_j)}{\\sigma_{ji}^{\\delta}(x_i) \\sigma_{ij}^{\\delta}(x_j)} + \\delta \\sum_{ij:i \\neq j} \\sum_{x_j} \\sigma_{ij}(x_j) \\\\\n&+ \\sum_i \\sum_{x_i} p_i(x_i) \\log \\frac{p_i(x_i)}{\\rho_i^{q_i}(x_i)} + \\sum_i q_i \\sum_{x_i} \\rho_i(x_i) \\\\\n&- \\sum_{ij:i>j} \\sum_{x_i, x_j} p_{ij}(x_i, x_j) \\log \\big[ \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) \\big] - \\sum_i r_i \\sum_{x_i} p_i(x_i) \\log \\psi_i(x_i) \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_j} \\lambda_{ij}(x_j) \\Big[ \\sum_{x_i} p_{ij}(x_i, x_j) - p_j(x_j) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\sum_{x_i} \\lambda_{ji}(x_i) \\Big[ \\sum_{x_j} p_{ij}(x_i, x_j) - p_i(x_i) \\Big] \\\\\n&+ \\sum_{ij:i>j} \\gamma_{ij} \\Big( \\sum_{x_i, x_j} p_{ij}(x_i, x_j) - 1 \\Big).\n\\end{aligned} \\tag{9}$$\n\nWe continue to keep the parameters $\\{q_i\\}$ and $\\{r_i\\}$ in (9). However, from Lemma 2, we know that the equivalence of (9) to the Bethe free energy is predicated upon appropriate setting of these parameters. In the rest of the paper, we continue to use $q$ and $r$ for the sake of notational simplicity.\n\nDespite the introduction of new variables via Legendre transforms, the optimization problem in (9) is still a minimization problem over all the variables. 
The algebraically transformed energy function in (9) is separately convex w.r.t. $\\{p_{ij}, p_i\\}$ and w.r.t. $\\{\\sigma_{ij}, \\rho_i\\}$ provided $\\delta \\in [0, 1]$. Since the overall energy function is not convex w.r.t. all the variables, we pursue an alternating algorithm strategy similar to the double loop algorithm in Yuille [8]. The basic idea is to separately minimize w.r.t. the variables $\\{\\sigma_{ij}, \\rho_i\\}$ and the variables $\\{p_{ij}, p_i\\}$. The linear constraints in (2) are enforced when minimizing w.r.t. the latter and do not affect the convergence properties of the algorithm since the energy function w.r.t. $\\{p_{ij}, p_i\\}$ is convex.\n\nWe evaluate the fixpoints of $\\{\\sigma_{ij}, \\rho_i\\}$. Note that (9) is convex w.r.t. $\\{\\sigma_{ij}, \\rho_i\\}$.\n\n$$\\sigma_{ij}(x_j) = \\sum_{x_i} p_{ij}(x_i, x_j), \\quad \\sigma_{ji}(x_i) = \\sum_{x_j} p_{ij}(x_i, x_j), \\quad \\text{and} \\quad \\rho_i(x_i) = p_i(x_i). \\tag{10}$$\n\nThe fixpoints of $\\{p_{ij}, p_i\\}$ are evaluated next. Note that (9) is convex w.r.t. $\\{p_{ij}, p_i\\}$.\n\n$$\\begin{aligned}\np_{ij}(x_i, x_j) &= \\sigma_{ji}^{\\delta}(x_i) \\sigma_{ij}^{\\delta}(x_j) \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) e^{-\\lambda_{ij}(x_j) - \\lambda_{ji}(x_i) - \\gamma_{ij} - 1}, \\\\\np_i(x_i) &= \\rho_i^{q_i}(x_i) \\psi_i^{r_i}(x_i) e^{\\sum_k \\lambda_{ki}(x_i) - 1}.\n\\end{aligned} \\tag{11}$$\n\nThe constraint satisfaction equations from (2) can be rewritten as\n\n$$\\sum_{x_j} p_{ij}(x_i, x_j) = p_i(x_i) \\;\\Rightarrow\\; e^{2\\lambda_{ji}(x_i)} = \\frac{\\sum_{x_j} \\sigma_{ji}^{\\delta}(x_i) \\sigma_{ij}^{\\delta}(x_j) \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) e^{-\\lambda_{ij}(x_j) - \\gamma_{ij} - 1}}{\\rho_i^{q_i}(x_i) \\psi_i^{r_i}(x_i) e^{\\sum_{k \\neq j} \\lambda_{ki}(x_i) - 1}}. \\tag{12}$$\n\nSimilar relations can be obtained for the other constraints in (2). Consider a Lagrange parameter update sequence where the Lagrange parameter currently being updated is tagged as \"new\" with the rest designated as \"old.\" We can then rewrite the Lagrange parameter updates using \"old\" and \"new\" values. 
Please note that each Lagrange parameter update corresponds to one of the constraints in (2). It can be shown that the iterative update of the Lagrange parameters is guaranteed to converge to the unique solution of (2) [8]. While rewriting (12), we multiply the left and right sides with $e^{-2\\lambda_{ji}^{\\mathrm{old}}(x_i)}$.\n\n$$e^{2\\lambda_{ji}^{\\mathrm{new}}(x_i) - 2\\lambda_{ji}^{\\mathrm{old}}(x_i)} = \\frac{\\sum_{x_j} \\sigma_{ji}^{\\delta}(x_i) \\sigma_{ij}^{\\delta}(x_j) \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j) e^{-\\lambda_{ij}^{\\mathrm{old}}(x_j) - \\lambda_{ji}^{\\mathrm{old}}(x_i) - \\gamma_{ij}^{\\mathrm{old}} - 1}}{\\rho_i^{q_i}(x_i) \\psi_i^{r_i}(x_i) e^{\\sum_k \\lambda_{ki}^{\\mathrm{old}}(x_i) - 1}}. \\tag{13}$$\n\nUsing (11), we relate each Lagrange parameter update with an update of $p_{ij}(x_i, x_j)$ and $p_i(x_i)$. We again invoke the \"old\" and \"new\" designations, this time on the probabilities. From (11), (12) and (13), we write the joint probability update\n\n$$\\frac{p_{ij}^{\\mathrm{new}}(x_i, x_j)}{p_{ij}^{\\mathrm{old}}(x_i, x_j)} = e^{-\\lambda_{ji}^{\\mathrm{new}}(x_i) + \\lambda_{ji}^{\\mathrm{old}}(x_i)} = \\sqrt{\\frac{p_i^{\\mathrm{old}}(x_i)}{\\sum_{x_j} p_{ij}^{\\mathrm{old}}(x_i, x_j)}} \\tag{14}$$\n\nand for the marginal probability update\n\n$$\\frac{p_i^{\\mathrm{new}}(x_i)}{p_i^{\\mathrm{old}}(x_i)} = e^{\\lambda_{ji}^{\\mathrm{new}}(x_i) - \\lambda_{ji}^{\\mathrm{old}}(x_i)} = \\sqrt{\\frac{\\sum_{x_j} p_{ij}^{\\mathrm{old}}(x_i, x_j)}{p_i^{\\mathrm{old}}(x_i)}}. \\tag{15}$$\n\nFrom (14) and (15), the update equations for the probabilities are\n\n$$p_{ij}^{\\mathrm{new}}(x_i, x_j) = p_{ij}^{\\mathrm{old}}(x_i, x_j) \\sqrt{\\frac{p_i^{\\mathrm{old}}(x_i)}{\\sum_{x_j} p_{ij}^{\\mathrm{old}}(x_i, x_j)}}, \\quad p_i^{\\mathrm{new}}(x_i) = \\sqrt{p_i^{\\mathrm{old}}(x_i) \\sum_{x_j} p_{ij}^{\\mathrm{old}}(x_i, x_j)}. \\tag{16}$$\n\nWith the probability updates in place, we may write down new algorithms minimizing the family of Bethe equivalent free energies using only probability updates. The update equations (16) can be seen to satisfy the first constraint in (2). Similar update equations can be derived for the other constraints in (2). 
For each Lagrange parameter update, an equivalent, simultaneous probability (joint and marginal) update can be derived similar to (16). The overall family of algorithms can be summarized as shown in the pseudocode. Despite the unwieldy algebraic development preceding it, the algorithm is very simple and straightforward.\n\nSet free parameters $\\delta \\in [0, 1]$ and $\\{\\xi_i\\}$.\nInitialize $\\{p_{ij}, p_i\\}$. Set $\\{q_i = (1 - \\delta) n_i\\}$ and $\\{r_i = 1 - n_i \\xi_i\\}$.\nBegin A: Outer Loop\n    $\\sigma_{ij}(x_j) \\leftarrow \\sum_{x_i} p_{ij}(x_i, x_j)$\n    $\\sigma_{ji}(x_i) \\leftarrow \\sum_{x_j} p_{ij}(x_i, x_j)$\n    $\\rho_i(x_i) \\leftarrow p_i(x_i)$\n    $p_{ij}(x_i, x_j) \\leftarrow \\sigma_{ji}^{\\delta}(x_i) \\sigma_{ij}^{\\delta}(x_j) \\psi_{ij}(x_i, x_j) \\psi_i^{\\xi_i}(x_i) \\psi_j^{\\xi_j}(x_j)$\n    $p_i(x_i) \\leftarrow \\rho_i^{q_i}(x_i) \\psi_i^{r_i}(x_i)$\n    Begin B: Inner Loop: Do B until $\\frac{1}{N} \\sum_{ij:i>j} \\big[ (\\sum_{x_j} p_{ij}(x_i, x_j) - p_i(x_i))^2 + (\\sum_{x_i} p_{ij}(x_i, x_j) - p_j(x_j))^2 \\big] < c_{\\mathrm{thr}}$\n        Simultaneously update $p_{ij}(x_i, x_j)$ and $p_i(x_i)$ below.\n            $p_{ij}(x_i, x_j) \\leftarrow p_{ij}(x_i, x_j) \\sqrt{\\frac{p_i(x_i)}{\\sum_{x_j} p_{ij}(x_i, x_j)}}$\n            $p_i(x_i) \\leftarrow \\sqrt{p_i(x_i) \\sum_{x_j} p_{ij}(x_i, x_j)}$\n        Simultaneously update $p_{ij}(x_i, x_j)$ and $p_j(x_j)$ below.\n            $p_{ij}(x_i, x_j) \\leftarrow p_{ij}(x_i, x_j) \\sqrt{\\frac{p_j(x_j)}{\\sum_{x_i} p_{ij}(x_i, x_j)}}$\n            $p_j(x_j) \\leftarrow \\sqrt{p_j(x_j) \\sum_{x_i} p_{ij}(x_i, x_j)}$\n        Normalize $p_{ij}(x_i, x_j)$.\n            $p_{ij}(x_i, x_j) \\leftarrow \\frac{p_{ij}(x_i, x_j)}{\\sum_{x_i, x_j} p_{ij}(x_i, x_j)}$\n    End B\nEnd A\n\nIn the above family of algorithms, the MIME algorithm corresponds to free parameter settings $\\delta = 1$ and $\\xi_i = 0$ which in turn lead to parameter settings $q_i = 0$ and $r_i = 1$. The Yuille [8] double loop algorithm corresponds to the free parameter settings $\\delta = 0$ and $\\xi_i = 0$ which in turn lead to parameter settings $q_i = n_i$ and $r_i = 1$. A crucial point is that the energy function for every valid parameter setting is equivalent to the Bethe free energy provided constraint satisfaction is exact. The inner loop constraint satisfaction threshold parameter $c_{\\mathrm{thr}}$ setting is very important in this regard. 
We are obviously not restricted to the MIME parameter settings. At this early stage of exploration of the inter-relationships between Bayesian belief propagation and inference methods based in statistical physics [7], it is premature to speculate regarding the \"best\" parameter settings for $\\delta$ and $\\{\\xi_i\\}$. Most likely, the effectiveness of the algorithms will vary depending on the problem setting which enters into the formulation via the link functions $\\{\\psi_{ij}\\}$ and the singleton functions $\\{\\psi_i\\}$.\n\n5 Results\n\nWe implemented the family of algorithms in C++ and conducted tests on locally connected 50 node graphs with binary state variables. The $\\psi_i(x_i)$ and $\\psi_{ij}(x_i, x_j)$ are of the form $e^{\\pm h_i}$ and $e^{\\pm h_{ij}}$ where $h_i$ and $h_{ij}$ are drawn from uniform distributions (in the interval $[-1, 1]$). Provided the constraint satisfaction threshold parameter $c_{\\mathrm{thr}}$ was set low enough, the algorithm (for $\\delta = 1$ and other parameter settings as described in Figure 1) exhibited monotonic convergence. Figure 2 shows the number of inner loop iterations corresponding to different settings of the constraint satisfaction threshold parameter. We also implemented the BBP algorithm and empirically observed that it often did not converge for these graphs. These results are quite preliminary and far more validation experiments are required. However, they provide a proof of concept for our approach.\n\n6 Conclusion\n\nWe began with the MIME principle and showed the equivalence of the MIME-based free energy to the Bethe free energy assuming constraint satisfaction to be exact. Then, we derived new decompositions of the Bethe free energy inspired by the MIME principle, and driven by our belief that certain decompositions may be more effective than others. We then derived a convergent algorithm for each member in the family of MIME-based decompositions. 
It remains to be seen if the MIME-based algorithms are efficient for a reasonable class of problems. While the MIME-based algorithms derived here use closed-form solutions in the constraint satisfaction inner loop, it may turn out that the inner loop is better handled using preconditioned gradient-based descent algorithms. It is also important to explore the inter-relationships between the convergent MIME-based descent algorithms and other recent related approaches with interesting convergence properties [4, 5].\n\nReferences\n\n[1] E. Mjolsness and C. Garrett. Algebraic transformations of objective functions. Neural Networks, 3:651-669, 1990.\n\n[2] A. Rangarajan. Self annealing and self annihilation: unifying deterministic annealing and relaxation labeling. Pattern Recognition, 33:635-649, 2000.\n\n[3] A. Rangarajan, S. Gold, and E. Mjolsness. A novel optimizing network architecture with applications. Neural Computation, 8(5):1041-1060, 1996.\n\n[4] Y. W. Teh and M. Welling. Passing and bouncing messages for generalized inference. Technical Report GCNU 2001-01, Gatsby Computational Neuroscience Unit, University College, London, 2001.\n\n[5] M. Wainwright, T. Jaakkola, and A. Willsky. Tree-based reparameterization framework for approximate estimation of stochastic processes on graphs with cycles. Technical Report LIDS P-2510, MIT, Cambridge, MA, 2001.\n\n[6] Y. Weiss. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12:1-41, 2000.\n\n[7] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Bethe free energy, Kikuchi approximations and belief propagation algorithms. In Advances in Neural Information Processing Systems 13, Cambridge, MA, 2001. MIT Press.\n\n[8] A. L. Yuille. A double loop algorithm to minimize the Bethe and Kikuchi free energies. Neural Computation, 2001. 
(submitted).\n\nFigure 1: MIME energy versus outer loop iteration: 50 node, local topology, $\\delta = 1$. Constraint satisfaction threshold parameter $c_{\\mathrm{thr}}$ was set to (a) $10^{-8}$ (b) $10^{-4}$ (c) $10^{-2}$. [Plots omitted; panels (a), (b), (c); x-axis: iteration; y-axis: MIME energy.]\n\nFigure 2: Inner loop iterations versus outer loop: 50 node, local topology, $\\delta = 1$. Constraint satisfaction threshold parameter $c_{\\mathrm{thr}}$ was set to (a) $10^{-8}$ (b) $10^{-4}$ (c) $10^{-2}$. [Plots omitted; panels (a), (b), (c); x-axis: outer loop iteration index; y-axis: total # of inner loop iterations.]\n", "award": [], "sourceid": 2009, "authors": [{"given_name": "Anand", "family_name": "Rangarajan", "institution": null}, {"given_name": "Alan", "family_name": "Yuille", "institution": null}]}