{"title": "Finding the M Most Probable Configurations using Loopy Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 289, "page_last": 296, "abstract": "", "full_text": "Finding the M Most Probable Configurations Using Loopy Belief Propagation

Chen Yanover and Yair Weiss
School of Computer Science and Engineering
The Hebrew University of Jerusalem
91904 Jerusalem, Israel
{cheny,yweiss}@cs.huji.ac.il

Abstract

Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M. While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables.

1 Introduction

Considerable progress has been made in the field of approximate inference using techniques such as variational methods [7], Monte-Carlo methods [5], mini-bucket elimination [4] and belief propagation (BP) [6]. These techniques allow approximate solutions to various inference tasks in graphical models where building a junction tree is infeasible due to the exponentially large clique size. 
The inference tasks that have been considered include calculating marginal probabilities, finding the most likely configuration, and evaluating or bounding the log likelihood.

In this paper we consider an inference task that has not been tackled with the same tools of approximate inference: calculating the M most probable configurations (MPCs). This is a natural task in many applications. As a motivating example, consider the protein folding task known as the side-chain prediction problem. In our previous work [17], we showed how to find the minimal-energy side-chain configuration using approximate inference in a graphical model. The graph has 300 nodes and the clique size in a junction tree calculated using standard software [10] can be on the order of 10^42, so that exact inference is obviously impossible. We showed that loopy max-product belief propagation (BP) achieved excellent results in finding the first MPC for this graph. In the few cases where BP did not converge, Generalized Belief Propagation (GBP) always converged, at an increased computational cost. But we are also interested in finding the second best configuration, the third best or, more generally, the top M configurations. Can this also be done with BP?

The problem of finding the M MPCs has been successfully solved within the junction tree (JT) framework. However, to the best of our knowledge, there has been no equivalent solution when building a junction tree is infeasible. A simple solution would be outputting the top M configurations that are generated by a Monte-Carlo simulation or by a local search algorithm from multiple initializations. As we show in our simulations, both of these solutions are unsatisfactory. Alternatively, one can attempt to use more sophisticated heuristically guided search methods (such as A*) or use exact MPCs algorithms on an approximated, reduced size junction tree [4, 1]. 
However, given the success of BP and GBP in finding the first MPC in similar problems [6, 9] it is natural to look for a method based on BP. In this paper we develop such an algorithm. We start by showing why the standard algorithm [11] for calculating the top M MPCs cannot be used in graphs with cycles. We then introduce a novel algorithm called Best Max-Marginal First (BMMF) and show that when the max-marginals are exact it provably finds the M MPCs. We show simulation results of BMMF in graphs where exact inference is impossible, with excellent performance on challenging graphical models with hundreds of variables.

2 Exact MPCs algorithms

We assume our hidden variables are denoted by a vector X, N = |X|, and the observed variables by Y, where Y = y. Let mk = (mk(1), mk(2), ..., mk(N)) denote the kth MPC. We first seek a configuration m1 that maximizes Pr(X = x|y). Pearl, Dawid and others [12, 3, 11] have shown that this configuration can be calculated using a quantity known as max-marginals (MMs):

max_marginal(i, j) = max_{x: x(i)=j} Pr(X = x|y)    (1)

Max-marginal lemma: If there exists a unique MAP assignment m1 (i.e. Pr(X = m1|y) > Pr(X = x|y) for all x ≠ m1) then x1 defined by x1(i) = arg max_j max_marginal(i, j) will recover the MAP assignment, m1 = x1.

Proof: Suppose that there exists i for which m1(i) = k, x1(i) = l, and k ≠ l. It follows that max_{x: x(i)=k} Pr(X = x|y) > max_{x: x(i)=l} Pr(X = x|y), which is a contradiction to the definition of x1.

When the graph is a tree, the MMs can be calculated exactly using max-product belief propagation [16, 15, 12] using two passes: one up the tree and the other down the tree. Similarly, for an arbitrary graph they can be calculated exactly using two passes of max-propagation in the junction tree [2, 11, 3].

A more efficient algorithm for calculating m1 requires only one pass of max-propagation. 
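The definitions above can be checked numerically. The following brute-force sketch uses hypothetical toy probabilities and plain enumeration in place of any message passing; it computes the max-marginals of eq. 1 and recovers m1 via the max-marginal lemma:

```python
# Toy posterior over 3 binary variables (hypothetical numbers, chosen so the
# MAP assignment is unique; configurations not listed have probability 0).
probs = {
    (0, 0, 1): 0.30,   # the unique MAP assignment
    (1, 1, 0): 0.25,
    (0, 1, 1): 0.20,
    (1, 0, 0): 0.15,
    (0, 0, 0): 0.10,
}
N, VALS = 3, (0, 1)

def max_marginal(i, j):
    """max_{x: x(i) = j} Pr(X = x | y), by brute-force enumeration (eq. 1)."""
    return max((p for x, p in probs.items() if x[i] == j), default=0.0)

# Max-marginal lemma: maximizing each max-marginal independently recovers m1.
x1 = tuple(max(VALS, key=lambda j: max_marginal(i, j)) for i in range(N))
m1 = max(probs, key=probs.get)   # MAP assignment found by direct enumeration
assert x1 == m1
```

On a tree the same table would come out of two passes of max-product BP; the enumeration here is only a stand-in so the lemma can be verified directly.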
After calculating the max-marginal exactly at the root node, the MAP assignment m1 can be calculated by tracing back the pointers that were used during the max-propagation [11]. Figure 1a illustrates this traceback operation in the Viterbi algorithm in HMMs [13] (the pairwise potentials favor configurations where neighboring nodes have different values).

Figure 1: a. The traceback operation in the Viterbi algorithm. The MAP configuration can be calculated by a forward message passing scheme followed by a backward "traceback". b. The same traceback operation applied to a loopy graph may give inconsistent results.

After calculating messages from left to right using max-product, we have the max-marginal at node 3 and can calculate x1(3) = 1. We then use the value of x1(3) and the message from node 1 to 2 to find x1(2) = 0. Similarly, we then trace back to find the value of x1(1).

These traceback operations, however, are problematic in loopy graphs. Figure 1b shows a simple example from [15] with the same potentials as in figure 1a. After setting x1(3) = 1 we trace back and find x1(2) = 0, x1(1) = 1 and finally x1(3) = 0, which is obviously inconsistent with our initial choice.

One advantage of using traceback is that it can recover m1 even if there are "ties" in the MMs, i.e. when there exists a max-marginal that has a non-unique maximizing value. When there are ties, the max-marginal lemma no longer holds and independently maximizing the MMs will not find m1 (cf. 
[12]).

Finding m1 using only MMs requires multiple computations of the MMs, each time with the additional constraint x(i) = j, where i is a tied node and j one of its maximizing values, until no ties exist. It is easy to show that this algorithm will recover m1. The proof is a special case of the proof we present for claim 2 in the next section. However, we need to recalculate the MMs many times until no more ties exist. This is the price we pay for not being able to use traceback. The situation is similar if we seek the M MPCs.

2.1 The Simplified Max-Flow Propagation Algorithm

Nilsson's Simplified Max-Flow Propagation (SMFP) [11] starts by calculating the MMs and using the max-marginal lemma to find m1. Since m2 must differ from m1 in at least one variable, the algorithm defines N conditioning sets, Ci = (x(1) = m1(1), x(2) = m1(2), ..., x(i−1) = m1(i−1), x(i) ≠ m1(i)). It then uses the max-marginal lemma to find the most probable configuration given each conditioning set, xi = arg max_x Pr(X = x|y, Ci), and finally m2 = arg max_{x ∈ {xi}} Pr(X = x|y). Since the conditioning sets form a partition, it is easy to show that the algorithm finds m2 after N calculations of the MMs. Similarly, to find mk the algorithm uses the fact that mk must differ from m1, m2, ..., mk−1 in at least one variable and forms a new set of up to N conditioning sets. Using the max-marginal lemma one can find the MPC given each of these new conditioning sets. This gives up to N new candidates, in addition to (k − 1)(N − 1) previously calculated candidates. The most probable candidate out of these k(N − 1) + 1 is guaranteed to be mk.

Figure 2: An illustration of our novel BMMF algorithm on a simple example.

As pointed out by Nilsson, this simple algorithm may require far too many calculations of the MMs (O(MN)). 
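The conditioning-set construction can likewise be sketched by brute force. In the sketch below (hypothetical toy probabilities, purely illustrative), the sets Ci partition the assignments that differ from m1, and each call to best_given stands in for one full calculation of the MMs:

```python
import itertools

# Toy posterior over 3 binary variables (hypothetical numbers summing to 1).
probs = {x: p for x, p in zip(itertools.product((0, 1), repeat=3),
                              (0.02, 0.28, 0.05, 0.10, 0.08, 0.30, 0.07, 0.10))}
N = 3

def best_given(constraints):
    """MPC under (index, value, must_equal) constraints; stands in for one
    full max-marginal computation plus the max-marginal lemma."""
    feasible = [x for x in probs
                if all((x[i] == v) == eq for i, v, eq in constraints)]
    return max(feasible, key=probs.get)

m1 = best_given([])                   # first MPC, via the max-marginal lemma
# SMFP: C_i fixes x(1..i-1) = m1(1..i-1) and forces x(i) != m1(i).
candidates = []
for i in range(N):
    C = [(k, m1[k], True) for k in range(i)] + [(i, m1[i], False)]
    candidates.append(best_given(C))
m2 = max(candidates, key=probs.get)   # the C_i partition {x : x != m1}
```

Finding mk repeats this with up to N new conditioning sets per previously found MPC, which is the source of the O(MN) max-marginal computations noted above.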
He suggested an algorithm that uses traceback operations to reduce the computation significantly. Since traceback operations are problematic in loopy graphs, we now present a novel algorithm that does not use traceback but may require far less calculation of the MMs compared to SMFP.

2.2 A novel algorithm: Best Max-Marginal First

For simplicity of exposition, we will describe the BMMF algorithm under what we call the strict order assumption: that no two configurations have exactly the same probability.

We illustrate our algorithm using a simple example (figure 2). There are 4 binary variables in the graphical model and we can find the top 3 MPCs exactly: 1100, 1110, 0001.

Our algorithm outputs a set of candidates xt, one at each iteration. In the first iteration, t = 1, we start by calculating the MMs, and using the max-marginal lemma we find m1. We now search the max-marginal table for the next best max-marginal value. In this case it is obtained with x(3) = 1. In the second iteration, t = 2, we lock x(3) = 1. In other words, we calculate the MMs with the added constraint that x(3) = 1. We use the max-marginal lemma to find the most likely configuration with x(3) = 1 locked and obtain x2 = 1110. Note that we have found the second most likely configuration. We then add the complementary constraint x(3) ≠ 1 to the originating constraints set and calculate the MMs. In the third iteration, t = 3, we search both previous max-marginal tables and find the best remaining max-marginal. It is obtained at x(1) = 0, t = 1. We now add the constraint x(1) = 0 to the constraints set from t = 1, calculate the MMs and use the max-marginal lemma to find x3 = 0001. Finally, we add the complementary constraint x(1) ≠ 0 to the originating constraints set and calculate the MMs. 
Thus after 3 iterations we have found the first 3 MPCs using only 5 calculations of the MMs.

The Best Max-Marginal First (BMMF) algorithm for calculating the M most probable configurations:

• Initialization

SCORE1(i, j) = max_{x: x(i)=j} Pr(X = x|y)    (2)
x1(i) = arg max_j SCORE1(i, j)    (3)
CONSTRAINTS1 = ∅    (4)
USED2 = ∅    (5)

• For t = 2:T

SEARCHt = {(i, j, s < t) : xs(i) ≠ j, (i, j, s) ∉ USEDt}    (6)
(it, jt, st) = arg max_{(i,j,s) ∈ SEARCHt} SCOREs(i, j)    (7)
CONSTRAINTSt = CONSTRAINTSst ∪ {x(it) = jt}    (8)
SCOREt(i, j) = max_{x: x(i)=j, CONSTRAINTSt} Pr(X = x|y)    (9)
xt(i) = arg max_j SCOREt(i, j)    (10)
USEDt+1 = USEDt ∪ {(it, jt, st)}    (11)
CONSTRAINTSst = CONSTRAINTSst ∪ {x(it) ≠ jt}    (12)
SCOREst(i, j) = max_{x: x(i)=j, CONSTRAINTSst} Pr(X = x|y)    (13)

Claim 1: x1 calculated by the BMMF algorithm is equal to the MPC m1.

Proof: This is just a restatement of the max-marginal lemma.

Claim 2: x2 calculated by the BMMF algorithm is equal to the second MPC m2.

Proof: We first show that m2(i2) = j2. We know that m2 differs in at least one location from m1. We also know that out of all the assignments that differ from m1 it must have the highest probability. Suppose that m2(i2) ≠ j2. By the definition of SCORE1, this means that there exists an x ≠ m2 that is not m1 whose posterior probability is higher than that of m2. This is a contradiction. Now, out of all assignments for which x(i2) = j2, m2 has the highest posterior probability (recall that by definition, m1(i2) ≠ j2). The max-marginal lemma guarantees that x2 = m2.

Partition Lemma: Let SATk denote the set of assignments satisfying CONSTRAINTSk. Then, after iteration k, the collection {SAT1, SAT2, ..., SATk} is a partition of the assignment space.

Proof: By induction over k. For k = 1, CONSTRAINTS1 = ∅ and the claim trivially holds. 
For k = 2, SAT1 = {x | x(i2) ≠ j2} and SAT2 = {x | x(i2) = j2} are mutually disjoint and SAT1 ∪ SAT2 covers the assignment space, therefore {SAT1, SAT2} is a partition of the assignment space. Assume that after iteration k − 1, {SAT1, SAT2, ..., SATk−1} is a partition of the assignment space. Note that in iteration k, we add CONSTRAINTSk = CONSTRAINTSsk ∪ {x(ik) = jk} and modify CONSTRAINTSsk = CONSTRAINTSsk ∪ {x(ik) ≠ jk}, while keeping all other constraint sets unchanged. SATk and the modified SATsk are pairwise disjoint and SATk ∪ SATsk covers the originating SATsk. Since after iteration k − 1 {SAT1, SAT2, ..., SATk−1} is a partition of the assignment space, so is {SAT1, SAT2, ..., SATk}.

Claim 3: xk, the configuration calculated by the algorithm in iteration k, is mk, the kth MPC.

Proof: First, note that SCOREsk(ik, jk) ≤ SCOREsk−1(ik−1, jk−1), otherwise (ik, jk, sk) would have been chosen in iteration k − 1. Following the partition lemma, each assignment arises at most once. By the strict order assumption, this means that SCOREsk(ik, jk) < SCOREsk−1(ik−1, jk−1).

Let mk ∈ SATs*. We know that mk differs from all previous xs in at least one location. In particular, mk must differ from xs* in at least one location. Denote that location by i* and mk(i*) = j*. We want to show that SCOREs*(i*, j*) = Pr(X = mk|y). First, note that (i*, j*, s*) ∉ USEDk. If we had previously used it, then (x(i*) ≠ j*) ∈ CONSTRAINTSs*, which contradicts the definition of s*. Now suppose there exists ml, l ≤ k − 1, such that ml ∈ SATs* and ml(i*) = j*. Since (i*, j*, s*) ∉ USEDk this would mean that SCOREsk(ik, jk) ≥ SCOREsk−1(ik−1, jk−1), which is a contradiction. 
Therefore mk is the most probable assignment that satisfies CONSTRAINTSs* and has the value j* at location i*. Hence SCOREs*(i*, j*) = Pr(X = mk|y).

A consequence of claim 3 is that BMMF will find the top M MPCs using 2M calculations of max-marginals. In contrast, SMFP requires O(MN) calculations. In real world loopy problems, especially when N ≫ M, this can lead to drastically different run times. First, real world problems may have thousands of nodes, so a speedup of a factor of N will be very significant. Second, calculating the MMs requires iterative algorithms (e.g. BP or GBP), so a speedup of a factor of N may be the difference between running a month versus running half a day.

3 Approximate MPCs algorithms using loopy BP

We now compare 4 approximate MPCs algorithms:

1. Loopy BMMF. This is exactly the algorithm in section 2.2 with the MMs based on the beliefs computed by loopy max-product BP or max-GBP:

SCOREk(i, j) = Pr(X = xk|y) · BEL(i, j | CONSTRAINTSk) / max_j BEL(i, j | CONSTRAINTSk)    (14)

2. Loopy SMFP. This is just Nilsson's SMFP algorithm with the MMs calculated using loopy max-product BP.

3. Gibbs sampling. We collect all configurations sampled during a Gibbs sampling simulation and output the top M of these.

4. Greedy. We collect all configurations encountered during a greedy optimization of the posterior probability (this is just Gibbs sampling at zero temperature) and output the top M of these.

All four algorithms were implemented in Matlab and the numbers of iterations for greedy and Gibbs were chosen so that the run times would be the same as that of loopy BMMF. 
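As a reference point for the exact case, the BMMF loop of section 2.2 can be sketched in a few lines, with brute-force enumeration standing in for BP (hypothetical toy probabilities chosen to satisfy the strict order assumption; this is a sketch, not the loopy Matlab implementation used in the experiments):

```python
import itertools

# Toy posterior over 4 binary variables; distinct probabilities give a strict
# order, so BMMF with exact max-marginals must return the MPCs in order.
vals, N = (0, 1), 4
configs = list(itertools.product(vals, repeat=N))
Z = sum(range(1, len(configs) + 1))
probs = {x: (k + 1) / Z for k, x in enumerate(configs)}

def scores(constraints):
    """SCORE(i, j) = max_{x: x(i)=j, constraints} Pr(x), plus the argmax config
    given by the max-marginal lemma; one brute-force MM computation."""
    feas = [x for x in configs
            if all((x[i] == v) == eq for i, v, eq in constraints)]
    tab = {(i, j): max((probs[x] for x in feas if x[i] == j), default=0.0)
           for i in range(N) for j in vals}
    return tab, max(feas, key=probs.get)

M = 5
SCORE, X, CONS, used = {}, {}, {}, set()
SCORE[1], X[1] = scores([])              # eqs. 2-3
CONS[1] = []                             # eq. 4
for t in range(2, M + 1):
    # best unused max-marginal entry disagreeing with its x_s (eqs. 6-7)
    it, jt, st = max(((i, j, s) for s in SCORE for (i, j) in SCORE[s]
                      if X[s][i] != j and (i, j, s) not in used),
                     key=lambda e: SCORE[e[2]][e[0], e[1]])
    used.add((it, jt, st))                     # eq. 11
    CONS[t] = CONS[st] + [(it, jt, True)]      # lock x(it) = jt (eq. 8)
    SCORE[t], X[t] = scores(CONS[t])           # eqs. 9-10
    CONS[st] = CONS[st] + [(it, jt, False)]    # complementary constraint (eq. 12)
    SCORE[st], _ = scores(CONS[st])            # eq. 13; x_st itself is unchanged
```

Each iteration performs two MM computations (eqs. 9 and 13), matching the 2M total of claim 3; replacing scores with beliefs from max-product BP gives the loopy BMMF variant of eq. 14.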
Gibbs sampling started from m1, the most probable assignment, and the greedy local search algorithm was initialized to an assignment "similar" to m1 (1% of the variables were chosen randomly and their values flipped).

For the protein folding problem [17], we used a database consisting of 325 proteins, each giving rise to a graphical model with hundreds of variables and many loops.

Figure 3: The configurations found by loopy BMMF compared to those obtained using Gibbs sampling and greedy local search for a large toy-QMR model (left) and a 32 × 32 spin glass model (right). Both panels plot energy against configuration number.

We compared the top 100 correct configurations obtained by the A* heuristic search algorithm [8] to those found by the loopy BMMF algorithm, using BP. In all cases where A* was feasible, loopy BMMF always found the correct configurations. Also, the BMMF algorithm converged more often (96.3% compared to 76.3%) and ran much faster.

We then assessed the performance of the BMMF algorithm on a couple of relatively small problems, where exact inference was possible. For both a small toy-QMR model (with 20 diseases and 50 symptoms) and an 8 × 8 spin glass model the BMMF algorithm obtained the correct MPCs.

Finally, we compared the performance of the algorithms on a couple of hard problems: a large toy-QMR model (with 100 diseases and 200 symptoms) and a 32 × 32 spin glass model with large pairwise interactions. For the toy-QMR model, the MPCs calculated by the BMMF algorithm were better than those calculated by Gibbs sampling (Figure 3, left). 
For the large spin glass, we found that ordinary BP didn't converge and used max-product generalized BP instead. This is exactly the algorithm described in [18] with marginalizations replaced by maximizations. We found that GBP converged far more frequently and indeed the MPCs found using GBP are much better than those obtained with Gibbs or greedy (Figure 3, right; Gibbs results are worse than those of the greedy search and are therefore not shown). Note that finding the second MPC using the simple SMFP algorithm requires a week, while loopy BMMF calculated the 25 MPCs in only a few hours.

4 Discussion

Existing algorithms successfully find the M MPCs for graphs where building a JT is possible. However, in many real-world applications exact inference is impossible and approximate techniques are needed. In this paper we have addressed the problem of finding the M MPCs using the techniques of approximate inference. We have presented a new algorithm, called Best Max-Marginal First, that will provably solve the problem if MMs can be calculated exactly. We have shown that the algorithm continues to perform well when the MMs are approximated using max-product loopy BP or GBP.

Interestingly, the BMMF algorithm uses the numerical values of the approximate MMs to determine what to do in each iteration. The success of loopy BMMF suggests that in some cases max-product loopy BP gives a good numerical approximation to the true MMs. Most existing analysis of loopy max-product [16, 15] has focused on the configurations found by the algorithm. It would be interesting to extend the analysis to bound the approximate MMs, which in turn would lead to a provable approximate MPCs algorithm.

While we have used loopy BP to approximate the MMs, any approximate inference can be used inside BMMF to derive a novel, approximate MPCs algorithm. In particular, the algorithm suggested by Wainwright et al. 
[14] can be shown to give the MAP assignment when it converges. It would be interesting to incorporate their algorithm into BMMF.

References

[1] A. Cano, S. Moral, and A. Salmerón. Penniless propagation in join trees. Journal of Intelligent Systems, 15:1010-1027, 2000.
[2] R. Cowell. Advanced inference in Bayesian networks. In M.I. Jordan, editor, Learning in Graphical Models. MIT Press, 1998.
[3] P. Dawid. Applications of a general propagation algorithm for probabilistic expert systems. Statistics and Computing, 2:25-36, 1992.
[4] R. Dechter and I. Rish. A scheme for approximating probabilistic inference. In Uncertainty in Artificial Intelligence (UAI 97), 1997.
[5] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings UAI 2000. Morgan Kaufmann, 2000.
[6] B.J. Frey, R. Koetter, and N. Petrovic. Very loopy belief propagation for unwrapping phase images. In Adv. Neural Information Processing Systems 14. MIT Press, 2001.
[7] T.S. Jaakkola and M.I. Jordan. Variational probabilistic inference and the QMR-DT database. JAIR, 10:291-322, 1999.
[8] Andrew R. Leach and Andrew P. Lemon. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins: Structure, Function, and Genetics, 33(2):227-239, 1998.
[9] A. Levin, A. Zomet, and Y. Weiss. Learning to perceive transparency from the statistics of natural scenes. In Proceedings NIPS 2002. MIT Press, 2002.
[10] Kevin Murphy. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33, 2001.
[11] D. Nilsson. An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing, 8:159-173, 1998.
[12] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[13] L.R. 
Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2):257-286, 1989.
[14] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Exact MAP estimates by (hyper)tree agreement. In Proceedings NIPS 2002. MIT Press, 2002.
[15] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Technical Report P-2554, MIT LIDS, 2002.
[16] Y. Weiss and W.T. Freeman. On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2):723-735, 2001.
[17] C. Yanover and Y. Weiss. Approximate inference and protein folding. In Proceedings NIPS 2002. MIT Press, 2002.
[18] J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In G. Lakemeyer and B. Nebel, editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2003.
", "award": [], "sourceid": 2349, "authors": [{"given_name": "Chen", "family_name": "Yanover", "institution": null}, {"given_name": "Yair", "family_name": "Weiss", "institution": null}]}