{"title": "Tree-based reparameterization for approximate inference on loopy graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1001, "page_last": 1008, "abstract": null, "full_text": "Tree-based reparameterization for approximate inference on loopy graphs\n\nMartin J. Wainwright, Tommi Jaakkola, and Alan S. Willsky\n\nDepartment of Electrical Engineering and Computer Science\nMassachusetts Institute of Technology\nCambridge, MA 02139\nmjwain@mit.edu, tommi@ai.mit.edu, willsky@mit.edu\n\nAbstract\n\nWe develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points and an invariance intrinsic to TRP/BP. These two properties enable us to analyze and bound the error between the TRP/BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g., 1, 2].
\n\n1 Introduction\n\nGiven a graphical model, one important problem is the computation of marginal distributions of variables at each node. Although highly efficient algorithms exist for this task on trees, exact solutions are prohibitively complex for more general graphs of any substantial size. This difficulty motivates the use of approximate inference algorithms, of which one of the best-known and most widely studied is belief propagation [3], also known as the sum-product algorithm in coding [e.g., 4].\n\nRecent work has yielded some insight into belief propagation (BP). Several researchers [e.g., 5, 6] have analyzed the single loop case, where BP can be reformulated as a matrix powering method. For Gaussian processes on arbitrary graphs, two groups [7, 8] have shown that the means are exact when BP converges. For graphs corresponding to turbo codes, Richardson [9] established the existence of fixed points, and gave conditions for their stability. More recently, Yedidia et al. [1] showed that BP corresponds to constrained minimization of the Bethe free energy, and proposed extensions based on Kikuchi expansions [10]. Related extensions to BP were proposed in [2]. The paper [1] has inspired other researchers [e.g., 11, 12] to develop more sophisticated algorithms for minimizing the Bethe free energy. These advances notwithstanding, much remains to be understood about the behavior of BP.\n\nThe framework of this paper provides a new conceptual view of various algorithms for approximate inference, including BP. The basic idea is to seek a reparameterization of the distribution that yields factors which correspond, either exactly or approximately, to the desired marginal distributions. 
If the graph is acyclic (i.e., a tree), then there exists a unique reparameterization specified by exact marginal distributions over cliques. For a graph with cycles, we consider the idea of iteratively reparameterizing different parts of the distribution, each corresponding to an acyclic subgraph. As we will show, BP can be interpreted in exactly this manner, in which each reparameterization takes place over a pair of neighboring nodes. One of the consequences of this interpretation is a more storage-efficient \"message-free\" implementation of BP. More significantly, this interpretation leads to more general updates in which reparameterization is performed over arbitrary acyclic subgraphs, which we refer to as tree-based reparameterization (TRP) algorithms.\n\nAt a low level, the more global TRP updates can be viewed as a tree-based schedule for message-passing. Indeed, a practical contribution of this paper is to demonstrate that TRP updates tend to have better convergence properties than local BP updates. At a more abstract level, the reparameterization perspective provides valuable conceptual insight, including a simple tree-consistency characterization of fixed points, as well as an invariance intrinsic to TRP/BP. These properties allow us to derive an exact expression for the error between the TRP/BP approximations and the actual marginals. Based on this exact expression, we derive computable bounds on the error. Most of these results, though they emerge very naturally in the TRP framework, apply in an algorithm-independent manner to any constrained local minimum of the Bethe free energy, whether obtained by TRP/BP or an alternative method [e.g., 11, 12]. More details of our work can be found in [13, 14].
\n\n1.1 Basic notation\n\nAn undirected graph $\\mathcal{G} = (\\mathcal{V}, \\mathcal{E})$ consists of a set of nodes or vertices $\\mathcal{V} = \\{1, \\ldots, N\\}$ that are joined by a set of edges $\\mathcal{E}$. Lying at each node $s \\in \\mathcal{V}$ is a discrete random variable $x_s \\in \\{0, \\ldots, m-1\\}$. The underlying sample space $\\mathcal{X}^N$ is the set of all $N$-vectors $x = \\{x_s \\mid s \\in \\mathcal{V}\\}$ over $m$ symbols, so that $|\\mathcal{X}^N| = m^N$. We focus on stochastic processes that are Markov with respect to $\\mathcal{G}$, so that the Hammersley-Clifford theorem [e.g., 3] guarantees that the distribution factorizes as $p(x) \\propto \\prod_{C \\in \\mathbf{C}} \\psi_C(x_C)$, where $\\psi_C(x_C)$ is a compatibility function depending only on the subvector $x_C = \\{x_s \\mid s \\in C\\}$ of nodes in a particular clique $C$. Note that each individual node forms a singleton clique, so that some of the factors $\\psi_C$ may involve functions of each individual variable. As a consequence, if we have independent measurements $y_s$ of $x_s$ at some (or all) of the nodes, then Bayes' rule implies that the effect of including these measurements (i.e., the transformation from the prior distribution $p(x)$ to the conditional distribution $p(x \\mid y)$) is simply to modify the singleton factors. As a result, throughout this paper, we suppress explicit mention of measurements, since the problems of computing marginals for $p(x)$ and for $p(x \\mid y)$ are of identical structure and complexity. The analysis of this paper is restricted to graphs with singleton ($\\psi_s$) and pairwise ($\\psi_{st}$) cliques. However, it is straightforward to extend reparameterization to larger cliques, as in cluster variational methods [e.g., 10].\n\n1.2 Exact tree inference as reparameterization\n\nAlgorithms for optimal inference on trees have appeared in the literature of various fields [e.g., 4, 3]. 
One important consequence of the junction tree representation [15] is that any exact algorithm for optimal inference on trees actually computes marginal distributions for pairs $(s, t)$ of neighboring nodes. In doing so, it produces an alternative factorization $p(x) = \\prod_{s \\in \\mathcal{V}} P_s \\prod_{(s,t) \\in \\mathcal{E}} P_{st}/(P_s P_t)$, where $P_s$ and $P_{st}$ are the single-node and pairwise marginals respectively. This $\\{P_s, P_{st}\\}$ representation can be deduced from a more general factorization result on junction trees [e.g., 15]. Thus, exact inference on trees can be viewed as computing a reparameterized factorization of the distribution $p(x)$ that explicitly exposes the local marginal distributions.\n\n2 Tree-based reparameterization for graphs with cycles\n\nThe basic idea of a TRP algorithm is to perform successive reparameterization updates on trees embedded within the original graph. Although such updates are applicable to arbitrary acyclic substructures, here we focus on a set $\\mathcal{T}^1, \\ldots, \\mathcal{T}^L$ of embedded spanning trees. To describe TRP updates, let $T$ be a pseudomarginal probability vector consisting of single-node marginals $T_s(x_s)$ for $s \\in \\mathcal{V}$, and pairwise joint distributions $T_{st}(x_s, x_t)$ for edges $(s, t) \\in \\mathcal{E}$. Aside from positivity and normalization ($\\sum_{x_s} T_s = 1$; $\\sum_{x_s, x_t} T_{st} = 1$) constraints, a given vector $T$ is arbitrary$^1$, and gives rise to a parameterization of the distribution as $p(x; T) \\propto \\prod_{s \\in \\mathcal{V}} T_s \\prod_{(s,t) \\in \\mathcal{E}} T_{st} / \\{(\\sum_{x_s} T_{st})(\\sum_{x_t} T_{st})\\}$, where the dependence of $T_s$ and $T_{st}$ on $x$ is omitted for notational simplicity. Ultimately, we shall seek vectors $T$ that are consistent, i.e., that belong to $\\mathbb{C} = \\{T \\mid \\sum_{x_s} T_{st} = T_t \\;\\forall\\, (s, t) \\in \\mathcal{E}\\}$. In the context of TRP, such consistent vectors represent approximations to the exact marginals of the distribution defined by the graph with cycles.\n\n$^1$ In general, $T$ need not be the actual marginals for any distribution.\n\nWe shall express TRP as a sequence of functional updates $T^n \\mapsto T^{n+1}$, where superscript $n$ denotes iteration number. We initialize at $T^0$ via $T^0_{st} = \\kappa\\, \\psi_s \\psi_t \\psi_{st}$ and $T^0_s = \\kappa\\, \\psi_s \\prod_{t \\in N(s)} [\\sum_{x_t} \\psi_{st} \\psi_t]$, where $\\kappa$ denotes a normalization factor and $N(s)$ is the set of neighbors of node $s$. At iteration $n$, we choose some spanning tree $\\mathcal{T}^{i(n)}$ with edge set $\\mathcal{E}^{i(n)}$, and factor the distribution $p(x; T^n)$ into a product of two terms\n\n$$p^{i(n)}(x; T^n) \\propto \\prod_{s \\in \\mathcal{V}} T^n_s \\prod_{(s,t) \\in \\mathcal{E}^{i(n)}} \\frac{T^n_{st}}{(\\sum_{x_s} T^n_{st})(\\sum_{x_t} T^n_{st})} \\qquad (1a)$$\n\n$$r^{i(n)}(x; T^n) \\propto \\prod_{(s,t) \\in \\mathcal{E} \\setminus \\mathcal{E}^{i(n)}} \\frac{T^n_{st}}{(\\sum_{x_s} T^n_{st})(\\sum_{x_t} T^n_{st})} \\qquad (1b)$$\n\ncorresponding, respectively, to terms in the spanning tree, and residual terms over edges in $\\mathcal{E} \\setminus \\mathcal{E}^{i(n)}$ removed to form $\\mathcal{T}^{i(n)}$. We then perform a reparameterization update on $p^{i(n)}(x; T^n)$; explicitly,\n\n$$T^{n+1}_{st}(x_s, x_t) = \\sum_{x' \\,\\text{s.t.}\\, (x'_s, x'_t) = (x_s, x_t)} p^{i(n)}(x'; T^n) \\quad \\text{for all } (s, t) \\in \\mathcal{E}^{i(n)} \\qquad (2)$$\n\nwith a similar update for the single-node marginals $\\{T_s \\mid s \\in \\mathcal{V}\\}$. These marginal computations can be performed efficiently by any exact tree algorithm applied to $\\mathcal{T}^{i(n)}$. Elements of $T^{n+1}$ corresponding to terms in $r^{i(n)}(x; T^n)$ are left unchanged (i.e., $T^{n+1}_{st} = T^n_{st}$ for all $(s, t) \\in \\mathcal{E} \\setminus \\mathcal{E}^{i(n)}$). The only restriction placed on the spanning tree set $\\mathcal{T}^1, \\ldots, \\mathcal{T}^L$ is that each edge $(s, t) \\in \\mathcal{E}$ belong to at least one spanning tree. For practical reasons, it is desirable to choose a set of spanning trees that leads to rapid mixing throughout the graph. A natural choice for the spanning tree index $i(n)$ is the cyclic ordering, in which $i(n) \\equiv n \\ (\\mathrm{mod}\\ L) + 1$.
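The update cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a hypothetical 4-node binary loop with made-up compatibility functions, two spanning trees are visited in cyclic order, and brute-force enumeration stands in for an exact tree algorithm. All names (`initialize`, `distribution`, `trp_update`, `psi_s`, `psi_st`) are chosen here for illustration only.

```python
import itertools

M = 2                                               # binary variables
NODES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3)]            # a single 4-node loop
TREES = [[(0, 1), (1, 2), (2, 3)],                  # drop edge (0, 3)
         [(0, 1), (2, 3), (0, 3)]]                  # drop edge (1, 2)
STATES = list(itertools.product(range(M), repeat=len(NODES)))


def normalized(values):
    total = sum(values)
    return [v / total for v in values]


def edge_ratio(tab, a, b):
    # T_st / ((sum over x_t of T_st) * (sum over x_s of T_st)) at (x_s, x_t) = (a, b)
    row = sum(tab[a])
    col = sum(tab[k][b] for k in range(M))
    return tab[a][b] / (row * col)


def distribution(Ts, Tst, edges):
    # Normalized product of node terms and edge ratios over the given edge set.
    weights = []
    for x in STATES:
        w = 1.0
        for s in NODES:
            w *= Ts[s][x[s]]
        for (s, t) in edges:
            w *= edge_ratio(Tst[(s, t)], x[s], x[t])
        weights.append(w)
    return normalized(weights)


def initialize(psi_s, psi_st):
    # T0_st = k psi_s psi_t psi_st and T0_s = k psi_s prod_t [sum_xt psi_st psi_t]
    Tst, Ts = {}, []
    for (s, t) in EDGES:
        flat = normalized([psi_s[s][a] * psi_s[t][b] * psi_st[(s, t)][a][b]
                           for a in range(M) for b in range(M)])
        Tst[(s, t)] = [flat[a * M:(a + 1) * M] for a in range(M)]
    for s in NODES:
        vec = list(psi_s[s])
        for (u, v) in EDGES:
            for a in range(M):
                if u == s:
                    vec[a] *= sum(psi_st[(u, v)][a][b] * psi_s[v][b] for b in range(M))
                elif v == s:
                    vec[a] *= sum(psi_st[(u, v)][b][a] * psi_s[u][b] for b in range(M))
        Ts.append(normalized(vec))
    return Ts, Tst


def trp_update(Ts, Tst, tree):
    p = distribution(Ts, Tst, tree)                 # tree part, as in (1a)
    new_Ts = [[sum(pv for x, pv in zip(STATES, p) if x[s] == a)
               for a in range(M)] for s in NODES]
    new_Tst = dict(Tst)                             # off-tree edges stay unchanged
    for (s, t) in tree:                             # marginals as in (2)
        new_Tst[(s, t)] = [[sum(pv for x, pv in zip(STATES, p)
                                if x[s] == a and x[t] == b)
                            for b in range(M)] for a in range(M)]
    return new_Ts, new_Tst


# Made-up compatibility functions, for illustration only.
psi_s = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0], [1.0, 1.5]]
psi_st = {e: [[2.0, 1.0], [1.0, 2.0]] for e in EDGES}

Ts, Tst = initialize(psi_s, psi_st)
p0 = distribution(Ts, Tst, EDGES)                   # p(x; T^0)
for n in range(200):
    Ts, Tst = trp_update(Ts, Tst, TREES[n % len(TREES)])   # cyclic ordering

# Largest change in the full distribution p(x; T) since initialization.
invariance_gap = max(abs(a - b) for a, b in zip(p0, distribution(Ts, Tst, EDGES)))
# Largest mismatch between the marginals of each T_st and the node terms T_s, T_t.
consistency_gap = max(
    max(abs(sum(Tst[(s, t)][a]) - Ts[s][a]) for a in range(M)) +
    max(abs(sum(Tst[(s, t)][k][b] for k in range(M)) - Ts[t][b]) for b in range(M))
    for (s, t) in EDGES)
```

On this toy problem `invariance_gap` should stay at the level of floating-point round-off, since each update only re-expresses the tree part of the distribution, while `consistency_gap` should shrink toward zero as the iterates approach a consistent vector.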
\n\n2.1 BP as local reparameterization\n\nInterestingly, BP can be reformulated in a \"message-free\" manner as a sequence of local rather than global reparameterization operations. This message-free version of BP directly updates approximate marginals, $T_s$ and $T_{st}$, with initial values determined from the initial messages $M^0_{st}$ and the original compatibility functions of the graphical model as $T^0_s = \\kappa\\, \\psi_s \\prod_{u \\in N(s)} M^0_{us}$ for all $s \\in \\mathcal{V}$ and $T^0_{st} = \\kappa\\, \\psi_{st} \\psi_s \\psi_t \\prod_{u \\in N(s) \\setminus t} M^0_{us} \\prod_{u \\in N(t) \\setminus s} M^0_{ut}$ for all $(s, t) \\in \\mathcal{E}$, where $\\kappa$ denotes a normalization factor. At iteration $n$, these quantities are updated according to the following recursions:\n\n(3a)\n\n(3b)\n\nThe reparameterization form of BP decomposes the graph into a set of two-node trees (one for each edge $(s, t)$); performs exact inference on each such tree via equation (3b); and merges the marginals from each tree via equation (3a). It can be shown by induction [see 13] that this simple reparameterization algorithm is equivalent to the message-passing version of BP.\n\n2.2 Practical advantages of TRP updates\n\nSince a single TRP update suffices to transmit information globally throughout the graph, it might be expected to have better convergence properties than the purely local BP updates. Indeed, this has proven to be the case in various experiments that we have performed on two graphs (a single loop of 15 nodes, and a $7 \\times 7$ grid). We find that TRP tends to converge 2 to 3 times faster than BP on average (rescaled for equivalent computational cost); more importantly, TRP will converge for many problems where BP fails [13]. Further research needs to address the optimal choice of trees (not necessarily spanning) in implementing TRP.
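The only requirement stated earlier on the spanning tree set is that every edge belong to at least one tree. As one hypothetical way to meet it on a k x k grid (the text does not say which trees were used in the experiments), the sketch below builds two comb-shaped spanning trees, checks that each is a spanning tree and that together they cover every edge, and lists the first few indices of the cyclic ordering. The function names are illustrative assumptions.

```python
def grid_edges(k):
    # Nodes are (row, col) pairs of a k x k grid; edges join 4-neighbors.
    horizontal = [((r, c), (r, c + 1)) for r in range(k) for c in range(k - 1)]
    vertical = [((r, c), (r + 1, c)) for r in range(k - 1) for c in range(k)]
    return horizontal, vertical


def comb_trees(k):
    horizontal, vertical = grid_edges(k)
    # Tree 1: every row as a path, tied together down the first column.
    tree1 = horizontal + [e for e in vertical if e[0][1] == 0]
    # Tree 2: every column as a path, tied together along the first row.
    tree2 = vertical + [e for e in horizontal if e[0][0] == 0]
    return [tree1, tree2]


def is_spanning_tree(edges, k):
    # Union-find check: k*k - 1 edges, none of which closes a cycle.
    parent = {(r, c): (r, c) for r in range(k) for c in range(k)}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return len(edges) == k * k - 1


def tree_index(n, num_trees):
    # Cyclic ordering i(n) = n (mod L) + 1, with trees numbered 1..L.
    return n % num_trees + 1


K = 7
trees = comb_trees(K)
horizontal, vertical = grid_edges(K)
covered = set().union(*[set(t) for t in trees])
all_covered = covered == set(horizontal + vertical)
schedule = [tree_index(n, len(trees)) for n in range(5)]
```

Two trees are the fewest that can cover a grid with cycles; whether such a small set mixes information quickly is exactly the kind of open tree-selection question raised in the text.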
\n\n3 Theoretical results\n\nThe TRP perspective leads to a number of theoretical insights into approximate inference, including a new characterization of fixed points, an invariance property, and error analysis.\n\n3.1 Analysis of TRP updates\n\nOur analysis of TRP updates uses a cost function that is an approximation to the Kullback-Leibler divergence between $p(x; T)$ and $p(x; U)$, namely the quantity $G(U; T)$. Given an arbitrary $U \\in \\mathbb{C}$, we show that successive iterates $\\{T^n\\}$ of TRP updates satisfy the following \"Pythagorean\" identity:\n\n$$G(U; T^n) = G(U; T^{n+1}) + G(T^{n+1}; T^n) \\qquad (4)$$\n\nwhich can be used to show that TRP fixed points $T^*$ satisfy the necessary conditions to be local minima of $G$ subject to the constraint $T^* \\in \\mathbb{C}$. The cost function $G$, though distinct from the Bethe free energy [1], coincides with it on the constraint set $\\mathbb{C}$, thereby allowing us to establish the equivalence of TRP and BP fixed points.\n\n3.2 Characterization of fixed points\n\nFigure 1. Illustration of fixed point consistency condition. (a) Fixed point $T^* = \\{T^*_s, T^*_{st}\\}$ on the full graph with cycles. (b) Illustration of consistency condition on an embedded tree. The quantities $\\{T^*_s, T^*_{st}\\}$ must be exact marginal probabilities for any tree embedded within the full graph.\n\nFrom the reparameterization perspective arises an intuitive characterization of any TRP/BP fixed point $T^*$. Shown in Figure 1(a) is a distribution on a graph with cycles, parameterized according to the fixed point $T^* = \\{T^*_{st}, T^*_s\\}$. 
The consistency condition implies that if edges are removed from the full graph to form a spanning tree, as shown in panel (b), then the quantities $T^*_{st}$ and $T^*_s$ correspond to exact marginal distributions over the tree. This statement holds for any acyclic substructure embedded within the full graph with cycles, not just the spanning trees $\\mathcal{T}^1, \\ldots, \\mathcal{T}^L$ used to implement TRP. Thus, algorithms such as TRP/BP attempt to reparameterize a distribution on a graph with cycles so that it is consistent with respect to each embedded tree.\n\nIt is remarkable that the existence of such a parameterization (though obvious for trees) should hold for a positive distribution on an arbitrary graph. Also noteworthy is the parallel to the characterization of max-product$^2$ fixed points obtained by Freeman and Weiss [16]. Finally, it can be shown [13, 14] that this characterization, though it emerged very naturally from the TRP perspective, applies more generally to any constrained local minimum of the Bethe free energy, whether obtained by TRP/BP, or an alternative technique [e.g., 11, 12].\n\n$^2$ Max-product is a related but different algorithm for computing approximate MAP assignments in graphs with cycles.\n\n3.3 Invariance of the distribution\n\nA fundamental property of TRP updates is that they leave invariant the full distribution on the graph with cycles. This invariance follows from the decomposition of equation (1): in particular, the distribution $p^{i(n)}(x; T^n)$ is left invariant by reparameterization, and TRP does not change terms in $r^{i(n)}(x; T^n)$. As a consequence, the overall distribution remains invariant, i.e., $p(x; T^n) \\equiv p(x; T^0)$ for all $n$. By continuity of the map $T \\mapsto p(x; T)$, it follows that any fixed point $T^*$ of the algorithm also satisfies $p(x; T^*) \\equiv p(x; T^0)$. This fixed point invariance is also an algorithm-independent result: in particular, all constrained local minima of the Bethe free energy, regardless of how they are obtained, are invariant in this manner [13, 14].\n\nThis invariance has a number of important consequences. For example, it places severe restrictions on cases (other than trees) in which TRP/BP can be exact; see [14] for examples. In application to the linear-Gaussian problem, it leads to an elementary proof of a known result [7, 8], namely that the means must be exact if the BP updates converge.\n\n3.4 Error analysis\n\nLastly, we can analyze the error arising from any TRP/BP fixed point $T^*$ on an arbitrary graph. Of interest are the exact single-node marginals $P_s$ of the original distribution $p(x; T^0)$ defined by the graph with cycles, which by invariance are equivalent to those of $p(x; T^*)$. Now the quantities $T^*_s$ have two distinct interpretations: (a) as the TRP/BP approximations to the actual single-node marginals on the full graph; and (b) as the exact marginals on any embedded tree (as in Figure 1). This implies that the approximations $T^*_s$ are related to the actual marginals $P_s$ on the full graph by a relatively simple perturbation, namely removing edges from the full graph to reveal an embedded tree. From this observation, we can derive the following exact expression for the difference between the actual marginal $P_{s;j}$ and the TRP/BP approximation$^3$ $T^*_{s;j}$:\n\n$$P_{s;j} - T^*_{s;j} = \\mathbb{E}_{p^i(x; T^*)} \\left[ \\left\\{ \\frac{r^i(x; T^*)}{Z(T^*)} - 1 \\right\\} \\delta(x_s = j) \\right] \\qquad (5)$$\n\nwhere $i \\in \\{1, \\ldots, L\\}$ is an arbitrary spanning tree index; $p^i$ and $r^i$ are defined in equations (1a) and (1b) respectively; $Z(T^*)$ is the partition function of $p(x; T^*)$; $\\delta(x_s = j)$ is an indicator function for $x_s$ to take the value $j$; and $\\mathbb{E}_{p^i(x; T^*)}$ denotes expectation using the distribution $p^i(x; T^*)$.\n\n$^3$ The notation $T^*_{s;j}$ denotes the $j$th element of the vector $T^*_s$.\n\nUnfortunately, while the tree distribution $p^i(x; T^*)$ is tractable, the argument of the expectation includes all terms $r^i(x; T^*)$ removed from the original graph to form spanning tree $\\mathcal{T}^i$. Moreover, computing the partition function $Z(T^*)$ is intractable. These difficulties motivate the development of bounds on the error.\n\nIn [14], we use convexity arguments to derive a particular set of bounds on the approximation error. Such error bounds, in turn, can be used to compute upper and lower bounds on the actual marginals $P_{s;1}$. Figure 2 illustrates the TRP/BP approximation, as well as these bounds on the actual marginals for a binary process on a $3 \\times 3$ grid under two conditions. Note that the tightness of the bounds is closely related to approximation accuracy. Although it is unlikely that these bounds will remain quantitatively useful for general problems on large graphs, they may still yield useful qualitative information.\n\nFigure 2. Behavior of bounds on $3 \\times 3$ grid. Plotted are the actual marginals $P_{s;1}$ versus the TRP approximations $T^*_{s;1}$, as well as upper and lower bounds on the actual marginals. 
(a) For weak potentials, TRP/BP approximation is excellent; bounds on exact marginals are tight. (b) For strong mixed potentials, approximation is poor. Bounds are looser, and for certain nodes, the TRP/BP approximation lies above the upper bounds on the actual marginal $P_{s;1}$.\n\nMuch of the analysis of this paper, including reparameterization, invariance, and error analysis, can be extended [see 14] to more structured approximation algorithms [e.g., 1, 2]. Figure 3 illustrates the use of bounds in assessing when to use a more structured approximation. For strong attractive potentials on the $3 \\times 3$ grid, the TRP/BP approximation in panel (a) is very poor, as reflected by relatively loose bounds on the actual marginals. In contrast, the Kikuchi approximation in (b) is excellent, as revealed by the tightness of the bounds.\n\n4 Discussion\n\nThe TRP framework of this paper provides a new view of approximate inference, and makes both practical and conceptual contributions. On the practical side, we find that more global TRP updates tend to have better convergence properties than local BP updates. The freedom in tree choice leads to open problems of a graph-theoretic nature: e.g., how to choose trees so as to guarantee convergence, or to optimize the rate of convergence?\n\nAmong the conceptual insights provided by the reparameterization perspective are a new characterization of fixed points; an intrinsic invariance; and analysis of the approximation error. Importantly, most of these results apply to any constrained local minimum of the Bethe free energy, and have natural extensions [see 14] to more structured approximations [e.g., 1, 2].
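The exact error expression (5) of Section 3.4 can be checked numerically on a very small example. The sketch below is an illustration under stated assumptions rather than a reproduction of the experiments: it takes a hypothetical 3-node binary loop and a hand-picked consistent vector (every pairwise table has row and column sums equal to the node marginal), which therefore plays the role of a fixed point T* for the distribution p(x; T*) that it defines. All names and numbers are made up for the illustration.

```python
import itertools

NODES = [0, 1, 2]
TREE = [(0, 1), (1, 2)]                 # spanning tree of the 3-node loop
REMOVED = [(0, 2)]                      # edge removed to form the tree
STATES = list(itertools.product(range(2), repeat=3))

# Hypothetical consistent pseudomarginals, shared by all nodes and edges:
# the rows and columns of T_st both sum to T_s.
T_s = [0.6, 0.4]
T_st = [[0.5, 0.1], [0.1, 0.3]]


def ratio(a, b):
    # T_st / (T_s T_t) evaluated at (x_s, x_t) = (a, b)
    return T_st[a][b] / (T_s[a] * T_s[b])


def tree_prob(x):
    # p^i(x; T*): node marginals times ratios on the tree edges
    value = 1.0
    for s in NODES:
        value *= T_s[x[s]]
    for s, t in TREE:
        value *= ratio(x[s], x[t])
    return value


def residual(x):
    # r^i(x; T*): ratios on the edges removed to form the tree
    value = 1.0
    for s, t in REMOVED:
        value *= ratio(x[s], x[t])
    return value


# Partition function of p(x; T*), by enumeration.
Z = sum(tree_prob(x) * residual(x) for x in STATES)


def exact_marginal(s, j):
    # Actual marginal P_{s;j} on the full graph with cycles
    return sum(tree_prob(x) * residual(x) for x in STATES if x[s] == j) / Z


def error_formula(s, j):
    # Right-hand side of (5): expectation under p^i of (r^i / Z - 1) * delta(x_s = j)
    return sum(tree_prob(x) * (residual(x) / Z - 1.0) for x in STATES if x[s] == j)


tree_total = sum(tree_prob(x) for x in STATES)      # p^i should already be normalized
gaps = [abs((exact_marginal(s, j) - T_s[j]) - error_formula(s, j))
        for s in NODES for j in range(2)]
```

Here `gaps` compares the left- and right-hand sides of (5) for every node and state. Enumerating all states is only feasible on toy graphs, which is why the text turns to bounds for larger problems.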
\n\nAcknowledgments\n\nThis work was partially funded by ODDR&E MURI Grant DAAD19-00-1-0466; by ONR Grant N00014-00-1-0089; and by AFOSR Grant F49620-00-1-0362. MJW was also supported by an NSERC 1967 fellowship.\n\nFigure 3. When to use a more structured approximation? (a) For strong attractive potentials on the $3 \\times 3$ grid, BP approximation is poor, as reflected by loose bounds on the actual marginal. (b) Kikuchi approximation [1] for same problem is excellent; corresponding bounds are tight.\n\nReferences\n\n[1] J. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS 13, pages 689-695. MIT Press, 2001.\n\n[2] T. P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, MIT Media Lab, 2001.\n\n[3] J. Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Mateo, 1988.\n\n[4] F. Kschischang and B. Frey. Iterative decoding of compound codes by probability propagation in graphical models. IEEE Sel. Areas Comm., 16(2):219-230, February 1998.\n\n[5] J. B. Anderson and S. M. Hladnik. Tailbiting map decoders. IEEE Sel. Areas Comm., 16:297-302, February 1998.\n\n[6] Y. Weiss. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12:1-41, 2000.
\n\n[7] Y. Weiss and W. T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. In NIPS 12, pages 673-679. MIT Press, 2000.\n\n[8] P. Rusmevichientong and B. Van Roy. An analysis of turbo decoding with Gaussian densities. In NIPS 12, pages 575-581. MIT Press, 2000.\n\n[9] T. Richardson. The geometry of turbo-decoding dynamics. IEEE Trans. Info. Theory, 46(1):9-23, January 2000.\n\n[10] R. Kikuchi. The theory of cooperative phenomena. Physical Review, 81:988-1003, 1951.\n\n[11] M. Welling and Y. Teh. Belief optimization: A stable alternative to loopy belief propagation. In Uncertainty in Artificial Intelligence, July 2001.\n\n[12] A. Yuille. A double-loop algorithm to minimize the Bethe and Kikuchi free energies. Neural Computation, To appear, 2001.\n\n[13] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree-based reparameterization for approximate estimation on graphs with cycles. LIDS Tech. report P-2510: available at http://ssg.mit.edu/group/mjwain/mjwain.shtml, May 2001.\n\n[14] M. Wainwright. Stochastic processes on graphs with cycles: geometric and variational approaches. PhD thesis, MIT, Laboratory for Information and Decision Systems, January 2002.\n\n[15] S. L. Lauritzen. Graphical models. Oxford University Press, Oxford, 1996.\n\n[16] W. Freeman and Y. Weiss. On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Trans. Info. Theory, 47:736-744, 2001.", "award": [], "sourceid": 2107, "authors": [{"given_name": "Martin", "family_name": "Wainwright", "institution": null}, {"given_name": "Tommi", "family_name": "Jaakkola", "institution": null}, {"given_name": "Alan", "family_name": "Willsky", "institution": null}]}