Gaussian Fields for Approximate Inference in Layered Sigmoid Belief Networks

Barber, David; Sollich, Peter

Gaussian Fields for Approximate Inference in Layered Sigmoid Belief Networks

David Barber, Peter Sollich

Advances in Neural Information Processing Systems 12 (NIPS 1999)

Abstract

Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have em(cid:173) pirically demonstrated good performance of "loopy belief propagation"(cid:173) using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theo(cid:173) retical understanding of the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables. We give an analytical formula relating the true posterior probabilities with those calculated using loopy propagation. We give sufficient conditions for convergence and show that when belief propagation converges it gives the correct posterior means for all graph topologies, not just networks with a single loop. The related "max-product" belief propagation algorithm finds the max(cid:173) imum posterior probability estimate for singly connected networks. We show that, even for non-Gaussian probability distributions, the conver(cid:173) gence points of the max-product algorithm in loopy networks are max(cid:173) ima over a particular large local neighborhood of the posterior proba(cid:173) bility. These results help clarify the empirical performance results and motivate using the powerful belief propagation algorithm in a broader class of networks.

Problems involving probabilistic belief propagation arise in a wide variety of applications, including error correcting codes, speech recognition and medical diagnosis. If the graph is singly connected, there exist local message-passing schemes to calculate the posterior probability of an unobserved variable given the observed variables. Pearl [15] derived such a scheme for singly connected Bayesian networks and showed that this "belief propagation" algorithm is guaranteed to converge to the correct posterior probabilities (or "beliefs").

Several groups have recently reported excellent experimental results by running algorithms

674

Y. Weiss and W T. Freeman

equivalent to Pearl's algorithm on networks with loops [8, 13, 6]. Perhaps the most dramatic instance of this performance is for "Turbo code" [2] error correcting codes. These codes have been described as "the most exciting and potentially important development in coding theory in many years" [12] and have recently been shown [10, 11] to utilize an algorithm equivalent to belief propagation in a network with loops.

Progress in the analysis of loopy belief propagation has been made for the case of networks with a single loop [17, 18, 4, 1]. For these networks, it can be shown that (1) unless all the compatabilities are deterministic, loopy belief propagation will converge. (2) The difference between the loopy beliefs and the true beliefs is related to the convergence rate the faster the convergence the more exact the approximation and (3) If of the messages - the hidden nodes are binary, then the loopy beliefs and the true beliefs are both maximized by the same assignments, although the confidence in that assignment is wrong for the loopy beliefs.

In this paper we analyze belief propagation in graphs of arbitrary topology, for nodes de(cid:173) scribing jointly Gaussian random variables. We give an exact formula relating the correct marginal posterior probabilities with the ones calculated using loopy belief propagation. We show that if belief propagation converges, then it will give the correct posterior means for all graph topologies, not just networks with a single loop. We show that the covari(cid:173) ance estimates will generally be incorrect but present a relationship between the error in the covariance estimates and the convergence speed. For Gaussian or non-Gaussian vari(cid:173) ables, we show that the "max-product" algorithm, which calculates the MAP estimate in singly connected networks, only converges to points that are maxima over a particular large neighborhood of the posterior probability of loopy networks.

Abstract

Name Change Policy