{"title": "Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 579, "page_last": 586, "abstract": null, "full_text": "Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation\n\nJason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 {jasonj,dmm,willsky}@mit.edu\n\nAbstract\nThis paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs.\n\n1\n\nIntroduction\n\nWe consider multivariate Gaussian distributions defined on graphs. The nodes of the graph denote random variables and the edges indicate statistical dependencies between variables. The family of all Gauss-Markov models defined on a graph is naturally represented in the information form of the Gaussian density which is parameterized by the inverse covariance matrix, i.e., the information matrix. This information matrix is sparse, reflecting the structure of the defining graph such that only the diagonal elements and those off-diagonal elements corresponding to edges of the graph are non-zero. Given such a model, we consider the problem of computing the mean and variance of each variable, thereby determining the marginal densities as well as the mode. 
In principle, these can be obtained by inverting the information matrix, but the complexity of this computation is cubic in the number of variables. More efficient recursive calculations are possible in graphs with very sparse structure, e.g., in chains, trees and in graphs with \"thin\" junction trees. For these models, belief propagation (BP) or its junction-tree variants efficiently compute the marginals [1]. In more complex graphs, even this approach can become computationally prohibitive. Then, approximate methods such as loopy belief propagation (LBP) provide a tractable alternative to exact inference [1, 2, 3, 4]. We develop a \"walk-sum\" formulation for computation of means, variances and correlations that holds in a wide class of Gauss-Markov models which we call walk-summable. In particular, this leads to a new interpretation of BP in trees and of LBP in general. Based on this interpretation we are able to extend the previously known sufficient conditions for convergence of LBP to the class of walk-summable models (which includes all of the following: trees, attractive models, and pairwise-normalizable models). Our sufficient condition is tighter than that given in [3], as the class of diagonally-dominant models is a strict subset of the class of pairwise-normalizable models. Our results also explain why no examples were found in [3] where LBP did not converge: the models considered there are presumed pairwise-normalizable. We also explain why, in walk-summable models, LBP converges to the correct means but not to the correct variances (proving \"walk-sum\" analogs of results in [3]). In general, walk-summability is not necessary for LBP convergence. Hence, we also provide a tighter (essentially necessary) condition for convergence of LBP variances based on walk-summability of the LBP computation tree. 
This provides deeper insight into why LBP can fail to converge (because the LBP computation tree is not always well-posed), which suggests connections to [5]. This paper presents the key ideas and outlines proofs of the main results. A more detailed presentation will appear in a technical report [6].

2

Preliminaries

A Gauss-Markov model (GMM) is defined by a graph G = (V, E) with edge set E ⊂ 2^V, i.e., some set of two-element subsets of V, and a collection of random variables x = (x_i, i ∈ V) with probability density given in information form¹:

p(x) ∝ exp{ -(1/2) xᵀJx + hᵀx }   (1)

where J is a symmetric positive definite (J ≻ 0) matrix which is sparse so as to respect the graph G: if {i, j} ∉ E then J_{i,j} = 0. We call J the information matrix and h the potential vector. Let N(i) = {j | {i, j} ∈ E} denote the neighbors of i in the graph. The mean μ ≜ E{x} and covariance P ≜ E{(x − μ)(x − μ)ᵀ} are given by:

μ = J⁻¹h and P = J⁻¹   (2)

The partial correlation coefficients are given by:

ρ_{i,j} ≜ cov(x_i; x_j | x_{V∖{i,j}}) / sqrt( var(x_i | x_{V∖{i,j}}) var(x_j | x_{V∖{i,j}}) ) = −J_{i,j} / sqrt( J_{i,i} J_{j,j} )   (3)

Thus, J_{i,j} = 0 if and only if x_i and x_j are independent given the other variables x_{V∖{i,j}}. We say that this model is attractive if all partial correlations are non-negative. It is pairwise-normalizable if there exists a diagonal matrix D ≻ 0 and a collection of non-negative definite matrices {J_e ⪰ 0, e ∈ E}, where (J_e)_{i,j} is zero unless i, j ∈ e, such that:

J = D + Σ_{e∈E} J_e   (4)

It is diagonally-dominant if for all i ∈ V: Σ_{j≠i} |J_{i,j}| < J_{i,i}. The class of diagonally-dominant models is a strict subset of the class of pairwise-normalizable models [6].

Gaussian Elimination and Belief Propagation. Integrating (1) over all possible values of x_i reduces to Gaussian elimination (GE) in the information form (see also [7]), i.e.,

p(x_{∖i}) ∝ ∫ p(x_{∖i}, x_i) dx_i ∝ exp{ -(1/2) x_{∖i}ᵀ Ĵ_{∖i} x_{∖i} + ĥ_{∖i}ᵀ x_{∖i} }   (5)

where ∖i ≜ V ∖ {i}, i.e., all variables except i, and

Ĵ_{∖i} = J_{∖i,∖i} − J_{∖i,i} J_{i,i}⁻¹ J_{i,∖i} and ĥ_{∖i} = h_{∖i} − J_{∖i,i} J_{i,i}⁻¹ h_i   (6)

¹ The work also applies to p(x|y), i.e., where some variables y are observed. However, the observations y are fixed, and we redefine p(x) ≜ p(x|y) (conditioning on y is implicit throughout). With local observations p(x|y) ∝ p(x) Π_i p(y_i|x_i), conditioning does not change the graph structure.

[Figure 1 graphic: panel (a) shows the four-node loopy graph with its edge weights; panels (b) and (c) show computation-tree levels n = 1, 2, 3.]

Figure 1: (a) Graph of a GMM with nodes {1, 2, 3, 4} and with edge weights (partial correlations) as shown. In (b) and (c) we illustrate the first three levels of the LBP computation tree rooted at nodes 1 and 2. After 3 iterations of LBP in (a), the marginals at nodes 1 and 2 are identical to the marginals at the root of (b) and (c) respectively.

In trees, the marginal of any given node can be efficiently computed by sequentially eliminating leaves of the tree until just that node remains. BP may be seen as a message-passing form of GE in which a message passed from node i to node j ∈ N(i) captures the effect of eliminating the subtree rooted at i. Thus, by a two-pass procedure, BP efficiently computes the marginals at all nodes of the tree. The equations for LBP are identical except that messages are updated iteratively and in parallel. There are two messages per edge, one for each ordered pair (i, j) ∈ E. We specify each message in information form with parameters Δh_{i→j}⁽ⁿ⁾, ΔJ_{i→j}⁽ⁿ⁾ (initialized to zero for n = 0). These are iteratively updated as follows. 
For each (i, j )  E , messages from N (i) \\ j are fused at node i: k k (n) (n) ^ (n) ^(n) = Ji,i + h = hi + h and J J (7)\ni\\j ki i\\j ki N (i)\\j N (i)\\j\n\nThis fused information at node i is predicted to node j : (n+1) (n+1) ^ (n) ^(n) ^(n) hij = -Jj,i (Ji\\j )-1 hi\\j and Jij = -Jj,i (Ji\\j )-1 Ji,j\n\n(8)\n\nAfter n iterations, the marginal of node i is obtained by fusing all incoming messages: k k (n) (n) ^ (n) ^(n) (9) Jki hki and Ji = Ji,i + hi = hi +\nN (i) N (i)\n\nand In trees, this is the The mean and variance are given by marginal at node i conditioned on zero boundary conditions at nodes (n + 1) steps away and LBP converges to the correct marginals after a finite number of steps equal to the diameter of the tree. In graphs with cycles, LBP may not converge and only yields approximate marginals when it does. A useful fact about LBP is the following [2, 3, 5]: the marginal computed at node i after n iterations is identical to the marginal at the root of the n-step computation tree rooted at node i. This tree is obtained by \"unwinding\" the loopy graph for n steps (see Fig. 1). Note that each node of the graph may be replicated many times in the computation tree. Also, neighbors of a node in the computation tree correspond exactly with neighbors of the associated node in the original graph (except at the last level of the tree where some neighbors are missing). The corresponding J matrix defined on the computation tree has the same node and edge values as in the original GMM.\n\n^ (n) ^(n) (Ji )-1 hi\n\n^(n) (Ji )-1 .\n\n3\n\nWalk-Summable Gauss-Markov Models\n\nIn this section we present the walk-sum formulation of inference in GMMs. Let (A) denote the spectral radius of a symmetric matrix A, defined to be the maximum of the absolute values of the eigenvalues of A. The geometric series (I + A + A2 + . . . ) converges\n\n\f\nif and only if (A) < 1. If it converges, it converges to (I - A)-1 . Now, consider a GMM with information matrix J . 
Without loss of generality, let J be normalized (by rescaling variables) to have J_{i,i} = 1 for all i. Then, ρ_{i,j} = −J_{i,j} and the (zero-diagonal) matrix of partial correlations is given by R = I − J. If ρ(R) < 1, then we have a geometric series for the covariance matrix:

Σ_{l=0}^∞ Rˡ = (I − R)⁻¹ = J⁻¹ = P   (10)

Let R̄ = (|ρ_{i,j}|) denote the matrix of element-wise absolute values. We say that the model is walk-summable if ρ(R̄) < 1. Walk-summability implies ρ(R) < 1 and J ≻ 0.

Example 1. Consider a 5-node cycle with normalized information matrix J, which has all partial correlations on the edges set to ρ. If ρ = −.45, then the model is valid (i.e., positive definite) with minimum eigenvalue λ_min(J) ≈ .2719 > 0, and walk-summable with ρ(R̄) = .9 < 1. However, when ρ = −.55, the model is still valid with λ_min(J) ≈ .1101 > 0, but no longer walk-summable, with ρ(R̄) = 1.1 > 1.

Walk-summability allows us to interpret (10) as computing walk-sums in the graph. Recall that the matrix R reflects graph structure: ρ_{i,j} = 0 if {i, j} ∉ E. These act as weights on the edges of the graph. A walk w = (w_0, w_1, ..., w_l) is a sequence of nodes w_i ∈ V connected by edges {w_i, w_{i+1}} ∈ E, where l is the length of the walk. The weight φ(w) of walk w is the product of edge weights along the walk:

φ(w) = Π_{s=1}^l ρ_{w_{s−1}, w_s}   (11)

At each node i ∈ V, we also define a zero-length walk w = (i) for which φ(w) = 1.

Walk-Sums. Given a set of walks W, we define the walk-sum over W by

φ(W) = Σ_{w∈W} φ(w)   (12)

which is well-defined (i.e., independent of summation order) because ρ(R̄) < 1 implies absolute convergence. Let W_{ij}ˡ denote the set of l-length walks from i to j and let W_{ij} = ∪_{l=0}^∞ W_{ij}ˡ. The relation between walks and the geometric series (10) is that the entries of Rˡ correspond to walk-sums over l-length walks from i to j in the graph, i.e., (Rˡ)_{i,j} = φ(W_{ij}ˡ). 
Hence,

P_{i,j} = Σ_{l=0}^∞ (Rˡ)_{i,j} = Σ_{l=0}^∞ φ(W_{ij}ˡ) = φ(∪_l W_{ij}ˡ) = φ(W_{ij})   (13)

In particular, the variance σ_i² ≜ P_{i,i} of variable i is the walk-sum taken over the set W_{ii} of self-return walks that begin and end at i (defined so that (i) ∈ W_{ii}). The means can be computed as reweighted walk-sums, i.e., where each walk is scaled by the potential at the start of the walk: φ(w; h) = h_{w_0} φ(w), and φ(W; h) = Σ_{w∈W} φ(w; h). Then,

μ_i = Σ_{j∈V} P_{i,j} h_j = Σ_{j∈V} φ(W_{ji}) h_j = φ(W_{*→i}; h)   (14)

where W_{*→i} ≜ ∪_{j∈V} W_{ji} is the set of all walks which end at node i. We have found that a wide class of GMMs are walk-summable:

Proposition 1 (Walk-Summable GMMs). All of the following classes of GMMs are walk-summable:² (i) attractive models, (ii) trees and (iii) pairwise-normalizable³ models.

² That is, if we take a valid model (with J ≻ 0) in these classes, then it automatically has ρ(R̄) < 1.
³ In [6], we also show that walk-summability is actually equivalent to pairwise-normalizability.

Proof Outline. (i) R̄ = R and J = I − R ≻ 0 imply λ_max(R) < 1. Because R has non-negative elements, ρ(R̄) = λ_max(R) < 1. In (ii) & (iii), negating any ρ_{ij}, it still holds that J = I − R ≻ 0: (ii) negating ρ_{ij} doesn't affect the eigenvalues of J (remove edge {i, j} and, in each eigenvector, negate all entries in one subtree); (iii) negating ρ_{ij} preserves J_{{i,j}} ⪰ 0 in (4), so J ≻ 0. Thus, making all ρ_{ij} > 0, we find I − R̄ ≻ 0 and R̄ ≺ I. Similarly, making all ρ_{ij} < 0, −R̄ ≺ I. Therefore, ρ(R̄) < 1.

4

Recursive Walk-Sum Calculations on Trees

In this section we derive a recursive algorithm which accrues the walk-sums (over infinite sets of walks) necessary for exact inference on trees and relate this to BP. Walk-summability guarantees correctness of this algorithm, which reorders walks in a non-trivial way. We start with a chain of N nodes: its graph G has nodes V = {1, ..., N} and edges E = {e_1, ..., e_{N−1}} where e_i = {i, i + 1}. The variance at node i is σ_i² = φ(W_{ii}). 
The set W_{ii} can be partitioned according to the number of times that walks return to node i: W_{ii} = ∪_{r=0}^∞ W_{ii}⁽ʳ⁾, where W_{ii}⁽ʳ⁾ is the set of all self-return walks which return to i exactly r times. In particular, W_{ii}⁽⁰⁾ = {(i)}, for which φ(W_{ii}⁽⁰⁾) = 1. A walk which starts at node i and returns r times is a concatenation of r single-revisit self-return walks, so φ(W_{ii}⁽ʳ⁾) = φ(W_{ii}⁽¹⁾)ʳ. This means:

φ(W_{ii}) = φ(∪_{r=0}^∞ W_{ii}⁽ʳ⁾) = Σ_{r=0}^∞ φ(W_{ii}⁽ʳ⁾) = Σ_{r=0}^∞ φ(W_{ii}⁽¹⁾)ʳ = 1 / (1 − φ(W_{ii}⁽¹⁾))   (15)

This geometric series converges since the model is walk-summable. Hence, calculating the single-revisit self-return walk-sum φ(W_{ii}⁽¹⁾) determines the variance σ_i². The single-revisit walks at node i consist of walks in the left subchain and walks in the right subchain. Let W_{ii∖j} be the set of self-return walks of i which never visit j, so, e.g., all w ∈ W_{ii∖i+1} are contained in the subgraph {1, ..., i}. With this notation:

φ(W_{ii}⁽¹⁾) = φ(W_{ii∖i+1}⁽¹⁾) + φ(W_{ii∖i−1}⁽¹⁾)   (16)

The left single-revisit self-return walk-sums φ(W_{ii∖i+1}⁽¹⁾) can be computed recursively starting from node 1. At node 1, φ(W_{11∖2}⁽¹⁾) = 0 and φ(W_{11∖2}) = 1. A single-revisit self-return walk from node i in the left subchain consists of a step to node i − 1, then some number of self-return walks in the subgraph {1, ..., i − 1}, and a step from i − 1 back to i:

φ(W_{ii∖i+1}⁽¹⁾) = ρ_{i,i−1}² φ(W_{i−1,i−1∖i}) = ρ_{i,i−1}² / (1 − φ(W_{i−1,i−1∖i}⁽¹⁾))   (17)

Thus single-revisit (and multiple-revisit) walk-sums in the left subchain of every node i can be calculated in one forward pass through the chain. The same can be done for the right subchain walk-sums at every node i, by starting at node N and going backwards. Using equations (15) and (16), these quantities suffice to calculate the variances at all nodes of the chain. 
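To make the forward-backward recursion concrete, here is a minimal numerical sketch (our own code, not from the paper; variable names like aL, aR are ours) that accumulates the left and right single-revisit walk-sums of (16)-(17) in one pass each and recovers the chain variances via the geometric series (15), checked against direct matrix inversion:

```python
import numpy as np

# Variances of a normalized chain GMM (unit-diagonal J) with partial
# correlation rho[i] on edge {i, i+1}, via the walk-sum recursion (15)-(17).
def chain_variances(rho):
    N = len(rho) + 1
    aL = np.zeros(N)     # aL[i]: single-revisit walk-sum in left subchain of i
    for i in range(1, N):
        aL[i] = rho[i - 1] ** 2 / (1.0 - aL[i - 1])      # eq. (17), forward pass
    aR = np.zeros(N)     # aR[i]: same for the right subchain, backward pass
    for i in range(N - 2, -1, -1):
        aR[i] = rho[i] ** 2 / (1.0 - aR[i + 1])
    return 1.0 / (1.0 - (aL + aR))                       # eqs. (16) then (15)

# Check against direct inversion, P = J^{-1}, on a short chain.
rho = np.array([0.3, 0.4, 0.2])
J = np.eye(4)
for i, r in enumerate(rho):
    J[i, i + 1] = J[i + 1, i] = -r       # rho_{i,j} = -J_{i,j}
assert np.allclose(chain_variances(rho), np.diag(np.linalg.inv(J)))
```

The recursion is exactly sequential Gaussian elimination from each end of the chain, which is why it matches the inverse-covariance diagonal.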
A similar forwards-backwards procedure computes the means as reweighted walk-sums over the left and right single-visit walks for node i, which start at an arbitrary node (in the left or right subchain) and end at i, never visiting i before that [6]. In fact, these recursive walk-sum calculations map exactly to operations in BP; e.g., in a normalized chain, ΔJ_{i−1→i} = −φ(W_{ii∖i+1}⁽¹⁾) and Δh_{i−1→i} = φ(W_{*→i∖i+1}⁽¹⁾; h). The same strategy applies for trees: both single-revisit and single-visit walks at node i can be partitioned according to which subtree (rooted at a neighbor j ∈ N(i) of i) the walk lives in. This leads to a two-pass walk-sum calculation on trees (from the leaves to the root, and back) to calculate means and variances at all nodes.

5

Walk-sum Analysis of Loopy Belief Propagation

First, we analyze LBP in the case that the model is walk-summable and show that LBP converges and includes all the walks for the means, but only a subset of the walks for the variances. Then, we consider the case of non-walksummable models and relate convergence of the LBP variances to walk-summability of the computation tree.

5.1 LBP in walk-summable models

To compute means and variances in a walk-summable model, we need to calculate walk-sums for certain sets of walks in the graph G. Running LBP in G is equivalent to exact inference in the computation tree for G, and hence calculating walk-sums for certain walks in the computation tree. In the computation tree rooted at node i, walks ending at the root have a one-to-one correspondence with walks ending at node i in G. Hence, LBP captures all of the walks necessary to calculate the means. For variances, the walks captured by LBP have to start and end at the root in the computation tree. However, some of the self-return walks in G translate to walks in the computation tree that end at the root but start at a replica of the root, rather than at the root itself. These walks are not captured by the LBP variances. 
For example, in Fig. 1(a), the walk (1, 2, 3, 1) is a self-return walk in the original graph G but is not a self-return walk in the computation tree shown in Fig. 1(b). LBP variances capture only those self-return walks of the original graph G which also are self-return walks in the computation tree; e.g., the walk (1, 3, 2, 3, 4, 3, 1) is a self-return walk in both Figs. 1(a) and (b). We call these backtracking walks. These simple observations lead to our main result:

Proposition 2 (Convergence of LBP for walk-summable GMMs). If the model is walk-summable, then LBP converges: the means converge to the true means and the LBP variances converge to walk-sums over just the backtracking self-return walks at each node.

Proof Outline. All backtracking walks have positive weights, since each edge is traversed an even number of times. For a walk-summable model, LBP variances are walk-sums over the backtracking walks and are therefore monotonically increasing with the iterations. They also are bounded above by the absolute self-return walk-sums (diagonal elements of Σ_l R̄ˡ) and hence converge. For the means: the series Σ_{l=0}^∞ Rˡh converges absolutely since |Rˡh| ≤ R̄ˡ|h|, and the series Σ_l R̄ˡ|h| is a linear combination of terms of the absolutely convergent series Σ_l R̄ˡ. The LBP means are a rearrangement of the absolutely convergent series Σ_{l=0}^∞ Rˡh, so they converge to the same values.

As a corollary, LBP converges for all of the model classes listed in Proposition 1. Also, in attractive models, the LBP variances are less than or equal to the true variances. Correctness of the means was also shown in [3] for pairwise-normalizable models.⁴ They also show that LBP variances omit some terms needed for the correct variances. These terms correspond to correlations between the root and its replicas in the computation tree. 
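Proposition 2 can be checked numerically. The sketch below (our own code, not from the paper) runs the scalar LBP updates (7)-(9) on a small attractive cycle, which is walk-summable; the computed means match the exact solution, while the variances, being sums over backtracking walks only, fall strictly short of the true variances:

```python
import numpy as np

# Scalar-variable Gaussian LBP, eqs. (7)-(9): messages dJ[i, j], dh[i, j]
# on each ordered edge (i -> j), initialized to zero, updated in parallel.
def gaussian_lbp(J, h, iters=200):
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0] for i in range(n)]
    dJ, dh = np.zeros((n, n)), np.zeros((n, n))
    for _ in range(iters):
        dJ_new, dh_new = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            for j in nbrs[i]:
                # fuse messages from N(i) \ j at node i, eq. (7)
                Jhat = J[i, i] + sum(dJ[k, i] for k in nbrs[i] if k != j)
                hhat = h[i] + sum(dh[k, i] for k in nbrs[i] if k != j)
                # predict to node j, eq. (8)
                dJ_new[i, j] = -J[j, i] * J[i, j] / Jhat
                dh_new[i, j] = -J[j, i] * hhat / Jhat
        dJ, dh = dJ_new, dh_new
    # fuse all incoming messages, eq. (9)
    Jhat = np.array([J[i, i] + sum(dJ[k, i] for k in nbrs[i]) for i in range(n)])
    hhat = np.array([h[i] + sum(dh[k, i] for k in nbrs[i]) for i in range(n)])
    return hhat / Jhat, 1.0 / Jhat       # means and variances

# Attractive 4-cycle with partial correlations 0.2 (walk-summable: rho(Rbar) = 0.4).
J = np.eye(4)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    J[i, j] = J[j, i] = -0.2
h = np.array([1.0, 0.0, -1.0, 0.5])
mu, var = gaussian_lbp(J, h)
assert np.allclose(mu, np.linalg.solve(J, h))    # means are exact
assert np.all(var < np.diag(np.linalg.inv(J)))   # misses non-backtracking walks
```

The strict variance gap here is exactly the walk-sum over self-return walks of the cycle that are not backtracking, e.g., (1, 2, 3, 4, 1).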
In our framework, each such correlation is a walk-sum over the subset of non-backtracking self-return walks in G which, in the computation tree, begin at a particular replica of the root.

Example 2. Consider the graph in Fig. 1(a). For ρ = .39, the model is walk-summable with ρ(R̄) ≈ .9990. For ρ = .395 and ρ = .4, the model is still valid but is not walk-summable, with ρ(R̄) ≈ 1.0118 and 1.0246 respectively. In Fig. 2(a) we show LBP variances for node 1 (the other nodes are similar) vs. the iteration number. As ρ increases, first the model is walk-summable and LBP converges; then, for a small interval, the model is not walk-summable but LBP still converges;⁵ and for larger ρ LBP does not converge.

⁴ However, they only prove convergence for the subset of diagonally dominant models.
⁵ Hence, walk-summability is sufficient but not necessary for convergence of LBP.

[Figure 2 graphic: panel (a) plots LBP variance traces for ρ = 0.39, 0.395 and 0.4; panel (b) plots ρ(R̄_n) traces, one converging below 1 (LBP converges) and one exceeding 1 (LBP does not converge).]

Figure 2: (a) LBP variances vs. iteration. (b) ρ(R̄_n) vs. iteration.

Also, for ρ = .4, we note that ρ(R) = .8 < 1 and the series Σ_l Rˡ converges (but Σ_l R̄ˡ does not), and LBP does not converge. Hence, ρ(R) < 1 is not sufficient for LBP convergence, showing the importance of the stricter walk-summability condition ρ(R̄) < 1.

5.2 LBP in non-walksummable models

We extend our analysis to develop a tighter condition for convergence of LBP variances based on walk-summability of the computation tree (rather than walk-summability on G).⁶ For trees, walk-summability and validity are equivalent, i.e., J ≻ 0 ⇔ ρ(R̄) < 1; hence our condition is equivalent to validity of the computation tree. First, we note that when a model on G is valid (J is positive definite) but not walk-summable, then some finite computation trees may be invalid (indefinite). This turns out to be the reason why LBP variances can fail to converge. 
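The role of the computation tree can be probed numerically. The sketch below (our own construction, with assumed helper names such as comp_tree_R) unwinds a loopy graph into its n-step computation tree and tracks the spectral radius of the element-wise absolute tree matrix. For a 4-cycle the tree is a simple path, and since only |ρ_{ij}| enters, the sign pattern (−ρ, ρ, ρ, ρ) used later in Example 3 gives the same values as all-positive weights: with ρ = .49 the sequence increases toward a limit below 1, while with ρ = .51 it passes 1.

```python
import numpy as np

# Partial-correlation matrix of the depth-step LBP computation tree of a
# graph with symmetric edge-weight matrix R, rooted at node root.
def comp_tree_R(R, root, depth):
    nodes = [root]
    edges = []                       # (parent index, child index, weight)
    frontier = [(0, root, None)]     # (tree index, graph node, parent graph node)
    for _ in range(depth):
        nxt = []
        for t, v, p in frontier:
            for u in range(len(R)):
                if R[v, u] != 0 and u != p:   # every neighbor except the parent
                    nodes.append(u)
                    c = len(nodes) - 1
                    edges.append((t, c, R[v, u]))
                    nxt.append((c, u, v))
        frontier = nxt
    Rn = np.zeros((len(nodes), len(nodes)))
    for a, b, w in edges:
        Rn[a, b] = Rn[b, a] = w
    return Rn

def spec_rad(A):                     # spectral radius of a symmetric matrix
    return float(np.max(np.abs(np.linalg.eigvalsh(A))))

def cycle_R(r, n=4):                 # n-cycle with all edge weights r
    R = np.zeros((n, n))
    for i in range(n):
        R[i, (i + 1) % n] = R[(i + 1) % n, i] = r
    return R

# The sequence of spectral radii grows with depth; its limit decides well-posedness.
r49 = [spec_rad(np.abs(comp_tree_R(cycle_R(0.49), 0, d))) for d in (5, 20, 60)]
r51 = spec_rad(np.abs(comp_tree_R(cycle_R(0.51), 0, 60)))
assert r49[0] < r49[1] < r49[2] < 1.0   # tree stays walk-summable
assert r51 > 1.0                        # tree eventually invalid, LBP ill-posed
```

This monotone sequence is exactly the quantity whose limit is characterized next.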
Walk-summability of the original GMM implies walk-summability (and hence validity) of all of its computation trees. But if the GMM is not walk-summable, then its computation tree may or may not be walk-summable. In Example 2, for ρ = .395 the computation tree is still walk-summable (even though the model on G is not) and LBP converges. For ρ = .4, the computation tree is not walk-summable and LBP does not converge. Indeed, LBP is not even well-posed in this case (because the computation tree is indefinite), which explains its strange behavior seen in the bottom plot of Fig. 2(a) (e.g., non-monotonicity and negative variances). We characterize walk-summability of the computation tree as follows. Let T_n be the n-step computation tree rooted at some node i and define R_n ≜ I_n − J_n, where J_n is the normalized information matrix on T_n and I_n is the identity matrix of the same size. The n-step computation tree T_n is walk-summable (valid) if and only if ρ(R̄_n) < 1 (in trees, ρ(R_n) = ρ(R̄_n)). The sequence {ρ(R̄_n)} is monotonically increasing and bounded above by ρ(R̄) (see [6]), and hence converges. We are interested in the quantity ρ_∞ ≜ lim_{n→∞} ρ(R̄_n).

⁶ We can focus on one tree: if the computation tree rooted at node i is walk-summable, then so is the computation tree rooted at any node j. Also, if a finite computation tree rooted at node i is not walk-summable, then some finite tree at node j also becomes non-walksummable for n large enough.

Proposition 3 (LBP validity/variance convergence). (i) If ρ_∞ < 1, then all finite computation trees are valid and the LBP variances converge. (ii) If ρ_∞ > 1, then the computation tree eventually becomes invalid and LBP is ill-posed.

Proof Outline. (i) For some ε > 0, ρ(R̄_n) ≤ 1 − ε for all n, which implies: all computation trees are walk-summable and variances monotonically increase; λ_max(R_n) ≤ 1 − ε, λ_min(J_n) ≥ ε, and (P_n)_{i,i} ≤ λ_max(P_n) ≤ 1/ε. The variances are monotonically increasing and bounded above, hence they converge. (ii) If lim_n ρ(R̄_n) > 1, then there exists an m for which ρ(R̄_n) > 1 for all n ≥ m and the computation tree is invalid.

As discussed in [6], LBP is well-posed if and only if the fused information quantities computed in (7) and (9) are strictly positive for all n. Hence, it is easily detected if the LBP computation tree becomes invalid. In this case, continuing to run LBP is not meaningful and will lead to division by zero and/or negative variances.

Example 3. Consider a 4-node cycle with edge weights (−ρ, ρ, ρ, ρ). In Fig. 2(b), for ρ = .49 we plot ρ(R̄_n) vs. n (lower curve) and observe that lim_n ρ(R̄_n) ≈ .98 < 1, and LBP converges (similar to the upper plot of Fig. 2(a)). For ρ = .51 (upper curve), the model defined on the 4-node cycle is still valid, but lim_n ρ(R̄_n) ≈ 1.02 > 1, so LBP is ill-posed and does not converge (similar to the lower plot of Fig. 2(a)).

In non-walksummable models, the series LBP computes for the means is not absolutely convergent and may diverge even when the variances converge (e.g., in Example 2 with ρ = .39867). However, in all cases where variances converge we have observed that with enough damping of BP messages⁷ we also obtain convergence of the means. Apparently, it is the validity of the computation tree that is critical for convergence of Gaussian LBP.

6

Conclusion

We have presented a walk-sum interpretation of inference in GMMs and have applied this framework to analyze convergence of LBP, extending previous results. In future work, we plan to develop extended walk-sum algorithms which gather more walks than LBP. Another approach is to estimate variances by sampling random walks in the graph. We also are interested to explore possible connections between results in [8] (based on self-avoiding walks in Ising models) and sufficient conditions for convergence of discrete LBP [9], which have some parallels to our walk-sum analysis in the Gaussian case. 
Acknowledgments

This research was supported by the Air Force Office of Scientific Research under Grant FA9550-04-1, the Army Research Office under Grant W911NF-051-0207 and by a grant from MIT Lincoln Laboratory.

References

[1] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[2] J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, pages 239-269, 2003.
[3] Y. Weiss and W. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13:2173-2200, 2001.
[4] P. Rusmevichientong and B. Van Roy. An analysis of belief propagation on the turbo decoding graph with Gaussian densities. IEEE Trans. Information Theory, 48(2):745-765, Feb. 2001.
[5] S. Tatikonda and M. Jordan. Loopy belief propagation and Gibbs measures. UAI, 2002.
[6] J. Johnson, D. Malioutov, and A. Willsky. Walk-Summable Gaussian Networks and Walk-Sum Interpretation of Gaussian Belief Propagation. TR-2650, LIDS, MIT, 2005.
[7] K. Plarre and P. Kumar. Extended message passing algorithm for inference in loopy Gaussian graphical models. Ad Hoc Networks, 2004.
[8] M. Fisher. Critical temperatures of anisotropic Ising lattices II, general upper bounds. Physical Review, 162(2), 1967.
[9] A. Ihler, J. Fisher III, and A. Willsky. Message errors in belief propagation. NIPS, 2004.

⁷ Modify (8) as follows: Δh_{i→j}⁽ⁿ⁺¹⁾ = (1 − α) Δh_{i→j}⁽ⁿ⁾ + α (−J_{j,i} (Ĵ_{i∖j}⁽ⁿ⁾)⁻¹ ĥ_{i∖j}⁽ⁿ⁾), with 0 < α ≤ 1. In Example 2, with ρ = .39867 and α = .9 the means converge.
", "award": [], "sourceid": 2833, "authors": [{"given_name": "Dmitry", "family_name": "Malioutov", "institution": null}, {"given_name": "Alan", "family_name": "Willsky", "institution": null}, {"given_name": "Jason", "family_name": "Johnson", "institution": null}]}