{"title": "An Analysis of Turbo Decoding with Gaussian Densities", "book": "Advances in Neural Information Processing Systems", "page_first": 575, "page_last": 581, "abstract": null, "full_text": "An Analysis of Turbo Decoding with Gaussian Densities\n\nPaat Rusmevichientong and Benjamin Van Roy\n\nStanford University\nStanford, CA 94305\n\n{paatrus, bvr}@stanford.edu\n\nAbstract\n\nWe provide an analysis of the turbo decoding algorithm (TDA) in a setting involving Gaussian densities. In this context, we are able to show that the algorithm converges and that - somewhat surprisingly - though the density generated by the TDA may differ significantly from the desired posterior density, the means of these two densities coincide.\n\n1 Introduction\n\nIn many applications, the state of a system must be inferred from noisy observations. Examples include digital communications, speech recognition, and control with incomplete information. Unfortunately, problems of inference are often intractable, and one must resort to approximation methods. One approximate inference method that has recently generated spectacular success in certain coding applications is the turbo decoding algorithm [1, 2], which bears a close resemblance to message-passing algorithms developed in the coding community a few decades ago [4]. It has been shown that the TDA is also related to well-understood exact inference algorithms [5, 6], but its performance on the intractable problems to which it is applied has not been explained through this connection.\n\nSeveral other papers have further developed an understanding of the turbo decoding algorithm. The exact inference algorithms to which turbo decoding has been related are variants of belief propagation [7]. 
However, this algorithm is designed for inference problems for which graphical models describing conditional independencies form trees, whereas graphical models associated with turbo decoding possess many loops. To understand the behavior of belief propagation in the presence of loops, Weiss has analyzed the algorithm for cases where only a single loop is present [11]. Other analyses that have shed significant light on the performance of the TDA in its original coding context include [8, 9, 10].\n\nIn this paper, we develop a new line of analysis for a restrictive setting in which underlying distributions are Gaussian. In this context, inference problems are tractable and the use of approximation algorithms such as the TDA is unnecessary. However, studying the TDA in this context enables a streamlined analysis that generates new insights into its behavior. In particular, we will show that the algorithm converges and that the mean of the resulting distribution coincides with that of the desired posterior distribution.\n\nWhile preparing this paper, we became aware of two related initiatives, both involving analysis of belief propagation when priors are Gaussian and graphs possess cycles. Weiss and Freeman [12] were studying the case of graphs possessing only cliques of size two. Here, they were able to show that, if belief propagation converges, the mean of the resulting approximation coincides with that of the true posterior distribution. At the same time, Frey [3] studied a case involving graphical structures that generalize those employed in turbo decoding. He also conducted an empirical study.\n\nThe paper is organized as follows. In Section 2, we provide our working definition of the TDA. In Section 3, we analyze the case of Gaussian densities. 
Finally, a discussion of experimental results and open issues is presented in Section 4.\n\n2 A Definition of Turbo Decoding\n\nConsider a random variable x taking on values in R^n, distributed according to a density p0. Let y1 and y2 be two random variables that are conditionally independent given x. For example, y1 and y2 might represent outcomes of two independent transmissions of the signal x over a noisy communication channel. If y1 and y2 are observed, then one might want to infer a posterior density f for x conditioned on y1 and y2. This can be obtained by first computing densities p1 and p2, where the first is conditioned on y1 and the second is conditioned on y2. Then,\n\nf = \u03b1(p1 p2 / p0),\n\nwhere \u03b1 is a \"normalizing operator\" defined by\n\n\u03b1g = g / \u222b g(x) dx,\n\nand multiplication/division are carried out pointwise.\n\nUnfortunately, the problem of computing f is generally intractable. The computational burden associated with storing and manipulating high-dimensional densities appears to be the primary obstacle. This motivates the idea of limiting attention to densities that factor. In this context, it is convenient to define an operator \u03c0 that generates a density that factors while possessing the same marginals as another density. In particular, this operator is defined by\n\n(\u03c0g)(a) = \u220f_{i=1}^n \u222b g(x1, ..., x_{i-1}, a_i, x_{i+1}, ..., xn) dx \u2227 dx_i,\n\nfor all densities g and all a \u2208 R^n, where dx \u2227 dx_i = dx1 ... dx_{i-1} dx_{i+1} ... dxn. One may then aim at computing \u03c0f as a proxy for f. Unfortunately, even this problem is generally intractable. The TDA can be viewed as an iterative algorithm for approximating \u03c0f.\n\nLet operators F1 and F2 be defined by\n\nF1 g = \u03b1((\u03c0(p1 g / p0)) p0 / g)\n\nand\n\nF2 g = \u03b1((\u03c0(g p2 / p0)) p0 / g),\n\nfor any density g. The TDA is applicable in cases where computation of these two operations is tractable. 
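The operators \u03b1 and \u03c0 above can be illustrated on a discrete grid. The following is a sketch of our own (not code from the paper), in which a "density" is a 2-D grid of nonnegative weights; \u03b1 normalizes the grid, and \u03c0 replaces it with the product of its marginals:

```python
# Discrete sketch of the normalizing operator alpha and the
# marginal-product operator pi from Section 2 (our illustration only).
# A "density" here is a 2-D grid of nonnegative weights.

def alpha(g):
    """Normalize a 2-D grid so its entries sum to one (discrete alpha)."""
    total = sum(sum(row) for row in g)
    return [[v / total for v in row] for row in g]

def marginals(g):
    """Row and column marginals of a normalized 2-D grid."""
    row_m = [sum(row) for row in g]
    col_m = [sum(col) for col in zip(*g)]
    return row_m, col_m

def pi(g):
    """Discrete pi: the product of the marginals of g, i.e. a density
    that factors while matching g's marginals."""
    row_m, col_m = marginals(g)
    return [[r * c for c in col_m] for r in row_m]

# A correlated (non-factoring) density on a 3x3 grid.
g = alpha([[4.0, 1.0, 0.5],
           [1.0, 4.0, 1.0],
           [0.5, 1.0, 4.0]])
pg = pi(g)

# pi(g) factors by construction and has the same marginals as g.
for a, b in zip(marginals(g), marginals(pg)):
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

Note that pg differs from g entry by entry (g does not factor), yet the two grids are indistinguishable through their marginals; this is exactly the sense in which \u03c0f serves as a factored proxy for f.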
The algorithm generates sequences q1^(k) and q2^(k) according to\n\nq1^(k+1) = F1 q2^(k) and q2^(k+1) = F2 q1^(k),\n\ninitialized with densities q1^(0) and q2^(0) that factor. The hope is that \u03b1(q1^(k) q2^(k) / p0) converges to an approximation of \u03c0f.\n\n3 The Gaussian Case\n\nWe will consider a setting in which the joint density of x, y1, and y2 is Gaussian. In this context, application of the TDA is not warranted - there are tractable algorithms for computing conditional densities when priors are Gaussian. Our objective, however, is to provide a setting in which the TDA can be analyzed and new insights can be generated.\n\nBefore proceeding, let us define some notation that will facilitate our exposition. We will write g ~ N(\u03bcg, \u03a3g) to denote a Gaussian density g whose mean vector and covariance matrix are \u03bcg and \u03a3g, respectively. For any matrix A, \u03b4(A) will denote a diagonal matrix whose entries are given by the diagonal elements of A. For any diagonal matrices X and Y, we write X \u2264 Y if X_ii \u2264 Y_ii for all i. For any pair of nonsingular covariance matrices \u03a3u and \u03a3v such that \u03a3u^{-1} + \u03a3v^{-1} - I is nonsingular, let a matrix \u039b_{\u03a3u,\u03a3v} be defined by\n\n\u039b_{\u03a3u,\u03a3v} = (\u03a3u^{-1} + \u03a3v^{-1} - I)^{-1}.\n\nTo reduce notation, we will sometimes denote this matrix by \u039b_{uv}.\n\nWhen the random variables x, y1, and y2 are jointly Gaussian, the densities p1, p2, f, and p0 are also Gaussian. We let\n\np1 ~ N(\u03bc1, \u03a31), p2 ~ N(\u03bc2, \u03a32), f ~ N(\u03bc, \u03a3),\n\nand assume that both \u03a31 and \u03a32 are symmetric positive definite matrices. We will also assume that p0 ~ N(0, I), where I is the identity matrix. It is easy to show that \u039b_{\u03a31,\u03a32} is well-defined.\n\nThe following lemma provides formulas for the means and covariances that arise from multiplying and rescaling Gaussian densities. The result follows from simple algebra, and we state it without proof. 
\n\nLemma 1  Let u ~ N(\u03bcu, \u03a3u) and v ~ N(\u03bcv, \u03a3v), where \u03a3u and \u03a3v are positive definite. If \u03a3u^{-1} + \u03a3v^{-1} - I is positive definite, then\n\n\u03b1(uv / p0) ~ N(\u039b_{uv} (\u03a3u^{-1} \u03bcu + \u03a3v^{-1} \u03bcv), \u039b_{uv}).\n\nOne immediate consequence of this lemma is an expression for the mean of f:\n\n\u03bc = \u039b_{\u03a31,\u03a32} (\u03a31^{-1} \u03bc1 + \u03a32^{-1} \u03bc2).\n\nLet S denote the set of covariance matrices that are diagonal and positive definite. Let G denote the set of Gaussian densities with covariance matrices in S. We then have the following result, which we state without proof.\n\nLemma 2  The set G is closed under F1 and F2.\n\nIf the TDA is initialized with q1^(0), q2^(0) \u2208 G, this lemma allows us to represent all iterates using appropriate mean vectors and covariance matrices.\n\n3.1 Convergence Analysis\n\nUnder suitable technical conditions, it can be shown that the sequence of mean vectors and covariance matrices generated by the TDA converges. Due to space limitations, we will only present results pertinent to the convergence of covariance matrices. Furthermore, we will only present certain central components of the analyses. For more complete results and detailed analyses, we refer the reader to our upcoming full-length paper.\n\nRecall that the TDA generates sequences q1^(k) and q2^(k) according to\n\nq1^(k+1) = F1 q2^(k) and q2^(k+1) = F2 q1^(k).\n\nAs discussed earlier, if the algorithm is initialized with elements of G, by Lemma 2,\n\nq1^(k) ~ N(m1^(k), \u03a31^(k)) and q2^(k) ~ N(m2^(k), \u03a32^(k)),\n\nfor appropriate sequences of mean vectors and covariance matrices. It turns out that there are mappings T1 : S \u2192 S and T2 : S \u2192 S such that\n\n\u03a31^(k+1) = T1(\u03a32^(k)) and \u03a32^(k+1) = T2(\u03a31^(k)),\n\nfor all k. Let T = T1 \u2218 T2. To establish convergence of \u03a31^(k) and \u03a32^(k), it suffices to show that T^n(\u03a32^(0)) converges. 
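The covariance recursion can be made concrete numerically. The sketch below iterates an explicit form of T1 and T2 that one can derive from the definitions of F1 and F2 in this Gaussian setting, namely T_i(D) = (\u03b4((\u03a3_i^{-1} + D^{-1} - I)^{-1})^{-1} + I - D^{-1})^{-1} for diagonal D; this closed form is our own derivation, not a formula stated in the paper, so treat it as an assumption of the sketch. On a 2-dimensional example, the iterates converge to the same diagonal fixed point from different initializations:

```python
# Numerical sketch of the covariance recursion Sigma_1 <- T1(Sigma_2),
# Sigma_2 <- T2(Sigma_1) for 2-dimensional Gaussians. The closed form of
# T1, T2 used here is our own derivation from the definitions of F1, F2
# (an assumption, not a formula given in the paper).

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def T_step(sigma, diag):
    """One operator T_i: sigma is the 2x2 covariance of p_i, diag is the
    diagonal iterate (d1, d2). Returns the new diagonal pair."""
    d1, d2 = diag
    s_inv = inv2(sigma)
    # M = Sigma_i^{-1} + D^{-1} - I
    m = [[s_inv[0][0] + 1.0 / d1 - 1.0, s_inv[0][1]],
         [s_inv[1][0], s_inv[1][1] + 1.0 / d2 - 1.0]]
    lam = inv2(m)  # Lambda = M^{-1}
    # New precision: delta(Lambda)^{-1} + I - D^{-1}; invert entrywise.
    return (1.0 / (1.0 / lam[0][0] + 1.0 - 1.0 / d1),
            1.0 / (1.0 / lam[1][1] + 1.0 - 1.0 / d2))

# Example covariances Sigma_1, Sigma_2, chosen so that
# Sigma_1^{-1} + Sigma_2^{-1} - I is positive definite.
sigma1 = [[0.9, 0.2], [0.2, 0.8]]
sigma2 = [[0.7, -0.1], [-0.1, 0.9]]

def iterate(diag, n=500):
    """Apply T = T1 o T2 to a diagonal covariance n times."""
    for _ in range(n):
        diag = T_step(sigma1, T_step(sigma2, diag))
    return diag

lim_a = iterate((1.0, 1.0))   # initialized at the identity
lim_b = iterate((0.5, 2.0))   # a different diagonal initialization
# Both initializations reach the same fixed point, as Theorem 1 asserts.
assert max(abs(x - y) for x, y in zip(lim_a, lim_b)) < 1e-8
```

The iterates remain diagonal and positive throughout, consistent with Lemma 2, and the common limit does not depend on the initialization.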
The following theorem establishes this and further points out that the limit does not depend on the initial iterates.\n\nTheorem 1  There exists a matrix V* \u2208 S such that\n\nlim_{n\u2192\u221e} T^n(V) = V*,\n\nfor all V \u2208 S.\n\n3.1.1 Preliminary Lemmas\n\nOur proof of Theorem 1 relies on a few lemmas that we will present in this section. We begin with a lemma that captures important abstract properties of the function T. Due to space constraints, we omit the proof, even though it is nontrivial.\n\nLemma 3\n(a) There exists a matrix D \u2208 S such that for all V \u2208 S, D \u2264 T(V) \u2264 I.\n(b) For all X, Y \u2208 S, if X \u2264 Y then T(X) \u2264 T(Y).\n(c) The function T is continuous on S.\n(d) For all \u03b2 \u2208 (0, 1) and V \u2208 S, (\u03b2 + \u03b1) T(V) \u2264 T(\u03b2V) for some \u03b1 > 0.\n\nThe following lemma establishes convergence when the sequence of covariance matrices is initialized with the identity matrix.\n\nLemma 4  The sequence T^n(I) converges in S to a fixed point of T.\n\nProof: By Lemma 3(a), T(I) \u2264 I, and it follows from monotonicity of T (Lemma 3(b)) that T^{n+1}(I) \u2264 T^n(I) for all n. Since T^n(I) is bounded below by a matrix D \u2208 S, the sequence converges in S. The fact that the limit is a fixed point of T follows from the continuity of T (Lemma 3(c)). \u2022\n\nLet V* = lim_{n\u2192\u221e} T^n(I). This matrix plays the following special role.\n\nLemma 5  The matrix V* is the unique fixed point in S of T.\n\nProof: Because T^n(I) converges to V* and T is monotonic, no matrix V \u2208 S with V \u2260 V* and V* \u2264 V \u2264 I can be a fixed point. Furthermore, by Lemma 3(a), no matrix V \u2208 S with V \u2265 I and V \u2260 I can be a fixed point. For any V \u2208 S with V \u2264 V*, let\n\n\u03b2_V = sup{\u03b2 \u2208 (0, 1] : \u03b2V* \u2264 V}.\n\nFor any V \u2208 S with V \u2260 V* and V \u2264 V*, we have \u03b2_V < 1. For such a V, by Lemma 3(d), there is an \u03b1 > 0 such that T(\u03b2_V V*) \u2265 (\u03b2_V + \u03b1) V*. By monotonicity, T(V) \u2265 T(\u03b2_V V*) \u2265 (\u03b2_V + \u03b1) V*, so if T(V) = V then \u03b2_V \u2265 \u03b2_V + \u03b1, a contradiction; therefore T(V) \u2260 V. The result follows. 
\n\u2022\n\n3.1.2 Proof of Theorem 1\n\nProof: For V \u2208 S with V* \u2264 V \u2264 I, convergence to V* follows from Lemma 4 and monotonicity (Lemma 3(b)). For V \u2208 S with V \u2265 I, convergence follows from the fact that V* \u2264 T(V) \u2264 I, which is a consequence of the two previously invoked lemmas together with Lemma 3(a).\n\nLet us now address the case of V \u2208 S with V \u2264 V*. Let \u03b2_V be defined as in the proof of Lemma 5. Then \u03b2_V V* \u2264 T(\u03b2_V V*). By monotonicity, T^n(\u03b2_V V*) \u2264 T^{n+1}(\u03b2_V V*) \u2264 V* for all n. It follows that T^n(\u03b2_V V*) converges, and since T is continuous, the limit must be the unique fixed point V*. We have established convergence for elements V of S satisfying V \u2264 V* or V \u2265 V*. For other elements of S, convergence follows from the monotonicity of T. \u2022\n\n3.2 Analysis of the Fixed Point\n\nAs discussed in the previous section, under suitable conditions, F1 \u2218 F2 and F2 \u2218 F1 each possess a unique fixed point, and the TDA converges on these fixed points. Let q1* ~ N(\u03bc_{q1*}, \u03a3_{q1*}) and q2* ~ N(\u03bc_{q2*}, \u03a3_{q2*}) denote the fixed points of F1 \u2218 F2 and F2 \u2218 F1, respectively. Based on Theorem 1, \u03a3_{q1*} and \u03a3_{q2*} are in S.\n\nThe following lemma provides an equation relating means associated with the fixed points. It is not hard to show that \u039b_{q1* q2*}, \u039b_{\u03a31, \u03a3_{q2*}}, and \u039b_{\u03a3_{q1*}, \u03a32}, which are used in the statement, are well-defined.\n\nLemma 6\n\u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u039b_{\u03a31, \u03a3_{q2*}} (\u03a31^{-1} \u03bc1 + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u039b_{\u03a3_{q1*}, \u03a32} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a32^{-1} \u03bc2)\n\nProof: It follows from the definitions of F1 and F2 that, if q1* = F1 q2* and q2* = F2 q1*, then\n\n\u03b1(q1* q2* / p0) = \u03b1 \u03c0(p1 q2* / p0) = \u03b1 \u03c0(q1* p2 / p0).\n\nThe result then follows from Lemma 1 and the fact that \u03c0 does not alter the mean of a distribution. \u2022\n\nWe now prove a central result of this paper: the mean of the density generated by the TDA coincides with the mean \u03bc of the desired posterior density f. 
\n\nTheorem 2  \u03b1(q1* q2* / p0) ~ N(\u03bc, \u039b_{q1* q2*})\n\nProof: By Lemma 1, \u03bc = \u039b_{\u03a31,\u03a32} (\u03a31^{-1} \u03bc1 + \u03a32^{-1} \u03bc2), while the mean of \u03b1(q1* q2* / p0) is \u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}). We will show that these two expressions are equal. Multiplying the equations from Lemma 6 by appropriate matrices, we obtain\n\n\u039b_{\u03a31, \u03a3_{q2*}}^{-1} \u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u03a31^{-1} \u03bc1 + \u03a3_{q2*}^{-1} \u03bc_{q2*},\n\nand\n\n\u039b_{\u03a3_{q1*}, \u03a32}^{-1} \u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a32^{-1} \u03bc2.\n\nAdding these equations and subtracting the identity \u039b_{q1* q2*}^{-1} \u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}, it follows that\n\n(\u039b_{\u03a31, \u03a3_{q2*}}^{-1} + \u039b_{\u03a3_{q1*}, \u03a32}^{-1} - \u039b_{q1* q2*}^{-1}) \u039b_{q1* q2*} (\u03a3_{q1*}^{-1} \u03bc_{q1*} + \u03a3_{q2*}^{-1} \u03bc_{q2*}) = \u03a31^{-1} \u03bc1 + \u03a32^{-1} \u03bc2.\n\nSince \u039b_{\u03a31, \u03a3_{q2*}}^{-1} + \u039b_{\u03a3_{q1*}, \u03a32}^{-1} - \u039b_{q1* q2*}^{-1} = \u03a31^{-1} + \u03a32^{-1} - I = \u039b_{\u03a31,\u03a32}^{-1}, the mean of \u03b1(q1* q2* / p0) equals \u039b_{\u03a31,\u03a32} (\u03a31^{-1} \u03bc1 + \u03a32^{-1} \u03bc2) = \u03bc. The covariance follows from Lemma 1. \u2022\n\n4 Discussion and Experimental Results\n\nThe limits of convergence q1* and q2* of the TDA provide an approximation \u03b1(q1* q2* / p0) to \u03c0f. We have established that the mean of this approximation coincides with that of the desired density. One might further expect that the covariance matrix of \u03b1(q1* q2* / p0) approximates that of \u03c0f, and even more so, that q1* and q2* bear some relation to p1 and p2. Unfortunately, as will be illustrated by experimental results in this section, such expectations appear to be inaccurate.\n\nWe performed experiments involving 20- and 50-dimensional Gaussian densities (i.e., x was either 20- or 50-dimensional in each instance). Problem instances were sampled randomly from a fixed distribution. Due to space limitations, we will not describe the tedious details of the sampling mechanism.\n\nFigure 1: Evolution of errors.\n\nFigure 1 illustrates the evolution of certain \"errors\" during representative runs of the TDA on 20-dimensional problems. The first graph plots relative errors in means of densities \u03b1(q1^(n) q2^(n) / p0) generated by iterates of the TDA. 
As indicated by our analysis, these errors converge to zero. The second chart plots a measure of relative error for the covariance of \u03b1(q1^(n) q2^(n) / p0) versus that of \u03c0f for representative runs. Though these covariances converge, the ultimate errors are far from zero. The two final graphs plot errors between the means of q1^(n) and q2^(n) and those of p1 and p2, respectively. Again, though these means converge, the ultimate errors can be large.\n\nFigure 2: Errors after 50 iterations.\n\nFigure 2 provides plots of the same sorts of errors measured on 1000 different instances of 50-dimensional problems after the 50th iteration of the TDA. The horizontal axes are labeled with indices of the problem instances. Note that the errors in the first graph are all close to zero (the units on the vertical axis must be multiplied by 10^{-5}, and errors are measured in relative terms). On the other hand, errors in the other graphs vary dramatically.\n\nIt is intriguing that - at least in the context of Gaussian densities - the TDA can effectively compute conditional means without accurately approximating conditional densities. It is also interesting to note that, in the context of communications, the objective is to choose a code word that comes close to the transmitted code word x. One natural way to do this involves choosing the code word that maximizes the conditional density f, i.e., the one that has the highest chance of being correct. In the Gaussian case that we have studied, this corresponds to the mean of f - a quantity that is computed correctly by the TDA! It will be interesting to explore generalizations of the line of analysis presented in this paper to other classes of densities.\n\nReferences\n\n[1] S. Benedetto and G. 
Montorsi, \"Unveiling turbo codes: Some results on parallel concatenated coding schemes,\" IEEE Trans. Inform. Theory, vol. 42, pp. 409-428, Mar. 1996.\n\n[2] C. Berrou, A. Glavieux, and P. Thitimajshima, \"Near Shannon limit error-correcting coding and decoding: Turbo codes,\" in Proc. 1993 Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064-1070.\n\n[3] B. Frey, \"Turbo factor analysis.\" To appear in Advances in Neural Information Processing Systems 12.\n\n[4] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.\n\n[5] F. R. Kschischang and B. J. Frey, \"Iterative decoding of compound codes by probability propagation in graphical models,\" IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 219-230, Feb. 1998.\n\n[6] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, \"Turbo decoding as an instance of Pearl's 'belief propagation' algorithm,\" IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 140-152, Feb. 1998.\n\n[7] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.\n\n[8] T. Richardson, \"The geometry of turbo-decoding dynamics,\" Dec. 1998. To appear in IEEE Trans. Inform. Theory.\n\n[9] T. Richardson and R. Urbanke, \"The capacity of low-density parity-check codes under message-passing decoding,\" submitted to IEEE Trans. Inform. Theory.\n\n[10] T. Richardson, A. Shokrollahi, and R. Urbanke, \"Design of provably good low-density parity-check codes,\" submitted to IEEE Trans. Inform. Theory.\n\n[11] Y. Weiss, \"Belief propagation and revision in networks with loops,\" November 1997. Available by ftp from publications.ai.mit.edu.\n\n[12] Y. Weiss and W. T. Freeman, \"Correctness of belief propagation in Gaussian graphical models of arbitrary topology.\" To appear in Advances in Neural Information Processing Systems 12. 
", "award": [], "sourceid": 1753, "authors": [{"given_name": "Paat", "family_name": "Rusmevichientong", "institution": null}, {"given_name": "Benjamin", "family_name": "Van Roy", "institution": null}]}