{"title": "An Application of Tree-Structured Expectation Propagation for Channel Decoding", "book": "Advances in Neural Information Processing Systems", "page_first": 1854, "page_last": 1862, "abstract": "We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable.", "full_text": "An Application of Tree-Structured Expectation\n\nPropagation for Channel Decoding\n\nPablo M. Olmos\u2217, Luis Salamanca\u2217, Juan J. Murillo-Fuentes\u2217, Fernando P\u00b4erez-Cruz\u2020\n\n\u2217 Dept. of Signal Theory and Communications, University of Sevilla\n\n\u2020 Dept. of Signal Theory and Communications, University Carlos III in Madrid\n\n{olmos,salamanca,murillo}@us.es\n\n41092 Sevilla Spain\n\n28911 Legan\u00b4es (Madrid) Spain\nfernando@tsc.uc3m.es\n\nAbstract\n\nWe show an application of a tree structure for approximate inference in graphical\nmodels using the expectation propagation algorithm. These approximations are\ntypically used over graphs with short-range cycles. We demonstrate that these\napproximations also help in sparse graphs with long-range loops, as the ones\nused in coding theory to approach channel capacity. 
For asymptotically large sparse graphs, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors.\n\n1 Introduction\n\nBelief propagation (BP) has become the standard procedure to decode channel codes, since in 1996 MacKay [7] proposed BP to decode codes based on low-density parity-check (LDPC) matrices with linear complexity. A rate r = k/n LDPC code can be represented as a sparse factor graph with n variable nodes (typically depicted on the left side) and n \u2212 k factor nodes (on the right side), in which the number of edges is linear in n [15]. The first LDPC codes [6] presented a regular structure, in which all variables and factors had, respectively, \u2113 and r connections, i.e. an (\u2113, r) LDPC code. But the analysis of their limiting decoding performance, when n tends to infinity for a fixed rate, showed that they do not approach the channel capacity [15]. To improve the performance of regular LDPC codes, we can define an (irregular) LDPC ensemble as the set of codes randomly generated according to the degree distribution (DD) from the edge perspective as follows:\n\n\u03bb(x) = \u2211_{i=1}^{lmax} \u03bbi x^{i\u22121} and \u03c1(x) = \u2211_{j=1}^{rmax} \u03c1j x^{j\u22121},\n\nwhere the fraction of edges with left degree i (from variables to factors) is given by \u03bbi and the fraction of edges with right degree j (from factors to variables) is given by \u03c1j. The left (right) degree of an edge is the degree of the variable (factor) node it is connected to. 
The rate of the code is then given by r = 1 \u2212 \u222b_0^1 \u03c1(x)dx / \u222b_0^1 \u03bb(x)dx, and the total number of edges by E = n/(\u2211_i \u03bbi/i).\nAlthough optimized irregular LDPC codes can achieve the channel capacity with a decoder based on BP [15], they present several drawbacks. First, the error floor in those codes increases significantly, because capacity-achieving LDPC ensembles with BP decoding have a large fraction of variables with two connections and they present low minimum distances. Second, the maximum number of ones per column lmax tends to infinity to approach capacity. These problems limit the BP decoding performance of capacity-approaching codes, when we work with finite-length codes used in real applications.\nApproximate inference in graphical models can be solved using more accurate methods that significantly improve the BP performance, especially for dense graphs with short-range loops. A non-exhaustive list of methods is: generalized BP [22], expectation propagation (EP) [10], fractional BP [19], linear programming [17] and power EP [8]. A detailed list of contributions to approximate inference can be found in [18] and the references therein. But it is a common belief that BP is sufficiently accurate to decode LDPC codes and that other approximate inference algorithms would not outperform BP decoding significantly, if at all.
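The rate and edge-count expressions above are easy to check numerically. A minimal sketch in Python (the helper name `ensemble_stats` and the `{degree: edge-fraction}` encoding of the DD are our own, not from the paper):

```python
# Code rate and edge count from the edge-perspective degree distribution:
# r = 1 - (int_0^1 rho(x) dx) / (int_0^1 lambda(x) dx),  E = n / sum_i(lambda_i / i).
# `ensemble_stats` and the dictionary encoding are an illustrative sketch.

def ensemble_stats(lam, rho, n):
    """lam[i] = fraction of edges with left degree i; rho[j] likewise."""
    int_lam = sum(f / i for i, f in lam.items())   # int_0^1 lambda(x) dx
    int_rho = sum(f / j for j, f in rho.items())   # int_0^1 rho(x) dx
    return 1.0 - int_rho / int_lam, n / int_lam    # (rate r, total edges E)

# Regular (3, 6) ensemble: lambda(x) = x^2, rho(x) = x^5, so r = 1/2.
rate, E = ensemble_stats({3: 1.0}, {6: 1.0}, n=1024)
```

For the regular (3, 6) case this gives r = 1/2 and E = 3n, as expected for a graph whose variable nodes all have degree three.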
In this paper, we challenge that belief and show that more accurate approximate inference algorithms for graphical models can also improve the BP decoding performance for LDPC codes, which are sparse graphical models with long-range loops. We particularly focus on tree-structured approximations for inference in graphical models [9] using the expectation propagation (EP) algorithm, because it presents a simple algorithmic implementation for decoding LDPC codes transmitted over the binary erasure channel (BEC)1, although other higher-order inference algorithms might be suitable for this problem as well, since a connection between some of them was proven in [20]. We show the results for the BEC, because it has a simple structure amenable to deeper analysis and most of its properties carry over to actual communication channels [14]. The EP with a tree-structured approximation can be presented in a similar way as the BP decoder for an LDPC code over the BEC [11], with similar run-time complexity. We show that a decoder based on EP with a tree-structured approximation converges to the BP solution in the asymptotic limit n \u2192 \u221e, whereas for finite-length graphs the performance is significantly improved [13, 11]. For finite graphs, the presence of cycles in the graph degrades the BP estimate, and we show that the EP solution with a tree-structured approximation is less sensitive to the presence of such loops and provides more accurate estimates for the marginal of each bit.
 This makes expectation propagation with a tree-structured approximation (for short, we refer to this algorithm as tree-structured EP or TEP) a more practical decoding algorithm for finite-length LDPC codes.\nBesides, the analysis of the application of the tree-structured EP to channel decoding over the BEC leads to another way of fixing the approximating tree structure, different from the one proposed in [9] for dense graphs with positive correlation potentials. In channel coding, the factors of the graph are parity checks and the correlations are high, but they can change from positive to negative by the flip of a single variable. Therefore, the pairwise mutual information is zero for any two variables (unless the factor only contains two variables) and we could not define a prefixed tree structure with the algorithm in [9]. In contrast, we propose a tree structure that is learnt on the fly from the graph itself; hence it might be amenable for other potentials and sparser graphs.\nThe rest of the paper is organized as follows. In Section 2, we present the peeling decoder, which is the interpretation of the BP algorithm for LDPC codes over the BEC, and show how it can be extended to incorporate the tree-structured EP decoding procedure. In Section 3, we analyze the TEP decoder performance for LDPC codes in both the asymptotic and the finite-length regimes, and we provide an estimate of the TEP decoder error rate for a given LDPC ensemble. We conclude the paper in Section 4.\n\n2 Tree-structured EP and the peeling decoder\n\nThe BP algorithm was proposed as a message-passing algorithm [5] but, for the BEC, it exhibits a simpler formulation, in which the non-erased variable nodes are removed from the graph in each iteration [4], because we either have absolute certainty about the received bit (0 or 1) or complete ignorance (?).
 The BP under this interpretation is referred to as the peeling decoder (PD) [3, 15] and it is easily described using the factor graph of the code. The first step is to initialize the graph by removing all the variable nodes corresponding to non-erased bits. When removing a one-valued non-erased variable node, the parities of the factors it was connected to are flipped. After the initialization stage, the algorithm proceeds over the resulting graph by removing a factor and a variable node in each step:\n\n1. It looks for any factor linked to a single variable (a check node of degree one). The peeling decoder copies the parity of this factor into the variable node and removes the factor.\n\n2. It removes the variable node that we have just de-erased. If the variable was assigned a one, it changes the parity of the factors it was connected to.\n\n3. It repeats Steps 1 and 2 until all the variable nodes have been removed (successful decoding), or until there are no degree-one factors left (unsuccessful decoding).\n\nWe illustrate an example of the PD for a 1/2-rate code with four variables in Figure 1. The first and last bits have not been erased and, when we remove them from the graph, the second factor is singly connected to the third variable, which can now be de-erased (Figure 1(b)). Finally, the first factor is singly connected to the second variable, decoding the transmitted codeword (Figure 1(c)).\n\n(a)\n\n(b)\n\n(c)\n\nFigure 1: Example of the PD algorithm for LDPC channel decoding in the erasure channel.\n\nThe analysis of the PD for fixed-rate codes, proposed in [3, 4], makes it possible to compute its threshold in the BEC.\n\n1The BEC allows binary transmission, in which the bits are either erased with probability \u03b5 or arrive without error otherwise. The capacity of this channel is 1 \u2212 \u03b5 and it is achieved with equiprobable inputs [2].\n\n
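The PD steps above can be sketched in a few lines. A hedged sketch, assuming a dictionary-of-sets representation of the residual graph; all names are illustrative, and the initialization stage (removing non-erased variables) is assumed already done:

```python
# Minimal sketch of the peeling decoder (PD) over the BEC.
# `checks` maps each factor to the set of still-erased variables it touches,
# `parity` holds its current parity bit. All names are illustrative.

def peeling_decode(checks, parity, erased):
    """Return (values recovered for de-erased variables, success flag)."""
    value, progress = {}, True
    while erased and progress:
        progress = False
        for f, nbrs in checks.items():
            if len(nbrs) == 1:                 # Step 1: degree-one factor
                (v,) = nbrs
                value[v] = parity[f]           # copy its parity into the variable
                erased.discard(v)
                for g, nb in checks.items():   # Step 2: remove the variable,
                    if v in nb:                # flipping parities if it is a one
                        nb.discard(v)
                        parity[g] ^= value[v]
                progress = True
                break                          # Step 3: rescan for degree one
    return value, not erased

# Toy run: P1 checks {V2, V3} with parity 1, P2 checks {V3} with parity 1.
vals, ok = peeling_decode({"P1": {"v2", "v3"}, "P2": {"v3"}},
                          {"P1": 1, "P2": 1}, {"v2", "v3"})
# ok is True: V3 is de-erased to 1 via P2, then V2 to 0 via P1.
```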
This result can be used to optimize the DD to build irregular LDPC codes that, as n tends to infinity, approach the channel capacity. However, as already discussed, these codes present higher error floors, because they have many variables with only two edges, and they usually present poor finite-length performance due to the slow convergence to the asymptotic limit [15].\n\n2.1 The TEP decoder\n\nThe tree-structured EP overlays a tree on the variables of the graph to further impose pairwise marginal constraints. In the procedure proposed in [9], the tree was defined by measuring the mutual information between pairs of variables before running the EP algorithm. The mutual information between a pair of variables is zero for parity-check factors with more than two variables, so we need to define the structure in another way. We propose to define the tree structure on the fly. Assume that we run the PD of the previous section and it yields an unsuccessful decoding. Any factor of degree two in the remaining graph tells us either that the connected variables are equal (if the parity check is zero) or that they are opposite (if the parity check is one). We should link these two variables by the tree structure, because their pairwise marginal would provide further information about the remaining erased variables in the graph. The proposed algorithm actually replaces one variable by the other and iterates until a factor of degree one is created and more variables can be de-erased. When this happens, a tree structure has been created, in which the pairwise marginal constraint provides information that was not available with single-marginal approximations.\nThe TEP decoder can be explained in a similar fashion as the PD, in which instead of looking only for degree-one factors, we look for factors of degree one and two.
 We initialize the TEP decoder, as the PD, by removing all known variable nodes and updating the parity checks for the variables that are one. The TEP then removes a variable and a factor per iteration:\n\n1. It looks for a factor of degree one or two.\n\n2. If a factor of degree one is found, the TEP recovers the associated variable, performing Steps 1 and 2 of the PD previously described.\n\n3. If a factor of degree two is found, the decoder removes it from the graph together with one of the variable nodes connected to it and the two associated edges. Then, it reconnects to the remaining variable node all the factors that were connected to the removed variable node. The parities of the factors re-connected to the remaining variable node are reversed if the removed factor had parity one.\n\n4. Steps 1-3 are repeated until all the variable nodes have been removed (successful decoding), or the graph runs out of factors of degree one or two (unsuccessful decoding).\n\nThe process of removing a factor of degree two is sketched in Figure 2. First, the variable V1 inherits the connections from V2 (solid lines), see Figure 2(b). Finally, the factor P1 and the variable V2 can be removed (Figure 2(c)), because they have no further implication in the decoding process. V2 is de-erased once V1 is de-erased. The TEP removes a factor and a variable node per iteration, as the PD does. The removal of a factor and a variable does not increase the complexity of the TEP decoder compared to the BP algorithm: both the TEP and BP algorithms have complexity O(n).\n\n(a)\n\n(b)\n\n(c)\n\nFigure 2: In (a) we show two variable nodes, V1 and V2, that share a factor of degree two, P1. In (b), V1 inherits the connections of V2 (solid lines). In (c), we show the graph once P1 and V2 have been removed. 
If P1 has parity one, the parities of P2 and P3 are reversed.\n\nBy removing factors of degree two, we eventually create factors of degree one whenever we find a scenario equivalent to the one depicted in Figure 3. Consider two variable nodes connected to a factor of degree two that also share another factor of degree three, as illustrated in Figure 3(a). When we remove the factor P3 and the variable node V2, the factor P4 now has degree one, as illustrated in Figure 3(b). At the beginning of the decoding algorithm, it is unlikely that the two variable nodes in a factor of degree two also share a factor of degree three. However, as we remove variables and factors, the probability of this event grows.\nNote that, when we remove a factor of degree two connected to variables V1 and V2, in terms of the EP algorithm we are including a pairwise factor between both variables. Therefore, the TEP equivalent tree structure is not fixed a priori and we construct it along the decoding process. Also, the steps of the TEP decoder can be presented as a linear combination of the columns of the parity-check matrix of the code, and hence its solution is independent of the processing order.\n\n(a)\n\n(b)\n\nFigure 3: In (a), the variables V1 and V2 are connected to a degree-two factor, P3, and they also share a factor of degree three, P4. In (b), we show the graph once the TEP has removed P3 and V2.\n\n3 TEP analysis: expected graph evolution\n\nWe now sketch the proof of why the TEP decoder outperforms BP. The actual proof can be found in [12] (available as supplementary material). Both the PD and the TEP decoder sequentially reduce the LDPC graph by removing check nodes of degree one or two. As a consequence, the decoding process yields a sequence of residual graphs and their associated DDs. The DD sequence of the residual graphs constitutes a sufficient statistic to analyze this random process [1].
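The degree-two removal rule of the TEP decoder (Step 3 in Section 2.1) amounts to substituting one variable of the degree-two factor by the other. A minimal sketch with illustrative names, assuming a dictionary-of-sets representation of the residual graph; this is our reading of the rule, not the authors' implementation:

```python
# Sketch of the TEP degree-two step: for a degree-two factor f = {keep, drop},
# `keep` inherits drop's other factor connections, and f and drop are removed.
# If f has parity one, the parities of the re-connected factors are reversed.
# When keep and drop already share a factor (Figure 3), both edges cancel and
# that factor loses two connections, possibly becoming degree one.

def remove_degree_two(checks, parity, f):
    keep, drop = sorted(checks[f])         # deterministic choice of survivor
    flip = parity[f]                       # parity 1 -> keep and drop differ
    for g, nbrs in checks.items():
        if g != f and drop in nbrs:
            nbrs.discard(drop)
            if keep in nbrs:
                nbrs.discard(keep)         # shared factor: edges cancel out
            else:
                nbrs.add(keep)             # keep inherits drop's connection
            parity[g] ^= flip              # reverse parity if f had parity one
    del checks[f], parity[f]
    return keep, drop                      # drop is de-erased once keep is

# Figure 3 scenario: P3 = {V1, V2} (degree two), P4 = {V1, V2, V3} (degree three).
checks = {"P3": {"v1", "v2"}, "P4": {"v1", "v2", "v3"}}
parity = {"P3": 0, "P4": 1}
remove_degree_two(checks, parity, "P3")    # P4 becomes degree one: {"v3"}
```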
 In [3, 4], the sequence of residual graphs follows a typical path or expected evolution [15]. The authors make use of Wormald's theorem [21] to describe this path as the solution of a set of differential equations and characterize the typical deviation from it. For the PD, we have an analytical form for the evolution of the number of degree-one factors as the decoding progresses, r1(\u03c4, \u03b5), as a function of the decoding time, \u03c4, and the erasure rate, \u03b5. The PD threshold \u03b5BP is the maximum \u03b5 for which r1(\u03c4, \u03b5) \u2265 0, \u2200\u03c4. In [1, 15], the authors show that particular decoding realizations are Gaussian distributed around r1(\u03c4, \u03b5), with a variance of order \u03b1BP/n, where \u03b1BP can be computed from the LDPC DD. They also provide the following approximation to the block error probability of elements of an LDPC ensemble:\n\nE_{LDPC[\u03bb(x),\u03c1(x),n]}[P_W^BP(C, \u03b5)] \u2248 Q(\u221an(\u03b5BP \u2212 \u03b5)/\u03b1BP),  (1)\n\nwhere P_W^BP(C, \u03b5) is the average block error probability for the code C \u2208 LDPC[\u03bb(x), \u03c1(x), n]. For the TEP decoder the analysis follows a similar path, but its derivation is more involved. For arbitrarily large codes, the expected graph evolution during the TEP decoding is computed in [12], with a set of non-linear differential equations. They track the expected progression of the fraction of edges with left degree i, li(\u03c4) for i = 1, . . . , lmax, and right degree j, rj(\u03c4) for j = 1, . . . , rmax, as the TEP decoder proceeds, where \u03c4 is a normalized time: if u is the TEP iteration index and E is the total number of edges in the original graph, then \u03c4 = u/E. By Wormald's theorem [21], any real decoding realization does not differ from the solution of such equations by more than a factor of order O(E^{-1/6}).
 The TEP threshold, \u03b5TEP, is found as the maximum erasure rate \u03b5 such that\n\nrTEP(\u03c4) := r1(\u03c4) + r2(\u03c4) > 0, \u2200\u03c4 \u2208 [0, n/E],  (2)\n\nwhere rTEP(\u03c4) is computed by solving the system of differential equations in [12], and \u03b5TEP \u2265 \u03b5BP.\nLet us illustrate the accuracy of the model derived to analyze the TEP decoder properties. In Figure 4(a), for a regular (3, 6) code with n = 2^17 and \u03b5 = 0.415, we compare the solution of the system of differential equations for R1(\u03c4) = r1(\u03c4)E and R2(\u03c4) = r2(\u03c4)E, depicted by thick solid lines, with 30 simulated decoding trajectories, depicted by thin dashed lines. We can see that the empirical curves are tightly distributed around the predicted curves. Indeed, the distribution tends very quickly with n to a Gaussian [1, 15]. All curves are plotted with respect to the evolution of the normalized size of the graph at each time instant, denoted by e(\u03c4), so that the decoding process starts on the right at e(\u03c4 = 0) \u2248 0.415 and, if successful, finishes at e(\u03c4END) = 0. In Figure 4(b) we reproduce, with identical conclusions, the same experiment for the irregular LDPC code with the DD defined by:\n\n\u03bb(x) = (1/6)x + (5/6)x^3,  (3)\n\n\u03c1(x) = x^5.  (4)\n\nFor the TEP decoder to perform better than the BP decoder, it needs to significantly increase the number of check nodes of degree one that are created, which happens when two variable nodes share both a degree-two check node and a degree-three check node, as illustrated earlier in Figure 3(a). In [12], we compute the probability that two variable nodes that share a check node of degree two also share another check node (scenario S).
 If we randomly choose a particular degree-two check node at time \u03c4, the probability of scenario S is:\n\nPS(\u03c4) = (lavg(\u03c4) \u2212 1)^2 (ravg(\u03c4) \u2212 1) / (e(\u03c4)E),  (5)\n\nwhere lavg(\u03c4) and ravg(\u03c4) are, respectively, the average left and right edge degrees, and e(\u03c4) is the fraction of remaining edges in the graph. As the TEP decoder progresses, lavg(\u03c4) increases, because the remaining variables in the graph inherit the connections of the variables that have been removed, and e(\u03c4) decreases, thereby creating new factors of degree one and improving on the BP/PD performance. However, note that in the limit n \u2192 \u221e, PS(\u03c4 = 0) = 0. Therefore, to improve on the PD solution in this regime we require that lavg(\u03c4') \u2192 \u221e for some \u03c4'. The solution of the TEP decoder differential equations does not satisfy this property. For instance, in Figure 5(a), we plot the expected evolution of r1(\u03c4) and r2(\u03c4) for n \u2192 \u221e and the (3, 6) regular LDPC ensemble when we are just above the BP threshold for this code, which is \u03b5BP \u2248 0.4294. Unlike Figure 4(a), r1(\u03c4) and r2(\u03c4) go to zero before e(\u03c4) does: the TEP decoder gets stuck before completing the decoding process. In Figure 5(b), we include the computed evolution of lavg(\u03c4). As shown, the fraction of degree-two check nodes vanishes before lavg(\u03c4) becomes infinite.
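Equation (5) is a closed form that is cheap to evaluate, and it makes the qualitative argument above concrete. A small numerical check with made-up values (the function name and the numbers are illustrative, not from the paper):

```python
# Direct evaluation of Eq. (5): the probability that the two variables of a
# randomly chosen degree-two check node also share another check (scenario S).

def p_scenario_s(l_avg, r_avg, e_frac, E):
    return (l_avg - 1) ** 2 * (r_avg - 1) / (e_frac * E)

early = p_scenario_s(l_avg=3.0, r_avg=6.0, e_frac=0.5, E=3072)  # start of decoding
late  = p_scenario_s(l_avg=6.0, r_avg=6.0, e_frac=0.1, E=3072)  # graph has shrunk
# late > early: as l_avg grows and e(tau) shrinks, scenario S becomes likelier.
```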
 We conclude that, in the asymptotic limit n \u2192 \u221e, the EP with tree structure is not able to outperform the BP solution, which is optimal since LDPC codes become cycle-free [15].\n\n(a)\n\n(b)\n\nFigure 4: In (a), for a regular (3, 6) code with n = 2^17 and \u03b5 = 0.415, we compare the solution of the system of differential equations for R1(\u03c4) = r1(\u03c4)E (\u25c1) and R2(\u03c4) = r2(\u03c4)E (\u2666) (thick solid lines) with 30 simulated decoding trajectories (thin dashed lines). In (b), we reproduce the same experiment for the irregular LDPC code in (3) and (4) for \u03b5 = 0.47.\n\n(a)\n\n(b)\n\nFigure 5: For the regular (3, 6) ensemble and \u03b5BP \u2248 0.4294, in (a) we plot the expected evolution of r1(\u03c4) and r2(\u03c4) for n \u2192 \u221e. In (b), we include the computed evolution of lavg(\u03c4) for this case.\n\n3.1 Analysis in the finite-length regime\n\nIn the finite-length regime, the TEP decoder emerges as a powerful decoding algorithm. At a complexity similar to BP, i.e. of order O(n), it is able to further improve on the BP solution thanks to a more accurate estimate of the marginal for each bit. We illustrate the TEP decoder performance for some regular and irregular finite-length LDPC codes. We first consider a rate-1/2 regular (3, 6) LDPC code. This ensemble has no asymptotic error floor [15] and we plot the word error rate obtained with the TEP and the BP decoders for different code lengths in Figure 6(a).
 In Figure 6(b), we include the results for the irregular DD in (3) and (4), where we can see that in all cases BP and TEP converge to the same error floor but, as in previous examples, the TEP decoder provides significant gains in the waterfall region, and they are more significant for shorter codes.\n\n(a)\n\n(b)\n\nFigure 6: TEP (solid line) and BP (dashed line) decoding performance for a regular (3, 6) LDPC code in (a), and the irregular LDPC code in (3) and (4) in (b), with code lengths n = 2^9 (\u25e6), n = 2^10 (\u25a1), n = 2^11 (\u00d7) and n = 2^12 (\u25b7).\n\nThe expected graph evolution during the TEP decoding in [12], which provides the average presence in the graph of degree-one and degree-two check nodes as the decoder proceeds, can be used to derive a coarse estimate of the TEP decoder probability of error for a given LDPC ensemble, similar to (1) for the BP decoder. Using the regular (3, 6) code as an example, in Figure 5(a) we plot the solution for r1(\u03c4) in the case n \u2192 \u221e. Let \u03c4* be the time at which the decoder gets stuck, i.e. r1(\u03c4*) + r2(\u03c4*) = 0. In Figure 7, we plot the solution for the evolution of r1(\u03c4, n, \u03b5BP) with respect to e(\u03c4) for a (3, 6) regular code at \u03b5 = \u03b5BP = \u03b5TEP.
 To avoid confusion, in the following we explicitly include the dependence on n and \u03b5 in r1(\u03c4, n, \u03b5). The code lengths considered are n = 2^12 (+), n = 2^13 (\u25e6), n = 2^14 (\u25a1), n = 2^15 (\u2666), n = 2^16 (\u00d7) and n = 2^17 (\u2022). For finite-length values, we observe that r1(\u03c4*, n, \u03b5BP) is not zero and, indeed, a closer look shows that the following approximation is reasonably tight:\n\nr1(\u03c4*, n, \u03b5TEP) \u2248 \u03b3TEP n^{-1},  (6)\n\nwhere we compute \u03b3TEP from the ensemble. For the (3, 6) regular case, we obtain \u03b3TEP \u2248 0.3198 [12]. The idea to estimate the TEP decoder performance at \u03b5 = \u03b5BP + \u0394\u03b5 is to assume that any particular realization will succeed almost surely as long as the fraction of degree-one check nodes at \u03c4* is positive. For \u03b5 = \u03b5BP + \u0394\u03b5, we can approximate r1(\u03c4*, n, \u03b5) as follows:\n\nr1(\u03c4*, n, \u03b5) = (\u2202r1(\u03c4, n, \u03b5)/\u2202\u03b5)|_{\u03c4=\u03c4*, \u03b5=\u03b5TEP} \u0394\u03b5 + \u03b3TEP n^{-1}.  (7)\n\nIn [1, 15], it is shown that simulated trajectories for the evolution of degree-one check nodes under BP are asymptotically Gaussian distributed, and this is observed for the TEP decoder as well. Furthermore, the variance is of order \u03b4(\u03c4)/n, where \u03b4(\u03c4) depends on the ensemble and the decoder [1]. To estimate the TEP decoder error rate, we compute the probability that the fraction of degree-one check nodes at \u03c4* is positive.
 Since it is distributed as N(r1(\u03c4*, n, \u03b5TEP), \u03b4(\u03c4*)/n), we get\n\nE_{LDPC[\u03bb(x),\u03c1(x),n]}[P_W^TEP(C, \u03b5)] \u2248 1 \u2212 Q( [(\u2202r1(\u03c4, n, \u03b5)/\u2202\u03b5)|_{\u03c4=\u03c4*, \u03b5=\u03b5TEP} \u0394\u03b5 + \u03b3TEP n^{-1}] / \u221a(\u03b4(\u03c4*)/n) ) = Q( \u221an(\u03b5TEP \u2212 \u03b5)/\u03b1TEP + \u03b3TEP/\u221a(n \u03b4(\u03c4*)) ),  (8)\n\nwhere\n\n\u03b1TEP = \u221a\u03b4(\u03c4*) (\u2202r1(\u03c4, n, \u03b5)/\u2202\u03b5)^{-1}|_{\u03c4=\u03c4*, \u03b5=\u03b5TEP}.  (9)\n\nFinally, note that, since for n \u2192 \u221e we know that the TEP and the BP converge to the same solution, we can approximate \u03b1TEP \u2248 \u03b1BP. Besides, we have empirically observed that the variances of trajectories under BP and TEP decoding are quite similar so, for simplicity, we set \u03b4(\u03c4*) in (8) equal to \u03b4(\u03c4*)BP, whose analytic solution can be found in [16, 1]. Hence, we consider the TEP decoder expected evolution to estimate the parameter \u03b3TEP in (8). In Figure 7(b), we compare the TEP performance for the regular (3, 6) ensemble (solid lines) with the approximation in (8) (dashed lines), using the approximations \u03b1TEP \u2248 \u03b1BP = 0.56036, \u03b4(\u03c4*) \u2248 0.0526 and \u03b3TEP \u2248 0.3198. We have plotted the results for code lengths of n = 2^9 (\u25e6), n = 2^10 (\u25a1), n = 2^11 (\u00d7) and n = 2^12 (\u25b7).
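The estimate in (8) is straightforward to evaluate with Q(x) = erfc(x/√2)/2. A sketch using the constants quoted above for the regular (3, 6) ensemble; the function name `tep_word_error` is ours, and the output is only the approximation (8), not an exact error rate:

```python
# Numerical evaluation of the word-error estimate (8).
# Constants default to the (3, 6) regular-ensemble values quoted in the text.
from math import erfc, sqrt

def Q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def tep_word_error(eps, n, eps_tep=0.4294, alpha=0.56036,
                   gamma=0.3198, delta=0.0526):
    # Right-hand side of (8): the gamma term is the finite-length TEP gain.
    return Q(sqrt(n) * (eps_tep - eps) / alpha + gamma / sqrt(n * delta))
```

As a sanity check, for a fixed erasure rate below the threshold the predicted word error rate decreases as n grows, matching the waterfall behavior in Figure 7(b).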
 As we can see, for the shortest code length the model seems to slightly over-estimate the error probability, but this mismatch vanishes for the rest of the cases, yielding a tight estimate.\n\n(a)\n\n(b)\n\nFigure 7: In (a), we plot the solution for r1(\u03c4, n, \u03b5TEP) with respect to e(\u03c4) for a (3, 6) regular code at \u03b5 = \u03b5BP = \u03b5TEP. In (b), we compare the TEP performance for the regular (3, 6) ensemble (solid lines) with the approximation in (8) (dashed lines), using the approximations \u03b1TEP \u2248 \u03b1BP = 0.56036, \u03b4(\u03c4*) \u2248 0.0526 and \u03b3TEP \u2248 0.3198. We have plotted the results for code lengths of n = 2^9 (\u25e6), n = 2^10 (\u25a1), n = 2^11 (\u00d7) and n = 2^12 (\u25b7).\n\n4 Conclusions\n\nIn this paper, we consider a tree structure for approximate inference in sparse graphical models using the EP algorithm. We have shown that, for finite-length LDPC sparse graphs, the accuracy of the marginal estimation with the proposed method significantly outperforms the BP estimate for the same graph. As a consequence, the decoding error rates are clearly improved. This result is remarkable in itself, as BP was considered the gold standard for LDPC decoding, and it was assumed that the long-range cycles and sparse nature of these factor graphs did not lend themselves to the application of more accurate approximate inference algorithms designed for dense graphs with short-range cycles.
 Additionally, the application to LDPC decoding showed us a different way of learning the tree structure that might be amenable for general factors.\n\n5 Acknowledgments\n\nThis work was partially funded by the Spanish government (Ministerio de Educaci\u00f3n y Ciencia, TEC2009-14504-C02-01,02, Consolider-Ingenio 2010 CSD2008-00010), Universidad Carlos III (CCG10-UC3M/TIC-5304) and the European Union (FEDER).\n\nReferences\n\n[1] Abdelaziz Amraoui, Andrea Montanari, Tom Richardson, and R\u00fcdiger Urbanke. Finite-length scaling for iteratively decoded LDPC ensembles. IEEE Transactions on Information Theory, 55(2):473\u2013498, 2009.\n\n[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, USA, 1991.\n\n[3] Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Daniel Spielman, and Volker Stemann. Practical loss-resilient codes. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 150\u2013159, 1997.\n\n[4] Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Daniel Spielman, and Volker Stemann. Efficient erasure correcting codes. IEEE Transactions on Information Theory, 47(2):569\u2013584, Feb. 2001.\n\n[5] David J. C. MacKay. Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45(2):399\u2013431, 1999.\n\n[6] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.\n\n[7] David J. C. MacKay and Radford M. Neal. Near Shannon limit performance of low density parity check codes. Electronics Letters, 32:1645\u20131646, 1996.\n\n[8] T. 
Minka. Power EP. Technical report, MSR-TR-2004-149, 2004. http://research.microsoft.com/~minka/papers/.\n\n[9] Thomas Minka and Yuan Qi. Tree-structured approximations by expectation propagation. In Proceedings of the Neural Information Processing Systems Conference (NIPS), 2003.\n\n[10] Thomas P. Minka. Expectation Propagation for approximate Bayesian inference. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI 2001), pages 362\u2013369. Morgan Kaufmann Publishers Inc., 2001.\n\n[11] Pablo M. Olmos, Juan Jos\u00e9 Murillo-Fuentes, and Fernando P\u00e9rez-Cruz. Tree-structure expectation propagation for decoding LDPC codes over binary erasure channels. In 2010 IEEE International Symposium on Information Theory (ISIT), Austin, Texas, 2010.\n\n[12] P. M. Olmos, J. J. Murillo-Fuentes, and F. P\u00e9rez-Cruz. Tree-structure expectation propagation for LDPC decoding in erasure channels. Submitted to IEEE Transactions on Information Theory, 2011.\n\n[13] P. M. Olmos, J. J. Murillo-Fuentes, and F. P\u00e9rez-Cruz. Tree-structured expectation propagation for decoding finite-length LDPC codes. IEEE Communications Letters, 15(2):235\u2013237, Feb. 2011.\n\n[14] P. Oswald and A. Shokrollahi. Capacity-achieving sequences for the erasure channel. IEEE Transactions on Information Theory, 48(12):3017\u20133028, Dec. 2002.\n\n[15] Tom Richardson and R\u00fcdiger Urbanke. Modern Coding Theory. Cambridge University Press, Mar. 2008.\n\n[16] T. Nozaki, K. Kasai, and K. Sakaniwa. Analytical solution of covariance evolution for irregular LDPC codes. arXiv e-prints, November 2010.\n\n[17] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches. IEEE Transactions on Information Theory, 51(11):3697\u20133717, November 2005.\n\n[18] Martin J. Wainwright and Michael I. Jordan. 
Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 2008.\n\n[19] W. Wiegerinck and T. Heskes. Fractional belief propagation. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA, December 2002. MIT Press.\n\n[20] M. Welling, T. Minka, and Y. W. Teh. Structured region graphs: Morphing EP into GBP. In UAI, 2005.\n\n[21] Nicholas C. Wormald. Differential equations for random processes and random graphs. Annals of Applied Probability, 5(4):1217\u20131235, 1995.\n\n[22] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282\u20132312, July 2005.", "award": [], "sourceid": 1049, "authors": [{"given_name": "Pablo", "family_name": "Olmos", "institution": null}, {"given_name": "Luis", "family_name": "Salamanca", "institution": null}, {"given_name": "Juan", "family_name": "Fuentes", "institution": null}, {"given_name": "Fernando", "family_name": "P\u00e9rez-Cruz", "institution": null}]}