{"title": "Error-correcting Codes on a Bethe-like Lattice", "book": "Advances in Neural Information Processing Systems", "page_first": 322, "page_last": 328, "abstract": null, "full_text": "Error-correcting Codes on a Bethe-like Lattice \n\nRenato Vicente \n\nDavid Saad \n\nThe Neural Computing Research Group \n\nAston University, Birmingham, B4 7ET, United Kingdom \n\n{vicenter,saadd}@aston.ac.uk \n\nYoshiyuki Kabashima \n\nDepartment of Computational Intelligence and Systems Science \n\nTokyo Institute of Technology, Yokohama 2268502, Japan \n\nkaba@dis.titech.ac.jp \n\nAbstract \n\nWe analyze Gallager codes by employing a simple mean-field approxi(cid:173)\nmation that distorts the model geometry and preserves important interac(cid:173)\ntions between sites. The method naturally recovers the probability prop(cid:173)\nagation decoding algorithm as an extremization of a proper free-energy. \nWe find a thermodynamic phase transition that coincides with informa(cid:173)\ntion theoretical upper-bounds and explain the practical code performance \nin terms of the free-energy landscape. \n\n1 Introduction \n\nIn the last years increasing interest has been devoted to the application of mean-field tech(cid:173)\nniques to inference problems. There are many different ways of building mean-field theo(cid:173)\nries. One can make a perturbative expansion around a tractable model [1,2], or assume a \ntractable structure and variationally determine the model parameters [3]. \n\nError-correcting codes (ECC) are particularly interesting examples of inference problems \nin loopy intractable graphs [4]. Recently the focus has been directed to the state-of-the art \nhigh performance turbo codes [5] and to Gallager and MN codes [6,7]. Statistical physics \nhas been applied to the analysis of ECCs as an alternative to information theory methods \nyielding some new interesting directions and suggesting new high-performance codes [8]. 
\nSourlas was the first to relate error-correcting codes to spin-glass models [9], showing that the Random-Energy Model (REM) [10] can be thought of as an ideal code capable of saturating Shannon's bound at vanishing code rates. This work was recently extended to the case of finite code rates [11] and has been further developed for analyzing MN codes of various structures [12]. All of the analyses mentioned above, as well as the recent turbo code analysis [13], relied on the replica approach under the assumption of replica symmetry. To date, the only model that can be analyzed exactly is the REM, which corresponds to an impractical coding scheme of vanishing code rate. \n\nHere we present a statistical physics treatment of non-structured Gallager codes by employing a mean-field approximation based on a generalized tree structure (a Bethe lattice [14]) known as a Husimi cactus, which is exactly solvable. The model parameters are simply assumed to be those of the model with cycles. In this framework the probability propagation (PP) decoding algorithm emerges naturally, providing an alternative view of the relationship between PP decoding and mean-field approximations already observed in [15]. Moreover, this approach has the advantage of being somewhat more controlled and easier to understand than replica calculations. \n\nThis paper is organized as follows: in the next section we present unstructured Gallager codes and the statistical physics framework used to analyze them; in section 3 we make use of the lattice geometry to solve the model exactly; in section 4 we analyze the typical code performance; we summarize the results in section 5. 
\n\n2 Gallager codes: statistical physics formulation \n\nWe concentrate here on a simple communication model whereby messages are represented by binary vectors and are communicated through a Binary Symmetric Channel (BSC) in which uncorrelated bit flips occur with probability f. A Gallager code is defined by a binary matrix A = [C1 | C2], concatenating two very sparse matrices known to both sender and receiver, with C2 (of dimensionality (M - N) x (M - N)) being invertible; the matrix C1 is of dimensionality (M - N) x N. \n\nEncoding refers to the production of an M-dimensional binary code word t in {0,1}^M (M > N) from the original message xi in {0,1}^N by t = G^T xi (mod 2), where all operations are performed in the field {0,1} and are indicated by (mod 2). The generator matrix is G = [I | C2^{-1} C1] (mod 2), where I is the N x N identity matrix, implying that A G^T (mod 2) = 0 and that the first N bits of t are set to the message xi. In regular Gallager codes the number of non-zero elements in each row of A is chosen to be exactly K. The number of elements per column is then C = (1 - R)K, where the code rate is R = N/M (for unbiased messages). The encoded vector t is then corrupted by noise represented by the vector zeta in {0,1}^M, with components independently drawn from P(zeta) = (1 - f) delta(zeta) + f delta(zeta - 1). The received vector takes the form r = G^T xi + zeta (mod 2). \n\nDecoding is carried out by multiplying the received message by the matrix A to produce the syndrome vector z = A r = A zeta (mod 2), from which an estimate tau_hat for the noise vector can be produced. An estimate for the original message is then obtained as the first N bits of r + tau_hat (mod 2). The Bayes-optimal estimator (also known as the marginal posterior maximizer, MPM) for the noise is defined as tau_hat_j = argmax_{tau_j} P(tau_j | z), where tau_j in {0,1}. 
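As an illustration of the construction above, the following minimal Python sketch (ours, not from the paper; the toy sizes and the particular random C1 and lower-triangular C2 are arbitrary choices made so that C2 is invertible) builds the generator from A = [C1 | C2] and checks the encoding and syndrome identities:

```python
import numpy as np

def gf2_inv(mat):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = mat.shape[0]
    aug = np.hstack([mat % 2, np.eye(n, dtype=int)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r, col])
        aug[[col, pivot]] = aug[[pivot, col]]      # bring a pivot row into place
        for r in range(n):
            if r != col and aug[r, col]:
                aug[r] = (aug[r] + aug[col]) % 2   # eliminate column entries
    return aug[:, n:]

rng = np.random.default_rng(0)
N, M, f = 4, 12, 0.1                      # toy sizes; real codes use much larger M
C2 = (np.eye(M - N, dtype=int)
      + np.tri(M - N, k=-1, dtype=int) * rng.integers(0, 2, (M - N, M - N))) % 2
C1 = rng.integers(0, 2, (M - N, N))
A = np.hstack([C1, C2])                   # parity-check matrix A = [C1 | C2]
Gt = np.vstack([np.eye(N, dtype=int),
                gf2_inv(C2) @ C1 % 2])    # G^T: systematic, A G^T = 0 (mod 2)

xi = rng.integers(0, 2, N)                # message
t = Gt @ xi % 2                           # codeword; first N bits equal xi
zeta = (rng.random(M) < f).astype(int)    # BSC noise
r = (t + zeta) % 2                        # received vector
z = A @ r % 2                             # syndrome; depends on the noise only
```

Because A G^T vanishes mod 2, the syndrome z = A r equals A zeta, so decoding reduces to inferring the noise from z, exactly as described above.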
\nThe performance of this estimator can be measured by the probability of bit error P_b = 1 - (1/M) sum_{j=1}^{M} delta[tau_hat_j; zeta_j], where delta[;] is the Kronecker delta. Knowing the matrices C2 and C1, the syndrome vector z and the noise level f, it is possible to apply Bayes' theorem and compute the posterior probability \n\nP(tau | z) = (1/Z) chi[z = A tau (mod 2)] P(tau), \n\n(1) \n\nwhere chi[X] is an indicator function equal to 1 if X is true and 0 otherwise. To compute the MPM one has to compute the marginal posterior P(tau_j | z) = sum_{{tau_i: i =/= j}} P(tau | z), which in general requires O(2^M) operations and thus becomes impractical for long messages. To solve this problem one can exploit the sparseness of A to design algorithms that require O(M) operations to perform the same task. One of these methods is the probability propagation algorithm (PP), also known as belief propagation or the sum-product algorithm [16]. \n\nThe connection to statistical physics becomes clear when the field {0,1} is replaced by Ising spins {+1,-1} and mod 2 sums by products [9]. The syndrome vector acquires the form of a multi-spin coupling J_mu = prod_{j in L(mu)} zeta_j, where j = 1,...,M and mu = 1,...,(M - N). \n\nFigure 1: Husimi cactus with K = 3 and connectivity C = 4. \n\nThe K indices of nonzero elements in row mu of a matrix A, which is not necessarily a concatenation of two separate matrices (therefore defining an unstructured Gallager code), are given by L(mu) = {j_1,...,j_K}, and in a column l they are given by M(l) = {mu_1,...,mu_C}. The posterior (1) can be written as the Gibbs distribution [12]: \n\nP_beta(tau | z) = (1/Z) lim_{gamma -> infinity} exp[-beta H_gamma(tau; z)], \n\nH_gamma(tau; z) = -gamma sum_{mu=1}^{M-N} (J_mu prod_{j in L(mu)} tau_j - 1) - F sum_{j=1}^{M} tau_j. \n\n(2) \n\nThe external field corresponds to the prior probability over the noise and has the form F = atanh(1 - 2f). Note that the Hamiltonian depends on a hyper-parameter gamma that has to be taken to gamma -> infinity for optimal decoding. 
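For very small toy instances the O(2^M) marginalization above can be carried out by brute force, which is useful as a reference when checking faster algorithms. A sketch (ours; the 2-check, 3-bit parity matrix is a hypothetical toy, not from the paper):

```python
import itertools
import numpy as np

def mpm_estimate(A, z, f):
    """Brute-force marginal posterior maximizer for the posterior (1):
    enumerate all 2^M noise candidates tau (viable only for tiny M)."""
    M = A.shape[1]
    marg1 = np.zeros(M)                  # unnormalized P(tau_j = 1 | z)
    norm = 0.0
    for bits in itertools.product([0, 1], repeat=M):
        tau = np.array(bits)
        if np.array_equal(A @ tau % 2, z):               # chi[z = A tau (mod 2)]
            w = f ** tau.sum() * (1 - f) ** (M - tau.sum())  # prior P(tau)
            marg1 += w * tau
            norm += w
    return (marg1 / norm > 0.5).astype(int)  # argmax of each marginal

# hypothetical 2-check, 3-bit parity-check matrix for illustration
A = np.array([[1, 1, 0], [0, 1, 1]])
print(mpm_estimate(A, np.array([1, 0]), 0.1))   # -> [1 0 0]
```

Here the single-flip candidate (1,0,0) dominates the double-flip candidate (0,1,1) because f < 1/2, so each marginal favors the lighter noise pattern.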
The disorder is trivial and can be gauged away as J_mu -> 1 by using tau_j -> tau_j zeta_j. The resulting Hamiltonian is a multi-spin ferromagnet with finite connectivity in a random field h_j = F zeta_j. The decoding process corresponds to finding local magnetizations at temperature beta = 1, m_j = <tau_j>_{beta=1}, and calculating estimates as tau_hat_j = sgn(m_j). \n\nIn the {+1,-1} representation the probability of bit error acquires the form \n\nP_b = 1/2 - (1/2M) sum_{j=1}^{M} zeta_j sgn(m_j), \n\n(3) \n\nconnecting the code performance with the computation of local magnetizations. \n\n3 Bethe-like lattice calculation \n\n3.1 Generalized Bethe lattice: the Husimi cactus \n\nA Husimi cactus with connectivity C is generated starting with a polygon of K vertices with one Ising spin at each vertex (generation 0). All spins in a polygon interact through a single coupling J_mu, and one of them is called the base spin. Figure 1 shows the first step in the construction of a Husimi cactus; in a generic step, the base spins of the generation-(n-1) polygons, numbering (C - 1)(K - 1), are attached to the K - 1 vertices of a generation-n polygon. This process is iterated until a maximum generation n_max is reached; the graph is then completed by attaching C uncorrelated branches of n_max generations at their base spins. In this way each spin inside the graph is connected to exactly C polygons. The local magnetization m_j at the centre can be obtained by fixing boundary (initial) conditions in the 0-th generation and iterating the recursion equations until generation n_max is reached. Carrying out the calculation in the thermodynamic limit corresponds to having n_max ~ ln M generations and M -> infinity. \n\nThe Hamiltonian of the model has the form (2), where L(mu) denotes the polygon mu of the lattice. Due to the tree-like structure, local quantities far from the boundary can be calculated recursively by specifying boundary conditions. 
The typical decoding performance can therefore be computed exactly without resorting to replica calculations [17]. \n\n3.2 Recursion relations: probability propagation \n\nWe adopt the approach presented in [18], where the recursion relation for the probability distribution P_{mu k}(tau_k) of the base spin of polygon mu is connected to the (C - 1)(K - 1) distributions P_{nu j}(tau_j), with nu in M(j)\mu (all polygons linked to j except mu), of polygons in the previous generation: \n\nP_{mu k}(tau_k) = (1/N) Tr_{{tau_j}} exp[beta (J_mu tau_k prod_{j in L(mu)\k} tau_j - 1) + F tau_k] prod_{j in L(mu)\k} prod_{nu in M(j)\mu} P_{nu j}(tau_j), \n\n(4) \n\nwhere the trace is over the spins tau_j such that j in L(mu)\k. \n\nThe effective field x_{nu j} on a base spin j due to neighbors in polygon nu can be written as: \n\nexp(-2 x_{nu j}) = e^{2F} P_{nu j}(-)/P_{nu j}(+). \n\n(5) \n\nCombining (4) and (5) one finds the recursion relation: \n\nexp(-2 x_{mu k}) = Tr_{{tau_j}} exp[-beta J_mu prod_{j in L(mu)\k} tau_j + sum_{j in L(mu)\k} (F + sum_{nu in M(j)\mu} x_{nu j}) tau_j] / Tr_{{tau_j}} exp[+beta J_mu prod_{j in L(mu)\k} tau_j + sum_{j in L(mu)\k} (F + sum_{nu in M(j)\mu} x_{nu j}) tau_j]. \n\n(6) \n\nBy computing the traces and taking beta -> infinity one obtains: \n\nx_{mu k} = atanh[J_mu prod_{j in L(mu)\k} tanh(F + sum_{nu in M(j)\mu} x_{nu j})]. \n\n(7) \n\nThe effective local magnetization due to interactions with the nearest neighbors in one branch is given by m_hat_{mu j} = tanh(x_{mu j}). The effective local field on a base spin j of a polygon mu due to the C - 1 branches in the previous generation and to the external field is x_hat_{mu j} = F + sum_{nu in M(j)\mu} x_{nu j}; the corresponding effective local magnetization is therefore m_{mu j} = tanh(x_hat_{mu j}). 
\nEquation (7) can then be rewritten in terms of m_{mu j} and m_hat_{mu j}, and the PP equations [7,15,16] can be recovered: \n\nm_{mu k} = tanh(F + sum_{nu in M(k)\mu} atanh(m_hat_{nu k})), m_hat_{mu k} = J_mu prod_{j in L(mu)\k} m_{mu j}. \n\n(8) \n\nOnce the magnetizations on the boundary (0-th generation) are assigned, the local magnetization m_j at the central site is determined by iterating (8) and computing: \n\nm_j = tanh(F + sum_{nu in M(j)} atanh(m_hat_{nu j})). \n\n(9) \n\n3.3 Probability propagation as extremization of a free-energy \n\nThe equations (8) describing PP decoding represent extrema of the following free-energy: \n\nF({m_{mu k}, m_hat_{mu k}}) = sum_{mu=1}^{M-N} sum_{i in L(mu)} ln(1 + m_{mu i} m_hat_{mu i}) - sum_{mu=1}^{M-N} ln(1 + J_mu prod_{i in L(mu)} m_{mu i}) - sum_{j=1}^{M} ln[e^{F} prod_{mu in M(j)} (1 + m_hat_{mu j}) + e^{-F} prod_{mu in M(j)} (1 - m_hat_{mu j})]. \n\n(10) \n\nFigure 2: (a) Mean normalized overlap between the actual noise vector zeta and decoded noise tau_hat for K = 4 and C = 3 (therefore R = 1/4): theoretical values and experimental averages over 20 runs for code word lengths M = 5000 and M = 100. (b) Transitions for K = 6: Shannon's bound, the information theory upper bound and the thermodynamic transition obtained numerically, together with the theoretical and experimental (M = 5000, averaged over 20 runs) PP decoding transitions. In both figures, symbols are chosen larger than the error bars. \n\nThe iteration of the maps (8) is just one of many possible methods of finding extrema of this free-energy (not necessarily stable ones). This observation opens an alternative way of analyzing the performance of a decoding algorithm: studying the landscape (10). 
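The update rules (8) and (9) translate directly into code. The following loop-based sketch is ours (the synchronous update schedule, clipping tolerance and variable names are implementation choices, not the paper's):

```python
import numpy as np

def pp_decode(A, z, f, n_iter=30):
    """Probability propagation in the magnetization variables of eqs. (8)-(9)."""
    F = np.arctanh(1 - 2 * f)                    # prior field
    J = 1.0 - 2.0 * z                            # syndrome in the {+1,-1} representation
    L = [np.flatnonzero(row) for row in A]       # L(mu): bits in check mu
    Mj = [np.flatnonzero(A[:, j]) for j in range(A.shape[1])]  # M(j): checks on bit j
    clip = lambda v: np.clip(v, -1 + 1e-12, 1 - 1e-12)
    m = {(mu, k): np.tanh(F) for mu in range(len(L)) for k in L[mu]}
    mhat = dict(m)
    for _ in range(n_iter):
        for mu in range(len(L)):                 # m_hat_{mu k} = J_mu prod_{j != k} m_{mu j}
            for k in L[mu]:
                mhat[(mu, k)] = J[mu] * np.prod([m[(mu, j)] for j in L[mu] if j != k])
        for mu in range(len(L)):                 # m_{mu k} = tanh(F + sum_{nu != mu} atanh m_hat)
            for k in L[mu]:
                s = sum(np.arctanh(clip(mhat[(nu, k)])) for nu in Mj[k] if nu != mu)
                m[(mu, k)] = np.tanh(F + s)
    mj = np.array([np.tanh(F + sum(np.arctanh(clip(mhat[(nu, j)])) for nu in Mj[j]))
                   for j in range(A.shape[1])])  # eq. (9)
    return np.where(mj >= 0, 0, 1)               # tau_hat back in {0,1}

A = np.array([[1, 1, 0], [0, 1, 1]])             # tiny cycle-free toy code
print(pp_decode(A, np.array([1, 0]), 0.1))       # -> [1 0 0]
```

On a cycle-free toy graph like this one, the iteration reaches the exact marginals; on loopy graphs it is the approximation whose typical behavior is analyzed in section 4.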
\n\n4 Typical performance \n\n4.1 Macroscopic description \n\nThe typical macroscopic states of the system during decoding can be described by histograms of the variables m_{mu k} and m_hat_{mu k} averaged over all possible realizations of the noise vector zeta. By applying the gauge transformation J_mu -> 1 and tau_j -> tau_j zeta_j, assigning a probability distribution P_0(x) to the boundary fields and averaging over the random local fields F zeta, one obtains from (7) the recursion relation in the space of probability distributions P(x): \n\nP_{n+1}(x) = < Integral prod_{j=1}^{K-1} prod_{l=1}^{C-1} dx_{jl} P_n(x_{jl}) delta[x - atanh(prod_{j=1}^{K-1} tanh(F zeta_j + sum_{l=1}^{C-1} x_{jl}))] >_{zeta}, \n\n(11) \n\nwhere P_n(x) is the distribution of effective fields at the n-th generation due to the previous generations and the external fields; in the thermodynamic limit the distribution far from the boundary is P_infinity(x) (generation n -> infinity). The local field distribution at the central site is computed by replacing C - 1 by C in (11), taking into account the C polygons in the generation just before the central site, and inserting the distribution P_infinity(x). Equations (11) are identical to those obtained by the replica symmetric theory as in [12]. \n\nBy setting initial (boundary) conditions P_0(x) and numerically iterating (11), for C >= 3 one finds, up to some noise level f_s, a single stable fixed point at infinite fields, corresponding to a totally aligned state (successful decoding). At f_s a bifurcation occurs and two other fixed points appear, one stable and one unstable, the former corresponding to a misaligned state (decoding failure). This situation is identical to that observed in [12]. In terms of the free-energy (10), below f_s the landscape is dominated by the aligned state, which is the global minimum. Above f_s a sub-optimal state, corresponding to an exponentially large number of spurious local minima of the Hamiltonian (2), appears and convergence to the totally aligned state becomes difficult. 
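The bifurcation at f_s is easy to observe with a Monte Carlo (population dynamics) iteration of (11). The sketch below is ours; the population size, iteration count, seed and field clipping are numerical conveniences rather than part of the analysis:

```python
import numpy as np

def population_dynamics(K, C, f, pop=5000, iters=50, seed=1):
    """Iterate the density recursion (11) with a population of effective fields x
    (gauged model: J_mu -> 1, random fields F*zeta with zeta = -1 w.p. f)."""
    rng = np.random.default_rng(seed)
    F = np.arctanh(1 - 2 * f)
    x = np.zeros(pop)
    for _ in range(iters):
        zeta = np.where(rng.random((pop, K - 1)) < f, -1.0, 1.0)
        idx = rng.integers(0, pop, (pop, K - 1, C - 1))
        inner = np.tanh(F * zeta + x[idx].sum(axis=2))   # tanh(F zeta_j + sum_l x_jl)
        prod = inner.prod(axis=1)
        x = np.arctanh(np.clip(prod, -1 + 1e-15, 1 - 1e-15))  # clip to keep fields finite
    # central-site magnetization: C branches instead of C - 1
    zeta = np.where(rng.random(pop) < f, -1.0, 1.0)
    idx = rng.integers(0, pop, (pop, C))
    return np.tanh(F * zeta + x[idx].sum(axis=1)).mean()
```

For K = 4, C = 3 (rate 1/4), running this well below the transition (e.g. f = 0.05) drives the fields toward the aligned fixed point and the mean gauged magnetization toward 1, while well above it (e.g. f = 0.30) the population settles at the finite-field failure fixed point with a much smaller magnetization.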
At some critical noise level the totally aligned state loses its status of global minimum and the thermodynamic transition occurs. \n\nPractical PP decoding is performed by setting initial conditions m_hat_{mu j} = 1 - 2f, corresponding to the prior probabilities, and iterating (8) until stationarity or a maximum number of iterations is attained. The estimate for the noise vector is then produced by computing tau_hat_j = sgn(m_j). At each decoding step the system can be described by histograms of the variables (8); this is equivalent to iterating (11) (a similar idea was presented in [7]). Below f_s the process always converges to the successful decoding state; above f_s it converges to successful decoding only if the initial conditions are finely tuned, and in general it converges to the failure state. In Fig. 2a we show the theoretical mean overlap between the actual noise zeta and the estimate tau_hat as a function of the noise level f, as well as results obtained with PP decoding. \n\nInformation theory provides an upper bound for the maximum attainable code rate by equating the maximal information contents of the syndrome vector z and of the noise estimate tau_hat [7]. The thermodynamic phase transition, obtained by finding the stable fixed points of (11) and their free-energies, interestingly coincides with this upper bound within the precision of the numerical calculation. Note that the performance predicted by thermodynamics is not practical, as it requires O(2^M) operations for an exhaustive search for the global minimum of the free-energy. In Fig. 2b we show the thermodynamic transition for K = 6 and compare it with the upper bound, Shannon's bound and the theoretical f_s values. \n\n4.2 Tree-like approximation and the thermodynamic limit \n\nThe geometrical structure of a Gallager code defined by the matrix A can be represented by a bipartite graph (Tanner graph) [16] with bit and check nodes. 
Each column j of A represents a bit node and each row mu represents a check node; A_{mu j} = 1 means that there is an edge linking bit j to check mu. It is possible to show that, for a random ensemble of regular codes, the probability of completing a cycle after walking l edges starting from an arbitrary node is upper bounded by P[l; K, C, M] <= l^2 K^l / M. This implies that for very large M only cycles of at least order ln M survive. In the thermodynamic limit M -> infinity the probability P[l; K, C, M] -> 0 for any finite l, and the bulk of the system is effectively tree-like. By mapping each check node to a polygon with K bit nodes as vertices, one can map a Tanner graph onto a Husimi lattice that is effectively a tree for any number of generations of order less than ln M. It is experimentally observed that the number of iterations of (8) required for convergence does not scale with the system size; it is therefore expected that the interior of a tree-like lattice approximates a Gallager code with increasing accuracy as the system size increases. Fig. 2a shows that the approximation is fairly good even for sizes as small as M = 100. \n\n5 Conclusions \n\nTo summarize, we solved exactly, without resorting to the replica method, a system representing a Gallager code on a Husimi cactus. The results obtained are in agreement with the replica symmetric calculation and with numerical experiments carried out in systems of moderate size. The framework can easily be extended to MN and similar codes. New insights into the decoding process are obtained by looking at a proper free-energy landscape. We believe that the methods of statistical physics are complementary to those used in the statistical inference community and can enhance our understanding of general graphical models. \n\nAcknowledgments \n\nWe acknowledge support from EPSRC (GR/N00562), The Royal Society (RV, DS) and from the JSPS RFTF program (YK). 
\n\nReferences \n\n[1] Plefka, T. (1982) Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model. Journal of Physics A 15, 1971-1978. \n\n[2] Tanaka, T., Information geometry of mean field approximation, to appear in Neural Computation. \n\n[3] Saul, L.K. & M.I. Jordan (1996) Exploiting tractable substructures in intractable networks. In Touretzky, D.S., M.C. Mozer and M.E. Hasselmo (eds.), Advances in Neural Information Processing Systems 8, pp. 486-492. Cambridge, MA: MIT Press. \n\n[4] Frey, B.J. & D.J.C. MacKay (1998) A revolution: belief propagation in graphs with cycles. In Jordan, M.I., M.J. Kearns and S.A. Solla (eds.), Advances in Neural Information Processing Systems 10, pp. 479-485. Cambridge, MA: MIT Press. \n\n[5] Berrou, C. & A. Glavieux (1996) Near optimum error correcting coding and decoding: Turbo-codes. IEEE Transactions on Communications 44, 1261-1271. \n\n[6] Gallager, R.G. (1963) Low-density Parity-check Codes. MIT Press, Cambridge, MA. \n\n[7] MacKay, D.J.C. (1999) Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory 45, 399-431. \n\n[8] Kanter, I. & D. Saad (2000) Finite-size effects and error-free communication in Gaussian channels. Journal of Physics A 33, 1675-1681. \n\n[9] Sourlas, N. (1989) Spin-glass models as error-correcting codes. Nature 339, 693-695. \n\n[10] Derrida, B. (1981) Random-energy model: an exactly solvable model of disordered systems. Physical Review B 24(5), 2613-2626. \n\n[11] Vicente, R., D. Saad & Y. Kabashima (1999) Finite-connectivity systems as error-correcting codes. Physical Review E 60(5), 5352-5366. \n\n[12] Kabashima, Y., T. Murayama & D. Saad (2000) Typical performance of Gallager-type error-correcting codes. Physical Review Letters 84(6), 1355-1358. \n\n[13] Montanari, A. & N. Sourlas (2000) The statistical mechanics of turbo codes. European Physical Journal B 18, 107-119. 
\n\n[14] Sherrington, D. & K.Y.M. Wong (1987) Graph bipartitioning and the Bethe spin glass. Journal of Physics A 20, L785-L791. \n\n[15] Kabashima, Y. & D. Saad (1998) Belief propagation vs. TAP for decoding corrupted messages. Europhysics Letters 44(5), 668-674. \n\n[16] Kschischang, F.R. & B.J. Frey (1998) Iterative decoding of compound codes by probability propagation in graphical models. IEEE Journal on Selected Areas in Communications 16(2), 153-159. \n\n[17] Gujrati, P.D. (1995) Bethe or Bethe-like lattice calculations are more reliable than conventional mean-field calculations. Physical Review Letters 74(5), 809-812. \n\n[18] Rieger, H. & T.R. Kirkpatrick (1992) Disordered p-spin interaction models on Husimi trees. Physical Review B 45(17), 9772-9777. \n", "award": [], "sourceid": 1803, "authors": [{"given_name": "Renato", "family_name": "Vicente", "institution": null}, {"given_name": "David", "family_name": "Saad", "institution": null}, {"given_name": "Yoshiyuki", "family_name": "Kabashima", "institution": null}]}