{"title": "The Belief in TAP", "book": "Advances in Neural Information Processing Systems", "page_first": 246, "page_last": 252, "abstract": null, "full_text": "The Belief in TAP \n\nYoshiyuki Kabashima \n\nDept. of Compt. IntI. & Syst. Sci. \n\nTokyo Institute of Technology \n\nDavid Saad \n\nNeural Computing Research Group \n\nAston University \n\nYokohama 226, Japan \n\nBirmingham B4 7ET, UK \n\nAbstract \n\nWe show the similarity between belief propagation and TAP, for \ndecoding corrupted messages encoded by Sourlas's method. The \nlatter is a special case of the Gallager error-correcting code, where \nthe code word comprises products of J{ bits selected randomly from \nthe original message. We examine the efficacy of solutions obtained \nby the two methods for various values of J{ and show that solutions \nfor J{ 2': 3 may be sensitive to the choice of initial conditions in \nthe case of unbiased patterns. Good approximations are obtained \ngenerally for J{ = 2 and for biased patterns in the case of J{ 2': 3, \nespecially when Nishimori's temperature is being used. \n\n1 \n\nIntroduction \n\nBelief networks [1] are diagrammatic representations of joint probability distribu(cid:173)\ntions over a set of variables. This set is usually represented by the vertices of \na graph, while arcs between vertices represent probabilistic dependencies between \nvariables . Belief propagation provides a convenient mathematical tool for calculat(cid:173)\ning iteratively joint probability distributions between variables and have been used \nin a variety of cases, most recently in the field of error correcting codes, for decoding \ncorrupted messages [2] (for a review of graphical models and their use in the context \nof error-correcting codes see [3]). \n\nError-correcting codes provide a mechanism for retrieving the original message after \ncorruption due to noise during transmission. 
Of particular interest to the current paper is an error-correcting code presented by Sourlas [4], which is a special case of the Gallager codes [5]. The latter have recently been re-discovered by MacKay and Neal [2] and seem to have significant practical potential. \n\nIn this paper we will examine the similarities between the belief propagation (BP) and TAP approaches, used to decode corrupted messages encoded by Sourlas's method, and compare the solutions obtained by both approaches to the exact results obtained using the replica method [8]. The statistical mechanics approach will then allow us to draw some conclusions on the efficacy of the TAP/BP approach in the context of error-correcting codes. \n\nThe paper is arranged in the following manner: In section 2 we will introduce the encoding method and describe the decoding task. The belief propagation approach to the decoding process will be introduced in section 3 and will be compared to the TAP approach for diluted spin systems in section 4. Numerical solutions for various cases will be presented in section 5 and we will summarize our results and discuss their implications in section 6. \n\n2 The decoding problem \n\nIn a general scenario, a message represented by an N-dimensional binary vector ξ is encoded by a vector J⁰ which is then transmitted through a noisy channel with some flipping probability p per bit. The received message J is then decoded to retrieve the original message. Sourlas's code [4] is based on encoded message bits of the form J⁰_{i₁,i₂,...,i_K} = ξ_{i₁} ξ_{i₂} ··· ξ_{i_K}, taking the product of K different message sites for each code word bit. \n\nIn the statistical mechanics approach we will attempt to retrieve the original message by exploring the ground state of the following Hamiltonian, which corresponds to the preferred state of the system in terms of 'energy' \n\nH = − Σ_{(i₁,...,i_K)} A(i₁,...,i_K) J(i₁,...,i_K) S_{i₁} ··· S_{i_K} − (F/β) Σ_k S_k ,  (1) \n\nwhere S is an N-dimensional binary vector of dynamical variables and A is a sparse tensor with C unit elements per index (other elements are zero), which determines the components of J⁰. The last term on the right is required in the case of sparse (biased) messages and will require assigning a certain value to the additive field F/β, related to the prior belief in the Bayesian framework. \n\nThe statistical mechanical analysis can be easily linked to the Bayesian framework [4], in which one focuses on the posterior probability using Bayes' theorem, P(S|J) ∝ Π_μ P(J_μ|S) P₀(S), where μ runs over the message components and P₀(S) represents the prior. Knowing the posterior one can calculate the typical retrieved message elements and their alignment, which correspond to the Bayes-optimal decoding. The logarithms of the likelihood and prior terms are directly related to the first and second components of the Hamiltonian (Eq. 1). \n\nOne should also note that A(i₁,...,i_K) J(i₁,...,i_K) represents a similar encoding scheme to that of Ref. [2], where a sparse matrix with K non-zero elements per row multiplies the original message ξ and the resulting vector, modulo 2, is transmitted. \n\nSourlas analyzed this code in the cases of K = 2 and K → ∞, where the ratio C/K → ∞, by mapping them onto the SK [9] and Random Energy [10] models respectively. However, the ratio R = K/C constitutes the code rate, and the scenarios examined by Sourlas therefore correspond to the limited case of a vanishing code rate. The case of finite code rate, which we will consider here, has only recently been analyzed [8]. 
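The encoding and corruption steps described above can be sketched in a few lines. This is an illustrative simplification, not the construction used in the experiments below: the function names are ours, and the random construction fixes only the number of code word bits, M = NC/K, rather than enforcing exactly C couplings per message site as the tensor A does.

```python
import numpy as np

def sourlas_encode(xi, K, C, rng):
    """Each code word bit is the product of K distinct message bits.
    With M = N*C/K code word bits, every message site takes part in
    C couplings on average (the paper's tensor A enforces exactly C)."""
    N = len(xi)
    M = N * C // K
    # each row of idx lists the K message sites entering one code word bit
    idx = np.array([rng.choice(N, size=K, replace=False) for _ in range(M)])
    return idx, np.prod(xi[idx], axis=1)

def binary_symmetric_channel(J0, p, rng):
    """Flip each transmitted bit independently with probability p."""
    return np.where(rng.random(len(J0)) < p, -J0, J0)

rng = np.random.default_rng(0)
xi = rng.choice([-1, 1], size=1000)             # unbiased +/-1 message
idx, J0 = sourlas_encode(xi, K=2, C=4, rng=rng)
J = binary_symmetric_channel(J0, p=0.1, rng=rng)
```

For K = 2 and C = 4 this produces a rate R = K/C = 1/2 code: 2000 code word bits from a 1000-bit message.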
\n\n3 Decoding by belief propagation \n\nAs our goal of calculating the posterior of the system P(S|J) is rather difficult, we resort to the methods of BP, focusing on the calculation of conditional probabilities when some elements of the system are set to specific values or removed. \n\nThe approach adopted in this case, which is quite similar to the practical approach employed in the case of Gallager codes [2], assumes a two-layer system corresponding to the elements of the corrupted message J and the dynamical variables S respectively, defining conditional probabilities which relate elements in the two layers: \n\nq^x_{μl} = P(S_l = x | {J_ν≠μ}) and r^x_{μl} = P(J_μ | S_l = x, {J_ν≠μ}) = Σ_{{S_k≠l}} P(J_μ | S_l = x, {S_k≠l}) P({S_k≠l} | {J_ν≠μ}) ,  (2) \n\nwhere the index μ represents an element of the received vector message J, constituted by a particular choice of indices i₁,...,i_K, which is connected to the corresponding index of S (l in the first equation), i.e., for which the corresponding element A(i₁,...,i_K) is non-zero; the notation {S_k≠l} refers to all elements of S, excluding the l-th element, which are connected to the corresponding index of J (μ in this case for the second equation); the index x can take values of ±1. The conditional probabilities q^x_{μl} and r^x_{μl} will enable us, through recursive calculations, to obtain an approximated expression for the posterior. 
\nEmploying Bayes' rule and the assumption that the dependency of S_l on an element J_ν is factorizable, and vice versa: P(S_{l₁}, S_{l₂},..., S_{l_K} | {J_ν≠μ}) = Π_{k=1}^{K} P(S_{l_k} | {J_ν≠μ}) and P({J_ν≠μ} | S_l = x) = Π_{ν≠μ} P(J_ν | S_l = x, {J_σ≠ν}), one can rewrite a set of coupled equations for q^{+1}_{μl}, q^{−1}_{μl}, r^{+1}_{μl} and r^{−1}_{μl} of the form \n\nq^x_{μl} = a_{μl} p^x_l Π_{ν≠μ} r^x_{νl} and r^x_{μl} = Σ_{{S_k≠l}} P(J_μ | S_l = x, {S_k≠l}) Π_{k≠l} q^{S_k}_{μk} ,  (3) \n\nwhere a_{μl} is a normalizing factor such that q^{+1}_{μl} + q^{−1}_{μl} = 1, and p^x_l = P(S_l = x) are our prior beliefs in the value of the source bits S_l. \n\nThis set of equations can be solved iteratively [2] by updating a coupled set of difference equations for δq_{μl} = q^{+1}_{μl} − q^{−1}_{μl} and δr_{μl} = r^{+1}_{μl} − r^{−1}_{μl}, derived for this specific model, making use of the fact that the variables r^x_{μl}, and subsequently the variables q^x_{μl}, can be calculated by exploiting the relation r^{±1}_{μl} = (1 ± δr_{μl})/2 and Eq. (3). At each iteration we can also calculate the pseudo-posterior probabilities q^x_l = a_l p^x_l Π_μ r^x_{μl}, where a_l are normalizing factors, to determine the current estimated value of S_l. \n\nTwo points are worth noting: Firstly, the iterative solution makes use of the normalization r^{+1}_{μl} + r^{−1}_{μl} = 1, which is not derived from the basic probability rules and makes implicit assumptions about the probabilities of obtaining S_l = ±1 for all elements l. Secondly, the iterative solution would have provided the true posterior probabilities q^x_l if the graph connecting the message J and the encoded bits S had been free of cycles, i.e., if the graph had been a tree with no recurrent dependencies among the variables. The fact that the framework provides adequate practical solutions has only recently been explained [13]. 
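The passage from the product form of Eq. (3) to the difference variables can be checked numerically. The sketch below (with arbitrary illustrative values for the incoming differences and the prior field, both chosen by us) verifies that the pseudo-posterior difference computed from products of r^{±1} = (1 ± δr)/2 coincides with the hyperbolic form tanh(Σ tanh⁻¹ δr + F) that reappears later in Eq. (10):

```python
import numpy as np

rng = np.random.default_rng(0)
dr = rng.uniform(-0.9, 0.9, size=6)   # incoming differences δr at one site
F = 0.4                               # prior field: p^{±1} = (1 ± tanh F)/2

# pseudo-posterior from the product form q^x ∝ p^x * Π (1 ± δr)/2
q_plus = (1 + np.tanh(F)) / 2 * np.prod((1 + dr) / 2)
q_minus = (1 - np.tanh(F)) / 2 * np.prod((1 - dr) / 2)
dq = (q_plus - q_minus) / (q_plus + q_minus)   # normalized difference

# same quantity from the hyperbolic (difference-equation) form
dq_tanh = np.tanh(np.sum(np.arctanh(dr)) + F)

assert np.isclose(dq, dq_tanh)
```

The identity follows from (1 + d)/(1 − d) = e^{2 tanh⁻¹ d}, so the product of half-sums exponentiates into a sum of inverse hyperbolic tangents.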
\n\n4 Decoding by TAP \n\nWe will now show that for this particular problem it is possible to obtain a similar set of equations from the corresponding statistical mechanics framework based on the Bethe approximation [11] or the TAP (Thouless-Anderson-Palmer) approach [12] to diluted systems¹. In the statistical mechanics approach we assign a Boltzmann weight to each set comprising an encoded message bit J_μ and a dynamical vector S \n\nW_B(J_μ|S) = e^{−β g(J_μ|S)} ,  (4) \n\nsuch that the first term of the system's Hamiltonian (Eq. 1) can be rewritten as Σ_μ g(J_μ|S), where the index μ runs over all non-zero sites in the multidimensional tensor A. We will now employ two straightforward assumptions to write a set of coupled equations for the mean field q^{S_l}_{μl} ≡ P(S_l | {J_ν≠μ}), which may be identified as the same variable as in the belief network framework (Eq. 2), and the effective Boltzmann weight w_eff(J_μ | S_l, {J_ν≠μ}): 1) we assume a mean-field behavior for the dependence of the dynamical variables S on a certain realization of the message sites J, i.e., the dependence is factorizable and may be replaced by a product of mean fields; 2) Boltzmann weights (effective) for site S_l are factorizable with respect to J_μ. The resulting set of equations is of the form \n\nw_eff(J_μ | S_l, {J_ν≠μ}) = Tr_{{S_k≠l}} W_B(J_μ|S) Π_{k≠l} q^{S_k}_{μk} and q^{S_l}_{μl} = ã_{μl} p^{S_l}_l Π_{ν≠μ} w_eff(J_ν | S_l, {J_σ≠ν}) ,  (5) \n\nwhere ã_{μl} is a normalization factor and p^{S_l}_l is our prior knowledge of the source's bias. Replacing the effective Boltzmann weight by a normalized field, which may be identified as the variable r^{S_l}_{μl} of Eq. (2), we obtain \n\nr^{S_l}_{μl} = P(S_l | J_μ, {J_ν≠μ}) = a_{μl} w_eff(J_μ | S_l, {J_ν≠μ}) ,  (6) \n\ni.e., a set of equations equivalent to Eq. (3). The explicit expressions for the normalization coefficients are \n\na_{μl}^{−1} = Tr_{{S}} W_B(J_μ|S) Π_{k≠l} q^{S_k}_{μk} and ã_{μl}^{−1} = Tr_{S_l} p^{S_l}_l Π_{ν≠μ} w_eff(J_ν | S_l, {J_σ≠ν}) .  (7) \n\nThe somewhat arbitrary use of the differences δq_{μl} = ⟨S_l⟩_q and δr_{μl} = ⟨S_l⟩_r in the BP approach becomes clear from the statistical mechanics description, where they represent the expectation values of the dynamical variables with respect to the fields. The statistical mechanics formulation also provides a partial answer to the successful use of the BP methods in loopy systems, as we consider a finite number of steps on an infinite lattice [14]. However, it does not provide an explanation in the case of small systems, which should be examined using other methods. \n\nThe formulation so far has been general; however, in the case of Sourlas's code we can make use of the explicit expression for g to derive the relation between q^{S_l}_{μl}, r^{S_l}_{μl}, δq_{μl} and δr_{μl}, as well as an explicit expression for W_B(J_μ|S, β): \n\nq^{S_l}_{μl} = (1 + δq_{μl} S_l)/2 and r^{S_l}_{μl} = (1 + δr_{μl} S_l)/2 ,  (8) \n\nW_B(J_μ|S, β) = (1/2) cosh(βJ_μ) (1 + tanh(βJ_μ) Π_{l∈L(μ)} S_l) ,  (9) \n\nwhere L(μ) is the set of all sites of S connected to J_μ, i.e., for which the corresponding element of the tensor A is non-zero. \n\n¹ The terminology in the case of diluted systems is slightly vague. Unlike in the case of fully connected systems, self-consistent equations of diluted systems cannot be derived by the perturbation expansion of the mean-field equations with respect to Onsager reaction fields, since these fields are too large in diluted systems. Consequently, the resulting equations are different from those obtained for fully connected systems [12]. We termed our approach TAP, following the convention for the Bethe approximation when applied to disordered systems subject to mean-field-type random interactions. 
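Eq. (9) is simply the Boltzmann weight e^{βJ_μ Π S_l} for a product interaction, rewritten in a form linear in the spin product (up to an overall constant of 1/2, which drops out under normalization). Since the spin product can only take the values ±1, the identity can be checked directly; the numerical values of β and J_μ below are arbitrary illustrative choices:

```python
import numpy as np

beta, J = 1.3, -1.0                 # illustrative inverse temperature and bit
for s in (-1.0, 1.0):               # s stands for the product of the spins in L(mu)
    # the factored form of Eq. (9)
    w_factored = 0.5 * np.cosh(beta * J) * (1 + np.tanh(beta * J) * s)
    # the Boltzmann weight exp(beta * J * prod S), up to the constant 1/2
    w_boltzmann = 0.5 * np.exp(beta * J * s)
    assert np.isclose(w_factored, w_boltzmann)
```

The linearized form is what makes the trace over {S_k≠l} in Eq. (5) factorize into the product of single-site averages used below.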
The explicit form of the equations for δq_{μl} and δr_{μl} becomes \n\nδr_{μl} = tanh(βJ_μ) Π_{k∈L(μ)/l} δq_{μk} and δq_{μl} = tanh( Σ_{ν∈M(l)/μ} tanh⁻¹ δr_{νl} + F ) ,  (10) \n\nwhere M(l)/μ is the set of all indices of the tensor J, excluding μ, which are connected to the vector site l; the external field F, which previously appeared in the last term of Eq. (1), is directly related to our prior belief of the message bias \n\np^{S_l}_l = (1 + tanh(F) S_l)/2 .  (11) \n\nWe therefore showed that there is a direct relation between the equations derived from the BP approach and from TAP in this particular case. One should note that the TAP approach allows for the use of finite inverse temperatures β, which is not naturally included in the BP approach. \n\n5 Numerical solutions \n\nTo examine the efficacy of TAP/BP decoding we used the method for decoding corrupted messages encoded by the Sourlas scheme [4], for which we have previously obtained analytical solutions using the replica method [8]. We solved Eq. (10) iteratively for specific cases, making use of the differences δq_{μl} and δr_{μl} to obtain the values of q^x_{μl} and r^x_{μl} and of the magnetization M. \n\nNumerical solutions of 10 individual runs for each value of the flip rate p, starting from different initial conditions, obtained for the case K = 2 and C = 4, different biases (f = p_ξ = 0.1, 0.5, the probability of a +1 bit in the original message ξ) and temperatures (T = 0.26, Tn), are shown in Fig. 1a. For each run, 20000-bit code words J⁰ were generated from a 10000-bit message ξ using a fixed random sparse tensor A. The noise-corrupted code word J was decoded to retrieve the original message ξ. 
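The iteration of Eq. (10), with the prior of Eq. (11) entering through F and prior-based initial conditions δq = tanh F, can be sketched as follows. This is our own illustrative implementation, not the code used for the experiments reported here: the function names, the simplified random code construction, and the small demo sizes are all ours, and the demo decodes at Nishimori's inverse temperature β_n = (1/2) ln[(1 − p)/p].

```python
import numpy as np

def tap_bp_decode(idx, J, N, beta, F, n_iter=50):
    """Iterate Eq. (10): for each coupling (mu, k-th member of L(mu)),
    dr[mu,k] = tanh(beta*J[mu]) * product of dq over the other members,
    dq[mu,k] = tanh( sum over the site's other couplings of atanh(dr) + F )."""
    M, K = idx.shape
    dq = np.full((M, K), np.tanh(F))   # prior-based initial conditions
    tb = np.tanh(beta * J)             # tanh(beta * J_mu), fixed during iteration
    dr = np.zeros((M, K))
    for _ in range(n_iter):
        for k in range(K):             # leave-one-out product over L(mu)
            others = [j for j in range(K) if j != k]
            dr[:, k] = tb * np.prod(dq[:, others], axis=1)
        field = np.zeros(N)            # total atanh(dr) arriving at each site l
        np.add.at(field, idx, np.arctanh(dr))
        # subtracting the mu-contribution realizes the sum over M(l)/mu
        dq = np.tanh(field[idx] - np.arctanh(dr) + F)
    return np.sign(np.tanh(field + F))   # pseudo-posterior estimate of S

# small demo: biased message, K = 2, C = 4, low flip rate
rng = np.random.default_rng(1)
N, K, C, p, f = 500, 2, 4, 0.05, 0.9     # f: probability of a +1 message bit
idx = np.array([rng.choice(N, size=K, replace=False) for _ in range(N * C // K)])
xi = np.where(rng.random(N) < f, 1, -1)
J0 = np.prod(xi[idx], axis=1)
J = np.where(rng.random(len(J0)) < p, -J0, J0)
beta_n = 0.5 * np.log((1 - p) / p)       # 1/T_n; note tanh(beta_n) = 1 - 2p
S = tap_bp_decode(idx, J, N, beta=beta_n, F=np.arctanh(2 * f - 1))
overlap = np.mean(S * xi)                # magnetization M of the retrieved message
```

At the Nishimori temperature, tanh(β_n J_μ) for J_μ = ±1 equals exactly ±(1 − 2p), the posterior bias of a single received bit, which is one way of seeing why this temperature matches the channel noise.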
Initial conditions are set to δr_{μl} = 0 and δq_{μl} = tanh F, reflecting the prior belief; whenever the TAP/BP approach was successful in predicting the theoretical values we observed convergence in most runs corresponding to the ferromagnetic phase, while almost all runs at low temperatures did not converge to a stable solution above the critical flip rate (although the magnetization M did converge, as one may expect). We obtain good agreement between the TAP/BP solutions and the theoretical values calculated using the methods of [8] (diamond symbols and dashed line respectively). The results for biased patterns at T = 0.26, presented in the form of mean values and standard deviation, show a sub-optimal improvement in performance, as expected. Obtaining solutions under similar conditions but at Nishimori's temperature, 1/Tn = (1/2) ln[(1 − p)/p] [7], we see that pattern sparsity is exploited optimally, resulting in a magnetization M ≈ 0.8 for high corruption rates, as Tn simulates accurately the loss of information due to channel noise [6, 7]; results for unbiased patterns (not shown) are not affected significantly by the use of Nishimori's temperature. \n\n[Figure 1 appears here.] \n\nFigure 1: Numerical solutions for M and different flip rate p. (a) For K = 2, different biases (f = p_ξ = 0.1, 0.5) and temperatures (T = 0.26, Tn). Results for the unbiased patterns are shown as raw data (10 runs per flip rate value p, diamond), while the theoretical solution is marked by the dashed line. Results for biased patterns are presented by their mean and standard deviation, showing a sub-optimal performance as expected for T = 0.26 and an optimal one at Nishimori's temperature Tn. The standard deviation is significantly smaller than the symbol size. Figure (b) shows results for the case K = 5 and T = Tn in similar conditions to (a). Also here iterative solutions may generally drift away from the theoretical values where temperatures other than Tn are employed (not shown); using Nishimori's temperature alleviates the problem only in the case of biased messages and the results are in close agreement with the theoretical solutions (inset, focusing on low p values). \n\nThe replica-based theoretical solutions [8] indicate a profoundly different behaviour for K = 2 in comparison to other K values. We therefore obtained solutions for K = 5 under similar conditions (which are representative of results obtained in other cases of K ≠ 2). The results presented in Fig. 1b, in terms of means and standard deviations of 10 individual runs per flip rate value p, are less encouraging, as the iterative solutions are sensitive to the choice of initial conditions and tend to converge to sub-optimal values unless high sparsity and the appropriate choice of temperature (Tn) force them to the correct values, showing then good agreement with the theoretical results (solid line, see inset). 
This phenomenon is indicative of the fact that the ground state of the non-biased system is macroscopically degenerate, with multiple equally good ground states. \n\nWe conclude that the TAP/BP approach may be highly useful in the case of biased patterns but may lead to errors for unbiased patterns and K ≥ 3, and that the use of the appropriate temperature, i.e., Nishimori's temperature, enables one to obtain improved results, in agreement with results presented elsewhere [4, 6, 7]. \n\n6 Summary and discussion \n\nWe compared the use of BP to that of TAP for decoding corrupted messages encoded by Sourlas's method, to discover that in this particular case the two methods provide a similar set of equations. We then solved the equations iteratively for specific cases and compared the results to those obtained by the replica method. The solutions indicate that the method is particularly useful in the case of biased messages and that using Nishimori's temperature is highly beneficial; solutions obtained using other temperature values may be sub-optimal. For non-sparse messages and K ≥ 3 we may obtain erroneous solutions using these methods. \n\nIt would be desirable to explore whether the similarity in the equations derived using TAP and BP is restricted to this particular case or whether there is a more general link between the two methods. Another important question that remains open is the generality of our conclusions on the efficacy of these methods for decoding corrupted messages, as they are currently being applied in a variety of state-of-the-art coding schemes (e.g., [2, 3]). Understanding the limitations of these methods and the proper way to use them in general, especially in the context of error-correcting codes, may be highly beneficial to practitioners. 
\n\nAcknowledgment This work was partially supported by the RFTF program of the JSPS (YK) and by EPSRC grant GR/L19232 (DS). \n\nReferences \n\n[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann), 1988. \n[2] D.J.C. MacKay and R.M. Neal, Elect. Lett., 33, 457 and preprint (1997). \n[3] B.J. Frey, Graphical Models for Machine Learning and Digital Communication (MIT Press), 1998. \n[4] N. Sourlas, Nature, 339, 693 (1989) and Europhys. Lett., 25, 159 (1994). \n[5] R.G. Gallager, IRE Trans. Info. Theory, IT-8, 21 (1962). \n[6] P. Rujan, Phys. Rev. Lett., 70, 2968 (1993). \n[7] H. Nishimori, J. Phys. C, 13, 4071 (1980) and J. Phys. Soc. of Japan, 62, 1169 (1993). \n[8] Y. Kabashima and D. Saad, Europhys. Lett., 45, in press (1999). \n[9] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett., 35, 1792 (1975). \n[10] B. Derrida, Phys. Rev. B, 24, 2613 (1981). \n[11] H. Bethe, Proc. R. Soc. A, 151, 540 (1935). \n[12] D. Thouless, P.W. Anderson and R.G. Palmer, Phil. Mag., 35, 593 (1977). \n[13] Y. Weiss, MIT preprint CBCL155 (1997). \n[14] D. Sherrington and K.Y.M. Wong, J. Phys. A, 20, L785 (1987). \n", "award": [], "sourceid": 1570, "authors": [{"given_name": "Yoshiyuki", "family_name": "Kabashima", "institution": null}, {"given_name": "David", "family_name": "Saad", "institution": null}]}