{"title": "Approximate Inference and Protein-Folding", "book": "Advances in Neural Information Processing Systems", "page_first": 1481, "page_last": 1488, "abstract": null, "full_text": "Approximate Inference and \n\nProtein-Folding \n\nChen Yanover and Yair Weiss \n\nSchool of Computer Science and Engineering \n\nThe Hebrew University of J erusalem \n\n91904 Jerusalem, Israel \n\n{cheny,yweiss} @cs.huji.ac.it \n\nAbstract \n\nSide-chain prediction is an important subtask in the protein-folding \nproblem. We show that finding a minimal energy side-chain con(cid:173)\nfiguration is equivalent to performing inference in an undirected \ngraphical model. The graphical model is relatively sparse yet has \nmany cycles. We used this equivalence to assess the performance of \napproximate inference algorithms in a real-world setting. Specifi(cid:173)\ncally we compared belief propagation (BP), generalized BP (GBP) \nand naive mean field (MF). \nIn cases where exact inference was possible, max-product BP al(cid:173)\nways found the global minimum of the energy (except in few cases \nwhere it failed to converge), while other approximation algorithms \nof similar complexity did not. In the full protein data set, max(cid:173)\nproduct BP always found a lower energy configuration than the \nother algorithms, including a widely used protein-folding software \n(SCWRL). \n\n1 \n\nIntroduction \n\nInference in graphical models scales exponentially with the number of variables. \nSince many real-world applications involve hundreds of variables, it has been im(cid:173)\npossible to utilize the powerful mechanism of probabilistic inference in these appli(cid:173)\ncations. Despite the significant progress achieved in approximate inference, some \nit is not yet known which algorithm to use \npractical questions still remain open -\nfor a given problem nor is it understood what are the advantages and disadvan(cid:173)\ntages of each technique. 
We address these questions in the context of a real-world protein-folding application - the side-chain prediction problem. \n\nPredicting side-chain conformation given the backbone structure is a central problem in protein-folding and molecular design. It arises both in ab-initio protein-folding (which can be divided into two sequential tasks - the generation of native-like backbone folds and the positioning of the side-chains upon these backbones [6]) and in homology modeling schemes (where the backbone and some side-chains are assumed to be conserved among the homologs, but the configuration of the rest of the side-chains needs to be found). \n\n\fFigure 1: Cow actin binding protein (PDB code 1pne, top) and a closer view of its 6 carboxyl-terminal residues (bottom-left). Given the protein backbone (black) and amino acid sequence, the native side-chain conformation (gray) is searched for. The representation of the problem as a graphical model for those carboxyl-terminal residues is shown in the bottom-right figure (nodes located at Cα atom positions, edges drawn in black). \n\nIn this paper, we show the equivalence between side-chain prediction and inference in an undirected graphical model. We compare the performance of BP, generalized BP and naive mean field on this problem, as well as comparing to a widely used protein-folding program called SCWRL. \n\n2 The side-chain prediction problem \n\nProteins are chains of simpler molecules called amino acids. All amino acids have a common structure - a central carbon atom (Cα) to which a hydrogen atom, an amino group (NH2) and a carboxyl group (COOH) are bonded. In addition, each amino acid has a chemical group called the side-chain, bound to Cα. This group distinguishes one amino acid from another and gives it its distinctive properties. Amino acids are joined end to end during protein synthesis by the formation of peptide bonds. 
An amino acid unit in a protein is called a residue. The formation of a succession of peptide bonds generates the backbone (consisting of Cα and its adjacent atoms, N and C, of each residue), upon which the side-chains are hung (Figure 1). \n\n\fWe seek to predict the configuration of all the side-chains relative to the backbone. The standard approach to this problem is to define an energy function and use the configuration that achieves the global minimum of the energy as the prediction. \n\n2.1 The energy function \n\nWe adopted the van der Waals energy function used by SCWRL [3], which approximates the repulsive portion of the Lennard-Jones 12-6 potential. For a pair of atoms, a and b, the energy of interaction is given by: \n\nE(a, b) = 0 if d > R0; E(a, b) = -k2 d / R0 + k2 if R0 >= d >= k1 R0; E(a, b) = Emax if k1 R0 > d (1) \n\nwhere Emax = 10, k1 = 0.8254 and k2 = Emax / (1 - k1); d denotes the distance between a and b and R0 is the sum of their radii. Constant radii were used for the protein's atoms (Carbon - 1.6Å, Nitrogen and Oxygen - 1.3Å, Sulfur - 1.7Å). For two sets of atoms, the interaction energy is the sum of the pairwise atom interactions. The energy surface of a typical protein in the data set has dozens to thousands of local minima. \n\n2.2 Rotamers \n\nThe configuration of a single side-chain is represented by at most 4 dihedral angles (denoted χ1, χ2, χ3 and χ4). Any assignment of χ angles for all the residues defines a protein configuration. Thus the energy minimization problem is a highly nonlinear continuous optimization problem. \n\nIt turns out, however, that side-chains have a small repertoire of energetically preferred conformations, called rotamers. Statistical analysis of those conformations in well-determined protein structures produces a rotamer library. We used a backbone-dependent rotamer library (by Dunbrack and Karplus, July 2001 version). 
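As a concrete illustration, the piecewise-linear energy of equation 1 can be sketched in Python (the function and variable names below are ours, not SCWRL's; the radius table in the usage example follows the constants quoted above):

```python
import math

def pair_energy(d, R0, Emax=10.0, k1=0.8254):
    """Repulsive van der Waals term of equation 1 for one atom pair.

    d  : distance between atoms a and b
    R0 : sum of their radii
    """
    k2 = Emax / (1.0 - k1)          # so that E(k1*R0) == Emax and E(R0) == 0
    if d > R0:
        return 0.0
    if d >= k1 * R0:
        return -k2 * d / R0 + k2    # linear ramp between 0 and Emax
    return Emax

def group_energy(atoms_a, atoms_b, radius):
    """Interaction energy of two atom sets: sum of pairwise atom terms.

    atoms_a, atoms_b : lists of (element, (x, y, z)) tuples (our format)
    radius           : dict mapping element symbol to its constant radius
    """
    total = 0.0
    for ea, pa in atoms_a:
        for eb, pb in atoms_b:
            d = math.dist(pa, pb)
            total += pair_energy(d, radius[ea] + radius[eb])
    return total
```

With `radius = {"C": 1.6, "N": 1.3, "O": 1.3, "S": 1.7}`, two carbons further apart than 3.2Å contribute zero energy, matching the sparsity argument used in section 2.3.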
Given the coordinates of the backbone atoms, its dihedral angles φ (defined, for the ith residue, by C_{i-1} - N_i - Cα_i - C_i) and ψ (defined by N_i - Cα_i - C_i - N_{i+1}) were calculated. The library then gives the typical rotamers for each side-chain and their prior probabilities. \n\nBy using the library we convert the continuous optimization problem into a discrete one. The number of discrete variables is equal to the number of residues, and the number of possible values each variable can take lies between 2 and 81. \n\n2.3 Graphical model \n\nSince we have a discrete optimization problem and the energy function is a sum of pairwise interactions, we can transform the problem into a graphical model with pairwise potentials. Each node corresponds to a residue, and the state of each node represents the configuration of the side-chain of that residue. Denoting by {r_i} an assignment of rotamers for all the residues, then: \n\nP({r_i}) = (1/Z) e^{-(1/T) E({r_i})} = (1/Z) e^{-(1/T) [Σ_i E(r_i) + Σ_{i,j} E(r_i, r_j)]} = (1/Z) Π_i Ψ_i(r_i) Π_{i,j} Ψ_ij(r_i, r_j) (2) \n\nwhere Z is an explicit normalization factor and T is the system \"temperature\" (used as a free parameter). The local potential Ψ_i(r_i) takes into account the prior probability of the rotamer P_i(r_i) (taken from the rotamer library) and the energy of the interactions between that rotamer and the backbone: \n\nΨ_i(r_i) = P_i(r_i) e^{-(1/T) E(r_i, backbone)} (3) \n\nEquation 2 requires multiplying Ψ_ij for all pairs of residues i, j, but note that equation 1 gives zero energy for atoms that are sufficiently far away. Thus we only need to calculate the pairwise interactions for nearby residues. To define the topology of the undirected graph, we examine all pairs of residues i, j and check whether there exists an assignment r_i, r_j for which the energy is nonzero. If it exists, we connect nodes i and j in the graph and set the potential to be: \n\nΨ_ij(r_i, r_j) = e^{-(1/T) E(r_i, r_j)} (4) \n\nFigure 1 shows a subgraph of the undirected graph. 
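To make the construction concrete, here is a minimal sketch of how equations 3 and 4 turn energies into potentials and define the graph topology. The code and its input format are our own: the tables `priors`, `E_local` and `E_pair` are assumed to be precomputed from the rotamer library and equation 1.

```python
import numpy as np

def build_model(priors, E_local, E_pair, T=1.0):
    """Node and edge potentials of equations 3-4 (hypothetical input format).

    priors  : priors[i][r]  = rotamer-library prior P_i(r_i)
    E_local : E_local[i][r] = E(r_i, backbone)
    E_pair  : dict mapping residue pairs (i, j) to the matrix E(r_i, r_j)
    """
    # Equation 3: Psi_i(r_i) = P_i(r_i) * exp(-E(r_i, backbone) / T)
    psi = [np.asarray(p) * np.exp(-np.asarray(e) / T)
           for p, e in zip(priors, E_local)]
    # Equation 4, applied only where some assignment has nonzero energy;
    # otherwise residues i and j stay disconnected in the graph.
    edges = {}
    for (i, j), E in E_pair.items():
        E = np.asarray(E)
        if np.any(E != 0):
            edges[i, j] = np.exp(-E / T)
    return psi, edges
```

Residue pairs whose energy matrix is identically zero produce no edge, which is what keeps the resulting model sparse.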
The graph is relatively sparse (each node is connected only to nodes that are close in 3D space) but contains many small loops. A typical protein in the data set gives rise to a model with hundreds of loops of size 3. \n\n3 Experiments \n\nWhen the protein was small enough, we used the max-junction tree algorithm [1] to find the most likely configuration of the variables (and hence the global minimum of the energy function). Murphy's implementation of the JT algorithm in his BN toolbox for Matlab was used [10]. \n\nThe approximate inference algorithms we tested were loopy belief propagation (BP), generalized BP (GBP) and naive mean field (MF). \n\nBP is an exact and efficient local message passing algorithm for inference in singly connected graphs [15]. Its essential idea is replacing the exponential enumeration (either summation or maximization) over the unobserved nodes with a series of local enumerations (a process called \"elimination\" or \"peeling\"). Loopy BP, that is, BP applied to multiply connected graphical models, may not converge due to the circulation of messages through the loops [12]. However, many groups have recently reported excellent results using loopy BP as an approximate inference algorithm [4, 11, 5]. We used an asynchronous update schedule and ran for 50 iterations or until numerical convergence. \n\nGBP is a class of approximate inference algorithms that trade complexity for accuracy [15]. A subset of GBP algorithms is equivalent to forming a graph from clusters of nodes and edges in the original graph and then running ordinary BP on the cluster graph. We used two large clusters. Both clusters contained all nodes in the graph, but each cluster contained only a subset of the edges. The first cluster contained all edges between residues whose indices differ by less than a constant k (typically 6). 
All other edges were included in the second cluster. It can be shown that the cluster graph BP messages can be computed efficiently using the JT algorithm. Thus this approximation tries to capture dependencies between a large number of nodes in the original graph while maintaining computational feasibility. \n\nThe naive MF approximation tries to approximate the joint distribution in equation 2 as a product of independent marginals q_i(r_i). The marginals q_i(r_i) can be found by iterating: \n\nq_i(r_i) ← α Ψ_i(r_i) exp(Σ_{j ∈ N_i} Σ_{r_j} q_j(r_j) log Ψ_ij(r_i, r_j)) (5) \n\nwhere α denotes a normalization constant and N_i denotes the set of nodes neighboring i. We initialized q_i(r_i) to Ψ_i(r_i) and chose a random update ordering for the nodes. For each protein we repeated this minimization 10 times (each time with a different update order) and chose the local minimum that gave the lowest energy. \n\nIn addition to the approximate inference algorithms described above, we also compared the results to two approaches in use in side-chain prediction: the SCWRL and DEE algorithms. The Side-Chain placement With a Rotamer Library (SCWRL) algorithm is considered one of the leading algorithms for predicting side-chain conformations [3]. It uses the energy function described above (equation 1) and a heuristic search strategy to find a minimal energy conformation in a discrete conformational space (defined using a rotamer library). \n\nDead-end elimination (DEE) is a search algorithm that tries to reduce the search space until it becomes suitable for an exhaustive search. It is based on a simple condition that identifies rotamers that cannot be members of the global minimum energy conformation [2]. If enough rotamers can be eliminated, the global minimum energy conformation can be found by an exhaustive search of the remaining rotamers. 
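The update of equation 5 is straightforward to implement. The sketch below is our own code (not the code used in the experiments); it operates on the Ψ arrays of equations 3-4 and reproduces the random update ordering described above:

```python
import numpy as np

def naive_mean_field(psi, edges, n_iter=100, rng=None):
    """Naive mean field iteration of equation 5 on a pairwise model.

    psi   : list of node potentials Psi_i (1-D numpy arrays)
    edges : dict mapping (i, j) with i < j to the matrix Psi_ij
    """
    if rng is None:
        rng = np.random.default_rng(0)
    q = [p / p.sum() for p in psi]                # initialize q_i to Psi_i
    log_psi = {e: np.log(np.maximum(m, 1e-300)) for e, m in edges.items()}
    nbrs = {i: [] for i in range(len(psi))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_iter):
        for i in rng.permutation(len(psi)):       # random update ordering
            s = np.zeros_like(q[i])
            for j in nbrs[i]:
                L = log_psi[i, j] if (i, j) in log_psi else log_psi[j, i].T
                s += L @ q[j]                     # sum over j, r_j of q_j log Psi_ij
            q[i] = psi[i] * np.exp(s)
            q[i] /= q[i].sum()                    # alpha: normalize q_i
    return q
```

Each restart with a different update order may reach a different local minimum, which is why the experiments keep the lowest-energy one of 10 runs.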
\n\nThe various inference algorithms were tested on a set of 325 X-ray crystal structures with resolution better than or equal to 2Å, R factor below 20% and length up to 300 residues. One representative structure was selected from each cluster of homologous structures (50% homology or more). Protein structures were acquired from the Protein Data Bank site (http://www.rcsb.org/pdb). \n\nMany proteins contain Cysteine residues, which tend to form strong disulfide bonds with each other. A standard technique in side-chain prediction (used e.g. in SCWRL) is to first search for possible disulfide bonds and, if they exist, to freeze these residues in that configuration. This essentially reduces the search space. We repeated our experiments with and without freezing the Cysteine residues. \n\nSide-chain to backbone interaction seems to be much more severe than side-chain to side-chain interaction - the backbone is more rigid than the side-chains and its structure is assumed to be known. Therefore, a parameter R was introduced into the pairwise potential equation, as follows: \n\nΨ_ij(r_i, r_j) = (e^{-(1/T) E(r_i, r_j)})^{1/R} (6) \n\nUsing R > 1 assigns an increased weight to side-chain to backbone interactions over side-chain to side-chain interactions. We repeated our experiments both with R = 1 and R > 1. It is worth mentioning that SCWRL implicitly adopts a weighting assumption that assigns an increased weight to side-chain to backbone interactions. \n\n4 Results \n\nIn our first set of experiments we wanted to compare approximate inference to exact inference. In order to make exact inference possible, we restricted the possible rotamers of each residue. Out of the (at most) 81 possible states we chose a subset that accounted for 90% of the local probability. We constrained the size of the subset to be at least 2. 
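For reference, the max-product runs above can be sketched as a generic asynchronous implementation on a pairwise model (our own code, not the exact code used in the experiments; it consumes the Ψ potentials of section 2.3 and decodes each node from its max-marginal, with ties broken arbitrarily):

```python
import numpy as np

def max_product(psi, edges, n_iter=50):
    """Asynchronous max-product loopy BP on a pairwise model."""
    msgs, nbrs = {}, {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
        msgs[i, j] = np.ones(edges[i, j].shape[1])   # message i -> j, over r_j
        msgs[j, i] = np.ones(edges[i, j].shape[0])   # message j -> i, over r_i
    def pot(i, j):                                   # Psi_ij with rows indexing r_i
        return edges[i, j] if (i, j) in edges else edges[j, i].T
    for _ in range(n_iter):
        for (i, j) in list(msgs):
            # m_{i->j}(r_j) = max_{r_i} Psi_ij(r_i, r_j) Psi_i(r_i) prod_{k != j} m_{k->i}(r_i)
            b = psi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    b = b * msgs[k, i]
            m = (pot(i, j) * b[:, None]).max(axis=0)
            msgs[i, j] = m / m.sum()                 # normalize for stability
    assignment = []
    for i in range(len(psi)):
        b = psi[i].copy()
        for k in nbrs.get(i, []):
            b = b * msgs[k, i]                       # max-marginal belief at node i
        assignment.append(int(np.argmax(b)))
    return assignment
```

On a singly connected model this recovers the exact MAP assignment; on the loopy protein graphs it is run for a fixed number of iterations or until the messages stop changing.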
The resulting graphical model retains only a small fraction of the loops occurring in the full graphical model (about 7% of the loops of size 3). However, it still contains many small loops and, in particular, dozens of loops of size 3. \n\nOn these graphs we found that ordinary max-product BP always found the global minimum of the energy function (except in a few cases where it failed to converge). \n\n\f(Figure 2 appears here.) \n\nFigure 2: Sum-product BP (top-left), naive MF (top-right) and SCWRL (bottom-left) energies are always higher than or equal to the max-product BP energy. Convergence rates for the various algorithms (SCWRL; sum-product with R=1 and R>1; max-product with R=1 and R>1) are shown in the bottom-right chart. \n\nSum-product BP failed to find the sum-JT conformation in only 1% of the graphs. In contrast, the naive MF algorithm found the global minimum conformation for only 38% of the proteins and on only 17% of the runs. The GBP algorithm gave the same results as ordinary BP, but it converged more often (e.g. 99.6% and 98.9% for sum-product GBP and BP, respectively). \n\nIn the second set of experiments we used the full graphical models. Since exact inference is impossible, we can only compare the relative energies found by the different approximate inference algorithms. Results are shown in Figure 2. Note that, when it converged, max-product BP always found a lower energy configuration compared to the other algorithms. This finding agrees with the observation that the max-product solution is a \"neighborhood optimum\" and is therefore guaranteed to be better than all other assignments in a large region around it [13]. \n\nWe also tried decreasing T, the system \"temperature\", for sum-product (in the limit of zero temperature it should approach max-product). 96% of the time, using a lower temperature (T = 0.3 instead of T = 1) indeed gave a lower energy configuration. Even at this reduced temperature, however, max-product always found a lower energy configuration. \n\nAll algorithms converged in more than 90% of the cases. However, sum-product converged more often than max-product (Figure 2, bottom-right). 
Decreasing the temperature resulted in a lower convergence rate for the sum-product BP algorithm (e.g. 95.7% compared to 98.15% on full-size graphs using disulfide bonds). It should be mentioned that SCWRL failed to converge on only a single protein in the data set. \n\nApplying the DEE algorithm to the side-chain prediction graphical models dramatically decreased the size of the conformational search space, though, in most cases, the resulting space was still infeasibly large. Moreover, max-product BP was indifferent to that space reduction - it failed to converge for the same models and, when it converged, found the same conformation. \n\n\f(Figure 3 appears here.) \n\nSCWRL buried residues success rates: χ1 - 85.9%, χ2 - 62.2%, χ3 - 40.3%, χ4 - 25.5%. \n\nFigure 3: Inference results - success rates. SCWRL's buried-residue success rate subtracted from the sum-product BP (light gray), max-product BP (dark gray) and MF (black) rates, when equally weighting side-chain to backbone and side-chain to side-chain clashes (left) and when assigning an increased weight to side-chain to backbone clashes (right). \n\n4.1 Success rate \n\nIn comparing the performance of the algorithms, we have focused on the energy of the found configuration, since this is the quantity the algorithms seek to optimize. A more realistic performance measure is: how well do the algorithms predict the native structure of the protein? \n\nThe dihedral angle χi is deemed correct when it is within 40° of the native (crystal) structure and χ1 to χ_{i-1} are also correct. The success rate is defined as the proportion of correctly predicted dihedral angles. \n\nThe success rates of the conformations inferred by both max- and sum-product BP outperformed SCWRL's (Figure 3). 
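The success-rate criterion above can be written down directly. The following sketch is our own code; the wrap-around handling of the angular difference is our assumption, and it makes the cumulative-correctness condition (χi counts only if χ1 ... χ_{i-1} are also correct) explicit:

```python
def success_rates(pred, native, tol=40.0):
    """Per-angle success rates under the cumulative correctness criterion.

    pred, native : lists of per-residue chi-angle lists (degrees, up to 4 each)
    Returns the fraction of correctly predicted chi_i for i = 1..4.
    """
    hits, counts = [0] * 4, [0] * 4
    for p, n in zip(pred, native):
        ok = True
        for i, (a, b) in enumerate(zip(p, n)):
            counts[i] += 1
            diff = abs(a - b) % 360.0
            diff = min(diff, 360.0 - diff)   # wrap-around angular difference
            ok = ok and diff <= tol          # chi_i needs all earlier chis correct
            if ok:
                hits[i] += 1
    return [h / c if c else 0.0 for h, c in zip(hits, counts)]
```

Note that a wrong χ1 makes every later χ of that residue count as wrong, even if it happens to be within the 40° tolerance.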
For buried residues (residues with relative accessibility lower than 30% [9]), both algorithms added 1% to SCWRL's χ1 success rate. Increasing the weight of side-chain to backbone interactions over side-chain to side-chain interactions resulted in better success rates (Figure 3, right). Freezing Cysteine residues to allow the formation of disulfide bonds slightly increased the success rate. \n\n5 Discussion \n\nRecent years have seen much progress in approximate inference. We believe that the comparison of different approximate inference algorithms is best done in the context of a real-world problem. In this paper we have shown that for a real-world problem with many loops, the performance of belief propagation is excellent. In problems where exact inference was possible, max-product BP always found the global minimum of the energy function, and on the full protein data set, max-product BP always found a lower energy configuration compared to the other algorithms tested. \n\n\fSCWRL is considered one of the leading algorithms for modeling side-chain conformations. However, in the last couple of years several groups have reported better results due to a more accurate energy function [7], a better search algorithm [8], or an extended rotamer library [14]. \n\nAs shown, by using inference algorithms we achieved low energy conformations compared to existing algorithms. However, this leads to only a modest increase in prediction accuracy. Using an energy function which gives a better approximation to the \"true\" physical energy (and, in particular, assigns the lowest energy to the native structure) should significantly improve the success rate. A promising direction for future research is to try to learn the energy function from examples. Inference algorithms such as BP may play an important role in the learning procedure. \n\nReferences \n\n[1] R. Cowell. Introduction to inference in Bayesian networks. In Michael I. 
Jordan, editor, Learning in Graphical Models. Morgan Kaufmann, 1998. \n\n[2] Johan Desmet, Marc De Maeyer, Bart Hazes, and Ignace Lasters. The dead-end elimination theorem and its use in protein side-chain positioning. Nature, 356:539-542, 1992. \n\n[3] Roland L. Dunbrack, Jr. and Martin Karplus. Backbone-dependent rotamer library for proteins: Application to side-chain prediction. J. Mol. Biol., 230:543-574, 1993. See also http://www.fccc.edu/research/labs/dunbrack/scwrl/. \n\n[4] William T. Freeman and Egon C. Pasztor. Learning to estimate scenes from images. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Adv. Neural Information Processing Systems 11. MIT Press, 1999. \n\n[5] Brendan J. Frey, Ralf Koetter, and Nemanja Petrovic. Very loopy belief propagation for unwrapping phase images. In Adv. Neural Information Processing Systems 14. MIT Press, 2001. \n\n[6] Enoch S. Huang, Patrice Koehl, Michael Levitt, Rohit V. Pappu, and Jay W. Ponder. Accuracy of side-chain prediction upon near-native protein backbones generated by ab initio folding methods. Proteins, 33(2):204-217, 1998. \n\n[7] Shide Liang and Nick V. Grishin. Side-chain modeling with an optimized scoring function. Protein Sci., 11(2):322-331, 2002. \n\n[8] Loren L. Looger and Homme W. Hellinga. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J. Mol. Biol., 307(1):429-445, 2001. \n\n[9] Joaquim Mendes, Cláudio M. Soares, and Maria Arménia Carrondo. Improvement of side-chain modeling in proteins with the self-consistent mean field theory method based on an analysis of the factors influencing prediction. Biopolymers, 50(2):111-131, 1999. \n\n[10] Kevin Murphy. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33, 2001. \n\n[11] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. 
Loopy belief propagation for approximate inference: an empirical study. In Proceedings of Uncertainty in AI, 1999. \n\n[12] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988. \n\n[13] Yair Weiss and William T. Freeman. On the optimality of solutions of the max-product belief propagation algorithm. IEEE Transactions on Information Theory, 47(2):723-735, 2000. \n\n[14] Zhexin Xiang and Barry Honig. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol., 311(2):421-430, 2001. \n\n[15] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Understanding belief propagation and its generalizations. In G. Lakemeyer and B. Nebel, editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2002. \n\n\f", "award": [], "sourceid": 2181, "authors": [{"given_name": "Chen", "family_name": "Yanover", "institution": null}, {"given_name": "Yair", "family_name": "Weiss", "institution": null}]}