Gianluca Pollastri, Pierre Baldi, Alessandro Vullo, Paolo Frasconi
We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarse-grained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance.
Introduction: Protein Topology Prediction
Predicting the 3D structure of protein chains from the linear sequence of amino acids is a fundamental open problem in computational molecular biology. Any approach to the problem must deal with the basic fact that protein structures are translation and rotation invariant. To address this invariance, we have proposed a machine learning approach to protein structure prediction based on the prediction of topological representations of proteins, in the form of contact or distance maps. The contact or distance map is a 2D representation of neighborhood relationships consisting of either an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 Å) or a matrix of pairwise Euclidean distances. Fine-grained maps are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere.
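To make the map definitions above concrete, the following Python sketch computes a fine-grained distance map, a fine-grained contact map, and a coarse map of minimal Cα distances between secondary structure elements, starting from a list of Cα coordinates. It is an illustration under stated assumptions, not the authors' code: the function names, the 8 Å default cutoff (picked from inside the 6 to 12 Å range), the "at most cutoff" contact test, and the encoding of secondary structure elements as half-open residue index ranges are all hypothetical choices. The helper `rotate_z_and_shift` (also hypothetical) applies a rigid motion so one can check that the distance map does not change under rotation and translation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: one (x, y, z) C-alpha coordinate per residue.
Coord = Tuple[float, float, float]
# Hypothetical encoding of a secondary structure element: half-open residue range [start, end).
Segment = Tuple[int, int]


def distance_map(ca: Sequence[Coord]) -> List[List[float]]:
    """Fine-grained distance map: pairwise Euclidean distances between C-alpha atoms."""
    n = len(ca)
    return [[math.dist(ca[i], ca[j]) for j in range(n)] for i in range(n)]


def contact_map(ca: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Fine-grained contact map: 1 where two residues lie within `cutoff` angstroms, else 0.

    The 8 A default is an assumed value inside the typical 6-12 A range.
    """
    return [[1 if d <= cutoff else 0 for d in row] for row in distance_map(ca)]


def coarse_distance_map(ca: Sequence[Coord], segments: Sequence[Segment]) -> List[List[float]]:
    """Coarse map: minimal C-alpha to C-alpha distance between every pair of segments.

    The diagonal is trivially 0, since a segment's minimal distance to itself is 0.
    """
    m = len(segments)
    out = [[0.0] * m for _ in range(m)]
    for a, (sa, ea) in enumerate(segments):
        for b, (sb, eb) in enumerate(segments):
            out[a][b] = min(
                math.dist(ca[i], ca[j]) for i in range(sa, ea) for j in range(sb, eb)
            )
    return out


def rotate_z_and_shift(ca: Sequence[Coord], theta: float, shift: Coord) -> List[Coord]:
    """Rotate every coordinate by `theta` radians about the z-axis, then translate by `shift`."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]) for x, y, z in ca]


if __name__ == "__main__":
    # Toy chain: four residues spaced 3.8 A apart along the x-axis.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(contact_map(chain))  # [[1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 1]]

    # A rigid motion leaves every pairwise distance unchanged (up to rounding).
    moved = rotate_z_and_shift(chain, 0.7, (1.0, -2.0, 5.0))
    d0, d1 = distance_map(chain), distance_map(moved)
    print(all(abs(d0[i][j] - d1[i][j]) < 1e-9 for i in range(4) for j in range(4)))  # True
```

Because the maps depend only on pairwise distances, they carry no information about the absolute position or orientation of the chain, which is exactly the invariance the topological representation is meant to provide.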