
Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification.

Fontove, Fernando; Del Rio, Gabriel.

Entropy (Basel); 22(4), 2020 Apr 20.

Article in English | MEDLINE | ID: mdl-33286246

ABSTRACT

Proteins are characterized by their structures and functions, and these two fundamental aspects of proteins are assumed to be related. To model such a relationship, a single representation of both protein structure and function would be convenient; yet, so far, the most effective models for protein structure classification and for protein function classification do not rely on the same protein representation. Here we provide a computationally efficient implementation, suitable for large datasets, to calculate residue cluster classes (RCCs) from protein three-dimensional structures, and we show that this representation enables a random forest algorithm to effectively learn the structural and functional classifications of proteins, according to the CATH and Gene Ontology criteria, respectively. RCCs are derived from residue contact maps built with different distance criteria, and we show that distance cutoffs of 7 or 8 Å, with or without amino acid side-chain atoms, yielded the best classification models. The potential use of a unified representation of proteins is discussed, and possible future areas for improvement and exploration are presented.
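The abstract does not spell out how RCCs are computed beyond their origin in residue contact maps. The following is a minimal sketch of that contact-map-to-feature-vector pipeline under stated assumptions that are NOT the authors' published definition: two residues are taken to be in contact if any pair of their atoms lies within the cutoff, "clusters" are approximated by maximal cliques of the contact graph (found with Bron-Kerbosch), and the "class" of a cluster is approximated by its size alone. All function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def contact_map(coords, cutoff=7.0):
    """Build a residue contact graph as adjacency sets.

    coords: list of residues, each a list of (x, y, z) atom coordinates in Å.
    Assumption: residues i and j are in contact if ANY atom pair is within cutoff.
    Passing all atoms vs. backbone-only atoms mimics the "with or without
    side-chain atoms" choice mentioned in the abstract.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if any(math.dist(a, b) <= cutoff for a in coords[i] for b in coords[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (as a frozenset) via Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def cluster_size_vector(coords, cutoff=7.0, max_size=8):
    """Hypothetical RCC-like vector: entry k-1 counts maximal cliques of size k.

    Cliques larger than max_size are lumped into the last bin. This is a
    simplification for illustration, not the published RCC classes.
    """
    adj = contact_map(coords, cutoff)
    counts = Counter(min(len(c), max_size) for c in maximal_cliques(adj))
    return [counts.get(k, 0) for k in range(1, max_size + 1)]


if __name__ == "__main__":
    # Toy structure: three mutually close residues and one distant residue.
    toy = [[(0, 0, 0)], [(1, 0, 0)], [(0, 1, 0)], [(20, 0, 0)]]
    print(cluster_size_vector(toy, cutoff=7.0, max_size=4))  # [1, 0, 1, 0]
```

Because every structure maps to a fixed-length count vector regardless of protein length, vectors of this kind can be stacked into a feature matrix and handed to a standard classifier such as a random forest, which is the general idea the abstract describes.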

Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes.

Poot Velez, Albros Hermes; Fontove, Fernando; Del Rio, Gabriel.

Int J Mol Sci; 21(13), 2020 Jul 06.

Article in English | MEDLINE | ID: mdl-32640745

ABSTRACT

Predicting protein-protein interactions (PPI) is an important challenge in structural bioinformatics. Current computational methods predict these interactions with varying degrees of accuracy. Several factors have been proposed to improve these predictions, including the choice of protein descriptors used to represent these interactions. In the current work, we provide a protein structure representation amenable to PPI classification with machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set, reproducing the common situation in which most proteins share sequence similarity with others. We identified a model with almost no PPI classification error (96-99% of instances correctly classified) and showed that residue cluster classes of protein pairs display a distinct pattern between positive and negative protein interactions. Our results indicate that residue cluster classes are structural features relevant to modeling PPI and provide a novel tool to mathematically model the protein structure/function relationship.

Subject(s)

Artificial Intelligence , Computational Biology/methods , Databases, Protein/statistics & numerical data , Machine Learning , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Cluster Analysis , Sequence Analysis, Protein/methods
