ABSTRACT
Protein structure prediction (PSP) is a crucial problem in bioinformatics, with important applications in vital research areas such as drug discovery. One important intermediate step in PSP is predicting a protein's beta-sheet structures. Because of non-local interactions among the numerous irregular regions in beta-sheets, predicting them with high accuracy is challenging. The challenge is compounded when a protein's structure has a large number of beta-sheets. In this paper, we first refine the beta-sheets of a protein structure using a local search method, and then use another local search method to refine the full structure. Our search methods analyse residue-residue distance-based scores and apply geometric restrictions obtained from deep learning models. Moreover, our search methods identify the regions of the current conformation that are responsible for the lower scores and generate neighbouring conformations by making alterations in those regions. On a set of 88 standard proteins ranging in size from 46 to 450 residues, our method outperforms state-of-the-art PSP search algorithms. The improvements are more than 12% in average root mean squared distance (RMSD), template modelling score (TM-score), and global distance test (GDT) values.
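The idea of focusing a local search on the worst-scoring region can be illustrated with a deliberately simplified sketch. It is not the method of the paper: it assumes a toy one-dimensional "conformation" (one coordinate per residue), hypothetical target distances standing in for the deep learning predictions, a squared-error penalty where higher means worse, and a simple accept-if-better rule. The names `residue_penalties` and `local_search` are illustrative only.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def residue_penalties(coords: List[float], targets: Dict[Pair, float]) -> List[float]:
    """Per-residue sum of squared errors between current and target pair distances."""
    penalties = [0.0] * len(coords)
    for (i, j), target in targets.items():
        error = (abs(coords[i] - coords[j]) - target) ** 2
        # Each pair error is charged to both residues involved in the pair.
        penalties[i] += error
        penalties[j] += error
    return penalties


def local_search(
    coords: List[float],
    targets: Dict[Pair, float],
    steps: int = 200,
    step_size: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Repeatedly perturb the residue with the highest penalty; keep improvements only."""
    rng = random.Random(seed)
    current = list(coords)  # work on a copy so the caller's list is untouched
    current_pen = residue_penalties(current, targets)
    for _ in range(steps):
        # Identify the region (here a single residue) contributing the worst score.
        worst = max(range(len(current)), key=current_pen.__getitem__)
        # Generate a neighbouring conformation by altering only that region.
        candidate = list(current)
        candidate[worst] += rng.uniform(-step_size, step_size)
        candidate_pen = residue_penalties(candidate, targets)
        if sum(candidate_pen) < sum(current_pen):
            current, current_pen = candidate, candidate_pen
    return current
```

A real refinement would operate on three-dimensional backbone coordinates and would also enforce geometric restrictions; the sketch only shows the pattern of locating the worst-scoring region and generating a neighbour by changing that region.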
Subject(s)
Computational Biology , Proteins , Protein Conformation, beta-Strand , Proteins/chemistry , Computational Biology/methods , Algorithms , Protein Conformation

ABSTRACT
Protein contact maps capture coevolutionary interactions between pairs of amino acid residues that lie spatially within a certain proximity threshold. Predicted contact maps are used in many protein-related problems, including drug design, protein design, protein function prediction, and protein structure prediction. Contact map prediction has made significant progress lately, but challenges remain in predicting contacts between residues that are separated in the amino acid sequence by large numbers of other residues. In this paper, with experimental results on 5 standard benchmark datasets that include membrane proteins, we show that contact map prediction can be significantly enhanced by using ensembles of various state-of-the-art short-distance predictors and then converting the predicted distances into contact probabilities. Our program, along with its data, is available from https://gitlab.com/mahnewton/ecp.
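As a rough illustration of converting ensembled distance predictions into a contact probability, the sketch below makes several assumptions that the abstract does not state: each predictor outputs a probability distribution over distance bins for a residue pair, the ensemble is a plain average, and a contact means a distance of at most 8 Å (a commonly used threshold). The function name `ensemble_contact_probability` is hypothetical and is not taken from the ECP repository.

```python
from typing import Sequence


def ensemble_contact_probability(
    distograms: Sequence[Sequence[float]],
    bin_upper_edges: Sequence[float],
    threshold: float = 8.0,  # assumed contact cutoff in angstroms
) -> float:
    """Average several predictors' distance-bin distributions for one residue
    pair and return the probability mass in bins ending at or below threshold."""
    if not distograms:
        raise ValueError("need at least one predictor output")
    n_bins = len(bin_upper_edges)
    for dist in distograms:
        if len(dist) != n_bins:
            raise ValueError("every distogram must have one value per bin")
    # Simple ensemble: unweighted mean of the per-bin probabilities.
    averaged = [
        sum(dist[b] for dist in distograms) / len(distograms)
        for b in range(n_bins)
    ]
    # Contact probability: total mass in bins whose upper edge is within the cutoff.
    return sum(p for p, edge in zip(averaged, bin_upper_edges) if edge <= threshold)
```

With bins ending at 4, 8, 12, and 16 Å, two predictors that assign 0.6 of their probability mass to the first two bins each would yield a contact probability of 0.6 for that pair.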
Subject(s)
Computational Biology , Proteins , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry

ABSTRACT
MOTIVATION: Protein backbone angle prediction has achieved significant accuracy improvements with the development of deep learning methods. Usually, the same deep learning model is used to make predictions for all residues, regardless of the secondary structure categories they belong to. In this paper, we propose to train a separate deep learning model for each category of secondary structure. Machine learning methods strive to achieve generality over the training examples and consequently lose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation to the specific class of training examples. This compensates for the loss of generalisation by exploiting specialisation knowledge in an informed way. RESULTS: The new method, named SAP4SS, obtains mean absolute error (MAE) values of 15.59, 18.87, 6.03, and 21.71 respectively for the four types of backbone angles [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text]. Consequently, SAP4SS significantly outperforms the existing state-of-the-art methods SAP, OPUS-TASS, and SPOT-1D: the differences in MAE for the four types of angles range from 1.5% to 4.1% relative to the best known results. AVAILABILITY: SAP4SS, along with its data, is available from https://gitlab.com/mahnewton/sap4ss.
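The two ideas in this abstract, routing each residue to a model chosen by its secondary structure class and scoring angle predictions by MAE, can be sketched as follows. This is a minimal illustration under assumptions, not the SAP4SS code: the models are arbitrary callables, the class labels are hypothetical single characters, angles are in degrees, and the error is taken as the smaller of the two arc differences, a convention often used for periodic angles.

```python
from typing import Callable, Dict, List, Sequence


def angular_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error in degrees, treating angles as periodic over 360."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = abs(p - a) % 360.0
        # Take the shorter way around the circle.
        total += min(diff, 360.0 - diff)
    return total / len(predicted)


def predict_by_class(
    features: Sequence[float],
    ss_labels: Sequence[str],
    models: Dict[str, Callable[[float], float]],
) -> List[float]:
    """Send each residue's features to the model trained for its class."""
    if len(features) != len(ss_labels):
        raise ValueError("one secondary structure label is needed per residue")
    return [models[label](x) for x, label in zip(features, ss_labels)]
```

For example, a prediction of 170 degrees against a true value of -170 degrees counts as an error of 20 degrees rather than 340, because the two angles are close on the circle.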
Subject(s)
Neural Networks, Computer , Proteins , Machine Learning , Protein Structure, Secondary

ABSTRACT
DNA-binding proteins play important roles in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to identify DNA-binding proteins. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of protein sequences. To the best of our knowledge, this is the first application of HMM-profile-based features to the DNA-binding protein prediction problem. We use Support Vector Machines (SVM) as the classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
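Monogram and bigram features of a sequence profile can be computed as in the sketch below. It follows one common definition from the profile-based feature literature and may differ from HMMBinder in details such as normalisation: the profile is assumed to be a matrix with one row per residue and one column per amino acid type, the monogram is the column mean, and the bigram sums products of values in consecutive rows and divides by the sequence length. The function names are illustrative.

```python
from typing import List, Sequence

Profile = Sequence[Sequence[float]]


def monogram_features(profile: Profile) -> List[float]:
    """Column means of the profile: one feature per amino acid type."""
    length = len(profile)
    width = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(width)]


def bigram_features(profile: Profile) -> List[float]:
    """For every ordered pair of columns (i, j), sum profile[k][i] * profile[k+1][j]
    over consecutive rows k and k+1, then divide by the sequence length."""
    length = len(profile)
    width = len(profile[0])
    features = [0.0] * (width * width)
    for k in range(length - 1):
        for i in range(width):
            for j in range(width):
                features[i * width + j] += profile[k][i] * profile[k + 1][j]
    return [value / length for value in features]
```

For a profile with 20 columns this gives 20 monogram and 400 bigram values per protein, independent of sequence length, so the concatenated vector can be passed to a fixed-input classifier such as an SVM.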