Search | VHL Regional Portal

Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment.

Jing, Xiaoyang; Dong, Qiwen; Hong, Daocheng; Lu, Ruqian.

IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 1918-1931, 2020.

Article in English | MEDLINE | ID: mdl-30998480

ABSTRACT

As the first step of machine-learning based protein structure and function prediction, the amino acid encoding play a fundamental role in the final success of those methods. Different from the protein sequence encoding, the amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining them with different algorithms. However, it has not attracted enough attention in the past decades, and there are no comprehensive reviews and assessments about encoding methods so far. In this article, we make a systematic classification and propose a comprehensive review and assessment for various amino acid encoding methods. Those methods are grouped into five categories according to their information sources and information extraction methodologies, including binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks by using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieved the best performance, and the structure-based and machine-learning encoding methods also show some potential for further application, the neural network based distributed representation of amino acids in particular may bring new light to this area. We hope that the review and assessment are useful for future studies in amino acid encoding.

Subject(s)

Amino Acid Sequence/genetics , Amino Acids/chemistry , Computational Biology/methods , Proteins , Sequence Analysis, Protein/methods , Algorithms , Protein Folding , Protein Structure, Secondary/genetics , Proteins/chemistry , Proteins/genetics , Proteins/physiology

RRCRank: a fusion method using rank strategy for residue-residue contact prediction.

Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian.

BMC Bioinformatics ; 18(1): 390, 2017 Sep 02.

Article in English | MEDLINE | ID: mdl-28865433

ABSTRACT

BACKGROUND: In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. RESULTS: First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. CONCLUSIONS: The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.

Subject(s)

Algorithms , Proteins/chemistry , Machine Learning

Sorting protein decoys by machine-learning-to-rank.

Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen.

Sci Rep ; 6: 31571, 2016 08 17.

Article in English | MEDLINE | ID: mdl-27530967

ABSTRACT

Much progress has been made in Protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy spectrum, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods could be roughly divided into three categories: the single-model methods, clustering-based methods and quasi single-model methods. In this study, we develop a single-model method MQAPRank based on the learning-to-rank algorithm firstly, and then implement a quasi single-model method Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 dataset. The five-fold cross-validation on the 3DRobot dataset shows the proposed single model method outperforms other methods whose outputs are taken as features of the proposed method, and the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in corresponding categories. In particular, the Quasi-MQAPRank method achieves a considerable performance on the CASP11 Best150 dataset.

Subject(s)

Machine Learning , Protein Transport , Proteins/chemistry , Algorithms , Models, Molecular

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL