Pesquisa | Portal Regional da BVS

LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model.

Pakhrin, Subash C; Pokharel, Suresh; Pratyush, Pawel; Chaudhari, Meenal; Ismail, Hamid D; Kc, Dukka B.

J Proteome Res ; 22(8): 2548-2557, 2023 08 04.

Artigo em Inglês | MEDLINE | ID: mdl-37459437

RESUMO

Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although there exist several computational tools to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for the general phosphorylation site prediction that integrates embeddings from the local window sequence and the contextualized embedding obtained using global (overall) protein sequence from a pretrained protein language model to improve the prediction performance. Thus, the LMPhosSite consists of two base-models: one for capturing effective local representation and the other for capturing global per-residue contextualized embedding from a pretrained protein language model. The output of these base-models is integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthew's correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of the general phosphorylation sites in proteins.

Assuntos

Aprendizado Profundo , Fosforilação , Proteínas/metabolismo , Processamento de Proteína Pós-Traducional , Sequência de Aminoácidos

LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model.

Pakhrin, Subash C; Pokharel, Suresh; Aoki-Kinoshita, Kiyoko F; Beck, Moriah R; Dam, Tarun K; Caragea, Doina; Kc, Dukka B.

Glycobiology ; 33(5): 411-422, 2023 06 03.

Artigo em Inglês | MEDLINE | ID: mdl-37067908

RESUMO

Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by the existing methods, especially in regard to the creation of negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. LMNglyPred produces sensitivity, specificity, Matthews Correlation Coefficient, precision, and accuracy of 76.50, 75.36, 0.49, 60.99, and 75.74 percent, respectively, on a benchmark-independent test set. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.

Assuntos

Aminoácidos , Glicoproteínas , Humanos , Glicosilação , Glicoproteínas/metabolismo , Aminoácidos/química , Processamento de Proteína Pós-Traducional , Sequência de Aminoácidos

Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction.

Pakhrin, Subash C; Pokharel, Suresh; Saigo, Hiroto; Kc, Dukka B.

Methods Mol Biol ; 2499: 285-322, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35696087

RESUMO

Posttranslational modification (PTM ) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of deep learning (DL)-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM , that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the Deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.

Assuntos

Aprendizado Profundo , Proteômica , Espectrometria de Massas , Processamento de Proteína Pós-Traducional , Proteínas/química

DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction.

Pakhrin, Subash C; Aoki-Kinoshita, Kiyoko F; Caragea, Doina; Kc, Dukka B.

Molecules ; 26(23)2021 Dec 02.

Artigo em Inglês | MEDLINE | ID: mdl-34885895

RESUMO

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation utilize the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated, thus the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively on N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-Linked glycosylation sites confined to N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.

Assuntos

Proteoma/química , Aprendizado Profundo , Glicosilação , Humanos , Modelos Biológicos , Redes Neurais de Computação , Polissacarídeos/análise , Processamento de Proteína Pós-Traducional

Deep Learning-Based Advances in Protein Structure Prediction.

Pakhrin, Subash C; Shrestha, Bikash; Adhikari, Badri; Kc, Dukka B.

Int J Mol Sci ; 22(11)2021 May 24.

Artigo em Inglês | MEDLINE | ID: mdl-34074028

RESUMO

Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progresses in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of protein structure prediction pipeline viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction approaches. Additionally, as there have been some recent DL-based advances in protein structure determination using Cryo-Electron (Cryo-EM) microscopy based, we also highlight some of the important progress in the field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.

Assuntos

Biologia Computacional/métodos , Microscopia Crioeletrônica/métodos , Aprendizado Profundo , Proteínas/química , Análise de Sequência de Proteína/métodos , Algoritmos , Sequência de Aminoácidos , Bases de Dados de Proteínas , Modelos Moleculares , Redes Neurais de Computação , Conformação Proteica , Software

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA