Search | BVS Regional Portal

DeepLoc 2.1: multi-label membrane protein type prediction using protein language models.

Ødum, Marius Thrane; Teufel, Felix; Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik.

Nucleic Acids Res ; 52(W1): W215-W220, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38587188

ABSTRACT

DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.

Subjects

Membrane Proteins; Software; Membrane Proteins/chemistry; Membrane Proteins/metabolism; Internet; Protein Sorting Signals; Sequence Analysis, Protein
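The DeepLoc 2.1 abstract above describes multi-label prediction over four membrane protein types. As a minimal sketch of what multi-label decoding can look like, the Python below applies an independent sigmoid to each class score and keeps every class whose probability reaches a threshold, so a protein may receive more than one type. The function names, the 0.5 default thresholds, and the example logits are assumptions for illustration; the abstract does not state how the server turns scores into labels.

```python
import math

# The four membrane protein types named in the abstract.
MEMBRANE_TYPES = ["Transmembrane", "Peripheral", "Lipid-anchored", "Soluble"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, thresholds=None):
    """Return every type whose probability reaches its threshold.

    Each class is scored independently, so zero, one, or several
    labels can be returned. The thresholds are hypothetical defaults.
    """
    if thresholds is None:
        thresholds = [0.5] * len(MEMBRANE_TYPES)
    if len(logits) != len(MEMBRANE_TYPES) or len(thresholds) != len(MEMBRANE_TYPES):
        raise ValueError("expected one logit and one threshold per membrane type")
    probabilities = [sigmoid(z) for z in logits]
    return [
        name
        for name, p, t in zip(MEMBRANE_TYPES, probabilities, thresholds)
        if p >= t
    ]


# Made-up logits for one protein, purely for illustration.
example = predict_labels([2.0, -1.0, 0.3, -3.0])
```

Unlike a softmax over the four classes, independent sigmoids do not force the probabilities to sum to one, which is what allows a sequence to be reported as, for example, both Transmembrane and Lipid-anchored.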

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models.

Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole.

Nucleic Acids Res ; 50(W1): W228-W234, 2022 Jul 05.

Article in English | MEDLINE | ID: mdl-35489069

ABSTRACT

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Subjects

Protein Sorting Signals; Proteins; Humans; Proteins/metabolism; Eukaryota/metabolism; Protein Transport; Language; Databases, Protein; Computational Biology; Subcellular Fractions/metabolism

NetSolP: predicting protein solubility in Escherichia coli using language models.

Thumuluri, Vineet; Martiny, Hannah-Marie; Almagro Armenteros, Jose J; Salomon, Jesper; Nielsen, Henrik; Johansen, Alexander Rosenberg.

Bioinformatics ; 38(4): 941-946, 2022 Jan 27.

Article in English | MEDLINE | ID: mdl-35088833

ABSTRACT

MOTIVATION: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased. RESULTS: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences. AVAILABILITY AND IMPLEMENTATION: The predictor and data are available at https://services.healthtech.dtu.dk/service.php?NetSolP and the open-sourced code is available at https://github.com/tvinet/NetSolP-1.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Escherichia coli; Language; Proteins; Software; Solubility
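The abstracts above refer to strict sequence-identity (homology) partitioning, where similar sequences are kept in the same data split so that evaluation is not inflated by near-duplicates. The Python sketch below illustrates the general idea only: it groups sequences by single-linkage clustering and then assigns whole clusters to partitions. The similarity measure (`difflib.SequenceMatcher.ratio`) is a crude stand-in for alignment-based sequence identity, and the function names, the threshold, and the toy sequences are all assumptions, not the authors' procedure.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(seqs, threshold):
    """Single-linkage clustering: link any pair at or above the threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def partition(seqs, n_parts, threshold):
    """Assign whole clusters to partitions, largest cluster first.

    Because a cluster is never split, no two sequences linked by the
    similarity threshold end up in different partitions.
    """
    clusters = sorted(cluster_sequences(seqs, threshold), key=len, reverse=True)
    parts = [[] for _ in range(n_parts)]
    for cluster in clusters:
        min(parts, key=len).extend(cluster)  # fill the smallest partition
    return parts


# Toy sequences, made up for illustration: two pairs of near-duplicates.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSGGGGS", "GGGGSGGGGT"]
splits = partition(toy, n_parts=2, threshold=0.8)
```

The pairwise comparison is quadratic in the number of sequences, so this only suits small examples; it is meant to show why clusters, not individual sequences, are the unit that gets assigned to a split.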
