NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning.

Liu, Taigang; Song, Chen; Wang, Chunhua

Affiliation

Liu T; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
Song C; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
Wang C; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

Math Biosci Eng ; 21(1): 1472-1488, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38303473

ABSTRACT

Non-classical secreted proteins (NCSPs) are proteins that are located in the extracellular environment despite lacking signal peptides and motifs. They usually play diverse roles in intercellular communication, so accurate prediction of NCSPs is a critical step toward understanding their secretion mechanisms in depth. Because experimental identification of NCSPs is often costly and time-consuming, computational methods are desirable. In this study, we propose an ensemble learning framework, termed NCSP-PLM, that identifies NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) and using them as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks, namely a multi-layer perceptron (MLP), an attention mechanism and a bidirectional long short-term memory network (BiLSTM), and selected the best network model for each PLM embedding. Then, four models were excluded because of their below-average accuracies, and the remaining five models were integrated to predict NCSPs by weighted voting. Finally, 5-fold cross-validation and an independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. On the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. In particular, the overall accuracy of our model reached 94.12%, which was 7-16% higher than that of existing state-of-the-art predictors. These results indicate that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
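The selection and voting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the voting weights are set or how "below-average" is computed, so the sketch assumes the threshold is the mean validation accuracy and that each kept model is weighted by its own accuracy. The model names (`plm_1` to `plm_9`), the accuracy values, and all function names are hypothetical. The last function computes sensitivity, specificity and accuracy from binary labels; the counts in the demo (34 positives, 34 negatives) are chosen only because they are one combination that reproduces the reported percentages, and the abstract itself does not state the test set size.

```python
from statistics import mean


def select_models(val_acc):
    """Keep models whose validation accuracy is at least the mean (assumed rule)."""
    threshold = mean(val_acc.values())
    return {name: acc for name, acc in val_acc.items() if acc >= threshold}


def weighted_vote(probs, weights, cutoff=0.5):
    """Average per-model positive-class probabilities using the given weights.

    probs   maps model name -> list of probabilities, one per sample
    weights maps model name -> non-negative weight (here: validation accuracy)
    """
    total = sum(weights[name] for name in probs)
    n_samples = len(next(iter(probs.values())))
    scores = [
        sum(weights[name] * probs[name][i] for name in probs) / total
        for i in range(n_samples)
    ]
    return [1 if score >= cutoff else 0 for score in scores]


def sens_spec_acc(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


if __name__ == "__main__":
    # Hypothetical validation accuracies for nine PLM-based models.
    val_acc = {
        "plm_1": 0.90, "plm_2": 0.88, "plm_3": 0.87, "plm_4": 0.86, "plm_5": 0.85,
        "plm_6": 0.76, "plm_7": 0.74, "plm_8": 0.72, "plm_9": 0.70,
    }
    kept = select_models(val_acc)
    print(sorted(kept))  # the five models at or above the mean accuracy

    # Illustrative counts only: 31 of 34 positives and 33 of 34 negatives correct.
    y_true = [1] * 34 + [0] * 34
    y_pred = [1] * 31 + [0] * 3 + [0] * 33 + [1] * 1
    sens, spec, acc = sens_spec_acc(y_true, y_pred)
    print(round(sens * 100, 2), round(spec * 100, 2), round(acc * 100, 2))
    # prints 91.18 97.06 94.12
```

Weighting by accuracy is only one plausible choice; uniform weights or weights tuned on a validation fold would fit the same description of "weighted voting".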

Subjects

Deep Learning; Neural Networks, Computer; Proteins; Language; Sensitivity and Specificity

Keywords

deep learning; ensemble learning; imbalanced classification; non-classical secreted protein; protein language model

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Deep Learning | Type of study: Diagnostic_studies / Prognostic_studies / Risk_factors_studies | Language: En | Journal: Math Biosci Eng / Mathematical biosciences and engineering (Online) | Publication year: 2024 | Document type: Article | Affiliation country: China | Publication country: United States
