NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning.
Math Biosci Eng
; 21(1): 1472-1488, 2024 Jan.
Article
em En
| MEDLINE
| ID: mdl-38303473
ABSTRACT
Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step to understanding in depth their associated secretion mechanisms. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks Multi-layer perceptron (MLP), attention mechanism and bidirectional long short-term memory network (BiLSTM) and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to perform the prediction of NCSPs based on the weighted voting. Finally, the 5-fold cross validation and the independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. Based on the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. Particularly, the overall accuracy of our model achieved 94.12%, which was 7~16% higher than that of the existing state-of-the-art predictors. It indicated that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Aprendizado Profundo
Tipo de estudo:
Diagnostic_studies
/
Prognostic_studies
/
Risk_factors_studies
Idioma:
En
Revista:
Math Biosci Eng
/
Mathematical biosciences and engineering (Online)
Ano de publicação:
2024
Tipo de documento:
Article
País de afiliação:
China
País de publicação:
Estados Unidos