Search | BVS Regional Portal

1.

MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins.

Zhang, Jiashuo; Wang, Ruheng; Wei, Leyi.

J Chem Inf Model ; 64(3): 1050-1065, 2024 Feb 12.

Article in English | MEDLINE | ID: mdl-38301174

ABSTRACT

Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
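
The residue-level idea described above (pull representations with the same label together, push different labels apart) can be illustrated with a generic margin-based pairwise contrastive loss. This is only a minimal sketch under stated assumptions: the abstract does not give MucLiPred's actual loss, so the function name `pairwise_contrastive_loss`, the `margin` parameter, and the toy embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(embeddings, labels, margin=1.0):
    """Average margin-based contrastive loss over all pairs (hypothetical sketch).

    Same-label pairs are penalised by their squared distance; different-label
    pairs are penalised only when they are closer than `margin`.
    """
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            count += 1
    return total / count if count else 0.0


# Toy 2-D "residue embeddings": 1 = binding, 0 = non-binding (made-up data).
labels = [1, 1, 0, 0]
separated = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]]  # classes form clusters
mixed = [[0.0, 0.0], [2.0, 2.0], [0.1, 0.0], [2.1, 2.0]]      # classes interleaved

print(pairwise_contrastive_loss(separated, labels))
print(pairwise_contrastive_loss(mixed, labels))
```

The loss is lower when residues of the same class cluster together, which is the property a contrastive objective rewards during training.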

Subjects

Nucleic Acids; Proteins; Proteins/chemistry

2.

NanoCon: contrastive learning-based deep hybrid network for nanopore methylation detection.

Yin, Chenglin; Wang, Ruheng; Qiao, Jianbo; Shi, Hua; Duan, Hongliang; Jiang, Xinbo; Teng, Saisai; Wei, Leyi.

Bioinformatics ; 40(2), 2024 02 01.

Article in English | MEDLINE | ID: mdl-38305428

ABSTRACT

MOTIVATION: 5-Methylcytosine (5mC), a fundamental element of DNA methylation in eukaryotes, plays a vital role in gene expression regulation, embryonic development, and other biological processes. Although several computational methods have been proposed for detecting the base modifications in DNA like 5mC sites from Nanopore sequencing data, they face challenges including sensitivity to noise, and ignoring the imbalanced distribution of methylation sites in real-world scenarios. RESULTS: Here, we develop NanoCon, a deep hybrid network coupled with contrastive learning strategy to detect 5mC methylation sites from Nanopore reads. In particular, we adopted a contrastive learning module to alleviate the issues caused by imbalanced data distribution in nanopore sequencing, offering a more accurate and robust detection of 5mC sites. Evaluation results demonstrate that NanoCon outperforms existing methods, highlighting its potential as a valuable tool in genomic sequencing and methylation prediction. In addition, we also verified the effectiveness of our representation learning ability on two datasets by visualizing the dimension reduction of the features of methylation and nonmethylation sites from our NanoCon. Furthermore, cross-species and cross-5mC methylation motifs experiments indicated the robustness and the ability to perform transfer learning of our model. We hope this work can contribute to the community by providing a powerful and reliable solution for 5mC site detection in genomic studies. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/Challis-yin/NanoCon.

Subjects

Nanopores; DNA Methylation; Genomics; Genome; DNA

3.

AttenSyn: An Attention-Based Deep Graph Neural Network for Anticancer Synergistic Drug Combination Prediction.

Wang, Tianshuo; Wang, Ruheng; Wei, Leyi.

J Chem Inf Model ; 64(7): 2854-2862, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37565997

ABSTRACT

Identifying synergistic drug combinations is fundamentally important to treat a variety of complex diseases while avoiding severe adverse drug-drug interactions. Although several computational methods have been proposed, they highly rely on handcrafted feature engineering and cannot learn better interactive information between drug pairs, easily resulting in relatively low performance. Recently, deep-learning methods, especially graph neural networks, have been widely developed in this area and demonstrated their ability to address complex biological problems. In this study, we proposed AttenSyn, an attention-based deep graph neural network for accurately predicting synergistic drug combinations. In particular, we adopted a graph neural network module to extract high-latent features based on the molecular graphs only and exploited the attention-based pooling module to learn interactive information between drug pairs to strengthen the representations of drug pairs. Comparative results on the benchmark datasets demonstrated that our AttenSyn performs better than the state-of-the-art methods in the prediction of anticancer synergistic drug combinations. Additionally, to provide good interpretability of our model, we explored and visualized some crucial substructures in drugs through attention mechanisms. Furthermore, we also verified the effectiveness of our proposed AttenSyn on two cell lines by visualizing the features of drug combinations learnt from our model, exhibiting satisfactory generalization ability.
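
Attention-based pooling, as mentioned above, can be sketched as a softmax-weighted sum of atom (node) features, where the weights depend on a query vector. The abstract does not specify AttenSyn's pooling formula, so the dot-product scoring, the function name `attention_pool`, and the idea of using a summary of the partner drug as the query are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(node_features, query):
    """Pool node features into one vector using dot-product attention.

    node_features: list of per-atom feature vectors for one drug (hypothetical).
    query: vector summarising the partner drug (hypothetical).
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(f * q for f, q in zip(node, query)) for node in node_features]
    weights = softmax(scores)
    dim = len(node_features[0])
    pooled = [
        sum(w * node[k] for w, node in zip(weights, node_features))
        for k in range(dim)
    ]
    return pooled, weights


# Toy example: three atoms with 2-D features and a query favouring dimension 0.
nodes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
pooled, weights = attention_pool(nodes, query)
print(pooled, weights)
```

The attention weights also give a simple handle on interpretability: atoms with larger weights contribute more to the pooled drug-pair representation.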

Subjects

Benchmarking; Learning; Cell Line; Neural Networks, Computer

4.

CACPP: A Contrastive Learning-Based Siamese Network to Identify Anticancer Peptides Based on Sequence Only.

Yang, Xuetong; Jin, Junru; Wang, Ruheng; Li, Zhongshen; Wang, Yu; Wei, Leyi.

J Chem Inf Model ; 64(7): 2807-2816, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37252890

ABSTRACT

Anticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on the convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we introduce the TextCNN model to extract the high-latent features based on the peptide sequences only and exploit the contrastive learning module to learn more distinguishable feature representations to make better predictions. Comparative results on the benchmark data sets indicate that CACPP outperforms all the state-of-the-art methods in the prediction of anticancer peptides. Moreover, to intuitively show that our model has good classification ability, we visualize the dimension reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we also discuss the influence of data set construction on model prediction and explore our model performance on the data sets with verified negative samples.

Subjects

Benchmarking; Drug-Related Side Effects and Adverse Reactions; Humans; Machine Learning; Neural Networks, Computer; Peptides/pharmacology

5.

Multi-CGAN: Deep Generative Model-Based Multiproperty Antimicrobial Peptide Design.

Yu, Haoqing; Wang, Ruheng; Qiao, Jianbo; Wei, Leyi.

J Chem Inf Model ; 64(1): 316-326, 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38135439

ABSTRACT

Antimicrobial peptides are peptides that are effective against bacteria and viruses, and the discovery of new antimicrobial peptides is of great importance to human life and health. Although the design of antimicrobial peptides using machine learning methods has achieved good results in recent years, it remains a challenge to learn and design novel antimicrobial peptides with multiple properties of interest from peptide data with certain property labels. To this end, we propose Multi-CGAN, a deep generative model-based architecture that can learn from single-attribute peptide data and generate antimicrobial peptide sequences with multiple attributes that we need, which may have a potentially wide range of uses in drug discovery. In particular, we verified that our Multi-CGAN generated peptides with the desired properties have good performance in terms of generation rate. Moreover, a comprehensive statistical analysis demonstrated that our generated peptides are diverse and have a low probability of being homologous to the training data. Interestingly, we found that the performance of many popular deep learning methods on the antimicrobial peptide prediction task can be improved by using Multi-CGAN to expand the data on the training set of the original task, indicating the high quality of our generated peptides and the robust ability of our method. In addition, we also investigated whether it is possible to directionally generate peptide sequences with specified properties by controlling the input noise sampling for our model.

Subjects

Antimicrobial Peptides; Peptides; Humans; Peptides/pharmacology; Peptides/chemistry; Machine Learning; Drug Discovery

6.

ConPep: Prediction of peptide contact maps with pre-trained biological language model and multi-view feature extracting strategy.

Wei, Qingxin; Wang, Ruheng; Jiang, Yi; Wei, Leyi; Sun, Yu; Geng, Jie; Su, Ran.

Comput Biol Med ; 167: 107631, 2023 12.

Article in English | MEDLINE | ID: mdl-37948966

ABSTRACT

The accurate prediction of peptide contact maps remains a challenging task due to the difficulty in obtaining the interactive information between residues on short sequences. To address this challenge, we propose ConPep, a deep learning framework designed for predicting the contact map of peptides based on sequences only. To sufficiently incorporate the sequential semantic information between residues in peptide sequences, we use a pre-trained biological language model and transfer prior knowledge from large scale databases. Additionally, to extract and integrate sequential local information and residue-based global correlations, our model incorporates Bidirectional Gated Recurrent Unit and attention mechanisms. They can obtain multi-view features and thus enhance the accuracy and robustness of our prediction. Comparative results on independent tests demonstrate that our proposed method significantly outperforms state-of-the-art methods even with short peptides. Notably, our method exhibits superior performance at the sequence level, suggesting the robust ability of our model compared with the multiple sequence alignment (MSA) analysis-based methods. We expect it can be meaningful research for facilitating the wide use of our method.
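
For readers unfamiliar with the prediction target, a contact map is a symmetric binary matrix marking which residue pairs lie close together in 3-D space. The sketch below builds one from known coordinates; it shows the target representation only, not the ConPep model. The 8 Å cutoff is a commonly used convention and is an assumption here (the abstract does not state a threshold), and the coordinates are made up.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map from per-residue 3-D coordinates.

    coords: list of (x, y, z) tuples, one per residue (e.g. one atom per residue).
    threshold: distance cutoff in angstroms (assumed value, not from the source).
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy peptide: three residues along a line plus one far away (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

A sequence-based predictor outputs an estimate of this matrix without access to the coordinates.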

Subjects

Algorithms; Proteins; Proteins/chemistry; Computational Biology/methods; Peptides; Language; Databases, Protein

7.

MVIL6: Accurate identification of IL-6-induced peptides using multi-view feature learning.

Wang, Ruheng; Feng, Yangfan; Sun, Meili; Jiang, Yi; Li, Zhongshen; Cui, Lizhen; Wei, Leyi.

Int J Biol Macromol ; 246: 125412, 2023 Aug 15.

Article in English | MEDLINE | ID: mdl-37327922

ABSTRACT

Interleukin-6 (IL-6) is a potential therapeutic target for many diseases, and it is of great significance in accurately predicting IL-6-induced peptides for IL-6 research. However, the cost of traditional wet experiments to detect IL-6-induced peptides is huge, and the discovery and design of peptides by computer before the experimental stage have become a promising technology. In this study, we developed a deep learning model called MVIL6 for predicting IL-6-inducing peptides. Comparative results demonstrated the outstanding performance and robustness of MVIL6. Specifically, we employ a pre-trained protein language model MG-BERT and the Transformer model to process two different sequence-based descriptors and integrate them with a fusion module to improve the prediction performance. The ablation experiment demonstrated the effectiveness of our fusion strategy for the two models. In addition, to provide good interpretability of our model, we explored and visualized the amino acids considered important for IL-6-induced peptide prediction by our model. Finally, a case study presented using MVIL6 to predict IL-6-induced peptides in the SARS-CoV-2 spike protein shows that MVIL6 achieves higher performance than existing methods and can be useful for identifying potential IL-6-induced peptides in viral proteins.

Subjects

COVID-19; Interleukin-6; Humans; SARS-CoV-2; Peptides/pharmacology

8.

Explainable Deep Hypergraph Learning Modeling the Peptide Secondary Structure Prediction.

Jiang, Yi; Wang, Ruheng; Feng, Jiuxin; Jin, Junru; Liang, Sirui; Li, Zhongshen; Yu, Yingying; Ma, Anjun; Su, Ran; Zou, Quan; Ma, Qin; Wei, Leyi.

Adv Sci (Weinh) ; 10(11): e2206151, 2023 04.

Article in English | MEDLINE | ID: mdl-36794291

ABSTRACT

Accurately predicting peptide secondary structures remains a challenging task due to the lack of discriminative information in short peptides. In this study, PHAT is proposed, a deep hypergraph learning framework for the prediction of peptide secondary structures and the exploration of downstream tasks. The framework includes a novel interpretable deep hypergraph multi-head attention network that uses residue-based reasoning for structure prediction. The algorithm can incorporate sequential semantic information from large-scale biological corpus and structural semantic information from multi-scale structural segmentation, leading to better accuracy and interpretability even with extremely short peptides. The interpretable models are able to highlight the reasoning of structural feature representations and the classification of secondary substructures. The importance of secondary structures in peptide tertiary structure reconstruction and downstream functional analysis is further demonstrated, highlighting the versatility of our models. To facilitate the use of the model, an online server is established which is accessible via http://inner.wei-group.net/PHAT/. The work is expected to assist in the design of functional peptides and contribute to the advancement of structural biology research.

Subjects

Algorithms; Peptides; Protein Structure, Secondary; Peptides/chemistry

9.

DeepBIO: an automated and interpretable deep-learning platform for high-throughput biological sequence prediction, functional annotation and visualization analysis.

Wang, Ruheng; Jiang, Yi; Jin, Junru; Yin, Chenglin; Yu, Haoqing; Wang, Fengsheng; Feng, Jiuxin; Su, Ran; Nakai, Kenta; Zou, Quan; Wei, Leyi.

Nucleic Acids Res ; 51(7): 3017-3029, 2023 04 24.

Article in English | MEDLINE | ID: mdl-36796796

ABSTRACT

Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides a comprehensive result visualization analysis for predictive models covering several aspects, such as model interpretability, feature analysis and functional sequential region discovery. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Empowered by high-performance computers, DeepBIO allows ultra-fast prediction with up to million-scale sequence data in a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides an accurate, robust and interpretable prediction, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.

The development of next-generation sequencing techniques has led to an exponential increase in the amount of biological sequence data accessible. It naturally poses a fundamental challenge: how to build the relationships from such large-scale sequences to their functions. In this work, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. It enables researchers to develop new deep-learning architectures to answer any biological question in a fully automated pipeline. We expect DeepBIO to ensure the reproducibility of deep-learning-based biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone.

Subjects

Deep Learning; Reproducibility of Results; Algorithms; High-Throughput Nucleotide Sequencing

10.

iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations.

Jin, Junru; Yu, Yingying; Wang, Ruheng; Zeng, Xin; Pang, Chao; Jiang, Yi; Li, Zhongshen; Dai, Yutong; Su, Ran; Zou, Quan; Nakai, Kenta; Wei, Leyi.

Genome Biol ; 23(1): 219, 2022 10 17.

Article in English | MEDLINE | ID: mdl-36253864

ABSTRACT

In this study, we propose iDNA-ABF, a multi-scale deep biological language learning model that enables the interpretable prediction of DNA methylations based on genomic sequences only. Benchmarking comparisons show that our iDNA-ABF outperforms state-of-the-art methods for different methylation predictions. Importantly, we show the power of deep language learning in capturing both sequential and functional semantics information from background genomes. Moreover, by integrating the interpretable analysis mechanism, we well explain what the model learns, helping us build the mapping from the discovery of important sequential determinants to the in-depth analysis of their biological functions.

Subjects

DNA Methylation; Language; Genomics; Models, Biological

11.

Predicting protein-peptide binding residues via interpretable deep learning.

Wang, Ruheng; Jin, Junru; Zou, Quan; Nakai, Kenta; Wei, Leyi.

Bioinformatics ; 38(13): 3351-3360, 2022 06 27.

Article in English | MEDLINE | ID: mdl-35604077

ABSTRACT

SUMMARY: Identifying the protein-peptide binding residues is fundamentally important to understand the mechanisms of protein functions and explore drug discovery. Although several computational methods have been developed, most of them highly rely on third-party tools or complex data preprocessing for feature design, easily resulting in low computational efficacy and suffering from low predictive performance. To address the limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representation from Transformers) -based contrastive learning framework to predict the protein-peptide binding residues based on protein sequences only. PepBCL is an end-to-end predictive model that is independent of feature engineering. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn high-latent representations of protein sequences relevant for protein structures and functions. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues underlying the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods under benchmarking comparison, and achieves more robust performance. Moreover, we found that we further improve the performance via the integration of traditional features and our learnt features. Interestingly, the interpretable analysis of our model highlights the flexibility and adaptability of deep learning-based protein language model to capture both conserved and non-conserved sequential characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we establish an online predictive platform as the implementation of the proposed PepBCL, which is now available at http://server.wei-group.net/PepBCL/. AVAILABILITY AND IMPLEMENTATION: https://github.com/Ruheng-W/PepBCL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning; Proteins/chemistry; Peptides; Protein Binding; Amino Acid Sequence
