Results 1 - 20 of 50
1.
Methods ; 231: 61-69, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39293728

ABSTRACT

Arabidopsis thaliana synthesizes various medicinal compounds, and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, challenges remain in precisely annotating cell identity due to the lack of marker genes for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations, comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing using an external dataset revealed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
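
The accuracy and recall figures reported above can differ when cell classes are imbalanced. A minimal sketch of the two metrics, assuming macro-averaged recall (the abstract does not say which averaging was used) and made-up stage labels:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical labels for six cells in two stages (not data from the paper).
y_true = ["stage_A", "stage_A", "stage_A", "stage_A", "stage_B", "stage_B"]
y_pred = ["stage_A", "stage_A", "stage_A", "stage_B", "stage_B", "stage_B"]

acc = accuracy(y_true, y_pred)      # 5 correct out of 6
rec = macro_recall(y_true, y_pred)  # (3/4 + 2/2) / 2
```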

2.
PLoS Comput Biol ; 20(8): e1012400, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39213450

ABSTRACT

The identification of cancer driver genes (CDGs) poses challenges due to the intricate interdependencies among genes and the influence of measurement errors and noise. We propose a novel energy-constrained diffusion (ECD)-based model for identifying CDGs, termed ECD-CDGI. This model is the first to design an ECD-Attention encoder by combining the ECD technique with an attention mechanism. The ECD-Attention encoder excels at generating robust gene representations that reveal the complex interdependencies among genes while reducing the impact of data noise. We concatenate topological embeddings extracted from gene-gene networks through graph transformers with these gene representations. Extensive experiments across three testing scenarios show that the ECD-CDGI model not only identifies known CDGs proficiently but also efficiently uncovers potential unknown CDGs. Furthermore, compared to the GNN-based approach, the ECD-CDGI model is less constrained by existing gene-gene networks, thereby enhancing its capability to identify CDGs. Additionally, ECD-CDGI is open-source and freely available. We have also launched the model as a complimentary online tool specifically crafted to expedite research efforts focused on CDG identification.


Subjects
Computational Biology , Gene Regulatory Networks , Neoplasms , Humans , Computational Biology/methods , Gene Regulatory Networks/genetics , Neoplasms/genetics , Models, Genetic , Algorithms , Oncogenes/genetics , Genes, Neoplasm/genetics , Databases, Genetic
3.
Chem Sci ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39170720

ABSTRACT

The identification of targets for candidate molecules is a pivotal stride in the drug development journey, encompassing lead discovery, drug repurposing, and the scrutiny of potential off-target or side effects. Consequently, enhancing the precision of target prediction has significant implications. Moreover, current target prediction methods primarily rely on the principle of ligand-based chemical similarity, lacking the capture of novel compound-target relationships based on ligand high-level characterization similarity. Therefore, in this context, we introduce a pioneering algorithm known as the Fused Multiple Biological Signatures (FMBS) strategy. This approach leverages a Bayesian framework to amalgamate 25 predictable biological space characterizations of molecules to predict novel targets through scaffold hopping, thereby improving target prediction accuracy and providing a versatile tool for a wide range of small-molecule target prediction. When juxtaposed with alternative target prediction methods, FMBS showcases notable efficacy, outperforming traditional descriptors. Through an analysis of scaffold hopping cases, we elucidate how FMBS attains heightened accuracy by assimilating comprehensive and complementary high-dimensional signatures, thereby underscoring its potential in unearthing novel compound-target relationships. The findings underscore that our approach adeptly pinpoints promising candidate targets, thereby expediting drug mechanism exploration through the integration of multiple high-level characterizations.

4.
Bioinformatics ; 40(7)2024 07 01.
Article in English | MEDLINE | ID: mdl-38967119

ABSTRACT

MOTIVATION: Accurate prediction of acute dermal toxicity (ADT) is essential for the safe and effective development of contact drugs. Currently, graph neural networks, a form of deep learning technology, accurately model the structure of compound molecules, enhancing predictions of their ADT. However, many existing methods emphasize atom-level information transfer and overlook crucial data conveyed by molecular bonds and their interrelationships. Additionally, these methods often generate "equal" node representations across the entire graph, failing to accentuate "important" substructures like functional groups, pharmacophores, and toxicophores, thereby reducing interpretability. RESULTS: We introduce a novel model, GraphADT, utilizing structure remapping and multi-view graph pooling (MVPool) technologies to accurately predict compound ADT. Initially, our model applies structure remapping to better delineate bonds, transforming "bonds" into new nodes and "bond-atom-bond" interactions into new edges, thereby reconstructing the compound molecular graph. Subsequently, we use MVPool to amalgamate data from various perspectives, minimizing biases inherent to single-view analyses. Following this, the model generates a robust node ranking collaboratively, emphasizing critical nodes or substructures to enhance model interpretability. Lastly, we apply a graph comparison learning strategy to train both the original and structure remapped molecular graphs, deriving the final molecular representation. Experimental results on public datasets indicate that the GraphADT model outperforms existing state-of-the-art models. The GraphADT model has been demonstrated to effectively predict compound ADT, offering potential guidance for the development of contact drugs and related treatments. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/mxqmxqmxq/GraphADT.git.


Subjects
Skin , Skin/drug effects , Humans , Deep Learning , Neural Networks, Computer
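
The structure remapping step described in the GraphADT abstract above (bonds become nodes, bond-atom-bond interactions become edges) resembles building a line graph of the molecule. A minimal sketch under that assumption; the function name and the toy molecule are illustrative and are not taken from the GraphADT code:

```python
from itertools import combinations


def remap_bonds_to_nodes(bonds):
    """Turn each bond into a node and link two bonds that share an atom.

    bonds: list of (atom_i, atom_j) index pairs.
    Returns (nodes, edges), where nodes[k] is the k-th bond and each edge
    is a pair of bond indices.
    """
    nodes = [tuple(sorted(bond)) for bond in bonds]
    edges = []
    for a, b in combinations(range(len(nodes)), 2):
        # A "bond-atom-bond" interaction exists when two bonds share an atom.
        if set(nodes[a]) & set(nodes[b]):
            edges.append((a, b))
    return nodes, edges


# Toy heavy-atom skeleton: atom 1 is bonded to atoms 0, 2 and 3.
nodes, edges = remap_bonds_to_nodes([(0, 1), (1, 2), (1, 3)])
```

In this toy case all three bonds meet at atom 1, so every pair of bond-nodes is connected in the remapped graph.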
5.
Int J Biol Macromol ; 276(Pt 2): 133825, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39002900

ABSTRACT

Predicting compound-induced inhibition of cardiac ion channels is crucial and challenging, significantly impacting cardiac drug efficacy and safety assessments. Despite the development of various computational methods for compound-induced inhibition prediction in cardiac ion channels, their performance remains limited. Most methods struggle to fuse multi-source data, relying solely on specific dataset training, leading to poor accuracy and generalization. We introduce MultiCBlo, a model that fuses multimodal information through a progressive learning approach, designed to predict compound-induced inhibition of cardiac ion channels with high accuracy. MultiCBlo employs progressive multimodal information fusion technology to integrate the compound's SMILES sequence, graph structure, and fingerprint, enhancing its representation. This is the first application of progressive multimodal learning for predicting compound-induced inhibition of cardiac ion channels, to our knowledge. The objective of this study was to predict the compound-induced inhibition of three major cardiac ion channels: hERG, Cav1.2, and Nav1.5. The results indicate that MultiCBlo significantly outperforms current models in predicting compound-induced inhibition of cardiac ion channels. We hope that MultiCBlo will facilitate cardiac drug development and reduce compound toxicity risks. Code and data are accessible at: https://github.com/taowang11/MultiCBlo. The online prediction platform is freely accessible at: https://huggingface.co/spaces/wtttt/PCICB.


Subjects
Ion Channels , Humans , Ion Channels/metabolism , Ion Channels/antagonists & inhibitors , NAV1.5 Voltage-Gated Sodium Channel/metabolism , Calcium Channels, L-Type/metabolism , Calcium Channels, L-Type/chemistry , Machine Learning , ERG1 Potassium Channel/metabolism , ERG1 Potassium Channel/antagonists & inhibitors
6.
J Phys Chem Lett ; 15(30): 7681-7693, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39038219

ABSTRACT

Accurate prediction of Drug-Target Interactions (DTI) is crucial for drug development. Current state-of-the-art deep learning methods have significantly advanced the field; however, these methods exhibit limitations in predictive performance and the propensity for false negatives. Therefore, we propose EADTN, a simple and efficient ensemble model. We have designed an innovative feature adaptation technique to automatically extract local weights of drugs and targets, and we utilize clustering-enhanced parameter fine-tuning to overcome the issue of false negatives, thereby enhancing its reliability in drug discovery. Based on EADTN, we also propose a Shapley value-based method for identifying key drug substructures, effectively enhancing the model's interpretability. Additionally, we utilized EADTN to reveal potential interactions between NQO1 targets and the drugs SIRT-IN-1 and LY2183240, which were subsequently validated through wet-lab experiments. Experimental evidence demonstrates that EADTN consistently outperforms existing best-performing models across various data sets, promising significant benefits in fields such as drug repositioning.


Subjects
Deep Learning , NAD(P)H Dehydrogenase (Quinone)/metabolism , Sirtuin 1/metabolism , Drug Discovery , Humans
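
The EADTN abstract above mentions a Shapley value-based method for ranking drug substructures but gives no details. As a generic illustration only, exact Shapley values can be computed for a small set of "players" by averaging marginal contributions over all orderings. The substructure names and the scoring function below are hypothetical:

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    value: function mapping a frozenset of players to a score.
    Only practical for a handful of players (cost grows factorially).
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical scoring: each substructure adds a fixed amount, and the
# ring and amide together earn an extra interaction bonus of 1.0.
WEIGHTS = {"ring": 1.0, "amide": 2.0, "halogen": 0.5}


def toy_score(subset):
    bonus = 1.0 if {"ring", "amide"} <= subset else 0.0
    return sum(WEIGHTS[s] for s in subset) + bonus


phi = shapley_values(list(WEIGHTS), toy_score)
```

The interaction bonus is split evenly between the two substructures that create it, and the values sum to the score of the full set.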
7.
Comput Biol Med ; 176: 108543, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38744015

ABSTRACT

Proteins play a vital role in various biological processes and achieve their functions through protein-protein interactions (PPIs). Thus, accurate identification of PPI sites is essential. Traditional biological methods for identifying PPIs are costly, labor-intensive, and time-consuming. The development of computational prediction methods for PPI sites offers promising alternatives. Most known deep learning (DL) methods employ layer-wise multi-scale CNNs to extract features from protein sequences. However, these methods usually neglect the spatial positions and hierarchical information embedded within protein sequences, which are crucial for PPI site prediction. In this paper, we propose MR2CPPIS, a novel sequence-based DL model that utilizes the multi-scale Res2Net with coordinate attention mechanism to exploit multi-scale features and enhance PPI site prediction capability. We leverage the multi-scale Res2Net to expand the receptive field for each network layer, thus capturing multi-scale information of protein sequences at a granular level. To further explore the local contextual features of each target residue, we employ a coordinate attention block to characterize the precise spatial position information, enabling the network to effectively extract long-range dependencies. We evaluate our MR2CPPIS on three public benchmark datasets (Dset 72, Dset 186, and PDBset 164), achieving state-of-the-art performance. The source codes are available at https://github.com/YyinGong/MR2CPPIS.


Subjects
Deep Learning , Proteins/metabolism , Proteins/chemistry , Protein Interaction Mapping/methods , Computational Biology/methods , Humans , Databases, Protein
8.
Mol Ther Nucleic Acids ; 35(2): 102187, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38706631

ABSTRACT

Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
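
The abstract above does not say which PolyLoss variant FMSRT-LPI uses. As an illustration only, the commonly cited Poly-1 form adds a single term, epsilon * (1 - p_t), to cross-entropy, where p_t is the predicted probability of the true class. The function name and the default epsilon are assumptions:

```python
import math


def poly1_cross_entropy(prob_true_class, epsilon=1.0):
    """Poly-1 loss for one sample: cross-entropy plus epsilon * (1 - p_t)."""
    cross_entropy = -math.log(prob_true_class)
    return cross_entropy + epsilon * (1.0 - prob_true_class)


# With epsilon = 0 the loss reduces to plain cross-entropy.
plain = poly1_cross_entropy(0.8, epsilon=0.0)
adjusted = poly1_cross_entropy(0.8, epsilon=1.0)
```

A positive epsilon increases the penalty on predictions that are correct but not confident, which is one reason such a loss might suit link prediction with few positive pairs.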

9.
Bioinformatics ; 40(5)2024 05 02.
Article in English | MEDLINE | ID: mdl-38648052

ABSTRACT

MOTIVATION: Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. RESULTS: We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI.


Subjects
Algorithms , Molecular Docking Simulation , Proteins , Proteins/chemistry , Proteins/metabolism , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Computational Biology/methods , Deep Learning
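
The abstract above uses MinHash to estimate similarity between drug and protein subgraphs. A minimal sketch of the MinHash idea on plain sets, assuming each subgraph is summarized as a set of node identifiers (an assumption, since the abstract does not describe the encoding). The helper names and the choice of 256 hash functions are illustrative:

```python
import hashlib


def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two hypothetical neighborhoods that share 50 of 150 distinct nodes,
# so the true Jaccard similarity is 1/3.
set_a = {f"n{i}" for i in range(100)}
set_b = {f"n{i}" for i in range(50, 150)}
estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
```

The estimate gets tighter as the number of hash functions grows, while the signature stays a fixed size regardless of how large the sets are.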
10.
Comput Biol Med ; 174: 108484, 2024 May.
Article in English | MEDLINE | ID: mdl-38643595

ABSTRACT

Accurately identifying cancer driver genes (CDGs) is crucial for guiding cancer treatment and has recently received great attention from researchers. However, the high complexity and heterogeneity of cancer gene regulatory networks limit the prediction accuracy of existing deep learning models. To address this, we introduce a model called SCIS-CDG that utilizes Schur complement graph augmentation and independent subspace feature extraction techniques to effectively predict potential CDGs. Firstly, a random Schur complement strategy is adopted to generate two augmented views of the gene network within a graph contrastive learning framework. Rapid randomization of the random Schur complement strategy enhances the model's generalization and its ability to handle complex networks effectively. Upholding the Schur complement principle in expectations promotes the preservation of the original gene network's vital structure in the augmented views. Subsequently, we employ feature extraction technology using multiple independent subspaces, each trained with independent weights to reduce inter-subspace dependence and improve the model's expressiveness. Concurrently, we introduce a feature expansion component based on the structure of the gene network to address issues arising from the limited dimensionality of node features. Moreover, it can alleviate the challenges posed by the heterogeneity of cancer gene networks to some extent. Finally, we integrate a learnable attention weight mechanism into the graph neural network (GNN) encoder, utilizing feature expansion technology to optimize the significance of various feature levels in the prediction task. Following extensive experimental validation, the SCIS-CDG model has exhibited high efficiency in identifying known CDGs and uncovering potential unknown CDGs in external datasets. Compared to previous conventional GNN models in particular, its performance is significantly improved. The code and data are publicly available at: https://github.com/mxqmxqmxq/SCIS-CDG.


Subjects
Gene Regulatory Networks , Neoplasms , Humans , Neoplasms/genetics , Computational Biology/methods , Deep Learning , Algorithms
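
SCIS-CDG relies on a random Schur complement strategy whose details are not given in the abstract above. The sketch below shows only the underlying deterministic operation: eliminating one node from a graph Laplacian via the Schur complement, which yields a smaller weighted graph on the remaining nodes. The function name and the toy graph are illustrative:

```python
def eliminate_node(laplacian, v):
    """Schur complement of a Laplacian with respect to a single node v.

    For kept nodes i, j: S[i][j] = L[i][j] - L[i][v] * L[v][j] / L[v][v].
    """
    keep = [i for i in range(len(laplacian)) if i != v]
    pivot = laplacian[v][v]
    return [
        [laplacian[i][j] - laplacian[i][v] * laplacian[v][j] / pivot for j in keep]
        for i in keep
    ]


# Path graph 0 - 1 - 2 with unit edge weights.
path_laplacian = [
    [1.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 1.0],
]

# Removing the middle node leaves a direct 0 - 2 edge of weight 0.5.
reduced = eliminate_node(path_laplacian, 1)
```

The result is again a Laplacian (rows sum to zero), which is why this operation can serve as a structure-preserving way to produce smaller views of a network.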
11.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446739

ABSTRACT

Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.


Subjects
Amino Acids , Antimicrobial Peptides , Anti-Bacterial Agents , Diffusion , Kinetics
12.
Molecules ; 29(6)2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38542866

ABSTRACT

The development of effective inhibitors targeting the Kirsten rat sarcoma viral proto-oncogene (KRASG12D) mutation, a prevalent oncogenic driver in cancer, represents a significant unmet need in precision medicine. In this study, an integrated computational approach combining structure-based virtual screening and molecular dynamics simulation was employed to identify novel noncovalent inhibitors targeting the KRASG12D variant. Through virtual screening of over 1.7 million diverse compounds, potential lead compounds with high binding affinity and specificity were identified using molecular docking and scoring techniques. Subsequently, 200 ns molecular dynamics simulations provided critical insights into the dynamic behavior, stability, and conformational changes of the inhibitor-KRASG12D complexes, facilitating the selection of lead compounds with robust binding profiles. Additionally, in silico absorption, distribution, metabolism, excretion (ADME) profiling, and toxicity predictions were applied to prioritize the lead compounds for further experimental validation. The discovered noncovalent KRASG12D inhibitors show promise as potential candidates for targeted therapy against KRASG12D-driven cancers. This comprehensive computational framework not only expedites the discovery of novel KRASG12D inhibitors but also provides valuable insights for the development of precision treatments tailored to this oncogenic mutation.


Subjects
Molecular Dynamics Simulation , Neoplasms , Humans , Proto-Oncogene Proteins p21(ras)/genetics , Molecular Docking Simulation , Mutation
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38555479

ABSTRACT

MOTIVATION: Accurately predicting molecular metabolic stability is of great significance to drug research and development, ensuring drug safety and effectiveness. Existing deep learning methods, especially graph neural networks, can reveal the molecular structure of drugs and thus efficiently predict the metabolic stability of molecules. However, most of these methods focus on the message passing between adjacent atoms in the molecular graph, ignoring the relationship between bonds. This makes it difficult for these methods to estimate accurate molecular representations, thereby being limited in molecular metabolic stability prediction tasks. RESULTS: We propose the MS-BACL model based on bond graph augmentation technology and contrastive learning strategy, which can efficiently and reliably predict the metabolic stability of molecules. To our knowledge, this is the first time that bond-to-bond relationships in molecular graph structures have been considered in the task of metabolic stability prediction. We build a bond graph based on 'atom-bond-atom', and the model can simultaneously capture the information of atoms and bonds during the message propagation process. This enhances the model's ability to reveal the internal structure of the molecule, thereby improving the structural representation of the molecule. Furthermore, we perform contrastive learning training based on the molecular graph and its bond graph to learn the final molecular representation. Multiple sets of experimental results on public datasets show that the proposed MS-BACL model outperforms the state-of-the-art model. AVAILABILITY AND IMPLEMENTATION: The code and data are publicly available at https://github.com/taowang11/MS.


Subjects
Neural Networks, Computer
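
MS-BACL trains with contrastive learning between a molecular graph and its bond graph, but the abstract above does not name the loss. The sketch below assumes an InfoNCE-style objective, a common choice: the embedding of molecule i in one view should be closer to its own embedding in the other view than to any other molecule's. The embeddings and the temperature are made up:

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss; row i of view_a pairs with row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)
        # Numerically stable log-sum-exp over all candidates.
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Hypothetical 2-D embeddings for two molecules in each view.
atom_view = [[1.0, 0.0], [0.0, 1.0]]
bond_view_aligned = [[1.0, 0.1], [0.1, 1.0]]
bond_view_swapped = [[0.1, 1.0], [1.0, 0.1]]

loss_aligned = info_nce(atom_view, bond_view_aligned)
loss_swapped = info_nce(atom_view, bond_view_swapped)
```

The loss is lower when matching rows agree across the two views, which is the signal that pulls the two representations of the same molecule together.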
14.
Brief Funct Genomics ; 23(4): 475-483, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38391194

ABSTRACT

MicroRNAs (miRNAs) are found ubiquitously in biological cells and play a pivotal role in regulating the expression of numerous target genes. Therapies centered around miRNAs are emerging as a promising strategy for disease treatment, aiming to intervene in disease progression by modulating abnormal miRNA expressions. The accurate prediction of miRNA-drug resistance (MDR) is crucial for the success of miRNA therapies. Computational models based on deep learning have demonstrated exceptional performance in predicting potential MDRs. However, their effectiveness can be compromised by errors in the data acquisition process, leading to inaccurate node representations. To address this challenge, we introduce the GAM-MDR model, which combines the graph autoencoder (GAE) with random path masking techniques to precisely predict potential MDRs. The reliability and effectiveness of the GAM-MDR model are mainly reflected in two aspects. Firstly, it efficiently extracts the representations of miRNA and drug nodes in the miRNA-drug network. Secondly, our designed random path masking strategy efficiently reconstructs critical paths in the network, thereby reducing the adverse impact of noisy data. To our knowledge, this is the first time that a random path masking strategy has been integrated into a GAE to infer MDRs. Our method was subjected to multiple validations on public datasets and yielded promising results. We are optimistic that our model could offer valuable insights for miRNA therapeutic strategies and deepen the understanding of the regulatory mechanisms of miRNAs. Our data and code are publicly available on GitHub: https://github.com/ZZCrazy00/GAM-MDR.


Subjects
MicroRNAs , MicroRNAs/genetics , Humans , Computational Biology/methods , Drug Resistance, Neoplasm/genetics , Algorithms , Drug Resistance/genetics
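
The random path masking idea in the GAM-MDR abstract above can be sketched as follows: sample short random walks on the miRNA-drug network, hide the edges they traverse, and leave the rest visible for the encoder to reconstruct from. This is a guess at the general mechanism, not the authors' implementation; the function name, walk settings and toy network are all hypothetical:

```python
import random


def mask_random_paths(edges, num_walks=2, walk_length=3, seed=0):
    """Split edges into (visible, hidden) by masking edges on random walks."""
    rng = random.Random(seed)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    masked = set()
    nodes = sorted(neighbors)
    for _ in range(num_walks):
        current = rng.choice(nodes)
        for _ in range(walk_length):
            nxt = rng.choice(neighbors[current])
            masked.add(frozenset((current, nxt)))  # undirected edge key
            current = nxt

    visible = [e for e in edges if frozenset(e) not in masked]
    hidden = [e for e in edges if frozenset(e) in masked]
    return visible, hidden


# Toy bipartite network of miRNA-drug links (made-up identifiers).
toy_edges = [
    ("miR-1", "drugA"), ("miR-1", "drugB"),
    ("miR-2", "drugB"), ("miR-2", "drugC"),
    ("miR-3", "drugC"), ("miR-3", "drugA"),
]
visible, hidden = mask_random_paths(toy_edges)
```

Masking whole paths rather than isolated edges removes correlated structure at once, so the model cannot recover a hidden link simply by copying an adjacent one.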
15.
Article in English | MEDLINE | ID: mdl-38386576

ABSTRACT

Improving the drug development process can expedite the introduction of more novel drugs that cater to the demands of precision medicine. Accurately predicting molecular properties remains a fundamental challenge in drug discovery and development. Currently, a plethora of computer-aided drug discovery (CADD) methods have been widely employed in the field of molecular prediction. However, most of these methods primarily analyze molecules using low-dimensional representations such as SMILES notations, molecular fingerprints, and molecular graph-based descriptors. Only a few approaches have focused on incorporating and utilizing high-dimensional spatial structural representations of molecules. In light of the advancements in artificial intelligence, we introduce a 3D graph-spatial co-representation model called AEGNN-M, which combines two graph neural networks, GAT and EGNN. AEGNN-M enables learning of information from both molecular graph representations and 3D spatial structural representations to predict molecular properties accurately. We conducted experiments on seven public datasets, three regression datasets and 14 breast cancer cell line phenotype screening datasets, comparing the performance of AEGNN-M with state-of-the-art deep learning methods. Extensive experimental results demonstrate the satisfactory performance of the AEGNN-M model. Furthermore, we analyzed the performance impact of different modules within AEGNN-M and the influence of spatial structural representations on the model's performance. The interpretability analysis also revealed the significance of specific atoms in determining particular molecular properties.

16.
Comput Biol Med ; 171: 108104, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38335821

ABSTRACT

Drug-food interactions (DFIs) crucially impact patient safety and drug efficacy by modifying absorption, distribution, metabolism, and excretion. The application of deep learning for predicting DFIs is promising, yet the development of computational models remains in its early stages. This is mainly due to the complexity of food compounds, challenging dataset developers in acquiring comprehensive ingredient data, often resulting in incomplete or vague food component descriptions. DFI-MS tackles this issue by employing an accurate feature representation method alongside a refined computational model. It innovatively achieves a more precise characterization of food features, a previously daunting task in DFI research. This is accomplished through modules designed for perturbation interactions, feature alignment and domain separation, and inference feedback. These modules extract essential information from features, using a perturbation module and a feature interaction encoder to establish robust representations. The feature alignment and domain separation modules are particularly effective in managing data with diverse frequencies and characteristics. DFI-MS stands out as the first in its field to combine data augmentation, feature alignment, domain separation, and contrastive learning. The flexibility of the inference feedback module allows its application in various downstream tasks. Demonstrating exceptional performance across multiple datasets, DFI-MS represents a significant advancement in food representation technology. Our code and data are available at https://github.com/kkkayle/DFI-MS.


Subjects
Food-Drug Interactions , Food , Humans , Supervised Machine Learning
17.
Mol Ther Nucleic Acids ; 35(1): 102103, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38261851

ABSTRACT

Inferring small molecule-miRNA associations (MMAs) is crucial for revealing the intricacies of biological processes and disease mechanisms. Deep learning, renowned for its exceptional speed and accuracy, is extensively used for predicting MMAs. However, given their heavy reliance on data, inaccuracies during data collection can make these methods susceptible to noise interference. To address this challenge, we introduce the joint masking and self-supervised (JMSS)-MMA model. This model synergizes graph autoencoders with a probability distribution-based masking strategy, effectively countering the impact of noisy data and enabling precise predictions of unknown MMAs. Operating in a self-supervised manner, it deeply encodes the relationship data of small molecules and miRNA through the graph autoencoder, delving into its latent information. Our masking strategy has successfully reduced data noise, enhancing prediction accuracy. To our knowledge, this is the pioneering integration of a masking strategy with graph autoencoders for MMA prediction. Furthermore, the JMSS-MMA model incorporates a node-degree-based decoder, deepening the understanding of the network's structure. Experiments on two mainstream datasets confirm the model's efficiency and precision, and ablation studies further attest to its robustness. We firmly believe that this model will revolutionize drug development, personalized medicine, and biomedical research.

18.
J Chem Inf Model ; 64(7): 2798-2806, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37643082

ABSTRACT

Plant small secretory peptides (SSPs) play an important role in the regulation of biological processes in plants. Accurately predicting SSPs enables efficient exploration of their functions. Traditional experimental verification methods are very reliable and accurate, but they require expensive equipment and a lot of time. Machine learning methods speed up the prediction of SSPs, but unstable feature extraction limits this type of method. Therefore, this paper proposes a new feature-correction-based model for SSP recognition in plants, abbreviated as SE-SSP. The model has three main advantages. First, the use of transformer encoders can better reveal implicit features. Second, we design a feature correction module suitable for sequences, named 2-D SENET, to adaptively adjust the features and obtain a more robust feature representation. Third, we stack multiple linear modules to extract deeper information from each sample. At the same time, training based on a contrastive learning strategy can alleviate the problem of sparse samples. We conduct experiments on publicly available data sets, and the results verify that our model shows excellent performance. The proposed model can be used as a convenient and effective SSP prediction tool in the future. Our data and code are publicly available at https://github.com/wrab12/SE-SSP/.


Subjects
Electric Power Supplies , Machine Learning , Biological Transport , Peptides , Research Design
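
The abstract above does not describe the internals of the "2-D SENET" feature correction module. Its name suggests a squeeze-and-excitation (SE) block, so the sketch below shows a generic SE rescaling step on a small channels-by-positions feature map: average each channel, pass the averages through two tiny dense layers, and multiply each channel by the resulting gate. The weights and shapes are arbitrary placeholders:

```python
import math


def se_rescale(feature_map, w1, w2):
    """Generic squeeze-and-excitation rescaling.

    feature_map: list of channels, each a list of floats.
    w1: hidden x channels weights; w2: channels x hidden weights.
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: one summary value per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: dense layer + ReLU, then dense layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every position in a channel by that channel's gate.
    rescaled = [[g * x for x in channel] for g, channel in zip(gates, feature_map)]
    return rescaled, gates


# Two channels, three sequence positions, one hidden unit (placeholder weights).
toy_features = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
toy_w1 = [[0.5, -0.5]]
toy_w2 = [[1.0], [-1.0]]
rescaled, gates = se_rescale(toy_features, toy_w1, toy_w2)
```

In a trained network the two weight matrices are learned, so the gates come to emphasize informative channels and suppress noisy ones.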
19.
J Chem Inf Model ; 64(7): 2912-2920, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37920888

ABSTRACT

Deep learning methods can accurately study noncoding RNA protein interactions (NPI), which is of great significance in gene regulation, human disease, and other fields. However, the computational method for predicting NPI in large-scale dynamic ncRNA protein bipartite graphs is rarely discussed, which is an online modeling and prediction problem. In addition, the results published by researchers on the Web site cannot meet real-time needs due to the large amount of basic data and long update cycles. Therefore, we propose a real-time method based on the dynamic ncRNA-protein bipartite graph learning framework, termed ML-GNN, which can model and predict the NPIs in real time. Our proposed method has the following advantages: first, the meta-learning strategy can alleviate the problem of large prediction errors in sparse neighborhood samples; second, dynamic modeling of newly added data can reduce computational pressure and predict NPIs in real time. In the experiment, we built a dynamic bipartite graph based on 300,000 NPIs from the NPInterv4.0 database. The experimental results indicate that our model achieved excellent performance in multiple experiments. The code for the model is available at https://github.com/taowang11/ML-NPI, and the data can be downloaded freely at http://bigdata.ibp.ac.cn/npinter4.


Subjects
RNA, Untranslated , Research Personnel , Humans , Databases, Factual , RNA, Untranslated/genetics
20.
Methods ; 221: 73-81, 2024 01.
Article in English | MEDLINE | ID: mdl-38123109

ABSTRACT

Research indicates that miRNAs present in herbal medicines are crucial for identifying disease markers, advancing gene therapy, facilitating drug delivery, and so on. These miRNAs maintain stability in the extracellular environment, making them viable tools for disease diagnosis. They can withstand the digestive processes in the gastrointestinal tract, positioning them as potential carriers for specific oral drug delivery. By engineering plants to generate effective, non-toxic miRNA interference sequences, it's possible to broaden their applicability, including the treatment of diseases such as hepatitis C. Consequently, delving into the miRNA-disease associations (MDAs) within herbal medicines holds immense promise for diagnosing and addressing miRNA-related diseases. In our research, we propose the SGAE-MDA model, which harnesses the strengths of a graph autoencoder (GAE) combined with a semi-supervised approach to uncover potential MDAs in herbal medicines more effectively. Leveraging the GAE framework, the SGAE-MDA model precisely integrates the inherent feature vectors of miRNAs and disease nodes with the regulatory data in the miRNA-disease network. Additionally, the proposed semi-supervised learning approach randomly hides part of the structure of the miRNA-disease network and subsequently reconstructs it within the GAE framework. This technique effectively minimizes network noise interference. In comparisons against other leading deep learning models, the results consistently highlighted the superior performance of the proposed SGAE-MDA model. Our code and dataset are available at: https://github.com/22n9n23/SGAE-MDA.


Subjects
MicroRNAs , MicroRNAs/genetics , Algorithms , Computational Biology/methods , Supervised Machine Learning , Plant Extracts