1.
Article in English | MEDLINE | ID: mdl-38913513

ABSTRACT

The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Through single-cell data analyses, a number of biological discoveries, such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks, have been uncovered. However, the massive and increasingly accumulated single-cell datasets have also posed a serious computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to reduce the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, downstream data analysis tasks such as clustering still have high time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge sparse single-cell RNA-seq dataset to facilitate downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield similar results to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms the existing data summarization approaches for common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.
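
To make the idea of "a small weighted subset of cells" concrete, the following is a minimal sketch of one generic way to build a weighted coreset by importance sampling (often called a lightweight coreset). It is not the scCoreset algorithm, which the abstract does not describe; the function name `lightweight_coreset`, its parameters, and the toy input are assumptions made for illustration only.

```python
import random


def lightweight_coreset(cells, m, seed=0):
    """Sample m weighted rows from `cells` (a list of equal-length numeric lists).

    Illustrative importance sampling only; NOT the published scCoreset method.
    Returns (indices, weights), where each weight is 1 / (m * q_i).
    """
    rng = random.Random(seed)
    n = len(cells)
    dims = len(cells[0])

    # Per-feature mean over all cells.
    mean = [sum(c[j] for c in cells) / n for j in range(dims)]

    # Squared Euclidean distance of every cell to the mean.
    sq_dist = [sum((c[j] - mean[j]) ** 2 for j in range(dims)) for c in cells]
    total = sum(sq_dist)

    # Sampling distribution: half uniform, half proportional to distance.
    if total == 0:
        q = [1.0 / n] * n  # all cells identical, so sample uniformly
    else:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]

    # Draw m indices with replacement and attach inverse-probability weights.
    idx = rng.choices(range(n), weights=q, k=m)
    weights = [1.0 / (m * q[i]) for i in idx]
    return idx, weights


if __name__ == "__main__":
    toy_cells = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
    print(lightweight_coreset(toy_cells, m=3))
```

A downstream task such as weighted k-means would then run on the sampled rows with these weights instead of on the full matrix.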

2.
Cell Syst ; 15(6): 483-487, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38901402

ABSTRACT

This Voices piece will highlight the impact of artificial intelligence on algorithm development among computational biologists. How has worldwide focus on AI changed the path of research in computational biology? What is the impact on the algorithmic biology research community?


Subjects
Algorithms; Artificial Intelligence; Computational Biology; Artificial Intelligence/trends; Computational Biology/methods; Humans
3.
Commun Chem ; 7(1): 127, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38834746

ABSTRACT

Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have demonstrated promising potential in predicting compound activities. However, a well-designed benchmark for comprehensively evaluating these methods from a practical perspective is still lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). Through carefully distinguishing assay types, designing train-test splitting schemes and selecting evaluation metrics, CARA can consider the biased distribution of current real-world compound activity data and avoid overestimation of model performances. We observed that although current models can make successful predictions for certain proportions of assays, their performances varied across different assays. In addition, evaluation of several few-shot training strategies demonstrated different performances related to task types. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.

4.
bioRxiv ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38798507

ABSTRACT

Polygenic risk scores (PRSs) are commonly used for predicting an individual's genetic risk of complex diseases. Yet, their implication for disease pathogenesis remains largely limited. Here, we introduce scPRS, a geometric deep learning model that constructs single-cell-resolved PRS leveraging reference single-cell chromatin accessibility profiling data to enhance biological discovery as well as disease prediction. Real-world applications across multiple complex diseases, including type 2 diabetes (T2D), hypertrophic cardiomyopathy (HCM), and Alzheimer's disease (AD), showcase the superior prediction power of scPRS compared to traditional PRS methods. Importantly, scPRS not only predicts disease risk but also uncovers disease-relevant cells, such as hormone-high alpha and beta cells for T2D, cardiomyocytes and pericytes for HCM, and astrocytes, microglia and oligodendrocyte progenitor cells for AD. Facilitated by a layered multi-omic analysis, scPRS further identifies cell-type-specific genetic underpinnings, linking disease-associated genetic variants to gene regulation within corresponding cell types. We substantiate the disease relevance of scPRS-prioritized HCM genes and demonstrate that the suppression of these genes in HCM cardiomyocytes is rescued by Mavacamten treatment. Additionally, we establish a novel microglia-specific regulatory relationship between the AD risk variant rs7922621 and its target genes ANXA11 and TSPAN14. We further illustrate the detrimental effects of suppressing these two genes on microglia phagocytosis. Our work provides a multi-tasking, interpretable framework for precise disease prediction and systematic investigation of the genetic, cellular, and molecular basis of complex diseases, laying the methodological foundation for single-cell genetics.
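
For context on the "traditional PRS methods" used as the comparison point above, here is a minimal sketch of the conventional additive score: a weighted sum of risk-allele dosages, with weights taken from association-study effect sizes. It does not implement scPRS. The function name and the example numbers are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Conventional additive PRS for one individual.

    dosages: risk-allele counts per variant (0, 1, or 2, or imputed fractions).
    effect_sizes: per-variant weights (for example, log odds ratios).
    Illustrative only; this is the classical weighted sum, not scPRS.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


if __name__ == "__main__":
    # Hypothetical individual with three variants.
    print(polygenic_risk_score([0, 1, 2], [0.5, -0.2, 0.1]))
```

Because every variant contributes to a single number, this classical score carries no cell-type information, which is the gap the abstract says scPRS is meant to address.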

5.
PLoS Comput Biol ; 20(4): e1011945, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578805

ABSTRACT

Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.


Subjects
Computational Biology; Drug Discovery; Machine Learning; Neural Networks, Computer; Humans; Computational Biology/methods; Drug Discovery/methods; Algorithms; Melanoma; Probability; Colorectal Neoplasms
6.
medRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38633814

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal and incurable neurodegenerative disease caused by the selective and progressive death of motor neurons (MNs). Understanding the genetic and molecular factors influencing ALS survival is crucial for disease management and therapeutics. In this study, we introduce a deep learning-powered genetic analysis framework to link rare noncoding genetic variants to ALS survival. Using data from human induced pluripotent stem cell (iPSC)-derived MNs, this method prioritizes functional noncoding variants using deep learning, links cis-regulatory elements (CREs) to target genes using epigenomics data, and integrates these data through gene-level burden tests to identify survival-modifying variants, CREs, and genes. We apply this approach to analyze 6,715 ALS genomes, and pinpoint four novel rare noncoding variants associated with survival, including chr7:76,009,472:C>T linked to CCDC146. CRISPR-Cas9 editing of this variant increases CCDC146 expression in iPSC-derived MNs and exacerbates ALS-specific phenotypes, including TDP-43 mislocalization. Suppressing CCDC146 with an antisense oligonucleotide (ASO), showing no toxicity, completely rescues ALS-associated survival defects in neurons derived from sporadic ALS patients and from carriers of the ALS-associated G4C2-repeat expansion within C9ORF72. ASO targeting of CCDC146 may be a broadly effective therapeutic approach for ALS. Our framework provides a generic and powerful approach for studying noncoding genetics of complex human diseases.

7.
J Immunother Cancer ; 12(3)2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38458637

ABSTRACT

BACKGROUND: Dendritic cell (DC)-mediated antigen presentation is essential for the priming and activation of tumor-specific T cells. However, few drugs that specifically manipulate DC functions are available. The identification of drugs targeting DC holds great promise for cancer immunotherapy. METHODS: We observed that type 1 conventional DCs (cDC1s) initiated a distinct transcriptional program during antigen presentation. We used a network-based approach to screen for cDC1-targeting therapeutics. The antitumor potency and underlying mechanisms of the candidate drug were investigated in vitro and in vivo. RESULTS: Sitagliptin, an oral gliptin widely used for type 2 diabetes, was identified as a drug that targets DCs. In mouse models, sitagliptin inhibited tumor growth by enhancing cDC1-mediated antigen presentation, leading to better T-cell activation. Mechanistically, inhibition of dipeptidyl peptidase 4 (DPP4) by sitagliptin prevented the truncation and degradation of chemokines/cytokines that are important for DC activation. Sitagliptin enhanced cancer immunotherapy by facilitating the priming of antigen-specific T cells by DCs. In humans, the use of sitagliptin correlated with a lower risk of tumor recurrence in patients with colorectal cancer undergoing curative surgery. CONCLUSIONS: Our findings indicate that sitagliptin-mediated DPP4 inhibition promotes antitumor immune response by augmenting cDC1 functions. These data suggest that sitagliptin can be repurposed as an antitumor drug targeting DC, which provides a potential strategy for cancer immunotherapy.


Subjects
Antineoplastic Agents; Diabetes Mellitus, Type 2; Neoplasms; Mice; Animals; Humans; Dipeptidyl Peptidase 4/metabolism; Dendritic Cells; Sitagliptin Phosphate/pharmacology; Sitagliptin Phosphate/therapeutic use; Sitagliptin Phosphate/metabolism; Antigen Presentation; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use
9.
J Chem Inf Model ; 64(7): 2236-2249, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37584270

ABSTRACT

Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.


Subjects
Drug Discovery; Quantitative Structure-Activity Relationship; Reproducibility of Results; Drug Discovery/methods; Algorithms; Machine Learning
10.
Nat Commun ; 14(1): 8459, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123534

ABSTRACT

Single-cell technologies enable the dynamic analyses of cell fate mapping. However, capturing the gene regulatory relationships and identifying the driver factors that control cell fate decisions are still challenging. We present CEFCON, a network-based framework that first uses a graph neural network with attention mechanism to infer a cell-lineage-specific gene regulatory network (GRN) from single-cell RNA-sequencing data, and then models cell fate dynamics through network control theory to identify driver regulators and the associated gene modules, revealing their critical biological processes related to cell states. Extensive benchmarking tests consistently demonstrated the superiority of CEFCON in GRN construction, driver regulator identification, and gene module identification over baseline methods. When applied to the mouse hematopoietic stem cell differentiation data, CEFCON successfully identified driver regulators for three developmental lineages, which offered useful insights into their differentiation from a network control perspective. Overall, CEFCON provides a valuable tool for studying the underlying mechanisms of cell fate decisions from single-cell RNA-seq data.


Subjects
Gene Expression Profiling; Gene Expression Regulation; Animals; Mice; Cell Differentiation/genetics; Cell Lineage/genetics; Gene Regulatory Networks; Single-Cell Analysis/methods
11.
Nat Commun ; 14(1): 7568, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37989998

ABSTRACT

Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.


Subjects
Anti-HIV Agents; Artificial Intelligence; Drug Discovery; Electric Power Supplies; Hydrolases
12.
Cell Syst ; 14(8): 692-705.e6, 2023 08 16.
Article in English | MEDLINE | ID: mdl-37516103

ABSTRACT

Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.


Subjects
Algorithms; Proteins; Binding Sites; Ligands; Proteins/metabolism
13.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37287135

ABSTRACT

Hi-C is a widely applied chromosome conformation capture (3C)-based technique, which has produced a large number of genomic contact maps with high sequencing depths for a wide range of cell types, enabling comprehensive analyses of the relationships between biological functionalities (e.g. gene regulation and expression) and the three-dimensional genome structure. Comparative analyses play significant roles in Hi-C data studies, which are designed to make comparisons between Hi-C contact maps, thus evaluating the consistency of replicate Hi-C experiments (i.e. reproducibility measurement) and detecting statistically differential interacting regions with biological significance (i.e. differential chromatin interaction detection). However, due to the complex and hierarchical nature of Hi-C contact maps, it remains challenging to conduct systematic and reliable comparative analyses of Hi-C data. Here, we proposed sslHiC, a contrastive self-supervised representation learning framework, for precisely modeling the multi-level features of chromosome conformation and automatically producing informative feature embeddings for genomic loci and their interactions to facilitate comparative analyses of Hi-C contact maps. Comprehensive computational experiments on both simulated and real datasets demonstrated that our method consistently outperformed the state-of-the-art baseline methods in providing reliable measurements of reproducibility and detecting differential interactions with biological meanings.


Subjects
Chromatin; Chromosomes; Reproducibility of Results; Chromatin/genetics; Chromosomes/genetics; Genomics/methods; Supervised Machine Learning
14.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36527428

ABSTRACT

Understanding the mechanisms of candidate drugs plays an important role in drug discovery. The activating/inhibiting mechanisms between drugs and targets are major types of mechanisms of drugs. Owing to the complexity of drug-target (DT) mechanisms and data scarcity, modelling this problem based on deep learning methods to accurately predict DT activating/inhibiting mechanisms remains a considerable challenge. Here, by considering network pharmacology, we propose a multi-view deep learning model, DrugAI, which combines four modules, i.e. a graph neural network for drugs, a convolutional neural network for targets, a network embedding module for drugs and targets and a deep neural network for predicting activating/inhibiting mechanisms between drugs and targets. Computational experiments show that DrugAI performs better than state-of-the-art methods and has good robustness and generalization. To demonstrate the reliability of the predictive results of DrugAI, bioassay experiments are conducted to validate two drugs (notopterol and alpha-asarone) predicted to activate TRPV1. Moreover, external validation based on independent literature and PubChem bioassays bears out 61 pairs of mechanism relationships between natural products and their targets predicted by DrugAI. DrugAI, for the first time, provides a powerful multi-view deep learning framework for robust prediction of DT activating/inhibiting mechanisms.


Subjects
Deep Learning; Algorithms; Reproducibility of Results; Neural Networks, Computer; Drug Discovery
15.
Pac Symp Biocomput ; 28: 157-168, 2023.
Article in English | MEDLINE | ID: mdl-36540973

ABSTRACT

Identifying effective target-disease associations (TDAs) can alleviate the tremendous cost incurred by clinical failures of drug development. Although many machine learning models have been proposed to predict potential novel TDAs rapidly, their credibility is not guaranteed, thus requiring extensive experimental validation. In addition, it is generally challenging for current models to predict meaningful associations for entities with less information, hence limiting the application potential of these models in guiding future research. Based on recent advances in utilizing graph neural networks to extract features from heterogeneous biological data, we develop CreaTDA, an end-to-end deep learning-based framework that effectively learns latent feature representations of targets and diseases to facilitate TDA prediction. We also propose a novel way of encoding credibility information obtained from literature to enhance the performance of TDA prediction and predict more novel TDAs with real evidence support from previous studies. Compared with state-of-the-art baseline methods, CreaTDA achieves substantially better prediction performance on the whole TDA network and its sparse sub-networks containing the proteins associated with few known diseases. Our results demonstrate that CreaTDA can provide a powerful and helpful tool for identifying novel target-disease associations, thereby facilitating drug discovery.


Subjects
Computational Biology; Neural Networks, Computer; Humans; Computational Biology/methods; Machine Learning; Drug Discovery; Proteins
16.
iScience ; 25(10): 105231, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36274947

ABSTRACT

Deeply understanding the properties (e.g., chemical or biological characteristics) of small molecules plays an essential role in drug development. A large number of molecular property datasets have been rapidly accumulated in recent years. However, most of these datasets contain only a limited amount of data, which hinders deep learning methods from making accurate predictions of the corresponding molecular properties. In this work, we propose a transfer learning strategy to alleviate such a data scarcity problem by exploiting the similarity between molecular property prediction tasks. We introduce an effective and interpretable computational framework, named MoTSE (Molecular Tasks Similarity Estimator), to provide an accurate estimation of task similarity. Comprehensive tests demonstrated that the task similarity derived from MoTSE can serve as useful guidance to improve the prediction performance of transfer learning on molecular properties. We also showed that MoTSE can capture the intrinsic relationships between molecular properties and provide meaningful interpretability for the derived similarity.

17.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36070863

ABSTRACT

Computational recovery of gene regulatory network (GRN) has recently undergone a great shift from bulk-cell towards designing algorithms targeting single-cell data. In this work, we investigate whether the widely available bulk-cell data could be leveraged to assist the GRN predictions for single cells. We infer cell-type-specific GRNs from both the single-cell RNA sequencing data and the generic GRN derived from the bulk cells by constructing a weakly supervised learning framework based on the axial transformer. We verify our assumption that the bulk-cell transcriptomic data are a valuable resource, which could improve the prediction of single-cell GRN by conducting extensive experiments. Our GRN-transformer achieves the state-of-the-art prediction accuracy in comparison to existing supervised and unsupervised approaches. In addition, we show that our method can identify important transcription factors and potential regulations for Alzheimer's disease risk genes by using the predicted GRN. Availability: The implementation of GRN-transformer is available at https://github.com/HantaoShu/GRN-Transformer.


Subjects
Computational Biology; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Transcription Factors/genetics; Transcriptome
18.
Comput Struct Biotechnol J ; 20: 5193-5202, 2022.
Article in English | MEDLINE | ID: mdl-36059866

ABSTRACT

The coronavirus disease-2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has seriously affected public health around the world. In-depth studies on the pathogenic mechanisms of SARS-CoV-2 are urgently needed for pandemic prevention. However, most laboratory studies on SARS-CoV-2 have to be carried out in bio-safety level 3 (BSL-3) laboratories, greatly restricting the progress of relevant experiments. In this study, we used a bacterial artificial chromosome (BAC) method to assemble a SARS-CoV-2 replication and transcription system in Vero E6 cells without virion envelope formation, thus avoiding the risk of coronavirus exposure. Furthermore, an improved real-time quantitative reverse transcription PCR (RT-qPCR) approach was used to distinguish the replication of full-length replicon RNAs and transcription of subgenomic RNAs (sgRNAs). Using the SARS-CoV-2 replicon, we demonstrated that the nucleocapsid (N) protein of SARS-CoV-2 facilitates the transcription of sgRNAs in the discontinuous synthesis process. Moreover, two high-frequency mutants of N protein, R203K and S194L, can clearly enhance the transcription level of the replicon, hinting that these mutations likely allow SARS-CoV-2 to spread and reproduce more quickly. In addition, remdesivir and chloroquine, two well-known drugs demonstrated to be effective against coronavirus in previous studies, also inhibited the transcription of our replicon, indicating the potential applications of this system in antiviral drug discovery. Overall, we developed a bio-safe and valuable replicon system of SARS-CoV-2 that is useful to study the mechanisms of viral RNA synthesis and has potential in novel antiviral drug screening.

19.
Cell Rep Med ; 3(1): 100492, 2022 01 18.
Article in English | MEDLINE | ID: mdl-35106508

ABSTRACT

The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study a tumor-specific drug mechanism of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.


Subjects
Neoplasms/drug therapy; Polypharmacology; Algorithms; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Neural Networks, Computer; Protein Kinases/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcription, Genetic
20.
Pharmacol Res ; 173: 105752, 2021 11.
Article in English | MEDLINE | ID: mdl-34481072

ABSTRACT

Traditional Chinese medicine (TCM) formulas have been widely used in clinical practice for thousands of years. With the development of artificial intelligence, deep learning models may help doctors prescribe reasonable formulas. However, current studies of formula recommendation focus only on the observable clinical symptoms and lack molecular information. Here, inspired by the theory of TCM network pharmacology, we propose an intelligent formula recommendation system based on deep learning (FordNet), fusing the information of phenotype and molecule. We collected more than 20,000 electronic health records from TCM Master Li Jiren's experience from 2013 to March 2020. In the FordNet system, the feature of diagnosis description is extracted by a convolutional neural network and the feature of TCM formula is extracted by network embedding, which fuses the molecular information. A hierarchical sampling strategy for data augmentation is designed to effectively learn training samples. Based on the expanded samples, a deep neural network based quantitative optimization model is developed for TCM formula recommendation. FordNet performs significantly better than baseline methods (hit ratio of top 10 improved by 46.9% compared with the best baseline random forest method). Moreover, the molecular information helps FordNet improve the hit ratio by 17.3% compared with the model using only macro information. Clinical evaluation shows that FordNet can learn the effective experience of the TCM Master well and obtain excellent recommendation results. Our study, for the first time, proposes an intelligent recommendation system for TCM formula integrating phenotype and molecule information, which has potential to improve clinical diagnosis and treatment, and promote the shift of TCM research pattern from "experience based, macro" to "data based, macro-micro combined" as well as the development of TCM network pharmacology.


Subjects
Medicine, Chinese Traditional; Neural Networks, Computer; Humans; Network Pharmacology; Phenotype
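
The FordNet abstract above reports a "hit ratio of top 10" without defining it. As a reading aid, the following minimal sketch shows one common definition from the recommender-system literature: the fraction of cases in which at least one relevant item appears among the top k recommendations. The study may use a different definition; the function name and the example data are hypothetical.

```python
def hit_ratio_at_k(ranked_lists, relevant_sets, k=10):
    """Fraction of cases with at least one relevant item in the top-k list.

    ranked_lists: one ranked list of recommended items per case.
    relevant_sets: one set of ground-truth items per case.
    One common definition only; the FordNet paper may define it differently.
    """
    if len(ranked_lists) != len(relevant_sets) or not ranked_lists:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)


if __name__ == "__main__":
    # Hypothetical recommendations for two cases.
    ranked = [["a", "b", "c"], ["d", "e", "f"]]
    truth = [{"c"}, {"x"}]
    print(hit_ratio_at_k(ranked, truth, k=3))
```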