1.
Commun Chem ; 7(1): 127, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38834746

ABSTRACT

Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have demonstrated promising potential in predicting compound activities. However, a well-designed benchmark that comprehensively evaluates these methods from a practical perspective is still lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). Through carefully distinguishing assay types, designing train-test splitting schemes and selecting evaluation metrics, CARA can consider the biased distribution of current real-world compound activity data and avoid overestimation of model performance. We observed that although current models can make successful predictions for certain proportions of assays, their performance varied across different assays. In addition, evaluation of several few-shot training strategies showed that their performance depended on the task type. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.

2.
PLoS Comput Biol ; 20(4): e1011945, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578805

ABSTRACT

Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.


Subject(s)
Computational Biology , Drug Discovery , Machine Learning , Neural Networks, Computer , Humans , Computational Biology/methods , Drug Discovery/methods , Algorithms , Melanoma , Probability , Colorectal Neoplasms
3.
J Immunother Cancer ; 12(3)2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38458637

ABSTRACT

BACKGROUND: Dendritic cell (DC)-mediated antigen presentation is essential for the priming and activation of tumor-specific T cells. However, few drugs that specifically manipulate DC functions are available. The identification of drugs targeting DC holds great promise for cancer immunotherapy. METHODS: We observed that type 1 conventional DCs (cDC1s) initiated a distinct transcriptional program during antigen presentation. We used a network-based approach to screen for cDC1-targeting therapeutics. The antitumor potency and underlying mechanisms of the candidate drug were investigated in vitro and in vivo. RESULTS: Sitagliptin, an oral gliptin widely used for type 2 diabetes, was identified as a drug that targets DCs. In mouse models, sitagliptin inhibited tumor growth by enhancing cDC1-mediated antigen presentation, leading to better T-cell activation. Mechanistically, inhibition of dipeptidyl peptidase 4 (DPP4) by sitagliptin prevented the truncation and degradation of chemokines/cytokines that are important for DC activation. Sitagliptin enhanced cancer immunotherapy by facilitating the priming of antigen-specific T cells by DCs. In humans, the use of sitagliptin correlated with a lower risk of tumor recurrence in patients with colorectal cancer undergoing curative surgery. CONCLUSIONS: Our findings indicate that sitagliptin-mediated DPP4 inhibition promotes antitumor immune response by augmenting cDC1 functions. These data suggest that sitagliptin can be repurposed as an antitumor drug targeting DC, which provides a potential strategy for cancer immunotherapy.


Subject(s)
Antineoplastic Agents , Diabetes Mellitus, Type 2 , Neoplasms , Mice , Animals , Humans , Dipeptidyl Peptidase 4/metabolism , Dendritic Cells , Sitagliptin Phosphate/pharmacology , Sitagliptin Phosphate/therapeutic use , Sitagliptin Phosphate/metabolism , Antigen Presentation , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use
4.
J Chem Inf Model ; 64(7): 2236-2249, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37584270

ABSTRACT

Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.


Subject(s)
Drug Discovery , Quantitative Structure-Activity Relationship , Reproducibility of Results , Drug Discovery/methods , Algorithms , Machine Learning
5.
Cell Syst ; 14(8): 692-705.e6, 2023 08 16.
Article in English | MEDLINE | ID: mdl-37516103

ABSTRACT

Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.


Subject(s)
Algorithms , Proteins , Binding Sites , Ligands , Proteins/metabolism
6.
Pac Symp Biocomput ; 28: 157-168, 2023.
Article in English | MEDLINE | ID: mdl-36540973

ABSTRACT

Identifying effective target-disease associations (TDAs) can alleviate the tremendous cost incurred by clinical failures of drug development. Although many machine learning models have been proposed to predict potential novel TDAs rapidly, their credibility is not guaranteed, thus requiring extensive experimental validation. In addition, it is generally challenging for current models to predict meaningful associations for entities with less information, hence limiting the application potential of these models in guiding future research. Based on recent advances in utilizing graph neural networks to extract features from heterogeneous biological data, we develop CreaTDA, an end-to-end deep learning-based framework that effectively learns latent feature representations of targets and diseases to facilitate TDA prediction. We also propose a novel way of encoding credibility information obtained from literature to enhance the performance of TDA prediction and predict more novel TDAs with real evidence support from previous studies. Compared with state-of-the-art baseline methods, CreaTDA achieves substantially better prediction performance on the whole TDA network and its sparse sub-networks containing the proteins associated with few known diseases. Our results demonstrate that CreaTDA can provide a powerful and helpful tool for identifying novel target-disease associations, thereby facilitating drug discovery.


Subject(s)
Computational Biology , Neural Networks, Computer , Humans , Computational Biology/methods , Machine Learning , Drug Discovery , Proteins
7.
Cell Rep Med ; 3(1): 100492, 2022 01 18.
Article in English | MEDLINE | ID: mdl-35106508

ABSTRACT

The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study a tumor-specific drug mechanism of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.


Subject(s)
Neoplasms/drug therapy , Polypharmacology , Algorithms , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Neural Networks, Computer , Protein Kinases/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription, Genetic
8.
Nat Commun ; 12(1): 5465, 2021 09 15.
Article in English | MEDLINE | ID: mdl-34526500

ABSTRACT

Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.


Subject(s)
Algorithms , Computational Biology/methods , Deep Learning , Peptides/metabolism , Proteins/metabolism , Binding Sites , Models, Molecular , Peptides/chemistry , Protein Binding , Protein Domains , Proteins/chemistry , Reproducibility of Results
9.
Signal Transduct Target Ther ; 6(1): 165, 2021 04 24.
Article in English | MEDLINE | ID: mdl-33895786

ABSTRACT

The global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created an urgent need for effective therapeutics for the treatment of coronavirus disease 2019 (COVID-19). In this study, we developed an integrative drug repositioning framework, which fully takes advantage of machine learning and statistical analysis approaches to systematically integrate and mine large-scale knowledge graph, literature and transcriptome data to discover potential drug candidates against SARS-CoV-2. Our in silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in a Phase I clinical trial, may be repurposed to treat COVID-19. Our in vitro assays revealed that CVL218 can exhibit effective inhibitory activity against SARS-CoV-2 replication without obvious cytopathic effect. In addition, we showed that CVL218 can interact with the nucleocapsid (N) protein of SARS-CoV-2 and is able to suppress the LPS-induced production of several inflammatory cytokines that are highly relevant to the prevention of immunopathology induced by SARS-CoV-2 infection.


Subject(s)
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , COVID-19/metabolism , Computer Simulation , Drug Repositioning , Models, Biological , SARS-CoV-2/metabolism , Humans
10.
PLoS Comput Biol ; 17(3): e1008842, 2021 03.
Article in English | MEDLINE | ID: mdl-33770074

ABSTRACT

Translation elongation is regulated by a series of complicated mechanisms in both prokaryotes and eukaryotes. Although recent advances in ribosome profiling techniques have made it possible to capture the genome-wide ribosome footprints along transcripts at codon resolution, the regulatory codes of elongation dynamics are still not fully understood. Most of the existing computational approaches for modeling translation elongation from ribosome profiling data mainly focus on local contextual patterns, while ignoring the continuity of the elongation process and relations between ribosome densities of remote codons. To the best of our knowledge, modeling the translation elongation process at the full-length coding sequence (CDS) level has not been studied. In this paper, we developed a deep learning-based approach with a multi-input and multi-output framework, named RiboMIMO, for modeling the ribosome density distributions of full-length mRNA CDS regions. Through considering the underlying correlations in translation efficiency among neighboring and remote codons and extracting hidden features from the input full-length coding sequence, RiboMIMO can greatly outperform the state-of-the-art baseline approaches and accurately predict the ribosome density distributions along the whole mRNA CDS regions. In addition, RiboMIMO explores the contributions of individual input codons to the predictions of output ribosome densities, which thus can help reveal important biological factors influencing the translation elongation process. The analyses, based on our interpretable metric named codon impact score, not only identified several patterns consistent with the previously published literature, but also for the first time (to the best of our knowledge) revealed that the codons located at a long distance from the ribosomal A site may also be associated with the translation elongation rate. This finding of long-range impact on translation elongation velocity may shed new light on the regulatory mechanisms of protein synthesis. Overall, these results indicated that RiboMIMO can provide a useful tool for studying the regulation of translation elongation in the range of full-length CDS.


Subject(s)
Computational Biology/methods , Deep Learning , Models, Genetic , Peptide Chain Elongation, Translational/genetics , Ribosomes , Codon/genetics , Codon/metabolism , Escherichia coli/genetics , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics
12.
Nucleic Acids Res ; 49(7): 3719-3734, 2021 04 19.
Article in English | MEDLINE | ID: mdl-33744973

ABSTRACT

N6-methyladenosine (m6A) is the most pervasive modification in eukaryotic mRNAs. Numerous biological processes are regulated by this critical post-transcriptional mark, such as gene expression, RNA stability, RNA structure and translation. Recently, various experimental techniques and computational methods have been developed to characterize the transcriptome-wide landscapes of m6A modification for understanding its underlying mechanisms and functions in mRNA regulation. However, the experimental techniques are generally costly and time-consuming, while the existing computational models are usually designed only for m6A site prediction in a single species and have significant limitations in accuracy, interpretability and generalizability. Here, we propose a highly interpretable computational framework, called MASS, based on a multi-task curriculum learning strategy to capture m6A features across multiple species simultaneously. Extensive computational experiments demonstrate the superior performance of MASS when compared to the state-of-the-art prediction methods. Furthermore, the contextual sequence features of m6A captured by MASS can be explained by the known critical binding motifs of the related RNA-binding proteins, which also help elucidate the similarities and differences among m6A features across species. In addition, based on the predicted m6A profiles, we further delineate the relationships between m6A and various properties of gene regulation, including gene expression, RNA stability, translation, RNA structure and histone modification. In summary, MASS may serve as a useful tool for characterizing m6A modification and studying its regulatory code. The source code of MASS can be downloaded from https://github.com/mlcb-thu/MASS.


Subject(s)
Adenosine/analogs & derivatives , Machine Learning , RNA/chemistry , Adenosine/chemistry , Animals , Databases, Genetic , Datasets as Topic , Gene Expression Regulation , Humans , RNA-Binding Proteins , Sequence Analysis, RNA , Software , Transcriptome
13.
Front Pharmacol ; 11: 112, 2020.
Article in English | MEDLINE | ID: mdl-32184722

ABSTRACT

Synthetic lethality (SL), an important type of genetic interaction, can provide useful insight into the target identification process for the development of anticancer therapeutics. Although several well-established SL gene pairs have been verified to be conserved in humans, most SL interactions remain cell-line specific. Here, we demonstrated that the cell-line-specific gene expression profiles derived from the shRNA perturbation experiments performed in the LINCS L1000 project can provide useful features for predicting SL interactions in humans. In this paper, we developed a semi-supervised neural network-based method called EXP2SL to accurately identify SL interactions from the L1000 gene expression profiles. Through a systematic evaluation on the SL datasets of three different cell lines, we demonstrated that our model achieved better performance than the baseline methods and verified the effectiveness of using the L1000 gene expression features and the semi-supervised training technique in SL prediction.
