2.
J Cheminform ; 15(1): 20, 2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36774523

ABSTRACT

Artificial intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects, leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the opportunity cost of the time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine-tune it to achieve an uncertainty-calibrated global yield prediction model. This model improves upon the state of the art not just through the increase in pre-training data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information, such as equivalents and molecule role, into the reaction encoding. The model is called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art synthesis-focused BERT, showing a near 20-point improvement in r2 score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that applying the model can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an ongoing drug discovery project and showing that it can also be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.

3.
J Chem Inf Model ; 62(9): 2111-2120, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35034452

ABSTRACT

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
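
The top-k exact match accuracy mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the mhn-react code; the function name and the toy template labels are hypothetical, and it only assumes that each query comes with a ranked list of predictions and a single correct answer.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of queries whose correct answer is among the first k ranked predictions."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one target per ranked prediction list")
    if not targets:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, targets)
        if truth in predictions[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: four queries, three ranked candidates each.
ranked = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"], ["J", "K", "L"]]
truth = ["A", "F", "X", "K"]

print(top_k_accuracy(ranked, truth, k=1))  # only the first query is correct at rank 1
print(top_k_accuracy(ranked, truth, k=3))  # three of the four queries are hit within rank 3
```

Larger k can only keep or raise the score, which is why results are usually reported for several values of k.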

4.
Sci Rep ; 10(1): 13262, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32764586

ABSTRACT

Phenomic profiles are high-dimensional sets of readouts that can comprehensively capture the biological impact of chemical and genetic perturbations in cellular assay systems. Phenomic profiling of compound libraries can be used for compound target identification or mechanism of action (MoA) prediction and other applications in drug discovery. To devise an economical set of phenomic profiling assays, we assembled a library of 1,008 approved drugs and well-characterized tool compounds manually annotated to 218 unique MoAs, and we profiled each compound at four concentrations in live-cell, high-content imaging screens against a panel of 15 reporter cell lines, which expressed a diverse set of fluorescent organelle and pathway markers in three distinct cell lineages. For 41 of 83 testable MoAs, phenomic profiles accurately ranked the reference compounds (AUC-ROC ≥ 0.9). MoAs could be better resolved by screening compounds at multiple concentrations than by including replicates at a single concentration. Screening additional cell lineages and fluorescent markers increased the number of distinguishable MoAs but this effect quickly plateaued. There remains a substantial number of MoAs that were hard to distinguish from others under the current study's conditions. We discuss ways to close this gap, which will inform the design of future phenomic profiling efforts.
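
For readers unfamiliar with the AUC-ROC threshold quoted in this abstract, the following is a minimal sketch of the metric in its pairwise-ranking form. It is an illustration only, not the authors' analysis pipeline; the function name and the toy scores are hypothetical.

```python
def auc_roc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count as half)."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for five compounds; label 1 marks compounds
# that share the reference mechanism of action.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(auc_roc(scores, labels))  # 5 of the 6 positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a cutoff of 0.9 selects MoAs whose reference compounds are ranked almost perfectly.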


Subject(s)
Biological Products/pharmacology , Luminescent Proteins/genetics , Phenomics/methods , Small Molecule Libraries/pharmacology , A549 Cells , Cell Line , Drug Discovery , Gene Expression Regulation/drug effects , Hep G2 Cells , Humans , Luminescent Proteins/metabolism
5.
Nat Genet ; 51(7): 1082-1091, 2019 07.
Article in English | MEDLINE | ID: mdl-31253980

ABSTRACT

Most candidate drugs currently fail later-stage clinical trials, largely due to poor prediction of efficacy on early target selection [1]. Drug targets with genetic support are more likely to be therapeutically valid [2,3], but the translational use of genome-scale data such as from genome-wide association studies for drug target discovery in complex diseases remains challenging [4-6]. Here, we show that integration of functional genomic and immune-related annotations, together with knowledge of network connectivity, maximizes the informativeness of genetics for target validation, defining the target prioritization landscape for 30 immune traits at the gene and pathway level. We demonstrate how our genetics-led drug target prioritization approach (the priority index) successfully identifies current therapeutics, predicts activity in high-throughput cellular screens (including L1000, CRISPR, mutagenesis and patient-derived cell assays), enables prioritization of under-explored targets and allows for determination of target-level trait relationships. The priority index is an open-access, scalable system accelerating early-stage drug target selection for immune-mediated disease.


Subject(s)
Arthritis, Rheumatoid/genetics , Drug Discovery , Gene Regulatory Networks , Genome, Human , Immunity, Innate/genetics , Quantitative Trait Loci , Selection, Genetic , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/immunology , Gene Expression Regulation , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide
6.
Drug Discov Today Technol ; 32-33: 55-63, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386095

ABSTRACT

There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.


Subject(s)
Drug Discovery , Models, Molecular , Humans
7.
Chem Sci ; 9(24): 5441-5451, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-30155234

ABSTRACT

Deep learning is currently the most successful machine learning technique in a wide range of application areas and has recently been applied successfully in drug discovery research to predict potential drug targets and to screen for active molecules. However, due to (1) the lack of large-scale studies, (2) the compound series bias that is characteristic of drug discovery datasets and (3) the hyperparameter selection bias that comes with the high number of potential deep learning architectures, it remains unclear whether deep learning can indeed outperform existing computational methods in drug discovery tasks. We therefore assessed the performance of several deep learning methods on a large-scale drug discovery dataset and compared the results with those of other machine learning and target prediction methods. To avoid potential biases from hyperparameter selection or compound series, we used a nested cluster-cross-validation strategy. We found (1) that deep learning methods significantly outperform all competing methods and (2) that the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays).
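
The nested cluster-cross-validation strategy named in this abstract can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: the abstract does not state the clustering method or the number of folds, so the cluster labels, the fold count, and both function names are hypothetical. The key property shown is that whole clusters (compound series) are kept together, and that hyperparameters would be chosen on a validation fold that never overlaps the outer test fold.

```python
from collections import defaultdict


def cluster_folds(cluster_ids, n_folds):
    """Assign whole clusters to folds, largest clusters first, to balance fold sizes."""
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)
    folds = [[] for _ in range(n_folds)]
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cluster])
    return folds


def nested_splits(cluster_ids, n_folds):
    """Yield (train, validation, test) index lists for every outer/inner fold pairing.

    Needs n_folds >= 3 so that the training part is never empty.
    """
    folds = cluster_folds(cluster_ids, n_folds)
    for outer in range(n_folds):
        test = folds[outer]  # held out for the final performance estimate
        remaining = [f for f in range(n_folds) if f != outer]
        for inner in remaining:
            validation = folds[inner]  # used only for hyperparameter selection
            train = [i for f in remaining if f != inner for i in folds[f]]
            yield train, validation, test


# Hypothetical compound-series labels for seven compounds.
series = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
for train, validation, test in nested_splits(series, n_folds=3):
    print(train, validation, test)
```

Because no series is split across the three parts, a model cannot score well simply by recognising close analogues of its training compounds.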

8.
Cell Chem Biol ; 25(5): 611-618.e3, 2018 05 17.
Article in English | MEDLINE | ID: mdl-29503208

ABSTRACT

In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.


Subject(s)
Drug Repositioning/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Antineoplastic Agents/pharmacology , Cell Line, Tumor , High-Throughput Screening Assays/methods , Humans , Neoplasms/drug therapy
10.
J Cheminform ; 9: 17, 2017.
Article in English | MEDLINE | ID: mdl-28316655

ABSTRACT

Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general.

11.
J Med Chem ; 58(9): 4029-38, 2015 May 14.
Article in English | MEDLINE | ID: mdl-25897791

ABSTRACT

A series of darunavir analogues featuring a substituted bis-THF ring as P2 ligand have been synthesized and evaluated. Very high affinity protease inhibitors (PIs) with an interesting activity on wild-type HIV and a panel of multi-PI resistant HIV-1 mutants containing clinically observed, primary mutations were identified using a cell-based assay. Crystal structure analysis was conducted on a number of PI analogues in complex with HIV-1 protease.


Subject(s)
Acetamides/chemistry , Furans/chemistry , HIV Protease Inhibitors/chemistry , HIV-1/drug effects , Sulfonamides/chemistry , Acetamides/chemical synthesis , Acetamides/pharmacology , Crystallography, X-Ray , Darunavir , Drug Resistance, Viral , Furans/chemical synthesis , Furans/pharmacology , HIV Protease Inhibitors/chemical synthesis , HIV Protease Inhibitors/pharmacology , HIV-1/enzymology , HIV-1/genetics , Ligands , Models, Molecular , Molecular Conformation , Mutation , Stereoisomerism , Structure-Activity Relationship , Sulfonamides/chemical synthesis , Sulfonamides/pharmacology
12.
J Cheminform ; 5(1): 41, 2013 Sep 23.
Article in English | MEDLINE | ID: mdl-24059694

ABSTRACT

BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). RESULTS: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-Scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components capture less variation per component of the original input data. CONCLUSION: In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.

13.
J Cheminform ; 5(1): 42, 2013 Sep 24.
Article in English | MEDLINE | ID: mdl-24059743

ABSTRACT

BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, a novel protein descriptor set (termed ProtFP (4 variants)), and in addition we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. RESULTS: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-Scales (3) combined with an average Z-Scale value for each target, while ProtFP (PCA8), ST-Scales, and ProtFP (Feature) rank last. CONCLUSIONS: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still - on average - surprisingly similar. Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, thereby underlining that choosing an appropriate descriptor set is fundamental for bioactivity modeling, from both the ligand and the protein side.
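
The two metrics this abstract uses to compare descriptor sets, RMSE (for predicted activities in log units) and MCC (for active/inactive classification), can be written down in a few lines. This is a minimal sketch with hypothetical function names and toy values, not the benchmark code used in the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (truthy = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical toy values: activities in log units and active/inactive calls.
print(rmse([6.0, 7.0], [6.0, 8.0]))
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC ranges from -1 to 1, so the reported differences of 0.01 between descriptor sets are small compared with the differences of more than 0.7 seen between individual proteins.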

14.
PLoS Comput Biol ; 9(2): e1002899, 2013.
Article in English | MEDLINE | ID: mdl-23436985

ABSTRACT

Infection with HIV cannot currently be cured; however, it can be controlled by combination treatment with multiple anti-retroviral drugs. Given different viral genotypes for virtually each individual patient, the question now arises which drug combination to use to achieve effective treatment. With the availability of viral genotypic data and clinical phenotypic data, it has become possible to create computational models able to predict an optimal treatment regimen for an individual patient. Current models are based only on sequence data derived from viral genotyping; chemical similarity of drugs is not considered. To explore the added value of chemical similarity inclusion we applied proteochemometric models, combining chemical and protein target properties in a single bioactivity model. Our dataset was a large-scale clinical database of genotypic and phenotypic information (in total ca. 300,000 drug-mutant bioactivity data points, 4 (NNRTI), 8 (NRTI) or 9 (PI) drugs, and 10,700 (NNRTI), 10,500 (NRTI) or 27,000 (PI) mutants). Our models achieved a prediction error below 0.5 Log Fold Change. Moreover, when directly compared with previously published models derived from sequence data, PCM performed better in resistance classification and prediction of Log Fold Change (0.76 log units versus 0.91). Furthermore, we were able to confirm known, and identify previously unpublished, resistance-conferring mutations of HIV Reverse Transcriptase (e.g. K102Y, T216M) and HIV Protease (e.g. Q18N, N88G) from our dataset. Finally, we applied our models prospectively to the public HIV resistance database from Stanford University, obtaining a correct resistance prediction rate of 84% on the full set (compared to 80% in previous work on a high quality subset). We conclude that proteochemometric models are able to accurately predict the phenotypic resistance based on genotypic data even for novel mutants and mixtures. Furthermore, we add an applicability domain to the prediction, informing the user about the reliability of predictions.


Subject(s)
Anti-HIV Agents/chemistry , Anti-HIV Agents/pharmacology , Computational Biology/methods , Drug Discovery/methods , HIV/drug effects , Models, Biological , Artificial Intelligence , Databases, Genetic , HIV/genetics , Mutation , Phenotype , Reproducibility of Results
15.
Bioorg Med Chem Lett ; 23(1): 310-7, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23177258

ABSTRACT

The design and synthesis of novel HIV-1 protease inhibitors (PIs) (1-22), which display high potency against HIV-1 wild-type and multi-PI-resistant HIV-mutant clinical isolates, is described. Lead optimization was initiated from compound 1, a Phe-Phe hydroxyethylene peptidomimetic PI, and was directed towards the discovery of new PIs suitable for a long-acting (LA) injectable drug application. Introducing a heterocyclic 6-methoxy-3-pyridinyl or a 6-(dimethylamino)-3-pyridinyl moiety (R(3)) at the para-position of the P1' benzyl fragment generated compounds with antiviral potency in the low single digit nanomolar range. Halogenation or alkylation of the metabolic hot spots on the various aromatic rings resulted in PIs with high stability against degradation in human liver microsomes and low plasma clearance in rats. Replacing the chromanolamine moiety (R(1)) in the P2 protease binding site by a cyclopentanolamine or a cyclohexanolamine derivative provided a series of high clearance PIs (16-22) with EC(50)s on wild-type HIV-1 in the range of 0.8-1.8 nM. PIs 18 and 22, formulated as nanosuspensions, showed gradual but sustained and complete release from the injection site over two months in rats, and were therefore identified as interesting candidates for an LA injectable drug application for treating HIV/AIDS.


Subject(s)
Carbamates/chemical synthesis , Dipeptides/chemical synthesis , Drug Design , HIV Protease Inhibitors/chemical synthesis , HIV Protease/chemistry , HIV-1/enzymology , Pyridines/chemical synthesis , Alkylation , Animals , Carbamates/chemistry , Carbamates/pharmacokinetics , Dipeptides/chemistry , Dipeptides/pharmacokinetics , HIV Protease/metabolism , HIV Protease Inhibitors/chemistry , HIV Protease Inhibitors/pharmacokinetics , Half-Life , Halogenation , Humans , Microsomes, Liver/metabolism , Pyridines/chemistry , Pyridines/pharmacokinetics , Rats , Structure-Activity Relationship
16.
J Med Chem ; 55(16): 7010-20, 2012 Aug 23.
Article in English | MEDLINE | ID: mdl-22827545

ABSTRACT

The four subtypes of adenosine receptors form relevant drug targets in the treatment of, e.g., diabetes and Parkinson's disease. In the present study, we aimed at finding novel small molecule ligands for these receptors using virtual screening approaches based on proteochemometric (PCM) modeling. We combined bioactivity data from all human and rat receptors in order to widen available chemical space. After training and validating a proteochemometric model on this combined data set (Q(2) of 0.73, RMSE of 0.61), we virtually screened a vendor database of 100910 compounds. Of 54 compounds purchased, six novel high affinity adenosine receptor ligands were confirmed experimentally, one of which displayed an affinity of 7 nM on the human adenosine A(1) receptor. We conclude that the combination of rat and human data performs better than human data only. Furthermore, we conclude that proteochemometric modeling is an efficient method to quickly screen for novel bioactive compounds.


Subject(s)
Databases, Chemical , Models, Molecular , Receptors, Purinergic P1/chemistry , Animals , Artificial Intelligence , Binding Sites , CHO Cells , Computer Simulation , Cricetinae , Cricetulus , Humans , Ligands , Radioligand Assay , Rats , Receptor, Adenosine A1/chemistry , Receptor, Adenosine A1/metabolism , Receptor, Adenosine A2A/chemistry , Receptor, Adenosine A2A/metabolism , Receptor, Adenosine A2B/chemistry , Receptor, Adenosine A2B/metabolism , Receptor, Adenosine A3/chemistry , Receptor, Adenosine A3/metabolism , Receptors, Purinergic P1/metabolism , Structure-Activity Relationship
17.
PLoS One ; 6(11): e27518, 2011.
Article in English | MEDLINE | ID: mdl-22132107

ABSTRACT

In quite a few diseases, drug resistance due to target variability poses a serious problem in pharmacotherapy. This is certainly true for HIV, and hence, it is often unknown which drug is best to use or to develop against an individual HIV strain. In this work we applied 'proteochemometric' modeling of HIV Non-Nucleoside Reverse Transcriptase (NNRTI) inhibitors to support preclinical development by predicting compound performance on multiple mutants in the lead selection stage. Proteochemometric models are based on both small molecule and target properties and can thus capture multi-target activity relationships simultaneously, the targets in this case being a set of 14 HIV Reverse Transcriptase (RT) mutants. We validated our model by experimentally confirming model predictions for 317 untested compound-mutant pairs, with a prediction error comparable with assay variability (RMSE 0.62). Furthermore, dependent on the similarity of a new mutant to the training set, we could predict with high accuracy which compound will be most effective on a sequence with a previously unknown genotype. Hence, our models allow the evaluation of compound performance on untested sequences and the selection of the most promising leads for further preclinical research. The modeling concept is likely to be applicable also to other target families with genetic variability like other viruses or bacteria, or with similar orthologs like GPCRs.


Subject(s)
Drug Evaluation, Preclinical/methods , Models, Molecular , Proteomics/methods , Reverse Transcriptase Inhibitors/analysis , Reverse Transcriptase Inhibitors/chemistry , Amino Acid Sequence , Binding Sites , Databases as Topic , HIV Reverse Transcriptase/antagonists & inhibitors , HIV Reverse Transcriptase/chemistry , Humans , Ligands , Molecular Sequence Data , Mutation/genetics , Reproducibility of Results , Reverse Transcriptase Inhibitors/pharmacology
18.
ACS Med Chem Lett ; 2(6): 461-5, 2011 Jun 09.
Article in English | MEDLINE | ID: mdl-24900331

ABSTRACT

A series of darunavir analogues featuring a substituted bis-THF ring as P2 ligand have been synthesized and evaluated. High affinity protease inhibitors (PIs) with an interesting activity on wild-type HIV and a panel of multi-PI resistant HIV-1 mutants containing clinically observed, primary mutations were identified using a cell-based assay. A number of PIs have been synthesized that show equivalent and greater activity for HIV-1 mutant strains as compared to wild-type HIV-1. The activity on the purified enzyme was confirmed for a selection of analogues.

19.
Protein Sci ; 19(4): 742-52, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20120021

ABSTRACT

In this work, we describe two novel approaches to utilize the dynamic structure information implicitly contained in large crystal structure data sets. The first approach visualizes both consistent as well as variable ligand-induced changes in ligand-bound compared with apo protein crystal structures. For this purpose, information was mined from B-factors and ligand-induced residue displacements in multiple crystal structures, minimizing experimental error and noise. With this approach, the mechanism of action of non-nucleoside reverse transcriptase inhibitors (NNRTIs) as an inseparable combination of distortion of protein dynamics and conformational changes of HIV-1 reverse transcriptase was corroborated (a combination of the previously proposed "molecular arthritis" and "distorted site" mechanisms). The second approach presented here uses "consensus structures" to map common binding features that are present in a set of structures of NNRTI-bound HIV-1 reverse transcriptase. Consensus structures are based on different levels of structural overlap of multiple crystal structures and are used to analyze protein-ligand interactions. The structures are shown to yield information about conserved hydrogen bonding interactions as well as binding-pocket flexibility, shape, and volume. From the consensus structures, a common wild type NNRTI binding pocket emerges. Furthermore, we were able to identify a conserved backbone hydrogen bond acceptor at P236 and a novel hydrophobic subpocket, which are not yet utilized by current drugs. Our methods introduced here reinterpret the atom information and make use of the data variability by using multiple structures, complementing classical 3D structural information of single structures.


Subject(s)
Crystallography, X-Ray , Data Mining/methods , HIV Reverse Transcriptase/chemistry , Proteins/chemistry , Reverse Transcriptase Inhibitors/chemistry , Binding Sites , Databases, Protein , Hydrogen Bonding , Ligands , Models, Molecular , Structure-Activity Relationship
20.
J Chem Inf Model ; 47(4): 1279-93, 2007.
Article in English | MEDLINE | ID: mdl-17511441

ABSTRACT

Chemoinformatics is a large scientific discipline that deals with the storage, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. Chemoinformatics techniques are used extensively in drug discovery and development. Although many consider it a mature field, the advent of high-throughput experimental techniques and the need to analyze very large data sets have brought new life and challenges to it. Here, we review a selection of papers published in 2006 that caught our attention with regard to the novelty of the methodology that was presented. The field is seeing significant growth, which will be further catalyzed by the widespread availability of public databases to support the development and validation of new approaches.


Subject(s)
Informatics , Combinatorial Chemistry Techniques , Drug Industry , Genomics , Quantitative Structure-Activity Relationship