Results 1 - 20 of 39
1.
J Comput Aided Mol Des ; 38(1): 7, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38294570

ABSTRACT

An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. The prediction of the systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows drug or agrochemical development to focus on compounds with a favorable kinetic profile. However, such predictions are challenging, as the availability is the result of the complex interplay between molecular properties, biology, and physiology, and training data are scarce. In this work we improve the hybrid model developed earlier (Schneckener in J Chem Inf Model 59:4893-4905, 2019). We reduce the median fold change error for the total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set and by improving the neural network architecture as well as the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, like sex and dosage form. In contrast to a pure machine learning model, our model is able to predict new endpoints on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24 h, while the model has only been trained on the total exposure.
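
The abstract reports its results as a median fold change error but does not define the metric. The sketch below assumes the common symmetric definition, max(predicted/observed, observed/predicted), which may differ in detail from the authors' implementation. The function names and the exposure values are hypothetical and serve only as an illustration.

```python
import math
from statistics import median


def fold_change_error(predicted: float, observed: float) -> float:
    """Symmetric fold change error: max(pred/obs, obs/pred), always >= 1."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("exposures must be positive")
    # 10 ** |log10(ratio)| is the same as max(ratio, 1 / ratio)
    return 10 ** abs(math.log10(predicted / observed))


def median_fold_change_error(predicted, observed):
    """Median of the per-compound fold change errors."""
    return median(fold_change_error(p, o) for p, o in zip(predicted, observed))


# Hypothetical exposure values in arbitrary units (not data from the paper)
observed = [100.0, 50.0, 10.0, 400.0]
predicted = [200.0, 25.0, 10.0, 100.0]

print(median_fold_change_error(predicted, observed))
```

Under this definition, over- and under-prediction by the same factor are penalized equally, and a value of 1 corresponds to a perfect prediction.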


Subject(s)
Machine Learning , Neural Networks, Computer , Animals , Rats , Kinetics
2.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subject(s)
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
3.
J Comput Aided Mol Des ; 37(12): 765-789, 2023 12.
Article in English | MEDLINE | ID: mdl-37878216

ABSTRACT

In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T̄ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.


Subject(s)
Models, Chemical , Water , Solubility , Water/chemistry , Machine Learning , Solvents/chemistry
4.
J Comput Aided Mol Des ; 37(3): 129-145, 2023 03.
Article in English | MEDLINE | ID: mdl-36797399

ABSTRACT

Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and finally the bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of stomach and intestinal tract for animals and humans, and phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model with three solubility tasks using the pH-class combined data from different assays and five helper tasks results in root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling of compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median RMSE per molecule of 0.62 and 0.56 log units.
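
The two metrics quoted per pH class, root mean square error and Spearman rank correlation, can be computed as in the following minimal sketch. It uses only the Python standard library; the helper names and the small data set are hypothetical and are not taken from the paper.

```python
import math


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(pred, obs):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(obs))


# Hypothetical records: (pH class, observed log S, predicted log S)
records = [
    ("acidic", -3.0, -3.5), ("acidic", -4.0, -4.5), ("acidic", -5.0, -4.5),
    ("neutral", -2.0, -2.0), ("neutral", -4.0, -3.0), ("neutral", -6.0, -5.0),
]

for ph_class in ("acidic", "neutral"):
    obs = [o for c, o, _ in records if c == ph_class]
    pred = [p for c, _, p in records if c == ph_class]
    print(ph_class, round(rmse(pred, obs), 2), round(spearman(pred, obs), 2))
```

RMSE measures the absolute error in log units, while the Spearman coefficient only measures whether compounds are ranked correctly, which is why the abstract reports both.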


Subject(s)
Neural Networks, Computer , Water , Humans , Animals , Solubility , Water/chemistry , Machine Learning , Hydrogen-Ion Concentration , Pharmaceutical Preparations
5.
ACS Omega ; 8(6): 5901-5916, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36816707

ABSTRACT

Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce resource-demanding synthesis and testing of low-permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel-Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (κ ≥ 0.57). Models retrained by combining the original training and the blinded test set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
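
The value in parentheses after the accuracy is read here as Cohen's kappa, which corrects the observed agreement for the agreement expected by chance and is therefore more informative than accuracy on imbalanced sets. The following is a minimal sketch under that assumption; the function name and the labels are hypothetical and not data from the study.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected).

    Undefined (division by zero) when both label lists contain a single,
    identical class, because chance agreement is then already 1.
    """
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # fraction of samples where prediction and truth agree
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # agreement expected if labels were assigned independently
    p_expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels: 1 = permeable, 0 = low permeability
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(cohen_kappa(y_true, y_pred), 2))
```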

6.
ACS Omega ; 7(49): 45617-45623, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36530278

ABSTRACT

We present a quantum chemistry (QM)-based method that computes the relative energies of intermediates in the Heck reaction that relate to the regioselective reaction outcome: branched (α), linear (β), or a mix of the two. The calculations are done for two different reaction pathways (neutral and cationic) and are based on r2SCAN-3c single-point calculations on GFN2-xTB geometries that, in turn, derive from a GFNFF-xTB conformational search. The method is completely automated and is sufficiently efficient to allow for the calculation of thousands of reaction outcomes. The method can mostly reproduce systematic experimental studies where the ratios of regioisomers are carefully determined. For a larger dataset extracted from Reaxys, the results are somewhat worse with accuracies of 63% for β-selectivity using the neutral pathway and 29% for α-selectivity using the cationic pathway. Our analysis of the dataset suggests that only the major or desired regioisomer is reported in the literature in many cases, which makes accurate comparisons difficult. The code is freely available on GitHub under the MIT open-source license: https://github.com/jensengroup/HeckQM.

7.
J Comput Aided Mol Des ; 36(11): 805-824, 2022 11.
Article in English | MEDLINE | ID: mdl-36319876

ABSTRACT

Accurate calculation of relative tautomer energies in different environments is a prerequisite to many parameters of relevance in drug discovery. This work provides a thorough benchmark of the semiempirical methods AM1, PM3 and GFN2-xTB, the force-field OPLS4, Hartree-Fock and HF-3c, the density functionals PBEh-3c, B97-3c, r2SCAN-3c, PBE, PBE0, TPSS, r2SCAN, ω-B97X-V, M06-2X, B3LYP, B2PLYP, and second-order perturbation theory MP2 versus the gold-standard coupled-cluster DLPNO-CCSD(T) using the def2-QZVPP basis set. The best-performing method overall is M06-2X, whereas r2SCAN-3c is the best-performing one in the set of cost-optimized methods. Application of the two methods on a challenging subset from the SAMPL2 challenge provides evidence that deviations from experiment are caused by deficiencies of current continuum solvation methods.


Subject(s)
Drug Discovery , Isomerism
8.
Methods Mol Biol ; 2390: 61-101, 2022.
Article in English | MEDLINE | ID: mdl-34731464

ABSTRACT

The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars to build useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.


Subject(s)
Machine Learning , Algorithms , Drug Discovery , Quantitative Structure-Activity Relationship
9.
J Cheminform ; 13(1): 55, 2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34325738

ABSTRACT

In this study we compare three algorithms for the generation of conformer ensembles, namely Biovia BEST, Schrödinger Prime macrocycle sampling (PMM), and Conformator (CONF) from the University of Hamburg, with ensembles derived from exhaustive molecular dynamics simulations applied to a dataset of 7 small macrocycles in two charge states and three solvents. Ensemble completeness is a prerequisite to allow for the selection of relevant diverse conformers for many applications in computational chemistry. We apply conformation maps using principal component analysis based on ring torsions. Our major finding, critical for all applications of conformer ensembles in any computational study, is that maps derived from MD with explicit solvent are significantly distinct between macrocycles, charge states and solvents, whereas the maps for post-optimized conformers using implicit solvent models from all generator algorithms are very similar independent of the solvent. We apply three metrics for the quantification of the relative covered ensemble space, namely cluster overlap, variance statistics, and a novel metric, Mahalanobis distance, showing that post-optimized MD ensembles cover a significantly larger conformational space than the generator ensembles, with the ranking PMM > BEST >> CONF. Furthermore, we find that the distributions of 3D polar surface areas are very similar for all macrocycles independent of charge state and solvent, except for the smaller and more strained compound 7, and that there is also no obvious correlation between 3D PSA and intramolecular hydrogen bond count distributions.

10.
J Cheminform ; 13(1): 10, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579374

ABSTRACT

We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made: The open source semiempirical tight binding program xtb is used instead of the closed source MOPAC program. Any low energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7 to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3-85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.

12.
J Comput Aided Mol Des ; 35(4): 505-516, 2021 04.
Article in English | MEDLINE | ID: mdl-33094408

ABSTRACT

Selective progesterone receptor modulators are promising therapeutic options for the treatment of uterine fibroids. Vilaprisan, a new chemical entity that was discovered at Bayer, is currently in clinical development. In this study we present a combined experimental and quantum chemical approach that provided the data supporting hydroxyestradienone as an acceptable starting material for drug substance synthesis. Hydroxyestradienone has four stereogenic centers leading to 8 diastereomers and 16 enantiomers, of which six diastereomers were synthetically accessible and two were not. A computational multistep protocol resulting in density functional B2PLYP-D3(BJ)/def2-TZVPP Gibbs free energies and SMD solvation free energies led to a clear separation between the existing and the synthetically not accessible enantiomers, whereas multiple geometry-based and cheminformatic descriptors were not able to explain experimental findings.
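
The stereoisomer counts in the abstract follow from simple combinatorics: n independent stereogenic centers give 2^n stereoisomers, which group into 2^(n-1) enantiomeric pairs. The sketch below enumerates them for n = 4, assuming that no symmetry (meso forms) reduces the count; the R/S labels are abstract placeholders, not the actual configurations of hydroxyestradienone.

```python
from itertools import product


def stereoisomers(n_centers):
    """All R/S assignments for n independent stereogenic centers."""
    return list(product("RS", repeat=n_centers))


def mirror(config):
    """Enantiomer of a configuration: every center is inverted."""
    return tuple("S" if c == "R" else "R" for c in config)


isomers = stereoisomers(4)
# Each enantiomeric pair {isomer, mirror image} is one diastereomeric form
pairs = {frozenset((iso, mirror(iso))) for iso in isomers}

print(len(isomers), len(pairs))
```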


Subject(s)
Estrenes/chemistry , Steroids/chemistry , Estrenes/chemical synthesis , Models, Molecular , Quantum Theory , Stereoisomerism , Steroids/chemical synthesis , Thermodynamics
13.
Drug Discov Today ; 25(9): 1702-1709, 2020 09.
Article in English | MEDLINE | ID: mdl-32652309

ABSTRACT

Over the past two decades, an in silico absorption, distribution, metabolism, and excretion (ADMET) platform has been created at Bayer Pharma with the goal to generate models for a variety of pharmacokinetic and physicochemical endpoints in early drug discovery. These tools are accessible to all scientists within the company and can be useful in assisting with the selection and design of novel leads, as well as the process of lead optimization. Here, we discuss the development of machine-learning (ML) approaches with special emphasis on data, descriptors, and algorithms. We show that high company internal data quality and tailored descriptors, as well as a thorough understanding of the experimental endpoints, are essential to the utility of our models. We discuss the recent impact of deep neural networks and show selected application examples.


Subject(s)
Machine Learning , Pharmacokinetics , Animals , Computer Simulation , Humans , Intestinal Absorption , Models, Theoretical , Pharmaceutical Preparations/metabolism
14.
J Med Chem ; 63(13): 6774-6783, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32453569

ABSTRACT

We herein report the first thorough analysis of the structure-permeability relationship of semipeptidic macrocycles. In total, 47 macrocycles were synthesized using a hybrid solid-phase/solution strategy, and then their passive and cellular permeability was assessed using the parallel artificial membrane permeability assay (PAMPA) and Caco-2 assay, respectively. The results indicate that semipeptidic macrocycles generally possess high passive permeability based on the PAMPA, yet their cellular permeability is governed by efflux, as reported in the Caco-2 assay. Structural variations led to tractable structure-permeability and structure-efflux relationships, wherein the linker length, stereoinversion, N-methylation, and peptoids site-specifically impact the permeability and efflux. Extensive nuclear magnetic resonance, molecular dynamics, and ensemble-based three-dimensional polar surface area (3D-PSA) studies showed that ensemble-based 3D-PSA is a good predictor of passive permeability.


Subject(s)
Macrocyclic Compounds/chemistry , Macrocyclic Compounds/metabolism , Peptides/chemistry , Caco-2 Cells , Humans , Membranes, Artificial , Permeability
15.
J Phys Chem B ; 124(18): 3636-3646, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275425

ABSTRACT

Special-purpose classical force fields (FFs) provide good accuracy at very low computational cost, but their application is limited to systems for which potential energy functions are available. This excludes most metal-containing proteins or those containing cofactors. In contrast, the GFN2-xTB semiempirical quantum chemical method is parametrized for almost the entire periodic table. The accuracy of GFN2-xTB is assessed for protein structures with respect to experimental X-ray data. Furthermore, the results are compared with those of two special-purpose FFs, HF-3c, PM6-D3H4X, and PM7. The test sets include proteins without any prosthetic groups as well as metalloproteins. Crystal packing effects are examined for a set of smaller proteins to validate the molecular approach. For the proteins without prosthetic groups, the special-purpose FF OPLS-2005 yields the smallest overall RMSD to the X-ray data, but GFN2-xTB provides similarly good structures with even better bond-length distributions. For the metalloproteins with up to 5000 atoms, a good overall structural agreement is obtained with GFN2-xTB. The full geometry optimizations of protein structures with on average 1000 atoms in wall-times below 1 day establish the GFN2-xTB method as a versatile tool for the computational treatment of various biomolecules with a good accuracy/computational cost ratio.


Subject(s)
Metalloproteins , Peptides
16.
J Chem Inf Model ; 59(11): 4893-4905, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31714067

ABSTRACT

Oral administration of drug products is a strict requirement in many medical indications. Therefore, bioavailability prediction models are of high importance for prioritization of compound candidates in the drug discovery process. However, oral exposure and bioavailability are difficult to predict, as they are the result of various highly complex factors and/or processes influenced by the physicochemical properties of a compound, such as solubility, lipophilicity, or charge state, as well as by interactions with the organism, for instance, metabolism or membrane permeation. In this study, we assess whether it is possible to predict intravenous (iv) or oral drug exposure and oral bioavailability in rats. As input parameters, we use (i) six experimentally determined in vitro and physicochemical endpoints, namely, membrane permeation, free fraction, metabolic stability, solubility, pKa value, and lipophilicity; (ii) the outputs of six in silico absorption, distribution, metabolism, and excretion models trained on the same endpoints, or (iii) the chemical structure encoded as fingerprints or simplified molecular input line entry system strings. The underlying data set for the models is an unprecedented collection of almost 1900 data points with high-quality in vivo experiments performed in rats. We find that drug exposure after iv administration can be predicted similarly well using hybrid models with in vitro- or in silico-predicted endpoints as inputs, with fold change errors (FCE) of 2.28 and 2.08, respectively. The FCEs for exposure after oral administration are higher, and here, the prediction from in vitro inputs performs significantly better in comparison to in silico-based models with FCEs of 3.49 and 2.40, respectively, most probably reflecting the higher complexity of oral bioavailability. Simplifying the prediction task to a binary alert for low oral bioavailability, based only on chemical structure, we achieve accuracy and precision close to 70%.


Subject(s)
Drug Discovery/methods , Hepatocytes/metabolism , Pharmaceutical Preparations/metabolism , Administration, Oral , Animals , Biological Availability , Caco-2 Cells , Computer Simulation , Humans , Machine Learning , Male , Models, Biological , Permeability , Pharmaceutical Preparations/chemistry , Rats , Rats, Wistar , Serum Albumin/metabolism , Solubility
17.
J Chem Inf Model ; 59(2): 668-672, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30694664

ABSTRACT

Pharmaceutical products are often synthesized by the use of reactive starting materials and intermediates. These can, either as impurities or through metabolic activation, bind to the DNA. Primary aromatic amines belong to the critical classes that are considered potentially mutagenic in the Ames test, so there is a great need for good prediction models for risk assessment. How primary aromatic amines exert their mutagenic potential can be rationalized by the widely accepted nitrenium ion hypothesis of covalent binding to the DNA of reactive electrophiles formed out of the aromatic amines. Since the reactive chemical species is different in chemical structure from the actual compound, it is difficult to achieve good predictions via classical descriptor or fingerprint-based machine learning. In this approach, we use a combination of different molecular and atomic descriptors that is able to describe different mechanistic aspects of the metabolic transformation leading from the primary aromatic amine to the reactive metabolite that binds to the DNA. Applied to a test set, the combination shows significantly better performance than models that only use one of these descriptors and complemented the general internal Ames mutagenicity prediction model at Bayer.


Subject(s)
Amines/chemistry , Amines/toxicity , Cheminformatics/methods , Mutagenicity Tests , Mutagens/chemistry , Mutagens/toxicity , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship
18.
Drug Discov Today Technol ; 32-33: 37-43, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386093

ABSTRACT

This review provides an overview of descriptions of atoms applied to the understanding of phenomena like chemical reactivity and selectivity, pKa values, Site of Metabolism prediction, or hydrogen bond strengths, but also the substitution of quantum mechanical calculations by machine learning models for energies, forces or even spectroscopic properties and finally the fast calculation of atomic charges for force field parametrization. The descriptor space ranges from derivatives of the wavefunctions or electron density via quantum mechanics derived descriptors to classical descriptions of atoms and their embedding in a molecule. The common denominator for all approaches is the thorough understanding of the physics of the chemical problem that guided the design of the atom descriptor. Quantum mechanics (QM) and machine learning (ML) finally are converging to a new discipline, namely QM/ML.


Subject(s)
Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Quantum Theory , Humans
19.
J Cheminform ; 11(1): 59, 2019 Sep 11.
Article in English | MEDLINE | ID: mdl-33430967

ABSTRACT

We present machine learning (ML) models for hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) strengths. Quantum chemical (QC) free energies in solution for 1:1 hydrogen-bonded complex formation to the reference molecules 4-fluorophenol and acetone serve as our target values. Our acceptor and donor databases are the largest on record with 4426 and 1036 data points, respectively. After scanning over radial atomic descriptors and ML methods, our final trained HBA and HBD ML models achieve RMSEs of 3.8 kJ mol⁻¹ (acceptors) and 2.3 kJ mol⁻¹ (donors) on experimental test sets. This performance is comparable with previous models that are trained on experimental hydrogen bonding free energies, indicating that molecular QC data can serve as a substitute for experiment. The potential ramifications thereof could lead to a full replacement of wetlab chemistry for HBA/HBD strength determination by QC. As a possible chemical application of our ML models, we highlight our predicted HBA and HBD strengths as possible descriptors in two case studies on trends in intramolecular hydrogen bonding.

20.
Mol Inform ; 38(4): e1800115, 2019 04.
Article in English | MEDLINE | ID: mdl-30474291

ABSTRACT

We present two approaches for the computation of hydrogen bond acceptor strengths, one by machine-learning and one by a composite quantum-mechanical protocol, both based on the well-established pKBHX scale and dataset. The QM calculations after a necessary linear fit reproduce the complexation free energies in solution with an RMSE of 2.6 kJ mol⁻¹, not far off the expected error of 2 kJ mol⁻¹ obtained from the comparison of experimental data from two different sources. The second approach is by Gaussian Process Regression (GPR) machine-learning. We describe the hydrogen bond acceptor atoms by a radial atomic reactivity descriptor that encodes their electronic and steric environment. The performance of the GPR model on an external test set corresponds to 3.3 kJ mol⁻¹, which is also close to the experimental error. We apply the GPR model built on experimental data to model the hydrogen bond acceptor strengths of a series of hydrogen bond acceptor sites of 10 phosphodiesterase 10A inhibitors. The predicted values correlate well with the experimentally measured IC50 values.
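
Because pKBHX is the base-10 logarithm of a 1:1 complexation constant, it relates to a complexation free energy through the standard thermodynamic identity ΔG = -RT ln(10) · pK. The sketch below applies this identity to put the quoted errors in kJ/mol on the pK scale. The temperature of 298.15 K is an assumption made here, and the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def pk_to_delta_g(pk: float, temperature: float = 298.15) -> float:
    """Free energy of complexation (kJ/mol) from a log10 equilibrium constant."""
    return -R_KJ * temperature * math.log(10) * pk


def delta_g_error_to_pk(error_kj: float, temperature: float = 298.15) -> float:
    """Express an error given in kJ/mol in units of the pK scale."""
    return error_kj / (R_KJ * temperature * math.log(10))


# One pK unit corresponds to roughly 5.7 kJ/mol at the assumed temperature
print(round(pk_to_delta_g(1.0), 2))

# The RMSE values quoted in the abstract, converted to pK units
for rmse_kj in (2.0, 2.6, 3.3):
    print(rmse_kj, round(delta_g_error_to_pk(rmse_kj), 2))
```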


Subject(s)
Machine Learning , Quantitative Structure-Activity Relationship , Databases, Chemical , Hydrogen Bonding , Inhibitory Concentration 50 , Linear Models , Normal Distribution , Phosphodiesterase Inhibitors/chemistry , Phosphodiesterase Inhibitors/pharmacology