ACS Omega ; 8(2): 2046-2056, 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36687099


Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental data impedes development of accurate in silico models for such compounds. In certain discovery projects at Novartis focused on such compounds, a quantum mechanics (QM)-based tool for log P estimation has emerged as a valuable supplement to experimental measurements and as a preferred alternative to existing empirical models. However, this QM-based approach incurs a substantial computational cost, limiting its applicability to small series and prohibiting quick, interactive ideation. This work explores a set of machine learning models (Random Forest, Lasso, XGBoost, Chemprop, and Chemprop3D) to learn calculated log P values on both a public data set and an in-house data set to obtain a computationally affordable, QM-based estimation of drug lipophilicity. The message-passing neural network model Chemprop emerged as the best performing model with mean absolute errors of 0.44 and 0.34 log units for scaffold split test sets of the public and in-house data sets, respectively. Analysis of learning curves suggests that a further decrease in the test set error can be achieved by increasing the training set size. While models directly trained on experimental data perform better at approximating experimentally determined log P values than models trained on calculated values, we discuss the potential advantages of using calculated log P values going beyond the limits of experimental quantitation. We analyze the impact of the data set splitting strategy and gain insights into model failure modes. Potential use cases for the presented models include pre-screening of large compound collections and prioritization of compounds for full QM calculations.
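The scaffold-split evaluation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each molecule already carries a scaffold label (hypothetical precomputed strings; the sketch does not derive them from structures), and the function names, the default test fraction, and the largest-groups-first heuristic are illustrative choices, not the authors' procedure.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split molecule indices so that no scaffold label occurs in both sets.

    `scaffolds` holds one label per molecule (assumed to be precomputed).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Place the largest scaffold groups in the training set first, so that
    # rarer scaffolds tend to end up in the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_target = len(scaffolds) * (1.0 - test_fraction)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


def mean_absolute_error(predicted, reference):
    """Average absolute deviation, in the units of the inputs (log units for log P)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)
```

Because whole scaffold groups are assigned to one side, the test set probes chemotypes the model has not seen during training, which is typically a harder setting than a random split.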

J Chem Phys ; 149(10): 104102, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30219007


The PM6 implementation in the GAMESS program is extended to elements requiring d-integrals and interfaced with the conductor-like polarized continuum model of solvation, including gradients. The accuracy of aqueous solvation energies computed using AM1, PM3, PM6, and DFT tight binding (DFTB) and the Solvation Model Density (SMD) continuum solvation model is tested using the Minnesota Solvation Database data set. The errors in SMD solvation energies predicted using Neglect of Diatomic Differential Overlap (NDDO)-based methods are considerably larger than when using density functional theory (DFT) and HF, with root mean square error (RMSE) values of 3.4-5.9 (neutrals) and 6-15 kcal/mol (ions) compared to 2.4 and ∼5 kcal/mol for HF/6-31G(d). For the NDDO-based methods, the errors are especially large for cations and considerably higher than the corresponding conductor-like screening model results, which suggests that the NDDO/SMD results can be improved by re-parameterizing the SMD parameters focusing on ions. We found that the best results are obtained by changing only the radii for hydrogen, carbon, oxygen, nitrogen, and sulfur, and this leads to RMSE values for PM3 (neutrals: 2.8/ions: ∼5 kcal/mol), PM6 (4.7/∼5 kcal/mol), and DFTB (3.9/∼5 kcal/mol) that are more comparable to HF/6-31G(d) (2.4/∼5 kcal/mol). Although the radii are optimized to reproduce aqueous solvation energies, they also lead to more accurate predictions for other polar solvents such as dimethyl sulfoxide, acetonitrile, and methanol, while the improvements for non-polar solvents are negligible.
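The abstract reports RMSE values separately for neutrals and ions. A minimal sketch of such a per-class RMSE is given below; the record layout `(label, predicted, reference)` and the function names are assumptions made for illustration, not part of the published workflow.

```python
import math
from collections import defaultdict


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_class(records):
    """Group (label, predicted, reference) records by label and return RMSE per label."""
    grouped = defaultdict(lambda: ([], []))
    for label, predicted, reference in records:
        grouped[label][0].append(predicted)
        grouped[label][1].append(reference)
    return {label: rmse(pred, ref) for label, (pred, ref) in grouped.items()}
```

Reporting the two classes separately matters here because the errors for ions are several times larger than for neutrals, and a pooled RMSE would hide that difference.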

Chem Sci ; 9(3): 660-665, 2018 Jan 21.
Article in English | MEDLINE | ID: mdl-29629133


While computational prediction of chemical reactivity is possible, it usually requires expert knowledge, and there are relatively few computational tools that can be used by a bench chemist to help guide synthesis. The RegioSQM method for predicting the regioselectivity of electrophilic aromatic substitution reactions of heteroaromatic systems is presented in this paper. RegioSQM protonates all aromatic C-H carbon atoms and identifies those with the lowest free energies in chloroform, computed using the PM3 semiempirical method, as the most nucleophilic centers. These positions are found to correlate qualitatively with the regiochemical outcome in a retrospective analysis of 96% of more than 525 literature examples of electrophilic aromatic halogenation reactions. The method is automated and requires only a SMILES string of the molecule of interest, which can easily be generated using chemical drawing programs such as ChemDraw. The computational cost is 1-10 minutes per molecule depending on size, using relatively modest computational resources, and the method is freely available via a web server at ; RegioSQM should therefore be of practical use in the planning of organic synthesis.
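The selection step described in this abstract (pick the protonation sites with the lowest free energies) can be sketched as follows. The input is assumed to be a mapping from atom index to the free energy of the corresponding protonated isomer, already computed elsewhere; the `cutoff` parameter and its default value are illustrative assumptions, not values taken from the abstract.

```python
def most_nucleophilic_sites(protonation_free_energies, cutoff=1.0):
    """Return atom indices whose protonated isomer lies within `cutoff`
    (same energy units as the input) of the lowest free energy.

    `protonation_free_energies` maps atom index -> free energy
    (hypothetical, precomputed values).
    """
    lowest = min(protonation_free_energies.values())
    return sorted(
        atom
        for atom, energy in protonation_free_energies.items()
        if energy - lowest <= cutoff
    )
```

Using a small energy window instead of the single minimum allows more than one position to be flagged when two sites are nearly degenerate.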

ACS Omega ; 3(4): 4372-4377, 2018 Apr 30.
Article in English | MEDLINE | ID: mdl-31458662


The connectivity-based hierarchy (CBH) protocol for computing accurate reaction enthalpies developed by Sengupta and Raghavachari is tested for fast ab initio methods (PBEh-3c, HF-3c, and HF/STO-3G), tight-binding density functional theory (DFT) methods (GFN-xTB, DFTB, and DFTB-D3), and neglect-of-diatomic-differential-overlap (NDDO)-based semiempirical methods (AM1, PM3, PM6, PM6-DH+, PM6-D3, PM6-D3H+, PM6-D3H4X, PM7, and OM2) using the same set of 25 reactions as in the original study. For the CBH-2 scheme, which reflects the change in the immediate chemical environment of all of the heavy atoms, the respective mean unsigned errors relative to G4 for PBEh-3c, HF-3c, HF/STO-3G, GFN-xTB, DFTB-D3, DFTB, PM3, AM1, PM6, PM6-DH+, PM6-D3, PM6-D3H+, PM6-D3H4X, PM7, and OM2 are 1.9, 2.4, 3.0, 3.9, 3.7, 4.5, 4.8, 5.5, 5.4, 5.3, 5.4, 6.5, 5.3, 5.2, and 5.9 kcal/mol, with a single outlier removed for HF-3c, PM6, PM6-DH+, PM6-D3, PM6-D3H4X, and PM7. The increase in accuracy for the NDDO-based methods is relatively modest due to the random errors in predicted heats of formation.

J Chem Phys ; 147(16): 161704, 2017 Oct 28.
Article in English | MEDLINE | ID: mdl-29096452


To facilitate further development of approximate quantum mechanical methods for condensed phase applications, we present a new benchmark dataset of intermolecular interaction energies in the solution phase for a set of 15 dimers, each containing one charged monomer. The reference interaction energy in solution is computed via a thermodynamic cycle that integrates dimer binding energy in the gas phase at the coupled cluster level and solute-solvent interaction with density functional theory; the estimated uncertainty of the calculated interaction energies is ±1.5 kcal/mol. The dataset is used to benchmark the performance of a set of semi-empirical quantum mechanical (SQM) methods that include DFTB3-D3, DFTB3/CPE-D3, OM2-D3, PM6-D3, PM6-D3H+, and PM7 as well as the HF-3c method. We find that while all tested SQM methods tend to underestimate binding energies in the gas phase with a root-mean-squared error (RMSE) of 2-5 kcal/mol, they overestimate binding energies in the solution phase with an RMSE of 3-4 kcal/mol, with the exception of DFTB3/CPE-D3 and OM2-D3, for which the systematic deviation is less pronounced. In addition, we find that HF-3c systematically overestimates binding energies in both gas and solution phases. As most approximate QM methods are parametrized and evaluated using data measured or calculated in the gas phase, the dataset represents an important first step toward calibrating QM-based methods for application in the condensed phase, where polarization and exchange repulsion need to be treated in a balanced fashion.
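The thermodynamic cycle mentioned in this abstract can be sketched in its generic textbook form: the interaction energy in solution is the gas-phase binding energy plus the change in solvation free energy on going from the separated monomers to the dimer. The exact terms used by the authors are not given in the abstract, so the decomposition and all argument names below are assumptions.

```python
def interaction_energy_in_solution(
    binding_energy_gas,
    solvation_dimer,
    solvation_monomer_a,
    solvation_monomer_b,
):
    """Combine a gas-phase binding energy with solvation terms.

    All inputs share one energy unit (for example kcal/mol). The solvation
    correction is the dimer's solvation free energy minus the sum of the
    monomers' solvation free energies (generic cycle, assumed here).
    """
    solvation_change = solvation_dimer - (solvation_monomer_a + solvation_monomer_b)
    return binding_energy_gas + solvation_change
```

For charged monomers the solvation terms are large, so the solution-phase value is a small difference between large numbers, which is one reason a carefully computed reference is needed.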

PeerJ ; 4: e2335, 2016.
Article in English | MEDLINE | ID: mdl-27602298


The PM6 semiempirical method and the dispersion and hydrogen bond-corrected PM6-D3H+ method are used together with the SMD and COSMO continuum solvation models to predict pKa values of pyridines, alcohols, phenols, benzoic acids, and carboxylic acids using isodesmic reactions, and the predictions are compared to published ab initio results. The pKa values of pyridines, alcohols, phenols, and benzoic acids considered in this study can generally be predicted with PM6 and ab initio methods to within the same overall accuracy, with average mean absolute differences (MADs) of 0.6-0.7 pH units. For carboxylic acids, the accuracy (0.7-1.0 pH units) is also comparable to ab initio results if a single outlier is removed. For primary, secondary, and tertiary amines the accuracy is, respectively, similar (0.5-0.6), slightly worse (0.5-1.0), and worse (1.0-2.5), provided that di- and tri-ethylamine are used as reference molecules for secondary and tertiary amines. When applied to a drug-like molecule where an empirical pKa predictor exhibits a large (4.9 pH unit) error, we find that the errors for PM6-based predictions are roughly the same in magnitude but opposite in sign. As a result, most of the PM6-based methods predict the correct protonation state at physiological pH, while the empirical predictor does not. The computational cost is around 2-5 min per conformer per core processor, making PM6-based pKa prediction computationally efficient enough to be used for high-throughput screening using on the order of 100 core processors.
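The isodesmic-reaction approach named in this abstract relates the pKa of a molecule to that of a chemically similar reference through the standard expression pKa = pKa(ref) + ΔG / (RT ln 10), where ΔG is the free energy change of the proton exchange HA + Ref⁻ → A⁻ + HRef. The sketch below assumes energies in kcal/mol and a temperature of 298.15 K; the function and constant names are illustrative.

```python
import math

# Gas constant in kcal/(mol K).
GAS_CONSTANT_KCAL = 1.987204e-3


def isodesmic_pka(pka_reference, delta_g_kcal, temperature=298.15):
    """Estimate a pKa from a reference pKa and a proton-exchange free energy.

    `delta_g_kcal` is the free energy change (kcal/mol) of
    HA + Ref- -> A- + HRef, assumed to be computed elsewhere.
    """
    rt_ln10 = GAS_CONSTANT_KCAL * temperature * math.log(10.0)
    return pka_reference + delta_g_kcal / rt_ln10
```

At 298.15 K the factor RT ln 10 is about 1.36 kcal/mol, so an error of that size in ΔG shifts the predicted pKa by one unit.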

PeerJ ; 4: e1994, 2016.
Article in English | MEDLINE | ID: mdl-27168993


We have collected computed barrier heights and reaction energies (and associated model structures) for five enzymes from studies published by Himo and co-workers. Using these data, obtained at the B3LYP/6-311+G(2d,2p)[LANL2DZ]//B3LYP/6-31G(d,p) level of theory, we then benchmark PM6, PM7, PM7-TS, and DFTB3 and discuss the influence of system size, bulk solvation, and geometry re-optimization on the error. The mean absolute differences (MADs) observed for these five enzyme model systems are similar to those observed for PM6 and PM7 for smaller systems (10-15 kcal/mol), while DFTB results in a MAD that is significantly lower (6 kcal/mol). The MADs for PMx and DFTB3 are each dominated by large errors for a single system, and if the system is disregarded the MADs fall to 4-5 kcal/mol. Overall, results for the condensed phase are neither more nor less accurate relative to B3LYP than those in the gas phase. With the exception of PM7-TS, the MADs for small and large structural models are very similar, with a maximum deviation of 3 kcal/mol for PM6. Geometry optimization with PM6 shows that for one system this method predicts a different mechanism compared to B3LYP/6-31G(d,p). For the remaining systems, geometry optimization of the large structural model increases the MAD relative to single points, by 2.5 and 1.8 kcal/mol for barriers and reaction energies. For the small structural model, the corresponding MADs decrease by 0.4 and 1.2 kcal/mol, respectively. However, despite these small changes, significant changes in the structures are observed for some systems, such as proton transfer and hydrogen bonding rearrangements. The paper represents the first step in the process of creating a benchmark set of barriers computed for systems that are relatively large and representative of enzymatic reactions, a considerable challenge for any one research group but possible through a concerted effort by the community. We end by outlining steps needed to expand and improve the data set and how other researchers can contribute to the process.

PeerJ ; 2: e449, 2014.
Article in English | MEDLINE | ID: mdl-25024918


We present new dispersion and hydrogen bond corrections to the PM6 method, PM6-D3H+, and its implementation in the GAMESS program. The method combines the DFT-D3 dispersion correction by Grimme et al. with a modified version of the H+ hydrogen bond correction by Korth. Overall, the interaction energy of PM6-D3H+ is very similar to PM6-DH2 and PM6-DH+, with RMSD and MAD values within 0.02 kcal/mol of one another. The main difference is that the geometry optimizations of 88 complexes result in 82, 6, 0, and 0 geometries with 0, 1, 2, and 3 or more imaginary frequencies using PM6-D3H+ implemented in GAMESS, while the corresponding numbers for PM6-DH+ implemented in MOPAC are 54, 17, 15, and 2. The PM6-D3H+ method as implemented in GAMESS offers an attractive alternative to PM6-DH+ in MOPAC in cases where the LBFGS optimizer must be used and a vibrational analysis is needed, e.g., when computing vibrational free energies. While the GAMESS implementation is up to 10 times slower for geometry optimizations of proteins in bulk solvent, compared to MOPAC, it is sufficiently fast to make geometry optimizations of small proteins practically feasible.