Search | VHL Regional Portal

1.

Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates.

Chung, Yunsie; Green, William H.

Chem Sci ; 15(7): 2410-2424, 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38362410

ABSTRACT

Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔGsolv, ΔΔHsolv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol⁻¹ for ΔΔGsolv and ΔΔHsolv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents based only on atom-mapped reaction SMILES and solvent SMILES strings.
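
To illustrate how a predicted solvation free energy of activation can translate into a relative rate constant, the sketch below assumes the standard transition state theory relation k_A/k_B = exp(-(ΔΔGsolv,A - ΔΔGsolv,B)/RT) for one reaction in two solvents A and B. The function name and the numerical inputs are hypothetical and are not taken from the paper.

```python
import math

# Gas constant in kcal mol^-1 K^-1, matching the units used in the abstract.
R_KCAL = 1.987204e-3


def relative_rate_constant(ddg_solv_a, ddg_solv_b, temperature=298.15):
    """Return k_A / k_B for one reaction in solvents A and B.

    ddg_solv_a, ddg_solv_b: solvation free energies of activation (kcal/mol).
    Assumes the transition state theory relation described in the lead-in.
    """
    return math.exp(-(ddg_solv_a - ddg_solv_b) / (R_KCAL * temperature))


# Hypothetical inputs: solvent A stabilizes the transition state by
# 1.0 kcal/mol more than solvent B, so the reaction is faster in A.
ratio = relative_rate_constant(-1.5, -0.5)
print(f"k_A / k_B = {ratio:.2f}")
```

Under this assumed relation, an error of 0.71 kcal mol⁻¹ in the free energy term corresponds to roughly a factor of 3 in the rate ratio at 298 K.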

2.

Chemprop: A Machine Learning Package for Chemical Property Prediction.

Heid, Esther; Greenman, Kevin P; Chung, Yunsie; Li, Shih-Cheng; Graff, David E; Vermeire, Florence H; Wu, Haoyang; Green, William H; McGill, Charles J.

J Chem Inf Model ; 64(1): 9-17, 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.

Subject(s)

Machine Learning , Software , Neural Networks, Computer , Chemical Phenomena , Water

3.

Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning.

Biswas, Sayandeep; Chung, Yunsie; Ramirez, Josephine; Wu, Haoyang; Green, William H.

J Chem Inf Model ; 63(15): 4574-4588, 2023 Aug 14.

Article in English | MEDLINE | ID: mdl-37487557

ABSTRACT

Knowledge of critical properties, such as critical temperature, pressure, and density, as well as the acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits, where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.

Subject(s)

Machine Learning , Neural Networks, Computer , Temperature , Transition Temperature

4.

Computing Kinetic Solvent Effects and Liquid Phase Rate Constants Using Quantum Chemistry and COSMO-RS Methods.

Chung, Yunsie; Green, William H.

J Phys Chem A ; 127(27): 5637-5651, 2023 Jul 13.

Article in English | MEDLINE | ID: mdl-37381077

ABSTRACT

Many industrially and environmentally relevant reactions occur in the liquid phase. An accurate prediction of the rate constants is needed to analyze the intricate kinetic mechanisms of condensed phase systems. Quantum chemistry and continuum solvation models are commonly used to compute liquid phase rate constants; yet, their exact computational errors remain largely unknown, and a consistent computational workflow has not been well established. In this study, the accuracies of various quantum chemical and COSMO-RS levels of theory are assessed for the predictions of liquid phase rate constants and kinetic solvent effects. The prediction is made by first obtaining gas phase rate constants and subsequently applying solvation corrections. The calculation errors are evaluated using the experimental data of 191 rate constants that comprise 15 neutral closed-shell or free radical reactions and 49 solvents. The ωB97XD/def2-TZVP level of theory combined with the COSMO-RS method at the BP-TZVP level is shown to achieve the best performance with a mean absolute error of 0.90 in log10(kliq). Relative rate constants are additionally compared to determine the errors associated with the solvation calculations alone. Very accurate predictions of relative rate constants are achieved at nearly all levels of theory with a mean absolute error of 0.27 in log10(ksolvent1/ksolvent2).
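
The two-step workflow described above (a gas phase rate constant followed by a solvation correction) and the log10 error metric can be sketched as follows. This is a minimal illustration that assumes the correction takes the form k_liq = k_gas · exp(-ΔΔGsolv/RT) and ignores any standard-state conversion; the function names and all numbers are hypothetical, not values from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def liquid_phase_rate_constant(k_gas, ddg_solv_act, temperature=298.15):
    """Apply an assumed solvation correction to a gas phase rate constant.

    ddg_solv_act: solvation free energy of activation in kcal/mol.
    """
    return k_gas * math.exp(-ddg_solv_act / (R_KCAL * temperature))


def mean_absolute_log10_error(predicted, experimental):
    """Mean absolute error in log10(k), the metric quoted in the abstract."""
    errors = [
        abs(math.log10(p) - math.log10(e))
        for p, e in zip(predicted, experimental)
    ]
    return sum(errors) / len(errors)


# Hypothetical example: one gas phase rate constant corrected for two solvents.
k_gas = 1.0e3
predicted = [liquid_phase_rate_constant(k_gas, ddg) for ddg in (-0.8, 0.4)]
experimental = [5.0e3, 4.0e2]  # made-up reference values
print(mean_absolute_log10_error(predicted, experimental))
```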

5.

RMG Database for Chemical Property Prediction.

Johnson, Matthew S; Dong, Xiaorui; Grinberg Dana, Alon; Chung, Yunsie; Farina, David; Gillis, Ryan J; Liu, Mengjie; Yee, Nathan W; Blondal, Katrin; Mazeau, Emily; Grambow, Colin A; Payne, A Mark; Spiekermann, Kevin A; Pang, Hao-Wei; Goldsmith, C Franklin; West, Richard H; Green, William H.

J Chem Inf Model ; 62(20): 4906-4915, 2022 Oct 24.

Article in English | MEDLINE | ID: mdl-36222558

ABSTRACT

The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combination of 4564 entries and a group additivity scheme with 9 types of corrections including radical, polycyclic, and surface absorption corrections with 1580 total curated groups and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters containing a combined 21 000 reactions and contains rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use can be facilitated by interfacing directly with the RMG Python package which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This helps to speed up and facilitate kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.

Subject(s)

Models, Chemical , Kinetics , Thermodynamics , Databases, Factual , Solvents

6.

Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures.

Vermeire, Florence H; Chung, Yunsie; Green, William H.

J Am Chem Soc ; 144(24): 10785-10797, 2022 Jun 22.

Article in English | MEDLINE | ID: mdl-35687887

ABSTRACT

The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.

Subject(s)

Water , Data Collection , Solubility , Solutions , Solvents/chemistry , Temperature , Thermodynamics , Water/chemistry
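
As a rough illustration of how a thermodynamic relation can combine aqueous solubility with solvation free energies, the sketch below assumes that, at a fixed temperature and for the same solid phase in the dilute limit, log10 S_solvent = log10 S_aq + (ΔGsolv,aq - ΔGsolv,solvent)/(RT ln 10). This is an assumed form that may differ from the paper's exact equations; the function name and the numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def log10_solubility_in_solvent(
    log10_s_aq, dg_solv_aq, dg_solv_solvent, temperature=298.15
):
    """Estimate log10 solubility in an organic solvent from aqueous data.

    log10_s_aq: log10 of the aqueous solid solubility.
    dg_solv_aq, dg_solv_solvent: solvation free energies (kcal/mol) of the
    solute in water and in the target solvent at the same temperature.
    """
    shift = (dg_solv_aq - dg_solv_solvent) / (
        math.log(10) * R_KCAL * temperature
    )
    return log10_s_aq + shift


# Hypothetical solute that is solvated 2 kcal/mol more favorably in the
# organic solvent than in water, so its predicted solubility is higher.
print(log10_solubility_in_solvent(-3.0, -4.0, -6.0))
```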

7.

Group Contribution and Machine Learning Approaches to Predict Abraham Solute Parameters, Solvation Free Energy, and Solvation Enthalpy.

Chung, Yunsie; Vermeire, Florence H; Wu, Haoyang; Walker, Pierre J; Abraham, Michael H; Green, William H.

J Chem Inf Model ; 62(3): 433-446, 2022 Feb 14.

Article in English | MEDLINE | ID: mdl-35044781

ABSTRACT

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain while the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide even more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.

Subject(s)

Machine Learning , Neural Networks, Computer , Entropy , Solutions , Solvents , Thermodynamics
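
The linear free energy relationship step mentioned in the abstract can be sketched as below, assuming the commonly used Abraham form log10 K = c + eE + sS + aA + bB + lL for the gas-to-solvent partition coefficient K, with ΔGsolv = -RT ln K. The class names, the function name, and every coefficient and solute value shown are hypothetical placeholders, not fitted parameters from the paper or its databases.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


@dataclass
class SoluteParameters:
    """Abraham solute descriptors E, S, A, B, L."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass
class SolventCoefficients:
    """Solvent-specific coefficients of the assumed Abraham equation."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def solvation_free_energy(solute, solvent, temperature=298.15):
    """Return the solvation free energy in kcal/mol from log10 K."""
    log10_k = (
        solvent.c
        + solvent.e * solute.E
        + solvent.s * solute.S
        + solvent.a * solute.A
        + solvent.b * solute.B
        + solvent.l * solute.L
    )
    return -R_KCAL * temperature * math.log(10) * log10_k


# Hypothetical descriptor and coefficient values, for illustration only.
solute = SoluteParameters(E=0.6, S=0.5, A=0.0, B=0.1, L=3.0)
solvent = SolventCoefficients(c=0.1, e=-0.2, s=1.0, a=2.0, b=0.5, l=0.9)
print(solvation_free_energy(solute, solvent))
```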

8.

Two-fluid wetting behavior of a hydrophobic silicon nanowire array.

Kim, Yongkwan; Chung, Yunsie; Tian, Ye; Carraro, Carlo; Maboudian, Roya.

Langmuir ; 30(44): 13330-7, 2014 Nov 11.

Article in English | MEDLINE | ID: mdl-25356959

ABSTRACT

The two-fluid wetting behavior of surfaces textured by an array of silicon nanowires is investigated systematically. The Si nanowire array is produced by a combination of colloidal patterning and metal-catalyzed etching, with control over its roughness depending upon the wire length. The nanowires are made hydrophobic and oleophobic by treatment with hydrocarbon and fluorinated self-assembled monolayers, respectively. Static, advancing, and receding contact angles are measured with water, hexadecane, and perfluorotripentylamine in both single-fluid (droplet on solid in an air environment) and two-fluid (droplet on solid in a liquid environment) configurations. The single-fluid measurements show wetting behavior similar to that expected by the Wenzel and Cassie-Baxter models, where the wetting or non-wetting behaviors are amplified with increasing roughness. The two-fluid systems on the rough surface exhibit more complex configurations because either the droplet or the environment fluid can penetrate the asperities depending upon the wettability of each fluid. It is observed that, when the Young contact angles are significantly increased or reduced from single-liquid to two-liquid systems, the effect of roughness is relatively minimal. However, when the Young contact angles are similar, roughness has almost identical influence on apparent contact angles in single- and two-liquid systems. The Wenzel and Cassie-Baxter models are modified to describe various two-fluid wetting states. In cases where metastable behavior is observed for the droplet, advancing and receding measurements are performed to suggest the equilibrium state of the droplet.
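
For reference, the classical single-fluid forms of the two models named above can be sketched as follows: Wenzel, cos θ* = r cos θ_Y, and Cassie-Baxter, cos θ* = f(cos θ_Y + 1) - 1, where r is the roughness ratio and f the wetted solid fraction. This illustrates only the textbook equations, not the modified two-fluid versions developed in the paper; the function names and the input values are hypothetical.

```python
import math


def wenzel_angle(young_angle_deg, roughness):
    """Apparent contact angle (degrees) for a fully wetted rough surface."""
    cos_apparent = roughness * math.cos(math.radians(young_angle_deg))
    # Clamp to the valid cosine range: complete wetting or complete dewetting.
    cos_apparent = max(-1.0, min(1.0, cos_apparent))
    return math.degrees(math.acos(cos_apparent))


def cassie_baxter_angle(young_angle_deg, solid_fraction):
    """Apparent contact angle (degrees) for a droplet resting on asperity tips."""
    cos_young = math.cos(math.radians(young_angle_deg))
    cos_apparent = solid_fraction * (cos_young + 1.0) - 1.0
    # Clamp to guard against floating-point overshoot at the range limits.
    cos_apparent = max(-1.0, min(1.0, cos_apparent))
    return math.degrees(math.acos(cos_apparent))


# Hypothetical inputs: roughness amplifies both non-wetting and wetting.
print(wenzel_angle(110.0, 1.5))         # hydrophobic surface, larger angle
print(wenzel_angle(60.0, 1.5))          # hydrophilic surface, smaller angle
print(cassie_baxter_angle(110.0, 0.2))  # droplet suspended on wire tips
```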

9.

Tuning micropillar tapering for optimal friction performance of thermoplastic gecko-inspired adhesive.

Kim, Yongkwan; Chung, Yunsie; Tsao, Angela; Maboudian, Roya.

ACS Appl Mater Interfaces ; 6(9): 6936-43, 2014 May 14.

Article in English | MEDLINE | ID: mdl-24761942

ABSTRACT

We present a fabrication method and friction testing of a gecko-inspired thermoplastic micropillar array with control over the tapering angle of the pillar sidewall. A combination of deep reactive ion etching of vertical silicon pillars and subsequent maskless chemical etching produces templates with various widths and degrees of taper, which are then replicated with low-density polyethylene. As the silicon pillars on the template are chemically etched in a bath consisting of hydrofluoric acid, nitric acid, and acetic acid (HNA), the pillars are progressively thinned, then shortened. The replicated polyethylene pillar arrays exhibit a corresponding increase in friction as the stiffness is reduced with thinning and then a decrease in friction as the stiffness is again increased. The dilution of the HNA bath in water influences the tapering angle of the silicon pillars. The friction of the replicated pillars is maximized for the taper angle that maximizes the contact area at the tip which in turn is influenced by the stiffness of the tapered pillars. To provide insights on how changes in microscale geometry and contact behavior may affect friction of the pillar array, the pillars are imaged by scanning electron microscopy after friction testing, and the observed deformation behavior from shearing is related to the magnitude of the macroscale friction values. It is shown that the tapering angle critically changes the pillar compliance and the available contact area. Simple finite element modeling calculations are performed to support that the observed deformation is consistent with what is expected from a mechanical analysis. We conclude that friction can be maximized via proper pillar tapering with low stiffness that still maintains enough contact area to ensure high adhesion.

Subject(s)

Adhesives , Friction , Animals , Finite Element Analysis , Lizards
