1.
J Chem Inf Model ; 64(1): 9-17, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.


Subject(s)
Machine Learning , Software , Neural Networks, Computer , Chemical Phenomena , Water
2.
Science ; 382(6677): eadi1407, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-38127734

ABSTRACT

A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.

3.
J Phys Chem B ; 127(47): 10151-10170, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37966798

ABSTRACT

Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the solution free energy calculated using computational chemistry methods.
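
As an illustration of why relative conformer free energies are enough to pick out the thermodynamically relevant conformers, the following minimal sketch applies standard Boltzmann statistics to a list of conformer free energies. It is not the authors' workflow: the function names, the energy values in the usage note, and the 1 kcal/mol selection window are assumptions made for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_weights(free_energies, temperature=298.15):
    """Normalized Boltzmann populations for conformer free energies in kcal/mol."""
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    rt = R_KCAL * temperature
    factors = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_free_energy(free_energies, temperature=298.15):
    """Free energy of the whole ensemble: G = -RT ln(sum_i exp(-G_i / RT))."""
    g_min = min(free_energies)
    rt = R_KCAL * temperature
    partition = sum(math.exp(-(g - g_min) / rt) for g in free_energies)
    return g_min - rt * math.log(partition)


def relevant_conformers(free_energies, window=1.0):
    """Indices of conformers within `window` kcal/mol of the lowest one.

    The window is a hypothetical cutoff chosen for illustration.
    """
    g_min = min(free_energies)
    return [i for i, g in enumerate(free_energies) if g - g_min <= window]
```

For example, `relevant_conformers([0.0, 0.5, 3.0])` keeps the first two conformers, and `boltzmann_weights` shows how little the third one contributes to the ensemble at room temperature.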

4.
J Chem Inf Model ; 63(13): 4012-4029, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.


Subject(s)
Machine Learning , Uncertainty , Reproducibility of Results
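
The split between aleatoric uncertainty and the model-variance part of epistemic uncertainty described in the abstract above can be sketched with the law of total variance applied to an ensemble. This sketch assumes that every ensemble member outputs both a mean and a variance for the same input, which the abstract does not state; the function and key names are hypothetical. Model bias is not captured by this decomposition.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble uncertainty for one input into two contributions.

    aleatoric      : average of the variances predicted by the members
    model_variance : spread of the member means (variance part of the
                     epistemic uncertainty; model bias is not included)
    """
    aleatoric = fmean(member_variances)
    model_variance = pvariance(member_means)
    return {
        "prediction": fmean(member_means),
        "aleatoric": aleatoric,
        "model_variance": model_variance,
        "total": aleatoric + model_variance,
    }
```

With more ensemble members the `model_variance` term of the averaged prediction can shrink, while the `aleatoric` term reflects noise in the data and does not, which is consistent with the abstract's point that ensembling addresses the variance contribution specifically.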
5.
Waste Manag ; 165: 108-118, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37119685

ABSTRACT

Due to the complexity and diversity of polyolefinic plastic waste streams and the inherent non-selective nature of the pyrolysis chemistry, the chemical decomposition of plastic waste is still not fully understood. Accurate feedstock and product data that also account for impurities are, in this context, quite scarce. Therefore, this work focuses on the thermochemical recycling via pyrolysis of different virgin and contaminated waste-derived polyolefin feedstocks (i.e., low-density polyethylene (LDPE) and polypropylene (PP) as main components), along with an investigation of the decomposition mechanisms based on the detailed composition of the pyrolysis oils. Crucial in this work is the detailed chemical analysis of the resulting pyrolysis oils by comprehensive two-dimensional gas chromatography (GC × GC) and ICP-OES, among others. Different feedstocks were pyrolyzed at temperatures of 430-490 °C and at pressures between 0.1 and 2 bar in a continuous pilot-scale pyrolysis unit. At the lowest pressure, the pyrolysis oil yield of the studied polyolefins reached up to 95 wt%. The pyrolysis oil consists primarily of α-olefins (37-42 %) and n-paraffins (32-35 %) for LDPE pyrolysis, while isoolefins (mostly C9 and C15) and diolefins accounted for 84-91 % of the PP-based pyrolysis oils. The post-consumer waste feedstocks led to significantly lower pyrolysis oil yields and more char formation compared to their virgin equivalents. It was found that plastic aging, polyvinyl chloride (PVC) (3 wt%), and metal contamination were the main causes of char formation during the pyrolysis of polyolefin waste (4.9 wt%).


Subject(s)
Polyethylene , Pyrolysis , Polyethylene/chemistry , Temperature , Plastics/chemistry , Polypropylenes/chemistry , Oils
6.
Faraday Discuss ; 238(0): 491-511, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-35781310

ABSTRACT

Renewable cracking feedstocks from plastic waste and the need for novel reactor designs related to electrification of steam crackers drive the development of accurate and fundamental kinetic models for this process, despite its large-scale implementation for more than half a century. Pressure-dependent kinetics have mostly been omitted in fundamental steam cracking models, while they are crucial in combustion models. Therefore, we have assessed the importance of pressure-dependent kinetics for steam cracking via in-depth modelling and experimental studies. In particular, we have studied the influence of considering fall-off on the product yields for ethane and propane steam cracking. A high-pressure limit fundamental kinetic model is generated, based on quantum chemical data and group additive values, and supplemented with literature values for pressure-dependent kinetic parameters for β-scission reactions and homolytic bond scissions of C2 and C3 species. Model simulations with high-pressure limit rate coefficients and pressure-dependent kinetics are compared to new experimental measurements. Steam cracking experiments for pure ethane and propane feeds are performed on a tubular bench-scale reactor at 0.17 MPa and temperatures ranging from 1058 to 1178 K. All important product species are identified using a comprehensive GC × GC-FID/q-MS. For homolytic bond scissions, the inclusion of pressure-dependent kinetics has a significant effect on the conversion profile for ethane steam cracking. On the other hand, pressure dependence of C2 β-scissions significantly influences conversion and product species profiles for both ethane and propane steam cracking. The C3 β-scissions pressure dependence has a negligible effect in ethane steam cracking, while for propane steam cracking the effect is non-negligible on the product species profiles.


Subject(s)
Propane , Steam , Kinetics , Ethane , Plastics
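
To make the notion of fall-off in the abstract above concrete, the sketch below evaluates the Lindemann expression, the simplest textbook form of a pressure-dependent unimolecular rate coefficient. The abstract does not say which fall-off formulation the authors use, so this is a stand-in, and any rate coefficients passed to it are hypothetical inputs. The bath-gas concentration is estimated from the ideal gas law.

```python
R_GAS = 8.314462618  # gas constant in J/(mol K)


def molar_concentration(pressure_pa, temperature_k):
    """Total gas concentration [M] in mol/m^3 from the ideal gas law."""
    return pressure_pa / (R_GAS * temperature_k)


def lindemann_rate(k_inf, k_0, concentration_m):
    """Lindemann fall-off rate coefficient.

    k_inf : high-pressure limit rate coefficient (1/s)
    k_0   : low-pressure limit rate coefficient (m^3/(mol s))
    The reduced pressure Pr = k_0 [M] / k_inf interpolates between the
    low-pressure limit (k ~ k_0 [M]) and the high-pressure limit (k ~ k_inf).
    """
    reduced_pressure = k_0 * concentration_m / k_inf
    return k_inf * reduced_pressure / (1.0 + reduced_pressure)
```

Calling `molar_concentration(0.17e6, 1058.0)` gives the bath-gas concentration at the lowest experimental temperature quoted in the abstract; a model that uses only `k_inf` overestimates the rate whenever the reduced pressure is not much larger than one.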
7.
J Am Chem Soc ; 144(24): 10785-10797, 2022 Jun 22.
Article in English | MEDLINE | ID: mdl-35687887

ABSTRACT

The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.


Subject(s)
Water , Data Collection , Solubility , Solutions , Solvents/chemistry , Temperature , Thermodynamics , Water/chemistry
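
One standard thermodynamic relation of the kind the abstract above refers to links the solubility in an organic solvent to the aqueous solubility through the difference in solvation free energies. The sketch below implements that relation under stated assumptions: the solid phase is the same in both systems, both solutions are dilute, and both solubilities use the same molar concentration units. The abstract does not list the authors' equations, so the function name and signature are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log_solubility_in_solvent(log_s_aqueous, dg_solv_aqueous, dg_solv_solvent,
                              temperature=298.15):
    """Estimate log10 solubility in a solvent from the aqueous value.

    log10 S_solvent = log10 S_aq + (dG_solv_aq - dG_solv_solvent) / (RT ln 10)

    Solvation free energies are in kcal/mol. A more negative solvation free
    energy in the solvent than in water raises the predicted solubility.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10)
    return log_s_aqueous + (dg_solv_aqueous - dg_solv_solvent) / rt_ln10
```

In this form, each RT ln 10 (about 1.36 kcal/mol at 298 K) of extra stabilization in the organic solvent raises the predicted solubility by one log unit.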
8.
J Chem Inf Model ; 62(3): 433-446, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35044781

ABSTRACT

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain while the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide even more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.


Subject(s)
Machine Learning , Neural Networks, Computer , Entropy , Solutions , Solvents , Thermodynamics
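
The linear free energy relationship mentioned in the abstract above can be illustrated with the Abraham equation for gas-to-solvent partitioning, log K = c + eE + sS + aA + bB + lL, followed by the conversion of log K to a solvation free energy. This is a minimal sketch: the class and function names are hypothetical, the numbers in the usage note are placeholders rather than fitted values, and the resulting free energy depends on the standard-state convention used for K.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


@dataclass
class SoluteParameters:
    """Abraham solute descriptors E, S, A, B, L."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass
class SolventCoefficients:
    """Solvent-specific coefficients of the Abraham equation."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def log_partition_coefficient(solute, solvent):
    """log10 K = c + e*E + s*S + a*A + b*B + l*L (gas-to-solvent partition)."""
    return (solvent.c + solvent.e * solute.E + solvent.s * solute.S
            + solvent.a * solute.A + solvent.b * solute.B
            + solvent.l * solute.L)


def solvation_free_energy(solute, solvent, temperature=298.15):
    """Solvation free energy in kcal/mol: dG = -RT ln(10) * log10 K."""
    log_k = log_partition_coefficient(solute, solvent)
    return -R_KCAL * temperature * math.log(10) * log_k
```

With placeholder inputs such as `SoluteParameters(1, 1, 1, 1, 1)` and `SolventCoefficients(c=0.5, e=0.1, s=0.1, a=0.1, b=0.1, l=0.1)`, log K is 1 and the free energy is one unit of -RT ln 10. Comparing such an estimate with a directly predicted value is one way to flag uncertain predictions, as the abstract suggests.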