Results 1 - 20 of 36
1.
Sci Data ; 11(1): 728, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961122

ABSTRACT

Liquid formulations are ubiquitous yet have lengthy product development cycles owing to the complex physical interactions between ingredients making it difficult to tune formulations to customer-defined property targets. Interpolative ML models can accelerate liquid formulations design but are typically trained on limited sets of ingredients and without any structural information, which limits their out-of-training predictive capacity. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulations development. The resulting design space has an over 50-fold increase in dimensionality compared to our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, which cover the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. Our dataset has the unique attribute of sample-specific uncertainty measurements to train predictive surrogate models.

2.
J Chem Inf Model ; 64(9): 3790-3798, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38648077

ABSTRACT

Machine learning has the potential to provide tremendous value to life sciences by providing models that aid in the discovery of new molecules and reduce the time for new products to come to market. Chemical reactions play a significant role in these fields, but there is a lack of high-quality open-source chemical reaction data sets for training machine learning models. Herein, we present ORDerly, an open-source Python package for the customizable and reproducible preparation of reaction data stored in accordance with the increasingly popular Open Reaction Database (ORD) schema. We use ORDerly to clean United States patent data stored in ORD and generate data sets for forward prediction, retrosynthesis, as well as the first benchmark for reaction condition prediction. We train neural networks on data sets generated with ORDerly for condition prediction and show that data sets missing key cleaning steps can lead to silently overinflated performance metrics. Additionally, we train transformers for forward and retrosynthesis prediction and demonstrate how non-patent data can be used to evaluate model generalization. By providing a customizable open-source solution for cleaning and preparing large chemical reaction data, ORDerly is poised to push forward the boundaries of machine learning applications in chemistry.


Subject(s)
Benchmarking , Machine Learning , Neural Networks, Computer , Databases, Chemical
3.
ACS Omega ; 9(16): 18385-18399, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38680356

ABSTRACT

Computer-aided synthesis planning (CASP) development of reaction routes requires an understanding of complete reaction structures. However, most reactions in the current databases are missing reaction coparticipants. Although reaction prediction and atom mapping tools can predict major reaction participants and trace atom rearrangements in reactions, they fail to identify the missing molecules to complete reactions. This is because these approaches are data-driven models trained on the current reaction databases, which comprise incomplete reactions. In this work, a workflow was developed to tackle the reaction completion challenge. This includes a heuristic-based method to identify balanced reactions from reaction databases and complete some imbalanced reactions by adding candidate molecules. A machine learning masked language model (MLM) was trained to learn from simplified molecular input line entry system (SMILES) sentences of these completed reactions. The model predicted missing molecules for the incomplete reactions, a workflow analogous to predicting missing words in sentences. The model is promising for the prediction of small- and middle-sized missing molecules in incomplete reaction records. The workflow combining both the heuristic and machine learning methods completed more than half of the entire reaction space.
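The "missing word" analogy in this abstract can be illustrated with a small sketch. It is not the authors' model or tokenizer; it only shows how one molecule in a reaction SMILES string might be hidden to create a training pair. The function name `mask_molecule`, the `[MASK]` token, and the assumption that the reaction is written as `reactants>>products` with no agents field are all hypothetical choices made for illustration.

```python
# Minimal sketch (assumption: reaction SMILES of the form "reactants>>products",
# molecules separated by "."; the mask token and function name are hypothetical).
MASK = "[MASK]"

def mask_molecule(reaction_smiles: str, side: str, index: int) -> tuple[str, str]:
    """Hide one molecule and return (masked reaction, hidden molecule)."""
    reactants, products = reaction_smiles.split(">>")
    parts = {
        "reactants": reactants.split("."),
        "products": products.split("."),
    }
    target = parts[side][index]      # the molecule a model would have to predict
    parts[side][index] = MASK        # replace it with the mask token
    masked = ".".join(parts["reactants"]) + ">>" + ".".join(parts["products"])
    return masked, target

# Esterification of acetic acid with ethanol; water is the hidden coproduct.
masked, target = mask_molecule("CC(=O)O.CCO>>CC(=O)OCC.O", "products", 1)
print(masked)  # CC(=O)O.CCO>>CC(=O)OCC.[MASK]
print(target)  # O
```

Splitting on "." treats every dot-separated fragment as its own molecule, so salts written as multi-fragment SMILES would need extra handling in a real pipeline.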

4.
Nat Commun ; 15(1): 462, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263405

ABSTRACT

The ability to integrate resources and share knowledge across organisations empowers scientists to expedite the scientific discovery process. This is especially crucial in addressing emerging global challenges that require global solutions. In this work, we develop an architecture for distributed self-driving laboratories within The World Avatar project, which seeks to create an all-encompassing digital twin based on a dynamic knowledge graph. We employ ontologies to capture data and material flows in design-make-test-analyse cycles, utilising autonomous agents as executable knowledge components to carry out the experimentation workflow. Data provenance is recorded to ensure its findability, accessibility, interoperability, and reusability. We demonstrate the practical application of our framework by linking two robots in Cambridge and Singapore for a collaborative closed-loop optimisation for a pharmaceutically-relevant aldol condensation reaction in real-time. The knowledge graph autonomously evolves toward the scientist's research goals, with the two robots effectively generating a Pareto front for cost-yield optimisation in three days.
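For readers unfamiliar with the term, a Pareto front for cost-yield optimisation is the set of experiments that no other experiment beats on both objectives at once. The sketch below is a generic illustration, not the algorithm used in the paper; the function name `pareto_front` and the data points are made up, and it assumes cost is minimised while yield is maximised.

```python
# Minimal sketch (assumption: lower cost and higher yield are both preferred;
# the function name and the example data are hypothetical).
def pareto_front(points):
    """Return the non-dominated (cost, yield) pairs, sorted by cost."""
    front = []
    for cost, yld in points:
        # A point is dominated if another point is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            (c2 <= cost and y2 >= yld) and (c2 < cost or y2 > yld)
            for c2, y2 in points
        )
        if not dominated:
            front.append((cost, yld))
    return sorted(set(front))

experiments = [(1.0, 40), (2.0, 60), (2.5, 55), (3.0, 80), (3.5, 70)]
print(pareto_front(experiments))  # [(1.0, 40), (2.0, 60), (3.0, 80)]
```

The pairwise comparison is quadratic in the number of experiments, which is adequate for the tens to hundreds of runs typical of a closed-loop campaign.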

5.
ChemSusChem ; 17(3): e202300538, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-37792551

ABSTRACT

The shift towards sustainable feedstocks for platform chemicals requires new routes to access functional molecules that contain heteroatoms, but there are limited bio-derived feedstocks that lead to heteroatoms in platform chemicals. Combining renewable molecules of different origins could be a solution to optimize the use of atoms from renewable sources. However, the lack of retrosynthetic tools makes it challenging to examine the extensive reaction networks of various platform molecules focusing on multiple bio-based feedstocks. In this study, a protocol was developed to identify potential transformation pathways that allow for the use of feedstocks from different origins. By analyzing existing knowledge on chemical reactions in large databases, several promising synthetic routes were shortlisted, with the reaction of D-glucosamine and pyruvic acid being the most interesting to make pyrrole-2-carboxylic acid (PCA). The optimized synthetic conditions resulted in 50 % yield of PCA, with insights gained from temperature variant NMR studies. The use of substrates obtained from two different bio-feedstock bases, namely cellulose and chitin, allowed for the establishment of a PCA-based chemical space.

6.
iScience ; 26(10): 107834, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37954138

ABSTRACT

We discovered that CO2 electroreduction strongly favors the conversion of the dominant isotope of carbon (12C) and discriminates against the less abundant stable isotope, 13C. Both absorption of CO2 in the alkaline electrolyte and CO2 electrochemical reduction favor the lighter isotopologue. As a result, the stream of unreacted CO2 leaving the electrolyzer has an increased 13C content, and the depletion of 13C in the product is several times greater than that of photosynthesis. Using a natural abundance feed, we demonstrate enrichment of the 13C fraction to ∼1.3% (i.e., +18%) in a single-pass reactor and propose a scalable and economically attractive process to yield isotopes of commercial purity. Our finding opens pathways to both cheaper and less energy-intensive production of stable isotopes (13C, 15N) essential to healthcare and chemistry research, and to an economically viable, disruptive application of electrolysis technologies developed in the context of the sustainability transition.
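The "+18%" figure is a relative change against the natural 13C abundance, which the abstract does not state. Assuming the commonly quoted value of about 1.1%, the arithmetic can be checked as follows; the constant and function name are illustrative.

```python
# Minimal sketch (assumption: natural 13C abundance of about 1.1%, a value
# not given in the abstract; names are hypothetical).
NATURAL_13C = 0.011

def relative_enrichment(enriched: float, baseline: float = NATURAL_13C) -> float:
    """Fractional increase of the 13C fraction relative to the baseline."""
    return enriched / baseline - 1.0

print(f"{relative_enrichment(0.013):+.0%}")  # +18%
```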

7.
ACS Cent Sci ; 9(5): 957-968, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37252348

ABSTRACT

Functionalization of C-H bonds is a key challenge in medicinal chemistry, particularly for fragment-based drug discovery (FBDD) where such transformations require execution in the presence of polar functionality necessary for protein binding. Recent work has shown the effectiveness of Bayesian optimization (BO) for the self-optimization of chemical reactions; however, in all previous cases these algorithmic procedures have started with no prior information about the reaction of interest. In this work, we explore the use of multitask Bayesian optimization (MTBO) in several in silico case studies by leveraging reaction data collected from historical optimization campaigns to accelerate the optimization of new reactions. This methodology was then translated to real-world, medicinal chemistry applications in the yield optimization of several pharmaceutical intermediates using an autonomous flow-based reactor platform. The use of the MTBO algorithm was shown to be successful in determining optimal conditions of unseen experimental C-H activation reactions with differing substrates, demonstrating an efficient optimization strategy with large potential cost reductions when compared to industry-standard process optimization techniques. Our findings highlight the effectiveness of the methodology as an enabling tool in medicinal chemistry workflows, representing a step-change in the utilization of data and machine learning with the goal of accelerated reaction optimization.

8.
Nat Methods ; 20(4): 569-579, 2023 04.
Article in English | MEDLINE | ID: mdl-36997816

ABSTRACT

The ability to quantify structural changes of the endoplasmic reticulum (ER) is crucial for understanding the structure and function of this organelle. However, the rapid movement and complex topology of ER networks make this challenging. Here, we construct a state-of-the-art semantic segmentation method that we call ERnet for the automatic classification of sheet and tubular ER domains inside individual cells. Data are skeletonized and represented by connectivity graphs, enabling precise and efficient quantification of network connectivity. ERnet generates metrics on topology and integrity of ER structures and quantifies structural change in response to genetic or metabolic manipulation. We validate ERnet using data obtained by various ER-imaging methods from different cell types as well as ground truth images of synthetic ER structures. ERnet can be deployed in an automatic high-throughput and unbiased fashion and identifies subtle changes in ER phenotypes that may inform on disease progression and response to therapy.


Subject(s)
Endoplasmic Reticulum , Semantics , Endoplasmic Reticulum/metabolism
9.
J Phys Chem A ; 127(11): 2628-2636, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36916916

ABSTRACT

Computational reaction prediction has become a ubiquitous task in chemistry due to the potential value accurate predictions can bring to chemists. Boronic acids are widely used in industry; however, understanding how to avoid the protodeboronation side reaction remains a challenge. We have developed an algorithm for in silico prediction of the rate of protodeboronation of boronic acids. A general mechanistic model devised through kinetic studies of protodeboronation was found in the literature and forms the foundation on which the algorithm presented in this work is built. Protodeboronation proceeds through 7 distinct pathways, though for any particular boronic acid, only a subset of mechanistic pathways are active. The rate of each active mechanistic pathway is linearly correlated with its characteristic energy difference, which in turn can be determined using Density Functional Theory. We validated the algorithm using leave-one-out cross-validation on a data set of 50 boronic acids and made a further 50 rate predictions on academically and industrially important boronic acids out of sample. We believe this work will provide great assistance to chemists performing reactions that feature boronic acids, such as Suzuki-Miyaura and Chan-Evans-Lam couplings.
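The validation scheme named in this abstract, leave-one-out cross-validation of a linear correlation, can be sketched in a few lines. This is not the authors' algorithm: it fits a single straight line by ordinary least squares, uses synthetic numbers in place of DFT energy differences and measured rates, and the function names `fit_line` and `loocv_rmse` are hypothetical.

```python
# Minimal sketch (assumptions: one linear relationship between an energy
# difference x and a log rate y; data are synthetic; names are hypothetical).
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loocv_rmse(xs, ys):
    """Hold out each point in turn, refit on the rest, and pool the errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

energy_diff = [0.0, 1.0, 2.0, 3.0, 4.0]   # synthetic energy differences
log_rate = [1.1, 2.9, 5.2, 6.8, 9.1]      # synthetic log rates
print(loocv_rmse(energy_diff, log_rate))
```

Because every point is predicted by a model that never saw it, the pooled error gives a less optimistic estimate of out-of-sample accuracy than the fit residuals would.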

10.
Chem Rev ; 123(6): 3089-3126, 2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36820880

ABSTRACT

From the start of a synthetic chemist's training, experiments are conducted based on recipes from textbooks and manuscripts that achieve clean reaction outcomes, allowing the scientist to develop practical skills and some chemical intuition. This procedure is often kept long into a researcher's career, as new recipes are developed based on similar reaction protocols, and intuition-guided deviations are conducted through learning from failed experiments. However, when attempting to understand chemical systems of interest, it has been shown that model-based, algorithm-based, and miniaturized high-throughput techniques outperform human chemical intuition and achieve reaction optimization in a much more time- and material-efficient manner; this is covered in detail in this paper. As many synthetic chemists are not exposed to these techniques in undergraduate teaching, this leads to a disproportionate number of scientists that wish to optimize their reactions but are unable to use these methodologies or are simply unaware of their existence. This review highlights the basics, and the cutting-edge, of modern chemical reaction optimization as well as its relation to process scale-up and can thereby serve as a reference for inspired scientists for each of these techniques, detailing several of their respective applications.

11.
JACS Au ; 2(2): 292-309, 2022 Feb 28.
Article in English | MEDLINE | ID: mdl-35252980

ABSTRACT

High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.

12.
STAR Protoc ; 2(4): 100889, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34723210

ABSTRACT

Recycling of waste CO2 to bulk chemicals has a tremendous potential for the decarbonization of the chemical industry. Quantitative analysis of the prospects of this technology is hindered by the lack of flexible techno-economic assessment (TEA) models that enable evaluation of the processing costs under different deployment scenarios. In this protocol, we explain how to convert literature data into metrics useful for evaluation of the emerging electrolysis technologies, derive TEA models, and illustrate their use with a CO2-to-ethylene example. For complete details on the use and execution of this protocol, please refer to Barecka et al. (2021a).


Subject(s)
Carbon Dioxide , Electrolysis , Recycling , Chemical Industry , Waste Management
13.
Chem Soc Rev ; 50(21): 12013-12036, 2021 Nov 01.
Article in English | MEDLINE | ID: mdl-34520507

ABSTRACT

This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data. The chemical industry requires a transformation towards more sustainable practices, eliminating its dependencies on fossil fuels and limiting its impact on the environment. However, identifying more sustainable process alternatives is, at present, a cumbersome, manual, iterative process, based on chemical intuition and modelling. We give a perspective on methods for automated discovery and assessment of competitive sustainable reaction routes based on renewable or waste feedstocks. Three key areas of transition are outlined and reviewed based on their state-of-the-art as well as bottlenecks: (i) data, (ii) evaluation metrics, and (iii) decision-making. We elucidate their synergies and interfaces since only together these areas can bring about the most benefit. The field of chemical data intelligence offers the opportunity to identify the inherently more sustainable reaction pathways and to identify opportunities for a circular chemical economy. Our review shows that at present the field of data brings about most bottlenecks, such as data completion and data linkage, but also offers the principal opportunity for advancement.

14.
iScience ; 24(6): 102514, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34142030

ABSTRACT

The chemical industry needs to significantly decrease carbon dioxide (CO2) emissions in order to meet the 2050 carbon neutrality goal. Utilization of CO2 as a chemical feedstock for bulk products is a promising way to mitigate industrial emissions; however, CO2-based manufacturing is currently not competitive with the established petrochemical methods and its deployment requires creation of a new value chain. Here, we show that an alternative approach, using CO2 conversion as an add-on to existing manufactures, can disrupt the global carbon cycle while minimally perturbing the operation of chemical plants. Proposed closed-loop on-site CO2 recycling processes are economically viable in the current market and have the potential for rapid introduction in the industries. Retrofit-based CO2 recycling can reduce annually between 4 and 10 Gt CO2 by 2050 and contribute to achieving up to 50% of the industrial carbon neutrality goal.

15.
ACS Sustain Chem Eng ; 9(25): 8642-8652, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-35024250

ABSTRACT

An efficient elevated-pressure catalytic oxidative process (2.5 mol % Co(NO3)2, 2.5 mol % MnBr2, air (30 bar), 125 °C, acetic acid, 6 h) has been developed to oxidize p-cymene into crystalline white terephthalic acid (TA) in ∼70% yield. Use of this mixed Co2+/Mn2+ catalytic system is key to obtaining high 70% yields of TA at relatively low reaction temperatures (125 °C) in short reaction times (6 h), which is likely to be due to the synergistic action of bromine and nitrate radicals in the oxidative process. Recycling studies have demonstrated that the mixed metal catalysts present in recovered mother liquors could be recycled three times in successive p-cymene oxidation reactions with no loss in catalytic activity or TA yield. Partial oxidation of p-cymene to give p-methylacetophenone (p-MA) in 55-60% yield can be achieved using a mixed CoBr2/Mn(OAc)2 catalytic system under 1 atm air for 24 h, while use of Co(NO3)2/MnBr2 under 1 atm O2 for 24 h gave p-toluic acid in 55-60% yield. Therefore, access to these simple catalytic aerobic conditions enables multiple biorenewable bulk terpene feedstocks (e.g., crude sulfate turpentine, turpentine, cineole, and limonene) to be converted into synthetically useful bio-p-MA, bio-p-toluic acid, and bio-TA (and hence bio-polyethylene terephthalate) as part of a terpene based biorefinery.

16.
Beilstein J Org Chem ; 16: 1465-1475, 2020.
Article in English | MEDLINE | ID: mdl-32647548

ABSTRACT

A computational approach has been developed to automatically generate and analyse the structures of the intermediates of palladium-catalysed carbon-hydrogen (C-H) activation reactions as well as to predict the final products. Implemented as a high-performance computing cluster tool, it has been shown to correctly choose the mechanism and rationalise regioselectivity of chosen examples from open literature reports. The developed methodology is capable of predicting reactivity of various substrates by differentiation between two major mechanisms - proton abstraction and electrophilic aromatic substitution. An attempt has been made to predict new C-H activation reactions. This methodology can also be used for the automated reaction planning, as well as a starting point for microkinetic modelling.

17.
Nat Commun ; 9(1): 4913, 2018 11 21.
Article in English | MEDLINE | ID: mdl-30464298

ABSTRACT

Formation mechanisms of two-dimensional nanostructures in wet syntheses are poorly understood. Even more enigmatic is the influence of hydrodynamic forces. Here we use liquid flow cell transmission electron microscopy to show that layered double hydroxide, as a model material, may form via the oriented attachment of hexagonal nanoparticles; under hydrodynamic shear, oriented attachment is accelerated. To hydrodynamically manipulate the kinetics of particle growth and oriented attachment, we develop a microreactor with high and tunable shear rates, enabling control over particle size, crystallinity and aspect ratio. This work offers new insights in the formation of two-dimensional materials, provides a scalable yet precise synthesis method, and proposes new avenues for the rational engineering and scalable production of highly anisotropic nanostructures.

18.
Science ; 360(6390): 707-708, 2018 05 18.
Article in English | MEDLINE | ID: mdl-29773731

Subject(s)
Renewable Energy
19.
Front Chem ; 6: 85, 2018.
Article in English | MEDLINE | ID: mdl-29637068

ABSTRACT

Bimetallic Pd-Au catalysts were prepared on the porous nanocrystalline silicon (PSi) for the first time. The catalysts were tested in the reaction of direct hydrogen peroxide synthesis and characterized by standard structural and chemical techniques. It was shown that the Pd-Au/PSi catalyst prepared from conventional H2[PdCl4] and H[AuCl4] precursors contains monometallic Pd and a range of different Pd-Au alloy nanoparticles over the oxidized PSi surface. The PdAu2/PSi catalyst prepared from the [Pd(NH3)4][AuCl4]2 double complex salt (DCS) single-source precursor predominantly contains bimetallic Pd-Au alloy nanoparticles. For both catalysts the surface of bimetallic nanoparticles is Pd-enriched and contains palladium in Pd0 and Pd2+ states. Among the catalysts studied, the PdAu2/PSi catalyst was the most active and selective in the direct H2O2 synthesis with H2O2 productivity of 0.5 [Formula: see text] at selectivity of 50% and H2O2 concentration of 0.023 M in 0.03 M H2SO4-methanol solution after 5 h on stream at -10°C and atmospheric pressure. This performance is due to high activity in the H2O2 synthesis reaction and low activities in the undesirable H2O2 decomposition and hydrogenation reactions. Good performance of the PdAu2/PSi catalyst was associated with the major part of Pd in the catalyst being in the form of the bimetallic Pd-Au nanoparticles. Porous silicon was concluded to be a promising catalytic support for direct hydrogen peroxide synthesis due to its inertness with respect to undesirable side reactions, high thermal stability, and conductivity, possibility of safe operation at high temperatures and pressures and a well-established manufacturing process.

20.
J Cheminform ; 9(1): 23, 2017 Apr 11.
Article in English | MEDLINE | ID: mdl-29086180

ABSTRACT

The algorithmic, large-scale use and analysis of reaction databases such as Reaxys is currently hindered by the absence of widely adopted standards for publishing reaction data in machine readable formats. Crucial data such as yields of all products or stoichiometry are frequently not explicitly stated in the published papers and, hence, not reported in the database entry for those reactions, limiting their usefulness for algorithmic analysis. This paper presents a possible extension to the IUPAC RInChI standard via an auxiliary layer, termed ProcAuxInfo, which is a standardised, extensible form in which to report certain key reaction parameters such as declaration of all products and reactants as well as auxiliaries known in the reaction, reaction stoichiometry, amounts of substances used, conversion, yield and operating conditions. The standard is demonstrated via creation of the RInChI including the ProcAuxInfo layer based on three published reactions and demonstrates accurate data recoverability via reverse translation of the created strings. Implementation of this or another method of reporting process data by the publishing community would ensure that databases, such as Reaxys, would be able to abstract crucial data for big data analysis of their contents.
