Search | VHL Regional Portal

1.

Improving chemical reaction yield prediction using pre-trained graph neural networks.

Han, Jongmin; Kwon, Youngchun; Choi, Youn-Suk; Kang, Seokho.

J Cheminform ; 16(1): 25, 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38429787

ABSTRACT

Graph neural networks (GNNs) have proven to be effective in the prediction of chemical reaction yields. However, their performance tends to deteriorate when they are trained using an insufficient training dataset in terms of quantity or diversity. A promising solution to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training in chemical reaction yield prediction. We present a novel GNN pre-training method for performance improvement. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pre-text task by assigning a vector of principal component scores as the pseudo-label to each molecule in the database. A GNN is then pre-trained to perform the pre-text task of predicting the pseudo-label for the input molecule. For chemical reaction yield prediction, a prediction model is initialized using the pre-trained GNN and then fine-tuned with the training dataset containing chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
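
The pseudo-label construction described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the descriptor matrix is random toy data, the function name pca_pseudo_labels is hypothetical, and standardizing the descriptors before PCA is an assumption the abstract does not confirm.

```python
import numpy as np

def pca_pseudo_labels(descriptors, n_components):
    """Return principal component scores, one row per molecule."""
    X = np.asarray(descriptors, dtype=float)
    # Assumption: standardize each descriptor column before PCA.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    # Project onto the leading axes to obtain the score vectors.
    return Z @ Vt[:n_components].T

# Toy stand-in for a descriptor table: 100 molecules, 8 descriptors.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 8))
pseudo_labels = pca_pseudo_labels(descriptors, n_components=3)
```

Each row of pseudo_labels would then serve as the regression target for one molecule during pre-training.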

2.

Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network.

Kwon, Youngchun; Kwon, Hyukju; Han, Jongmin; Kang, Myeonginn; Kim, Ji-Yeong; Shin, Dongyeeb; Choi, Youn-Suk; Kang, Seokho.

Anal Chem ; 95(47): 17273-17283, 2023 Nov 28.

Article in English | MEDLINE | ID: mdl-37955847

ABSTRACT

Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.

3.

AI-driven robotic chemist for autonomous synthesis of organic molecules.

Ha, Taesin; Lee, Dongseon; Kwon, Youngchun; Park, Min Sik; Lee, Sangyoon; Jang, Jaejun; Choi, Byungkwon; Jeon, Hyunjeong; Kim, Jeonghun; Choi, Hyundo; Seo, Hyung-Tae; Choi, Wonje; Hong, Wooram; Park, Young Jin; Jang, Junwon; Cho, Joonkee; Kim, Bosung; Kwon, Hyukju; Kim, Gahee; Oh, Won Seok; Kim, Jin Woo; Choi, Joonhyuk; Min, Minsik; Jeon, Aram; Jung, Yongsik; Kim, Eunji; Lee, Hyosug; Choi, Youn-Suk.

Sci Adv ; 9(44): eadj0461, 2023 Nov 03.

Article in English | MEDLINE | ID: mdl-37910607

ABSTRACT

The automation of organic compound synthesis is pivotal for expediting the development of such compounds. In addition, enhancing development efficiency can be achieved by incorporating autonomous functions alongside automation. To achieve this, we developed an autonomous synthesis robot that harnesses the power of artificial intelligence (AI) and robotic technology to establish optimal synthetic recipes. Given a target molecule, our AI initially plans synthetic pathways and defines reaction conditions. It then iteratively refines these plans using feedback from the experimental robot, gradually optimizing the recipe. The system performance was validated by successfully determining synthetic recipes for three organic compounds, yielding conversion rates that outperform existing references. Notably, this autonomous system is designed around batch reactors, making it accessible and valuable to chemists in standard laboratory settings, thereby streamlining research endeavors.

4.

Exploring Optimal Reaction Conditions Guided by Graph Neural Networks and Bayesian Optimization.

Kwon, Youngchun; Lee, Dongseon; Kim, Jin Woo; Choi, Youn-Suk; Kim, Sun.

ACS Omega ; 7(49): 44939-44950, 2022 Dec 13.

Article in English | MEDLINE | ID: mdl-36530311

ABSTRACT

The optimization of organic reaction conditions to obtain the target product in high yield is crucial to avoid expensive and time-consuming chemical experiments. Advancements in artificial intelligence have enabled various data-driven approaches to predict suitable chemical reaction conditions. However, for many novel syntheses, a search for good reaction conditions is unavoidable. Bayesian optimization (BO), an iterative optimization algorithm, demonstrates exceptional performance in identifying reagents compared to synthesis experts. However, BO requires several initial randomly selected experimental results (yields) to train a surrogate model (approximately 10 experimental trials). Parts of this process are inefficient, much like the cold-start problem in recommender systems. Here, we present an efficient optimization algorithm to determine suitable conditions based on BO that is guided by a graph neural network (GNN) trained on data from a million organic synthesis experiments. The proposed method determined high-yield reaction conditions 8.0 and 8.7% faster than state-of-the-art algorithms and 50 human experts, respectively. In 22 additional optimization tests, the proposed method needed 4.7 trials on average to find conditions with a higher yield than those recommended by five synthesis experts. The proposed method assumes that a reaction dataset is available for training the GNN.

5.

Generative Modeling to Predict Multiple Suitable Conditions for Chemical Reactions.

Kwon, Youngchun; Kim, Sun; Choi, Youn-Suk; Kang, Seokho.

J Chem Inf Model ; 62(23): 5952-5960, 2022 Dec 12.

Article in English | MEDLINE | ID: mdl-36413480

ABSTRACT

In synthesis planning, it is important to determine suitable reaction conditions such that a chemical reaction proceeds as intended. Recent research attempts based on machine learning have proven to be effective in recommending reaction elements for specific categories regarding critical chemical context and operating conditions. However, existing methods can only make a single prediction per reaction and do not directly provide a complete specification of the reaction elements as the prediction. Therefore, their achievable performance is limited. In this study, we propose a generative modeling approach to predict multiple different reaction conditions for a chemical reaction, each of which fully specifies critical reaction elements such that these elements can be directly used as a feasible reaction condition. We formulate the problem of predicting reaction conditions as sampling from a generative distribution. We model the distribution by introducing a variational autoencoder augmented with a graph neural network and learn it from a reaction dataset. For a query reaction, multiple predictions can be obtained by repeated sampling from the distribution. Through experimental investigation on the reaction datasets of four major types of cross-coupling reactions, we demonstrate that the proposed method significantly outperforms existing methods in retrieving ground-truth reaction conditions.

Subject(s)

Machine Learning , Neural Networks, Computer

6.

Scalable graph neural network for NMR chemical shift prediction.

Han, Jongmin; Kang, Hyungu; Kang, Seokho; Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk.

Phys Chem Chem Phys ; 24(43): 26870-26878, 2022 Nov 09.

Article in English | MEDLINE | ID: mdl-36317530

ABSTRACT

Graph neural networks (GNNs) have been proven effective in the fast and accurate prediction of nuclear magnetic resonance (NMR) chemical shifts of a molecule. Existing methods, despite their effectiveness, suffer from high space complexity and are therefore limited to relatively small molecules. In this work, we propose a scalable GNN for NMR chemical shift prediction. To reduce the space complexity, we sparsify the graph representation of a molecule by regarding only heavy atoms as nodes and their chemical bonds as edges. To better learn from the sparsified graph representation, we improve the message passing and readout functions in the GNN. For the message passing function, we adapt the attention mechanism and residual connection to better capture local information around each node. For the readout function, we use both node-level and graph-level embeddings as the local and global information to better predict node-level chemical shifts. Through the experimental investigation using 13C and 1H NMR datasets, we demonstrate that the proposed method yields higher prediction accuracy and is more scalable to large molecules having many heavy atoms.

Subject(s)

Magnetic Resonance Imaging , Neural Networks, Computer , Magnetic Resonance Spectroscopy

7.

Uncertainty-aware prediction of chemical reaction yields with graph neural networks.

Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk; Kang, Seokho.

J Cheminform ; 14(1): 2, 2022 Jan 10.

Article in English | MEDLINE | ID: mdl-35012654

ABSTRACT

In this paper, we present a data-driven method for the uncertainty-aware prediction of chemical reaction yields. The reactants and products in a chemical reaction are represented as a set of molecular graphs. The predictive distribution of the yield is modeled as a graph neural network that directly processes a set of graphs with permutation invariance. Uncertainty-aware learning and inference are applied to the model to make accurate predictions and to evaluate their uncertainty. We demonstrate the effectiveness of the proposed method on benchmark datasets with various settings. Compared to the existing methods, the proposed method improves the prediction and uncertainty quantification performance in most settings.
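
One simple way to process a set of graphs with permutation invariance, as this abstract requires, is to sum the per-graph embeddings before the prediction head. The sketch below assumes sum-pooling and uses made-up embedding vectors; the abstract does not state which aggregation the authors use, and pool_reaction is a hypothetical name.

```python
import math

def pool_reaction(graph_embeddings):
    """Sum-pool a collection of equal-length embedding vectors."""
    if not graph_embeddings:
        raise ValueError("need at least one graph embedding")
    # math.fsum is exactly rounded, so the result does not depend on
    # the order in which the molecules are listed.
    return [math.fsum(column) for column in zip(*graph_embeddings)]

# Toy embeddings standing in for the GNN outputs of three molecules.
reactant_a = [0.1, 0.7, -0.3]
reactant_b = [0.2, -0.4, 0.9]
reactant_c = [1.5, 0.0, 0.25]
pooled = pool_reaction([reactant_a, reactant_b, reactant_c])
```

Reordering the molecules leaves pooled unchanged, so a model built on top of it cannot depend on the order in which a reaction's molecules are written.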

8.

Data undersampling models for the efficient rule-based retrosynthetic planning.

Park, Min Sik; Lee, Dongseon; Kwon, Youngchun; Kim, Eunji; Choi, Youn-Suk.

Phys Chem Chem Phys ; 23(46): 26510-26518, 2021 Dec 01.

Article in English | MEDLINE | ID: mdl-34807202

ABSTRACT

Computer-aided retrosynthetic planning for organic molecules, which is based on a large synthetic database, is a significant part of the recent development of autonomous robotic chemists. As in other AI fields, however, the class imbalance problem in the dataset affects the prediction performance of retrosynthetic paths. Here, we demonstrate that applying undersampling models to the imbalanced reaction dataset can improve the prediction of retrosynthetic templates for target molecules. We report improvements in the top-1 and top-10 prediction accuracies by 13.8% (13.1, 5.4%) and 8.8% (6.9, 2.4%) for undersampling based on the similarity (random, dissimilarity) clustering of molecular structures of products, respectively. These results demonstrate the importance of deep understanding of the statistical distribution, internal structure, and sampling for the training dataset. For practical applications, the target-oriented undersampling model is proposed and confirmed by the improved prediction performance of 9.3 and 4.2% for the top-1 and top-10 accuracies, respectively.
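
The random variant of undersampling mentioned in this abstract can be sketched as below. This is a minimal illustration, not the authors' code: the record format, the template labels, and the max_per_class cap are all hypothetical, and the similarity and dissimilarity clustering variants are not shown.

```python
import random
from collections import defaultdict

def undersample(records, max_per_class, seed=0):
    """Keep at most max_per_class randomly chosen records per class label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in records:
        by_class[label].append((item, label))
    sampled = []
    for label in sorted(by_class):
        group = by_class[label]
        # Only over-represented classes are reduced; rare ones are kept whole.
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)
        sampled.extend(group)
    return sampled

# Toy imbalanced dataset: 50 reactions for one template, 5 for another.
records = [(f"rxn{i}", "template_A") for i in range(50)]
records += [(f"rxn{i}", "template_B") for i in range(50, 55)]
balanced = undersample(records, max_per_class=10)
```

Capping only the majority classes keeps every example of the rare templates while limiting how strongly frequent templates dominate training.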

9.

Molecular search by NMR spectrum based on evaluation of matching between spectrum and molecule.

Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk; Kang, Seokho.

Sci Rep ; 11(1): 20998, 2021 Oct 25.

Article in English | MEDLINE | ID: mdl-34697368

ABSTRACT

Inferring molecular structures from experimentally measured nuclear magnetic resonance (NMR) spectra is an important task in many chemistry applications. Herein, we present a novel method implementing an automated molecular search by NMR spectrum. Given a query spectrum and a pool of candidate molecules, the matching score of each candidate molecule with respect to the query spectrum is evaluated by introducing a molecule-to-spectrum estimation procedure. The candidate molecule with the highest matching score is selected. This procedure does not require any prior knowledge of the corresponding molecular structure nor laborious manual efforts by chemists. We demonstrate the effectiveness of the proposed method on molecular search using 13C NMR spectra.

10.

Evolutionary design of molecules based on deep learning and a genetic algorithm.

Kwon, Youngchun; Kang, Seokho; Choi, Youn-Suk; Kim, Inkoo.

Sci Rep ; 11(1): 17304, 2021 Aug 27.

Article in English | MEDLINE | ID: mdl-34453086

ABSTRACT

Evolutionary design has gained significant attention as a useful tool to accelerate the design process by automatically modifying molecular structures to obtain molecules with the target properties. However, its methodology presents a practical challenge: devising a way in which to rapidly evolve molecules while maintaining their chemical validity. In this study, we address this limitation by developing an evolutionary design method. The method employs deep learning models to extract the inherent knowledge from a database of materials and is used to effectively guide the evolutionary design. In the proposed method, the Morgan fingerprint vectors of seed molecules are evolved using the techniques of mutation and crossover within the genetic algorithm. Then, a recurrent neural network is used to reconstruct the final fingerprints into actual molecular structures while maintaining their chemical validity. The use of deep neural network models to predict the properties of these molecules enabled more versatile and efficient molecular evaluations to be conducted by using the proposed method repeatedly. Four design tasks were performed to modify the light-absorbing wavelengths of organic molecules from the PubChem library.
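
The mutation and crossover steps on fingerprint vectors might look like the following sketch. It assumes uniform crossover and independent bit-flip mutation on short toy bit vectors; the abstract does not specify the operators or rates, and real Morgan fingerprints are far longer.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each bit is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]

# Toy 8-bit fingerprints standing in for two seed molecules.
rng = random.Random(0)
seed_a = [1, 0, 1, 1, 0, 0, 1, 0]
seed_b = [0, 0, 1, 0, 1, 0, 1, 1]
child = mutate(crossover(seed_a, seed_b, rng), rate=0.1, rng=rng)
```

In the method described above, an evolved vector such as child would still need to be decoded into a molecular structure, which the abstract assigns to a recurrent neural network.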

11.

Valid, Plausible, and Diverse Retrosynthesis Using Tied Two-Way Transformers with Latent Variables.

Kim, Eunji; Lee, Dongseon; Kwon, Youngchun; Park, Min Sik; Choi, Youn-Suk.

J Chem Inf Model ; 61(1): 123-133, 2021 Jan 25.

Article in English | MEDLINE | ID: mdl-33410697

ABSTRACT

Retrosynthesis is an essential task in organic chemistry for identifying the synthesis pathways of newly discovered materials. With the recent advances in deep learning, there have been growing attempts to solve the retrosynthesis problem by converting it into a machine translation problem and applying transformer models, which are the state of the art in neural machine translation. However, the pure transformer provides unsatisfactory results that lack grammatical validity, chemical plausibility, and diversity in reactant candidates. In this study, we develop tied two-way transformers with latent modeling to solve those problems using cycle consistency checks, parameter sharing, and multinomial latent variables. Experimental results obtained using public and in-house datasets demonstrate that the proposed model improves retrosynthesis accuracy and diversity and reduces grammatical errors, and qualitative evaluation results verify its ability to suggest valid and plausible results.

Subject(s)

Chemistry, Organic , Neural Networks, Computer

12.

Predictive Modeling of NMR Chemical Shifts without Using Atomic-Level Annotations.

Kang, Seokho; Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk.

J Chem Inf Model ; 60(8): 3765-3769, 2020 Aug 24.

Article in English | MEDLINE | ID: mdl-32692561

ABSTRACT

Recently, machine learning has been successfully applied to the prediction of nuclear magnetic resonance (NMR) chemical shifts. To build a prediction model, the existing methods require a training data set that comprises molecules whose NMR-active atoms are annotated with their chemical shifts. However, the laborious task of atomic-level annotation must be manually conducted by chemists. Thus, it becomes difficult to perform large-scale annotation. To address this issue, we propose a weakly supervised learning method to enable the predictive modeling of NMR chemical shifts without requiring explicit atomic-level annotations in the training data set. For the training data set, the proposed method only requires the annotation of chemical shifts at the molecular level. As a prediction model, we build a message passing neural network (MPNN) that predicts the chemical shifts of individual NMR-active atoms in a molecule. Using a loss function that is invariant to the permutation of atoms in a molecule, the model is trained in a weakly supervised manner to minimize the molecular-level difference between a set of predicted chemical shifts and the corresponding set of actual chemical shifts across the training data set. Accordingly, during the training, the chemical shifts predicted by the model are approximately aligned with the actual chemical shifts in a data-driven fashion. The proposed method performs comparably to the existing fully supervised methods in terms of predicting the chemical shifts of 1H and 13C NMR spectra for small molecules.

Subject(s)

Magnetic Resonance Imaging , Neural Networks, Computer , Magnetic Resonance Spectroscopy
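
The permutation-invariant loss described in the abstract above can be illustrated with a small sketch. It assumes a mean absolute error between sorted shift values; for scalar values, pairing the two sets in sorted order gives the minimum total absolute difference over all one-to-one matchings. The function name set_loss and the choice of absolute error are assumptions, not details confirmed by the abstract.

```python
def set_loss(predicted_shifts, actual_shifts):
    """Mean absolute error between two equal-size multisets of shifts."""
    if len(predicted_shifts) != len(actual_shifts):
        raise ValueError("both sets must contain the same number of shifts")
    if not predicted_shifts:
        raise ValueError("need at least one chemical shift")
    # Sorting both sides aligns predictions with targets without
    # needing to know which atom each measured shift belongs to.
    pairs = zip(sorted(predicted_shifts), sorted(actual_shifts))
    return sum(abs(p - a) for p, a in pairs) / len(predicted_shifts)

# Per-atom predictions compared with an unordered, molecule-level shift list.
loss = set_loss([7.2, 1.1, 3.4], [1.0, 3.0, 7.0])
```

Because only the multiset of shifts enters the loss, the training data need molecular-level annotation only, which matches the weak supervision setting described in the abstract.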

13.

Neural Message Passing for NMR Chemical Shift Prediction.

Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk; Kang, Myeonginn; Kang, Seokho.

J Chem Inf Model ; 60(4): 2024-2030, 2020 Apr 27.

Article in English | MEDLINE | ID: mdl-32250618

ABSTRACT

Fast and accurate prediction of NMR spectra enables automatic structure validation and elucidation of molecules on a large scale. In this Article, we propose an improved method of learning from an NMR database to predict the chemical shifts of NMR-active atoms of a new molecule. For this purpose, we use a message passing neural network that operates on the graph representation of a molecule. The compactness and informativeness of the graph representation are enhanced by treating hydrogen atoms implicitly and incorporating various node and edge features. Experimental investigation demonstrates that the proposed method achieves higher prediction performance for the chemical shifts in the 1H NMR and 13C NMR spectra of small molecules. We apply this method to determine the correct molecular structure for a new NMR spectrum by searching from a set of candidate molecules.

Subject(s)

Magnetic Resonance Imaging , Neural Networks, Computer , Databases, Factual , Magnetic Resonance Spectroscopy , Molecular Structure

14.

Compressed graph representation for scalable molecular graph generation.

Kwon, Youngchun; Lee, Dongseon; Choi, Youn-Suk; Shin, Kyoham; Kang, Seokho.

J Cheminform ; 12(1): 58, 2020 Sep 23.

Article in English | MEDLINE | ID: mdl-33431050

ABSTRACT

Recently, deep learning has been successfully applied to molecular graph generation. Nevertheless, mitigating the computational complexity, which increases with the number of nodes in a graph, has been a major challenge. This has hindered the application of deep learning-based molecular graph generation to large molecules with many heavy atoms. In this study, we present a molecular graph compression method to alleviate the complexity while maintaining the capability of generating chemically valid and diverse molecular graphs. We designate six small substructural patterns that are prevalent between two atoms in real-world molecules. These relevant substructures in a molecular graph are then converted to edges by regarding them as additional edge features along with the bond types. This reduces the number of nodes significantly without any information loss. Consequently, a generative model can be constructed in a more efficient and scalable manner with large molecules on a compressed graph representation. We demonstrate the effectiveness of the proposed method for molecules with up to 88 heavy atoms using the GuacaMol benchmark.

15.

Efficient learning of non-autoregressive graph variational autoencoders for molecular graph generation.

Kwon, Youngchun; Yoo, Jiho; Choi, Youn-Suk; Son, Won-Joon; Lee, Dongseon; Kang, Seokho.

J Cheminform ; 11(1): 70, 2019 Nov 21.

Article in English | MEDLINE | ID: mdl-33430985

ABSTRACT

With the advancements in deep learning, deep generative models combined with graph neural networks have been successfully employed for data-driven molecular graph generation. Early methods based on the non-autoregressive approach have been effective in generating molecular graphs quickly and efficiently but have suffered from low performance. In this paper, we present an improved learning method involving a graph variational autoencoder for efficient molecular graph generation in a non-autoregressive manner. We introduce three additional learning objectives and incorporate them into the training of the model: approximate graph matching, reinforcement learning, and auxiliary property prediction. We demonstrate the effectiveness of the proposed method by evaluating it for molecular graph generation tasks using QM9 and ZINC datasets. The model generates molecular graphs with high chemical validity and diversity compared with existing non-autoregressive methods. It can also conditionally generate molecular graphs satisfying various target conditions.
