1.
J Chem Inf Model ; 64(8): 3173-3179, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38554112

ABSTRACT

In this work, we propose a versatile molecule and reaction encoding binary data format that aims to bridge the gap between the advantages of SMILES, like local stereo- and implicit hydrogen encoding, and block-structured MDL MOL with a 2D layout and explicit bond encoding, while addressing their respective limitations. Our new format introduces a balance between size efficiency, processing speed, and comprehensive representation, making it well-suited for various applications in cheminformatics, including deep learning, data storage, and searching. By offering an explicit approach to store atom connectivity (including implicit hydrogens), electronic state, stereochemistry, and other crucial molecular attributes, our proposal seeks to enhance data storage efficiency and promote interoperability among different software tools.


Subject(s)
Cheminformatics , Software , Cheminformatics/methods , Molecular Structure
2.
J Org Chem ; 88(16): 11954-11967, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37540578

ABSTRACT

The kinetic data indicate that the addition of tertiary phosphines to α-methylene lactones in acetic acid is strongly accelerated in comparison to the reactions of related open-chain esters. Six-membered α-methylene-δ-valerolactone exhibited a more pronounced rate increase than five-membered α-methylene-γ-butyrolactone. The use of α-methylene-γ-butyrolactam as a nitrogen analogue of α-methylene-γ-butyrolactone resulted in a total loss of the reaction acceleration. The observed reactivities were rationalized by DFT calculations at the RwB97XD/6-31+G(d,p) level of theory, showing that the intramolecular interaction between phosphonium and enolate oxygen centers provided by the locked s-cis-geometry of the heterocycles plays an important role in the stabilization of intermediate zwitterions. The reactivity is also controlled by the conformational flexibility of the heterocycle. The geometries of five-membered and, especially, six-membered lactone cycles are slightly changed upon the nucleophilic attack of phosphine, leading to the stabilizing stereoelectronic effect by the P···O interaction. The addition of phosphine to α-methylene-γ-butyrolactam significantly distorts the initial geometry of the heterocycle, making the nucleophilic attack unfavorable. The application of the stereoelectronic effect to enhance the efficiency of the phosphine-catalyzed Michael and Pudovik reactions of α-methylene lactones was demonstrated.

4.
J Cheminform ; 15(1): 20, 2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36774523

ABSTRACT

Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the opportunity cost of the time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine-tune it to achieve an uncertainty-calibrated global yield prediction model. This model improves upon the state of the art not just through the increase in pre-training data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information such as equivalents and molecule role into the reaction encoding. The model is called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art synthesis-focused BERT, showing a near 20-point improvement in r2 score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an ongoing drug discovery project and showing that it can also be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.

5.
J Chem Inf Model ; 62(14): 3307-3315, 2022 07 25.
Article in English | MEDLINE | ID: mdl-35792579

ABSTRACT

This work introduces GraphormerMapper, a new algorithm for reaction atom-to-atom mapping (AAM) based on a transformer neural network adapted for the direct processing of molecular graphs as sets of atoms and bonds, as opposed to SMILES/SELFIES sequence-based approaches, in combination with the Bidirectional Encoder Representations from Transformers (BERT) network. The graph transformer serves to extract molecular features that are tied to atoms and bonds. The BERT network is used for chemical transformation learning. In a benchmarking study with IBM RxnMapper, which is the best AAM algorithm according to our previous study, we demonstrate that our AAM algorithm is superior to it on our "Golden" benchmarking data set.


Subject(s)
Algorithms , Neural Networks, Computer , Electric Power Supplies
6.
J Chem Inf Model ; 62(9): 2015-2020, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34843251

ABSTRACT

This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.


Subject(s)
Benchmarking , Database Management Systems , Databases, Factual
7.
Mol Inform ; 41(4): e2100138, 2022 04.
Article in English | MEDLINE | ID: mdl-34726834

ABSTRACT

In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper,[5] which implement different AAM algorithms. An open-source RDTool program was optimized, and its modified version ("new RDTool") was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the "AAM fixer" algorithm for automated correction of erroneous mappings. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM Fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.


Subject(s)
Benchmarking , Biochemical Phenomena , Algorithms , Databases, Factual
8.
Molecules ; 28(1)2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36615457

ABSTRACT

Fluorescent derivatives attract the attention of researchers for their use as sensors and photocatalysts and for the creation of functional materials. In order to create amphiphilic fluorescent derivatives of calixarenes, a fluorescein derivative containing oligoethylene glycol and propargyl groups was obtained. The resulting fluorescein derivative was introduced into three different (thia)calix[4]arene azide derivatives. For all synthesized compounds, the luminescence quantum yields have been established in different solvents. Using UV-visible spectroscopy, dynamic light scattering, as well as transmission and confocal microscopy, aggregation of macrocycles was studied. It was found that calixarene derivatives with alkyl substituents form spherical aggregates, while symmetrical tetrafluorescein-containing thiacalix[4]arene forms extended worm-like aggregates. The macrocycle containing tetradecyl fragments was found to be the most efficient in photoredox ipso-oxidation of phenylboronic acid. In addition, it was shown that among several electron donors (NEt3, DABCO and iPr2EtN), the photoredox ipso-oxidation proceeds best with triethylamine. It has been shown that a low molecular weight surfactant Triton-X100 can also improve the photocatalytic abilities of an oligoethylene glycol fluorescein derivative, thus showing the importance of a combination of micellar and photoredox catalysis.


Subject(s)
Calixarenes , Water , Water/chemistry , Phenols/chemistry , Calixarenes/chemistry , Catalysis , Fluoresceins
9.
Molecules ; 26(18)2021 Sep 07.
Article in English | MEDLINE | ID: mdl-34576922

ABSTRACT

A potential hypoxia-sensitive host-guest system of three calixarenes (including two with four anionic carboxyl and sulfonate azo fragments on the upper rim and a newly synthesized bis-azo adduct of calixarene in the cone configuration with azo fragments on the lower rim) with the most widespread cationic and zwitterionic rhodamine dyes (123, 6G and B) was studied using UV-VIS spectrometry and fluorescence as well as 1D and 2D NMR techniques. It was found that all three calixarenes form a complex with rhodamine dyes with a 1:1 composition. The association constants of calixarene-dye complexes with sulfonate calixarenes, especially in the case of tetra-anionic calixarene, turned out to be higher compared with carboxyl calixarene due to the more intense electrostatic interactions. For the first time using an HRESI MS technique, it was shown that the treatment of rhodamine 6G and 123 with sodium dithionite (SDT) produces a non-fluorescent leuco form of the dye, and only rhodamine B can be used with SDT without the occurrence of a side reduction. Moreover, it was identified that in addition to the reduction of the azo groups, SDT causes partial cleavage of the aryl ether bonds. These features of SDT should be taken into account when SDT is used as an azoreductase mimic.

10.
J Chem Inf Model ; 61(10): 4913-4923, 2021 10 25.
Article in English | MEDLINE | ID: mdl-34554736

ABSTRACT

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach, which considers multiple conformations in model training, could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.


Subject(s)
Algorithms , Quantitative Structure-Activity Relationship , Databases, Factual , Drug Discovery , Molecular Conformation
11.
Mol Inform ; 40(12): e2100119, 2021 12.
Article in English | MEDLINE | ID: mdl-34427989

ABSTRACT

The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRtools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).


Subject(s)
Data Curation , Databases, Factual , Reference Standards
12.
Int J Mol Sci ; 22(7)2021 Mar 29.
Article in English | MEDLINE | ID: mdl-33805474

ABSTRACT

Understanding the interaction of ions with organic receptors in confined space is of fundamental importance and could advance nanoelectronics and sensor design. In this work, metal ion complexation of conformationally varied thiacalix[4]monocrowns bearing lower-rim hydroxy (type I), dodecyloxy (type II), or methoxy (type III) fragments was evaluated. At the liquid-liquid interface, alkylated thiacalixcrowns-5(6) selectively extract alkali metal ions according to the induced-fit concept, whereas crown-4 receptors were ineffective due to distortion of the crown-ether cavity, as predicted by quantum-chemical calculations. In type-I ligands, alkali-metal ion extraction by the solvent-accessible crown-ether cavity was prevented, which resulted in competitive Ag+ extraction by sulfide bridges. Surprisingly, amphiphilic type-I/II conjugates moderately extracted other metal ions, which was attributed to calixarene aggregation in salt aqueous phase and supported by dynamic light scattering measurements. Cation-monolayer interactions at the air-water interface were monitored by surface pressure/potential measurements and UV/visible reflection-absorption spectroscopy. Topology-varied selectivity was evidenced, towards Sr2+ (crown-4), K+ (crown-5), and Ag+ (crown-6) in type-I receptors and Na+ (crown-4), Ca2+ (crown-5), and Cs+ (crown-6) in type-II receptors. Nuclear magnetic resonance and electronic absorption spectroscopy revealed exocyclic coordination in type-I ligands and cation-π interactions in type-II ligands.


Subject(s)
Coordination Complexes/chemistry , Crown Ethers/chemistry , Ions/metabolism , Phenols/chemistry , Sulfides/chemistry , Air , Alkylation , Calcium/metabolism , Coordination Complexes/metabolism , Crown Ethers/chemical synthesis , Crown Ethers/metabolism , Dynamic Light Scattering , Ions/chemistry , Liquid-Liquid Extraction , Magnetic Resonance Spectroscopy , Metals/chemistry , Molecular Conformation , Phenols/metabolism , Solvents/chemistry , Spectrophotometry, Ultraviolet , Sulfides/metabolism , Water/chemistry
13.
Sci Rep ; 11(1): 3178, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33542271

ABSTRACT

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, thereby extending the synthetic scope of popular synthetic pathways.

14.
J Chem Inf Model ; 61(2): 554-559, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33502186

ABSTRACT

Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines the relational database CGR DB, which handles compounds and reactions as molecular graphs, with a graph database architecture for pathway analysis by graph algorithms. The original condensed graph of reaction technology is used to store any chemical reaction as a single graph.


Subject(s)
Algorithms , Database Management Systems , Databases, Factual
15.
Int J Mol Sci ; 23(1)2021 Dec 27.
Article in English | MEDLINE | ID: mdl-35008674

ABSTRACT

The selection of experimental conditions leading to a reasonable yield is an important and essential element for the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model, an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors and a recurrent artificial neural network, in predicting condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking Model, trained on a hydrogenation reactions dataset (~42,000 reactions) from the Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.


Subject(s)
Models, Chemical , Hydrogenation , Likelihood Functions , Stereoisomerism
16.
Int J Mol Sci ; 21(15)2020 Aug 03.
Article in English | MEDLINE | ID: mdl-32756326

ABSTRACT

Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many AD definitions for the models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers was tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.


Subject(s)
Cheminformatics/trends , Protein Domains , Quantitative Structure-Activity Relationship , Thermodynamics , Chemical Phenomena , Kinetics , Models, Molecular
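The abstract above does not name the individual AD definitions it benchmarks, so the following is only a minimal sketch of one widely used definition, the descriptor bounding box: a new object is inside the domain if every descriptor value lies within the range seen in the training set. The function names and the toy descriptor vectors are hypothetical and are not taken from the paper.

```python
def fit_bounding_box(train_descriptors):
    """Return per-descriptor (minima, maxima) observed in the training set."""
    columns = list(zip(*train_descriptors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return lows, highs


def in_domain(box, descriptors):
    """True if every descriptor value falls inside the training range."""
    lows, highs = box
    return all(lo <= value <= hi
               for lo, value, hi in zip(lows, descriptors, highs))


# Hypothetical two-descriptor training set (illustration only).
box = fit_bounding_box([[0.0, 1.0], [2.0, 3.0]])
inside = in_domain(box, [1.0, 2.0])    # within both ranges
outside = in_domain(box, [3.0, 2.0])   # first descriptor out of range
```

For reactions, the descriptor vector would have to encode the transformation and, possibly, the conditions; how that is done is outside the scope of this sketch.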
17.
J Chem Inf Model ; 59(11): 4569-4576, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31638794

ABSTRACT

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.


Subject(s)
Organic Chemicals/chemistry , Pharmaceutical Preparations/chemistry , Acids/chemistry , Algorithms , Drug Discovery , Models, Chemical , Molecular Structure , Neural Networks, Computer , Quantitative Structure-Activity Relationship , Solvents/chemistry , Stereoisomerism
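The relationship between logKT and pKa described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the two tautomers T1 and T2 are assumed to share a common conjugate base, KT is taken as [T2]/[T1] (so that logKT = pKa(T2) - pKa(T1)), and a single linear pKa model with hypothetical weights stands in for the ridge regression. Because logKT is computed from the same pKa model, the relationship holds by construction.

```python
def predict_pka(weights, bias, descriptors):
    """Linear pKa model: bias plus the weighted sum of descriptors."""
    return bias + sum(w * x for w, x in zip(weights, descriptors))


def predict_log_kt(weights, bias, desc_t1, desc_t2):
    """logKT for T1 <-> T2, assuming KT = [T2]/[T1] and a shared conjugate base.

    With that convention, logKT = pKa(T2) - pKa(T1).
    """
    pka_t1 = predict_pka(weights, bias, desc_t1)
    pka_t2 = predict_pka(weights, bias, desc_t2)
    return pka_t2 - pka_t1


# Hypothetical weights and descriptor vectors (illustration only).
weights, bias = [0.5, -1.0], 7.0
log_kt = predict_log_kt(weights, bias, [1.0, 2.0], [3.0, 1.0])
```

For a linear model the bias cancels, so logKT depends only on the difference between the two descriptor vectors.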
18.
J Chem Inf Model ; 59(6): 2516-2521, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31063394

ABSTRACT

CGRtools is an open-source Python library for handling molecular and reaction information. It is the sole library developed so far that can handle condensed graphs of reaction (CGR). CGR provides the possibility for advanced operations with reaction information and could be used for reaction descriptor calculation, structure-reactivity modeling, atom-to-atom mapping comparison and correction, reaction center extraction, reaction balancing, and some other related tasks. Unlike other popular libraries, CGRtools is fully written in Python, has only minor dependencies on other libraries, and is cross-platform. Reaction, molecule, and CGR objects in CGRtools support native Python methods and are comparable with the help of operations "equal to", "less than", and "bigger than". CGRtools supports common structural formats. CGRtools is distributed under the LGPL license and is available at https://github.com/cimm-kzn/CGRtools.


Subject(s)
Cheminformatics/methods , Small Molecule Libraries/chemistry , Software , Chemical Phenomena , Models, Chemical
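The CGR idea mentioned in the abstract can be illustrated in plain Python. This sketch deliberately does not use the CGRtools API; the function names and the dictionary-based bond tables are hypothetical. Bonds are keyed by pairs of atom-map numbers, each CGR edge stores the bond order before and after the reaction, and the edges whose order changes form the reaction center.

```python
def condense(reactant_bonds, product_bonds):
    """Merge two bond tables into CGR edges: pair -> (order_before, order_after).

    Keys are frozensets of two atom-map numbers; None means "no bond".
    """
    pairs = reactant_bonds.keys() | product_bonds.keys()
    return {pair: (reactant_bonds.get(pair), product_bonds.get(pair))
            for pair in pairs}


def reaction_center(cgr):
    """Return the atom pairs whose bond order changes (the dynamic bonds)."""
    return {pair for pair, (before, after) in cgr.items() if before != after}


# Bromoethane + hydroxide -> ethanol + bromide, heavy atoms only.
# Atom maps (hypothetical): 1 = CH2, 2 = Br, 3 = O, 4 = CH3.
reactants = {frozenset({1, 2}): 1, frozenset({1, 4}): 1}
products = {frozenset({1, 3}): 1, frozenset({1, 4}): 1}

cgr = condense(reactants, products)
center = reaction_center(cgr)  # the broken C-Br and the formed C-O bond
```

The unchanged C-C bond is kept in the CGR as a conventional edge, while the two changed pairs carry the information about the transformation.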
19.
Mol Inform ; 38(1-2): e1800077, 2019 01.
Article in English | MEDLINE | ID: mdl-30134047

ABSTRACT

This paper reports SVR (Support Vector Regression) and GTM (Generative Topographic Mapping) modeling of three kinetic properties of cycloaddition reactions: rate constant (logk), activation energy (Ea) and pre-exponential factor (logA). A data set of 1849 reactions, comprising (4+2), (3+2) and (2+2) cycloadditions (CA), was studied in different solvents and at different temperatures. The reactions were encoded by the ISIDA fragment descriptors generated for Condensed Graph of Reaction (CGR). For a given reaction, a CGR condenses structures of all the reactants and products into one single molecular graph, described both by conventional chemical bonds and "dynamical" bonds characterizing chemical transformations. Different scenarios of logk assessment were exploited: direct modeling, application of the Arrhenius equation and temperature-scaled GTM landscapes. The logk models with optimal cross-validated statistics (Q2 = 0.78-0.94, RMSE = 0.45-0.86) have been challenged to predict rates for the external test set of 200 reactions, comprising both reactions that were not present in the training set, and training set transformations performed under different reaction conditions. The models are freely available on our web-server: http://cimm.kpfu.ru/models.


Subject(s)
Cycloaddition Reaction/methods , Models, Chemical , Kinetics
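One of the scenarios listed in the abstract obtains logk through the Arrhenius equation. A minimal sketch of that calculation, assuming decadic logarithms, Ea in J/mol and T in kelvin (the abstract does not state the units), is given below; the function name and the example values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def log10_rate_constant(log10_a, ea_j_per_mol, temperature_k):
    """Arrhenius equation in decadic-log form.

    log k = log A - Ea / (ln(10) * R * T)
    """
    return log10_a - ea_j_per_mol / (math.log(10) * R * temperature_k)


# Hypothetical values: logA = 10, Ea = 80 kJ/mol, T = 298.15 K.
logk = log10_rate_constant(10.0, 80_000.0, 298.15)
```

In such a scheme logA and Ea would come from separate models, and the temperature would enter only through this equation rather than as a descriptor.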
20.
Mol Inform ; 38(4): e1800104, 2019 04.
Article in English | MEDLINE | ID: mdl-30468317

ABSTRACT

Here, we report the data visualization, analysis and modeling for a large set of 4830 SN2 reactions whose rate constants (logk) were measured under different experimental conditions (solvent, temperature). The reactions were encoded by a single molecular graph, the Condensed Graph of Reaction, which allowed us to use conventional chemoinformatics techniques developed for individual molecules. Thus, the Matched Reaction Pairs approach was suggested and used for the analysis of substituent effects on the reactivity of substrates and nucleophiles. The data were visualized with the help of the Generative Topographic Mapping approach. A consensus Support Vector Regression (SVR) model for the rate constant was prepared. Unbiased estimation of the model's performance was made in cross-validation on reactions measured on unique structural transformations. The model's performance in cross-validation (RMSE = 0.61 logk units) and on the external test set (RMSE = 0.80) is close to the noise in data. Performances of the local models obtained for selected subsets of reactions proceeding in particular solvents or with a particular type of nucleophile were similar to that of the model built on the entire set. Finally, four different definitions of the model's applicability domain for reactions were examined.


Subject(s)
Models, Chemical , Support Vector Machine , Hydrocarbons, Cyclic/chemistry , Kinetics , Oxidation-Reduction