Results 1 - 20 of 33
1.
J Chem Inf Model ; 63(21): 6629-6641, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37902548

ABSTRACT

Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts' enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.


Subject(s)
Catalysis
2.
Mol Inform ; 42(10): e2200275, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37488968

ABSTRACT

Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant $\log k$, pre-exponential factor $\log A$, and activation energy $E_{\rm a}$. They were benchmarked against single-task (individual and equation-based models) and multi-task models. In individual models, all characteristics were modeled separately, while in multi-task models $\log k$, $\log A$ and $E_{\rm a}$ were treated cooperatively. An equation-based model assessed $\log k$ using the Arrhenius equation and $\log A$ and $E_{\rm a}$ values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants can hardly be measured experimentally. Also, in the case of small training sets conjugated models are more robust than related single-task approaches.
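
The equation-based baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes base-10 logarithms, an activation energy in kJ/mol, and a temperature in kelvin; the function name and the example values are hypothetical and are not taken from the paper.

```python
import math

# Gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R_KJ_PER_MOL_K = 8.314462618e-3


def log_k_arrhenius(log_a: float, e_a_kj_mol: float, temperature_k: float) -> float:
    """Return log10(k) from log10(A) and Ea using k = A * exp(-Ea / (R * T))."""
    return log_a - e_a_kj_mol / (R_KJ_PER_MOL_K * temperature_k * math.log(10))


# Hypothetical predicted values standing in for the outputs of individual models.
example = log_k_arrhenius(log_a=10.0, e_a_kj_mol=80.0, temperature_k=298.15)
print(round(example, 3))
```

Because the temperature enters only through the equation, the same pair of predicted values can be reused to estimate the rate constant at any temperature.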

3.
J Control Release ; 353: 903-914, 2023 01.
Article in English | MEDLINE | ID: mdl-36402234

ABSTRACT

Active learning (AL) has become a subject of active recent research both in industry and academia as an efficient approach for rapid design and discovery of novel chemicals, materials, and polymers. Herein, we have assessed the applicability of AL for the discovery of polymeric micelle formulations for poorly soluble drugs. We were motivated by the key advantages of this approach making it a desirable strategy for rational design of drug delivery systems due to its ability to (i) employ relatively small datasets for model development, (ii) iterate between model development and model assessment using small external datasets that can be either generated in focused experimental studies or formed from subsets of the initial training data, and (iii) progressively evolve models towards increasingly more reliable predictions and the identification of novel chemicals with the desired properties. In this study, we compared various AL protocols for their effectiveness in finding biologically active molecules using synthetic datasets. We have investigated the dependency of AL performance on the size of the initial training set, the relative complexity of the task, and the choice of the initial training dataset. We found that AL techniques as applied to regression modeling offer no benefits over random search, while AL used for classification tasks performs better than models built for randomly selected training sets but still quite far from perfect. Finally, the best performing AL approach was employed to discover and experimentally validate novel binding polymers for a case study of asialoglycoprotein receptor (ASGPR).


Subject(s)
Polymers , Problem-Based Learning , Polymers/chemistry , Micelles , Drug Delivery Systems , Peptides
4.
J Chem Inf Model ; 62(22): 5471-5484, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36332178

ABSTRACT

In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.


Subject(s)
Quantitative Structure-Activity Relationship , Molecular Docking Simulation
5.
J Chem Inf Model ; 62(15): 3524-3534, 2022 08 08.
Article in English | MEDLINE | ID: mdl-35876159

ABSTRACT

Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source architecture, HyFactor, in which, similar to the InChI linear notation, the number of hydrogens attached to the heavy atoms was considered instead of the bond types. HyFactor was benchmarked on the ZINC 250K, MOSES, and ChEMBL data sets against the conventional graph-based architecture ReFactor, our implementation of the DEFactor architecture reported in the literature. On average, HyFactor models contain some 20% fewer fitting parameters than those of ReFactor. The two architectures display similar validity, uniqueness, and reconstruction rates. Compared to the training set compounds, HyFactor generates more similar structures than ReFactor. This could be explained by the fact that the latter generates many open-chain analogues of cyclic structures in the training set. It has been demonstrated that the reconstruction error of heavy molecules can be significantly reduced using the data augmentation technique. The codes of HyFactor and ReFactor as well as all models obtained in this study are publicly available from our GitHub repository: https://github.com/Laboratoire-de-Chemoinformatique/HyFactor.


Subject(s)
Software
6.
Mol Inform ; 41(9): e2200044, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35338606
7.
Mol Inform ; 41(4): e2100138, 2022 04.
Article in English | MEDLINE | ID: mdl-34726834

ABSTRACT

In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper,[5] which implement different AAM algorithms. An open-source RDTool program was optimized, and its modified version ("new RDTool") was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the "AAM fixer" algorithm for automated correction of erroneous mapping. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM Fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.


Subject(s)
Benchmarking , Biochemical Phenomena , Algorithms , Databases, Factual
8.
J Chem Inf Model ; 62(9): 2015-2020, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34843251

ABSTRACT

This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.


Subject(s)
Benchmarking , Database Management Systems , Databases, Factual
9.
J Chem Inf Model ; 61(10): 4913-4923, 2021 10 25.
Article in English | MEDLINE | ID: mdl-34554736

ABSTRACT

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.


Subject(s)
Algorithms , Quantitative Structure-Activity Relationship , Databases, Factual , Drug Discovery , Molecular Conformation
10.
Mol Inform ; 40(12): e2100119, 2021 12.
Article in English | MEDLINE | ID: mdl-34427989

ABSTRACT

The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).


Subject(s)
Data Curation , Databases, Factual , Reference Standards
11.
Mol Inform ; 40(11): e2060030, 2021 11.
Article in English | MEDLINE | ID: mdl-34342944

ABSTRACT

The most widely used QSAR approaches are mainly based on 2D molecular representation, which ignores stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models suffer from the necessity to pre-align conformers, while alignment-independent approaches often ignore stereoconfiguration of compounds. In this study we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding cluster membership of its conformers. In our study we used Random Forest, but this representation can be used in combination with any machine learning method. We compared this approach with conventional 2D and 3D approaches using multiple data sets and investigated the sensitivity of the proposed approach to tuning parameters: number of conformers and clusters.
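
The bit-string representation described in this abstract can be sketched as follows. The sketch assumes that the clustering step has already assigned an integer cluster label to every conformer; the function name, molecule names, and labels are hypothetical.

```python
def cluster_bit_string(conformer_clusters: list[int], n_clusters: int) -> list[int]:
    """Set bit i to 1 if at least one conformer of the molecule falls in cluster i."""
    bits = [0] * n_clusters
    for cluster_id in conformer_clusters:
        bits[cluster_id] = 1
    return bits


# Hypothetical cluster labels of the conformers of two molecules (4 clusters in total).
molecules = {"mol_a": [0, 2, 2], "mol_b": [1, 3]}
fingerprints = {
    name: cluster_bit_string(labels, n_clusters=4)
    for name, labels in molecules.items()
}
print(fingerprints)
```

The resulting fixed-length vectors can be passed to any standard learner, which is consistent with the abstract's remark that the representation is not tied to Random Forest.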


Subject(s)
Quantitative Structure-Activity Relationship , Molecular Conformation
12.
Sci Rep ; 11(1): 3178, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33542271

ABSTRACT

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.

13.
J Chem Inf Model ; 61(2): 554-559, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33502186

ABSTRACT

Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines the relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. The original condensed graph of reaction technology is used to store any chemical reaction as a single graph.


Subject(s)
Algorithms , Database Management Systems , Databases, Factual
14.
Int J Mol Sci ; 23(1)2021 Dec 27.
Article in English | MEDLINE | ID: mdl-35008674

ABSTRACT

The selection of experimental conditions leading to a reasonable yield is an important and essential element for the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model representing an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors and a recurrent artificial neural network, in predicting condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking model trained on a hydrogenation reactions dataset (~42,000 reactions) from the Reaxys® database to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.


Subject(s)
Models, Chemical , Hydrogenation , Likelihood Functions , Stereoisomerism
15.
ACS Omega ; 5(31): 19589-19597, 2020 Aug 11.
Article in English | MEDLINE | ID: mdl-32803053

ABSTRACT

Steam injection is the most widely used technique for effectively reducing the viscosity of heavy oil in heavy oil production, in which in situ upgrading of heavy oil by aquathermolysis plays an important role. Earlier, transition-metal catalysts have been used for improving the efficiency of steam injection by catalytic aquathermolysis and achieving a higher degree of in situ oil upgrading. However, the unclear mechanism of aquathermolysis makes it difficult to choose efficient catalysts for different types of heavy oil. This theoretical study is aimed at deeply understanding the mechanism of in situ upgrading of sulfur-containing heavy oil and its catalysis. For this purpose, cyclohexyl phenyl sulfide (CPS) is selected as a model compound of sulfur-containing oil components, and, for the first time, a catalytic effect of transition metals on the thermochemistry and kinetics of its aquathermolysis is investigated by the density functional theory (DFT) methods with the use of the Becke three-parameter Lee-Yang-Parr (B3LYP), ωB97X-D, and M06-2X functionals. Calculation results show that the hydrolysis of CPS is characterized by fairly high energy barriers in comparison with other possible reaction routes leading to the cleavage of C-S bonds, while the heterolysis of C-S bonds in the presence of protons has a substantially lower kinetic barrier. According to the theoretical analysis, transition-metal ions significantly reduce the kinetic barrier of heterolysis. The Cu2+ ion outperforms the other investigated metal ions and the hydrogen ion in the calculated rate constant by 5-6 (depending on the metal) and 7 orders of magnitude, respectively. The catalytic activity of the investigated transition-metal ions is arranged in the following sequence, depending on the used DFT functional: Cu2+ ≫ Co2+ ≈ Ni2+ > Fe2+. It is theoretically confirmed that transition-metal ions, especially Cu2+, can serve as effective catalysts in aquathermolysis reactions. 
The proposed quantum-chemical approach for studying the catalytic aquathermolysis provides a new supplementary theoretical tool that can be used in the development of catalysts for different chemical transformations of heavy oil components in reservoirs due to hydrothermal treatment.

16.
Int J Mol Sci ; 21(15)2020 Aug 03.
Article in English | MEDLINE | ID: mdl-32756326

ABSTRACT

Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many AD definitions for the models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) were described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers was tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.


Subject(s)
Cheminformatics/trends , Protein Domains , Quantitative Structure-Activity Relationship , Thermodynamics , Chemical Phenomena , Kinetics , Models, Molecular
17.
Molecules ; 25(2)2020 Jan 17.
Article in English | MEDLINE | ID: mdl-31963467

ABSTRACT

Pharmacophore modeling is usually considered as a special type of virtual screening without probabilistic nature. Correspondence of at least one conformation of a molecule to a pharmacophore is considered as evidence of its bioactivity. We show that pharmacophores can be treated as one-class machine learning models, and a probability reflecting the model's confidence can be assigned to a pharmacophore on the basis of its precision in identifying active compounds on a calibration set. Two schemes (Max and Mean) of probability calculation for consensus prediction based on individual pharmacophore models were proposed. Both approaches to some extent correspond to commonly used consensus approaches like the common hit approach or the one based on a logical OR operation uniting hit lists of individual models. Unlike some known approaches, the proposed ones can rank compounds retrieved by multiple models. These approaches were benchmarked on multiple ChEMBL datasets used for ligand-based pharmacophore modeling and externally validated on corresponding DUD-E datasets. The influence of complexity of pharmacophores and their performance on a calibration set on results of virtual screening was analyzed. It was shown that Max and Mean approaches have superior early enrichment to the commonly used approaches. Thus, a well-performing, easy-to-implement, and probabilistic alternative to existing approaches for pharmacophore-based virtual screening was proposed.
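
The abstract does not give the formulas behind the Max and Mean schemes, so the sketch below is only one plausible reading. It assumes that each pharmacophore model carries a precision measured on a calibration set and that a compound is scored from the precisions of the models it matches. The handling of unmatched compounds (score 0.0) and all names and values are assumptions of this sketch, not the authors' definitions.

```python
def consensus_scores(precisions: dict[str, float], matched: list[str]) -> tuple[float, float]:
    """Return (max_score, mean_score) over the calibration precisions of matched models.

    Assumption: a compound matched by no model receives 0.0 for both scores.
    """
    hit_precisions = [precisions[model] for model in matched]
    if not hit_precisions:
        return 0.0, 0.0
    return max(hit_precisions), sum(hit_precisions) / len(hit_precisions)


# Hypothetical calibration precisions of three pharmacophore models.
model_precisions = {"ph1": 0.8, "ph2": 0.5, "ph3": 0.2}
max_score, mean_score = consensus_scores(model_precisions, matched=["ph1", "ph3"])
print(max_score, mean_score)
```

Either score yields a continuous value per compound, which is what allows compounds retrieved by several models to be ranked rather than merely listed.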


Subject(s)
Drug Evaluation, Preclinical/methods , Pharmaceutical Preparations/analysis , Animals , Computer Simulation , Humans , Ligands , Machine Learning , Models, Chemical , Models, Molecular , Molecular Conformation , Protein Binding
18.
Int J Mol Sci ; 20(23)2019 Nov 20.
Article in English | MEDLINE | ID: mdl-31757043

ABSTRACT

Pharmacophore models are widely used for the identification of promising primary hits in large compound libraries. Recent studies have demonstrated that pharmacophores retrieved from protein-ligand molecular dynamic trajectories outperform pharmacophores retrieved from a single crystal complex structure. However, the number of retrieved pharmacophores can be enormous, thus making it computationally inefficient to use all of them for virtual screening. In this study, we proposed selection of distinct representative pharmacophores by the removal of pharmacophores with identical three-dimensional (3D) pharmacophore hashes. We also proposed a new conformer coverage approach in order to rank compounds using all representative pharmacophores. Our results for four cyclin-dependent kinase 2 (CDK2) complexes with different ligands demonstrated that the proposed selection and ranking approaches outperformed the previously described common hits approach. We also demonstrated that ranking, based on averaged predicted scores obtained from different complexes, can outperform ranking based on scores from an individual complex. All developments were implemented in open-source software pharmd.
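
The selection step described in this abstract, removing pharmacophores with identical 3D hashes, can be sketched as a simple deduplication. The sketch assumes the hashes have already been computed by some external tool and are available as strings; the frame indices, hash values, and function name are hypothetical and do not reflect the pharmd API.

```python
def select_representatives(pharmacophores: list[tuple[int, str]]) -> list[int]:
    """Keep the first trajectory frame seen for each distinct pharmacophore hash."""
    first_frame_by_hash: dict[str, int] = {}
    for frame_index, hash_value in pharmacophores:
        # setdefault stores the frame only if this hash has not been seen before.
        first_frame_by_hash.setdefault(hash_value, frame_index)
    return sorted(first_frame_by_hash.values())


# Hypothetical (frame index, 3D pharmacophore hash) pairs from an MD trajectory.
trajectory = [(0, "a1f3"), (1, "a1f3"), (2, "9c0e"), (3, "a1f3"), (4, "77b2")]
representatives = select_representatives(trajectory)
print(representatives)
```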


Subject(s)
Cyclin-Dependent Kinase 2/chemistry , Drug Discovery/methods , Molecular Dynamics Simulation , Small Molecule Libraries/chemistry , Binding Sites , Computer Simulation , Cyclin-Dependent Kinase 2/metabolism , Humans , Ligands , Molecular Docking Simulation/methods , Protein Binding , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Small Molecule Libraries/pharmacology
19.
J Chem Inf Model ; 59(11): 4569-4576, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31638794

ABSTRACT

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.
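
Why a conjugated model respects the relationship by construction can be shown with a minimal sketch. It assumes a tautomeric equilibrium A ⇌ B with $K_T = [B]/[A]$ and a shared conjugate base, so that $\log K_T = pK_a(B) - pK_a(A)$. The linear model, the weights, and the descriptor values are hypothetical and stand in for the ridge regression described in the abstract.

```python
def predict_pka(weights: list[float], descriptors: list[float]) -> float:
    """Linear pKa model: dot product of shared weights and site descriptors."""
    return sum(w * x for w, x in zip(weights, descriptors))


def predict_log_kt(weights: list[float], desc_a: list[float], desc_b: list[float]) -> float:
    """Conjugated prediction: logKT is derived from the same weights as pKa."""
    return predict_pka(weights, desc_b) - predict_pka(weights, desc_a)


# Hypothetical shared weights and acid-site descriptors of tautomers A and B.
shared_weights = [2.0, -1.0, 0.5]
tautomer_a = [1.0, 2.0, 4.0]
tautomer_b = [3.0, 1.0, 2.0]

log_kt = predict_log_kt(shared_weights, tautomer_a, tautomer_b)
pka_gap = predict_pka(shared_weights, tautomer_b) - predict_pka(shared_weights, tautomer_a)
print(log_kt, pka_gap)
```

Two independently fitted models for logKT and pKa carry no such guarantee, which matches the abstract's observation that only the conjugated models respected the relationship.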


Subject(s)
Organic Chemicals/chemistry , Pharmaceutical Preparations/chemistry , Acids/chemistry , Algorithms , Drug Discovery , Models, Chemical , Molecular Structure , Neural Networks, Computer , Quantitative Structure-Activity Relationship , Solvents/chemistry , Stereoisomerism
20.
J Chem Inf Model ; 59(6): 2516-2521, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31063394

ABSTRACT

CGRtools is an open-source Python library aimed at handling molecular and reaction information. It is the sole library developed so far that supports condensed graph of reaction (CGR) handling. CGR provides the possibility for advanced operations with reaction information and could be used for reaction descriptor calculation, structure-reactivity modeling, atom-to-atom mapping comparison and correction, reaction center extraction, reaction balancing, and some other related tasks. Unlike other popular libraries, CGRtools is fully written in Python, has only minor dependencies on other libraries, and is cross-platform. Reaction, molecule, and CGR objects in CGRtools support native Python methods and are comparable with the help of operations "equal to", "less than", and "bigger than". CGRtools supports common structural formats. CGRtools is distributed under an LGPL license and is available at https://github.com/cimm-kzn/CGRtools .


Subject(s)
Cheminformatics/methods , Small Molecule Libraries/chemistry , Software , Chemical Phenomena , Models, Chemical