Search | BVS Regional Portal (test)

SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules.

Patel, Hitesh; Ihlenfeldt, Wolf-Dietrich; Judson, Philip N; Moroz, Yurii S; Pevzner, Yuri; Peach, Megan L; Delannée, Victorien; Tarasova, Nadya I; Nicklaus, Marc C.

Sci Data ; 7(1): 384, 2020 11 11.

Article in English | MEDLINE | ID: mdl-33177514

ABSTRACT

We have made available a database of over 1 billion compounds predicted to be easily synthesizable, called Synthetically Accessible Virtual Inventory (SAVI). They have been created by a set of transforms based on an adaptation and extension of the CHMTRN/PATRAN programming languages describing chemical synthesis expert knowledge, which originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks (enamine.net). Only single-step, two-reactant syntheses were calculated for this database even though the technology can execute multi-step reactions. The possibility to incorporate scoring systems in CHMTRN allowed us to subdivide the database of 1.75 billion compounds in sets according to their predicted synthesizability, with the most-synthesizable class comprising 1.09 billion synthetic products. Properties calculated for all SAVI products show that the database should be well-suited for drug discovery. It is being made publicly available for free download from https://doi.org/10.35115/37n9-5738.
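The single-step, two-reactant enumeration described in this abstract can be illustrated with a minimal sketch. It is not the SAVI implementation: the building-block names, functional-group tags, and the two transforms below are hypothetical placeholders, and real CHMTRN/PATRAN transforms operate on full structures in CACTVS rather than on labels.

```python
from itertools import product

# Hypothetical toy data: each building block is tagged with the functional
# groups it carries. Real building blocks are complete chemical structures.
BUILDING_BLOCKS = {
    "bb1": {"carboxylic_acid"},
    "bb2": {"primary_amine"},
    "bb3": {"aryl_halide"},
    "bb4": {"boronic_acid", "primary_amine"},
}

# Hypothetical transforms: (name, group needed on reactant A, group needed on B).
TRANSFORMS = [
    ("amide_coupling", "carboxylic_acid", "primary_amine"),
    ("suzuki_coupling", "aryl_halide", "boronic_acid"),
]


def enumerate_products(blocks, transforms):
    """Yield (transform, reactant_a, reactant_b) for every matching ordered pair."""
    for (name, need_a, need_b), (a, groups_a), (b, groups_b) in product(
        transforms, blocks.items(), blocks.items()
    ):
        # Single step, two distinct reactants, both required groups present.
        if a != b and need_a in groups_a and need_b in groups_b:
            yield (name, a, b)


products = list(enumerate_products(BUILDING_BLOCKS, TRANSFORMS))
for entry in products:
    print(entry)
```

For a sense of scale, 53 transforms applied to every ordered pair of about 150,000 building blocks would give on the order of 10^12 combinations, so the 1.75 billion products reported imply that only a small fraction of pairs satisfy any given transform.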

Adapting CHMTRN (CHeMistry TRaNslator) for a New Use.

Judson, Philip N; Ihlenfeldt, Wolf-Dietrich; Patel, Hitesh; Delannée, Victorien; Tarasova, Nadya; Nicklaus, Marc C.

J Chem Inf Model ; 60(7): 3336-3341, 2020 07 27.

Article in English | MEDLINE | ID: mdl-32539385

ABSTRACT

We have adopted and extended the CHMTRN language and used it for the knowledge base of a computer program to generate a large database of synthetically accessible, drug-like chemical structures, the Synthetically Accessible Virtual Inventory (SAVI) Database. CHMTRN is a powerful language originally developed in the LHASA (Logic and Heuristics Applied to Synthetic Analysis) project at Harvard University and used together with the chemical pattern description language, PATRAN, to describe chemical retro-reactions. The languages have proven to be useful beyond the design of retrosynthetic routes and have the potential for much wider use in chemistry; this paper describes CHMTRN and PATRAN as now reimplemented for the forward-synthetic SAVI project but able to describe both forward and retro-reactions.

Subjects

Combinatorial Chemistry Techniques; Software; Databases, Factual; Humans

Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2.

Dhaked, Devendra K; Ihlenfeldt, Wolf-Dietrich; Patel, Hitesh; Delannée, Victorien; Nicklaus, Marc C.

J Chem Inf Model ; 60(3): 1253-1275, 2020 03 23.

Article in English | MEDLINE | ID: mdl-32043883

ABSTRACT

We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism, 21 for ring-chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto-enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by InChI V.1.05, both in InChI's Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.

Subjects

Cheminformatics; Databases, Factual

PDB ligand conformational energies calculated quantum-mechanically.

Sitzmann, Markus; Weidlich, Iwona E; Filippov, Igor V; Liao, Chenzhong; Peach, Megan L; Ihlenfeldt, Wolf-Dietrich; Karki, Rajeshri G; Borodina, Yulia V; Cachau, Raul E; Nicklaus, Marc C.

J Chem Inf Model ; 52(3): 739-56, 2012 Mar 26.

Article in English | MEDLINE | ID: mdl-22303903

ABSTRACT

We present here a greatly updated version of an earlier study on the conformational energies of protein-ligand complexes in the Protein Data Bank (PDB) [Nicklaus et al. Bioorg. Med. Chem. 1995, 3, 411-428], with the goal of improving on all possible aspects such as number and selection of ligand instances, energy calculations performed, and additional analyses conducted. Starting from about 357,000 ligand instances deposited in the 2008 version of the Ligand Expo database of the experimental 3D coordinates of all small-molecule instances in the PDB, we created a "high-quality" subset of ligand instances by various filtering steps including application of crystallographic quality criteria and structural unambiguousness. Submission of 640 Gaussian 03 jobs yielded a set of about 415 successfully concluded runs. We used a stepwise optimization of internal degrees of freedom at the DFT level of theory with the B3LYP/6-31G(d) basis set and a single-point energy calculation at B3LYP/6-311++G(3df,2p) after each round of (partial) optimization to separate energy changes due to bond length stretches vs bond angle changes vs torsion changes. Even for the most "conservative" choice of all the possible conformational energies-the energy difference between the conformation in which all internal degrees of freedom except torsions have been optimized and the fully optimized conformer-significant energy values were found. The range of 0 to ~25 kcal/mol was populated quite evenly and independently of the crystallographic resolution. A smaller number of "outliers" of yet higher energies were seen only at resolutions above 1.3 Å. 
The energies showed some correlation with molecular size and flexibility but not with crystallographic quality metrics such as the Cruickshank diffraction-component precision index (DPI) and R(free)-R, or with the ligand instance-specific metrics such as occupancy-weighted B-factor (OWAB), real-space R factor (RSR), and real-space correlation coefficient (RSCC). We repeated these calculations with the solvent model IEFPCM, which yielded energy differences that were generally somewhat lower than the corresponding vacuum results but did not produce a qualitatively different picture. Torsional sampling around the crystal conformation at the molecular mechanics level using the MMFF94s force field typically led to an increase in energy.

Subjects

Databases, Protein; Molecular Conformation; Quantum Theory; Crystallography, X-Ray; Ligands; Models, Molecular; Solvents/chemistry; Thermodynamics

Tautomerism in large databases.

Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C.

J Comput Aided Mol Des ; 24(6-7): 521-51, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20512400

ABSTRACT

We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.

Subjects

Databases, Factual; Molecular Structure; Informatics; Isomerism
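The tautomeric-overlap analysis summarized in the abstract above (records that are only a different tautomeric representation of the same compound) can be sketched as a grouping by canonical tautomer. This is an illustration only: the lookup table below is a hand-written, hypothetical stand-in for the CACTVS transform rules and scoring scheme, and the record identifiers are invented.

```python
from collections import defaultdict

# Hypothetical stand-in for a canonical-tautomer function: a hand-written
# table mapping a few SMILES strings to one chosen representative each.
CANONICAL_TAUTOMER = {
    "Oc1ccccn1": "O=c1cccc[nH]1",      # 2-hydroxypyridine -> 2-pyridone
    "O=c1cccc[nH]1": "O=c1cccc[nH]1",
    "CC(O)=C": "CC(C)=O",              # enol form -> acetone
    "CC(C)=O": "CC(C)=O",
}


def canonical_key(structure):
    """Return the canonical tautomer, or the structure itself if unknown."""
    return CANONICAL_TAUTOMER.get(structure, structure)


def tautomeric_overlap(records):
    """Group record ids whose structures differ but share a canonical tautomer."""
    groups = defaultdict(dict)  # canonical key -> {record_id: structure}
    for record_id, structure in records:
        groups[canonical_key(structure)][record_id] = structure
    return {
        key: sorted(members)
        for key, members in groups.items()
        # Exact duplicates do not count; at least two distinct forms are needed.
        if len(set(members.values())) > 1
    }


records = [
    ("r1", "Oc1ccccn1"),
    ("r2", "O=c1cccc[nH]1"),
    ("r3", "CC(C)=O"),
    ("r4", "CC(C)=O"),
    ("r5", "CCO"),
]
overlap = tautomeric_overlap(records)
overlap_rate = sum(len(ids) for ids in overlap.values()) / len(records)
print(overlap, overlap_rate)
```

In this toy set, r1 and r2 overlap tautomerically, while r3 and r4 are plain duplicates and are deliberately excluded.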

The impact of tautomer forms on pharmacophore-based virtual screening.

Oellien, Frank; Cramer, Jörg; Beyer, Carsten; Ihlenfeldt, Wolf-Dietrich; Selzer, Paul M.

J Chem Inf Model ; 46(6): 2342-54, 2006.

Article in English | MEDLINE | ID: mdl-17125178

ABSTRACT

In the field of in silico screening, many applications do not automatically consider possible tautomeric states of molecules. However, the detection of new compound candidates might rely on correct structural description, which is important for the perfect fit toward the biologically relevant interactions. In this paper, we present a new exhaustive tautomer enumeration approach implemented by means of the CACTVS software package. The approach contains a set of 21 predefined SMIRKS-based transforms and a powerful transformation engine that is capable of generating most tautomers described comprehensively in the literature or found in databases in the field of medicinal chemistry. User-defined tautomer rules applied to specific structural databases or scientific issues can be implemented easily and used instead of the predefined rules. In addition, we describe the impact of tautomer-enriched databases on pharmacophore screening approaches for human matrix metalloproteinase 8 as an example of a protein-based pharmacophore screening scenario and for human cyclin-dependent kinases as an example of a ligand-based pharmacophore screening approach. In both test cases, as a preprocessing step, we have used our new tautomer enumerator tool for the tautomer enrichment of the screening data sets and have used it as a postprocessing step to remove tautomeric duplicates from the results. We could demonstrate that the tautomer-enriched screening data sets show significant advantages compared to their non-enhanced counterparts. The discrimination between hits and nonhits was significantly better in the case of tautomer-enriched databases. Moreover, it has been proved that tautomer-enhanced databases will lead to a higher number of potential hits.

Subjects

Chemistry, Pharmaceutical/methods; CDC2 Protein Kinase/chemistry; Catalysis; Combinatorial Chemistry Techniques; Computers; Cyclin-Dependent Kinase 2/chemistry; Drug Evaluation; Humans; Hydrogen/chemistry; Ligands; Matrix Metalloproteinase 8/chemistry; Models, Chemical; Molecular Conformation; Proteins/chemistry; Technology, Pharmaceutical/methods

InfVis--platform-independent visual data mining of multidimensional chemical data sets.

Oellien, Frank; Ihlenfeldt, Wolf-Dietrich; Gasteiger, Johann.

J Chem Inf Model ; 45(5): 1456-67, 2005.

Article in English | MEDLINE | ID: mdl-16180923

ABSTRACT

The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. During recent years, visual data mining has become an important tool within the life sciences and drug discovery area with the potential to help keep data analysis from turning into a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for chemists, who usually only have little experience with classical data mining tools, for the visualization, exploration, and analysis of multivariate data sets. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time, interactive data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. We will present in this paper examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.

Subjects

Computational Biology/instrumentation; Computational Biology/methods; Drug Evaluation, Preclinical/methods; Pharmaceutical Preparations/chemistry; Software; Internet; Time Factors

PASS biological activity spectrum predictions in the enhanced open NCI database browser.

Poroikov, Vladimir V; Filimonov, Dmitrii A; Ihlenfeldt, Wolf-Dietrich; Gloriozova, Tatyana A; Lagunin, Alexey A; Borodina, Yulia V; Stepanchikova, Alla V; Nicklaus, Marc C.

J Chem Inf Comput Sci ; 43(1): 228-36, 2003.

Article in English | MEDLINE | ID: mdl-12546557

ABSTRACT

The application of the program PASS (Prediction of Activity Spectra for Substances) to about 250,000 compounds of the NCI Open Database and the incorporation of over 64 million PASS predictions in the Enhanced NCI Database Browser are described. A total of 565 different types of activity are included, encompassing general pharmacological effects, specific mechanisms of action, known toxicities, and others. Application of this Web-based service to prediction of activities of the kinds "Angiogenesis inhibitor," "Antiviral (HIV)", and a set of activities that can be associated with antineoplastic action is reported. For this latter data set, a very substantial enrichment over random selection was found in the PASS predictions. It is shown how the user can conduct complex searches by combining ranges of PASS-predicted probabilities of compounds to be active or to be inactive, respectively, with, e.g., value ranges of physicochemical parameters, presence or absence of particular substructural fragments, and other search criteria.

Subjects

Antineoplastic Agents/chemistry; Antineoplastic Agents/pharmacology; Databases, Factual; Internet; National Institutes of Health (U.S.); Software; Structure-Activity Relationship; United States

Enhanced CACTVS browser of the Open NCI Database.

Ihlenfeldt, Wolf-Dietrich; Voigt, Johannes H; Bienfait, Bruno; Oellien, Frank; Nicklaus, Marc C.

J Chem Inf Comput Sci ; 42(1): 46-57, 2002.

Article in English | MEDLINE | ID: mdl-11855965

ABSTRACT

A Web-based, graphical user interface has been developed to conduct rapid searches by numerous criteria in the more than 250,000 structures of the Open NCI Database. It is based on the chemistry information toolkit CACTVS. Nearly all structures and anticancer and anti-HIV screening data provided by NCI's Developmental Therapeutics Program have been included. This data set has been augmented by a large amount of additional, mostly computed, data, such as calculated log P values, predicted biological activities, systematically determined names, and others. Complex boolean searches are possible. Flexible substructure searches have been implemented. The user can conduct 3D pharmacophore queries in up to 25 conformations precalculated for each compound. Numerous output formats as well as 2D and 3D visualization options are provided. It is possible to export search results in various forms and with choices for data contents in the exported files, for structure sets ranging in size from a single compound to the entire database. Only a Web browser is needed to use this service, with a few plug-ins being useful but optional.

Subjects

Databases, Factual; Internet; User-Computer Interface; Humans; Models, Molecular; Molecular Structure; National Institutes of Health (U.S.); Neoplasms; United States
