Enumeration of ring-chain tautomers based on SMIRKS rules.

Guasch, Laura; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(9): 2423-32, 2014 Sep 22.

Article in English | MEDLINE | ID: mdl-25158156

ABSTRACT

A compound exhibits (prototropic) tautomerism if it can be represented by two or more structures that are related by a formal intramolecular movement of a hydrogen atom from one heavy atom position to another. When the movement of the proton is accompanied by the opening or closing of a ring it is called ring-chain tautomerism. This type of tautomerism is well observed in carbohydrates, but it also occurs in other molecules such as warfarin. In this work, we present an approach that allows for the generation of all ring-chain tautomers of a given chemical structure. Based on Baldwin's Rules estimating the likelihood of ring closure reactions to occur, we have defined a set of transform rules covering the majority of ring-chain tautomerism cases. The rules automatically detect substructures in a given compound that can undergo a ring-chain tautomeric transformation. Each transformation is encoded in SMIRKS line notation. All work was implemented in the chemoinformatics toolkit CACTVS. We report on the application of our ring-chain tautomerism rules to a large database of commercially available screening samples in order to identify ring-chain tautomers.

Subject(s)

Molecular Conformation , Cyclization , Databases, Chemical

QSAR modeling of imbalanced high-throughput screening data in PubChem.

Zakharov, Alexey V; Peach, Megan L; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(3): 705-12, 2014 Mar 24.

Article in English | MEDLINE | ID: mdl-24524735

ABSTRACT

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).

Subject(s)

Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods , Quantitative Structure-Activity Relationship , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Algorithms , Databases, Chemical , HEK293 Cells , Humans , Models, Biological , Software

A new approach to radial basis function approximation and its application to QSAR.

Zakharov, Alexey V; Peach, Megan L; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(3): 713-9, 2014 Mar 24.

Article in English | MEDLINE | ID: mdl-24451033

ABSTRACT

We describe a novel approach to RBF approximation, which combines two new elements: (1) linear radial basis functions and (2) weighting the model by each descriptor's contribution. Linear radial basis functions allow one to achieve more accurate predictions for diverse data sets. Taking into account the contribution of each descriptor produces more accurate similarity values used for model development. The method was validated on 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. We also compared the new method with five different QSAR methods implemented in the EPA T.E.S.T. program. Our approach, implemented in the program GUSAR, showed a reasonable accuracy of prediction and high coverage for all external test sets, providing more accurate prediction results than the comparison methods and even the consensus of these methods. Using our new method, we have created models for physicochemical and toxicity endpoints, which we have made freely available in the form of an online service at http://cactus.nci.nih.gov/chemical/apps/cap.

Subject(s)

Algorithms , Models, Biological , Quantitative Structure-Activity Relationship , Software , Animals , Computer Simulation , Cyprinidae/physiology , Daphnia/drug effects , Daphnia/physiology , Databases, Factual , Internet , Neural Networks, Computer , Rats , Tetrahymena/drug effects , Tetrahymena/physiology , Toxicity Tests

Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database.

Southan, Christopher; Sitzmann, Markus; Muresan, Sorel.

Mol Inform ; 32(11-12): 881-897, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24533037

ABSTRACT

ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database are resources of curated chemistry-to-protein relationships widely used in the chemogenomic arena. In this work we have extended an earlier analysis (PMID 22821596) by comparing chemistry and protein target content between 2010 and 2013. For the former, details are presented for overlaps and differences, statistics of stereochemistry as well as stereo representation and MW profiles between the four databases. For 2013 our results indicate quality improvements, major expansion, increased achiral structures and changes in MW distributions. An orthogonal comparison of chemical content with different sources inside PubChem highlights further interpretable differences. Expansion of protein content by UniProt IDs is also recorded for 2013 and Gene Ontology comparisons for human-only sets indicate differences. These emphasise the expanding complementarity of chemistry-to-protein relationships between sources, although different criteria are used for their capture.

Computational tools and resources for metabolism-related property predictions. 2. Application to prediction of half-life time in human liver microsomes.

Zakharov, Alexey V; Peach, Megan L; Sitzmann, Markus; Filippov, Igor V; McCartney, Heather J; Smith, Layton H; Pugliese, Angelo; Nicklaus, Marc C.

Future Med Chem ; 4(15): 1933-44, 2012 Oct.

Article in English | MEDLINE | ID: mdl-23088274

ABSTRACT

BACKGROUND: The most important factor affecting metabolic excretion of compounds from the body is their half-life time. This provides an indication of compound stability of, for example, drug molecules. We report on our efforts to develop QSAR models for metabolic stability of compounds, based on in vitro half-life assay data measured in human liver microsomes. METHOD: A variety of QSAR models generated using different statistical methods and descriptor sets implemented in both open-source and commercial programs (KNIME, GUSAR and StarDrop) were analyzed. The models obtained were compared using four different external validation sets from public and commercial data sources, including two smaller sets of in vivo half-life data in humans. CONCLUSION: In many cases, the accuracy of prediction achieved on one external test set did not correspond to the results achieved with another test set. The most predictive models were used for predicting the metabolic stability of compounds from the open NCI database, the results of which are publicly available on the NCI/CADD Group web server (http://cactus.nci.nih.gov).

Subject(s)

Computational Biology , Microsomes, Liver/metabolism , Algorithms , Databases, Factual , Half-Life , Humans , Pharmaceutical Preparations/metabolism , Quantitative Structure-Activity Relationship , Software

Mapping between databases of compounds and protein targets.

Muresan, Sorel; Sitzmann, Markus; Southan, Christopher.

Methods Mol Biol ; 910: 145-64, 2012.

Article in English | MEDLINE | ID: mdl-22821596

ABSTRACT

Databases that provide links between bioactive compounds and their protein targets are increasingly important in drug discovery and chemical biology. They join the expanding universes of cheminformatics via chemical structures on the one hand and bioinformatics via sequences on the other. However, it is difficult to assess the relative utility of databases without the explicit comparison of content. We have exemplified an approach to this by comparing resources that each has a different focus on bioactive chemistry (ChEMBL, DrugBank, Human Metabolome Database, and Therapeutic Target Database) both at the chemical structure and protein levels. We compared the compound sets at different representational stringencies using NCI/CADD Structure Identifiers. The overlap and uniqueness in chemical content can be broadly interpreted in the context of different data capture strategies. However, we recorded apparent anomalies, such as many compounds-in-common between the metabolite and drug databases. We also compared the content of sequences mapped to the compounds via their UniProt protein identifiers. While these were also generally interpretable in the context of individual databases we discerned differences in coverage and the types of supporting data used. For example, the target concept is applied differently between DrugBank and the Therapeutic Target Database. In ChEMBL it encompasses a broader range of mappings from chemical biology and species orthologue cross-screening in addition to drug targets per se. Our analysis should assist users not only in exploiting the synergies between these four high-value resources but also in assessing the utility of other databases at the interface of chemistry and biology.

Subject(s)

Computational Biology , Databases, Chemical , Databases, Protein , Molecular Targeted Therapy , Humans , Structure-Activity Relationship

PDB ligand conformational energies calculated quantum-mechanically.

Sitzmann, Markus; Weidlich, Iwona E; Filippov, Igor V; Liao, Chenzhong; Peach, Megan L; Ihlenfeldt, Wolf-Dietrich; Karki, Rajeshri G; Borodina, Yulia V; Cachau, Raul E; Nicklaus, Marc C.

J Chem Inf Model ; 52(3): 739-56, 2012 Mar 26.

Article in English | MEDLINE | ID: mdl-22303903

ABSTRACT

We present here a greatly updated version of an earlier study on the conformational energies of protein-ligand complexes in the Protein Data Bank (PDB) [Nicklaus et al. Bioorg. Med. Chem. 1995, 3, 411-428], with the goal of improving on all possible aspects such as number and selection of ligand instances, energy calculations performed, and additional analyses conducted. Starting from about 357,000 ligand instances deposited in the 2008 version of the Ligand Expo database of the experimental 3D coordinates of all small-molecule instances in the PDB, we created a "high-quality" subset of ligand instances by various filtering steps including application of crystallographic quality criteria and structural unambiguousness. Submission of 640 Gaussian 03 jobs yielded a set of about 415 successfully concluded runs. We used a stepwise optimization of internal degrees of freedom at the DFT level of theory with the B3LYP/6-31G(d) basis set and a single-point energy calculation at B3LYP/6-311++G(3df,2p) after each round of (partial) optimization to separate energy changes due to bond length stretches vs bond angle changes vs torsion changes. Even for the most "conservative" choice of all the possible conformational energies (the energy difference between the conformation in which all internal degrees of freedom except torsions have been optimized and the fully optimized conformer), significant energy values were found. The range of 0 to ~25 kcal/mol was populated quite evenly and independently of the crystallographic resolution. A smaller number of "outliers" of yet higher energies were seen only at resolutions above 1.3 Å. The energies showed some correlation with molecular size and flexibility but not with crystallographic quality metrics such as the Cruickshank diffraction-component precision index (DPI) and R(free)-R, or with the ligand instance-specific metrics such as occupancy-weighted B-factor (OWAB), real-space R factor (RSR), and real-space correlation coefficient (RSCC). We repeated these calculations with the solvent model IEFPCM, which yielded energy differences that were generally somewhat lower than the corresponding vacuum results but did not produce a qualitatively different picture. Torsional sampling around the crystal conformation at the molecular mechanics level using the MMFF94s force field typically led to an increase in energy.

Subject(s)

Databases, Protein , Molecular Conformation , Quantum Theory , Crystallography, X-Ray , Ligands , Models, Molecular , Solvents/chemistry , Thermodynamics

Software and resources for computational medicinal chemistry.

Liao, Chenzhong; Sitzmann, Markus; Pugliese, Angelo; Nicklaus, Marc C.

Future Med Chem ; 3(8): 1057-85, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21707404

ABSTRACT

Computer-aided drug design plays a vital role in drug discovery and development and has become an indispensable tool in the pharmaceutical industry. Computational medicinal chemists can take advantage of all kinds of software and resources in the computer-aided drug design field for the purposes of discovering and optimizing biologically active compounds. This article reviews software and other resources related to computer-aided drug design approaches, putting particular emphasis on structure-based drug design, ligand-based drug design, chemical databases and chemoinformatics tools.

Subject(s)

Computer-Aided Design , Drug Design , Software , Animals , Databases, Factual , Humans , Models, Molecular , Quantitative Structure-Activity Relationship

Tautomerism in large databases.

Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C.

J Comput Aided Mol Des ; 24(6-7): 521-51, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20512400

ABSTRACT

We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.

Subject(s)

Databases, Factual , Molecular Structure , Informatics , Isomerism
