2.
Structure ; 29(4): 393-400.e1, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33657417

ABSTRACT

The Worldwide Protein Data Bank (wwPDB) has provided validation reports based on recommendations from community Validation Task Forces for structures in the PDB since 2013. To further enhance validation of small molecules, as recommended by the 2016 Ligand Validation Workshop, wwPDB, Global Phasing Ltd., and the Noguchi Institute recently formed a public/private partnership to incorporate some of their software tools into the wwPDB validation package. Augmented wwPDB validation report features include: two-dimensional (2D) diagrams of small-molecule ligands and carbohydrates, highlighting geometric validation outcomes; 2D topological diagrams of oligosaccharides present in branched entities generated using 2D Symbol Nomenclature for Glycan representation; and views of 3D electron density maps for ligands and carbohydrates, illustrating the goodness-of-fit between the atomic structure and experimental data (X-ray crystallographic structures only). These improvements will impact confidence in ligand conformation and ligand-macromolecular interactions that will aid in understanding biochemical function and contribute to small-molecule drug discovery.


Subjects
Carbohydrates/chemistry , Databases, Protein/standards , Molecular Docking Simulation/methods , Proteomics/methods , Small Molecule Libraries/chemistry , Cheminformatics/methods , Databases, Chemical/standards , Humans , Ligands , Protein Binding , Proteome/chemistry , Proteome/metabolism
3.
SAR QSAR Environ Res ; 31(3): 171-186, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31858821

ABSTRACT

The European Registration, Evaluation, Authorization and Restriction of Chemical Substances Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB), considering in silico prediction as a valid alternative to experimental testing. However, currently available models may not be relevant to predict compounds of industrial interest, due to accuracy and applicability domain restriction issues. In this work, we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against already-existing tools on a set of 316 compounds coming from the industrial context. New models showed good performance in terms of predictive power (Balanced Accuracy (BA) = 0.74-0.79) and data coverage (83-91%). The Generative Topographic Mapping approach identified several chemotypes and structural motifs unique to the industrial dataset, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, public and industrial data were merged into a global dataset containing 3146 compounds. This is the biggest dataset reported in the literature so far, covering some chemotypes absent in the public data. Thus, the predictive model developed on the Global dataset has a larger applicability domain than the existing ones.
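
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how balanced accuracy and data coverage are commonly computed for a binary classifier. It is not the authors' code: the function names, the label convention (1 = readily biodegradable, 0 = not) and the in-domain flags are assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed convention (not from the paper): 1 = readily biodegradable, 0 = not.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly predicted
    specificity = tn / (tn + fp)  # fraction of negatives correctly predicted
    return (sensitivity + specificity) / 2


def coverage(in_domain_flags):
    """Fraction of compounds that fall inside the model's applicability domain."""
    flags = list(in_domain_flags)
    return sum(1 for flag in flags if flag) / len(flags)
```

Balanced accuracy is preferred over plain accuracy when the two classes are unequal in size, because a model that always predicts the majority class scores only 0.5.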


Subjects
Databases, Chemical , Environmental Pollutants/chemistry , Models, Chemical , Algorithms , Benchmarking , Biodegradation, Environmental , Computer Simulation , Databases, Chemical/standards , Quantitative Structure-Activity Relationship , Reproducibility of Results
4.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30753475

ABSTRACT

The discovery of antiviral drugs is a rapidly developing area of medicinal chemistry research, driven by the emergence of resistant variants and outbreaks of poorly studied viral diseases. The amount of antiviral activity data available in ChEMBL grows consistently, but the virus taxonomy annotation of these data is not sufficient for thorough studies of antiviral chemical space. We developed a procedure for semi-automatic extraction of antiviral activity data from ChEMBL and mapped them to the virus taxonomy developed by the International Committee for Taxonomy of Viruses (ICTV). The procedure is based on lists of virus-related values of ChEMBL annotation fields and a dictionary of virus names and acronyms mapped to ICTV taxa. Applying this data extraction procedure retrieves from ChEMBL 1.6 times more assays, linked to 2.5 times more compounds and data points, than the ChEMBL web interface allows. Mapping these data to ICTV taxa makes it possible to analyze all the compounds tested against each viral species. Activity values and structures of the compounds were standardized, and an antiviral activity profile was created for each standard structure. The data set compiled using this algorithm was called ViralChEMBL. As case studies, we compared descriptor and scaffold distributions for the full ChEMBL and its 'viral' and 'non-viral' subsets, identified the most studied compounds and created a self-organizing map for ViralChEMBL. Our approach to data annotation proved to be a very efficient tool for the study of antiviral chemical space.


Subjects
Antiviral Agents/chemistry , Antiviral Agents/classification , Data Curation , Databases, Chemical , Databases, Chemical/standards , Decision Making , Reference Standards
5.
Sci Data ; 6: 190023, 2019 02 19.
Article in English | MEDLINE | ID: mdl-30778259

ABSTRACT

Identification of discrepant data in aggregated databases is a key step in data curation and remediation. We have applied the ALATIS approach, which is based on the International Chemical Identifier (InChI) model, to the full PubChem Compound database to generate unique and reproducible compound and atom identifiers for all entries for which three-dimensional structures were available. This exercise also served to identify entries with discrepancies between structures and chemical formulas or InChI strings. The use of unique compound identifiers and atom nomenclature should support more rigorous links between small-molecule databases including those containing atom-specific information of the type available from crystallography and spectroscopy. The comprehensive results from this analysis are publicly available through our webserver [http://alatis.nmrfam.wisc.edu/].


Subjects
Data Accuracy , Databases, Chemical , Databases, Chemical/standards
6.
Mol Inform ; 38(1-2): e1800086, 2019 01.
Article in English | MEDLINE | ID: mdl-30247811

ABSTRACT

A key consideration at the screening stages of drug discovery is in vitro metabolic stability, often measured in human liver microsomes. Computational prediction models can be built using a large quantity of experimental data available from public databases, but these databases typically contain data measured using various protocols in different laboratories, raising the issue of data quality. In this study, we retrieved the intrinsic clearance (CLint) measurements from an open database and performed extensive manual curation. Then, chemical descriptors were calculated using freely available software, and prediction models were built using machine learning algorithms. The models trained on the curated data showed better performance than those trained on the non-curated data and achieved performance comparable to previously published models, showing the importance of manual curation in data preparation. The curated data were made available, to make our models fully reproducible.


Subjects
Databases, Chemical/standards , Drug Discovery/methods , Hepatobiliary Elimination , Machine Learning , Drug Discovery/standards , Humans , Metabolic Clearance Rate , Microsomes, Liver/metabolism
7.
Mol Inform ; 38(3): e1800068, 2019 03.
Article in English | MEDLINE | ID: mdl-30345657

ABSTRACT

1880 known drugs were collected and analysed for their mainstream molecular descriptors: MW, log P, HA, HD, RB and PSA. The statistical distributions were fitted to Gaussian functions for each of the descriptors. This gave a mathematical tool to calculate a weighted score, or an Index, for each descriptor. Known Drug Indexes (KDIs) were derived either by summation or multiplication of the Indexes, giving one number for each molecule calculated. The KDI summation and multiplication methods give theoretical maxima of 6 and 1, respectively. According to both methods, methysergide (5.89/0.90), amsacrine (5.89/0.89) and fluorometholone (5.88/0.88) have the scores of the most well-balanced pharmaceuticals. The KDIs are advantageous tools in identifying the most well-balanced screening compounds based on the properties of known drugs; the screening collection can be optimised to only include quality compounds, which in turn produce tractable hit and lead compounds from the screening campaign.
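
The scoring scheme described in this abstract can be sketched as follows. This is an illustration only: it assumes each Index is a Gaussian scaled to peak at 1, which is consistent with the stated maxima of 6 (sum) and 1 (product), and the mean and standard deviation values below are hypothetical placeholders, not the fitted parameters from the paper.

```python
import math

# Hypothetical (mean, standard deviation) per descriptor.
# These are placeholders for illustration, NOT the values fitted in the paper.
HYPOTHETICAL_FITS = {
    "MW": (350.0, 100.0),
    "logP": (2.5, 2.0),
    "HA": (5.0, 3.0),
    "HD": (2.0, 1.5),
    "RB": (5.0, 3.0),
    "PSA": (75.0, 40.0),
}


def descriptor_index(value, mean, sd):
    """Gaussian score in (0, 1]; equals 1 when the value sits at the mean."""
    return math.exp(-((value - mean) ** 2) / (2.0 * sd ** 2))


def kdi_scores(descriptors, fits=HYPOTHETICAL_FITS):
    """Return (summation KDI, multiplication KDI) for one molecule."""
    indexes = [
        descriptor_index(descriptors[name], mean, sd)
        for name, (mean, sd) in fits.items()
    ]
    return sum(indexes), math.prod(indexes)
```

Under these assumptions, a molecule whose six descriptors all sit at the fitted means scores exactly 6 and 1. The product form penalises a single badly placed descriptor much more strongly than the sum does.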


Subjects
Databases, Chemical/standards , Drug Discovery/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries/standards , Algorithms , Humans , Quantitative Structure-Activity Relationship , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
8.
Angew Chem Int Ed Engl ; 57(46): 14986-14990, 2018 11 12.
Article in English | MEDLINE | ID: mdl-29786940

ABSTRACT

Glycoinformatics is an actively developing scientific discipline, which provides scientists with access to data on natural glycans and with various tools for processing them. However, the informatization of glycomics has a long way to go before catching up with genomics and proteomics. In this Viewpoint, we review the current situation in glycoinformatics and discuss its achievements and shortcomings, emphasizing the major drawbacks: the lack of recognized standards, protocols, data indices and tools, and the informational isolation of the existing projects. We reiterate possible solutions to the persistent issues and describe our vision of an ideal glycoinformatics project.


Subjects
Carbohydrates/analysis , Databases, Chemical , Glycomics , Animals , Computational Biology/methods , Computational Biology/standards , Databases, Chemical/standards , Glycomics/methods , Glycomics/standards , Humans , Software
9.
Cell Syst ; 6(1): 13-24, 2018 01 24.
Article in English | MEDLINE | ID: mdl-29199020

ABSTRACT

The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders. This Perspective describes LINCS technologies, datasets, tools, and approaches to data accessibility and reusability.


Subjects
Cataloging/methods , Systems Biology/methods , Computational Biology/methods , Databases, Chemical/standards , Gene Expression Profiling/methods , Gene Library , Humans , Information Storage and Retrieval/methods , National Health Programs , National Institutes of Health (U.S.)/standards , Transcriptome , United States
10.
Anal Chem ; 90(1): 649-656, 2018 01 02.
Article in English | MEDLINE | ID: mdl-29035042

ABSTRACT

NMR is a widely used analytical technique with a growing number of repositories available. As a result, demands for a vendor-agnostic, open data format for long-term archiving of NMR data have emerged with the aim to ease and encourage sharing, comparison, and reuse of NMR data. Here we present nmrML, an open XML-based exchange and storage format for NMR spectral data. The nmrML format is intended to be fully compatible with existing NMR data for chemical, biochemical, and metabolomics experiments. nmrML can capture raw NMR data, spectral data acquisition parameters, and where available spectral metadata, such as chemical structures associated with spectral assignments. The nmrML format is compatible with pure-compound NMR data for reference spectral libraries as well as NMR data from complex biomixtures, i.e., metabolomics experiments. To facilitate format conversions, we provide nmrML converters for Bruker, JEOL and Agilent/Varian vendor formats. In addition, easy-to-use Web-based spectral viewing, processing, and spectral assignment tools that read and write nmrML have been developed. Software libraries and Web services for data validation are available for tool developers and end-users. The nmrML format has already been adopted for capturing and disseminating NMR data for small molecules by several open source data processing tools and metabolomics reference spectral libraries, e.g., serving as storage format for the MetaboLights data repository. The nmrML open access data standard has been endorsed by the Metabolomics Standards Initiative (MSI), and we here encourage user participation and feedback to increase usability and make it a successful standard.


Subjects
Databases, Chemical/standards , Magnetic Resonance Spectroscopy/statistics & numerical data , Metabolomics/methods , Software
11.
Nucleic Acids Res ; 46(D1): D661-D667, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29136241

ABSTRACT

WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.


Subjects
Databases, Chemical , Metabolomics , Animals , Data Curation , Data Mining , Databases, Chemical/standards , Databases, Genetic , Humans , Metabolic Networks and Pathways , Quality Control , Search Engine , Software
12.
Crit Rev Toxicol ; 47(8): 705-727, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28510487

ABSTRACT

The threshold of toxicological concern (TTC) approach is a resource-effective de minimis method for the safety assessment of chemicals, based on distributional analysis of the results of a large number of toxicological studies. It is being increasingly used to screen and prioritize substances with low exposure for which there is little or no toxicological information. The first step in the approach is the identification of substances that may be DNA-reactive mutagens, to which the lowest TTC value is applied. This TTC value was based on the analysis of the cancer potency database and involved a number of assumptions that no longer reflect the state-of-the-science and some of which were not as transparent as they could have been. Hence, review and updating of the database is proposed, using inclusion and exclusion criteria reflecting current knowledge. A strategy for the selection of appropriate substances for TTC determination, based on consideration of weight of evidence for genotoxicity and carcinogenicity is outlined. Identification of substances that are carcinogenic by a DNA-reactive mutagenic mode of action and those that clearly act by a non-genotoxic mode of action will enable the protectiveness to be determined of both the TTC for DNA-reactive mutagenicity and that applied by default to substances that may be carcinogenic but are unlikely to be DNA-reactive mutagens (i.e. for Cramer class I-III compounds). Critical to the application of the TTC approach to substances that are likely to be DNA-reactive mutagens is the reliability of the software tools used to identify such compounds. Current methods for this task are reviewed and recommendations made for their application.


Subjects
Carcinogens/chemistry , Databases, Chemical/standards , Mutagens/chemistry , Software/standards , Humans , Risk Assessment
13.
SAR QSAR Environ Res ; 27(11): 939-965, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27885862

ABSTRACT

The increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associated experimental data. Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physicochemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats, identifiers and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest-quality subset of the original dataset was compared with the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community.


Subjects
Data Curation/methods , Databases, Chemical/standards , Datasets as Topic/standards , Quantitative Structure-Activity Relationship , Machine Learning , Molecular Structure
14.
J Comput Aided Mol Des ; 29(9): 885-96, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26201396

ABSTRACT

The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.


Subjects
Biological Assay , Data Accuracy , Databases, Chemical , Data Curation/standards , Databases, Chemical/standards , Databases, Factual , Inhibitory Concentration 50
16.
Med Sci (Paris) ; 31(4): 417-22, 2015 Apr.
Article in French | MEDLINE | ID: mdl-25958760

ABSTRACT

The French National Compound Library (Chimiothèque Nationale) was created in 2003 as a federation of local collections. It contains more than 56 000 small molecules and natural compounds synthesised or isolated in different laboratories over the past years, which explains the diversity of the collection. The strength of this initiative is its ability to connect chemists and biologists for the development of hits. This development involves the synthesis of analogues and/or chemical tools to find new targets. These collaborations lead to the identification of new chemical probes. Such probes, which can modulate a biological function, are essential for studying biological pathways. They can also be useful for therapeutic applications. This article describes the major achievements and perspectives of the French Chemical Library.


Subjects
Small Molecule Libraries , Databases, Chemical/standards , Databases, Chemical/supply & distribution , Databases, Chemical/trends , Drug Evaluation, Preclinical , Drug Information Services/standards , Drug Information Services/supply & distribution , Drug Information Services/trends , France , Humans , Information Dissemination , Molecular Conformation , Small Molecule Libraries/supply & distribution
17.
Nat Chem Biol ; 11(5): 301, 2015 May.
Article in English | MEDLINE | ID: mdl-25885950
18.
Mol Inform ; 34(9): 585-97, 2015 09.
Article in English | MEDLINE | ID: mdl-27490710

ABSTRACT

In this paper we take a historical view of e-Science and e-Research developments within the Chemical Sciences at the University of Southampton, showing the development of several stages of the evolving data ecosystem as Chemistry moves into the digital age of the 21st century. We cover our research on aspects of the representation of chemical information in the context of the World Wide Web (WWW) and its semantic enhancement (the Semantic Web) and illustrate this with the example of the representation of quantities and units within the Semantic Web. We explore the changing nature of laboratories as computing power becomes increasingly powerful and pervasive and specifically look at the function and role of electronic or digital notebooks. Having focussed on the creation of chemical data and information in context, we finish the paper by following the use and reuse of this data as facilitated by the features provided by digital repositories and their importance in facilitating the exchange of chemical information, touching on the issues of open and/or intelligent access to the data.


Subjects
Computer Simulation/trends , Databases, Chemical/standards , Databases, Chemical/trends , Internet
19.
Drug Discov Today ; 18(17-18): 843-52, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23702085

ABSTRACT

Molecular information systems play an important part in modern data-driven drug discovery. They not only support decision making but also enable new discoveries via association and inference. In this review, we outline the scientific requirements identified by the Innovative Medicines Initiative (IMI) Open PHACTS consortium for the design of an open pharmacological space (OPS) information system. The focus of this work is the integration of compound-target-pathway-disease/phenotype data for public and industrial drug discovery research. Typical scientific competency questions provided by the consortium members will be analyzed based on the underlying data concepts and associations needed to answer the questions. Publicly available data sources used to target these questions as well as the need for and potential of semantic web-based technology will be presented.


Subjects
Databases, Chemical , Databases, Pharmaceutical , Drug Discovery/methods , Information Systems , Semantics , Systems Integration , Data Mining , Databases, Chemical/standards , Databases, Pharmaceutical/standards , Drug Discovery/standards , Guidelines as Topic , Information Systems/standards , Knowledge Bases , Molecular Structure , Structure-Activity Relationship
20.
Molecules ; 18(1): 735-56, 2013 Jan 08.
Article in English | MEDLINE | ID: mdl-23299552

ABSTRACT

With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
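
The enrichment figure quoted at the end of this abstract can be illustrated with a minimal sketch. The definition used here is an assumption, since the abstract does not spell it out: compounds are ranked by predicted score, the ranked list is cut at the first point where 25% of all actives have been recovered, and enrichment is the fraction of actives above that cut divided by the fraction of actives in the whole set. The function name and the 0/1 label convention are likewise illustrative.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at the ranking depth where `tpr_cutoff` of actives are recovered.

    Assumed definition (not taken from the paper): precision among the
    top-ranked compounds divided by the overall fraction of actives.
    `labels` must contain 1 for active and 0 for inactive compounds.
    """
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    needed = math.ceil(tpr_cutoff * n_actives)  # actives to recover before cutting
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = 0
    for depth, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            precision = found / depth
            base_rate = n_actives / n_total
            return precision / base_rate
```

A value of 1 means the model ranks no better than random selection. Because HTS sets typically contain very few actives, the base rate is small and large enrichment values are possible.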


Subjects
Databases, Chemical/standards , High-Throughput Screening Assays/standards , Quantitative Structure-Activity Relationship , Algorithms , Animals , Area Under Curve , Computer Simulation , Decision Trees , Drug Discovery/standards , Humans , Inhibitory Concentration 50 , Ligands , Models, Chemical , Neural Networks, Computer , Quality Improvement , ROC Curve , Support Vector Machine