Results 1 - 10 of 10
1.
SLAS Discov ; 29(2): 100144, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38316342

ABSTRACT

The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict compounds of low, medium and high solubility as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), accessible at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to achieve the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would be important to help contestants better understand and use the challenge data.


Subjects
Algorithms, Computer Neural Networks, Solubility, Consensus, Chemical Compound Databases
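
The consensus idea in the abstract above can be illustrated with a minimal sketch. It is not the OCHEM workflow: the three models, their scores, and the reference values are hypothetical, and simple averaging of numeric scores is assumed as the combination rule. It only shows that the squared error of an averaged prediction cannot exceed the average squared error of the individual models.

```python
from statistics import mean


def consensus_predict(model_outputs):
    """Average per-compound scores across models.

    model_outputs[m][i] is the score of (hypothetical) model m for compound i.
    """
    if not model_outputs:
        raise ValueError("need at least one model")
    n = len(model_outputs[0])
    if any(len(p) != n for p in model_outputs):
        raise ValueError("all models must score the same compounds")
    return [mean(column) for column in zip(*model_outputs)]


def mse(pred, truth):
    """Mean squared error between predictions and reference values."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))


# Hypothetical scores from three models for four compounds.
preds = [
    [0.2, 0.9, 0.4, 0.7],
    [0.4, 0.7, 0.6, 0.5],
    [0.3, 0.8, 0.2, 0.9],
]
truth = [0.3, 0.8, 0.5, 0.6]  # hypothetical reference values

consensus = consensus_predict(preds)
print("individual MSEs:", [round(mse(p, truth), 4) for p in preds])
print("consensus MSE:  ", round(mse(consensus, truth), 4))
```

The errors of the individual models partly cancel when averaged, which is the variance-reduction effect the abstract refers to. Real consensus schemes may weight models or average class probabilities instead.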
2.
J Chem Inf Model ; 63(12): 3629-3636, 2023 06 26.
Article in English | MEDLINE | ID: mdl-37272707

ABSTRACT

The discovery of novel molecules with desirable properties is a classic challenge in medicinal chemistry. With the recent advancements of machine learning, there has been a surge of de novo drug design tools. However, few resources exist that are user-friendly as well as easily customizable. In this application note, we present the new versatile open-source software package DrugEx for multiobjective reinforcement learning. This package contains the consolidated and redesigned scripts from the prior DrugEx papers including multiple generator architectures, a variety of scoring tools, and multiobjective optimization methods. It has a flexible application programming interface and can readily be used via the command line interface or the graphical user interface GenUI. The DrugEx package is publicly available at https://github.com/CDDLeiden/DrugEx.


Subjects
Deep Learning, Software, Drug Design, Machine Learning
3.
Bioorg Med Chem ; 46: 116388, 2021 09 15.
Article in English | MEDLINE | ID: mdl-34488021

ABSTRACT

The vast majority of approved drugs are metabolized by the five major cytochrome P450 (CYP) isozymes, 1A2, 2C9, 2C19, 2D6 and 3A4. Inhibition of CYP isozymes can cause drug-drug interactions with severe pharmacological and toxicological consequences. Computational methods for the fast and reliable prediction of the inhibition of CYP isozymes by small molecules are therefore of high interest and relevance to pharmaceutical companies and a host of other industries, including the cosmetics and agrochemical industries. Today, a large number of machine learning models for predicting the inhibition of the major CYP isozymes by small molecules are available. With this work, we aim to go beyond the coverage of existing models by combining data from several major public and proprietary sources. More specifically, we used up to 18815 compounds with measured bioactivities to train random forest classification models for the individual CYP isozymes. A major advantage of the new data collection over existing ones is the better representation of the minority class, the CYP inhibitors. With the new data collection, we achieved inhibitor-to-non-inhibitor ratios on the order of 1:1 (CYP1A2) to 1:3 (CYP2D6). We show that our models reach competitive performance on external data, with Matthews correlation coefficients (MCCs) ranging from 0.62 (CYP2C19) to 0.70 (CYP2D6), and areas under the receiver operating characteristic curve (AUCs) between 0.89 (CYP2C19) and 0.92 (CYPs 2D6 and 3A4). Importantly, the models show a high level of robustness, reflected in good predictivity even for compounds that are structurally dissimilar to the compounds represented in the training data. The best models presented in this work are freely accessible for academic research via a web service.


Subjects
Cytochrome P-450 Enzyme Inhibitors/pharmacology, Cytochrome P-450 Enzyme System/metabolism, Machine Learning, Cytochrome P-450 Enzyme Inhibitors/chemical synthesis, Cytochrome P-450 Enzyme Inhibitors/chemistry, Drug Dose-Response Relationship, Humans, Molecular Models, Molecular Structure, Structure-Activity Relationship
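
The Matthews correlation coefficient reported in the abstract above follows the standard confusion-matrix formula. The sketch below computes it with that formula; the labels are hypothetical and are not taken from the paper's data. Because the MCC uses all four cells of the confusion matrix, it remains informative when inhibitors are the minority class.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (inhibitor) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 3 inhibitors and 5 non-inhibitors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(round(mcc, 3))
```

A classifier that labels every compound as a non-inhibitor would score 62.5% accuracy on this toy set but an MCC of 0.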
4.
Chem Res Toxicol ; 34(2): 286-299, 2021 02 15.
Article in English | MEDLINE | ID: mdl-32786543

ABSTRACT

Predicting the structures of metabolites formed in humans can provide advantageous insights for the development of drugs and other compounds. Here we present GLORYx, which integrates machine learning-based site of metabolism (SoM) prediction with reaction rule sets to predict and rank the structures of metabolites that could potentially be formed by phase 1 and/or phase 2 metabolism. GLORYx extends the approach from our previously developed tool GLORY, which predicted metabolite structures for cytochrome P450-mediated metabolism only. A robust approach to ranking the predicted metabolites is attained by using the SoM probabilities predicted by the FAME 3 machine learning models to score the predicted metabolites. On a manually curated test data set containing both phase 1 and phase 2 metabolites, GLORYx achieves a recall of 77% and an area under the receiver operating characteristic curve (AUC) of 0.79. Separate analysis of performance on a large amount of freely available phase 1 and phase 2 metabolite data indicates that achieving a meaningful ranking of predicted metabolites is more difficult for phase 2 than for phase 1 metabolites. GLORYx is freely available as a web server at https://nerdd.zbh.uni-hamburg.de/ and is also provided as a software package upon request. The data sets as well as all the reaction rules from this work are also made freely available.


Subjects
Biotransformation, Machine Learning, Toxicity Tests, Xenobiotics/metabolism, Humans, Molecular Structure, Xenobiotics/chemistry
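
The abstract above states that SoM probabilities are used to score and rank predicted metabolites, but it does not give the scoring function. The sketch below is therefore only an assumed illustration: each candidate metabolite is linked to the atoms where a reaction rule was applied, and its score is taken as the highest SoM probability among those atoms. The metabolite names, atom indices, and probabilities are hypothetical.

```python
def rank_metabolites(candidates, som_prob):
    """Rank candidate metabolites by an SoM-derived score.

    candidates: list of (name, atom_indices), where atom_indices are the atoms
                at which the generating reaction rule was applied.
    som_prob:   dict mapping atom index to predicted SoM probability.
    The max-over-atoms aggregation is an assumption made for this sketch.
    """
    scored = [
        (name, max(som_prob[a] for a in atoms))
        for name, atoms in candidates
    ]
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-atom SoM probabilities for one parent molecule.
som_prob = {0: 0.10, 1: 0.85, 2: 0.40, 3: 0.05}

# Hypothetical candidates produced by reaction rules.
candidates = [
    ("M1_hydroxylation", [2]),
    ("M2_N_dealkylation", [1]),
    ("M3_glucuronidation", [0, 3]),
]

ranking = rank_metabolites(candidates, som_prob)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Other aggregation choices, such as the mean over reacting atoms or a rule-specific prior, would fit the same interface.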
5.
Bioinformatics ; 36(4): 1291-1292, 2020 02 15.
Article in English | MEDLINE | ID: mdl-32077475

ABSTRACT

SUMMARY: The New E-Resource for Drug Discovery (NERDD) is a quickly expanding web portal focused on the provision of peer-reviewed in silico tools for drug discovery. NERDD currently hosts tools for predicting the sites of metabolism (FAME) and metabolites (GLORY) of small organic molecules, for flagging compounds that are likely to interfere with biological assays (Hit Dexter), and for identifying natural products and natural product derivatives in large compound collections (NP-Scout). Several additional models and components are currently in development. AVAILABILITY AND IMPLEMENTATION: The NERDD web server is available at https://nerdd.zbh.uni-hamburg.de. Most tools are also available as software packages for local installation.


Subjects
Biological Products, Drug Discovery, Computer Simulation, Computers, Internet, Software
6.
J Chem Inf Model ; 59(8): 3400-3412, 2019 08 26.
Article in English | MEDLINE | ID: mdl-31361490

ABSTRACT

In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database (Pedretti et al. J. Med. Chem. 2018, 61, 1019), a recently published data resource on xenobiotic metabolism that contains more than 2100 substrates annotated with more than 6300 experimentally confirmed SoMs related to redox reactions, hydrolysis and other nonredox reactions, and conjugation reactions. In tests with holdout data, FAME 3 models reached competitive performance, with Matthews correlation coefficients (MCCs) ranging from 0.50 for a global model covering phase 1 and phase 2 metabolism, to 0.75 for a focused model for phase 2 metabolism. A model focused on cytochrome P450 metabolism yielded an MCC of 0.57. Results from case studies with several synthetic compounds, natural products, and natural product derivatives demonstrate the agreement between model predictions and literature data even for molecules with structural patterns clearly distinct from those present in the training data. The applicability domains of the individual models were estimated by a new, atom-based distance measure (FAMEscore) that is based on a nearest-neighbor search in the space of atom environments. FAME 3 is available via a public web service at https://nerdd.zbh.uni-hamburg.de/ and as a self-contained Java software package, free for academic and noncommercial research.


Subjects
Biological Products/metabolism, Computational Biology/methods, Enzymes/metabolism, Binding Sites, Pharmaceutical Databases, Enzymes/chemistry
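
The abstract above describes an applicability-domain estimate based on a nearest-neighbor search over atom environments. The sketch below shows the general idea only and is not the FAMEscore definition: it assumes atoms are represented as numeric descriptor vectors, uses Euclidean distance, and averages over k = 3 neighbors, all of which are illustrative choices.

```python
from math import dist


def knn_mean_distance(query, training, k=3):
    """Mean Euclidean distance from a query atom to its k nearest training atoms.

    Small values suggest the query resembles atom environments seen in
    training; large values suggest it lies outside the applicability domain.
    """
    if not training:
        raise ValueError("training set must not be empty")
    distances = sorted(dist(query, t) for t in training)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Hypothetical 2D descriptor vectors for training atoms.
training_atoms = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

inside = knn_mean_distance((0.5, 0.5), training_atoms)   # surrounded by training atoms
outside = knn_mean_distance((5.0, 5.0), training_atoms)  # far from all training atoms

print(f"inside:  {inside:.3f}")
print(f"outside: {outside:.3f}")
```

In practice, a threshold on such a measure would be calibrated against prediction accuracy on held-out data.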
7.
Front Chem ; 7: 402, 2019.
Article in English | MEDLINE | ID: mdl-31249827

ABSTRACT

Computational prediction of xenobiotic metabolism can provide valuable information to guide the development of drugs, cosmetics, agrochemicals, and other chemical entities. We have previously developed FAME 2, an effective tool for predicting sites of metabolism (SoMs). In this work, we focus on the prediction of the chemical structures of metabolites, in particular metabolites of xenobiotics. To this end, we have developed a new tool, GLORY, which combines SoM prediction with FAME 2 and a new collection of rules for metabolic reactions mediated by the cytochrome P450 enzyme family. GLORY has two modes: MaxEfficiency and MaxCoverage. For MaxEfficiency mode, the use of predicted SoMs to restrict the locations in the molecule at which the reaction rules could be applied was explored. For MaxCoverage mode, the predicted SoM probabilities were instead used to develop a new scoring approach for the predicted metabolites. With this scoring approach, GLORY achieves a recall of 0.83 and can predict at least one known metabolite within the top three ranked positions for 76% of the molecules of a new, manually curated test set. GLORY is freely available as a web server at https://acm.zbh.uni-hamburg.de/glory/, and the datasets and reaction rules are provided in the Supplementary Material.
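
The two evaluation figures quoted in the abstract above, recall and the share of molecules with at least one known metabolite in the top three ranks, can be computed as in the following sketch. The molecule and metabolite identifiers are hypothetical placeholders, not data from the GLORY test set.

```python
def recall(ranked_predictions, known):
    """Fraction of all known metabolites that appear anywhere in the predictions."""
    found = sum(
        len(known[mol] & set(ranking))
        for mol, ranking in ranked_predictions.items()
    )
    total = sum(len(metabolites) for metabolites in known.values())
    return found / total


def top_k_hit_rate(ranked_predictions, known, k=3):
    """Fraction of molecules with at least one known metabolite in the top k ranks."""
    hits = sum(
        1
        for mol, ranking in ranked_predictions.items()
        if any(m in known[mol] for m in ranking[:k])
    )
    return hits / len(ranked_predictions)


# Hypothetical ranked predictions (best first) and known metabolites per molecule.
ranked = {
    "molA": ["a1", "a2", "a3", "a4"],
    "molB": ["b1", "b2", "b3"],
    "molC": ["c1", "c2", "c3", "c4"],
    "molD": ["d1", "d2"],
}
known = {
    "molA": {"a2"},
    "molB": {"b3", "bX"},  # bX is never predicted
    "molC": {"c4"},        # predicted, but only at rank 4
    "molD": {"d1"},
}

print("recall:", recall(ranked, known))
print("top-3 hit rate:", top_k_hit_rate(ranked, known, k=3))
```

The two metrics answer different questions: recall ignores rank entirely, whereas the top-k rate rewards placing a correct metabolite early in the list.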

8.
J Chem Inf Model ; 59(3): 1030-1043, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30624935

ABSTRACT

Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de, provides user-friendly access not only to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter, as well as to a comprehensive collection of available rule sets for flagging frequent hitters and compounds that include undesired substructures.


Subjects
Machine Learning, Pharmaceutical Preparations/chemistry, Proteins/chemistry, Binding Sites, Pharmaceutical Databases, High-Throughput Screening Assays/methods, Molecular Models, Protein Binding, ROC Curve, Small Molecule Libraries/chemistry
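
The AUC values cited in the abstract above have a simple probabilistic reading: the AUC is the probability that a randomly chosen positive (here, a promiscuous compound) receives a higher score than a randomly chosen negative. The sketch below computes it directly from that definition by pairwise comparison; the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise. Runs in O(n_pos * n_neg), which is
    fine for a sketch but slow for large screening sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = promiscuous) and model scores.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

Unlike the MCC, the AUC does not depend on a classification threshold, which is one reason the two are often reported together.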
9.
ChemMedChem ; 13(6): 564-571, 2018 03 20.
Article in English | MEDLINE | ID: mdl-29285887

ABSTRACT

False-positive assay readouts caused by badly behaving compounds, such as frequent hitters, pan-assay interference compounds (PAINS), aggregators, and others, continue to pose a major challenge to experimental screening. There are only a few in silico methods that allow the prediction of such problematic compounds. We report the development of Hit Dexter, two extremely randomized trees classifiers for the prediction of compounds likely to trigger positive assay readouts either by true promiscuity or by assay interference. The models were trained on a well-prepared dataset extracted from the PubChem Bioassay database, consisting of approximately 311 000 compounds tested for activity on at least 50 proteins. Hit Dexter reached MCC and AUC values of up to 0.67 and 0.96, respectively, on an independent test set. The models are expected to be of high value, in particular to medicinal chemists and biochemists, who can use Hit Dexter to identify compounds for which extra caution should be exercised with positive assay readouts. Hit Dexter is available as a free web service at http://hitdexter.zbh.uni-hamburg.de.


Subjects
High-Throughput Screening Assays/methods, Machine Learning, Computer Simulation, Factual Databases, False Positive Reactions, Small Molecule Libraries/chemistry, Small Molecule Libraries/pharmacology
10.
J Chem Inf Model ; 57(8): 1832-1846, 2017 08 28.
Article in English | MEDLINE | ID: mdl-28782945

ABSTRACT

We report on the further development of FAst MEtabolizer (FAME; J. Chem. Inf. Model. 2013, 53, 2896-2907), a collection of random forest models for the prediction of sites of metabolism (SoMs) of xenobiotics. A broad set of descriptors was explored, from simple 2D descriptors such as those used in FAME, to quantum chemical descriptors employed in some of the most accurate models for SoM prediction currently available. In line with the original FAME approach, our objective was to keep things simple and to come up with accurate and robust models that are based on a small number of 2D descriptors. We found that circular descriptions of atoms and their environments with such descriptors, in combination with an extremely randomized trees algorithm, can yield models that perform as well as more complex approaches. Thorough evaluation experiments on an independent test set showed that the best of these models obtained a Matthews correlation coefficient, area under the receiver operating characteristic curve, and Top-2 accuracy of 0.57, 0.91 and 94.1%, respectively. Models for the prediction of isoform-specific regioselectivity of CYP 3A4, 2D6, and 2C9 were also developed and showed competitive performance. The best models have been integrated into a newly developed software package (FAME 2), which is available free of charge from the authors.


Subjects
Computational Biology/methods, Cytochrome P-450 Enzyme System/metabolism, Machine Learning, Software, Stereoisomerism, Substrate Specificity, Xenobiotics/chemistry, Xenobiotics/metabolism