Search | BVS Regional Portal (test)

1.

Flexible Fitting of PROTAC Concentration-Response Curves with Changepoint Gaussian Processes.

Semenova, Elizaveta; Guerriero, Maria Luisa; Zhang, Bairu; Hock, Andreas; Hopcroft, Philip; Kadamur, Ganesh; Afzal, Avid M; Lazic, Stanley E.

SLAS Discov ; 26(9): 1212-1224, 2021 10.

Article in English | MEDLINE | ID: mdl-34543136

ABSTRACT

A proteolysis-targeting chimera (PROTAC) is a new technology that marks proteins for degradation in a highly specific manner. During screening, PROTAC compounds are tested in concentration-response (CR) assays to determine their potency, and parameters such as the half-maximal degradation concentration (DC50) are estimated from the fitted CR curves. These parameters are used to rank compounds, with lower DC50 values indicating greater potency. However, PROTAC data often exhibit biphasic and polyphasic relationships, making standard sigmoidal CR models inappropriate. A common solution includes manual omitting of points (the so-called masking step), allowing standard models to be used on the reduced data sets. Due to its manual and subjective nature, masking becomes a costly and nonreproducible procedure. We therefore used a Bayesian changepoint Gaussian processes model that can flexibly fit both nonsigmoidal and sigmoidal CR curves without user input. Parameters such as the DC50, maximum effect Dmax, and point of departure (PoD) are estimated from the fitted curves. We then rank compounds based on one or more parameters and propagate the parameter uncertainty into the rankings, enabling us to confidently state if one compound is better than another. Hence, we used a flexible and automated procedure for PROTAC screening experiments. By minimizing subjective decisions, our approach reduces time and cost and ensures reproducibility of the compound-ranking procedure. The code and data are provided on GitHub (https://github.com/elizavetasemenova/gp_concentration_response).
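
The uncertainty-aware ranking described in this abstract can be illustrated with a minimal sketch. It assumes that posterior draws of log10(DC50) are already available for each compound; here they are simulated from normal distributions with made-up means and spreads, which is an assumption and not the authors' changepoint Gaussian process model. The function name `prob_more_potent` is hypothetical.

```python
import numpy as np

def prob_more_potent(log_dc50_a, log_dc50_b):
    """Fraction of paired posterior draws in which compound A has the lower DC50."""
    a = np.asarray(log_dc50_a, dtype=float)
    b = np.asarray(log_dc50_b, dtype=float)
    return float(np.mean(a < b))

rng = np.random.default_rng(0)
# Hypothetical posterior draws of log10(DC50 / M); the values are illustrative only.
draws_a = rng.normal(loc=-8.0, scale=0.3, size=4000)
draws_b = rng.normal(loc=-7.5, scale=0.3, size=4000)

p = prob_more_potent(draws_a, draws_b)
print(f"P(A more potent than B) = {p:.2f}")
```

A probability near 0.5 would mean the data cannot separate the two compounds, whereas a value near 1 supports ranking A above B.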

Subjects

Theoretical Models , Proteins/chemistry , Proteolysis , Proteins/metabolism

2.

Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty.

Mervin, Lewis H; Trapotsi, Maria-Anna; Afzal, Avid M; Barrett, Ian P; Bender, Andreas; Engkvist, Ola.

J Cheminform ; 13(1): 62, 2021 Aug 19.

Article in English | MEDLINE | ID: mdl-34412708

ABSTRACT

Measurements of protein-ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein-ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4 and 0.6 log units and when ideal probability estimates were between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident.
Finally, the PRF models trained with putative inactives decreased the performance compared to PRF models without putative inactives and this could be because putative inactives were not assigned an experimental pXC50 value, and therefore they were considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models in particular for data where class boundaries overlap with the measurement uncertainty, and where a substantial part of the training data is located close to the classification threshold.
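
To make the idea of "ideal probability estimates" concrete, the sketch below converts a measured pXC50 value and an experimental standard deviation σ into a probability of lying above the activity threshold. It assumes a normal error model, and the threshold of 5.0 and σ of 0.5 log units are illustrative choices; neither is confirmed by the abstract as the authors' exact procedure.

```python
import math

def prob_active(pxc50, threshold, sigma):
    """P(true activity >= threshold) under an assumed normal error model with s.d. sigma."""
    z = (pxc50 - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical measurements around an assumed threshold of 5.0 log units.
for value in (4.5, 5.0, 5.2, 6.5):
    print(value, round(prob_active(value, threshold=5.0, sigma=0.5), 3))
```

Measurements far from the threshold map to probabilities near 0 or 1, while those within about one σ of it receive soft labels, which is the region where the abstract reports the largest benefit.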

3.

Comparison of Chemical Structure and Cell Morphology Information for Multitask Bioactivity Predictions.

Trapotsi, Maria-Anna; Mervin, Lewis H; Afzal, Avid M; Sturm, Noé; Engkvist, Ola; Barrett, Ian P; Bender, Andreas.

J Chem Inf Model ; 61(3): 1444-1456, 2021 03 22.

Article in English | MEDLINE | ID: mdl-33661004

ABSTRACT

The understanding of the mechanism-of-action (MoA) of compounds and the prediction of potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison was performed using bioactivity data from the ExCAPE database, image data (in the form of CellProfiler features) from the Cell Painting data set (the largest publicly available data set of cell images with ~30,000 compound perturbations), and extended connectivity fingerprints (ECFPs) using the multitask Bayesian matrix factorization (BMF) approach Macau. We found that the BMF Macau and random forest (RF) performance were overall similar when ECFPs were used as compound descriptors. However, BMF Macau outperformed RF in 159 out of 224 targets (71%) when image data were used as compound information. Using BMF Macau, 100 (corresponding to about 45%) and 90 (about 40%) of the 224 targets were predicted with high predictive performance (AUC > 0.8) with ECFP data and image data as side information, respectively. There were targets better predicted by image data as side information, such as β-catenin, and others better predicted by fingerprint-based side information, such as proteins belonging to the G-protein-Coupled Receptor 1 family, which could be rationalized from the underlying data distributions in each descriptor domain. In conclusion, both cell morphology changes and chemical structure information contain information about compound bioactivity, which is also partially complementary, and can hence contribute to in silico MoA analysis.
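
The AUC > 0.8 criterion used above refers to the area under the ROC curve. As a minimal sketch, the AUC for one target can be computed as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The labels and scores below are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random active > score of random inactive); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for a single target.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```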

Subjects

Drug Discovery , Proteins , Bayes Theorem , Computer Simulation , Factual Databases

4.

New Associations between Drug-Induced Adverse Events in Animal Models and Humans Reveal Novel Candidate Safety Targets.

Giblin, Kathryn A; Basili, Danilo; Afzal, Avid M; Rosenbrier-Ribeiro, Lyn; Greene, Nigel; Barrett, Ian; Hughes, Samantha J; Bender, Andreas.

Chem Res Toxicol ; 34(2): 438-451, 2021 02 15.

Article in English | MEDLINE | ID: mdl-33338378

ABSTRACT

To improve our ability to extrapolate preclinical toxicity to humans, there is a need to understand and quantify the concordance of adverse events (AEs) between animal models and clinical studies. In the present work, we discovered 3011 statistically significant associations between preclinical and clinical AEs caused by drugs reported in the PharmaPendium database of which 2952 were new associations between toxicities encoded by different Medical Dictionary for Regulatory Activities terms across species. To find plausible and testable candidate off-target drug activities for the derived associations, we investigated the genetic overlap between the genes linked to both a preclinical and a clinical AE and the protein targets found to interact with one or more drugs causing both AEs. We discuss three associations from the analysis in more detail for which novel candidate off-target drug activities could be identified, namely, the association of preclinical mutagenicity readouts with clinical teratospermia and ovarian failure, the association of preclinical reflexes abnormal with clinical poor-quality sleep, and the association of preclinical psychomotor hyperactivity with clinical drug withdrawal syndrome. Our analysis successfully identified a total of 77% of known safety targets currently tested in in vitro screening panels plus an additional 431 genes which were proposed for investigation as future safety targets for different clinical toxicities. This work provides new translational toxicity relationships beyond AE term-matching, the results of which can be used for risk profiling of future new chemical entities for clinical studies and for the development of future in vitro safety panels.

Subjects

Adverse Drug Reaction Reporting Systems , Pharmaceutical Preparations/chemistry , Animals , Factual Databases , Humans , Animal Models , Molecular Structure

5.

Systematic Analysis of Protein Targets Associated with Adverse Events of Drugs from Clinical Trials and Postmarketing Reports.

Smit, Ines A; Afzal, Avid M; Allen, Chad H G; Svensson, Fredrik; Hanser, Thierry; Bender, Andreas.

Chem Res Toxicol ; 34(2): 365-384, 2021 02 15.

Article in English | MEDLINE | ID: mdl-33351593

ABSTRACT

Adverse drug reactions (ADRs) are undesired effects of medicines that can harm patients and are a significant source of attrition in drug development. ADRs are anticipated by routinely screening drugs against secondary pharmacology protein panels. However, there is still a lack of quantitative information on the links between these off-target proteins and the reporting of ADRs in humans. Here, we present a systematic analysis of associations between measured and predicted in vitro bioactivities of drugs and adverse events (AEs) in humans from two sources of data: the Side Effect Resource, derived from clinical trials, and the Food and Drug Administration Adverse Event Reporting System, derived from postmarketing surveillance. The ratio of a drug's therapeutic unbound plasma concentration over the drug's in vitro potency against a given protein was used to select proteins most likely to be relevant to in vivo effects. In examining individual target bioactivities as predictors of AEs, we found a trade-off between the positive predictive value and the fraction of drugs with AEs that can be detected. However, considering sets of multiple targets for the same AE can help identify a greater fraction of AE-associated drugs. Of the 45 targets with statistically significant associations to AEs, 30 are included on existing safety target panels. The remaining 15 targets include 9 carbonic anhydrases, of which CA5B is significantly associated with cholestatic jaundice. We include the full quantitative data on associations between measured and predicted in vitro bioactivities and AEs in humans in this work, which can be used to make a more informed selection of safety profiling targets.
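
The trade-off between positive predictive value and the fraction of AE-associated drugs detected can be sketched as follows. The exposure ratio follows the definition in the abstract (unbound plasma concentration over in vitro potency); the drug values, the cutoffs, and the function names are hypothetical.

```python
def exposure_ratio(unbound_plasma_conc_nM, potency_nM):
    """Therapeutic unbound plasma concentration divided by in vitro potency."""
    return unbound_plasma_conc_nM / potency_nM

def ppv_and_detected_fraction(flagged, has_ae):
    """PPV among flagged drugs, and fraction of AE-associated drugs that were flagged."""
    tp = sum(f and a for f, a in zip(flagged, has_ae))
    n_flagged = sum(flagged)
    n_ae = sum(has_ae)
    ppv = tp / n_flagged if n_flagged else float("nan")
    detected = tp / n_ae if n_ae else float("nan")
    return ppv, detected

# Hypothetical ratios for six drugs against one target, with AE annotations.
ratios = [5.0, 2.0, 0.5, 0.2, 0.05, 0.01]
has_ae = [True, True, True, False, True, False]

for cutoff in (1.0, 0.1):
    flagged = [r >= cutoff for r in ratios]
    print(cutoff, ppv_and_detected_fraction(flagged, has_ae))
```

In this toy data, loosening the cutoff flags more of the AE-associated drugs but lowers the positive predictive value.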

Subjects

Pharmaceutical Preparations/chemistry , Proteins/analysis , Clinical Trials as Topic , Humans , Molecular Structure , Pharmaceutical Preparations/blood , Proteins/antagonists & inhibitors , United States , United States Food and Drug Administration

6.

Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein-Ligand Predictions.

Mervin, Lewis H; Afzal, Avid M; Engkvist, Ola; Bender, Andreas.

J Chem Inf Model ; 60(10): 4546-4559, 2020 10 26.

Article in English | MEDLINE | ID: mdl-32865408

ABSTRACT

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely, Platt scaling (PS), isotonic regression (IR), and Venn-ABERS predictors (VA), in calibrating prediction scores obtained from ligand-target prediction comprising the Naïve Bayes, support vector machines, and random forest (RF) algorithms. Calibration quality was assessed on bioactivity data available at AstraZeneca for 40 million data points (compound-target pairs) across 2112 targets and performance was assessed using stratified shuffle split (SSS) and leave 20% of scaffolds out (L20SO) validation. VA achieved the best calibration performances across all machine learning algorithms and cross validation methods tested and also the lowest (best) Brier score loss (mean squared difference between the outputted probability estimates assigned to a compound and the actual outcome). In comparison, the PS and IR methods can actually degrade the assigned probability estimates, particularly for the RF for SSS and during L20SO. Sphere exclusion, a method to sample additional (putative) inactive compounds, was shown to inflate the overall Brier score loss performance, through the artificial requirement for inactive molecules to be dissimilar to active compounds, but was shown to result in overconfident estimators. VA was able to successfully calibrate the probability estimates for even small calibration sets. The multiprobability values (lower and upper probability boundary intervals) were shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict, suggesting that multiprobability discordance can be used as an estimate for target prediction uncertainty. 
Overall, we were able to show in this work that VA scaling of target prediction models is able to improve probability estimates in all testing instances and is currently being applied for in-house approaches.
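
The Brier score loss defined in the abstract (mean squared difference between the predicted probability and the actual outcome) is simple to compute directly. The sketch below uses made-up probabilities to contrast a reasonably calibrated predictor with an overconfident one; it does not implement Platt scaling, isotonic regression, or Venn-ABERS.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

outcomes = [1, 0, 1, 0]                 # hypothetical activity labels
calibrated = [0.9, 0.1, 0.8, 0.3]       # hypothetical, moderately confident
overconfident = [1.0, 0.0, 0.0, 1.0]    # hypothetical, certain but wrong half the time

print(brier_score(calibrated, outcomes))
print(brier_score(overconfident, outcomes))
```

Lower is better: an overconfident predictor is penalised heavily for each confident mistake.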

Subjects

Machine Learning , Support Vector Machine , Bayes Theorem , Ligands , Probability

7.

Understanding Conditional Associations between ToxCast in Vitro Readouts and the Hepatotoxicity of Compounds Using Rule-Based Methods.

Mahmoud, Samar Y; Svensson, Fredrik; Zoufir, Azedine; Módos, Dezso; Afzal, Avid M; Bender, Andreas.

Chem Res Toxicol ; 33(1): 137-153, 2020 01 21.

Article in English | MEDLINE | ID: mdl-31442032

ABSTRACT

Current in vitro models for hepatotoxicity commonly suffer from low detection rates due to incomplete coverage of bioactivity space. Additionally, in vivo exposure measures such as Cmax are used for hepatotoxicity screening and are unavailable early on. Here we propose a novel rule-based framework to extract interpretable and biologically meaningful multiconditional associations to prioritize in vitro end points for hepatotoxicity and understand the associated physicochemical conditions. The data used in this study were derived for 673 compounds from 361 ToxCast bioactivity measurements and 29 calculated physicochemical properties against two lowest effective levels (LEL) of rodent hepatotoxicity from ToxRefDB, namely 15 mg/kg/day and 500 mg/kg/day. To achieve 80% coverage of toxic compounds, 35 rules with accuracies ranging from 96% to 73% using 39 unique ToxCast assays are needed at a threshold level of 500 mg/kg/day, whereas to describe the same coverage at a threshold of 15 mg/kg/day, 20 rules with accuracies of between 98% and 81% were needed, comprising 24 unique assays. Despite the 33-fold difference in dose levels, we found relative consistency in the key mechanistic groups in rule clusters, namely (i) activities against Cytochrome P, (ii) immunological responses, and (iii) nuclear receptor activities. Less specific effects, such as oxidative stress and cell cycle arrest, were used more by rules to describe toxicity at the level of 500 mg/kg/day. Although the endocrine disruption through nuclear receptor activity formulated an essential cluster of rules, this bioactivity was not covered in four commercial assay setups for hepatotoxicity. Using an external set of 29 drugs with drug-induced liver injury (DILI) labels, we found that promiscuity over important assays discriminates between compounds with different levels of liver injury. 
In vitro-in vivo associations were also improved by incorporating physicochemical properties, especially for the potent 15 mg/kg/day toxicity level, as well as for assays describing nuclear receptor activity and phenotypic changes. The most frequently used physicochemical properties, predictive for hepatotoxicity in combination with assay activities, are linked to bioavailability: the number of rotatable bonds (fewer than 7) at a level of 15 mg/kg/day and the number of rings (fewer than 3) at a level of 500 mg/kg/day. In summary, hepatotoxicity cannot very well be captured by single assay end points, but better by a combination of bioactivities in relevant assays, with the likelihood of hepatotoxicity increasing with assay promiscuity. Together, these findings can be used to prioritize assay combinations that are appropriate to assess potential hepatotoxicity.
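
A multiconditional rule of the kind described (an assay readout combined with a physicochemical condition) can be scored for accuracy and coverage as sketched below. The definitions used here are assumptions, since the abstract does not spell them out: accuracy is the fraction of rule-matching compounds that are toxic, and coverage is the fraction of toxic compounds matched by the rule. The compounds and field names are invented.

```python
def rule_stats(compounds, rule):
    """Return (accuracy, coverage) of a rule over annotated compounds."""
    matched = [c for c in compounds if rule(c)]
    toxic_total = sum(c["toxic"] for c in compounds)
    toxic_matched = sum(c["toxic"] for c in matched)
    accuracy = toxic_matched / len(matched) if matched else float("nan")
    coverage = toxic_matched / toxic_total if toxic_total else float("nan")
    return accuracy, coverage

# Hypothetical compounds: one assay readout, one physicochemical property, one label.
compounds = [
    {"assay_hit": True, "rot_bonds": 4, "toxic": True},
    {"assay_hit": True, "rot_bonds": 9, "toxic": False},
    {"assay_hit": True, "rot_bonds": 5, "toxic": True},
    {"assay_hit": False, "rot_bonds": 3, "toxic": True},
    {"assay_hit": True, "rot_bonds": 6, "toxic": False},
]

def rule(c):
    # Illustrative rule: assay active AND fewer than 7 rotatable bonds.
    return c["assay_hit"] and c["rot_bonds"] < 7

print(rule_stats(compounds, rule))
```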

Subjects

Chemical and Drug Induced Liver Injury , Preclinical Drug Evaluation/methods , Animals , Biological Assay , High-Throughput Screening Assays , Humans , Liver , Toxicity Tests

8.

Prediction of UGT-mediated Metabolism Using the Manually Curated MetaQSAR Database.

Mazzolari, Angelica; Afzal, Avid M; Pedretti, Alessandro; Testa, Bernard; Vistoli, Giulio; Bender, Andreas.

ACS Med Chem Lett ; 10(4): 633-638, 2019 Apr 11.

Article in English | MEDLINE | ID: mdl-30996809

ABSTRACT

Even though glucuronidations are the most frequent metabolic reactions of conjugation, both in quantitative and qualitative terms, they have rather seldom been investigated using computational approaches. To fill this gap, we have used the manually collected MetaQSAR metabolic reaction database to generate two models for the prediction of UGT-mediated metabolism, both based on molecular descriptors and implementing the Random Forest algorithm. The first model predicts the occurrence of the reaction and was internally validated with a Matthews correlation coefficient (MCC) of 0.76 and an area under the ROC curve (AUC) of 0.94, and further externally validated using a test set composed of 120 additional xenobiotics (MCC of 0.70 and AUC of 0.90). The second model distinguishes between O- and N-glucuronidations and was optimized by the random undersampling procedure to improve the predictive accuracy during the internal validation, with the recall measure of the minority class increasing from 0.55 to 0.78.
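
For reference, the Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below uses invented counts; they are not the paper's data.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical confusion matrix for a reaction-occurrence classifier.
print(round(mcc(tp=45, tn=40, fp=10, fn=5), 3))
```

Unlike accuracy, the MCC stays informative when the two classes are imbalanced, which is relevant when one outcome is a minority class.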

9.

Information-Derived Mechanistic Hypotheses for Structural Cardiotoxicity.

Svensson, Fredrik; Zoufir, Azedine; Mahmoud, Samar; Afzal, Avid M; Smit, Ines; Giblin, Kathryn A; Clements, Peter J; Mettetal, Jerome T; Pointon, Amy; Harvey, James S; Greene, Nigel; Williams, Richard V; Bender, Andreas.

Chem Res Toxicol ; 31(11): 1119-1127, 2018 11 19.

Article in English | MEDLINE | ID: mdl-30350600

ABSTRACT

Adverse events resulting from drug therapy can be a cause of drug withdrawal, reduced and/or restricted clinical use, as well as a major economic burden for society. To increase the safety of new drugs, there is a need to better understand the mechanisms causing the adverse events. One way to derive new mechanistic hypotheses is by linking data on drug adverse events with the drugs' biological targets. In this study, we have used data mining techniques and mutual information statistical approaches to find associations between reported adverse events collected from the FDA Adverse Event Reporting System and assay outcomes from ToxCast, with the aim to generate mechanistic hypotheses related to structural cardiotoxicity (morphological damage to cardiomyocytes and/or loss of viability). Our workflow identified 22 adverse event-assay outcome associations. From these associations, 10 implicated targets could be substantiated with evidence from previous studies reported in the literature. For two of the identified targets, we also describe a more detailed mechanism, forming putative adverse outcome pathways associated with structural cardiotoxicity. Our study also highlights the difficulties in deriving these types of associations from the very limited amount of data available.
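
Mutual information between a binary adverse-event flag and a binary assay outcome can be estimated from co-occurrence counts. This is a minimal plug-in estimate on toy data, offered as an illustration of the statistic rather than the authors' workflow.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

# Hypothetical per-drug flags: adverse event reported (1/0) and assay active (1/0).
ae_reported = [0, 1, 0, 1]
assay_active = [0, 1, 0, 1]
print(mutual_information(ae_reported, assay_active))
```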

Subjects

Drug-Related Side Effects and Adverse Reactions , Heart Diseases/chemically induced , Theoretical Models , Adverse Drug Reaction Reporting Systems , Animals , Data Mining , Factual Databases , Humans , United States , United States Food and Drug Administration

10.

Extending in Silico Protein Target Prediction Models to Include Functional Effects.

Mervin, Lewis H; Afzal, Avid M; Brive, Lars; Engkvist, Ola; Bender, Andreas.

Front Pharmacol ; 9: 613, 2018.

Article in English | MEDLINE | ID: mdl-29942259

ABSTRACT

In silico protein target deconvolution is frequently used for mechanism-of-action investigations; however, existing protocols usually do not predict compound functional effects, such as activation or inhibition, upon binding to their protein counterparts. This study is hence concerned with including functional effects in target prediction. To this end, we assimilated a bioactivity training set for 332 targets, comprising 817,239 active data points with unknown functional effect (binding data) and 20,761,260 inactive compounds, along with 226,045 activating and 1,032,439 inhibiting data points from functional screens. Chemical space analysis of the data first showed some separation between compound sets (binding and inhibiting compounds were more similar to each other than both binding and activating or activating and inhibiting compounds), providing a rationale for implementing functional prediction models. We employed three different architectures to predict functional response, ranging from simplistic random forest models ('Arch1') to cascaded models which use separate binding and functional effect classification steps ('Arch2' and 'Arch3'), differing in the way training sets were generated. Fivefold stratified cross-validation indicated that cascading predictions provide superior precision and recall based on an internal test set. We next prospectively validated the architectures using a temporal set of 153,467 in-house data points (after a 4-month interim from initial data extraction). Results showed that Arch3 performed with the highest target class averaged precision and recall scores of 71% and 53%, which we attribute to the use of inactive background sets. Distance-based applicability domain (AD) analysis indicated that Arch3 provides superior extrapolation into novel areas of chemical space, and thus, based on the results presented here, we propose it as the most suitable architecture for the functional effect prediction of small molecules.
We finally conclude that including functional effects could provide vital insight in future studies, to annotate cases of unanticipated functional changeover, as outlined by our CHRM1 case study.

11.

Maximizing gain in high-throughput screening using conformal prediction.

Svensson, Fredrik; Afzal, Avid M; Norinder, Ulf; Bender, Andreas.

J Cheminform ; 10(1): 7, 2018 Feb 21.

Article in English | MEDLINE | ID: mdl-29468427

ABSTRACT

Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high throughput approaches. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim to maximise gain in iterative screening. Using this setup we were able to show that by evaluating the predictions on the training data, very accurate predictions on what settings will produce the highest gain on the test data can be made. We evaluate the approach on 12 bioactivity datasets from PubChem training the models using 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8-10 out of the 12 datasets. Broadly, our approach can predict what strategy generates the highest gain based on the results of the cost-gain evaluation: to screen the compounds predicted to be active, to screen all the remaining data, or not to screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, our approach also indicates what confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and allocation of the resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign.
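
The decision among the three strategies named above can be sketched with a simple gain-cost function. The linear form (gain per hit minus cost per screened compound), the compound counts, and the function names are all assumptions for illustration; in practice the expected hit counts would come from the conformal predictor at a chosen confidence level, which this sketch does not implement.

```python
def net_gain(n_screened, n_hits, gain_per_hit, cost_per_compound):
    """Assumed linear gain-cost function: reward for hits minus screening cost."""
    return n_hits * gain_per_hit - n_screened * cost_per_compound

def best_strategy(options, gain_per_hit, cost_per_compound):
    """options maps a strategy name to (n_screened, n_hits); returns best name and all scores."""
    scored = {
        name: net_gain(n, h, gain_per_hit, cost_per_compound)
        for name, (n, h) in options.items()
    }
    return max(scored, key=scored.get), scored

# Hypothetical outcomes for the remaining library after an initial training screen.
options = {
    "screen predicted actives": (1000, 150),
    "screen everything": (8000, 400),
    "stop screening": (0, 0),
}

for gain in (5, 20, 100):
    print(gain, best_strategy(options, gain_per_hit=gain, cost_per_compound=1))
```

With these toy numbers the preferred strategy shifts from stopping, to screening only predicted actives, to screening everything as the value of a hit rises relative to the cost of a measurement.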

12.

Orthologue chemical space and its influence on target prediction.

Mervin, Lewis H; Bulusu, Krishna C; Kalash, Leen; Afzal, Avid M; Svensson, Fredrik; Firth, Mike A; Barrett, Ian; Engkvist, Ola; Bender, Andreas.

Bioinformatics ; 34(1): 72-79, 2018 01 01.

Article in English | MEDLINE | ID: mdl-28961699

ABSTRACT

Motivation: In silico approaches often fail to utilize bioactivity data available for orthologous targets due to insufficient evidence highlighting the benefit for such an approach. Deeper investigation into orthologue chemical space and its influence toward expanding compound and target coverage is necessary to improve the confidence in this practice. Results: Here we present analysis of the orthologue chemical space in ChEMBL and PubChem and its impact on target prediction. We highlight the number of conflicting bioactivities between human and orthologues is low and annotations are overall compatible. Chemical space analysis shows orthologues are chemically dissimilar to human with high intra-group similarity, suggesting they could effectively extend the chemical space modelled. Based on these observations, we show the benefit of orthologue inclusion in terms of novel target coverage. We also benchmarked predictive models using a time-series split and also using bioactivities from Chemistry Connect and HTS data available at AstraZeneca, showing that orthologue bioactivity inclusion statistically improved performance. Availability and implementation: Orthologue-based bioactivity prediction and the compound training set are available at www.github.com/lhm30/PIDGINv2. Contact: ab454@cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods , Computer Simulation , Drug Discovery/methods , Proteins/metabolism , Amino Acid Sequence Homology , Animals , Humans , Ligands , Biological Models , Proteins/drug effects

13.

The Parzen Window method: In terms of two vectors and one matrix.

Mussa, Hamse Y; Mitchell, John B O; Afzal, Avid M.

Pattern Recognit Lett ; 63: 30-35, 2015 Oct 01.

Article in English | MEDLINE | ID: mdl-26435560

ABSTRACT

Pattern classification methods assign an object to one of several predefined classes/categories based on features extracted from observed attributes of the object (pattern). When L discriminatory features for the pattern can be accurately determined, the pattern classification problem presents no difficulty. However, precise identification of the relevant features for a classification algorithm (classifier) to be able to categorize real world patterns without errors is generally infeasible. In this case, the pattern classification problem is often cast as devising a classifier that minimizes the misclassification rate. One way of doing this is to consider both the pattern attributes and its class label as random variables, estimate the posterior class probabilities for a given pattern and then assign the pattern to the class/category for which the posterior class probability value estimated is maximum. More often than not, the form of the posterior class probabilities is unknown. The so-called Parzen Window approach is widely employed to estimate class-conditional probability (class-specific probability) densities for a given pattern. These probability densities can then be utilized to estimate the appropriate posterior class probabilities for that pattern. However, the Parzen Window scheme can become computationally impractical when the size of the training dataset is in the tens of thousands and L is also large (a few hundred or more). Over the years, various schemes have been suggested to ameliorate the computational drawback of the Parzen Window approach, but the problem still remains outstanding and unresolved. In this paper, we revisit the Parzen Window technique and introduce a novel approach that may circumvent the aforementioned computational bottleneck. The current paper presents the mathematical aspect of our idea. Practical realizations of the proposed scheme will be given elsewhere.
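
The standard Parzen Window classifier that the abstract takes as its starting point can be sketched as follows. This is the conventional Gaussian-kernel formulation, not the authors' proposed reformulation; the bandwidth `h`, the toy data, and the function names are assumptions.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen estimate of the density at x from samples of shape (n, L)."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n_features = samples.shape[1]
    sq_dist = np.sum((samples - np.asarray(x, dtype=float)) ** 2, axis=1)
    norm = (2.0 * np.pi * h ** 2) ** (n_features / 2.0)
    return float(np.mean(np.exp(-sq_dist / (2.0 * h ** 2))) / norm)

def classify(x, class_samples, h):
    """Assign x to the class with the largest prior times class-conditional density."""
    total = sum(len(s) for s in class_samples.values())
    scores = {
        label: (len(s) / total) * parzen_density(x, s, h)
        for label, s in class_samples.items()
    }
    return max(scores, key=scores.get)

# Hypothetical two-class training data with L = 2 features.
class_samples = {
    "a": np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.1]]),
    "b": np.array([[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]),
}
print(classify(np.array([0.1, 0.0]), class_samples, h=0.5))
```

Each density evaluation touches every training pattern and every feature, so the cost per query grows with both the training set size and L, which is the computational bottleneck the abstract refers to.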

14.

Target prediction utilising negative bioactivity data covering large chemical space.

Mervin, Lewis H; Afzal, Avid M; Drakakis, Georgios; Lewis, Richard; Engkvist, Ola; Bender, Andreas.

J Cheminform ; 7: 51, 2015.

Article in English | MEDLINE | ID: mdl-26500705

ABSTRACT

BACKGROUND: In silico analyses are increasingly being used to support mode-of-action investigations; however many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. The objective of this work is concerned with the integration of such bioactivity data in the target prediction of orphan compounds to produce the probability of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed through the assimilation of over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, and the subsequent application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds. RESULTS: A Bernoulli Naïve Bayes algorithm was trained using the data and evaluated using fivefold cross-validation, achieving a mean recall and precision of 67.7 and 63.8 % for active compounds and 99.6 and 99.7 % for inactive compounds, respectively. We show the performances of the models are considerably influenced by the underlying intraclass training similarity, the size of a given class of compounds, and the degree of additional oversampling. The method was also validated using compounds extracted from WOMBAT producing average precision-recall AUC and BEDROC scores of 0.56 and 0.85, respectively. Inactive data points used for this test are based on presumed inactivity, producing an approximated indication of the true extrapolative ability of the models. A distance-based applicability domain analysis was also conducted; indicating an average Tanimoto Coefficient distance of 0.3 or greater between a test and training set can be used to give a global measure of confidence in model predictions. A final comparison to a method trained solely on active data from ChEMBL performed with precision-recall AUC and BEDROC scores of 0.45 and 0.76. 
CONCLUSIONS: The inclusion of inactive data for model training produces models with superior AUC and improved early recognition capabilities, although the results from internal and external validation of the models show differing performance between the breadth of models. The realised target prediction protocol is available at https://github.com/lhm30/PIDGIN. Graphical abstract: The inclusion of large scale negative training data for in silico target prediction improves the precision and recall AUC and BEDROC scores for target models.
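
The distance-based applicability domain check mentioned in the abstract relies on the Tanimoto coefficient between fingerprints. A minimal sketch follows, with fingerprints represented as sets of "on" bit indices. The fingerprints are invented, and using the nearest-neighbour distance is one possible convention rather than the one confirmed by the abstract.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as having similarity 0.0 (a convention chosen here).
    return len(a & b) / union if union else 0.0

def nearest_training_distance(query, training):
    """Smallest Tanimoto distance (1 - similarity) from the query to any training compound."""
    return min(1.0 - tanimoto(query, t) for t in training)

# Hypothetical fingerprints.
training = [{1, 2, 3, 4}, {10, 11, 12}]
query = {2, 3, 4, 5}
print(nearest_training_distance(query, training))
```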

15.

A multi-label approach to target prediction taking ligand promiscuity into account.

Afzal, Avid M; Mussa, Hamse Y; Turner, Richard E; Bender, Andreas; Glen, Robert C.

J Cheminform ; 7: 24, 2015.

Article in English | MEDLINE | ID: mdl-26064191

ABSTRACT

BACKGROUND: According to Cobanoglu et al., it is now widely acknowledged that the single-target paradigm (one protein/target, one disease, one drug), which has been the dominant premise in drug development in the recent past, is untenable. More often than not, a drug-like compound (ligand) can be promiscuous: it can interact with more than one target protein. In recent years, in silico target prediction methods have generally approached the promiscuity issue in three main ways: ligand-based methods, target-protein-based methods, and integrative schemes. In this study we confine attention to ligand-based machine learning approaches to target prediction, commonly referred to as target-fishing. The target-fishing approaches currently ubiquitous in the cheminformatics literature can essentially be viewed as single-label multi-class classification schemes; these approaches inherently rely on the single-target paradigm assumption that a ligand homes in on a single target. To address the ligand promiscuity issue, one might instead cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naïve Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds/ligands and 308 targets retrieved from the ChEMBL17 database. RESULTS: On classifying 3,332 multi-label (promiscuous) test compounds, SMM and MMM performed differently. At the 0.05 significance level, a Wilcoxon signed-rank test performed on the paired target predictions yielded by SMM and MMM for the test ligands gave a p-value < 5.1 × 10^-94 and a test statistic of 6.8 × 10^5, in favour of MMM. The two models also performed differently on three of four datasets comprising single-label (non-promiscuous) compounds: McNemar's test yielded χ² values of 15.657, 16.500 and 16.405 (with corresponding p-values of 7.594 × 10^-5, 4.865 × 10^-5 and 5.115 × 10^-5), respectively, in favour of MMM. The models performed similarly on the fourth set.
CONCLUSIONS: The target prediction results obtained in this study indicate that multi-label multi-class approaches are more apt than the ubiquitous single-label multi-class schemes when it comes to the application of ligand-based classifiers to target-fishing.
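For readers unfamiliar with McNemar's test, the following Python sketch shows how the statistic and its p-value (one degree of freedom) are commonly computed from the two discordant counts of a paired classifier comparison. The counts are hypothetical, and whether the authors applied a continuity correction is not stated in the abstract, so the uncorrected form used by default here is an assumption.

```python
import math


def mcnemar_chi2(b, c, correction=False):
    """McNemar statistic from the two discordant counts b and c."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    return diff ** 2 / (b + c)


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical discordant counts:
# b = compounds only the first model classified correctly,
# c = compounds only the second model classified correctly.
stat = mcnemar_chi2(10, 30)
print(stat, chi2_pvalue_1df(stat))

# Statistic reported in the abstract for the first single-label test set.
print(chi2_pvalue_1df(15.657))
```

The last call returns roughly 7.6 × 10^-5, in line with the p-value quoted for the first test set.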

16.

Bridging of anions by hydrogen bonds in nest motifs and its significance for Schellman loops and other larger motifs within proteins.

Afzal, Avid M; Al-Shubailly, Fawzia; Leader, David P; Milner-White, E James.

Proteins ; 82(11): 3023-31, 2014 Nov.

Article in English | MEDLINE | ID: mdl-25132631

ABSTRACT

The nest is a protein motif of three consecutive amino acid residues with dihedral angles 1,2-αR αL (RL nests) or 1,2-αL αR (LR nests). Many nests form a depression in which an anion or δ-negative acceptor atom is bound by hydrogen bonds from the main chain NH groups. We have determined the extent and nature of this bridging in a database of protein structures using a computer program written for the purpose. Acceptor anions are bound by a pair of bridging hydrogen bonds in 40% of RL nests and 20% of LR nests. Two thirds of the bridges are between the NH groups at Positions 1 and 3 of the motif (N1N3-bridging), which confers a concavity to the nest; one third are of the N2N3 type, which does not. In bridged LR nests N2N3-bridging predominates (14% N1N3: 75% N2N3), whereas in bridged RL nests the reverse is true (69% N1N3: 25% N2N3). Most bridged nests occur within larger motifs: 45% in (hexapeptide) Schellman loops with an additional 4 → 0 hydrogen bond (N1N3), 11% in Schellman loops with an additional 5 → 1 hydrogen bond (N2N3), 12% in a composite structure including a type 1 β-bulge loop and an asx- or ST-motif (N1N3), remarkably homologous to the N1N3-bridged Schellman loop, and 3% in a composite structure including a type 2 β-bulge loop and an asx-motif (N2N3). A third hydrogen bond is a previously unrecognized feature of Schellman loops as those lacking bridged nests have an additional 4 → 0 hydrogen bond.

Subjects

Amino Acid Motifs , Hydrogen Bonding , Algorithms , Anions/chemistry , Computational Biology/methods , Models, Molecular , Protein Conformation

17.

FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes.

Kirchmair, Johannes; Williamson, Mark J; Afzal, Avid M; Tyzack, Jonathan D; Choy, Alison P K; Howlett, Andrew; Rydberg, Patrik; Glen, Robert C.

J Chem Inf Model ; 53(11): 2896-907, 2013 Nov 25.

Article in English | MEDLINE | ID: mdl-24219364

ABSTRACT

FAst MEtabolizer (FAME) is a fast and accurate predictor of sites of metabolism (SoMs). It is based on a collection of random forest models trained on diverse chemical data sets of more than 20 000 molecules annotated with their experimentally determined SoMs. Using a comprehensive set of available data, FAME aims to assess metabolic processes from a holistic point of view. It is not limited to a specific enzyme family or species. Besides a global model, dedicated models are available for human, rat, and dog metabolism; specific prediction of phase I and II metabolism is also supported. FAME is able to identify at least one known SoM among the top-1, top-2, and top-3 highest ranked atom positions in up to 71%, 81%, and 87% of all cases tested, respectively. These prediction rates are comparable to or better than SoM predictors focused on specific enzyme families (such as cytochrome P450s), despite the fact that FAME uses only seven chemical descriptors. FAME covers a very broad chemical space, which together with its inter- and extrapolation power makes it applicable to a wide range of chemicals. Predictions take less than 2.5 s per molecule in batch mode on an Ultrabook. Results are visualized using Jmol, with the most likely SoMs highlighted.
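The top-k figures quoted in the abstract count a molecule as a success when at least one experimentally known SoM appears among its k highest-ranked atoms. A minimal Python sketch of that metric follows; the function name, the atom indices, and the example rankings are hypothetical and are not taken from FAME itself.

```python
def top_k_success_rate(ranked_atoms, known_soms, k):
    """Fraction of molecules with at least one known SoM among the top-k ranked atoms.

    ranked_atoms: per molecule, atom indices ordered from most to least likely SoM.
    known_soms:   per molecule, the set of experimentally annotated SoM atom indices.
    """
    hits = sum(
        1
        for ranking, soms in zip(ranked_atoms, known_soms)
        if set(ranking[:k]) & set(soms)
    )
    return hits / len(ranked_atoms)


# Hypothetical rankings and annotations for four molecules.
rankings = [[4, 1, 7], [2, 5, 3], [0, 6, 9], [8, 3, 1]]
annotated = [{4}, {3}, {9, 6}, {5}]

for k in (1, 2, 3):
    print(k, top_k_success_rate(rankings, annotated, k))
```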

Subjects

Algorithms , Eukaryotic Cells/enzymology , Inactivation, Metabolic , Metabolic Networks and Pathways , Software , Animals , Artificial Intelligence , Cytochrome P-450 Enzyme System/chemistry , Cytochrome P-450 Enzyme System/metabolism , Diazepam/chemistry , Diazepam/metabolism , Dogs , Humans , Models, Chemical , Quantum Theory , Rats
