Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships.

Sheridan, Robert P; Wang, Wei Min; Liaw, Andy; Ma, Junshui; Gifford, Eric M.

J Chem Inf Model ; 56(12): 2353-2360, 2016 Dec 27.

Article in English | MEDLINE | ID: mdl-27958738

ABSTRACT

In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions, on the average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.
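For readers unfamiliar with the boosting idea behind the method, the following is a minimal sketch of plain gradient boosting for squared error using one-split decision stumps. It is not XGBoost itself, which adds regularization and second-order gradient information among other refinements, and it does not reproduce the paper's standard parameters. The function names, the toy data, the number of rounds, and the learning rate are all assumptions chosen for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the one-split stump (threshold, left_value, right_value) with least squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Splitting at the largest x would leave the right side empty, so skip it.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    threshold, lv, rv = stump
    return lv if x <= threshold else rv


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit stumps sequentially, each one to the residuals left by the previous ones."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: one descriptor and a nonlinear response.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
model = boost(xs, ys)
```

Each round fits a small tree to what the current ensemble still gets wrong, which is why trees must be built one after another rather than independently in parallel as in random forest.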

Subject(s)

Quantitative Structure-Activity Relationship , Algorithms , Databases, Pharmaceutical , Drug Discovery , Humans , Models, Biological , Software

Systems chemical biology and the Semantic Web: what they mean for the future of drug discovery research.

Wild, David J; Ding, Ying; Sheth, Amit P; Harland, Lee; Gifford, Eric M; Lajiness, Michael S.

Drug Discov Today ; 17(9-10): 469-74, 2012 May.

Article in English | MEDLINE | ID: mdl-22222943

ABSTRACT

Systems chemical biology, the integration of chemistry, biology and computation to generate understanding about the way small molecules affect biological systems as a whole, as well as related fields such as chemogenomics, are central to emerging new paradigms of drug discovery such as drug repurposing and personalized medicine. Recent Semantic Web technologies such as RDF and SPARQL are technical enablers of systems chemical biology, facilitating the deployment of advanced algorithms for searching and mining large integrated datasets. In this paper, we aim to demonstrate how these technologies together can change the way that drug discovery is accomplished.

Subject(s)

Drug Discovery , Systems Biology/methods , Algorithms , Humans , Internet , Semantics

Comparing bioassay response and similarity ensemble approaches to probing protein pharmacology.

Chen, Bin; McConnell, Kevin J; Wale, Nikil; Wild, David J; Gifford, Eric M.

Bioinformatics ; 27(21): 3044-9, 2011 Nov 01.

Article in English | MEDLINE | ID: mdl-21903625

ABSTRACT

MOTIVATION: Networks to predict protein pharmacology can be created using ligand similarity or using known bioassay response profiles of ligands. Recent publications indicate that similarity methods can be highly accurate, but it has been unclear how similarity methods compare to methods that use bioassay response data directly. RESULTS: We created protein networks based on ligand similarity (Similarity Ensemble Approach or SEA) and ligand bioassay response data (BARD) using 155 Pfizer internal BioPrint assays. Both SEA and BARD successfully cluster together proteins with known relationships, and predict some non-obvious relationships. Although the approaches assess target relations from different perspectives, their networks overlap considerably (40% overlap of the top 2% of correlated edges). They can thus be considered comparable methods, with the distinct advantage of the similarity methods being that they require only simple computations (similarity of compounds) as opposed to extensive experimental data. CONTACTS: djwild@indiana.edu; eric.gifford@pfizer.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
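One plausible way to compute an overlap figure like the one quoted above is sketched below. The abstract does not give the exact definition, so the choice of denominator, the rounding of the top-2% cut, and all names and scores here are assumptions for illustration only.

```python
from itertools import combinations


def top_edges(edge_scores, fraction):
    """Return the set of edges whose scores fall in the top `fraction` of the network."""
    k = max(1, round(len(edge_scores) * fraction))
    ranked = sorted(edge_scores, key=edge_scores.get, reverse=True)
    return set(ranked[:k])


def top_overlap(scores_a, scores_b, fraction=0.02):
    """Share of top-ranked edges that the two networks have in common.

    Assumption: overlap is the intersection size divided by the size of the
    smaller top set.
    """
    top_a = top_edges(scores_a, fraction)
    top_b = top_edges(scores_b, fraction)
    return len(top_a & top_b) / min(len(top_a), len(top_b))


# Hypothetical toy networks: 100 protein-protein edges with made-up scores.
edges = list(combinations(range(15), 2))[:100]
sea_scores = {edge: float(i) for i, edge in enumerate(edges)}
bard_scores = {edge: float(100 - i) for i, edge in enumerate(edges)}
bard_scores[edges[99]] = 1000.0  # make one edge rank highly in both networks

overlap = top_overlap(sea_scores, bard_scores)
```

With 100 edges the top 2% is two edges per network; in this toy data one of the two is shared.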

Subject(s)

Drug Design , Proteins/chemistry , Proteins/metabolism , Biological Assay , Cluster Analysis , Ligands , Protein Interaction Maps

Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean.

Drug Metab Dispos ; 38(11): 2083-90, 2010 Nov.

Article in English | MEDLINE | ID: mdl-20693417

ABSTRACT

Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., Chemistry Development Kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of SMILES Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predictive value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ~193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score, we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
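The binning step described above can be sketched as follows: continuous predicted and measured values are mapped to categories with the same cutoffs, and Cohen's κ is then computed on the resulting labels. The cutoff, the toy values, and the function names are hypothetical; the abstract does not state which bins were used.

```python
from collections import Counter


def bin_value(value, cutoffs):
    """Return the index of the bin that `value` falls into (cutoffs in ascending order)."""
    return sum(value >= c for c in cutoffs)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance agreement.

    Undefined (division by zero) when chance agreement is 1, i.e. when both
    labelings consist of one and the same class.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical measured and predicted stability values with a made-up cutoff of 30.
cutoffs = [30.0]
measured = [5.0, 12.0, 25.0, 40.0, 55.0, 70.0, 85.0, 95.0]
modelled = [8.0, 20.0, 35.0, 28.0, 60.0, 65.0, 90.0, 80.0]

measured_bins = [bin_value(v, cutoffs) for v in measured]
modelled_bins = [bin_value(v, cutoffs) for v in modelled]
kappa = cohen_kappa(measured_bins, modelled_bins)
```

In this toy case six of eight molecules land in the same bin (75% raw agreement), but κ is lower because some of that agreement would be expected by chance.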

Subject(s)

Computational Biology/methods , Drug Discovery/methods , Models, Biological , Pharmaceutical Preparations/metabolism , Software , Toxicology/methods , Absorption , Algorithms , Computer Simulation , Drug Stability , Humans , Microsomes, Liver/metabolism , Pharmaceutical Preparations/chemistry , Predictive Value of Tests , Solubility , Tissue Distribution

The development and validation of a computational model to predict rat liver microsomal clearance.

Chang, Cheng; Duignan, David B; Johnson, Kjell D; Lee, Pil H; Cowan, George S; Gifford, Eric M; Stankovic, Charles J; Lepsy, Christopher S; Stoner, Chad L.

J Pharm Sci ; 98(8): 2857-67, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19116953

ABSTRACT

As the cost of discovering and developing new pharmaceutically relevant compounds continues to rise, it is increasingly important to select the right molecules to prosecute very early in drug discovery. The development of high throughput in vitro assays of hepatic metabolic clearance has allowed for vast quantities of data generation; however, these large screens are still costly and remain dependent on animal usage. To further expand the value of these screens and ultimately aid in animal usage reduction, we have developed an in silico model of rat liver microsomal (RLM) clearance. This model combines a large amount of rat clearance data (n = 27,697) generated at multiple Pfizer laboratories to represent the broadest possible chemistry space. The model predicts RLM stability (with 82% accuracy and a kappa value of 0.65 for test data set) based solely on chemical structural inputs, and provides a clear assessment of confidence in the prediction. The current in silico model should help accelerate the drug discovery process by using confidence-based stability-driven prioritization, and reduce cost by filtering out the most unstable/undesirable molecules. The model can also increase efficiency in the evaluation of chemical series by optimizing iterative testing and promoting rational drug design.

Subject(s)

Computational Biology/methods , Computational Biology/standards , Microsomes, Liver/metabolism , Models, Biological , Animals , Metabolic Clearance Rate/drug effects , Predictive Value of Tests , Rats

Development of CYP3A4 inhibition models: comparisons of machine-learning techniques and molecular descriptors.

Arimoto, Rieko; Prasad, Madhu-Ashni; Gifford, Eric M.

J Biomol Screen ; 10(3): 197-205, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15809315

ABSTRACT

Computational models of cytochrome P450 3A4 inhibition were developed based on high-throughput screening data for 4470 proprietary compounds. Multiple models differentiating inhibitors (IC50 < 3 μM) and noninhibitors were generated using various machine-learning algorithms (recursive partitioning [RP], Bayesian classifier, logistic regression, k-nearest-neighbor, and support vector machine [SVM]) with structural fingerprints and topological indices. Nineteen models were evaluated by internal 10-fold cross-validation and also by an independent test set. The three most predictive models, Barnard Chemical Information (BCI)-fingerprint/SVM, MDL-keyset/SVM, and topological indices/RP, correctly classified 249, 248, and 236 compounds of 291 noninhibitors and 135, 137, and 147 compounds of 179 inhibitors in the validation set. Their overall accuracies were 82%, 82%, and 81%, respectively. An investigation of the applicability of the BCI/SVM model found a strong correlation between the predictive performance and the structural similarity to the training set. Using the Tanimoto similarity index as a confidence measurement for the predictions, the limitation of the extrapolation was 0.7 in the case of the BCI/SVM model. Taking a consensus of the 3 best models yielded a further improvement in predictive capability, kappa = 0.65 and accuracy = 83%. The consensus model could also be tuned to minimize either false positives or false negatives depending on the emphasis of the screening.
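The reported accuracies follow directly from the validation-set counts given above, and the Tanimoto-based confidence check can be sketched in a few lines. The counts are taken from the abstract; everything else is an assumption for illustration: fingerprints are represented as sets of on-bit indices, and the 0.7 limit is read as "a prediction is trusted when the nearest training compound has a similarity of at least 0.7", which the abstract does not spell out.

```python
# Validation-set sizes and correct classifications as reported in the abstract.
N_NONINHIBITORS = 291
N_INHIBITORS = 179

reported = {
    "BCI-fingerprint/SVM": (249, 135),
    "MDL-keyset/SVM": (248, 137),
    "topological indices/RP": (236, 147),
}


def accuracy(correct_noninhibitors, correct_inhibitors):
    """Overall accuracy: all correct classifications over all validation compounds."""
    total = N_NONINHIBITORS + N_INHIBITORS
    return (correct_noninhibitors + correct_inhibitors) / total


accuracies = {name: accuracy(*counts) for name, counts in reported.items()}


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def within_domain(query_bits, training_fingerprints, threshold=0.7):
    """Assumed rule: trust a prediction if the nearest training compound is similar enough."""
    nearest = max(tanimoto(query_bits, fp) for fp in training_fingerprints)
    return nearest >= threshold


# Hypothetical fingerprints, not real BCI fingerprints.
training = [{1, 2, 3, 4, 5}, {9}]
query_ok = within_domain({1, 2, 3, 4}, training)  # nearest similarity 4/5
query_far = within_domain({7, 8}, training)       # no shared bits
```

Rounded to whole percentages, the three computed accuracies match the 82%, 82%, and 81% quoted in the abstract.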

Subject(s)

Artificial Intelligence , Cytochrome P-450 Enzyme Inhibitors , Drug Evaluation, Preclinical/methods , Enzyme Inhibitors/chemistry , Models, Chemical , Computer Simulation , Cytochrome P-450 CYP3A , Enzyme Inhibitors/pharmacology , Humans , Models, Molecular , Molecular Structure
