Search | VHL Regional Portal

QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander.

J Med Chem; 49(24): 7169-81, 2006 Nov 30.

Article in English | MEDLINE | ID: mdl-17125269

ABSTRACT

Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data was partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is representative of the training set with respect to both structure and protein binding values. The four modeling techniques were multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute errors ranged from r² = 0.90 and MAE = 7.6 for ANN to r² = 0.61 and MAE = 16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors that ranged from r² = 0.70 and MAE = 14.1 for ANN to a low of r² = 0.59 and MAE = 18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
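The partitioning step in the abstract above (cluster the compounds, then sample about 20% so the test set mirrors the training set) can be illustrated with a minimal sketch. It assumes cluster labels have already been computed by some clustering method in descriptor space; the function name `cluster_stratified_split` and its parameters are hypothetical, and the abstract does not specify the authors' exact procedure.

```python
import random
from collections import defaultdict


def cluster_stratified_split(cluster_ids, test_fraction=0.2, seed=0):
    """Split compound indices into (train, test), sampling each cluster separately.

    cluster_ids[i] is the (assumed precomputed) cluster label of compound i.
    Drawing the same fraction from every cluster keeps the test set spread
    across the same regions of descriptor space as the training set.
    """
    rng = random.Random(seed)

    # Group compound indices by their cluster label.
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)

    train, test = [], []
    for cid in sorted(by_cluster):
        members = by_cluster[cid][:]
        rng.shuffle(members)
        # Number of test compounds taken from this cluster.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy example: three clusters of 10, 20, and 5 compounds.
labels = [0] * 10 + [1] * 20 + [2] * 5
train_idx, test_idx = cluster_stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Sampling per cluster rather than from the pooled data is one simple way to keep small structural classes represented in the test set; other stratification schemes would serve the same purpose.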

Subject(s)

Blood Proteins/metabolism; Models, Molecular; Pharmaceutical Preparations/metabolism; Quantitative Structure-Activity Relationship; Drug Design; Humans; Linear Models; Neural Networks, Computer; Protein Binding

Recent uses of topological indices in the development of in silico ADMET models.

Votano, Joseph R.

Curr Opin Drug Discov Devel; 8(1): 32-7, 2005 Jan.

Article in English | MEDLINE | ID: mdl-15679169

ABSTRACT

Topological indices are employed in an ever-widening family of quantitative models to describe the structural attributes of compounds as these relate to experimental endpoints in a host of physicochemical and biological processes. This is especially true where attention to ADMET (absorption, distribution, metabolism, excretion and toxicity) properties is a priority, using various training or learning algorithms to construct quantitative structure-activity or -property relationship ADMET models. This review discusses the in silico ADMET approaches used over the past two years, where the majority of descriptors are topological, including comparisons between models and their important descriptors where applicable. ADMET models for aqueous solubility involving several large datasets are reviewed, as are a number of models for human intestinal absorption. Also included is the use of topological indices extended to modeling metabolic stability for the cytochrome P450 cassette of enzymes in an interesting study with compounds measured in a uniform bioassay using human liver S9 homogenate. Finally, in the area of genotoxicity, two recent ADMET models are discussed, one for chromosomal aberrations and another using a large compound dataset for Ames mutagenicity. The latter study involved several thousand compounds, with comparisons of validation results for a number of well-known predictors of mutagenicity.

Subject(s)

Drug-Related Side Effects and Adverse Reactions; Toxicology/methods; Animals; Chemical Phenomena; Chemistry, Physical; Computer Simulation; Humans; Mutagens/toxicity; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Pharmacokinetics

New predictors for several ADME/Tox properties: aqueous solubility, human oral absorption, and Ames genotoxicity using topological descriptors.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B.

Mol Divers; 8(4): 379-91, 2004.

Article in English | MEDLINE | ID: mdl-15612642

ABSTRACT

In silico predictive models for aqueous solubility, human intestinal absorption (HIA), and Ames genotoxicity were developed principally using artificial neural net (ANN) analysis and topological descriptors. Approximately 10,000 compounds spread across three data sets were used in the construction of these quantitative-structure-activity/property-relationship (QSAR/QSPR) models. For aqueous solubility, 5,037 chemically diverse compounds were used to construct ANN-QSPRs for intrinsic aqueous solubility. When these robust models were applied to 938 compounds in external validation, they gave r² = 0.78 with 84% predicted within 1 log unit for these new chemical entities (NCEs). A set of 417 therapeutic drugs was used in the development of an ANN-QSPR to predict percent oral absorption (%OA). For validation testing on 195 new drugs, 92% of the compounds were predicted to within 25% of their reported %OA values, which ranged from 0% to 100%. Polar surface area and logP, the octanol-water partition coefficient, were found to be important descriptors in our QSPR model. Development of an ANN-QSAR as a genotoxicity predictor for S. typhimurium employed 2963 compounds including 290 therapeutic drugs. Validation results on 400 NCEs with the ANN-QSAR gave a concordance of 83%, which rose to 91% when a confidence indicator was applied. With new drugs a concordance of 92% was reached, which increased to 97% when the reliability indicator was invoked.
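Statistics such as "84% predicted within 1 log unit" or "92% predicted to within 25% of reported %OA" are hit rates against a tolerance. The sketch below shows one way to compute such a figure; the function name `fraction_within` is hypothetical, the data are invented for illustration, and it assumes the tolerance is an absolute difference in the units of the measurement (the abstract does not say whether "25%" means percentage points).

```python
def fraction_within(observed, predicted, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


# Invented log-solubility values, for illustration only.
observed_logs = [-2.0, -3.5, -5.0, -1.0]
predicted_logs = [-2.4, -4.8, -4.9, -1.9]

# Absolute errors are 0.4, 1.3, 0.1, 0.9, so three of four fall within 1 log unit.
print(fraction_within(observed_logs, predicted_logs, tolerance=1.0))  # 0.75
```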

Subject(s)

Administration, Oral; Mutagenicity Tests; Humans; Hydrogen-Ion Concentration; Intestines/drug effects; Models, Chemical; Models, Theoretical; Pharmaceutical Preparations; Quantitative Structure-Activity Relationship; Salmonella typhimurium; Sensitivity and Specificity; Solubility; Tissue Distribution; Water

Three new consensus QSAR models for the prediction of Ames genotoxicity.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander; Xie, Qian; Tong, Weida.

Mutagenesis; 19(5): 365-77, 2004 Sep.

Article in English | MEDLINE | ID: mdl-15388809

ABSTRACT

Three QSAR methods, artificial neural net (ANN), k-nearest neighbors (kNN), and Decision Forest (DF), were applied to 3363 diverse compounds tested for their Ames genotoxicity. The ratio of mutagens to non-mutagens was 60/40 for this dataset. This group of compounds includes >300 therapeutic drugs. All models were developed using the same initial set of 148 topological indices: molecular connectivity chi indices and electrotopological state indices (atom-type, bond-type and group-type E-state), as well as binary indicators. While previous studies have found logP to be a determining factor in genotoxicity, it was not found to be important by any modeling method employed in this study. The three models yielded an average training/test concordance value of 88%, with a low percentage of false positives and false negatives. External validation testing on 400 compounds not used for QSAR model development gave an average concordance of 82%. This value increased to 92% upon removal of less reliable outcomes, as determined by a reliability criterion used within each model. The ANN model showed the best performance in predicting drug compounds, yielding 97% concordance (34/35 drugs) after the removal of less reliable predictions. The appreciable commonality found among the top 10 ranked descriptors from each model is of particular interest because of the diversity in the learning algorithms and descriptor selection techniques employed in this study. Forty percent of the most important descriptors in any one model are found in one or two other models. Fourteen of the most important descriptors relate directly to known toxicophores involved in potent genotoxic responses in Salmonella typhimurium. A comparison of the validation results with those of MULTICASE and DEREK indicated that the new models presented in this work perform substantially better than the former models in predicting genotoxicity of therapeutic drugs. Substantially higher specificity was achieved with these new models as compared with MULTICASE or DEREK with comparable sensitivities among all models.
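Concordance, sensitivity, and specificity, and the effect of discarding less reliable predictions, can be made concrete with a small sketch. The function `classification_summary` and the boolean `reliable` flags are hypothetical: the abstract states only that each model applies its own reliability criterion, not how that criterion is defined. The toy data are invented.

```python
def classification_summary(actual, predicted, reliable=None):
    """Concordance, sensitivity, and specificity for binary mutagenicity calls.

    actual / predicted: truthy = mutagen, falsy = non-mutagen.
    reliable: optional per-compound flags; predictions flagged False are
    dropped before scoring (a stand-in for a model's reliability criterion).
    """
    if reliable is None:
        reliable = [True] * len(actual)

    tp = fp = tn = fn = 0
    for a, p, r in zip(actual, predicted, reliable):
        if not r:
            continue  # skip predictions judged unreliable
        if a and p:
            tp += 1
        elif a and not p:
            fn += 1
        elif not a and p:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    return {
        "n": total,
        "concordance": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Invented example: six compounds, two of which are flagged as unreliable.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0]
reliable = [True, True, False, True, False, True]

print(classification_summary(actual, predicted))
print(classification_summary(actual, predicted, reliable))
```

In this toy case both misclassified compounds carry the unreliable flag, so concordance rises from 4/6 to 4/4 while the number of scored compounds falls from six to four; the trade-off between coverage and accuracy is the point of such a filter.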

Subject(s)

Mutagenicity Tests/methods; Algorithms; DNA Damage; Databases as Topic; Models, Chemical; Models, Theoretical; Mutagens; Neural Networks, Computer; Pharmaceutical Preparations; Quantitative Structure-Activity Relationship; Salmonella typhimurium/drug effects; Sensitivity and Specificity; Software; Structure-Activity Relationship

Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B; Hall, L Mark.

Chem Biodivers; 1(11): 1829-41, 2004 Nov.

Article in English | MEDLINE | ID: mdl-17191819

ABSTRACT

Several QSPR models were developed for predicting intrinsic aqueous solubility, S(o). A data set of 5,964 neutral compounds was sub-divided into two classes, aromatic and non-aromatic compounds. Three models were created with different methods on both data sets: two regression models (multiple linear regression and partial least squares) and an artificial neural network model. These models were based on 3343 aromatic and 1674 non-aromatic compounds for training sets; 938 compounds were used in external validation testing. The range in -log S(o) is -1.6 to 10. Topological structure descriptors were used with all models. A genetic algorithm was used for descriptor selection for the regression models. For the artificial neural network (ANN) model, descriptor selection was done with a backward elimination process. All models performed well, with r² values ranging from 0.72 to 0.84 in external validation testing. The mean absolute errors in validation ranged from 0.44 to 0.80 for the classes of compounds for all the models. These statistical results indicate a sound ANN model. Furthermore, in a comparison with eight other available models, based on predictions using a validation test set (442 compounds), the artificial neural network model presented in this work (CSLogWS) was clearly superior based on both the mean absolute error and the percentage of residuals less than one log unit. In the ANN model both E-State and hydrogen E-State descriptors were found to be important.
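The two summary statistics quoted for external validation, r² and mean absolute error, can be computed as in the sketch below. It assumes r² is the squared Pearson correlation between observed and predicted values; the abstract does not state which definition the authors used, and the function names and sample values here are illustrative only.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def squared_pearson_r(observed, predicted):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return cov * cov / (var_o * var_p)


# Invented -log S(o) values, for illustration only.
observed = [1.2, 3.4, 5.1, 2.8, 7.0]
predicted = [1.5, 3.0, 5.6, 2.5, 6.4]

print(mean_absolute_error(observed, predicted))
print(squared_pearson_r(observed, predicted))
```

The two measures answer different questions: a model can track the trend closely (high r²) while still being offset from the measured values (high MAE), which is presumably why the abstract reports both.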

Subject(s)

Databases, Factual; Models, Molecular; Quantitative Structure-Activity Relationship; Water/chemistry; Molecular Structure; Predictive Value of Tests; Solubility
