
Prediction of HPLC retention index using artificial neural networks and IGroup E-state indices.

Albaugh, Daniel R; Hall, L Mark; Hill, Dennis W; Kertesz, Tzipporah M; Parham, Marc; Hall, Lowell H; Grant, David F.

J Chem Inf Model ; 49(4): 788-99, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19309176

ABSTRACT

A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other SIR descriptors previously reported. The same input descriptors were used to develop models by both learning algorithms. The MLR model yielded marginally acceptable statistics with training correlation r(2) = 0.65, mean absolute error (MAE) = 83 RI units. External validation of 104 compounds not used for model development yielded validation v(2) = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validate data sets suggests a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation with training set r(2) = 0.93, MAE = 30 RI units and validation v(2) = 0.84, MAE = 41 RI units. For the ANN model, a total of 91% of validation predictions were within 100 RI units of the experimental value.
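
The 10-fold leave-10%-out ensemble and the two reported statistics (MAE and the share of predictions within 100 RI units) can be illustrated with a minimal sketch. It is not the authors' implementation: a one-descriptor least-squares line stands in for the ANN or MLR learner, and all function names are hypothetical.

```python
import random
from statistics import mean

def ten_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, held_out_idx); each fold leaves out about n/k items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [j for j in idx if j not in held], held_out

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (assumed stand-in for the real learner)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def train_ensemble(xs, ys, k=10):
    """Train one model per fold, each on the roughly 90% of data not held out."""
    return [fit_line([xs[j] for j in train], [ys[j] for j in train])
            for train, _ in ten_fold_splits(len(xs), k)]

def predict(ensemble, x):
    """Ensemble prediction: the average of the member predictions."""
    return mean(a * x + b for a, b in ensemble)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def fraction_within(y_true, y_pred, tol):
    """Share of predictions whose absolute error is at most tol."""
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Calling `fraction_within(observed, predicted, 100)` on an external validation set would give the kind of figure quoted above as "within 100 RI units".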

Subject(s)

Chromatography, High Pressure Liquid/statistics & numerical data , Neural Networks, Computer , Algorithms , Artificial Intelligence , Cluster Analysis , Databases, Factual , Forecasting , Linear Models , Models, Chemical , Quantitative Structure-Activity Relationship , Reproducibility of Results , Subject Headings

QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander.

J Med Chem ; 49(24): 7169-81, 2006 Nov 30.

Article in English | MEDLINE | ID: mdl-17125269

ABSTRACT

Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data was partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute error ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
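
The cluster-based partitioning described above can be sketched as stratified sampling: once each compound has a cluster label, about 20% of every cluster is drawn into the external test set. The abstract does not name the clustering algorithm, so the labels are assumed to be given, and `stratified_split` is a hypothetical helper rather than the authors' procedure.

```python
import random
from collections import defaultdict

def stratified_split(cluster_labels, test_fraction=0.2, seed=0):
    """Draw roughly test_fraction of each cluster into the test set.

    cluster_labels[i] is the (assumed precomputed) cluster of compound i.
    Returns sorted lists of training and test indices.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for i, label in enumerate(cluster_labels):
        by_cluster[label].append(i)

    train, test = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])    # sampled for external validation
        train.extend(members[n_test:])   # remainder is used for training
    return sorted(train), sorted(test)
```

Sampling within clusters, rather than from the pooled data, is what keeps every structural neighborhood represented in both sets.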

Subject(s)

Blood Proteins/metabolism , Models, Molecular , Pharmaceutical Preparations/metabolism , Quantitative Structure-Activity Relationship , Drug Design , Humans , Linear Models , Neural Networks, Computer , Protein Binding

Protein crystallization: virtual screening and optimization.

Delucas, Lawrence J; Hamrick, David; Cosenza, Larry; Nagy, Lisa; McCombs, Debbie; Bray, Terry; Chait, Arnon; Stoops, Brad; Belgovskiy, Alexander; William Wilson, W; Parham, Marc; Chernov, Nikolai.

Prog Biophys Mol Biol ; 88(3): 285-309, 2005 Jul.

Article in English | MEDLINE | ID: mdl-15652246

ABSTRACT

Advances in genomics have yielded entire genetic sequences for a variety of prokaryotic and eukaryotic organisms. This accumulating information has escalated the demands for three-dimensional protein structure determinations. As a result, high-throughput structural genomics has become a major international research focus. This effort has already led to several significant improvements in X-ray crystallographic and nuclear magnetic resonance methodologies. Crystallography is currently the major contributor to three-dimensional protein structure information. However, the production of soluble, purified protein and diffraction-quality crystals are clearly the major roadblocks preventing the realization of high-throughput structure determination. This paper discusses a novel approach that may improve the efficiency and success rate for protein crystallization. An automated nanodispensing system is used to rapidly prepare crystallization conditions using minimal sample. Proteins are subjected to an incomplete factorial screen (balanced parameter screen), thereby efficiently searching the entire "crystallization space" for suitable conditions. The screen conditions and scored experimental results are subsequently analyzed using a neural network algorithm to predict new conditions likely to yield improved crystals. Results based on a small number of proteins suggest that the combination of a balanced incomplete factorial screen and neural network analysis may provide an efficient method for producing diffraction-quality protein crystals.
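
The idea of a balanced screen that samples only part of the full factorial can be shown with a simplified sketch. It balances only how often each level of each factor appears; it does not attempt the pairwise balance of a true incomplete factorial design and may repeat a condition. The factor names and levels in the usage note are hypothetical, not taken from the paper.

```python
import random

def balanced_screen(factors, n_runs, seed=0):
    """Build n_runs conditions in which every level of a factor occurs equally often.

    factors maps a factor name to its list of levels. n_runs must be a
    multiple of each factor's level count so the counts can be equal.
    """
    rng = random.Random(seed)
    columns = {}
    for name, levels in factors.items():
        if n_runs % len(levels) != 0:
            raise ValueError(f"n_runs must be a multiple of the {name} level count")
        column = list(levels) * (n_runs // len(levels))
        rng.shuffle(column)  # decouple this factor's ordering from the others
        columns[name] = column
    # Row i of the screen takes the i-th entry of every shuffled column.
    return [{name: columns[name][i] for name in factors} for i in range(n_runs)]
```

For example, `balanced_screen({"pH": [5, 6, 7, 8], "precipitant": ["PEG", "salt"], "temp_C": [4, 20]}, n_runs=8)` yields 8 of the 16 possible conditions, with each pH used twice and each precipitant four times.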

Subject(s)

Combinatorial Chemistry Techniques/methods , Crystallization/methods , Models, Molecular , Neural Networks, Computer , Proteins/chemistry , Robotics/methods , Sequence Analysis, Protein/methods , Computer Simulation , Multiprotein Complexes/chemistry , Multiprotein Complexes/ultrastructure , Protein Conformation , Proteins/ultrastructure

New predictors for several ADME/Tox properties: aqueous solubility, human oral absorption, and Ames genotoxicity using topological descriptors.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B.

Mol Divers ; 8(4): 379-91, 2004.

Article in English | MEDLINE | ID: mdl-15612642

ABSTRACT

In silico predictive models for aqueous solubility, human intestinal absorption (HIA), and Ames genotoxicity were developed principally using artificial neural net (ANN) analysis and topological descriptors. Approximately 10,000 compounds spread across three data sets were used in the construction of these quantitative-structure-activity/property-relationship (QSAR/QSPR) models. For aqueous solubility, 5,037 chemically diverse compounds were used to construct ANN-QSPRs for intrinsic aqueous solubility. When these robust models were applied to 938 compounds in external validation, they gave an r2 = 0.78 with 84% predicted within 1 log unit for these new chemical entities (NCEs). 417 therapeutic drugs were used in the development of an ANN-QSPR to predict percent oral absorption (%OA). For validation testing on 195 new drugs, 92% of the compounds were predicted to within 25% of their reported %OA values, which ranged from 0% to 100%. Polar surface area and logP, the octanol-water partition coefficient, were found to be important descriptors in our QSPR model. Development of an ANN-QSAR as a genotoxicity predictor for S. typhimurium employed 2963 compounds including 290 therapeutic drugs. Validation results on 400 NCEs with the ANN-QSAR gave a concordance of 83% which rose to 91% when a confidence indicator was applied. With new drugs a concordance of 92% was reached, which increased to 97% when the reliability indicator was invoked.
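
Concordance is the fraction of compounds whose predicted class matches the experimental one. The abstract does not define its confidence indicator, so the sketch below assumes one plausible form: predictions whose score lies too close to the 0.5 decision boundary are set aside. The function names and the `margin` parameter are hypothetical.

```python
def concordance(y_true, y_pred):
    """Fraction of predictions that agree with the experimental class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def filtered_concordance(y_true, scores, margin=0.2):
    """Concordance over confident predictions only, plus the fraction retained.

    scores are model outputs in [0, 1]; a score of at least 0.5 means 'positive'.
    A prediction counts as confident when |score - 0.5| >= margin (an assumption).
    """
    kept = [(t, int(s >= 0.5)) for t, s in zip(y_true, scores)
            if abs(s - 0.5) >= margin]
    if not kept:
        raise ValueError("no prediction passed the confidence filter")
    truths, preds = zip(*kept)
    return concordance(truths, preds), len(kept) / len(y_true)
```

Reporting the retained fraction alongside the filtered concordance shows how many compounds the higher figure actually covers.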

Subject(s)

Administration, Oral , Mutagenicity Tests , Humans , Hydrogen-Ion Concentration , Intestines/drug effects , Models, Chemical , Models, Theoretical , Pharmaceutical Preparations , Quantitative Structure-Activity Relationship , Salmonella typhimurium , Sensitivity and Specificity , Solubility , Tissue Distribution , Water

Three new consensus QSAR models for the prediction of Ames genotoxicity.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander; Xie, Qian; Tong, Weida.

Mutagenesis ; 19(5): 365-77, 2004 Sep.

Article in English | MEDLINE | ID: mdl-15388809

ABSTRACT

Three QSAR methods, artificial neural net (ANN), k-nearest neighbors (kNN), and Decision Forest (DF), were applied to 3363 diverse compounds tested for their Ames genotoxicity. The ratio of mutagens to non-mutagens was 60/40 for this dataset. This group of compounds includes >300 therapeutic drugs. All models were developed using the same initial set of 148 topological indices: molecular connectivity chi indices and electrotopological state indices (atom-type, bond-type and group-type E-state), as well as binary indicators. While previous studies have found logP to be a determining factor in genotoxicity, it was not found to be important by any modeling method employed in this study. The three models yielded an average training/test concordance value of 88%, with a low percentage of false positives and false negatives. External validation testing on 400 compounds not used for QSAR model development gave an average concordance of 82%. This value increased to 92% upon removal of less reliable outcomes, as determined by a reliability criterion used within each model. The ANN model showed the best performance in predicting drug compounds, yielding 97% concordance (34/35 drugs) after the removal of less reliable predictions. The appreciable commonality found among the top 10 ranked descriptors from each model is of particular interest because of the diversity in the learning algorithms and descriptor selection techniques employed in this study. Forty percent of the most important descriptors in any one model are found in one or two other models. Fourteen of the most important descriptors relate directly to known toxicophores involved in potent genotoxic responses in Salmonella typhimurium. A comparison of the validation results with those of MULTICASE and DEREK indicated that the new models presented in this work perform substantially better than the former models in predicting genotoxicity of therapeutic drugs. Substantially higher specificity was achieved with these new models as compared with MULTICASE or DEREK with comparable sensitivities among all models.
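
Sensitivity and specificity, the two measures used in the comparison with MULTICASE and DEREK, follow from the four confusion-matrix counts. The sketch below uses the standard definitions, with 1 coding a mutagen and 0 a non-mutagen; that coding and the function names are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def sensitivity(y_true, y_pred):
    """Share of true mutagens predicted as mutagens: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def specificity(y_true, y_pred):
    """Share of true non-mutagens predicted as non-mutagens: TN / (TN + FP)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)
```

Higher specificity at comparable sensitivity means fewer non-mutagens flagged as false positives without missing more true mutagens.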

Subject(s)

Mutagenicity Tests/methods , Algorithms , DNA Damage , Databases as Topic , Models, Chemical , Models, Theoretical , Mutagens , Neural Networks, Computer , Pharmaceutical Preparations , Quantitative Structure-Activity Relationship , Salmonella typhimurium/drug effects , Sensitivity and Specificity , Software , Structure-Activity Relationship

Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation.

Votano, Joseph R; Parham, Marc; Hall, Lowell H; Kier, Lemont B; Hall, L Mark.

Chem Biodivers ; 1(11): 1829-41, 2004 Nov.

Article in English | MEDLINE | ID: mdl-17191819

ABSTRACT

Several QSPR models were developed for predicting intrinsic aqueous solubility, S(o). A data set of 5,964 neutral compounds was sub-divided into two classes, aromatic and non-aromatic compounds. Three models were created with different methods on both data sets: two regression models (multiple linear regression and partial least squares) and an artificial neural network model. These models were based on 3343 aromatic and 1674 non-aromatic compounds for training sets; 938 compounds were used in external validation testing. The range in -log S(o) is -1.6 to 10. Topological structure descriptors were used with all models. A genetic algorithm was used for descriptor selection for regression models. For the artificial neural network (ANN) model, descriptor selection was done with a backward elimination process. All models performed well, with r2 values ranging from 0.72 to 0.84 in external validation testing. The mean absolute errors in validation ranged from 0.44 to 0.80 for the classes of compounds for all the models. These statistical results indicate a sound ANN model. Furthermore, in a comparison with eight other available models, based on predictions using a validation test set (442 compounds), the artificial neural network model presented in this work (CSLogWS) was clearly superior based on both the mean absolute error and the percentage of residuals less than one log unit. In the ANN model both E-State and hydrogen E-State descriptors were found to be important.
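
Backward elimination, the descriptor selection approach named for the ANN model, can be sketched as a greedy loop: start from all descriptors and repeatedly drop the one whose removal gives the lowest error, stopping when every removal would make the error worse. The abstract does not give the stopping rule or the error measure, so both are assumptions here, and `error_fn` is a hypothetical callback that trains and scores a model on a descriptor subset.

```python
def backward_elimination(descriptors, error_fn, min_descriptors=1):
    """Greedy backward elimination over a list of descriptor names.

    error_fn(subset) returns a validation error for a model built on subset
    (lower is better). Returns the retained descriptors and their error.
    """
    current = list(descriptors)
    best_error = error_fn(current)
    while len(current) > min_descriptors:
        # Try removing each descriptor in turn and keep the best outcome.
        trials = [(error_fn([d for d in current if d != drop]), drop)
                  for drop in current]
        trial_error, drop = min(trials)
        if trial_error > best_error:
            break  # every removal hurts, so stop here
        current.remove(drop)
        best_error = trial_error
    return current, best_error
```

Each pass costs one model fit per remaining descriptor, which is why this kind of selection is usually run on a pre-screened descriptor pool.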

Subject(s)

Databases, Factual , Models, Molecular , Quantitative Structure-Activity Relationship , Water/chemistry , Molecular Structure , Predictive Value of Tests , Solubility

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL