1.
BMC Genomics ; 17: 205, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26956490

ABSTRACT

BACKGROUND: Chemical bioavailability is an important dose metric in environmental risk assessment. Although many approaches have been used to evaluate bioavailability, not a single approach is free from limitations. Previously, we developed a new genomics-based approach that integrated microarray technology and regression modeling for predicting bioavailability (tissue residue) of explosive compounds in exposed earthworms. In the present study, we further compared 18 different regression models and performed variable selection simultaneously with parameter estimation. RESULTS: This refined approach was applied to both previously collected and newly acquired earthworm microarray gene expression datasets for three explosive compounds. Our results demonstrate that a prediction accuracy of R² = 0.71-0.82 was achievable at a relatively low model complexity with as few as 3-10 predictor genes per model. These results are much more encouraging than our previous ones. CONCLUSION: This study has demonstrated that our approach is promising for bioavailability measurement, which warrants further studies of mixed contamination scenarios in field settings.
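
For readers unfamiliar with the metric, the R-squared value reported above is a coefficient of determination. The abstract does not say which variant was used, so the following is only a minimal sketch assuming the standard definition, 1 - SS_res / SS_tot; the function name and the toy numbers are illustrative and do not come from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming the standard 1 - SS_res/SS_tot form."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Toy values only (hypothetical tissue-residue measurements).
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition a model that always predicts the mean scores 0 and a perfect model scores 1, so a range of 0.71-0.82 would correspond to explaining roughly 71-82% of the variance in the measured values.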


Subject(s)
Explosive Agents/pharmacokinetics , Gene Expression Profiling/methods , Oligochaeta/genetics , Soil Pollutants/pharmacokinetics , Animals , Azocines/pharmacokinetics , Biological Availability , Oligochaeta/metabolism , Oligonucleotide Array Sequence Analysis , Regression Analysis , Triazines/pharmacokinetics , Trinitrotoluene/pharmacokinetics
2.
Mol Inform ; 33(9): 627-40, 2014 Sep.
Article in English | MEDLINE | ID: mdl-27486081

ABSTRACT

Glycogen synthase kinase-3 (GSK-3) is a multifunctional serine/threonine protein kinase which regulates a wide range of cellular processes, involving various signalling pathways. GSK-3β has emerged as an important therapeutic target for diabetes and Alzheimer's disease. To identify structurally novel GSK-3β inhibitors, we performed virtual screening by implementing a combined ligand-based/structure-based approach, which included quantitative structure-activity relationship (QSAR) analysis and docking prediction. To integrate and analyze complex data sets from multiple experimental sources, we drafted and validated a hierarchical QSAR method, which adopts a two-level structure to take data heterogeneity into account. A collection of 728 GSK-3 inhibitors with diverse structural scaffolds was obtained from published papers that used different experimental assay protocols. Support vector machines and random forests were implemented with wrapper-based feature selection algorithms to construct predictive learning models. The best models for each single group of compounds were then used to build the final hierarchical QSAR model, with an overall R² of 0.752 for the 141 compounds in the test set. The compounds obtained from the virtual screening experiment were tested for GSK-3β inhibition. The bioassay results confirmed that 2 hit compounds are indeed GSK-3β inhibitors exhibiting sub-micromolar inhibitory activity, and therefore validated our combined ligand-based/structure-based approach as effective for virtual screening experiments.

3.
BMC Bioinformatics ; 14 Suppl 14: S16, 2013.
Article in English | MEDLINE | ID: mdl-24267824

ABSTRACT

BACKGROUND: In drug discovery and development, it is crucial to determine which conformers (instances) of a given molecule are responsible for its observed biological activity and at the same time to recognize the most representative subset of features (molecular descriptors). Due to experimental difficulty in obtaining the bioactive conformers, computational approaches such as machine learning techniques are much needed. Multiple Instance Learning (MIL) is a machine learning method capable of tackling this type of problem. In the MIL framework, each instance is represented as a feature vector, which usually resides in a high-dimensional feature space. The high dimensionality may provide significant information for learning tasks, but at the same time it may also include a large number of irrelevant or redundant features that might negatively affect learning performance. Reducing the dimensionality of data will hence facilitate the classification task and improve the interpretability of the model. RESULTS: In this work we propose a novel approach, named multiple instance learning via joint instance and feature selection. The iterative joint instance and feature selection is achieved using an instance-based feature mapping and 1-norm regularized optimization. The proposed approach was tested on four biological activity datasets. CONCLUSIONS: The empirical results demonstrate that the selected instances (prototype conformers) and features (pharmacophore fingerprints) have competitive discriminative power and the convergence of the selection process is also fast.
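
The abstract relies on 1-norm regularization to select features but does not describe the solver. As a generic illustration only (not the authors' algorithm), the soft-thresholding operator below, which is the proximal step associated with a 1-norm penalty, shows why such a penalty produces exact zeros and therefore acts as a selector. The weights and the threshold value are made up for the example.

```python
def soft_threshold(weight, lam):
    """Shrink one weight toward zero by lam; weights within lam of zero become exactly 0."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


def select_features(weights, lam):
    """Apply soft-thresholding to every weight and report which indices survive."""
    shrunk = [soft_threshold(w, lam) for w in weights]
    kept = [i for i, w in enumerate(shrunk) if w != 0.0]
    return shrunk, kept


# Hypothetical weights for four descriptors; lam = 1.0 is an arbitrary choice.
shrunk, kept = select_features([2.0, -0.5, 0.25, -3.0], 1.0)
```

In this toy run only the first and last descriptors keep non-zero weights, which is the sense in which a 1-norm penalty discards weak features or instances rather than merely down-weighting them.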


Subject(s)
Drug Discovery , Algorithms , Artificial Intelligence , Humans , Imaging, Three-Dimensional , Ligands , Models, Molecular , Molecular Conformation
4.
BMC Bioinformatics ; 13 Suppl 15: S3, 2012.
Article in English | MEDLINE | ID: mdl-23046442

ABSTRACT

BACKGROUND: In the context of drug discovery and development, much effort has been exerted to determine which conformers of a given molecule are responsible for the observed biological activity. In this work we aimed to predict bioactive conformers using a variant of supervised learning, named multiple-instance learning. A single molecule, treated as a bag of conformers, is biologically active if and only if at least one of its conformers, treated as an instance, is responsible for the observed bioactivity; and a molecule is inactive if none of its conformers is responsible for the observed bioactivity. The implementation requires instance-based embedding, and joint feature selection and classification. The goal of the present project is to implement multiple-instance learning in drug activity prediction, and subsequently to identify the bioactive conformers for each molecule. METHODS: We encoded the 3-dimensional structures using pharmacophore fingerprints which are binary strings, and accomplished instance-based embedding using calculated dissimilarity distances. Four dissimilarity measures were employed and their performances were compared. 1-norm SVM was used for joint feature selection and classification. The approach was applied to four data sets, and the best proposed model for each data set was determined by using the dissimilarity measure yielding the smallest number of selected features. RESULTS: The predictive abilities of the proposed approach were compared with three classical predictive models without instance-based embedding. The proposed approach produced the best predictive models for one data set and second best predictive models for the rest of the data sets, based on the external validations. To validate the ability of the proposed approach to find bioactive conformers, 12 small molecules with co-crystallized structures were seeded in one data set. 10 out of 12 co-crystallized structures were indeed identified as significant conformers using the proposed approach. CONCLUSIONS: The proposed approach was proven not to suffer from overfitting and to be highly competitive with classical predictive models, so it is very powerful for drug activity prediction. The approach was also validated as a useful method for pursuit of bioactive conformers.
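
The bag-of-conformers rule and the instance-based embedding described in this abstract can be sketched roughly as follows. Several details are assumptions: the abstract does not name its four dissimilarity measures, so Tanimoto dissimilarity is used as a stand-in, and the embedding takes the minimum dissimilarity between a bag and each prototype conformer, which is one common convention and may differ from the paper's. All function names and fingerprints are hypothetical.

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - Tanimoto similarity for two equal-length binary strings (assumed measure)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    return 0.0 if either == 0 else 1.0 - both / either


def embed_bag(bag, prototypes):
    """Map a molecule (bag of conformer fingerprints) to one value per prototype:
    the smallest dissimilarity between any of its conformers and that prototype."""
    return [min(tanimoto_dissimilarity(conf, proto) for conf in bag)
            for proto in prototypes]


def bag_is_active(conformer_is_responsible):
    """Multiple-instance rule: active iff at least one conformer is responsible."""
    return any(conformer_is_responsible)


# Toy 4-bit fingerprints; real pharmacophore fingerprints are far longer.
molecule = ["1100", "0011"]
prototypes = ["1100", "0110"]
embedding = embed_bag(molecule, prototypes)
```

Each molecule then becomes a fixed-length vector regardless of how many conformers it has, so a sparse linear classifier such as the 1-norm SVM mentioned in the abstract can be trained on it; prototypes that receive non-zero weights would point to candidate bioactive conformers.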


Subject(s)
Artificial Intelligence , Computational Biology/methods , Drug Discovery , Models, Theoretical , Molecular Conformation , Quantitative Structure-Activity Relationship
5.
IEEE Trans Nanobioscience ; 11(3): 228-36, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22987128

ABSTRACT

There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.


Subject(s)
Algorithms , Artificial Intelligence , Computational Biology/methods , Decision Trees , ATP Binding Cassette Transporter, Subfamily B, Member 1/genetics , Databases, Factual , Humans , Models, Theoretical , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Receptors, Cannabinoid/genetics , Reproducibility of Results