Results 1 - 13 of 13
1.
Sci Rep ; 13(1): 8476, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37231056

ABSTRACT

We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially the ones with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains one single model and utilizes a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform gene-level analysis of the Minnesota Center for Twin and Family Research (MCTFR) dataset using our method and detect several SNPs that have been implicated in alcohol consumption.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Phenotype , Alcohol Drinking , Minnesota , Models, Genetic
3.
Pharmaceuticals (Basel) ; 12(4), 2019 Oct 16.
Article in English | MEDLINE | ID: mdl-31623241

ABSTRACT

Human life has been at the edge of catastrophe for millennia due to diseases which emerge and reemerge at random. The recent outbreak of the Zika virus (ZIKV) is one such menace that shook the global public health community abruptly. Modern technologies, including computational tools as well as experimental approaches, need to be harnessed fast and effectively in a coordinated manner in order to properly address such challenges. In this paper, based on our earlier research, we have proposed a four-pronged approach to tackle emerging pathogens like ZIKV: (a) epidemiological modelling of spread mechanisms of ZIKV; (b) assessment of the public health risk of newly emerging strains of the pathogens by comparing them with existing strains/pathogens using fast computational sequence comparison methods; (c) implementation of vaccine design methods in order to produce a set of probable peptide vaccine candidates for quick synthesis/production and testing in the laboratory; and (d) designing of novel therapeutic molecules and their laboratory testing as well as validation of new drugs or repurposing of drugs for use against ZIKV. For each of these stages, we provide an extensive review of the technical challenges and current state-of-the-art. Further, we outline the future areas of research and discuss how they can work together to proactively combat ZIKV or future emerging pathogens.

4.
Mol Inform ; 38(8-9): e1800164, 2019 08.
Article in English | MEDLINE | ID: mdl-31322827

ABSTRACT

In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set of 579 descriptors was calculated by Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method were used for QSAR formulation. Results show that both sets of descriptors individually and their combination give models of reasonable prediction accuracy. We also uncover the effectiveness of a variable selection approach, by showing that for one of our descriptor sets, the top 5% of predictors in terms of random forest variable importance are able to provide a better performing model than the model with all predictors. The top influential descriptors indicate important aspects of molecular structural features that govern BBB entry of chemicals.


Subject(s)
Blood-Brain Barrier/metabolism , Machine Learning , Organic Chemicals/chemistry , Organic Chemicals/pharmacokinetics , Algorithms , Models, Molecular , Quantitative Structure-Activity Relationship , Software
5.
Epidemics ; 27: 59-65, 2019 06.
Article in English | MEDLINE | ID: mdl-30902616

ABSTRACT

The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.


Subject(s)
Primates/virology , Zika Virus Infection/epidemiology , Zika Virus/pathogenicity , Zoonoses/epidemiology , Zoonoses/virology , Animals , Bayes Theorem , Humans , Risk , Zika Virus Infection/pathology , Zoonoses/pathology
6.
Curr Comput Aided Drug Des ; 14(4): 284-291, 2018.
Article in English | MEDLINE | ID: mdl-29701159

ABSTRACT

BACKGROUND: Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect. OBJECTIVE: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but small n (i.e. n << p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive and comparative study of this method with several other validation techniques. METHODOLOGY: We compared four validation methods: leave-one-out (LOO), K-fold, external and multi-split validation, using statistical models built with LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation. RESULTS: External validation metrics have high variation among different random splits of the data, hence are not recommended for predictive QSAR models. LOO has the overall best performance among all validation methods applied in our scenario. CONCLUSION: Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.


Subject(s)
Quantitative Structure-Activity Relationship , Amines/chemistry , Amines/pharmacology , Computer Simulation , Models, Biological , Models, Statistical , Mutagens/chemistry , Mutagens/pharmacology , Regression Analysis , Salmonella typhimurium/drug effects , Salmonella typhimurium/genetics , Software
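
The contrast drawn in the abstract above, between a single leave-one-out (LOO) estimate and external validation estimates that change with each random split, can be illustrated with a small sketch. This is not the authors' code: it assumes a toy dataset of 30 simulated samples with one predictor and swaps in ordinary least squares for the LASSO regression used in the paper; the function names, the 20% test fraction, and the 200 repetitions are all illustrative choices.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for LASSO)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def squared_errors(model, xs, ys):
    """Squared prediction errors of a fitted (intercept, slope) model."""
    intercept, slope = model
    return [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]


def loo_mse(xs, ys):
    """Leave-one-out: each sample is predicted by a model fit on the rest."""
    errors = []
    for i in range(len(xs)):
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += squared_errors(model, [xs[i]], [ys[i]])
    return statistics.fmean(errors)


def external_mse(xs, ys, test_fraction, rng):
    """External validation: fit on one random subset, score on the rest."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(1, int(len(xs) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return statistics.fmean(
        squared_errors(model, [xs[i] for i in test], [ys[i] for i in test])
    )


# Simulated small-sample data (hypothetical, not from the paper).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(30)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

loo = loo_mse(xs, ys)
splits = [external_mse(xs, ys, 0.2, rng) for _ in range(200)]

print(f"LOO MSE: {loo:.3f}")
print(f"external MSE over 200 splits: min={min(splits):.3f}, "
      f"max={max(splits):.3f}, sd={statistics.stdev(splits):.3f}")
```

LOO has no random component, so it returns the same value on every run, whereas the spread of the external estimates shows how much a single split can mislead on a dataset this small.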
8.
Indian Pediatr ; 54(1): 65, 2017 Jan 15.
Article in English | MEDLINE | ID: mdl-28141574
9.
Curr Comput Aided Drug Des ; 12(4): 294-301, 2016.
Article in English | MEDLINE | ID: mdl-27600878

ABSTRACT

BACKGROUND: Computed mathematical descriptors of molecules are used for the prediction of their property/bioactivity. In the 1970s only a few descriptors could be calculated; currently available software can calculate a large number of descriptors for molecules or biomolecules like DNA/RNA and proteins. OBJECTIVE: When p molecular descriptors are calculated for n molecules, the data set can be viewed as n vectors in p dimensions, each chemical being represented as a point in R^p. Because many of the descriptors are strongly correlated, the n points in R^p will lie on a subspace of dimension lower than p. Methods like principal components analysis (PCA) can be used to characterize the intrinsic dimensionality of chemical spaces. Taking motivation from the work of Basak et al. in the 1980s in using PCA of descriptors calculated for various congeneric and structurally diverse sets of chemicals relevant to new drug discovery and predictive toxicology, this paper explores the intrinsic dimensionality of chemical spaces for robust QSAR model development. METHODOLOGY: Intrinsic dimensionality of chemical spaces was studied using three new statistical approaches and two data sets, viz. a congeneric set of 95 aromatic and heteroaromatic amine mutagens and a structurally diverse set of 508 chemical mutagens. RESULTS: The new outlier-robust methods applied here yield favorable prediction results compared to previous studies on the same datasets. CONCLUSION: We conclude that while analyzing data on a large number of chemical descriptors, it is advisable to build QSAR models that are outlier-robust, and take into consideration the underlying correlations among predictors.


Subject(s)
Amines/chemistry , DNA/chemistry , Heterocyclic Compounds/chemistry , Models, Molecular , Models, Statistical , Mutagens/chemistry , Quantitative Structure-Activity Relationship , Amines/toxicity , Heterocyclic Compounds/toxicity , Humans , Mutagens/pharmacology , Nucleic Acid Conformation , Principal Component Analysis
12.
Curr Comput Aided Drug Des ; 11(2): 117-23, 2015.
Article in English | MEDLINE | ID: mdl-26202887

ABSTRACT

Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in gathering useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two data sets which have more predictors than samples (i.e. n << p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines while the other has a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.


Subject(s)
Mutagens/chemistry , Mutagens/toxicity , Amines/chemistry , Amines/toxicity , Animals , Cluster Analysis , Datasets as Topic , Discriminant Analysis , Humans , Models, Biological , Quantitative Structure-Activity Relationship
13.
Curr Comput Aided Drug Des ; 9(4): 463-71, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24138420

ABSTRACT

Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, selecting important genes simultaneously in the process. This has been found to be a better approach than conventional clustering methods like K-means or self-organizing maps for scenarios in which the number of samples is much smaller than the number of variables (n << p). In this paper we used the ITC approach for classification of a diverse set of 508 chemicals regarding mutagenicity. A large number of topological indices (TIs), 3-dimensional, and quantum chemical descriptors, as well as atom pairs (APs), have been used as explanatory variables. In this paper, ITC has been used only for predictor selection, after which ridge regression is employed to build the final predictive model. The proper leave-one-out (LOO) method of cross-validation in this scenario is to take as holdout each of the 508 compounds before predictor thinning and compare the predicted values with the experimental data. ITC-based results obtained here are comparable to those developed earlier.


Subject(s)
Gene Expression Profiling/methods , Models, Chemical , Models, Molecular , Cluster Analysis , Gene Expression , Humans , Molecular Structure , Mutagens/chemistry , Mutagens/toxicity , Oligonucleotide Array Sequence Analysis/methods , Quantitative Structure-Activity Relationship
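
The point made in the abstract above about "proper" LOO, namely that each compound is held out before predictor thinning, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: ITC is replaced by a much simpler selector (pick the single predictor with the largest absolute correlation with the response), ridge regression is reduced to its one-predictor form with a hypothetical penalty `lam`, and the data are pure simulated noise with n = 20 and p = 200.

```python
import random


def center_stats(xs, ys):
    """Means and centered sums of squares/cross-products for two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def best_predictor(X, y, rows):
    """Column with the largest |correlation| with y, using only `rows`.

    Stand-in for ITC-based predictor thinning (an assumption of this sketch).
    """
    ys = [y[i] for i in rows]
    best_j, best_r = 0, -1.0
    for j in range(len(X[0])):
        xs = [X[i][j] for i in rows]
        _, _, sxx, syy, sxy = center_stats(xs, ys)
        r = abs(sxy) / (sxx * syy) ** 0.5
        if r > best_r:
            best_j, best_r = j, r
    return best_j


def ridge_predict(X, y, rows, j, x_new, lam=1.0):
    """One-predictor ridge fit on `rows`, evaluated at x_new."""
    xs = [X[i][j] for i in rows]
    ys = [y[i] for i in rows]
    mx, my, sxx, _, sxy = center_stats(xs, ys)
    slope = sxy / (sxx + lam)
    return my + slope * (x_new - mx)


def loo_mse(X, y, select_inside):
    """LOO error; predictor selection is redone per fold if select_inside."""
    rows = list(range(len(y)))
    j_global = best_predictor(X, y, rows)  # uses every sample, holdout included
    total = 0.0
    for i in rows:
        train = [r for r in rows if r != i]
        j = best_predictor(X, y, train) if select_inside else j_global
        total += (y[i] - ridge_predict(X, y, train, j, X[i][j])) ** 2
    return total / len(y)


# Pure-noise data: no predictor is truly related to the response.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(20)]
y = [rng.gauss(0, 1) for _ in range(20)]

proper = loo_mse(X, y, select_inside=True)
naive = loo_mse(X, y, select_inside=False)
print(f"LOO MSE, selection inside each fold: {proper:.3f}")
print(f"LOO MSE, selection on all data first: {naive:.3f}")
```

When selection is done once on all samples, the held-out compound has already influenced which predictor is kept, so the resulting error estimate tends to be optimistic; repeating the selection inside each fold avoids that leak.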