Search | BVS Regional Portal

Finding Needles in a Haystack: Determining Key Molecular Descriptors Associated with the Blood-brain Barrier Entry of Chemical Compounds Using Machine Learning.

Majumdar, Subhabrata; Basak, Subhash C; Lungu, Claudiu N; Diudea, Mircea V; Grunwald, Gregory D.

Mol Inform ; 38(8-9): e1800164, 2019 08.

Article in English | MEDLINE | ID: mdl-31322827

ABSTRACT

In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set of 579 descriptors was calculated with Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method were used for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also demonstrate the effectiveness of a variable selection approach by showing that, for one of our descriptor sets, the top 5% of predictors in terms of random forest variable importance provide a better-performing model than the model with all predictors. The top influential descriptors indicate important aspects of molecular structural features that govern BBB entry of chemicals.
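
As a rough illustration of the two ideas in this abstract, the following Python sketch shows (a) keeping the top 5% of descriptors given a table of importance scores and (b) generating index splits for a two-level validation scheme. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not define "two-deep" validation, so reading it as an outer test fold wrapped around inner train/validation folds is an assumption, and the names `top_fraction`, `k_folds`, `two_deep_splits`, and the fold counts are hypothetical. The importance scores are assumed to come from a random forest fitted elsewhere.

```python
import math
import random


def top_fraction(importances, fraction=0.05):
    """Return the descriptor names in the top `fraction` by importance score.

    `importances` maps descriptor name -> score (assumed to be computed
    elsewhere, e.g. by a random forest).
    """
    k = max(1, math.ceil(fraction * len(importances)))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def k_folds(indices, k):
    """Split a list of indices into k nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def two_deep_splits(n, outer_k=5, inner_k=4, seed=0):
    """Yield (inner_train, inner_val, outer_test) index lists.

    The outer test fold is set aside first, so it is never seen while
    descriptors are ranked or models are tuned on the inner folds.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for test in k_folds(idx, outer_k):
        test_set = set(test)
        rest = [i for i in idx if i not in test_set]
        for val in k_folds(rest, inner_k):
            val_set = set(val)
            train = [i for i in rest if i not in val_set]
            yield train, val, test
```

With these assumed defaults, a 579-descriptor set would be cut to 29 descriptors, and 415 compounds would produce 20 train/validation/test triples per repetition; changing `seed` gives a new repetition.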

Subject(s)

Blood-Brain Barrier/metabolism; Machine Learning; Organic Chemicals/chemistry; Organic Chemicals/pharmacokinetics; Algorithms; Models, Molecular; Quantitative Structure-Activity Relationship; Software

Intercorrelation of Major DNA/RNA Sequence Descriptors - A Preliminary Study.

Sen, Dwaipayan; Dasgupta, Subhadeep; Pal, Indrajit; Manna, Smarajit; Basak, Subhash C; Nandy, Ashesh; Grunwald, Gregory D.

Curr Comput Aided Drug Des ; 12(3): 216-228, 2016.

Article in English | MEDLINE | ID: mdl-27222032

ABSTRACT

A large number of alignment-free techniques of graphical representation and numerical characterization (GRANCH) of bio-molecular sequences have been proposed in recent years, but the relative efficacy of these methods in determining the degree of similarity and dissimilarity of such sequences has not been ascertained. OBJECTIVE: Our objective is to assess the relative efficacy of these methods in determining the degree of similarity and dissimilarity of bio-molecular sequences. METHOD: We chose 7 published/communicated methods that represent various classes of GRANCH techniques and computed the descriptors that are expected to characterize similarities and dissimilarities in several sets of gene sequences. We critically appraise the different methods and determine which of them yield non-redundant structural information that could be used to compute different properties of the sequences, and which are correlated closely enough with one another that using the simplest representative of the group would suffice. We also perform a principal component analysis (PCA) to determine how the variance in the calculated sequence descriptors is explained by the computed principal components (PCs). RESULTS: We found that some of the descriptors are strongly correlated, implying a commonality of structural information encoded by them, while others are distinctly separate. The PCA results show that the first three PCs explain >97% of the variance. CONCLUSION: We found that some mathematical DNA descriptors calculated by a few of these techniques correlate strongly with one another, implying a redundancy in the structural information quantified by those descriptors; others are not strongly correlated with one another, suggesting that they encode non-redundant sequence information. From this and our PCA results, our recommendation is to use a minimally correlated set of descriptors, or orthogonal descriptors such as PCs derived from the descriptor set, for the characterization of nucleic acid structure and function.
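
The recommendation to use a minimally correlated set of descriptors can be illustrated with a small Python sketch. This is only one possible reading, not the authors' method: it computes Pearson correlations and greedily keeps a descriptor when its absolute correlation with every descriptor already kept is below a cutoff. The function names and the cutoff of 0.9 are assumptions made for illustration, and the sketch does not cover the PCA step.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant descriptor carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def minimally_correlated(descriptors, threshold=0.9):
    """Greedy filter over a dict of descriptor name -> values per sequence.

    A descriptor is kept only if |r| with every already-kept descriptor is
    below `threshold` (an assumed cutoff, not taken from the abstract).
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

The result depends on the order of the input, so placing the simplest descriptor of each family first makes it the representative that survives, in line with the suggestion to use the simplest member of a correlated group.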

Subject(s)

DNA/genetics; RNA/genetics; Animals; Base Sequence; DNA/chemistry; Data Display; Exons; Humans; Principal Component Analysis; RNA/chemistry; Statistics as Topic; beta-Globins/genetics

Adapting interrelated two-way clustering method for quantitative structure-activity relationship (QSAR) modeling of mutagenicity/non-mutagenicity of a diverse set of chemicals.

Majumdar, Subhabrata; Basak, Subhash C; Grunwald, Gregory D.

Curr Comput Aided Drug Des ; 9(4): 463-71, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24138420

ABSTRACT

Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, selecting important genes simultaneously in the process. It has been found to be a better approach than conventional clustering methods such as K-means or self-organizing maps in scenarios where the number of samples is much smaller than the number of variables (n ≪ p). In this paper we used the ITC approach for classification of a diverse set of 508 chemicals regarding mutagenicity. A large number of topological indices (TIs), 3-dimensional and quantum chemical descriptors, as well as atom pairs (APs), have been used as explanatory variables. Here ITC has been used only for predictor selection, after which ridge regression is employed to build the final predictive model. The proper leave-one-out (LOO) method of cross-validation in this scenario is to hold out each of the 508 compounds in turn before predictor thinning and to compare the predicted values with the experimental data. The ITC-based results obtained here are comparable to those developed earlier.
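
The point about holding out each compound before predictor thinning can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: a simple correlation ranking stands in for ITC (which is not implemented here), the ridge fit uses a plain Gaussian-elimination solver, and all function names and the defaults `k` and `lam` are hypothetical. What matters is the loop structure: predictor selection runs again inside every fold and sees only the training rows.

```python
def column(X, j):
    return [row[j] for row in X]


def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5


def select_predictors(X, y, k):
    """Stand-in for ITC thinning: keep the k columns most correlated with y."""
    scores = [abs_corr(column(X, j), y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    means = [sum(column(X, j)) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * (yi - ybar) for r, yi in zip(Xc, y)) for i in range(p)]
    w = solve(A, b)
    return ybar - sum(wj * mj for wj, mj in zip(w, means)), w


def proper_loo(X, y, k=2, lam=1.0):
    """Leave-one-out predictions with selection redone inside every fold."""
    preds = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = select_predictors(X_tr, y_tr, k)  # held-out row is not used
        b0, w = ridge_fit([[r[j] for j in cols] for r in X_tr], y_tr, lam)
        preds.append(b0 + sum(wj * X[i][j] for wj, j in zip(w, cols)))
    return preds
```

Running the selection once on all compounds and then cross-validating only the ridge step would let information from the held-out compound influence which predictors are kept, which tends to make the LOO estimate look better than it should.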

Subject(s)

Gene Expression Profiling/methods; Models, Chemical; Models, Molecular; Cluster Analysis; Gene Expression; Humans; Molecular Structure; Mutagens/chemistry; Mutagens/toxicity; Oligonucleotide Array Sequence Analysis/methods; Quantitative Structure-Activity Relationship

Use of mathematical structural invariants in analyzing combinatorial libraries: a case study with psoralen derivatives.

Basak, Subhash C; Mills, Denise; Gute, Brian D; Balaban, Alexandru T; Basak, Kanika; Grunwald, Gregory D.

Curr Comput Aided Drug Des ; 6(4): 240-51, 2010 Dec.

Article in English | MEDLINE | ID: mdl-20883202

ABSTRACT

In this paper, calculated topological indices have been used to cluster a large virtual library of 125 psoralen derivatives into 25 clusters in an effort to select a subset of mutually dissimilar structures from a large collection of molecules. Inspection of the 25 structures, each the one closest to the centroid of its cluster, shows that these molecules are structurally more diverse than a subset of 25 selected randomly. It is expected that such methods based on easily calculated descriptors may find applications in new drug discovery through the analysis of libraries of interesting lead compounds.
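
The selection step described above, one representative per cluster chosen as the structure nearest the cluster centroid, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the abstract does not name the clustering algorithm, so plain k-means is used here as a stand-in, and the function names, iteration cap, and seed are hypothetical. Each molecule is assumed to be a list of numeric descriptor values of equal length.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (assumed stand-in); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous centroid.
            new.append(mean_point(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def representatives(points, labels, centroids):
    """For each non-empty cluster, index of the point nearest its centroid."""
    reps = []
    for c, centre in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(points[i], centre)))
    return reps
```

Calling `kmeans(points, 25)` followed by `representatives(...)` would yield at most 25 indices, one per non-empty cluster. Descriptors on very different scales would normally be standardized first so that no single index dominates the distance.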

Subject(s)

Combinatorial Chemistry Techniques/methods; Drug Design; Furocoumarins/chemistry; Computational Biology; Computer-Aided Design; Drug Discovery/methods; Models, Chemical; Principal Component Analysis; Small Molecule Libraries
