1.
J Comput Chem ; 45(15): 1193-1214, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38329198

ABSTRACT

This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousand organic molecules and described in terms of electronic properties, and (ii) further explores an inverse approach to molecular design that uses machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physical-chemical properties in inverse molecular designs. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers, and an inner, core region that concentrates clustered inlier objects. A significant relationship exists between the number of atoms in the molecule and its outlier/inlier nature. The spatial structure exhibits a relationship with molecular weight. Despite the structural differences between the two datasets, the predictability of variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms of the molecule from both the original properties and from lower dimensional embedding spaces. In the generative approach the input is given by a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space determined by the set of possible chemical compounds. The quantum mechanics dataset QM9, composed of 133,885 small organic molecules and 19 electronic properties, is used in the study.
Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an automated machine learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a subset of those properties only 52.6% of the original size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.

2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 303-306, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440398

ABSTRACT

Targeted therapy is a treatment that targets the cancer's specific genes, proteins, or the tissue environment that contributes to cancer growth and survival. Identification of therapeutic targets is a very challenging problem in bioinformatics. An integrative and iterative approach for the identification of drug-gene modules (i.e., groups of genes and drugs such that genes in the same module may regulate each other and are targets of some of the drugs in the same module) is developed. Application to clear cell carcinoma of the ovary data reveals several drug-gene modules and a target network that may play important roles in treating this disease.


Subjects
Computational Biology, Gene Regulatory Networks, Female, Humans, Ovary
3.
BMC Bioinformatics ; 18(1): 174, 2017 Mar 16.
Article in English | MEDLINE | ID: mdl-28302069

ABSTRACT

BACKGROUND: Phenotypic studies in Triticeae have shown that low temperature-induced protective mechanisms are developmentally regulated and involve dynamic acclimation processes. Understanding these mechanisms is important for breeding cold-resistant wheat cultivars. In this study, we combined three computational techniques for the analysis of gene expression data from spring and winter wheat cultivars subjected to low temperature treatments. Our main objective was to construct a comprehensive network of cold response transcriptional events in wheat, and to identify novel cold tolerance candidate genes. RESULTS: We assigned novel cold stress-related roles to 35 wheat genes, uncovered novel transcription factor (TF)-gene interactions, and identified 127 genes representing known and novel candidate targets associated with cold tolerance in wheat. Our results also show that delays in the activation or repression of the same genes across wheat cultivars play key roles in phenotypic differences between winter and spring wheat cultivars, and in adaptation to low temperature stress, cold shock and cold acclimation. CONCLUSIONS: Using three computational approaches, we identified novel putative cold-response genes and TF-gene interactions. These results provide new insights into the complex mechanisms regulating the expression of cold-responsive genes in wheat.


Subjects
Physiological Adaptation, Computational Biology/methods, Triticum/genetics, Cold Temperature, Plant Gene Expression Regulation, Gene Regulatory Networks, Linear Models, Plant Proteins/genetics, Plant Proteins/metabolism, Seasons, Physiological Stress, Triticum/metabolism
4.
BMC Bioinformatics ; 13: 54, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22475802

ABSTRACT

BACKGROUND: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. RESULTS: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. CONCLUSIONS: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.


Subjects
Algorithms, Data Mining, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Animals, Arabidopsis, Brassica napus/genetics, Brassica napus/growth & development, Cluster Analysis, Cotyledon/metabolism, Malaria/immunology, Mice
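
Illustrative note on record 4: the order preserving (OP) concept on the time dimension can be pictured as grouping genes whose expression values rise and fall in the same order across time points. The sketch below is only a minimal illustration of that idea for a single sample; it is not the OPTricluster algorithm, and the function names and toy gene profiles are hypothetical.

```python
from collections import defaultdict


def rank_pattern(values):
    """Return the time-point indices ordered from lowest to highest expression."""
    # Ties are broken by time-point index because sorted() is stable.
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))


def group_by_order(profiles):
    """Group gene names whose time series share the same rank pattern."""
    groups = defaultdict(list)
    for gene, values in profiles.items():
        groups[rank_pattern(values)].append(gene)
    return dict(groups)


# Hypothetical toy data: three genes measured at three time points.
profiles = {
    "geneA": [0.1, 0.5, 0.9],  # steadily increasing
    "geneB": [2.0, 3.0, 7.5],  # increasing on a different scale
    "geneC": [0.9, 0.4, 0.1],  # steadily decreasing
}
groups = group_by_order(profiles)
```

Here geneA and geneB fall into the same group although their magnitudes differ, because only the ordering of the time points is compared.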
5.
BMC Bioinformatics ; 11: 229, 2010 May 06.
Article in English | MEDLINE | ID: mdl-20459620

ABSTRACT

BACKGROUND: Modern high throughput experimental techniques such as DNA microarrays often result in large lists of genes. Computational biology tools such as clustering are then used to group together genes based on their similarity in expression profiles. Genes in each group are probably functionally related. The functional relevance among the genes in each group is usually characterized by utilizing available biological knowledge in public databases such as Gene Ontology (GO), KEGG pathways, association between a transcription factor (TF) and its target genes, and/or gene networks. RESULTS: We developed GOAL: Gene Ontology AnaLyzer, a software tool specifically designed for the functional evaluation of gene groups. GOAL implements and supports efficient and statistically rigorous functional interpretations of gene groups through its integration with available GO, TF-gene association data, and association with KEGG pathways. In order to facilitate more specific functional characterization of a gene group, we implement three GO-tree search strategies rather than one as in most existing GO analysis tools. Furthermore, GOAL offers flexibility in deployment. It can be used as a standalone tool, a plug-in to other computational biology tools, or a web server application. CONCLUSION: We developed a functional evaluation software tool, GOAL, to perform functional characterization of a gene group. GOAL offers three GO-tree search strategies and combines its strength in function integration, portability and visualization, and its flexibility in deployment. Furthermore, GOAL can be used to evaluate and compare gene groups as the output from computational biology tools such as clustering algorithms.


Subjects
Genes, Genomics/methods, Software, Genetic Databases, Gene Expression Profiling/methods, Gene Regulatory Networks, Oligonucleotide Array Sequence Analysis
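
Illustrative note on record 5: functional evaluation of a gene group against GO annotations is commonly framed as an over-representation test. The abstract does not say which statistic GOAL uses, so the sketch below assumes a one-sided hypergeometric test, a frequent choice for this task; the function name, parameters, and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, group_size, hits):
    """Probability of seeing at least `hits` annotated genes in the group.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    group_size: number of genes in the group being evaluated
    hits:       genes in the group carrying the GO term
    """
    upper = min(annotated, group_size)
    total = comb(population, group_size)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, group_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 10 background genes carry the term,
# and all 3 genes in the evaluated group carry it.
p = enrichment_p_value(population=10, annotated=3, group_size=3, hits=3)
```

A small p-value suggests the term is over-represented in the group; in practice a multiple-testing correction across GO terms would also be needed.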
6.
BMC Bioinformatics ; 10: 255, 2009 Aug 20.
Article in English | MEDLINE | ID: mdl-19695084

ABSTRACT

BACKGROUND: Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most of the time series data available today consist of only a few time points, thus making the application of standard clustering techniques difficult. RESULTS: We developed two new algorithms that are capable of extracting biological patterns from short time point series gene expression data. The two algorithms, ASTRO and MiMeSR, are inspired by the rank order preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they take advantage of the relatively small number of time points in order to reduce the problem from NP-hard to linear. Tested on well-defined short time expression data, we found that our approaches are robust to noise, as well as to random patterns, and that they can correctly detect the temporal expression profile of relevant functional categories. Evaluation of our methods was performed using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data. CONCLUSION: Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering of short time series gene expression data. Both algorithms are available at http://www.benoslab.pitt.edu/astro/.


Subjects
Algorithms, Computational Biology/methods, Gene Expression, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods, Genetic Databases, Gene Expression Profiling
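
Illustrative note on record 6: the mean squared residue mentioned in the abstract is usually defined, following the biclustering literature, as the average squared deviation of each entry from its row mean, column mean, and overall mean. The sketch below assumes that standard definition and only scores a given gene-by-time submatrix; it is not the MiMeSR algorithm, and the function name and toy matrices are hypothetical.

```python
def mean_squared_residue(matrix):
    """Score a genes-by-time-points submatrix; 0 means perfectly additive coherence."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy data: two genes over three time points.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]  # same trend, shifted by a constant
opposed = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # opposite trends

score_coherent = mean_squared_residue(coherent)
score_opposed = mean_squared_residue(opposed)
```

The coherent pair scores zero because one profile is a constant shift of the other, while the opposed pair scores higher; a minimum mean squared residue method searches for gene groups with low scores.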
7.
Mol Cancer Ther ; 7(1): 27-37, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18187805

ABSTRACT

One reason that ovarian cancer is such a deadly disease is that it is not usually diagnosed until it has reached an advanced stage. In this study, we developed a novel algorithm for group biomarker identification using gene expression data. Group biomarkers consist of coregulated genes across normal and different stage diseased tissues. Unlike prior sets of biomarkers identified by statistical methods, genes in group biomarkers are potentially involved in pathways related to different types of cancer development. They may serve as an alternative to the traditional single biomarkers or combination of biomarkers used for the diagnosis of early-stage and/or recurrent ovarian cancer. We extracted group biomarkers by applying biclustering algorithms that we recently developed to the gene expression data of over 400 normal, cancerous, and diseased tissues. We identified several groups of coregulated genes that encode for secreted proteins and exhibit expression levels in ovarian cancer that are at least 2-fold (in log2 scale) higher than in normal ovary and nonovarian tissues. In particular, three candidate group biomarkers exhibited a conserved biological pattern that may be used for early detection or recurrence of ovarian cancer with specificity greater than 99% and sensitivity equal to 100%. We validated these group biomarkers using publicly available gene expression data sets downloaded from an NIH Web site (http://www.ncbi.nlm.nih.gov/geo). Statistical analysis showed that our methodology identified an optimum combination of genes that have the highest effect on the diagnosis of the disease compared with several computational techniques that we tested. Our study also suggests that single or group biomarkers correlate with the stage of the disease.


Subjects
Tumor Biomarkers/genetics, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/genetics, Adolescent, Adult, Aged, Aged 80 and over, Algorithms, Child, Early Diagnosis, Female, Humans, Middle Aged, ROC Curve, Reproducibility of Results
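
Illustrative note on record 7: the sensitivity and specificity figures quoted in the abstract follow the usual definitions, the fraction of diseased samples called positive and the fraction of non-diseased samples called negative. The sketch below shows those two formulas on made-up labels; the function name and the toy data are hypothetical and do not reproduce the study's results.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for boolean truth and prediction lists.

    Assumes at least one positive and one negative sample in `truth`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # diseased, called positive
    fn = sum(1 for t, p in pairs if t and not p)      # diseased, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # healthy, called negative
    fp = sum(1 for t, p in pairs if not t and p)      # healthy, false alarm
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: two cancer samples and four non-cancer samples.
truth = [True, True, False, False, False, False]
predicted = [True, True, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
```

In this toy case both cancer samples are detected, so sensitivity is 1.0, while one false alarm among four non-cancer samples gives a specificity of 0.75.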