Results 1 - 4 of 4
1.
In Silico Biol ; 9(1-2): S17-39, 2009.
Article in English | MEDLINE | ID: mdl-19537163

ABSTRACT

There is a critical need for new and efficient computational methods for discovering putative transcription factor binding sites (TFBSs) in promoter sequences. Existing methods fall into two families: statistical or stochastic approaches, and combinatorial approaches. Here we focus on a complete approach that combines exhaustive combinatorial motif extraction with a statistical Twilight Zone Indicator (TZI), applied to two datasets, a positive set and a negative one, which represent the result of a classical differential expression experiment. Our approach relies on prior biological information in the form of two sets of promoters of differentially expressed genes. We describe the complete procedure used for extracting either exact or degenerate motifs, ranking these motifs, and finding their known related TFBSs. We exemplify this approach using two different sets of promoters. The first set consists of promoters of genes that are either repressed or not repressed by the transforming form of the v-erbA oncogene. The second set consists of genes whose expression varies between self-renewing and differentiating progenitors. The biological meaning of the TFBSs found is discussed and, for one TF, its biological involvement is demonstrated. This study therefore illustrates the power of using relevant biological information, in the form of a set of differentially expressed genes, which is a classical outcome of most transcriptomics studies. This makes it possible to greatly reduce the search space and to design an adapted statistical indicator. Taken together, this allows the biologist to concentrate on a small number of putatively interesting TFs.
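
The idea of exhaustively extracting exact motifs and comparing a positive promoter set against a negative one can be sketched as below. This is only an illustration under stated assumptions: the scoring used here (difference between the fractions of positive and negative promoters containing a k-mer) is a simple stand-in and is not the Twilight Zone Indicator, whose definition the abstract does not give. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_exact_motifs(positive, negative, k):
    """Rank every exact k-mer by how much more often it occurs in the
    positive promoters than in the negative ones.

    Score = fraction of positive sequences containing the k-mer
            minus fraction of negative sequences containing it.
    This is an assumed stand-in score, not the TZI from the abstract.
    """
    pos_counts = Counter(m for s in positive for m in kmers_present(s, k))
    neg_counts = Counter(m for s in negative for m in kmers_present(s, k))
    scores = {}
    for motif in set(pos_counts) | set(neg_counts):
        pos_frac = pos_counts[motif] / len(positive)
        neg_frac = neg_counts[motif] / len(negative)
        scores[motif] = pos_frac - neg_frac
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy sequences (invented for illustration, not real promoters).
positive = ["ACGTGACC", "TTACGTGA", "GGACGTGT"]
negative = ["TTTTAAAA", "CCCCGGGG", "ACGTTTTT"]
ranking = rank_exact_motifs(positive, negative, 5)
print(ranking[:3])
```

Because the enumeration is exhaustive, every k-mer seen in either set receives a score; restricting the input to promoters of differentially expressed genes is what keeps this search space small.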


Subjects
Algorithms , Gene Expression Regulation/physiology , Promoter Regions, Genetic/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Computational Biology
2.
BMC Bioinformatics ; 9: 378, 2008 Sep 18.
Article in English | MEDLINE | ID: mdl-18801154

ABSTRACT

BACKGROUND: There is an increasing need in transcriptome research for gene expression data and pattern warehouses. It is important to integrate into these warehouses both raw transcriptomic data and some properties encoded in these data, such as local patterns. DESCRIPTION: We have developed an application called SQUAT (SAGE Querying and Analysis Tools), which is available at: http://bsmc.insa-lyon.fr/squat/. This database gives access to both raw SAGE data and patterns mined from these data, for three species (human, mouse and chicken). It supports simple queries like "In which biological situations is my favorite gene expressed?" as well as much more complex queries like: <>. Connections with external web databases enrich biological interpretations and enable sophisticated queries. To illustrate the power of SQUAT, we show and analyze the results of three different queries, one of which led to a biological hypothesis that was experimentally validated. CONCLUSION: SQUAT is a user-friendly information retrieval platform that aims to bring some state-of-the-art mining tools to biologists.


Subjects
Database Management Systems , Databases, Genetic , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Internet , Software , Transcription Factors/genetics , Algorithms , Animals , Birds , Humans , Mice , User-Computer Interface
3.
In Silico Biol ; 7(4-5): 467-83, 2007.
Article in English | MEDLINE | ID: mdl-18391238

ABSTRACT

The production of high-throughput gene expression data has generated a crucial need for bioinformatics tools that generate biologically interesting hypotheses. Whereas many tools are available for extracting global patterns, less attention has been paid to local pattern discovery. We propose here an original way to discover knowledge from gene expression data by means of the so-called formal concepts that hold in derived Boolean gene expression datasets. We first encoded the over-expression properties of genes in human cells using human SAGE data. This gave rise to a Boolean matrix from which we extracted the complete collection of formal concepts, i.e., all the largest sets of over-expressed genes associated with a largest set of biological situations in which their over-expression is observed. Complete collections of such patterns tend to be huge. Since their interpretation is a time-consuming task, we propose a new method to rapidly visualize clusters of formal concepts. This designates a reasonable number of Quasi-Synexpression-Groups (QSGs) for further analysis. The value of our approach is illustrated using human SAGE data and by interpreting one of the extracted QSGs. The assessment of its biological relevance leads to the formulation of both previously proposed and new biological hypotheses.
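
To make the notion of a formal concept concrete, the following minimal sketch enumerates all formal concepts of a tiny Boolean over-expression table by brute force. It is not the extraction algorithm used in the paper, which the abstract does not name; the function name, the gene labels and the situation labels are hypothetical, and the enumeration is exponential in the number of situations, so it only suits toy inputs.

```python
from itertools import combinations


def formal_concepts(over_expressed):
    """Enumerate formal concepts of a Boolean context by brute force.

    over_expressed maps each biological situation to the set of genes
    over-expressed in it. A formal concept is a pair (genes, situations)
    where `genes` is exactly the set of genes shared by all `situations`
    and `situations` is exactly the set of situations containing all `genes`.
    """
    situations = list(over_expressed)
    all_genes = set().union(*over_expressed.values())
    concepts = set()
    for r in range(len(situations) + 1):
        for subset in combinations(situations, r):
            # Genes common to every situation in the subset
            # (all genes when the subset is empty).
            genes = set(all_genes)
            for s in subset:
                genes &= over_expressed[s]
            # Close the situation set: every situation that has all these genes.
            closed = frozenset(s for s in situations if genes <= over_expressed[s])
            concepts.add((frozenset(genes), closed))
    return concepts


# Toy Boolean context (invented labels, not SAGE data).
data = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g3"},
}
concepts = formal_concepts(data)
for genes, sits in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(genes), sorted(sits))
```

Even this three-situation table yields four concepts, which hints at why complete collections on real data become huge and need a clustering or visualization step before interpretation.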


Subjects
Computational Biology/instrumentation , Gene Expression , Pattern Recognition, Automated/methods , Cluster Analysis , Genome, Human , Humans
4.
Genome Biol ; 3(12): RESEARCH0067, 2002.
Article in English | MEDLINE | ID: mdl-12537556

ABSTRACT

BACKGROUND: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data. RESULTS: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function. CONCLUSIONS: We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques.
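
As a rough illustration of what an association rule over discretized expression data looks like, the sketch below computes support and confidence for single-gene rules of the form "A over-expressed implies B over-expressed". It assumes the data have already been normalized and discretized into one set of over-expressed genes per library; it is not the miner used in the study, and the function name, thresholds and gene labels are hypothetical.

```python
from itertools import permutations


def pairwise_rules(transactions, min_support, min_confidence):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    transactions: list of sets, one per library, each holding the genes
    considered over-expressed after discretization.
    support    = fraction of libraries containing both a and b
    confidence = fraction of libraries containing a that also contain b
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        if n_a == 0:
            continue
        support = n_ab / n
        confidence = n_ab / n_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy discretized libraries (invented labels, not results from the paper).
libraries = [
    {"ribo1", "ribo2"},
    {"ribo1", "ribo2", "sig1"},
    {"ribo1", "ribo2"},
    {"sig1"},
]
rules = pairwise_rules(libraries, min_support=0.5, min_confidence=0.9)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

Changing the discretization step changes which genes appear in each set, which is one way to see why different discretization algorithms highlight different properties of the data.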


Subjects
Gene Expression Profiling/methods , Algorithms , Cluster Analysis , Computational Biology/methods , Computational Biology/statistics & numerical data , Databases, Genetic/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation/genetics , Humans , Software