Search | VHL Regional Portal

Evolution of motif variants and positional bias of the cyclic-AMP response element.

Smith, Brandon; Fang, Hung; Pan, Youlian; Walker, P Roy; Famili, A Fazel; Sikorska, Marianna.

BMC Evol Biol ; 7 Suppl 1: S15, 2007 Feb 08.

Article in English | MEDLINE | ID: mdl-17288573

ABSTRACT

BACKGROUND: Transcription factors regulate gene expression by interacting with their specific DNA binding sites. Some transcription factors, particularly those involved in transcription initiation, always bind close to transcription start sites (TSS). Others have no such preference and are functional on sites even tens of thousands of base pairs (bp) away from the TSS. The Cyclic-AMP response element (CRE) binding protein (CREB) binds preferentially to a palindromic sequence (TGACGTCA), known as the canonical CRE, and also to other CRE variants. CREB can activate transcription at CREs thousands of bp away from the TSS, but in mammals CREs are found far more frequently within 1 to 150 bp upstream of the TSS than in any other region. This property is termed positional bias. The strength of CREB binding to DNA is dependent on the sequence of the CRE motif. The central CpG dinucleotide in the canonical CRE (TGACGTCA) is critical for strong binding of CREB dimers. Methylation of the cytosine in the CpG can inhibit binding of CREB. Deamination of the methylated cytosines causes a C to T transition, resulting in a functional, but lower affinity CRE variant, TGATGTCA. RESULTS: We performed genome-wide surveys of CREs in a number of species (from worm to human) and showed that only vertebrates exhibited a CRE positional bias. We performed pair-wise comparisons of human CREs with orthologous sequences in mouse, rat and dog genomes and found that canonical and TGATGTCA variant CREs are highly conserved in mammals. However, when orthologous sequences differ, canonical CREs in human are most frequently TGATGTCA in the other species and vice-versa. We have identified 207 human CREs showing such differences. CONCLUSION: Our data suggest that the positional bias of CREs likely evolved after the separation of urochordata and vertebrata. Although many canonical CREs are conserved among mammals, there are a number of orthologous genes that have canonical CREs in one species but the TGATGTCA variant in another. These differences are likely due to deamination of the methylated cytosines in the CpG and may contribute to differential transcriptional regulation among orthologous genes.
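
The motif scan described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input convention (a string of upstream sequence whose last base sits immediately 5' of the TSS) and the choice to measure distance from the motif's 5'-most base are all assumptions made for illustration. Only the two motifs and the 1 to 150 bp window come from the abstract.

```python
CANONICAL = "TGACGTCA"  # canonical CRE; palindromic, so it reads the same on both strands
VARIANT = "TGATGTCA"    # lower-affinity variant produced by a C-to-T transition in the CpG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cres(upstream, window=150):
    """Scan an upstream sequence for canonical and variant CREs.

    Assumption: `upstream` ends immediately 5' of the TSS, so its last
    base is 1 bp upstream. Returns (kind, distance, in_window) tuples,
    where distance is measured from the motif's 5'-most base.
    """
    upstream = upstream.upper()
    motifs = {
        CANONICAL: "canonical",
        VARIANT: "variant",
        # The variant is not palindromic, so also look for it on the other strand.
        reverse_complement(VARIANT): "variant",
    }
    size = len(CANONICAL)
    n = len(upstream)
    hits = []
    for start in range(n - size + 1):
        kind = motifs.get(upstream[start:start + size])
        if kind is not None:
            distance = n - start  # bp upstream of the TSS
            hits.append((kind, distance, distance <= window))
    return hits


# Hypothetical promoter fragment: one distal variant, one proximal canonical CRE.
example = "TGATGTCA" + "A" * 300 + "TGACGTCA" + "A" * 40
for hit in find_cres(example):
    print(hit)
```

Counting the hits that fall inside versus outside the window across many promoters would give a rough picture of positional bias, although a real survey would also need a background model for how often an 8-mer occurs by chance.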

Subjects

Cyclic AMP Response Element-Binding Protein/metabolism; Evolution, Molecular; Genetic Variation; Response Elements; Animals; Base Sequence; Chromosome Mapping; Consensus Sequence; CpG Islands; DNA Methylation; Genome; Humans; Mammals; Sequence Analysis, DNA

Discovery of functional genes for systemic acquired resistance in Arabidopsis thaliana through integrated data mining.

Pan, Youlian; Pylatuik, Jeffrey D; Ouyang, Junjun; Famili, A Fazel; Fobert, Pierre R.

J Bioinform Comput Biol ; 2(4): 639-55, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15617158

ABSTRACT

Various data mining techniques combined with sequence motif information in the promoter region of genes were applied to discover functional genes that are involved in the defense mechanism of systemic acquired resistance (SAR) in Arabidopsis thaliana. A series of K-Means clustering with difference-in-shape as distance measure was initially applied. A stability measure was used to validate this clustering process. A decision tree algorithm with the discover-and-mask technique was used to identify a group of most informative genes. Appearance and abundance of various transcription factor binding sites in the promoter region of the genes were studied. Through the combination of these techniques, we were able to identify 24 candidate genes involved in the SAR defense mechanism. The candidate genes fell into 2 highly resolved categories, each category showing significantly unique profiles of regulatory elements in their promoter regions. This study demonstrates the strength of such integration methods and suggests a broader application of this approach.
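
The abstract does not define the "difference-in-shape" distance, so the following Python sketch shows only one plausible reading, stated here as an assumption: each expression profile is mean-centered before a Euclidean distance is taken, so that two profiles with the same pattern but different baseline levels are treated as identical. The function names are hypothetical.

```python
import math


def center(profile):
    """Subtract the profile's mean so only its shape remains."""
    mean = sum(profile) / len(profile)
    return [value - mean for value in profile]


def shape_distance(a, b):
    """Euclidean distance between mean-centered profiles (assumed definition)."""
    centered_a, centered_b = center(a), center(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(centered_a, centered_b)))


# Same rising pattern at different baselines: distance is zero.
print(shape_distance([1, 2, 3], [11, 12, 13]))
# Opposite patterns: distance is large relative to the profile range.
print(shape_distance([1, 2, 3], [3, 2, 1]))
```

A distance of this kind could replace the plain Euclidean distance in the assignment step of K-Means, which is why genes with parallel expression curves would end up in the same cluster regardless of absolute expression level.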

Subjects

Algorithms; Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation, Plant/physiology; Oligonucleotide Array Sequence Analysis/methods; Salicylic Acid/toxicity; Arabidopsis/drug effects; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Database Management Systems; Databases, Protein; Drug Resistance/physiology; Gene Expression Regulation, Plant/drug effects; Information Storage and Retrieval/methods; Sequence Analysis, DNA/methods

Data mining of gene expression changes in Alzheimer brain.

Walker, P Roy; Smith, Brandon; Liu, Qing Yan; Famili, A Fazel; Valdés, Julio J; Liu, Ziying; Lach, Boleslaw.

Artif Intell Med ; 31(2): 137-54, 2004 Jun.

Article in English | MEDLINE | ID: mdl-15219291

ABSTRACT

Genome-wide transcription profiling is a powerful technique for studying the enormous complexity of cellular states. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. The data requires care in both pre-processing and the application of data mining techniques. This paper addresses the problem of dealing with microarray data that come from two known classes (Alzheimer and normal). We have applied three separate techniques to discover genes associated with Alzheimer disease (AD). The 67 genes identified in this study included a total of 17 genes that are already known to be associated with Alzheimer's or other neurological diseases. This is higher than any of the previously published Alzheimer's studies. Twenty known genes, not previously associated with the disease, have been identified as well as 30 uncharacterized expressed sequence tags (ESTs). Given the success in identifying genes already associated with AD, we can have some confidence in the involvement of the latter genes and ESTs. From these studies we can attempt to define therapeutic strategies that would prevent the loss of specific components of neuronal function in susceptible patients or be in a position to stimulate the replacement of lost cellular function in damaged neurons. Although our study is based on a relatively small number of patients (four AD and five normal), we think our approach sets the stage for a major step in using gene expression data for disease modeling (i.e. classification and diagnosis). It can also contribute to the future of gene function identification, pathology, toxicogenomics, and pharmacogenomics.

Subjects

Alzheimer Disease/genetics; Alzheimer Disease/physiopathology; Gene Expression Profiling; Genetic Predisposition to Disease; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Databases, Genetic; Expressed Sequence Tags; Humans; Information Storage and Retrieval; Neurons/pathology; Neurons/physiology

Evaluation and optimization of clustering in gene expression data analysis.

Famili, A Fazel; Liu, Ganming; Liu, Ziying.

Bioinformatics ; 20(10): 1535-45, 2004 Jul 10.

Article in English | MEDLINE | ID: mdl-14962920

ABSTRACT

MOTIVATION: A measurement of cluster quality is needed to choose potential clusters of genes that contain biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles have to be analyzed and proper clusters of genes need to be identified for further analysis, such as the search for meaningful patterns, identification of gene functions or gene response analysis. RESULTS: We propose a new cluster quality method, called stability, by which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate this method and demonstrate its performance using four independent, real gene expression and three simulated datasets. We demonstrate that our method outperforms other techniques listed in the literature. The method has applications in evaluating clustering validity as well as identifying stable clusters. AVAILABILITY: Please contact the first author.
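
The abstract gives no formula for the stability measure, so the Python sketch below is a generic illustration of the idea rather than the published method. It assumes that the same genes have been clustered several times (for example with different initializations) and scores each cluster of a reference partition by how well it is recovered, in terms of Jaccard overlap, in the other partitions. All names are hypothetical.

```python
def clusters_of(labels):
    """Group item indices by cluster label."""
    groups = {}
    for item, label in enumerate(labels):
        groups.setdefault(label, set()).add(item)
    return groups


def jaccard(a, b):
    """Jaccard overlap between two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_stability(reference, partitions):
    """Score each reference cluster by its mean best-match overlap.

    Assumption: `partitions` is a non-empty list of label lists over the
    same items as `reference`. A score of 1.0 means the cluster reappears
    unchanged in every partition.
    """
    scores = {}
    for label, members in clusters_of(reference).items():
        best_matches = [
            max(jaccard(members, other) for other in clusters_of(labels).values())
            for labels in partitions
        ]
        scores[label] = sum(best_matches) / len(best_matches)
    return scores


reference = [0, 0, 0, 1, 1, 1]
reruns = [
    [1, 1, 1, 0, 0, 0],  # same grouping, labels swapped
    [0, 0, 1, 1, 1, 1],  # one item moved to the other cluster
]
print(cluster_stability(reference, reruns))
```

Because the score depends only on set membership, it is unaffected by arbitrary relabeling of clusters between runs. Clusters with low scores would be candidates to discard before searching for biologically meaningful patterns.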

Subjects

Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Genetic Variation; Genome; Genomic Instability/genetics; Hepacivirus/genetics; Leukemia/genetics; Models, Genetic; Models, Statistical; Pattern Recognition, Automated/methods; Yeasts/genetics
