ABSTRACT
BACKGROUND: While studies of non-model organisms are critical for many research areas, such as evolution, development, and environmental biology, they present particular challenges for both experimental and computational genome-level research. Resources such as mass-produced microarrays and the computational tools linking these data to functional annotation at the system and pathway level are rarely available for non-model species. This type of "systems-level" analysis is critical to understanding the patterns of gene expression that underlie biological processes. RESULTS: We describe a bioinformatics pipeline known as FunnyBase that has been used to store, annotate, and analyze 40,363 expressed sequence tags (ESTs) from the heart and liver of the fish Fundulus heteroclitus. Primary annotations based on sequence similarity are linked to networks of systematic annotation in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and can be queried and used computationally in downstream analyses. Steps are taken to ensure that the annotation is self-consistent and that the structure of GO is used to identify higher-level functions that may not be annotated directly. An integrated framework for cDNA library production, sequencing, quality control, expression data generation, and systems-level analysis is presented and applied. In a case study, a set of genes whose expression levels showed a statistically significant regression on environmental temperature along the Atlantic Coast is significantly enriched (P < 0.001) for genes associated with amine metabolism. CONCLUSION: The methods described have application for functional genomics studies, particularly among non-model organisms. The web interface for FunnyBase can be accessed at http://genomics.rsmas.miami.edu/funnybase/super_craw4/. Data and source code are available by request at jpaschall@bioinfobase.umkc.edu.
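The two annotation steps named in the abstract, using the structure of GO to recover higher-level functions and testing a gene set for enrichment, can be illustrated with a minimal Python sketch. It is not the FunnyBase implementation: the toy term hierarchy, the function names, and the choice of an upper-tail hypergeometric test are all assumptions made for illustration, since the abstract does not say which enrichment statistic was used.

```python
from math import comb

# Hypothetical toy hierarchy (not real GO identifiers): term -> parent terms.
PARENTS = {
    "amine_metabolism": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
    "metabolism": set(),
}

def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def propagate(annotations, parents=PARENTS):
    """Expand each gene's direct terms to include all higher-level terms."""
    return {
        gene: set().union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background, K: background genes carrying the term,
    n: genes in the selected set, k: selected genes carrying the term.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

Propagating annotations before testing means that a gene annotated only with a specific term also counts toward its parent terms, which is one way a higher-level function could surface even when no gene is annotated with it directly.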
Subjects
Genetic Databases, Expressed Sequence Tags, Fundulidae/genetics, Gene Expression Regulation, Animals, Computational Biology/methods, Complementary DNA/metabolism, Fishes, Gene Expression, Gene Expression Profiling, Gene Library, Genome, Genomics, Internet, Multigene Family, Oligonucleotide Array Sequence Analysis, Software, Temperature
ABSTRACT
The Sp family of transcription factors binds GC-rich DNA sequences. The ubiquitously expressed Sp1 and Sp3 have been well characterized in mammals. Presented here is the characterization of the only Sp protein expressed in the liver or heart tissue of the teleost fish Fundulus heteroclitus. This protein, fSp3, is most similar to and homologous with mammalian Sp3 proteins. The evolution of the Sp transcription factor family is described, with Sp1 and Sp3 representing the most recent duplication within the family. Sp4 appears to be the most ancestral member. Sp1, Sp3, and Sp4 form a monophyletic group that excludes Sp2. Sp2 is the least similar member of the Sp family and is more similar to the non-Sp transcription factors. These results suggest that Sp2 should not be considered a member of the Sp family. Only two domains (the zinc fingers and the B domain) share similarity outside the Sp family. The zinc fingers are homologous to other GC-binding domains, whereas the B domain is homologous to protein-protein interaction domains in the CCAAT-binding/NF-Y transcription factor families. These results suggest that these different domains have different evolutionary histories.