ABSTRACT
Transcription factor (TF) binding specificities (motifs) are essential for the analysis of gene regulation. Accurate prediction of TF motifs is critical, because it is infeasible to assay all TFs in all sequenced eukaryotic genomes. There is ongoing controversy regarding the degree of motif diversification among related species, owing in part to uncertainty in motif prediction methods. Here we describe similarity regression, a significantly improved method for predicting motifs, which we use to update and expand the Cis-BP database. Similarity regression inherently quantifies TF motif evolution, and shows that previous claims of near-complete conservation of motifs between human and Drosophila are inflated, with nearly half of the motifs in each species absent from the other, largely due to extensive divergence in C2H2 zinc finger proteins. We conclude that diversification in DNA-binding motifs is pervasive, and present a new tool and updated resource to study TF diversity and gene regulation across eukaryotes.
Subjects
Base Sequence; Binding Sites; Evolution, Molecular; Transcription Factors/metabolism; Animals; Computational Biology/methods; Conserved Sequence; Databases, Genetic; Gene Expression Regulation; Humans; Nucleotide Motifs; Protein Binding
ABSTRACT
BACKGROUND: The filamentous fungus Aspergillus nidulans has been a tractable model organism for cell biology and genetics for over 60 years. It is among a large number of Aspergilli whose genomes have been sequenced since 2005, including medically and industrially important species. To advance our knowledge of its biology and increase its utility as a genetic model by improving gene annotation, we sequenced the transcriptome of A. nidulans with a focus on 5' end analysis. RESULTS: Strand-specific whole transcriptome sequencing showed that 80-95% of annotated genes appear to be expressed across the conditions tested. We estimate that the total gene number should be increased by approximately 1000, to 11,800. With respect to splicing, 8.3% of genes had multiple alternative transcripts, but alternative splicing by exon-skipping was very rare. 75% of annotated genes showed some level of antisense transcription, and for one gene, meaB, we demonstrated that the antisense transcript has a regulatory role. Specific sequencing of the 5' ends of transcripts was used for genome-wide mapping of transcription start sites, allowing us to interrogate over 7000 promoters and 5' untranslated regions. CONCLUSIONS: Our data have revealed the complexity of the A. nidulans transcriptome and contributed to improved genome annotation. The data can be viewed on the AspGD genome browser.