ABSTRACT
Previous research demonstrated the use of evolutionary computation for the discovery of transcription factor binding sites (TFBS) in promoter regions upstream of coexpressed genes. However, it remained unclear whether composite TFBS elements, commonly found in higher organisms where two or more TFBSs form functional complexes, could also be identified using this approach. Here, we present an important refinement of our previous algorithm and test the identification of composite elements using NFAT/AP-1 as an example. We demonstrate that by using appropriate existing parameters such as window size, novel scoring methods such as central bonusing, and methods of self-adaptation that automatically adjust the variation operators during the evolutionary search, TFBSs of different sizes and complexity can be identified as top solutions. Some of these solutions have known experimental relationships with NFAT/AP-1. We also show that even after the model parameters are properly tuned, the choice of window size has a significant effect on algorithm performance. We believe that this improved algorithm will greatly augment TFBS discovery.
Subjects
Algorithms, Regulatory Elements, Transcriptional, Transcription Factors/metabolism, Binding Sites, Computational Biology, Evolution, Molecular, NF-kappa B/metabolism, NFATC Transcription Factors/metabolism, Octamer Transcription Factor-1/metabolism, Transcription Factor AP-1/metabolism
ABSTRACT
Transcription factors are key regulatory elements that control gene expression. The TRANSFAC database represents the largest repository of experimentally derived transcription factor binding sites (TFBS). Understanding TFBS, which are typically conserved during evolution, helps us identify genomic regions related to human health and disease, as well as regions that might be predictive of patient outcomes. Here we present a statistical analysis of all TFBS in the TRANSFAC database. Our analysis suggests that the current definition of TFBS core regions in TRANSFAC should be re-examined so as to capture a more precise notion of "cores." We offer insight into more appropriate definitions of TFBS consensus sequences and core regions. These revised definitions provide a better understanding of the nature of transcription factor-DNA binding and assist with developing algorithms for de novo TFBS discovery as well as finding novel variants of known TFBS.
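
As an illustration of what a data-driven notion of a consensus sequence and a "core" might look like, the sketch below computes per-position information content from a set of aligned binding sites and picks the most conserved contiguous window. This is not the analysis described in the abstract and does not use TRANSFAC's definitions; the function names `position_information` and `core_region`, the toy sites, the uniform background (maximum of 2 bits per position), and the absence of any small-sample correction are all assumptions made for the example.

```python
import math
from collections import Counter


def position_information(sites):
    """Return (consensus, bits) for equal-length aligned DNA sites.

    bits[i] is 2 minus the Shannon entropy of column i, assuming a
    uniform background over A, C, G, T (an assumption of this sketch).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    consensus = []
    bits = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        consensus.append(counts.most_common(1)[0][0])
        bits.append(2.0 - entropy)
    return "".join(consensus), bits


def core_region(bits, width):
    """Start index of the contiguous window with the highest total information."""
    return max(range(len(bits) - width + 1),
               key=lambda start: sum(bits[start:start + width]))


# Hypothetical toy alignment (not taken from TRANSFAC).
toy_sites = ["TGAGTCA", "TGACTCA", "TGAGTCA", "TGAGTAA"]
consensus, bits = position_information(toy_sites)
core_start = core_region(bits, 3)
```

A fixed-width window is only one possible choice; a threshold on per-position information would give cores of variable length instead.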