Search | VHL Regional Portal (test)

Optimized position weight matrices in prediction of novel putative binding sites for transcription factors in the Drosophila melanogaster genome.

Morozov, Vyacheslav Y; Ioshikhes, Ilya P.

PLoS One ; 8(8): e68712, 2013.

Article in English | MEDLINE | ID: mdl-23936309

ABSTRACT

Position weight matrices (PWMs) have become a tool of choice for the identification of transcription factor binding sites in DNA sequences. DNA-binding proteins often show degeneracy in their binding requirement and thus the overall binding specificity of many proteins is unknown and remains an active area of research. Although existing PWMs are more reliable predictors than consensus string matching, they generally result in a high number of false positive hits. Our previous study introduced a promising approach to PWM refinement in which known motifs are used to computationally mine putative binding sites directly from aligned promoter regions using composition of similar sites. In the present study, we extended this technique originally tested on single examples of transcription factors (TFs) and showed its capability to optimize PWM performance to predict new binding sites in the fruit fly genome. We propose refined PWMs in mono- and dinucleotide versions similarly computed for a large variety of transcription factors of Drosophila melanogaster. Along with the addition of many auxiliary sites the optimization includes variation of the PWM motif length, the binding sites location on the promoters and the PWM score threshold. To assess the predictive performance of the refined PWMs we compared them to conventional TRANSFAC and JASPAR sources. The results have been verified using performed tests and literature review. Overall, the refined PWMs containing putative sites derived from real promoter content processed using optimized parameters had better general accuracy than conventional PWMs.

Subjects

Computational Biology/methods; Drosophila melanogaster/genetics; Genome, Insect/genetics; Position-Specific Scoring Matrices; Transcription Factors/chemistry; Transcription Factors/metabolism; Amino Acid Sequence; Animals; Binding Sites; False Negative Reactions; False Positive Reactions; Molecular Sequence Data; Nucleotide Motifs; Transcription Factors/genetics
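
The abstract above describes scanning sequences with a PWM and a score threshold. A minimal sketch of that general idea follows; it is not the authors' refinement procedure. The example sites, the uniform background, the pseudocount, and the threshold are all assumptions chosen for illustration.

```python
import math

# Hypothetical aligned binding sites (illustrative only, not from the paper).
SITES = ["TATAAA", "TATAAT", "TATATA", "TATAAA", "CATAAA"]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def build_pwm(sites, background, pseudocount=1.0):
    """Return a list of per-position log-odds dicts (a mononucleotide PWM)."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in "ACGT"}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Yield (offset, score) for every window scoring at or above threshold."""
    width = len(pwm)
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            yield offset, score


pwm = build_pwm(SITES, BACKGROUND)
hits = list(scan("GGCGTATAAAGGCC", pwm, threshold=5.0))
```

Raising the threshold trades sensitivity for fewer false positives, which is one of the parameters the abstract says was optimized.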

A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome.

Mavrich, Travis N; Ioshikhes, Ilya P; Venters, Bryan J; Jiang, Cizhong; Tomsho, Lynn P; Qi, Ji; Schuster, Stephan C; Albert, Istvan; Pugh, B Franklin.

Genome Res ; 18(7): 1073-83, 2008 Jul.

Article in English | MEDLINE | ID: mdl-18550805

ABSTRACT

Most nucleosomes are well-organized at the 5' ends of S. cerevisiae genes where "-1" and "+1" nucleosomes bracket a nucleosome-free promoter region (NFR). How nucleosomal organization is specified by the genome is less clear. Here we establish and inter-relate rules governing genomic nucleosome organization by sequencing DNA from more than one million immunopurified S. cerevisiae nucleosomes (displayed at http://atlas.bx.psu.edu/). Evidence is presented that the organization of nucleosomes throughout genes is largely a consequence of statistical packing principles. The genomic sequence specifies the location of the -1 and +1 nucleosomes. The +1 nucleosome forms a barrier against which nucleosomes are packed, resulting in uniform positioning, which decays at farther distances from the barrier. We present evidence for a novel 3' NFR that is present at >95% of all genes. 3' NFRs may be important for transcription termination and anti-sense initiation. We present a high-resolution genome-wide map of TFIIB locations that implicates 3' NFRs in gene looping.

Subjects

Chromosome Mapping/statistics & numerical data; Chromosomes, Fungal/genetics; Genome, Fungal; Models, Genetic; Nucleosomes/genetics; Saccharomyces cerevisiae/genetics; 3' Untranslated Regions/genetics; DNA, Fungal/analysis; Promoter Regions, Genetic
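
The barrier idea in the abstract above (uniform positioning next to a fixed nucleosome that decays with distance) can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the model fitted in the paper: it assumes independent, exponentially distributed linker lengths with a mean of 18 bp, and the function names are hypothetical.

```python
import random
import statistics

NUCLEOSOME_BP = 147  # DNA wrapped by one nucleosome core particle


def simulate_array(n_nucleosomes, mean_linker_bp, rng):
    """Start positions of nucleosomes packed downstream of a barrier at 0."""
    positions = []
    cursor = 0.0
    for _ in range(n_nucleosomes):
        cursor += rng.expovariate(1.0 / mean_linker_bp)  # random linker length
        positions.append(cursor)
        cursor += NUCLEOSOME_BP
    return positions


def positional_spread(n_cells=2000, n_nucleosomes=6, mean_linker_bp=18.0, seed=0):
    """Standard deviation of each nucleosome's start across simulated cells."""
    rng = random.Random(seed)
    arrays = [
        simulate_array(n_nucleosomes, mean_linker_bp, rng) for _ in range(n_cells)
    ]
    return [
        statistics.pstdev([a[k] for a in arrays]) for k in range(n_nucleosomes)
    ]


spread = positional_spread()
```

In this toy setting the spread grows with each successive nucleosome, so positions close to the barrier look well defined across the population while distant ones blur.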

Nucleosome organization in the Drosophila genome.

Mavrich, Travis N; Jiang, Cizhong; Ioshikhes, Ilya P; Li, Xiaoyong; Venters, Bryan J; Zanton, Sara J; Tomsho, Lynn P; Qi, Ji; Glaser, Robert L; Schuster, Stephan C; Gilmour, David S; Albert, Istvan; Pugh, B Franklin.

Nature ; 453(7193): 358-62, 2008 May 15.

Article in English | MEDLINE | ID: mdl-18408708

ABSTRACT

Comparative genomics of nucleosome positions provides a powerful means for understanding how the organization of chromatin and the transcription machinery co-evolve. Here we produce a high-resolution reference map of H2A.Z and bulk nucleosome locations across the genome of the fly Drosophila melanogaster and compare it to that from the yeast Saccharomyces cerevisiae. Like Saccharomyces, Drosophila nucleosomes are organized around active transcription start sites in a canonical -1, nucleosome-free region, +1 arrangement. However, Drosophila does not incorporate H2A.Z into the -1 nucleosome and does not bury its transcriptional start site in the +1 nucleosome. At thousands of genes, RNA polymerase II engages the +1 nucleosome and pauses. How the transcription initiation machinery contends with the +1 nucleosome seems to be fundamentally different across major eukaryotic lines.

Subjects

Drosophila melanogaster/genetics; Genome, Insect/genetics; Nucleosomes/genetics; Nucleosomes/metabolism; Animals; Conserved Sequence/genetics; Drosophila melanogaster/embryology; Drosophila melanogaster/enzymology; Gene Expression Regulation/genetics; Genes, Insect/genetics; Histones/metabolism; Promoter Regions, Genetic/genetics; RNA Polymerase II/metabolism; Saccharomyces cerevisiae/genetics; Transcription Initiation Site; Transcription, Genetic/genetics

Nucleosome positions predicted through comparative genomics.

Ioshikhes, Ilya P; Albert, Istvan; Zanton, Sara J; Pugh, B Franklin.

Nat Genet ; 38(10): 1210-5, 2006 Oct.

Article in English | MEDLINE | ID: mdl-16964265

ABSTRACT

DNA sequence has long been recognized as an important contributor to nucleosome positioning, which has the potential to regulate access to genes. The extent to which the nucleosomal architecture at promoters is delineated by the underlying sequence is now being worked out. Here we use comparative genomics to report a genome-wide map of nucleosome positioning sequences (NPSs) located in the vicinity of all Saccharomyces cerevisiae genes. We find that the underlying DNA sequence provides a very good predictor of nucleosome locations that have been experimentally mapped to a small fraction of the genome. Notably, distinct classes of genes possess characteristic arrangements of NPSs that may be important for their regulation. In particular, genes that have a relatively compact NPS arrangement over the promoter region tend to have a TATA box buried in an NPS and tend to be highly regulated by chromatin modifying and remodeling factors.

Subjects

DNA, Fungal/genetics; Genomics/methods; Nucleosomes/chemistry; Saccharomyces cerevisiae/genetics; DNA, Fungal/chemistry; Gene Expression Regulation, Fungal; Genome, Fungal; Models, Genetic; Nucleosomes/genetics; Promoter Regions, Genetic; TATA Box

The features of Drosophila core promoters revealed by statistical analysis.

Gershenzon, Naum I; Trifonov, Edward N; Ioshikhes, Ilya P.

BMC Genomics ; 7: 161, 2006 Jun 21.

Article in English | MEDLINE | ID: mdl-16790048

ABSTRACT

BACKGROUND: Experimental investigation of transcription is still a very labor- and time-consuming process. Only a few transcription initiation scenarios have been studied in detail. The mechanism of interaction between basal machinery and promoter, in particular core promoter elements, is not known for the majority of identified promoters. In this study, we reveal various transcription initiation mechanisms by statistical analysis of 3393 nonredundant Drosophila promoters. RESULTS: Using Drosophila-specific position-weight matrices, we identified promoters containing TATA box, Initiator, Downstream Promoter Element (DPE), and Motif Ten Element (MTE), as well as core elements discovered in Human (TFIIB Recognition Element (BRE) and Downstream Core Element (DCE)). Promoters utilizing known synergetic combinations of two core elements (TATA_Inr, Inr_MTE, Inr_DPE, and DPE_MTE) were identified. We also establish the existence of promoters with potentially novel synergetic combinations: TATA_DPE and TATA_MTE. Our analysis revealed several motifs with the features of promoter elements, including possible novel core promoter element(s). Comparison of Human and Drosophila showed consistent percentages of promoters with TATA, Inr, DPE, and synergetic combinations thereof, as well as most of the same functional and mutual positions of the core elements. No statistical evidence of MTE utilization in Human was found. Distinct nucleosome positioning in particular promoter classes was revealed. CONCLUSION: We present lists of promoters that potentially utilize the aforementioned elements/combinations. The number of these promoters is two orders of magnitude larger than the number of promoters in which transcription initiation was experimentally studied. The sequences are ready to be experimentally tested or used for further statistical analysis. The developed approach may be utilized for other species.

Subjects

Data Interpretation, Statistical; Drosophila/genetics; Promoter Regions, Genetic; Animals; Chromatin/chemistry; Chromosome Mapping/statistics & numerical data; Codon, Initiator; Databases, Genetic; Humans; Regulatory Elements, Transcriptional; TATA Box

Functional characterization of core promoter elements: the downstream core element is recognized by TAF1.

Lee, Dong-Hoon; Gershenzon, Naum; Gupta, Malavika; Ioshikhes, Ilya P; Reinberg, Danny; Lewis, Brian A.

Mol Cell Biol ; 25(21): 9674-86, 2005 Nov.

Article in English | MEDLINE | ID: mdl-16227614

ABSTRACT

Downstream elements are a newly appreciated class of core promoter elements of RNA polymerase II-transcribed genes. The downstream core element (DCE) was discovered in the human beta-globin promoter, and its sequence composition is distinct from that of the downstream promoter element (DPE). We show here that the DCE is a bona fide core promoter element present in a large number of promoters and with high incidence in promoters containing a TATA motif. Database analysis indicates that the DCE is found in diverse promoters, supporting its functional relevance in a variety of promoter contexts. The DCE consists of three subelements, and DCE function is recapitulated in a TFIID-dependent manner. Subelement 3 can function independently of the other two and shows a TFIID requirement as well. UV photo-cross-linking results demonstrate that TAF1/TAF(II)250 interacts with the DCE subelement DNA in a sequence-dependent manner. These data show that downstream elements consist of at least two types, those of the DPE class and those of the DCE class; they function via different DNA sequences and interact with different transcription activation factors. Finally, these data argue that TFIID is, in fact, a core promoter recognition complex.

Subjects

Promoter Regions, Genetic; TATA Box/genetics; TATA-Binding Protein Associated Factors/genetics; Transcription Factor TFIID/genetics; Adenoviridae/genetics; Amino Acid Motifs; Animals; Cell Nucleus/metabolism; Databases, Genetic; Enhancer Elements, Genetic; Gene Expression Regulation; Globins/genetics; HeLa Cells; Histone Acetyltransferases; Humans; Protein Subunits/genetics; Rats; Saccharomyces cerevisiae/genetics; Simplexvirus/genetics

Promoter classifier: software package for promoter database analysis.

Gershenzon, Naum I; Ioshikhes, Ilya P.

Appl Bioinformatics ; 4(3): 205-9, 2005.

Article in English | MEDLINE | ID: mdl-16231962

ABSTRACT

Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.

Subjects

Computational Biology/methods; Promoter Regions, Genetic; Algorithms; Binding Sites; Computers; CpG Islands; Databases as Topic; Databases, Nucleic Acid; Internet; Molecular Sequence Data; Programming Languages; Regulatory Sequences, Nucleic Acid; Sequence Alignment; Software; Transcription Factor TFIIB/genetics
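
Two of the manipulations listed in the abstract above, (i) positional nucleotide distributions and (iii) dividing a dataset by the presence of an element, can be sketched in a few lines. This is an illustrative Python sketch, not the Promoter Classifier C++ code; the sequences are invented, and exact string matching stands in for whatever element detection the package actually uses.

```python
from collections import Counter

# Hypothetical equal-length promoter fragments (illustrative only).
PROMOTERS = {
    "p1": "GCTATAAAGG",
    "p2": "GCGCGCGCGG",
    "p3": "ACTATAAACG",
    "p4": "GGGCGGGGCC",
}


def positional_frequencies(sequences):
    """Per-position nucleotide frequencies averaged over all sequences."""
    seqs = list(sequences)
    length = len(seqs[0])
    table = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in seqs)
        table.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return table


def divide(promoters, motif):
    """Split a promoter set into (containing, lacking) an exact motif match."""
    containing = {name: seq for name, seq in promoters.items() if motif in seq}
    lacking = {name: seq for name, seq in promoters.items() if motif not in seq}
    return containing, lacking


freqs = positional_frequencies(PROMOTERS.values())
with_tata, without_tata = divide(PROMOTERS, "TATAAA")
```

Each subset returned by `divide` can be passed back to `positional_frequencies` to compare the nucleotide profiles of promoters with and without the element.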

Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites.

Gershenzon, Naum I; Stormo, Gary D; Ioshikhes, Ilya P.

Nucleic Acids Res ; 33(7): 2290-301, 2005.

Article in English | MEDLINE | ID: mdl-15849315

ABSTRACT

Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves the PWM. We applied the proposed technique on the PWM of the GC-box, binding site for Sp1. The comparison of old and new PWMs shows that the latter increase both sensitivity and specificity. The statistical parameters of GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are the 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices and preliminary results show that such matrices provide better results than 4-row matrices.

Subjects

Algorithms; Computational Biology/methods; DNA-Binding Proteins/metabolism; Sequence Analysis, DNA/methods; Transcription Factors/metabolism; Binding Sites; Genome, Human; Humans; Promoter Regions, Genetic; Response Elements; Sp1 Transcription Factor/metabolism
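
To make the distinction between 4-row and 16-row matrices in the abstract above concrete, the sketch below builds a 16-row dinucleotide log-odds matrix from aligned sites. It does not reproduce the paper's modified Staden-Bucher algorithm; the sites, the pseudocount, and the uniform dinucleotide background are assumptions for illustration only.

```python
import math
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # 16 rows

# Hypothetical aligned sites loosely resembling a GC-box (illustrative only).
SITES = ["GGGCGG", "GGGCGG", "GGGCGT", "TGGCGG"]


def build_dinucleotide_pwm(sites, pseudocount=0.5):
    """16-row log-odds matrix; column i covers positions i and i + 1."""
    n_columns = len(sites[0]) - 1
    background = 1.0 / 16  # assumed uniform dinucleotide background
    matrix = []
    for col in range(n_columns):
        counts = {d: pseudocount for d in DINUCLEOTIDES}
        for site in sites:
            counts[site[col:col + 2]] += 1
        total = sum(counts.values())
        matrix.append(
            {d: math.log2((counts[d] / total) / background) for d in DINUCLEOTIDES}
        )
    return matrix


def score(window, matrix):
    """Sum the log-odds of the overlapping dinucleotides in a window."""
    return sum(matrix[i][window[i:i + 2]] for i in range(len(matrix)))


dipwm = build_dinucleotide_pwm(SITES)
consensus_score = score("GGGCGG", dipwm)
shuffled_score = score("GCGGGG", dipwm)  # same base composition, different order
```

Because each column scores a pair of adjacent bases, a dinucleotide matrix can capture dependencies between neighboring positions that a mononucleotide matrix treats as independent.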

Synergy of human Pol II core promoter elements revealed by statistical sequence analysis.

Gershenzon, Naum I; Ioshikhes, Ilya P.

Bioinformatics ; 21(8): 1295-300, 2005 Apr 15.

Article in English | MEDLINE | ID: mdl-15572469

ABSTRACT

MOTIVATION: The subject of our paper is bioinformatics analysis of the distinguishing features of human promoter DNA sequences, in particular of synergetic combinations of core promoter elements therein. We suppose that specific scenarios of transcription initiation are essentially related to various particular implementations of the interaction of basal transcription machinery with promoter DNA, depending on the presence and mutual positioning of core promoter elements. RESULTS: In addition to the combinations of core promoter elements previously experimentally confirmed [TATA box and Initiator (Inr), Downstream Promoter Element (DPE) and Inr, and TFIIB recognition element (BRE) and TATA box] we propose other alternate synergetic combinations: BRE and Inr, BRE and DPE, and TATA and DPE with respective models. The suggestion is based on a high statistical significance of the alternate combinations in promoters, comparable with the significance of the known combinations. We also present arguments that the BRE element is statistically more important than previously thought, and suggest possible mechanisms of action of the core elements in the promoters with multiple transcription start sites. CONTACT: ioschikhes-1@medctr.osu.edu SUPPLEMENTARY INFORMATION: Supplementary information is available at http://bmi.osu.edu/~ilya/synergy/Gershenzon_SuppMat-R.pdf.

Subjects

DNA Polymerase II/genetics; Models, Genetic; Promoter Regions, Genetic/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Transcription Initiation Site; Transcriptional Activation/genetics; Databases, Genetic; Humans; Models, Statistical; Response Elements/genetics
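
The abstract above bases its proposed combinations on the statistical significance of element co-occurrence. One common way to quantify co-occurrence, shown here as an assumption rather than the paper's actual test, is to compare the observed number of promoters carrying both elements with the number expected under independence and to compute a one-sided hypergeometric p-value. The counts below are invented.

```python
from math import comb


def cooccurrence_enrichment(n_total, n_a, n_b, n_both):
    """Observed/expected ratio and one-sided hypergeometric p-value."""
    expected = n_a * n_b / n_total
    ratio = n_both / expected
    # P(X >= n_both) when n_b promoters are drawn from n_total,
    # n_a of which carry element A.
    upper = min(n_a, n_b)
    p_value = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    ) / comb(n_total, n_b)
    return ratio, p_value


# Hypothetical counts (not taken from the paper).
ratio, p_value = cooccurrence_enrichment(n_total=1000, n_a=100, n_b=200, n_both=40)
```

A ratio well above 1 with a small p-value suggests the two elements appear together more often than chance would predict, which is the kind of evidence that would motivate testing a combination experimentally.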
