Results 1-12 of 12
1.
Nat Biotechnol ; 31(2): 126-34, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23354101

ABSTRACT

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.


Subject(s)
DNA-Binding Proteins/genetics , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices , Transcription Factors , Algorithms , Animals , Computational Biology , DNA-Binding Proteins/chemistry , Genome , Mice , Protein Array Analysis , Transcription Factors/genetics , Transcription Factors/metabolism
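The first abstract concerns scanning sequences with mononucleotide position weight matrices (PWMs) and notes that the best motifs have relatively low information content. As a minimal illustration, assuming a toy count matrix, a uniform background and a pseudocount of 0.5 (all hypothetical, not taken from the study), the sketch below converts counts into log-odds scores, scans one strand, and computes information content in bits.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm

def information_content(counts, pseudocount=0.5):
    """Motif information content in bits: sum over positions of (2 - Shannon entropy)."""
    ic = 0.0
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        probs = [(column[b] + pseudocount) / total for b in BASES]
        ic += 2.0 + sum(p * math.log2(p) for p in probs)
    return ic

def scan(sequence, pwm):
    """Score every window on the given strand only; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]

# Hypothetical counts for a 3-position motif favouring "GAT".
counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
pwm = pwm_from_counts(counts)
best_start, best_score = max(scan("CCGATCC", pwm), key=lambda hit: hit[1])
```

A genome-wide scan would also score the reverse-complement strand. Information content is at most 2 bits per position, so this 3-position motif can reach at most 6 bits; degenerate motifs score well below that ceiling.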
2.
Cell ; 147(1): 132-46, 2011 Sep 30.
Article in English | MEDLINE | ID: mdl-21924763

ABSTRACT

Alternative splicing (AS) is a key process underlying the expansion of proteomic diversity and the regulation of gene expression. Here, we identify an evolutionarily conserved embryonic stem cell (ESC)-specific AS event that changes the DNA-binding preference of the forkhead family transcription factor FOXP1. We show that the ESC-specific isoform of FOXP1 stimulates the expression of transcription factor genes required for pluripotency, including OCT4, NANOG, NR5A2, and GDF3, while concomitantly repressing genes required for ESC differentiation. This isoform also promotes the maintenance of ESC pluripotency and contributes to efficient reprogramming of somatic cells into induced pluripotent stem cells. These results reveal a pivotal role for an AS event in the regulation of pluripotency through the control of critical ESC-specific transcriptional programs.


Subject(s)
Alternative Splicing , Cellular Reprogramming , Embryonic Stem Cells/metabolism , Forkhead Transcription Factors/metabolism , Gene Expression Regulation, Developmental , Pluripotent Stem Cells/metabolism , Repressor Proteins/metabolism , Animals , DNA/metabolism , Embryonic Stem Cells/cytology , Genes, Homeobox , Humans , Mice , Pluripotent Stem Cells/cytology , Protein Isoforms/metabolism
3.
Dev Biol ; 358(1): 137-46, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21810415

ABSTRACT

The cAMP response element-binding protein (CREB) is a highly conserved transcription factor that integrates signaling through the cAMP-dependent protein kinase A (PKA) in many eukaryotes. PKA plays a critical role in Dictyostelium development, but no CREB homologue has been identified in this system. Here we show that Dictyostelium utilizes a CREB-like protein, BzpF, to integrate PKA signaling during late development. bzpF(-) mutants produce compromised spores, which are extremely unstable and germination-defective. We previously found that BzpF binds the canonical CRE motif in vitro. In this paper, we determined the DNA-binding specificity of BzpF using a protein binding microarray (PBM) and showed that the motif with the highest specificity is a CRE-like sequence. BzpF is necessary to activate the transcription of at least 15 PKA-regulated, late-developmental target genes whose promoters contain BzpF binding motifs. BzpF is sufficient to activate two of these genes. A comparison of RNA sequencing data between the wild type and the bzpF(-) mutant revealed that the mutant fails to express 205 genes, many of which encode cellulose-binding and sugar-binding proteins. We propose that BzpF is a CREB-like transcription factor that regulates spore maturation and stability in a PKA-related manner.


Subject(s)
Cyclic AMP Response Element-Binding Protein/metabolism , Dictyostelium/physiology , Signal Transduction/physiology , Spores, Protozoan/growth & development , Cyclic AMP Response Element-Binding Protein/genetics , Cyclic AMP-Dependent Protein Kinases/metabolism , DNA Primers/genetics , Microarray Analysis , Plasmids/genetics , Protein Binding , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, RNA , Spores, Protozoan/metabolism
4.
Nucleic Acids Res ; 38(22): 7927-42, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20705649

ABSTRACT

Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering an unprecedented opportunity to evaluate classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize classification robustness closely reflect the detailed functional specificity revealed by the experimental data. In some of these classifications, 73-83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.


Subject(s)
Homeodomain Proteins/classification , Animals , DNA/metabolism , Homeodomain Proteins/chemical synthesis , Homeodomain Proteins/metabolism , Mice , Sequence Analysis, Protein
5.
EMBO J ; 29(13): 2147-60, 2010 Jul 07.
Article in English | MEDLINE | ID: mdl-20517297

ABSTRACT

Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs), yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer (ERG, ETV1, ETV4 and FLI1) fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in the in vitro binding specificity of a TF contribute to site selectivity in vivo.


Subject(s)
DNA/metabolism , Genome-Wide Association Study , Proto-Oncogene Proteins c-ets/metabolism , Animals , Base Sequence , Binding Sites , Cell Line , DNA/chemistry , Humans , Mice , Models, Molecular , Protein Binding , Proto-Oncogene Proteins c-ets/chemistry , Sequence Analysis, DNA
6.
Genome Res ; 20(6): 861-73, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20378718

ABSTRACT

The genetic code, that is, the binding specificity of all transfer RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is known for only a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.


Subject(s)
SELEX Aptamer Technique , Transcription Factors/metabolism , Affinity Labels , Base Sequence , Binding Sites , DNA , Humans , Molecular Sequence Data
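The SELEX abstract describes deriving motifs from hundreds of thousands of sequencing reads. One simple way to start, offered here as an assumption rather than the authors' pipeline, is to count k-mers in reads after selection and in the unselected input library, then rank k-mers by their frequency ratio. The reads and function names below are hypothetical.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across all reads (one strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_enrichment(selected_reads, input_reads, k, pseudocount=1.0):
    """Ratio of each k-mer's smoothed frequency after selection to its frequency in the input."""
    sel = count_kmers(selected_reads, k)
    inp = count_kmers(input_reads, k)
    kmers = set(sel) | set(inp)
    # Pseudocounts keep k-mers absent from one library from producing zero or infinite ratios.
    sel_total = sum(sel.values()) + pseudocount * len(kmers)
    inp_total = sum(inp.values()) + pseudocount * len(kmers)
    return {kmer: ((sel[kmer] + pseudocount) / sel_total) / ((inp[kmer] + pseudocount) / inp_total)
            for kmer in kmers}

# Hypothetical reads: the selected pool is enriched for the ETS-like core GGAA.
selected = ["TTGGAATT", "CCGGAACC", "AGGAAG"]
unselected = ["ACGTACGT", "TGCATGCA"]
enrichment = kmer_enrichment(selected, unselected, k=4)
top_kmer = max(enrichment, key=enrichment.get)
```

In practice, enrichment would be tracked across successive selection cycles and the top k-mers used to seed a motif model; this sketch stops at the ranking step.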
7.
Nat Biotechnol ; 27(7): 667-70, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19561594

ABSTRACT

Metazoan genomes encode hundreds of RNA-binding proteins (RBPs) but RNA-binding preferences for relatively few RBPs have been well defined. Current techniques for determining RNA targets, including in vitro selection and RNA co-immunoprecipitation, require significant time and labor investment. Here we introduce RNAcompete, a method for the systematic analysis of RNA binding specificities that uses a single binding reaction to determine the relative preferences of RBPs for short RNAs that contain a complete range of k-mers in structured and unstructured RNA contexts. We tested RNAcompete by analyzing nine diverse RBPs (HuR, Vts1, FUSIP1, PTB, U1A, SF2/ASF, SLM2, RBM4 and YB1). RNAcompete identified expected and previously unknown RNA binding preferences. Using in vitro and in vivo binding data, we demonstrate that preferences for individual 7-mers identified by RNAcompete are a more accurate representation of binding activity than are conventional motif models. We anticipate that RNAcompete will be a valuable tool for the study of RNA-protein interactions.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , RNA-Binding Proteins/metabolism , RNA/metabolism , Animals , Base Sequence , Binding Sites/genetics , Databases, Nucleic Acid , Genome , Molecular Sequence Data , RNA/chemistry , RNA/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , ROC Curve , Substrate Specificity
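RNAcompete summarizes binding as preferences for individual 7-mers. The following is a simplified stand-in, not the published normalization (which the abstract does not describe): each 7-mer is scored by the mean intensity of the probes containing it, and the scores are standardized across all observed 7-mers. Probe sequences and intensities are invented for illustration.

```python
import statistics

def kmer_mean_intensity(probes, k=7):
    """Mean signal of the probes containing each k-mer (a probe counts once per k-mer)."""
    by_kmer = {}
    for seq, intensity in probes:
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            by_kmer.setdefault(kmer, []).append(intensity)
    return {kmer: statistics.mean(values) for kmer, values in by_kmer.items()}

def z_scores(scores):
    """Standardize k-mer scores across all observed k-mers (assumes they are not all equal)."""
    values = list(scores.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {kmer: (v - mu) / sd for kmer, v in scores.items()}

# Hypothetical probes: the two AU-rich probes give the strongest signal.
probes = [
    ("UUUAUUUAGC", 10.0),
    ("CCUUUAUUUA", 10.0),
    ("GCGCGCGCGC", 1.0),
    ("AGAGAGAGAG", 1.5),
]
z = z_scores(kmer_mean_intensity(probes))
```

With a real design, each 7-mer is covered by many probes in different contexts, which is what makes a per-k-mer average informative.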
8.
Science ; 324(5935): 1720-3, 2009 Jun 26.
Article in English | MEDLINE | ID: mdl-19443739

ABSTRACT

Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.


Subject(s)
DNA/metabolism , Transcription Factors/chemistry , Transcription Factors/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Base Sequence , Binding Sites , DNA/chemistry , Electrophoretic Mobility Shift Assay , Gene Expression Regulation , Gene Regulatory Networks , Humans , Mice , Protein Array Analysis , Protein Binding , Protein Structure, Tertiary , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/metabolism
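The arrays described above contain all 10-bp sequences. A standard way to pack every k-mer into as few bases as possible, and the construction generally associated with universal PBM designs, is a de Bruijn sequence, in which each k-mer occurs exactly once when the sequence is read cyclically; the abstract itself does not name the design. The sketch uses the classic Lyndon-word (FKM) construction.

```python
def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence over `alphabet` containing every k-mer exactly once."""
    size = len(alphabet)
    a = [0] * (size * k)
    sequence = []

    def db(t, p):
        if t > k:
            # Append the Lyndon word a[1..p] when its length divides k.
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, size):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)

def covers_all_kmers(seq, alphabet, k):
    """True if every k-mer occurs when the sequence is read cyclically."""
    wrapped = seq + seq[:k - 1]
    seen = {wrapped[i:i + k] for i in range(len(seq))}
    return len(seen) == len(alphabet) ** k

seq5 = de_bruijn("ACGT", 5)
```

For k = 10 the cyclic sequence is 4^10 = 1,048,576 bases long. Splitting it across array probes would require adjacent probes to overlap by k - 1 bases so that no k-mer is lost at a probe boundary.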
9.
Bioinformatics ; 25(8): 1012-8, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19088121

ABSTRACT

MOTIVATION: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers offer an opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. RESULTS: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest-neighbour prediction based on functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.


Subject(s)
Computational Biology/methods , DNA/chemistry , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Binding Sites , DNA/metabolism , Transcription Factors/chemistry
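The homeodomain study found nearest-neighbour prediction over functionally important residues among the most effective approaches. A minimal sketch, assuming pre-aligned sequences, an illustrative choice of residue positions, and made-up training profiles: the query inherits the binding profile of the training protein with the fewest mismatches at those positions.

```python
def residue_distance(seq_a, seq_b, positions):
    """Mismatches between two aligned sequences at the selected positions only."""
    return sum(seq_a[p] != seq_b[p] for p in positions)

def predict_profile(query_seq, training, positions):
    """Return (name, profile) of the training protein nearest to the query at `positions`."""
    nearest = min(training, key=lambda name: residue_distance(query_seq, training[name][0], positions))
    return nearest, training[nearest][1]

# Hypothetical aligned fragments and toy 8-mer scores; not real homeodomain data.
training = {
    "hd1": ("KIWFQNRR", {"TAATTAAA": 0.49, "TGATTAAA": 0.31}),
    "hd2": ("KVWFQNKR", {"TAATGAAA": 0.45, "TAATTAAA": 0.20}),
}
important_positions = [1, 6]  # illustrative choice of "functionally important" columns
name, profile = predict_profile("KIWFQNRA", training, important_positions)
```

Restricting the distance to a few positions is the design choice that matters here: differences elsewhere in the domain are ignored, on the assumption that they contribute little to specificity.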
10.
Mol Cell ; 32(6): 878-87, 2008 Dec 26.
Article in English | MEDLINE | ID: mdl-19111667

ABSTRACT

The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially approximately 100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.


Subject(s)
DNA-Binding Proteins/metabolism , Nucleosomes/metabolism , Promoter Regions, Genetic , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Transcription Factors/genetics , Base Sequence , Binding Sites , Genes, Fungal , Molecular Sequence Data , Mutation/genetics , Phylogeny , Reproducibility of Results , Sequence Homology, Amino Acid , Transcription Factors/metabolism
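The yeast study reports Rsc3 sites with the core CGCG enriched roughly 100 bp upstream of transcription start sites. The sketch below shows how such offsets could be tabulated, assuming a promoter string written 5' to 3' on the gene's strand with a known TSS index (both hypothetical inputs). Because CGCG is its own reverse complement, scanning one strand also finds sites on the other.

```python
def core_offsets(promoter, tss_index, core="CGCG"):
    """Start positions of each core occurrence relative to the TSS (negative = upstream).

    Overlapping hits such as CGCGCG are reported separately. CGCG is palindromic,
    so a single-strand scan suffices for this particular core.
    """
    offsets = []
    start = promoter.find(core)
    while start != -1:
        offsets.append(start - tss_index)
        start = promoter.find(core, start + 1)
    return offsets

# Hypothetical promoter: one CGCG located 100 bp upstream of a TSS at index 150.
promoter = "A" * 50 + "CGCG" + "A" * 96 + "TATA"
offsets = core_offsets(promoter, tss_index=150)
```

Pooling such offsets over many promoters and plotting a histogram is one way to see a positional preference like the one described.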
11.
Cell ; 133(7): 1266-76, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18585359

ABSTRACT

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.


Subject(s)
DNA/chemistry , Homeodomain Proteins/chemistry , Animals , Base Sequence , Computational Biology , Conserved Sequence , DNA/metabolism , Evolution, Molecular , Homeodomain Proteins/metabolism , Mice , Models, Molecular , Protein Binding , Transcription Factors/chemistry , Transcription Factors/metabolism
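Measuring preferences for all possible 8-base sequences involves 4^8 = 65,536 8-mers, but a double-stranded site presents a k-mer and its reverse complement together. Collapsing each pair to one representative leaves (65,536 + 256) / 2 = 32,896 distinct 8-mers, because the 4^4 = 256 reverse-complement palindromes pair with themselves. The sketch below picks the lexicographically smaller member of each pair (an arbitrary convention) and confirms that count.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_kmers(k):
    """One representative per {k-mer, reverse complement} pair, sorted."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(s, reverse_complement(s)) for s in all_kmers})

eight_mers = canonical_kmers(8)
```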
12.
J Proteome Res ; 7(4): 1529-41, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18311902

ABSTRACT

In breast cancer, there is a significant degree of molecular diversity among tumors. Multiple perturbations in signal transduction pathways impinge on transcriptional networks that in turn dictate malignant transformation and metastatic progression. Detailed knowledge of the sequence-specific transcription factors that become activated or repressed within a tumor and comparison of their relative levels of expression in cancer versus normal tissue should therefore provide insight into disease mechanisms, improving patient stratification and facilitating personalized treatment. While high-throughput tandem mass spectrometry methods for global proteome profiling have been developed, existing approaches have limited sensitivity and are often unable to detect low-abundance transcription factors in a complex biological specimen like a biopsy or tumor cell extract. To this end, we have undertaken a systematic comparative evaluation of three MS/MS methods for the ability to detect reference transcription factors spiked in known amounts into a cell-free breast cancer nuclear extract: Data-Dependent Acquisition (DDA), wherein precursor ion intensity dictates selection for fragmentation; Targeted Peptide Monitoring (TPM), a directed approach using successive isolation and fragmentation of predefined m/z ratios; and Multiple Reaction Monitoring (MRM), in which specific precursor ion to product ion transitions are selectively monitored. Through a series of controlled, parallel benchmarking experiments, we have determined the relative figures-of-merit of each approach, and have established that prior knowledge of signature proteotypic peptides markedly improves overall detection sensitivity, reliability, and quantification.


Subject(s)
Breast Neoplasms/metabolism , Proteomics/methods , Tandem Mass Spectrometry/methods , Transcription Factors/analysis , Amino Acid Sequence , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Line, Tumor , Cell Nucleus/metabolism , Chromatography, Liquid , Female , Gene Expression Regulation, Neoplastic , Humans , Molecular Sequence Data , NF-kappa B p52 Subunit/analysis , NF-kappa B p52 Subunit/genetics , Recombinant Fusion Proteins/analysis , Recombinant Fusion Proteins/genetics , STAT1 Transcription Factor/analysis , STAT1 Transcription Factor/genetics , Transcription Factors/genetics
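MRM selectively monitors transitions from a precursor ion to product ions of predefined proteotypic peptides. As a rough sketch of the arithmetic behind a transition list, assuming an unmodified peptide, standard monoisotopic residue masses and only singly charged y ions (the peptide below is hypothetical), one can compute precursor and y-ion m/z values:

```python
# Standard monoisotopic residue masses (Da); no modifications are modelled.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def precursor_mz(peptide, charge):
    """m/z of the intact peptide carrying `charge` added protons."""
    neutral_mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge

def y_ions(peptide):
    """Singly charged y1..y(n-1) m/z values (C-terminal fragments)."""
    return [sum(RESIDUE[aa] for aa in peptide[-i:]) + WATER + PROTON
            for i in range(1, len(peptide))]

# Hypothetical peptide with a doubly charged precursor; each pair is one candidate transition.
peptide = "GASPVK"
transitions = [(precursor_mz(peptide, 2), y) for y in y_ions(peptide)]
```

Choosing which of these transitions to actually monitor depends on empirical fragment intensities and interferences, which this sketch does not model.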