
1.

Aberrant homeodomain-DNA cooperative dimerization underlies distinct developmental defects in two dominant CRX retinopathy models.

Zheng, Yiqiao; Stormo, Gary D; Chen, Shiming.

bioRxiv ; 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38559186

ABSTRACT

Paired-class homeodomain transcription factors (HD TFs) play essential roles in vertebrate development, and their mutations are linked to human diseases. One unique feature of the paired-class HD is cooperative dimerization on specific palindromic DNA sequences. Yet the functional significance of HD cooperative dimerization in animal development, and its dysregulation in disease, remain elusive. Using the retinal TF Cone-rod Homeobox (CRX) as a model, we studied how two blindness-causing mutations in the paired HD, p.E80A and p.K88N, alter CRX's cooperative dimerization and thereby cause gene misexpression and photoreceptor developmental deficits in a dominant manner. CRX^E80A maintains binding at monomeric WT CRX motifs but is deficient in cooperative binding at dimeric motifs. This cooperativity defect impairs the exponential increase of photoreceptor gene expression during terminal differentiation and produces immature, non-functional photoreceptors in Crx^E80A retinas. CRX^K88N, in contrast, is highly cooperative and localizes to ectopic genomic sites strongly enriched for dimeric HD motifs. Its altered biochemical properties disrupt CRX's ability to direct the dynamic chromatin remodeling that activates photoreceptor differentiation programs and silences progenitor programs during development. Our study provides in vitro and in vivo molecular evidence that paired-class HD cooperative dimerization regulates neuronal development and that dysregulated cooperative binding contributes to severe dominant blinding retinopathies.
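The abstract distinguishes monomeric CRX motifs from dimeric, palindromic HD motifs. As a minimal sketch of that distinction, the code below scans a sequence for both kinds of site. It assumes, as an illustration only, that the classic paired-class "P3" palindrome (TAAT, three-base spacer, ATTA) stands in for a dimeric motif and that a bare TAAT core on either strand stands in for a monomeric site; the study's actual motif models are richer than this.

```python
import re

# Assumption (not from the study): the paired-class "P3" palindrome TAAT-NNN-ATTA
# stands in for a "dimeric HD motif"; a bare TAAT core (on either strand) stands in
# for a monomeric site. Lookaheads allow overlapping matches to be reported.
P3_DIMER = re.compile(r"(?=(TAAT[ACGT]{3}ATTA))")
MONOMER_CORE = re.compile(r"(?=(TAAT|ATTA))")  # ATTA is TAAT on the other strand


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindromic(site: str, spacer: int = 3) -> bool:
    """True if the two half-sites flanking the spacer are reverse complements."""
    half = (len(site) - spacer) // 2
    return site[:half] == reverse_complement(site[-half:])


def scan(seq: str) -> dict:
    """Report P3-like dimeric sites and monomeric TAAT cores found in one strand."""
    seq = seq.upper()
    return {
        "dimeric": [(m.start(), m.group(1)) for m in P3_DIMER.finditer(seq)],
        "monomeric_cores": [m.start() for m in MONOMER_CORE.finditer(seq)],
    }


print(scan("ggTAATcgaATTAgg"))
```

Because the P3 pattern is its own reverse complement, scanning a single strand finds dimeric sites on both strands.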

2.

Finding motifs using DNA images derived from sparse representations.

Chu, Shane K; Stormo, Gary D.

Bioinformatics ; 39(6), 2023 06 01.

Article in English | MEDLINE | ID: mdl-37294804

ABSTRACT

MOTIVATION: Motifs play a crucial role in computational biology, as they provide valuable information about the binding specificity of proteins. However, conventional motif discovery methods typically rely on simple combinatorial or probabilistic approaches, which can be biased by heuristics such as substring masking for multiple-motif discovery. In recent years, deep neural networks have become increasingly popular for motif discovery because they can capture complex patterns in data. Nonetheless, despite the success of these networks in supervised learning tasks, inferring motifs from them remains challenging from both a modeling and a computational standpoint. RESULTS: We present a principled representation learning approach for motif discovery based on a hierarchical sparse representation. In addition to the short, enriched primary binding sites, our method effectively discovers gapped, long, or overlapping motifs, which we show are common in next-generation sequencing datasets. Our model is fully interpretable, fast, and capable of capturing motifs in a large number of DNA strings. A key concept that emerged from our approach, enumerating at the image level, effectively overcomes the k-mer paradigm, so that modest computational resources suffice to capture long and varied but conserved patterns as well as the primary binding sites. AVAILABILITY AND IMPLEMENTATION: Our method is available as a Julia package under the MIT license at https://github.com/kchu25/MOTIFs.jl, and the results on experimental data can be found at https://zenodo.org/record/7783033.

Subject(s)

Proteins , Software , Proteins/chemistry , Binding Sites , Neural Networks, Computer , DNA

3.

On the dependent recognition of some long zinc finger proteins.

Zuo, Zheng; Billings, Timothy; Walker, Michael; Petkov, Petko M; Fordyce, Polly M; Stormo, Gary D.

Nucleic Acids Res ; 51(11): 5364-5376, 2023 06 23.

Article in English | MEDLINE | ID: mdl-36951113

ABSTRACT

The human genome contains about 800 C2H2 zinc finger proteins (ZFPs), most of which are composed of long arrays of zinc fingers. The standard ZFP recognition model asserts that longer finger arrays should recognize longer DNA-binding sites. However, recent experimental efforts to identify in vivo ZFP binding sites contradict this assumption, with many ZFPs exhibiting short motifs. Here we use ZFY, CTCF, ZIM3, and ZNF343 as examples to address three closely related questions: What impedes current motif discovery methods? What are the functions of the seemingly unused fingers? And how can we improve motif discovery algorithms based on the biophysical properties of long ZFPs? Using ZFY, we employed a variety of methods and found evidence for 'dependent recognition', in which downstream fingers recognize some previously undiscovered motifs only in the presence of an intact core site. For CTCF, high-throughput measurements revealed that its upstream specificity profile depends on the strength of its core site. Moreover, the binding strength of the upstream site modulates CTCF's sensitivity to different epigenetic modifications within the core, providing new insight into how the previously identified intellectual disability-causing and cancer-related R567W mutant disrupts upstream recognition and deregulates epigenetic control by CTCF. Our results establish that, because of irregular motif structures, variable spacing, and dependent recognition between sub-motifs, the specificities of long ZFPs are significantly underestimated. We therefore developed an algorithm, ModeMap, to infer the motifs and recognition models of ZIM3 and ZNF343, which facilitates high-confidence identification of specific binding sites, including repeat-derived elements. With these revised concepts, techniques, and algorithms, we can discover the overlooked specificities and functions of the 'extra' fingers and thereby decipher their broader roles in human biology and disease.

Subject(s)

DNA , Transcription Factors , Zinc Fingers , Humans , Binding Sites , Transcription Factors/chemistry , Transcription Factors/metabolism , Algorithms , Nucleotide Motifs , Amino Acid Motifs , DNA/chemistry , DNA/metabolism

4.

Corrigendum to: Spec-seq: determining protein-DNA-binding specificity by sequencing.

Zuo, Zheng; Chang, Yiming Kenny; Stormo, Gary D.

Brief Funct Genomics ; 21(2): 142, 2022 Apr 11.

Article in English | MEDLINE | ID: mdl-34923581

5.

Directed Evolution of an Enhanced POU Reprogramming Factor for Cell Fate Engineering.

Tan, Daisylyn Senna; Chen, Yanpu; Gao, Ya; Bednarz, Anastasia; Wei, Yuanjie; Malik, Vikas; Ho, Derek Hoi-Hang; Weng, Mingxi; Ho, Sik Yin; Srivastava, Yogesh; Velychko, Sergiy; Yang, Xiaoxiao; Fan, Ligang; Kim, Johnny; Graumann, Johannes; Stormo, Gary D; Braun, Thomas; Yan, Jian; Schöler, Hans R; Jauch, Ralf.

Mol Biol Evol ; 38(7): 2854-2868, 2021 06 25.

Article in English | MEDLINE | ID: mdl-33720298

ABSTRACT

Transcription factor-driven cell fate engineering in pluripotency induction, transdifferentiation, and forward reprogramming requires efficiency, speed, and maturity for widespread adoption and clinical translation. Here, we used Oct4-, Sox2-, Klf4-, and c-Myc-driven pluripotency reprogramming to evaluate methods for enhancing and tailoring cell fate transitions through directed evolution, with iterative screening of pooled mutant libraries and phenotypic selection. We identified an artificially evolved and enhanced POU factor (ePOU) that substantially outperforms wild-type Oct4 in reprogramming speed and efficiency. In contrast to Oct4, ePOU can induce pluripotency not only with Sox2 alone but also in the absence of Sox2, in a three-factor ePOU/Klf4/c-Myc cocktail. Biochemical assays combined with genome-wide analyses showed that ePOU has acquired a new preference to dimerize on palindromic DNA elements. Yet Oct4's moderate capacity to function as a pioneer factor, its preference for binding octamer DNA, and its capability to dimerize with Sox2 and Sox17 proteins remain unchanged in ePOU. Compared with Oct4, ePOU is thermodynamically stabilized and persists longer in reprogramming cells. As a consequence, ePOU: 1) differentially activates several genes hitherto not implicated in reprogramming, 2) reveals an unappreciated role of thyrotropin-releasing hormone signaling, and 3) binds a distinct class of retrotransposons. Collectively, these features enable ePOU to accelerate the establishment of the pluripotency network. This demonstrates that phenotypic selection in mammalian cells of novel factor variants with desired properties is key to advancing cell fate conversions with artificially evolved biomolecules.

Subject(s)

Cellular Reprogramming Techniques , Directed Molecular Evolution , POU Domain Factors/genetics , Animals , Kruppel-Like Factor 4 , Mice , Protein Engineering

6.

Autoregulation of yeast ribosomal proteins discovered by efficient search for feedback regulation.

Roy, Basab; Granas, David; Bragg, Fredrick; Cher, Jonathan A Y; White, Michael A; Stormo, Gary D.

Commun Biol ; 3(1): 761, 2020 12 11.

Article in English | MEDLINE | ID: mdl-33311538

ABSTRACT

Post-transcriptional autoregulation of gene expression is common in bacteria, but far fewer examples are known in eukaryotes. We used the yeast collection of GFP-fused genes as a rapid screen for feedback regulation of ribosomal proteins: we overexpressed a non-regulatable version of a gene and observed the effect on expression of the GFP-fused version. We tested 95 ribosomal protein genes and found a wide continuum of effects, with 30% showing at least a 3-fold reduction in expression. Two genes, RPS22B and RPL1B, showed more than 10-fold repression. In both cases the cis-regulatory segment resides in the 5' UTR of the gene, as shown by placing that segment of the mRNA upstream of GFP alone and demonstrating that it is sufficient to repress GFP when the protein is overexpressed. Further analyses showed that the intron in the 5' UTR of RPS22B is required for regulation, presumably because the protein inhibits splicing that is necessary for translation. The 5' UTR of RPL1B contains a sequence and structure motif that is conserved in the binding sites of Rpl1 orthologs from bacteria to mammals, and mutations within the motif eliminate repression.

Subject(s)

Feedback, Physiological , Fungal Proteins/genetics , Gene Expression Regulation, Fungal , Ribosomal Proteins/genetics , Base Sequence , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Gene Order , Genes, Reporter , Homeostasis , Plasmids/genetics , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism

7.

Alternative Splicing During the Chlamydomonas reinhardtii Cell Cycle.

Pandey, Manishi; Stormo, Gary D; Dutcher, Susan K.

G3 (Bethesda) ; 10(10): 3797-3810, 2020 10 05.

Article in English | MEDLINE | ID: mdl-32817123

ABSTRACT

Genome-wide analysis of transcriptome data in Chlamydomonas reinhardtii shows periodic patterns in gene expression levels when cultures are grown under alternating light and dark cycles, so that G1 of the cell cycle occurs in the light phase and S/M/G0 occurs during the dark phase. However, alternative splicing, a process that enables greater protein diversity from a limited set of genes, remains largely unexplored by previous transcriptome-based studies in C. reinhardtii. In this study, we used existing longitudinal RNA-seq data obtained during the light-dark cycle to investigate changes in the alternative splicing pattern and found that 3277 genes (19.75% of 17,746 genes) undergo alternative splicing. These splicing events include Alternative 5' (Alt 5'), Alternative 3' (Alt 3') and Exon skipping (ES) events, collectively referred to as alternative site selection (ASS) events, as well as Intron retention (IR) events. By clustering analysis, we identified a subset of events (26 ASS events and 10 IR events) that show periodic changes in the splicing pattern during the cell cycle. In about two-thirds of these 36 genes, the splicing events either introduce a pre-termination codon (PTC) or introduce insertions or deletions into functional domains of the proteins, implicating splicing in altering gene function. These findings suggest that alternative splicing is also regulated during the Chlamydomonas cell cycle, although not as extensively as gene expression. The longitudinal changes in the alternative splicing pattern during the cell cycle captured by this study provide an important resource for investigating alternative splicing in genes of interest during the cell cycle in Chlamydomonas reinhardtii and other eukaryotes.

Subject(s)

Alternative Splicing , Chlamydomonas reinhardtii , Cell Cycle/genetics , Chlamydomonas reinhardtii/genetics , Exons , Introns

8.

Redefining fundamental concepts of transcription initiation in bacteria.

Mejía-Almonte, Citlalli; Busby, Stephen J W; Wade, Joseph T; van Helden, Jacques; Arkin, Adam P; Stormo, Gary D; Eilbeck, Karen; Palsson, Bernhard O; Galagan, James E; Collado-Vides, Julio.

Nat Rev Genet ; 21(11): 699-714, 2020 11.

Article in English | MEDLINE | ID: mdl-32665585

ABSTRACT

Despite enormous progress in understanding the fundamentals of bacterial gene regulation, our knowledge remains limited when compared with the number of bacterial genomes and regulatory systems to be discovered. Derived from a small number of initial studies, classic definitions for concepts of gene regulation have evolved as the number of characterized promoters has increased. Together with discoveries made using new technologies, this knowledge has led to revised generalizations and principles. In this Expert Recommendation, we suggest precise, updated definitions that support a logical, consistent conceptual framework of bacterial gene regulation, focusing on transcription initiation. The resulting concepts can be formalized by ontologies for computational modelling, laying the foundation for improved bioinformatics tools, knowledge-based resources and scientific communication. Thus, this work will help researchers construct better predictive models, with different formalisms, that will be useful in engineering, synthetic biology, microbiology and genetics.

Subject(s)

Bacteria/genetics , Gene Expression Regulation, Bacterial , Transcription Initiation, Genetic , Operon , Promoter Regions, Genetic , Regulon , Transcription Factors/physiology

9.

Comparison of discriminative motif optimization using matrix and DNA shape-based models.

Ruan, Shuxiang; Stormo, Gary D.

BMC Bioinformatics ; 19(1): 86, 2018 03 06.

Article in English | MEDLINE | ID: mdl-29510689

ABSTRACT

BACKGROUND: Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site's activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA "words" of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare them with optimized matrix models. RESULTS: We describe a program, "Discriminative Additive Model Optimization" (DAMO), that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare it with a recent study in which structural parameters, serving as features in a gradient boosting classifier algorithm, were shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM. CONCLUSION: To appropriately compare different models for TF binding sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
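DAMO searches for the additive PWM that maximizes AUROC on positive and negative sequences. The sketch below does not reproduce that search; it only shows the objective being maximized, under stated assumptions: each sequence is scored by its best-scoring window on either strand, and AUROC is computed as the probability that a random positive outscores a random negative (ties count one half). The toy matrix is hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) indicator matrix in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def best_site_score(pwm: np.ndarray, seq: str) -> float:
    """Assumed scoring rule: max additive PWM score over all windows on both strands."""
    width = pwm.shape[0]
    rc = seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    scores = []
    for strand in (seq, rc):
        x = one_hot(strand)
        for i in range(len(strand) - width + 1):
            scores.append(float((x[i:i + width] * pwm).sum()))
    return max(scores)


def auroc(pos_scores, neg_scores) -> float:
    """Mann-Whitney form of AUROC: P(positive > negative) + 0.5 * P(tie)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# Hypothetical toy matrix: +1 for each base matching TAAT, 0 otherwise.
pwm = one_hot("TAAT")
pos = [best_site_score(pwm, s) for s in ["GGTAATGG", "CCATTACC"]]
neg = [best_site_score(pwm, s) for s in ["GGGGCCCC", "GTACGTAC"]]
print(auroc(pos, neg))
```

An optimizer in the spirit of DAMO would adjust the entries of `pwm` to raise this value; because AUROC depends only on ranks, the objective is unchanged by adding a constant to every score.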

Subject(s)

Algorithms , DNA/chemistry , Models, Molecular , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices , Area Under Curve , Binding Sites , Databases, Nucleic Acid , Humans , Protein Binding

10.

Quantitative profiling of BATF family proteins/JUNB/IRF hetero-trimers using Spec-seq.

Chang, Yiming K; Zuo, Zheng; Stormo, Gary D.

BMC Mol Biol ; 19(1): 5, 2018 03 27.

Article in English | MEDLINE | ID: mdl-29587652

ABSTRACT

BACKGROUND: BATF family transcription factors (BATF, BATF2 and BATF3) form hetero-trimers with JUNB and either IRF4 or IRF8 to regulate cell fate in T cells and dendritic cells in vivo. While each combination of the hetero-trimer has a distinct role, some degree of cross-compensation has been observed. The basis for the differential actions of IRF4 and IRF8 with BATF factors and JUNB is still unknown. We propose that the differences in function between these hetero-trimers may be caused by differences in their DNA binding preferences. While all three BATF family transcription factors have similar binding preferences when binding as a hetero-dimer with JUNB, the cooperative binding of IRF4 or IRF8 to the hetero-dimer/DNA complex could change those preferences. We used Spec-seq, which allows efficient and accurate determination of relative affinity for a large collection of sequences in parallel, to find differences between the cooperative DNA binding of IRF4, IRF8 and BATF family members. RESULTS: We found that without IRF binding, all three hetero-dimer pairs exhibit nearly the same binding preferences for both expected wild-type binding sites, TRE (TGA(C/G)TCA) and CRE (TGACGTCA). IRF4 and IRF8 show very similar DNA binding preferences when binding with any of the three hetero-dimers. No major change in half-site binding preferences was found between different hetero-trimers. IRF proteins bind with substantially lower affinity either when a single-nucleotide spacer separates the IRF and BATF binding sites or when they bind in an alternative mode in the opposite orientation. In addition, the preference for the CRE binding site was reduced when either IRF was bound, in all BATF-JUNB combinations. CONCLUSIONS: The specificities of BATF, BATF2 and BATF3 are all very similar, as are their interactions with IRF4 and IRF8. IRF proteins binding adjacent to BATF sites increase affinity substantially compared with sequences that have spacings between the sites, indicating cooperative binding through protein-protein interactions. The preference for the type of BATF binding site, TRE or CRE, is also altered when IRF proteins bind. These in vitro preferences aid in the understanding of in vivo binding activities.

Subject(s)

Basic-Leucine Zipper Transcription Factors/metabolism , Interferon Regulatory Factors/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Animals , Basic-Leucine Zipper Transcription Factors/chemistry , Basic-Leucine Zipper Transcription Factors/genetics , Binding Sites , Humans , Interferon Regulatory Factors/chemistry , Interferon Regulatory Factors/metabolism , Mice , Protein Multimerization , Repressor Proteins/chemistry , Repressor Proteins/genetics , Repressor Proteins/metabolism , Transcription Factors/chemistry , Transcription Factors/metabolism , Tumor Suppressor Proteins/chemistry , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism

11.

Measuring quantitative effects of methylation on transcription factor-DNA binding affinity.

Zuo, Zheng; Roy, Basab; Chang, Yiming Kenny; Granas, David; Stormo, Gary D.

Sci Adv ; 3(11): eaao1799, 2017 11.

Article in English | MEDLINE | ID: mdl-29159284

ABSTRACT

Methylation of CpG (cytosine-phosphate-guanine) dinucleotides is a common epigenetic mark that influences gene expression. The effects of methylation on transcription factor (TF) binding are unknown for most TFs and, even when known, such knowledge is often only qualitative. In reality, methylation sensitivity is a quantitative effect, just as changes to the DNA sequence have quantitative effects on TF binding affinity. We describe Methyl-Spec-seq, an easy-to-use method that measures the effects of CpG methylation (mCpG) on binding affinity for hundreds to thousands of variants in parallel, allowing one to quantitatively assess the effects at every position in a binding site. We demonstrate its use on several important DNA binding proteins. We calibrate the accuracy of Methyl-Spec-seq using a novel two-color competitive fluorescence anisotropy method that can accurately determine the relative affinities of two sequences in solution. We also present software that extends standard methods for representing, visualizing, and searching for matches to binding site motifs to include the effects of methylation. These tools facilitate the study of how epigenetic marks on DNA affect gene regulation.
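Spec-seq-style measurements compare read counts in the protein-bound and unbound fractions of a single binding reaction. A minimal sketch of the ratio principle, assuming 1:1 binding at equilibrium so that B_i/U_i = K_i[P_free] for every sequence i: the unknown free-protein concentration cancels when each sequence is compared with a reference. The read counts below are hypothetical, and this is not the Methyl-Spec-seq analysis software itself.

```python
import math


def relative_affinity(bound: dict, unbound: dict, ref: str) -> dict:
    """K_i / K_ref for every sequence, from reads in the bound and unbound fractions.

    Assumes 1:1 binding at equilibrium within one reaction, so that
    B_i / U_i = K_i * [P_free]; the unknown [P_free] cancels in the ratio.
    """
    ref_ratio = bound[ref] / unbound[ref]
    return {s: (bound[s] / unbound[s]) / ref_ratio for s in bound}


def ddg_kT(rel_k: float) -> float:
    """Free-energy change relative to the reference in units of kT (positive = weaker)."""
    return -math.log(rel_k)


# Hypothetical read counts for one site, unmethylated vs. CpG-methylated.
bound = {"site": 800, "site_mCpG": 200}
unbound = {"site": 400, "site_mCpG": 400}
rel = relative_affinity(bound, unbound, ref="site")
print(rel["site_mCpG"], ddg_kT(rel["site_mCpG"]))
```

With these made-up counts, methylation lowers the relative affinity to 0.25, i.e. a penalty of ln 4, or about 1.39 kT; the same calculation applied to every position in a variant library gives a position-by-position methylation sensitivity profile.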

Subject(s)

DNA Methylation , DNA/metabolism , Transcription Factors/metabolism , Animals , CCCTC-Binding Factor/chemistry , CCCTC-Binding Factor/metabolism , CpG Islands , DNA/chemistry , Fluorescence Polarization , Homeodomain Proteins/chemistry , Homeodomain Proteins/metabolism , Mice , Patched-1 Receptor/chemistry , Patched-1 Receptor/metabolism , Protein Binding , Recombinant Proteins/biosynthesis , Recombinant Proteins/chemistry , Recombinant Proteins/isolation & purification , Repressor Proteins/chemistry , Repressor Proteins/genetics , Repressor Proteins/metabolism , Transcription Factors/chemistry

12.

Coop-Seq Analysis Demonstrates that Sox2 Evokes Latent Specificities in the DNA Recognition by Pax6.

Hu, Caizhen; Malik, Vikas; Chang, Yiming Kenny; Veerapandian, Veeramohan; Srivastava, Yogesh; Huang, Yong-Heng; Hou, Linlin; Cojocaru, Vlad; Stormo, Gary D; Jauch, Ralf.

J Mol Biol ; 429(23): 3626-3634, 2017 11 24.

Article in English | MEDLINE | ID: mdl-29050852

ABSTRACT

Sox2 and Pax6 co-regulate genes in neural lineages and the lens by forming a ternary complex, likely facilitated allosterically through DNA. We used the quantitative and scalable cooperativity-by-sequencing (Coop-seq) approach to interrogate Sox2/Pax6 dimerization on a DNA library in which five positions of the Pax6 half-site were randomized, yielding 1024 cooperativity factors. Consensus positions normally required for high-affinity DNA binding by Pax6 need to be mutated for effective dimerization with Sox2. Of the five randomized bases, a 5' thymidine is present in most of the top-ranking elements. However, this thymidine maps to a region outside the Pax half-site and is not expected to interact directly with Pax6 in known binding modes, suggesting structural reconfigurations. Re-analysis of ChIP-seq data identified several genomic regions where the cooperativity-promoting sequence pattern is co-bound by Sox2 and Pax6. A highly conserved Sox2/Pax6-bound site near the Sprouty2 locus was verified to promote cooperative dimerization, designating Sprouty2 as a potential target reliant on Sox2/Pax6 cooperativity in several neural cell types. Collectively, the functional interplay of Sox2 and Pax6 demands the relaxation of high-affinity binding sites and is enabled by alternative DNA sequences. We conclude that this binding mode evolved to ensure that a subset of target genes is regulated only in the presence of suitable partner factors.
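The figure of 1024 cooperativity factors follows from randomizing five positions: each position can take four bases, so the library contains 4^5 = 1024 distinct variants. A minimal sketch of enumerating such a library is below; the template is a hypothetical placeholder with five randomized positions (N), not the published library design.

```python
from itertools import product


def expand_randomized(template: str, wildcard: str = "N") -> list:
    """Return every concrete sequence obtained by filling each wildcard with A/C/G/T."""
    slots = [i for i, c in enumerate(template) if c == wildcard]
    library = []
    for combo in product("ACGT", repeat=len(slots)):
        seq = list(template)
        for i, base in zip(slots, combo):
            seq[i] = base
        library.append("".join(seq))
    return library


# Hypothetical placeholder template with five randomized positions.
lib = expand_randomized("GGNNNNNGG")
print(len(lib))
```

Each of these variants receives its own cooperativity factor in a Coop-seq-style experiment, which is what allows sequence patterns such as the 5' thymidine to be ranked across the full library.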

Subject(s)

DNA/metabolism , PAX6 Transcription Factor/metabolism , SOXB1 Transcription Factors/metabolism , Sequence Analysis, DNA/methods , DNA/chemistry , DNA/genetics , Humans , Models, Molecular , PAX6 Transcription Factor/chemistry , PAX6 Transcription Factor/genetics , Protein Binding , Protein Conformation , Protein Multimerization , SOXB1 Transcription Factors/chemistry , SOXB1 Transcription Factors/genetics

13.

Inherent limitations of probabilistic models for protein-DNA binding specificity.

Ruan, Shuxiang; Stormo, Gary D.

PLoS Comput Biol ; 13(7): e1005638, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28686588

ABSTRACT

The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site, and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest-affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions; rather, they are caused by the non-linear relationship between binding affinity and binding probability and by the fact that independent normalization at each position skews the site probabilities. Generally, probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy, which should be used whenever possible.
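The non-linear relationship between affinity and binding probability can be illustrated with a single-site equilibrium model, occupancy = K[P]/(1 + K[P]). This minimal sketch assumes one site per sequence with no competition or nonspecific binding, and uses hypothetical association constants for two sites that differ 10-fold in affinity. It illustrates only the saturation effect described above, not the separate normalization issue.

```python
def occupancy(K: float, P: float) -> float:
    """Equilibrium fraction of a single site bound: K[P] / (1 + K[P])."""
    x = K * P
    return x / (1.0 + x)


def occupancy_ratio(K_strong: float, K_weak: float, P: float) -> float:
    """How many times more often the stronger site is bound than the weaker one."""
    return occupancy(K_strong, P) / occupancy(K_weak, P)


# Hypothetical association constants (M^-1) for two sites 10-fold apart in affinity.
K_BEST, K_VARIANT = 1e9, 1e8
for P in (1e-12, 1e-9, 1e-6):  # free protein concentration (M)
    print(f"[P]={P:.0e}  occupancy ratio={occupancy_ratio(K_BEST, K_VARIANT, P):.2f}")
```

At low protein concentration the occupancy ratio tracks the 10-fold affinity ratio, but it falls to about 5.5 when the best site is half-saturated and approaches 1 at high concentration, so a model that treats binding probability as proportional to affinity overstates how much more often the highest-affinity sites are bound.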

Subject(s)

DNA/chemistry , DNA/metabolism , Models, Statistical , Transcription Factors/chemistry , Transcription Factors/metabolism , Computational Biology , Computer Simulation , Software

14.

Quantitative specificity of STAT1 and several variants.

Roy, Basab; Zuo, Zheng; Stormo, Gary D.

Nucleic Acids Res ; 45(14): 8199-8207, 2017 Aug 21.

Article in English | MEDLINE | ID: mdl-28510715

ABSTRACT

The quantitative specificity of the STAT1 transcription factor was determined by measuring the relative affinity for hundreds of variants of the consensus binding site, including variations in the length of the site. The known consensus sequence is observed to have the highest affinity, with all variants decreasing binding affinity considerably. There is very little loss of binding affinity when the CpG within the consensus binding site is methylated. Additionally, the specificity of mutant proteins, with variants of amino acids that interact with the DNA, was determined, and nearly all of them are observed to lose specificity across the entire binding site. The change of Asn at position 460 to His, which corresponds to the natural amino acid at the homologous position in STAT6, does not change the specificity, nor does it change the length preference to match that of STAT6. These results provide the first quantitative analysis of changes in binding affinity for the STAT1 protein, and several variants of it, to hundreds of different binding sites, including different spacer lengths, and of the effect of CpG methylation.

Subject(s)

CpG Islands/genetics , DNA/genetics , Genetic Variation , STAT1 Transcription Factor/genetics , Algorithms , Amino Acid Sequence , Base Sequence , Binding Sites/genetics , Binding, Competitive , DNA/metabolism , DNA Methylation , Electrophoresis, Polyacrylamide Gel , Kinetics , Mutation, Missense , Protein Binding , STAT1 Transcription Factor/metabolism , STAT6 Transcription Factor/genetics , STAT6 Transcription Factor/metabolism , Sequence Homology, Amino Acid

15.

BEESEM: estimation of binding energy models using HT-SELEX data.

Ruan, Shuxiang; Swamidass, S Joshua; Stormo, Gary D.

Bioinformatics ; 33(15): 2288-2295, 2017 Aug 01.

Article in English | MEDLINE | ID: mdl-28379348

ABSTRACT

MOTIVATION: Characterizing the binding specificities of transcription factors (TFs) is crucial to the study of gene expression regulation. Recently developed high-throughput experimental methods, including protein binding microarrays (PBM) and high-throughput SELEX (HT-SELEX), have enabled rapid measurements of the specificities of hundreds of TFs. However, few studies have developed efficient algorithms for estimating binding motifs from HT-SELEX data. Also, the simple method of constructing a position weight matrix (PWM) by comparing the frequency of the preferred sequence with that of its single-nucleotide variants risks generating motifs with higher information content than the true binding specificity. RESULTS: We developed an algorithm called BEESEM that builds on a comprehensive biophysical model of protein-DNA interactions, which is trained using the expectation maximization method. BEESEM is capable of selecting the optimal motif length and calculating the confidence intervals of estimated parameters. By comparing BEESEM with the published motifs estimated using the same HT-SELEX data, we demonstrate that BEESEM provides significant improvements. We also evaluate several motif discovery algorithms on independent PBM and ChIP-seq data. BEESEM provides significantly better fits to in vitro data, but its performance is similar to that of some other methods on in vivo data under the criterion of the area under the receiver operating characteristic curve (AUROC). This highlights the limitations of the purely rank-based AUROC criterion. Using quantitative binding data to assess models, however, demonstrates that BEESEM improves on prior models. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://stormo.wustl.edu/resources.html . CONTACT: stormo@wustl.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromatin Immunoprecipitation/methods , DNA/metabolism , Protein Array Analysis/methods , Software , Thermodynamics , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , DNA/chemistry , Humans , Mice , Position-Specific Scoring Matrices , Protein Binding , Sequence Analysis, DNA/methods , Transcription Factors/chemistry

16.

SMARCAD1 Contributes to the Regulation of Naive Pluripotency by Interacting with Histone Citrullination.

Xiao, Shu; Lu, Jia; Sridhar, Bharat; Cao, Xiaoyi; Yu, Pengfei; Zhao, Tianyi; Chen, Chieh-Chun; McDee, Darina; Sloofman, Laura; Wang, Yang; Rivas-Astroza, Marcelo; Telugu, Bhanu Prakash V L; Levasseur, Dana; Zhang, Kang; Liang, Han; Zhao, Jing Crystal; Tanaka, Tetsuya S; Stormo, Gary; Zhong, Sheng.

Cell Rep ; 18(13): 3117-3128, 2017 03 28.

Article in English | MEDLINE | ID: mdl-28355564

ABSTRACT

Histone citrullination regulates diverse cellular processes. Here, we report that SMARCAD1 preferentially associates with H3 arginine 26 citrullination (H3R26Cit) peptides present on arrays composed of 384 histone peptides harboring distinct post-translational modifications. Among ten histone modifications assayed by ChIP-seq, H3R26Cit exhibited the most extensive genome-wide co-localization with SMARCAD1 binding. Increased Smarcad1 expression correlated with naive pluripotency in pre-implantation embryos. In the presence of LIF, Smarcad1 knockdown (KD) embryonic stem cells lost naive state phenotypes but remained pluripotent, as indicated by morphology, gene expression, histone modifications, alkaline phosphatase activity, energy metabolism, embryoid bodies, teratoma, and chimeras. The majority of H3R26Cit ChIP-seq peaks occupied by SMARCAD1 were associated with increased levels of H3K9me3 in Smarcad1 KD cells. Inhibition of H3Cit induced H3K9me3 at the overlapping regions of H3R26Cit peaks and SMARCAD1 peaks. These data suggest a model in which SMARCAD1 regulates naive pluripotency by interacting with H3R26Cit and suppressing heterochromatin formation.

Subject(s)

Citrullination , Histones/metabolism , Nuclear Proteins/metabolism , Pluripotent Stem Cells/metabolism , Animals , Base Sequence , Binding Sites , Cells, Cultured , Chromatin/metabolism , DNA Helicases , Embryo, Mammalian/metabolism , Embryonic Development , Embryonic Stem Cells/metabolism , Epigenesis, Genetic , Female , Gene Knockdown Techniques , Genome , Lysine/metabolism , Male , Methylation , Mice , Phenotype , Protein Binding , Protein Processing, Post-Translational , Transcriptome/genetics

17.

Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq.

Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen; Joyce, Adam; Yang, Xiaoxiao; Zuo, Zheng; Havranek, James J; Stormo, Gary D; Jauch, Ralf.

Nucleic Acids Res ; 45(2): 832-845, 2017 01 25.

Article in English | MEDLINE | ID: mdl-27915232

ABSTRACT

Cooperative binding of transcription factors is known to be important in the regulation of gene expression programs conferring cellular identities. However, current methods to measure cooperativity parameters have been laborious and therefore limited to studying only a few sequence variants at a time. We developed Coop-seq (cooperativity by sequencing) that is capable of efficiently and accurately determining the cooperativity parameters for hundreds of different DNA sequences in a single experiment. We apply Coop-seq to 12 dimer pairs from the Sox and POU families of transcription factors using 324 unique sequences with changed half-site orientation, altered spacing and discrete randomization within the binding elements. The study reveals specific dimerization profiles of different Sox factors with Oct4. By contrast, Oct4 and the three neural class III POU factors Brn2, Brn4 and Oct6 assemble with Sox2 in a surprisingly indistinguishable manner. Two novel half-site configurations can support functional Sox/Oct dimerization in addition to known composite motifs. Moreover, Coop-seq uncovers a nucleotide switch within the POU half-site when spacing is altered, which is mirrored in genomic loci bound by Sox2/Oct4 complexes.

Subject(s)

POU Domain Factors/metabolism , SOX Transcription Factors/metabolism , Animals , DNA/chemistry , DNA/metabolism , Mice , Models, Molecular , Octamer Transcription Factor-3/chemistry , Octamer Transcription Factor-3/metabolism , POU Domain Factors/chemistry , Protein Binding , Protein Conformation , Protein Multimerization , SOX Transcription Factors/chemistry , SOXB1 Transcription Factors/chemistry , SOXB1 Transcription Factors/metabolism

18.

DNA Structure Helps Predict Protein Binding.

Stormo, Gary D; Roy, Basab.

Cell Syst ; 3(3): 216-218, 2016 Sep 28.

Article in English | MEDLINE | ID: mdl-27684185

ABSTRACT

Incorporating information about DNA structure can increase the reliability of predictions of transcription factor binding sites.

Subject(s)

DNA/chemistry , Binding Sites , Protein Binding , Reproducibility of Results , Transcription Factors

19.

Combinatorial Cis-regulation in Saccharomyces Species.

Spivak, Aaron T; Stormo, Gary D.

G3 (Bethesda) ; 6(3): 653-67, 2016 Jan 15.

Article in English | MEDLINE | ID: mdl-26772747

ABSTRACT

Transcriptional control of gene expression requires interactions between the cis-regulatory elements (CREs) controlling gene promoters. We developed a sensitive computational method to identify CRE combinations with conserved spacing that does not require genome alignments. When applied to seven sensu stricto and sensu lato Saccharomyces species, 80% of the predicted interactions displayed some evidence of combinatorial transcriptional behavior in several existing datasets including: (1) chromatin immunoprecipitation data for colocalization of transcription factors, (2) gene expression data for coexpression of predicted regulatory targets, and (3) gene ontology databases for common pathway membership of predicted regulatory targets. We tested several predicted CRE interactions with chromatin immunoprecipitation experiments in a wild-type strain and strains in which a predicted cofactor was deleted. Our experiments confirmed that transcription factor (TF) occupancy at the promoters of the CRE combination target genes depends on the predicted cofactor while occupancy of other promoters is independent of the predicted cofactor. Our method has the additional advantage of identifying regulatory differences between species. By analyzing the S. cerevisiae and S. bayanus genomes, we identified differences in combinatorial cis-regulation between the species and showed that the predicted changes in gene regulation explain several of the species-specific differences seen in gene expression datasets. In some instances, the same CRE combinations appear to regulate genes involved in distinct biological processes in the two different species. The results of this research demonstrate that (1) combinatorial cis-regulation can be inferred by multi-genome analysis and (2) combinatorial cis-regulation can explain differences in gene expression between species.

Subject(s)

Gene Expression Regulation, Fungal , Regulatory Sequences, Nucleic Acid , Saccharomyces/genetics , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression Profiling , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Saccharomyces/metabolism , Signal Transduction , Transcription Factors/metabolism , Transcription, Genetic , Transcriptome

20.

DNA Motif Databases and Their Uses.

Stormo, Gary D.

Curr Protoc Bioinformatics ; 51: 2.15.1-2.15.6, 2015 Sep 03.

Article in English | MEDLINE | ID: mdl-26334922

ABSTRACT

Transcription factors (TFs) recognize and bind to specific DNA sequences. The specificity of a TF is usually represented as a position weight matrix (PWM). Several databases of DNA motifs exist and are used in biological research to address important biological questions. This overview describes PWMs and some of the most commonly used motif databases, as well as a few of their common applications.

Subject(s)

DNA/genetics , Data Mining/methods , Databases, Nucleic Acid , Nucleotide Motifs/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Binding Sites , DNA/chemistry , Databases, Protein , Molecular Sequence Data , Protein Binding , Transcription Factors/chemistry
