Results 1 - 20 of 149
1.
bioRxiv ; 2024 May 31.
Article in English | MEDLINE | ID: mdl-38853896

ABSTRACT

Despite extensive characterization of mammalian Pol II transcription, the DNA sequence determinants of transcription initiation at a third of human promoters and most enhancers remain poorly understood. Hence, we trained and interpreted a neural network called ProCapNet that accurately models base-resolution initiation profiles from PRO-cap experiments using local DNA sequence. ProCapNet learns sequence motifs with distinct effects on initiation rates and TSS positioning and uncovers context-specific cryptic initiator elements intertwined within other TF motifs. ProCapNet annotates predictive motifs in nearly all actively transcribed regulatory elements across multiple cell lines, revealing a shared cis-regulatory logic across promoters and enhancers mediated by a highly epistatic sequence syntax of cooperative and competitive motif interactions. ProCapNet models of RAMPAGE profiles measuring steady-state RNA abundance at TSSs distill initiation signals on par with models trained directly on PRO-cap profiles. ProCapNet learns a largely cell-type-agnostic cis-regulatory code of initiation complementing sequence drivers of cell-type-specific chromatin state critical for accurate prediction of cell-type-specific transcription initiation.

2.
bioRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38853998

ABSTRACT

Deep learning approaches have made significant advances in predicting cell type-specific chromatin patterns from the identity and arrangement of transcription factor (TF) binding motifs. However, most models have been applied in unperturbed contexts, precluding a predictive understanding of how chromatin state responds to TF perturbation. Here, we used transfer learning to train and interpret deep learning models that use DNA sequence to predict, with accuracy approaching experimental reproducibility, how the concentration of two dosage-sensitive TFs (TWIST1, SOX9) affects regulatory element (RE) chromatin accessibility in facial progenitor cells. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and strongly predict unperturbed accessibility. In contrast, motifs with low-affinity or homotypic binding distributed throughout REs lead to sensitive responses with minimal contributions to unperturbed accessibility. Both buffering and sensitizing features show signatures of purifying selection. We validated these predictive sequence features using reporter assays and showed that a biophysical model of TF-nucleosome competition can explain the sensitizing effect of low-affinity motifs. Our approach of combining transfer learning and quantitative measurements of the chromatin response to TF dosage therefore represents a powerful method to reveal additional layers of the cis-regulatory code.

3.
Sci Adv ; 10(22): eadk3121, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38809988

ABSTRACT

Regular, long-term aspirin use may act synergistically with genetic variants, particularly those in mechanistically relevant pathways, to confer a protective effect on colorectal cancer (CRC) risk. We leveraged pooled data from 52 clinical trial, cohort, and case-control studies that included 30,806 CRC cases and 41,861 controls of European ancestry to conduct a genome-wide interaction scan between regular aspirin/nonsteroidal anti-inflammatory drug (NSAID) use and imputed genetic variants. After adjusting for multiple comparisons, we identified statistically significant interactions between regular aspirin/NSAID use and variants in 6q24.1 (top hit rs72833769), which has evidence of influencing expression of TBC1D7 (a subunit of the TSC1-TSC2 complex, a key regulator of MTOR activity), and variants in 5p13.1 (top hit rs350047), which is associated with expression of PTGER4 (which encodes a cell surface receptor directly involved in the mode of action of aspirin). Genetic variants with functional impact may modulate the chemopreventive effect of regular aspirin use, and our study identifies putative previously unidentified targets for additional mechanistic interrogation.


Subject(s)
Anti-Inflammatory Agents, Non-Steroidal , Colorectal Neoplasms , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Colorectal Neoplasms/genetics , Colorectal Neoplasms/drug therapy , Anti-Inflammatory Agents, Non-Steroidal/pharmacology , Aspirin/pharmacology , Receptors, Prostaglandin E, EP4 Subtype/genetics , Receptors, Prostaglandin E, EP4 Subtype/metabolism , Male , Genetic Predisposition to Disease , Female , Case-Control Studies , Middle Aged , Genetic Loci , Aged
4.
EBioMedicine ; 104: 105146, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38749303

ABSTRACT

BACKGROUND: Consumption of fibre, fruits and vegetables has been linked with lower colorectal cancer (CRC) risk. A genome-wide gene-environment (G × E) analysis was performed to test whether genetic variants modify these associations. METHODS: A pooled sample of 45 studies including up to 69,734 participants (cases: 29,896; controls: 39,838) of European ancestry was included. To identify G × E interactions, we used the traditional 1-degree-of-freedom (DF) G × E test and, to improve power, a 2-step procedure and a 3-DF joint test that investigates the association between a genetic variant and dietary exposure, CRC risk and G × E interaction simultaneously. FINDINGS: The 3-DF joint test revealed two significant loci with p-value <5 × 10⁻⁸. Rs4730274 close to the SLC26A3 gene showed an association with fibre (p-value: 2.4 × 10⁻³) and G × fibre interaction with CRC (OR per quartile of fibre increase = 0.87, 0.80, and 0.75 for CC, TC, and TT genotype, respectively; G × E p-value: 1.8 × 10⁻⁷). Rs1620977 in the NEGR1 gene showed an association with fruit intake (p-value: 1.0 × 10⁻⁸) and G × fruit interaction with CRC (OR per quartile of fruit increase = 0.75, 0.65, and 0.56 for AA, AG, and GG genotype, respectively; G × E p-value: 0.029). INTERPRETATION: We identified 2 loci associated with fibre and fruit intake that also modify the association of these dietary factors with CRC risk. Potential mechanisms include chronic inflammatory intestinal disorders, and gut function. However, further studies are needed for mechanistic validation and replication of findings. FUNDING: National Institutes of Health, National Cancer Institute. Full funding details for the individual consortia are provided in acknowledgments.

5.
Sci Adv ; 10(21): eadj4452, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781344

ABSTRACT

Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.


Subject(s)
Brain , Epigenesis, Genetic , Regulatory Sequences, Nucleic Acid , Humans , Brain/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Animals , Evolution, Molecular , Mental Disorders/genetics , Regulatory Elements, Transcriptional/genetics , Neurons/metabolism , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism
6.
bioRxiv ; 2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38645064

ABSTRACT

Over the past 15 years, a variety of next-generation sequencing assays have been developed for measuring the 3D conformation of DNA in the nucleus. Each of these assays gives, for a particular cell or tissue type, a distinct picture of 3D chromatin architecture. Accordingly, making sense of the relationship between genome structure and function requires teasing apart two closely related questions: how does chromatin 3D structure change from one cell type to the next, and how do different measurements of that structure differ from one another, even when the two assays are carried out in the same cell type? In this work, we assemble a collection of chromatin 3D datasets, each represented as a 2D contact map, spanning multiple assay types and cell types. We then build a machine learning model that predicts missing contact maps in this collection. We use the model to systematically explore how genome 3D architecture changes, at the level of compartments, domains, and loops, between cell types and between assay types.

7.
Br J Cancer ; 130(10): 1687-1696, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38561434

ABSTRACT

BACKGROUND: Menopausal hormone therapy (MHT), a common treatment to relieve symptoms of menopause, is associated with a lower risk of colorectal cancer (CRC). To inform CRC risk prediction and MHT risk-benefit assessment, we aimed to evaluate the joint association of a polygenic risk score (PRS) for CRC and MHT on CRC risk. METHODS: We used data from 28,486 postmenopausal women (11,519 cases and 16,967 controls) of European descent. A PRS based on 141 CRC-associated genetic variants was modeled as a categorical variable in quartiles. Multiplicative interaction between PRS and MHT use was evaluated using logistic regression. Additive interaction was measured using the relative excess risk due to interaction (RERI). 30-year cumulative risks of CRC for 50-year-old women according to MHT use and PRS were calculated. RESULTS: The reduction in odds ratios by MHT use was larger in women within the highest quartile of PRS compared to that in women within the lowest quartile of PRS (p-value = 2.7 × 10⁻⁸). At the highest quartile of PRS, the 30-year CRC risk was statistically significantly lower for women taking any MHT than for women not taking any MHT, 3.7% (3.3%-4.0%) vs 6.1% (5.7%-6.5%) (difference 2.4%, P-value = 1.83 × 10⁻¹⁴); these differences were also statistically significant but smaller in magnitude in the lowest PRS quartile, 1.6% (1.4%-1.8%) vs 2.2% (1.9%-2.4%) (difference 0.6%, P-value = 1.01 × 10⁻³), indicating 4 times greater reduction in absolute risk associated with any MHT use in the highest compared to the lowest quartile of genetic CRC risk. CONCLUSIONS: MHT use has a greater impact on the reduction of CRC risk for women at higher genetic risk. These findings have implications for the development of risk prediction models for CRC and potentially for the consideration of genetic information in the risk-benefit assessment of MHT use.
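
The arithmetic behind the "4 times greater reduction" statement, and the standard definition of RERI, can be sketched as follows. This is an illustrative check only, not the authors' analysis code: the function names are hypothetical, the risk percentages are the point estimates quoted in the abstract, and the RERI formula is the usual textbook form (RR for both exposures, minus RR for each exposure alone, plus 1), which the abstract names but does not spell out.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.

def absolute_risk_reduction(risk_without_pct: float, risk_with_pct: float) -> float:
    """Difference in cumulative risk (percentage points) between non-users and users."""
    return risk_without_pct - risk_with_pct


def reri(rr_both: float, rr_a_only: float, rr_b_only: float) -> float:
    """Relative excess risk due to interaction (textbook definition).

    A value of 0 indicates no interaction on the additive scale.
    """
    return rr_both - rr_a_only - rr_b_only + 1.0


# 30-year CRC risk point estimates quoted in the abstract (percent).
arr_highest_quartile = absolute_risk_reduction(6.1, 3.7)  # about 2.4 points
arr_lowest_quartile = absolute_risk_reduction(2.2, 1.6)   # about 0.6 points
ratio = arr_highest_quartile / arr_lowest_quartile        # about 4

print(f"highest PRS quartile: {arr_highest_quartile:.1f} percentage points")
print(f"lowest PRS quartile:  {arr_lowest_quartile:.1f} percentage points")
print(f"ratio of reductions:  {ratio:.1f}")
```

The inputs to `reri` would be relative risks (or odds ratios used as approximations) from the fitted model; the abstract does not report them, so none are filled in here.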


Subject(s)
Colorectal Neoplasms , Genetic Predisposition to Disease , Humans , Female , Colorectal Neoplasms/genetics , Colorectal Neoplasms/epidemiology , Middle Aged , Case-Control Studies , Risk Factors , Aged , Hormone Replacement Therapy/adverse effects , Risk Assessment , Menopause , Postmenopause , Estrogen Replacement Therapy/adverse effects
8.
STAR Protoc ; 5(2): 102941, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38483898

ABSTRACT

Dinoflagellate genomes often are very large and difficult to assemble, which has until recently precluded their analysis with modern functional genomic tools. Here, we present a protocol for mapping three-dimensional (3D) genome organization in dinoflagellates and using it for scaffolding their genome assemblies. We describe steps for crosslinking, nuclear lysis, denaturation, restriction digest, ligation, and DNA shearing and purification. We then detail procedures for sequencing library generation and computational analysis, including initial Hi-C read mapping and 3D-DNA scaffolding/assembly correction. For complete details on the use and execution of this protocol, please refer to Marinov et al. [1].

9.
Nat Methods ; 21(4): 723-734, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38504114

ABSTRACT

The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.


Subject(s)
CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , Humans , CRISPR-Cas Systems/genetics , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , RNA, Guide, CRISPR-Cas Systems , Genome , K562 Cells
10.
bioRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-35547855

ABSTRACT

Clinical diagnosis typically incorporates physical examination, patient history, and various laboratory tests and imaging studies, but makes limited use of the human immune system's own record of antigen exposures encoded by receptors on B cells and T cells. We analyzed immune receptor datasets from 593 individuals to develop MAchine Learning for Immunological Diagnosis (Mal-ID), an interpretive framework to screen for multiple illnesses simultaneously or precisely test for one condition. This approach detects specific infections, autoimmune disorders, vaccine responses, and disease severity differences. Human-interpretable features of the model recapitulate known immune responses to SARS-CoV-2, Influenza, and HIV, highlight antigen-specific receptors, and reveal distinct characteristics of Systemic Lupus Erythematosus and Type-1 Diabetes autoreactivity. This analysis framework has broad potential for scientific and clinical interpretation of human immune responses.

11.
bioRxiv ; 2024 Jan 13.
Article in English | MEDLINE | ID: mdl-37873443

ABSTRACT

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has led to significant global morbidity and mortality. A crucial viral protein, the non-structural protein 14 (nsp14), catalyzes the methylation of viral RNA and plays a critical role in viral genome replication and transcription. Due to the low mutation rate in the nsp region among various SARS-CoV-2 variants, nsp14 has emerged as a promising therapeutic target. However, discovering potential inhibitors remains a challenge. In this work, we introduce a computational pipeline for the rapid and efficient identification of potential nsp14 inhibitors by leveraging virtual screening and the NCI open compound collection, which contains 250,000 freely available molecules for researchers worldwide. The introduced pipeline provides a cost-effective and efficient approach for early-stage drug discovery by allowing researchers to evaluate promising molecules without incurring synthesis expenses. Our pipeline successfully identified seven promising candidates after experimentally validating only 40 compounds. Notably, we discovered NSC620333, a compound that exhibits a strong binding affinity to nsp14 with a dissociation constant of 427 ± 84 nM. In addition, we gained new insights into the structure and function of this protein through molecular dynamics simulations. We identified new conformational states of the protein and determined that residues Phe367, Tyr368, and Gln354 within the binding pocket serve as stabilizing residues for novel ligand interactions. We also found that metal coordination complexes are crucial for the overall function of the binding pocket. Lastly, we present the solved crystal structure of the nsp14-MTase complexed with SS148 (PDB:8BWU), a potent inhibitor of methyltransferase activity at the nanomolar level (IC50 value of 70 ± 6 nM). Our computational pipeline accurately predicted the binding pose of SS148, demonstrating its effectiveness and potential in accelerating drug discovery efforts against SARS-CoV-2 and other emerging viruses.
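
To put the reported numbers in context, the sketch below computes the experimental hit rate (7 candidates out of 40 tested) and the fraction of protein occupied at a given ligand concentration for the reported dissociation constant. It assumes a simple 1:1 equilibrium binding model with ligand in excess, which the abstract does not state; the function and constant names are hypothetical and the example concentrations are arbitrary.

```python
# Illustrative sketch under a 1:1 equilibrium binding assumption
# (not stated in the abstract); names are hypothetical.

def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fractional occupancy: [L] / ([L] + Kd), with ligand in excess."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


KD_NSC620333_NM = 427.0  # point estimate quoted in the abstract (427 ± 84 nM)

# Hit rate from the abstract: seven candidates after testing 40 compounds.
hit_rate = 7 / 40

print(f"hit rate: {hit_rate:.1%}")
for conc_nM in (100.0, 427.0, 1000.0):  # arbitrary example concentrations
    occupancy = fraction_bound(conc_nM, KD_NSC620333_NM)
    print(f"{conc_nM:7.1f} nM ligand -> {occupancy:.2f} fraction bound")
```

By construction, occupancy is exactly one half when the ligand concentration equals the dissociation constant.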

12.
Cancer Epidemiol Biomarkers Prev ; 33(3): 400-410, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38112776

ABSTRACT

BACKGROUND: High red meat and/or processed meat consumption are established colorectal cancer risk factors. We conducted a genome-wide gene-environment (GxE) interaction analysis to identify genetic variants that may modify these associations. METHODS: A pooled sample of 29,842 colorectal cancer cases and 39,635 controls of European ancestry from 27 studies was included. Quantiles for red meat and processed meat intake were constructed from harmonized questionnaire data. Genotyping arrays were imputed to the Haplotype Reference Consortium. Two-step EDGE and joint tests of GxE interaction were utilized in our genome-wide scan. RESULTS: Meta-analyses confirmed positive associations between increased consumption of red meat and processed meat with colorectal cancer risk [per quartile red meat OR = 1.30; 95% confidence interval (CI) = 1.21-1.41; processed meat OR = 1.40; 95% CI = 1.20-1.63]. Two significant genome-wide GxE interactions for red meat consumption were found. Joint GxE tests revealed the rs4871179 SNP in chromosome 8 (downstream of HAS2); greater than median of consumption ORs = 1.38 (95% CI = 1.29-1.46), 1.20 (95% CI = 1.12-1.27), and 1.07 (95% CI = 0.95-1.19) for CC, CG, and GG, respectively. The two-step EDGE method identified the rs35352860 SNP in chromosome 18 (SMAD7 intron); greater than median of consumption ORs = 1.18 (95% CI = 1.11-1.24), 1.35 (95% CI = 1.26-1.44), and 1.46 (95% CI = 1.26-1.69) for CC, CT, and TT, respectively. CONCLUSIONS: We propose two novel biomarkers that support a role of meat consumption in the increased risk of colorectal cancer. IMPACT: The reported GxE interactions may explain the increased risk of colorectal cancer in certain population subgroups.


Subject(s)
Colorectal Neoplasms , Red Meat , Humans , Gene-Environment Interaction , Red Meat/adverse effects , Meat/adverse effects , Risk Factors , Colorectal Neoplasms/genetics
13.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961278

ABSTRACT

Histone proteins have traditionally been thought to be restricted to eukaryotes and most archaea, with eukaryotic nucleosomal histones deriving from their archaeal ancestors. In contrast, bacteria lack histones as a rule. However, histone proteins have recently been identified in a few bacterial clades, most notably the phylum Bdellovibrionota, and these histones have been proposed to exhibit a range of divergent features compared to histones in archaea and eukaryotes. However, no functional genomic studies of the properties of Bdellovibrionota chromatin have been carried out. In this work, we map the landscape of chromatin accessibility, active transcription and three-dimensional genome organization in a member of Bdellovibrionota (a Bacteriovorax strain). We find that, similar to what is observed in some archaea and in eukaryotes with compact genomes such as yeast, Bacteriovorax chromatin is characterized by preferential accessibility around promoter regions. Similar to eukaryotes, chromatin accessibility in Bacteriovorax positively correlates with gene expression. Mapping active transcription through single-strand DNA (ssDNA) profiling revealed that unlike in yeast, but similar to the state of mammalian and fly promoters, Bacteriovorax promoters exhibit very strong polymerase pausing. Finally, similar to that of other bacteria without histones, the Bacteriovorax genome exists in a three-dimensional (3D) configuration organized by the parABS system along the axis defined by replication origin and termination regions. These results provide a foundation for understanding the chromatin biology of the unique Bdellovibrionota bacteria and the functional diversity in chromatin organization across the tree of life.

14.
Nature ; 623(7987): 608-615, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37938768

ABSTRACT

Cell therapies have yielded durable clinical benefits for patients with cancer, but the risks associated with the development of therapies from manipulated human cells are understudied. For example, we lack a comprehensive understanding of the mechanisms of toxicities observed in patients receiving T cell therapies, including recent reports of encephalitis caused by reactivation of human herpesvirus 6 (HHV-6) [1]. Here, through petabase-scale viral genomics mining, we examine the landscape of human latent viral reactivation and demonstrate that HHV-6B can become reactivated in cultures of human CD4+ T cells. Using single-cell sequencing, we identify a rare population of HHV-6 'super-expressors' (about 1 in 300-10,000 cells) that possess high viral transcriptional activity, among research-grade allogeneic chimeric antigen receptor (CAR) T cells. By analysing single-cell sequencing data from patients receiving cell therapy products that are approved by the US Food and Drug Administration [2] or are in clinical studies [3-5], we identify the presence of HHV-6-super-expressor CAR T cells in patients in vivo. Together, the findings of our study demonstrate the utility of comprehensive genomics analyses in implicating cell therapy products as a potential source contributing to the lytic HHV-6 infection that has been reported in clinical trials [1,6-8] and may influence the design and production of autologous and allogeneic cell therapies.


Subject(s)
CD4-Positive T-Lymphocytes , Herpesvirus 6, Human , Immunotherapy, Adoptive , Receptors, Chimeric Antigen , Virus Activation , Virus Latency , Humans , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , Clinical Trials as Topic , Gene Expression Regulation, Viral , Genomics , Herpesvirus 6, Human/genetics , Herpesvirus 6, Human/isolation & purification , Herpesvirus 6, Human/physiology , Immunotherapy, Adoptive/adverse effects , Immunotherapy, Adoptive/methods , Infectious Encephalitis/complications , Infectious Encephalitis/virology , Receptors, Chimeric Antigen/immunology , Roseolovirus Infections/complications , Roseolovirus Infections/virology , Single-Cell Gene Expression Analysis , Viral Load
15.
Cell ; 186(24): 5254-5268.e26, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37944513

ABSTRACT

A fundamental feature of cellular growth is that total protein and RNA amounts increase with cell size to keep concentrations approximately constant. A key component of this is that global transcription rates increase in larger cells. Here, we identify RNA polymerase II (RNAPII) as the limiting factor scaling mRNA transcription with cell size in budding yeast, as transcription is highly sensitive to the dosage of RNAPII but not to other components of the transcriptional machinery. Our experiments support a dynamic equilibrium model where global RNAPII transcription at a given size is set by the mass action recruitment kinetics of unengaged nucleoplasmic RNAPII to the genome. However, this only drives a sub-linear increase in transcription with size, which is then partially compensated for by a decrease in mRNA decay rates as cells enlarge. Thus, limiting RNAPII and feedback on mRNA stability work in concert to scale mRNA amounts with cell size.


Subject(s)
Cell Size , RNA Polymerase II , Transcription, Genetic , Feedback , RNA Polymerase II/metabolism , RNA Stability , RNA, Messenger/genetics , RNA, Messenger/metabolism
16.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

17.
Genome Biol ; 24(1): 253, 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37932847

ABSTRACT

BACKGROUND: Archaea, together with Bacteria, represent the two main divisions of life on Earth, with many of the defining characteristics of the more complex eukaryotes tracing their origin to evolutionary innovations first made in their archaeal ancestors. One of the most notable such features is nucleosomal chromatin, although archaeal histones and chromatin differ significantly from those of eukaryotes, not all archaea possess histones and it is not clear if histones are a main packaging component for all that do. Despite increased interest in archaeal chromatin in recent years, its properties have been little studied using genomic tools. RESULTS: Here, we adapt the ATAC-seq assay to archaea and use it to map the accessible landscape of the genome of the euryarchaeote Haloferax volcanii. We integrate the resulting datasets with genome-wide maps of active transcription and single-stranded DNA (ssDNA) and find that while H. volcanii promoters exist in a preferentially accessible state, unlike most eukaryotes, modulation of transcriptional activity is not associated with changes in promoter accessibility. Applying orthogonal single-molecule footprinting methods, we quantify the absolute levels of physical protection of H. volcanii and find that Haloferax chromatin is similarly or only slightly more accessible, in aggregate, than that of eukaryotes. We also evaluate the degree of coordination of transcription within archaeal operons and make the unexpected observation that some CRISPR arrays are associated with highly prevalent ssDNA structures. CONCLUSIONS: Our results provide the first comprehensive maps of chromatin accessibility and active transcription in Haloferax across conditions and thus a foundation for future functional studies of archaeal chromatin.


Subject(s)
Archaeal Proteins , Haloferax volcanii , Chromatin , Histones/genetics , Haloferax volcanii/genetics , Haloferax volcanii/metabolism , Nucleosomes , Biological Evolution , Eukaryota/genetics , Archaeal Proteins/genetics
18.
bioRxiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37986808

ABSTRACT

Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.

19.
bioRxiv ; 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37873116

ABSTRACT

Ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM) transforms differentiated cells into induced pluripotent stem cells. To refine our mechanistic understanding of reprogramming, especially during the early stages, we profiled chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of human fibroblast reprogramming. Using neural networks that map DNA sequence to ATAC-seq profiles at base-resolution, we annotated cell-state-specific predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of Tn5-bias corrected TF footprints, linked peaks to putative target genes, and elucidated rewiring of TF-to-gene cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution, connect TF stoichiometry and motif syntax to diversification of cell fate trajectories, and provide new perspectives on the dynamics and role of transient regulatory elements in somatic silencing.

20.
Science ; 381(6664): eadd1250, 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37733848

ABSTRACT

Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.


Subject(s)
Gene Expression Regulation , Microsatellite Repeats , Transcription Factors , Eukaryotic Cells , Transcription Factors/chemistry , Transcription Factors/genetics , Protein Binding , Humans , Animals , Saccharomyces cerevisiae , Protein Domains , Protein Conformation