ABSTRACT
The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
ABSTRACT
Genome-wide chromatin conformation capture assays provide formidable insights into the spatial organization of genomes. However, due to the complexity of the data structure, their integration in multi-omics workflows remains challenging. We present data structures, computational methods and visualization tools available in Bioconductor to investigate Hi-C, micro-C and other 3C-related data in R. An online book (https://bioconductor.org/books/OHCA/) further provides prospective end users with a number of workflows to process, import, analyze and visualize any type of chromosome conformation capture data.
Subject(s)
Chromatin; Chromosomes; Prospective Studies; Chromatin/genetics; Chromosomes/genetics; Genome; Molecular Conformation
ABSTRACT
Periodic occurrences of oligonucleotide sequences can impact the physical properties of DNA. For example, DNA bendability is modulated by 10-bp periodic occurrences of WW (W = A/T) dinucleotides. We present periodicDNA, an R package to identify k-mer periodicity and generate continuous tracks of k-mer periodicity over genomic loci of interest, such as regulatory elements. periodicDNA will facilitate investigation and improve understanding of how periodic DNA sequence features impact function.
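As a rough illustration of what "k-mer periodicity" can mean in practice, the sketch below locates WW dinucleotides in a sequence, builds a histogram of pairwise distances between them, and measures the strength of a chosen period with a single Fourier component. This is not the periodicDNA implementation or its API; the function names (`ww_positions`, `distance_histogram`, `power_at_period`), the `max_dist` parameter and the toy sequence are all assumptions made for this example.

```python
import cmath
import math

# W = A or T, so the WW dinucleotides are AA, AT, TA and TT.
WW = {"AA", "AT", "TA", "TT"}


def ww_positions(seq):
    """Return the start index of every WW dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW]


def distance_histogram(positions, max_dist):
    """Count pairwise distances (1..max_dist) between sorted positions."""
    hist = [0] * (max_dist + 1)
    for idx, a in enumerate(positions):
        for b in positions[idx + 1:]:
            d = b - a
            if d > max_dist:
                break  # positions are sorted, so later distances only grow
            hist[d] += 1
    return hist


def power_at_period(hist, period):
    """Squared magnitude of the Fourier component of the mean-centred
    distance histogram at the given period (in bp)."""
    values = hist[1:]  # distances 1..max_dist
    mean = sum(values) / len(values)
    comp = sum(
        (v - mean) * cmath.exp(-2j * math.pi * d / period)
        for d, v in enumerate(values, start=1)
    )
    return abs(comp) ** 2 / len(values)


# Hypothetical toy sequence: one AA every 10 bp, GC filler in between.
seq = ("AA" + "GCGCGCGC") * 20
hist = distance_histogram(ww_positions(seq), max_dist=50)
print("power at 10 bp:", power_at_period(hist, 10))
print("power at  7 bp:", power_at_period(hist, 7))
```

On real sequences the distance histogram would normally be normalised against a background (for example shuffled sequences) before interpreting the spectrum; that step is omitted here to keep the sketch short.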
Subject(s)
DNA; Genomics; DNA/genetics; Genome; Sequence Analysis, DNA
ABSTRACT
RNA profiling has provided increasingly detailed knowledge of gene expression patterns, yet the different regulatory architectures that drive them are not well understood. To address this, we profiled and compared transcriptional and regulatory element activities across five tissues of Caenorhabditis elegans, covering ~90% of cells. We find that the majority of promoters and enhancers have tissue-specific accessibility, and we discover regulatory grammars associated with ubiquitous, germline, and somatic tissue-specific gene expression patterns. In addition, we find that germline-active and soma-specific promoters have distinct features. Germline-active promoters have well-positioned +1 and -1 nucleosomes associated with a periodic 10-bp WW signal (W = A/T). Somatic tissue-specific promoters lack positioned nucleosomes and this signal, have wide nucleosome-depleted regions, and are more enriched for core promoter elements, which largely differ between tissues. We observe the 10-bp periodic WW signal at ubiquitous promoters in other animals, suggesting it is an ancient conserved signal. Our results show fundamental differences in regulatory architectures of germline and somatic tissue-specific genes, uncover regulatory rules for generating diverse gene expression patterns, and provide a tissue-specific resource for future studies.
Subject(s)
Caenorhabditis elegans Proteins/genetics; Caenorhabditis elegans/genetics; Gene Expression Profiling/veterinary; Germ Cells/chemistry; Animals; Gene Expression Regulation; Humans; Mice; Organ Specificity; Promoter Regions, Genetic; Sequence Analysis, RNA; Tissue Distribution; Transcription Initiation Site
ABSTRACT
Nuclear compartments have diverse roles in regulating gene expression, yet the molecular forces and components that drive compartment formation remain largely unclear [1]. The long non-coding RNA Xist establishes an intra-chromosomal compartment by localizing at a high concentration in a territory spatially close to its transcription locus [2] and binding diverse proteins [3-5] to achieve X-chromosome inactivation (XCI) [6,7]. The XCI process therefore serves as a paradigm for understanding how RNA-mediated recruitment of various proteins induces a functional compartment. The properties of the inactive X (Xi)-compartment are known to change over time, because after initial Xist spreading and transcriptional shutoff a state is reached in which gene silencing remains stable even if Xist is turned off [8]. Here we show that the Xist RNA-binding proteins PTBP1 [9], MATR3 [10], TDP-43 [11] and CELF1 [12] assemble on the multivalent E-repeat element of Xist [7] and, via self-aggregation and heterotypic protein-protein interactions, form a condensate [1] in the Xi. This condensate is required for gene silencing and for the anchoring of Xist to the Xi territory, and can be sustained in the absence of Xist. Notably, these E-repeat-binding proteins become essential coincident with transition to the Xist-independent XCI phase [8], indicating that the condensate seeded by the E-repeat underlies the developmental switch from Xist-dependence to Xist-independence. Taken together, our data show that Xist forms the Xi compartment by seeding a heteromeric condensate that consists of ubiquitous RNA-binding proteins, revealing an unanticipated mechanism for heritable gene silencing.
Subject(s)
Gene Silencing; RNA, Long Noncoding/genetics; RNA-Binding Proteins/metabolism; Animals; CELF1 Protein/metabolism; Cell Line; DNA-Binding Proteins/metabolism; Female; Heterogeneous-Nuclear Ribonucleoproteins/metabolism; Humans; In Situ Hybridization, Fluorescence; Male; Mice; Nuclear Matrix-Associated Proteins/metabolism; Polypyrimidine Tract-Binding Protein/metabolism; X Chromosome Inactivation/genetics
ABSTRACT
Cancer is characterized by genomic instability leading to deletion or amplification of oncogenes or tumor suppressors. However, most of the altered regions are devoid of known cancer drivers. Here, we identify lncRNAs frequently lost or amplified in cancer. Among them, we found amplified lncRNA associated with lung cancer-1 (ALAL-1) as frequently amplified in lung adenocarcinomas. ALAL-1 is also overexpressed in additional tumor types, such as lung squamous carcinoma. The RNA product of ALAL-1 is able to promote the proliferation and tumorigenicity of lung cancer cells. ALAL-1 is a TNFα- and NF-κB-induced cytoplasmic lncRNA that specifically interacts with SART3, regulating the subcellular localization of the protein deubiquitinase USP4 and, in turn, its function in the cell. Interestingly, ALAL-1 expression inversely correlates with the immune infiltration of lung squamous tumors, while tumors with ALAL-1 amplification show lower infiltration of several types of immune cells. We have thus unveiled a pro-oncogenic lncRNA that mediates cancer immune evasion, pointing to a new target for immune potentiation.
Subject(s)
DNA Copy Number Variations/genetics; Immune Evasion/genetics; Lung Neoplasms/genetics; RNA, Long Noncoding/genetics; A549 Cells; Adenocarcinoma of Lung/genetics; Antigens, Neoplasm/genetics; Carcinoma, Squamous Cell/genetics; Cell Line, Tumor; Cell Proliferation/genetics; Gene Expression Regulation, Neoplastic/genetics; Humans; NF-kappa B/genetics; Oncogenes/genetics; Ubiquitin-Specific Proteases/genetics
ABSTRACT
An essential step for understanding the transcriptional circuits that control development and physiology is the global identification and characterization of regulatory elements. Here, we present the first map of regulatory elements across the development and ageing of an animal, identifying 42,245 elements accessible in at least one Caenorhabditis elegans stage. Based on nuclear transcription profiles, we define 15,714 protein-coding promoters and 19,231 putative enhancers, and find that both types of element can drive orientation-independent transcription. Additionally, more than 1,000 promoters produce transcripts antisense to protein-coding genes, suggesting involvement in a widespread regulatory mechanism. We find that the accessibility of most elements changes during development and/or ageing and that patterns of accessibility change are linked to specific developmental or physiological processes. The map and characterization of regulatory elements across C. elegans life provide a platform for understanding how transcription controls development and ageing.
Subject(s)
Aging/metabolism; Caenorhabditis elegans/growth & development; Caenorhabditis elegans/metabolism; Chromatin/metabolism; Animals; Caenorhabditis elegans/genetics; DNA/genetics; Enhancer Elements, Genetic; Gene Expression Regulation, Developmental; Histone Code; Histones/metabolism; Molecular Sequence Annotation; Promoter Regions, Genetic; Reproducibility of Results; Transcription Factors/metabolism; Transcription Initiation Site
ABSTRACT
Since the discovery of chromosome territories, it has been clear that DNA within the nucleus is spatially organized. During the last decade, a tremendous body of work has described architectural features of chromatin at different spatial scales, such as A/B compartments, topologically associating domains (TADs), and chromatin loops. These features correlate with domains of chromatin marking and gene expression, supporting their relevance for gene regulation. Recent work has highlighted the dynamic nature of spatial folding and investigated mechanisms of their formation. Here we discuss current understanding and highlight key open questions in chromosome organization in animals.