Results 1 - 20 of 67
1.
bioRxiv ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38915583

ABSTRACT

Postnatal genomic regulation significantly influences tissue and organ maturation but is under-studied relative to existing genomic catalogs of adult tissues or prenatal development in mouse. The ENCODE4 consortium generated the first comprehensive single-nucleus resource of postnatal regulatory events across a diverse set of mouse tissues. The collection spans seven postnatal time points, mirroring human development from childhood to adulthood, and encompasses five core tissues. We identified 30 cell types, further subdivided into 69 subtypes and cell states across adrenal gland, left cerebral cortex, hippocampus, heart, and gastrocnemius muscle. Our annotations cover both known and novel cell differentiation dynamics ranging from early hippocampal neurogenesis to a new sex-specific adrenal gland population during puberty. We used an ensemble Latent Dirichlet Allocation strategy with a curated vocabulary of 2,701 regulatory genes to identify regulatory "topics," each of which is a gene vector, linked to cell type differentiation, subtype specialization, and transitions between cell states. We find recurrent regulatory topics in tissue-resident macrophages, neural cell types, endothelial cells across multiple tissues, and cycling cells of the adrenal gland and heart. Cell-type-specific topics are enriched in transcription factors and microRNA host genes, while chromatin regulators dominate mitosis topics. Corresponding chromatin accessibility data reveal dynamic and sex-specific regulatory elements, with enriched motifs matching transcription factors in regulatory topics. Together, these analyses identify both tissue-specific and common regulatory programs in postnatal development across multiple tissues through the lens of the factors regulating transcription.
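Restricting topic modeling to a curated regulatory vocabulary amounts to dropping every non-vocabulary gene from the cell-by-gene count matrix before the model is fit. The following is a minimal Python sketch. It assumes each nucleus is stored as a dictionary mapping gene to count. The helper name `restrict_to_vocabulary`, the gene names, and the counts are all illustrative, and none of them are taken from the study's 2,701-gene list.

```python
from typing import Dict, List, Set

# Illustrative representation: one dict of gene -> count per nucleus.
CountMatrix = List[Dict[str, int]]

def restrict_to_vocabulary(cells: CountMatrix, vocabulary: Set[str]) -> CountMatrix:
    """Keep only nonzero counts for genes in the curated vocabulary.

    Cells left with no vocabulary counts are kept as empty dicts so that
    row order still matches the original cell barcodes.
    """
    return [{g: c for g, c in cell.items() if g in vocabulary and c > 0} for cell in cells]

# Hypothetical vocabulary subset (e.g. a TF, a TF, a microRNA host gene).
vocab = {"Nr5a1", "Mef2c", "Mir22hg"}
cells = [
    {"Nr5a1": 5, "Actb": 40, "Mir22hg": 1},
    {"Mef2c": 3, "Gapdh": 25},
]
print(restrict_to_vocabulary(cells, vocab))
# [{'Nr5a1': 5, 'Mir22hg': 1}, {'Mef2c': 3}]
```

Empty cells are kept rather than dropped. This preserves the row order of the original barcodes, so topic weights can later be joined back to cell annotations by position.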

2.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38464087

ABSTRACT

The gene expression profiles of distinct cell types reflect complex genomic interactions among multiple simultaneous biological processes within each cell that can be altered by disease progression as well as genetic background. The identification of these active cellular programs is an open challenge in the analysis of single-cell RNA-seq data. Latent Dirichlet Allocation (LDA) is a generative method used to identify recurring patterns in count data, commonly referred to as topics, that can be used to interpret the state of each cell. However, LDA's interpretability is hindered by several key factors, including the choice of the number of topics as a hyperparameter and the variability in topic definitions due to random initialization. We developed Topyfic, a Reproducible LDA (rLDA) package, to accurately infer the identity and activity of cellular programs in single-cell data, providing insights into the relative contributions of each program in individual cells. We apply Topyfic to brain single-cell and single-nucleus datasets of two 5xFAD mouse models of Alzheimer's disease crossed with C57BL/6J or CAST/EiJ mice to identify distinct cell types, as well as distinct states within cell types such as microglia. We find that 8-month-old 5xFAD/Cast F1 males show a higher level of microglial activation than matching 5xFAD/BL6 F1 males, whereas female mice show similar levels of microglial activation. We show that regulatory genes such as TFs, microRNA host genes, and chromatin regulatory genes alone capture cell types and cell states. Our study highlights how topic modeling with a limited vocabulary of regulatory genes can identify gene expression programs in single-cell data in order to quantify similar and divergent cell states in distinct genotypes.
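One way to make LDA topics reproducible across random initializations is to fit several runs with different seeds and keep only the topics that recur. The sketch below is a generic illustration of that idea, not Topyfic's published procedure. It makes four assumptions:

- Each run's topics are already available as gene-to-weight dictionaries.
- Topics are grouped by cosine similarity to a cluster seed.
- The 0.9 similarity threshold is arbitrary.
- Groups are kept only if supported by at least `min_runs` runs, and their weights are then averaged.

```python
import math
from typing import Dict, List

Topic = Dict[str, float]  # gene -> weight within one topic

def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse gene-weight vectors."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def consensus_topics(runs: List[List[Topic]], threshold: float = 0.9,
                     min_runs: int = 2) -> List[Topic]:
    """Group similar topics across runs and average the ones that recur."""
    clusters: List[dict] = []  # {"seed": Topic, "members": [Topic], "runs": set}
    for run_idx, topics in enumerate(runs):
        for topic in topics:
            best, best_sim = None, threshold
            for cl in clusters:
                sim = cosine(topic, cl["seed"])
                if sim >= best_sim:  # most similar cluster at or above threshold
                    best, best_sim = cl, sim
            if best is None:
                clusters.append({"seed": topic, "members": [topic], "runs": {run_idx}})
            else:
                best["members"].append(topic)
                best["runs"].add(run_idx)
    consensus = []
    for cl in clusters:
        if len(cl["runs"]) < min_runs:
            continue  # topic did not recur across enough runs
        genes = sorted(set().union(*cl["members"]))
        n = len(cl["members"])
        consensus.append({g: sum(t.get(g, 0.0) for t in cl["members"]) / n for g in genes})
    return consensus

run1 = [{"A": 0.9, "B": 0.1}, {"C": 1.0}]
run2 = [{"A": 0.8, "B": 0.2}, {"D": 1.0}]
print(consensus_topics([run1, run2]))  # one topic, A about 0.85 and B about 0.15
```

Topics C and D each appear in only one run and are discarded. Requiring cross-run support is what trades some sensitivity for reproducibility.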

3.
Nature ; 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38057666

ABSTRACT

Human limbs emerge during the fourth post-conception week as mesenchymal buds, which develop into fully formed limbs over the subsequent months[1]. This process is orchestrated by numerous temporally and spatially restricted gene expression programmes, making congenital alterations in phenotype common[2]. Decades of work with model organisms have defined the fundamental mechanisms underlying vertebrate limb development, but an in-depth characterization of this process in humans has yet to be performed. Here we detail human embryonic limb development across space and time using single-cell and spatial transcriptomics. We demonstrate extensive diversification of cells from a few multipotent progenitors to myriad differentiated cell states, including several novel cell populations. We uncover two waves of human muscle development, each characterized by different cell states regulated by separate gene expression programmes, and identify musculin (MSC) as a key transcriptional repressor maintaining muscle stem cell identity. Through assembly of multiple anatomically continuous spatial transcriptomic samples using VisiumStitcher, we map cells across a sagittal section of a whole fetal hindlimb. We reveal a clear anatomical segregation between genes linked to brachydactyly and polysyndactyly, and uncover transcriptionally and spatially distinct populations of the mesenchyme in the autopod. Finally, we perform single-cell RNA sequencing on mouse embryonic limbs to facilitate cross-species developmental comparison, finding substantial homology between the two species.

4.
Genome Res ; 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37852782

ABSTRACT

Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression, and identified a cohort of 26 candidate transcriptional repressors. We examine high occupancy target (HOT) sites in the context of three-dimensional genome organization and show biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions with multiple DAPs bound, and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the genome of the human cell line HepG2 and show the usefulness of large genomic analyses for elucidating individual TF functions.

5.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
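The triplet idea can be illustrated by working on one gene at a time:

1. Count the distinct transcript start sites, exon junction chains, and transcript end sites among the gene's transcripts.
2. Normalize those counts onto a 2-simplex.

The scaling used below divides the number of exon junction chains by the mean number of transcript ends before normalizing. That scaling is an assumption made for this sketch and may differ from the framework's exact definition. The triplet IDs are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical triplet: (start-site ID, exon-junction-chain ID, end-site ID).
Triplet = Tuple[str, str, str]

def simplex_coordinates(transcripts: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Return (TSS, splicing, TES) coordinates summing to 1 for one gene."""
    transcripts = list(transcripts)
    if not transcripts:
        raise ValueError("gene has no transcripts")
    n_tss = len({t[0] for t in transcripts})
    n_ic = len({t[1] for t in transcripts})
    n_tes = len({t[2] for t in transcripts})
    # Assumed scaling: splicing diversity relative to the mean number of ends.
    splicing = n_ic / ((n_tss + n_tes) / 2)
    total = n_tss + splicing + n_tes
    return (n_tss / total, splicing / total, n_tes / total)

# A gene whose three transcripts differ only in their start site.
gene = [("tss1", "ic1", "tes1"), ("tss2", "ic1", "tes1"), ("tss3", "ic1", "tes1")]
print(simplex_coordinates(gene))  # TSS coordinate dominates (about 0.67)
```

Each corner of the simplex has a reading under this sketch:

- A gene near the TSS corner shows promoter-driven diversity.
- A gene near the splicing corner shows splice-driven diversity.
- A gene near the TES corner shows 3'-end-driven diversity.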

6.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36897015

ABSTRACT

SUMMARY: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq. AVAILABILITY AND IMPLEMENTATION: https://ga4gh-rnaseq.github.io/schema/docs/index.html.
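The matrix-slicing idea behind RNAget can be sketched locally as selecting rows (features) and columns (samples) by identifier. This in-memory class is a hypothetical illustration only. It does not reproduce the RNAget endpoints, parameters, or response formats, which are defined in the specification linked above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class QuantMatrix:
    """Hypothetical quantification matrix: features x samples."""
    features: List[str]        # row IDs, e.g. gene IDs or peak IDs
    samples: List[str]         # column IDs
    values: List[List[float]]  # values[i][j]: feature i in sample j

    def subset(self, feature_ids: Optional[Sequence[str]] = None,
               sample_ids: Optional[Sequence[str]] = None) -> "QuantMatrix":
        """Return a new matrix restricted to the requested IDs (None keeps all)."""
        rows = list(feature_ids) if feature_ids is not None else list(self.features)
        cols = list(sample_ids) if sample_ids is not None else list(self.samples)
        r_idx = {f: i for i, f in enumerate(self.features)}
        c_idx = {s: j for j, s in enumerate(self.samples)}
        unknown = [x for x in rows if x not in r_idx] + [x for x in cols if x not in c_idx]
        if unknown:
            raise KeyError(f"unknown identifiers: {unknown}")
        return QuantMatrix(rows, cols,
                           [[self.values[r_idx[f]][c_idx[s]] for s in cols] for f in rows])

m = QuantMatrix(["g1", "g2", "g3"], ["s1", "s2"], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(m.subset(["g3", "g1"], ["s2"]).values)  # [[6.0], [2.0]]
```

Rows and columns are just labelled axes. The same slicing therefore applies whether the values come from RNA-seq, microarrays, ATAC-seq, or ChIP-seq.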


Subjects
RNA , Software , Genomics , Genome , Sequence Analysis, RNA
8.
Genome Biol ; 22(1): 286, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34620214

ABSTRACT

The rise in throughput and quality of long-read sequencing should allow unambiguous identification of full-length transcript isoforms. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here we develop and characterize long-read Split-seq (LR-Split-seq), which uses combinatorial barcoding to sequence single cells with long reads. Applied to the C2C12 myogenic system, LR-Split-seq associates isoforms to cell types with relative economy and design flexibility. We find widespread evidence of changing isoform expression during differentiation, including alternative transcription start sites (TSS) and/or alternative internal exon usage. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells.
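Combinatorial barcoding works because each round of splitting cells into wells and pooling them again multiplies the number of possible barcode combinations. Below is a minimal simulation. It assumes uniform random well assignment in every round. The well counts and the cell number are hypothetical and do not describe the LR-Split-seq design.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

def split_pool(n_cells: int, wells_per_round: List[int], seed: int = 0) -> List[Tuple[int, ...]]:
    """Assign each cell one well index per round; the tuple is its combined barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(w) for w in wells_per_round) for _ in range(n_cells)]

def expected_collision_rate(n_cells: int, wells_per_round: List[int]) -> float:
    """Chance that a given cell shares its barcode combination with at least one other cell."""
    combos = math.prod(wells_per_round)
    return 1.0 - (1.0 - 1.0 / combos) ** (n_cells - 1)

def observed_collision_rate(barcodes: List[Tuple[int, ...]]) -> float:
    """Fraction of cells whose barcode combination is not unique."""
    counts = Counter(barcodes)
    return sum(1 for b in barcodes if counts[b] > 1) / len(barcodes)

wells = [96, 96, 96]  # hypothetical: three rounds of 96 wells
cells = split_pool(1000, wells)
print(math.prod(wells))  # 884736 possible combinations
print(expected_collision_rate(1000, wells), observed_collision_rate(cells))
```

Adding a round, or adding wells per round, lowers the collision rate multiplicatively. This is one source of the design flexibility that combinatorial schemes offer.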


Subjects
RNA Isoforms/metabolism , RNA-Seq/methods , Single-Cell Analysis/methods , Animals , Cell Differentiation/genetics , Cell Line , Cell Nucleus/genetics , Chromatin/metabolism , Genomics , Mice , Models, Genetic , Myogenin/genetics , PAX7 Transcription Factor/genetics , Transcription Initiation Site , Transcription, Genetic
9.
Nature ; 583(7818): 720-728, 2020 07.
Article in English | MEDLINE | ID: mdl-32728244

ABSTRACT

Transcription factors are DNA-binding proteins that have key roles in gene regulation[1,2]. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes[3-6]. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.
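Co-enrichment of one factor's motif in another factor's peaks is commonly assessed with an over-representation test. Below is a generic sketch using the hypergeometric upper tail. It assumes every peak and background region has already been scored as containing the motif or not. The test is a standard one chosen for illustration, not necessarily the statistic used in the study, and the counts are made up.

```python
from math import comb

def hypergeom_sf(k: int, total: int, with_motif: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total regions, regions with motif, draws).

    draws: peaks of the ChIP target; k: how many of those peaks carry the motif.
    """
    upper = min(with_motif, draws)
    hits = sum(comb(with_motif, i) * comb(total - with_motif, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(total, draws)  # exact integer arithmetic, one final division

# Hypothetical counts: 10,000 regions, 1,000 carry a FOX-like motif,
# 500 are peaks of the target CAP, and 120 of those peaks carry the motif.
print(hypergeom_sf(120, 10_000, 1_000, 500))  # far below 1e-10 (about 50 expected)
```

Python's arbitrary-precision integers keep the binomial sums exact. Only the final ratio is rounded to a float.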


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin/genetics , Chromatin/metabolism , DNA-Binding Proteins/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Datasets as Topic , Enhancer Elements, Genetic/genetics , Hep G2 Cells , Humans , Nucleotide Motifs/genetics , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/metabolism
10.
Nature ; 583(7818): 760-767, 2020 07.
Article in English | MEDLINE | ID: mdl-32728245

ABSTRACT

During mammalian embryogenesis, differential gene expression gradually builds the identity and complexity of each tissue and organ system[1]. Here we systematically quantified mouse polyA-RNA from day 10.5 of embryonic development to birth, sampling 17 tissues and organs. The resulting developmental transcriptome is globally structured by dynamic cytodifferentiation, body-axis and cell-proliferation gene sets that were further characterized by the transcription factor motif codes of their promoters. We decomposed the tissue-level transcriptome using single-cell RNA-seq (sequencing of RNA reverse transcribed into cDNA) and found that neurogenesis and haematopoiesis dominate at both the gene and cellular levels, jointly accounting for one-third of differential gene expression and more than 40% of identified cell types. By integrating promoter sequence motifs with companion ENCODE epigenomic profiles, we identified a prominent promoter de-repression mechanism in neuronal expression clusters that was attributable to known and novel repressors. Focusing on the developing limb, single-cell RNA data identified 25 candidate cell types that included progenitor and differentiating states with computationally inferred lineage relationships. We extracted cell-type transcription factor networks and complementary sets of candidate enhancer elements by using single-cell RNA-seq to decompose integrative cis-element (IDEAS) models that were derived from whole-tissue epigenome chromatin data. These ENCODE reference data, computed network components and IDEAS chromatin segmentations are companion resources to the matching epigenomic developmental matrix, and are available for researchers to further mine and integrate.


Subjects
Embryo, Mammalian/cytology , Embryo, Mammalian/embryology , Embryonic Development/genetics , Gene Expression Regulation, Developmental , Single-Cell Analysis , Transcriptome , Animals , Cell Differentiation/genetics , Cell Lineage/genetics , Chromatin/genetics , Embryo, Mammalian/metabolism , Enhancer Elements, Genetic , Epigenomics , Extremities/embryology , Female , Male , Mice , Poly A/genetics , Poly A/metabolism , Promoter Regions, Genetic , RNA-Seq , Transcription Factors/metabolism
11.
Genome Res ; 30(7): 939-950, 2020 07.
Article in English | MEDLINE | ID: mdl-32616518

ABSTRACT

DNA-associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal data set of 352 nonredundant, in vitro-derived motifs mapped to the genome within DNase I hypersensitivity footprints to characterize regions with high numbers of DAP associations. We establish a generalizable definition for high occupancy target (HOT) loci and identify putative driver DAP motifs in HepG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and show sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity, and by systematically mutating 245 HOT loci with a massively parallel mutagenesis assay, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
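Counting how many different DAPs associate with each locus is the core quantity behind HOT-locus calls. The sketch below counts the distinct DAPs whose peaks overlap each locus and then applies a fixed cutoff. The cutoff, the naive overlap scan, and the example coordinates are assumptions for illustration. They do not reproduce the generalizable definition established in the study. The DAP names are borrowed from the abstract, but their peak positions are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, str, int, int]  # (DAP name, chrom, start, end), half-open
Locus = Tuple[str, int, int]      # (chrom, start, end), half-open

def distinct_dap_counts(loci: List[Locus], peaks: List[Peak]) -> Dict[Locus, int]:
    """Number of different DAPs with at least one peak overlapping each locus."""
    by_chrom = defaultdict(list)
    for dap, chrom, start, end in peaks:
        by_chrom[chrom].append((start, end, dap))
    counts = {}
    for chrom, lstart, lend in loci:
        # Naive O(loci x peaks) scan; an interval index would be used at genome scale.
        daps = {dap for s, e, dap in by_chrom[chrom] if s < lend and lstart < e}
        counts[(chrom, lstart, lend)] = len(daps)
    return counts

def hot_loci(counts: Dict[Locus, int], min_daps: int) -> List[Locus]:
    """Loci bound by at least min_daps distinct DAPs (the cutoff is an assumption)."""
    return sorted(locus for locus, n in counts.items() if n >= min_daps)

loci = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = [("HNF4A", "chr1", 150, 250), ("SP1", "chr1", 90, 110),
         ("SP1", "chr1", 120, 130), ("ETV4", "chr1", 550, 560)]
counts = distinct_dap_counts(loci, peaks)
print(counts)               # {('chr1', 100, 200): 2, ('chr1', 500, 600): 1}
print(hot_loci(counts, 2))  # [('chr1', 100, 200)]
```

Counting distinct DAPs rather than raw peaks matters here. Two SP1 peaks in the same locus contribute only once, which matches the abstract's emphasis on the number of *different* DAPs.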


Subjects
Enhancer Elements, Genetic , Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Base Composition , Cell Line , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , DNA/chemistry , Genetic Loci , Genome , Hep G2 Cells , Humans , Mutagenesis , Mutation , Nucleotide Motifs
12.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article in English | MEDLINE | ID: mdl-31645363

ABSTRACT

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.


Subjects
Embryonic Development/genetics , MicroRNAs/genetics , Animals , Female , Gene Expression Regulation, Developmental , Mice , Pregnancy , RNA, Messenger/genetics
13.
Cell Syst ; 9(4): 321-337.e9, 2019 10 23.
Article in English | MEDLINE | ID: mdl-31629685

ABSTRACT

Intrathymic T cell development converts multipotent precursors to committed pro-T cells, silencing progenitor genes while inducing T cell genes, but the underlying steps have remained obscure. Single-cell profiling was used to define the order of regulatory changes, employing single-cell RNA sequencing (scRNA-seq) for full-transcriptome analysis, plus sequential multiplexed single-molecule fluorescent in situ hybridization (seqFISH) to quantitate functionally important transcripts in intrathymic precursors. Single-cell cloning verified high T cell precursor frequency among the immunophenotypically defined "early T cell precursor" (ETP) population; a discrete committed granulocyte precursor subset was also distinguished. We established regulatory phenotypes of sequential ETP subsets, confirmed initial co-expression of progenitor with T cell specification genes, defined stage-specific relationships between cell cycle and differentiation, and generated a pseudotime model from ETP to T lineage commitment, supported by RNA velocity and transcription factor perturbations. This model was validated by developmental kinetics of ETP subsets at population and clonal levels. The results imply that multilineage priming is integral to T cell specification.


Subjects
Models, Immunological , Pluripotent Stem Cells/physiology , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , T-Lymphocytes/physiology , Thymus Gland/physiology , Cell Differentiation , Cell Lineage , Gene Expression Profiling , Gene Expression Regulation , Gene Silencing , In Situ Hybridization, Fluorescence
14.
Proc Natl Acad Sci U S A ; 115(13): E2930-E2939, 2018 03 27.
Article in English | MEDLINE | ID: mdl-29531064

ABSTRACT

RNA-sequencing (RNA-seq) is commonly used to identify genetic modules that respond to perturbations. In single cells, transcriptomes have been used as phenotypes, but this concept has not been applied to whole-organism RNA-seq. Also, quantifying and interpreting epistatic effects using expression profiles remains a challenge. We developed a single coefficient to quantify transcriptome-wide epistasis that reflects the underlying interactions and can be interpreted intuitively. To demonstrate our approach, we sequenced four single and two double mutants of Caenorhabditis elegans. From these mutants, we reconstructed the known hypoxia pathway. In addition, we uncovered a class of 56 genes with HIF-1-dependent expression that have opposite changes in expression in mutants of two genes that cooperate to negatively regulate HIF-1 abundance; however, the double mutant of these genes exhibits suppression epistasis. This class violates the classical model of HIF-1 regulation but can be explained by postulating a role of hydroxylated HIF-1 in transcriptional control.
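A single transcriptome-wide epistasis coefficient can be illustrated gene by gene:

1. Compute each double mutant's deviation from additivity.
2. Compare that deviation with the additive expectation from the two single mutants.
3. Summarize the relationship across genes with one slope.

In this sketch, the `beta` values are per-gene expression changes relative to wild type (for example, log fold changes). The least-squares slope through the origin is an assumed estimator, not necessarily the exact one used in the paper.

```python
from typing import Sequence

def epistasis_coefficient(beta_a: Sequence[float], beta_b: Sequence[float],
                          beta_ab: Sequence[float]) -> float:
    """Slope s of (beta_ab - beta_a - beta_b) against (beta_a + beta_b), fit through 0.

    Each sequence holds one value per gene, in the same gene order.
    """
    if not beta_a or not (len(beta_a) == len(beta_b) == len(beta_ab)):
        raise ValueError("need equal-length, non-empty coefficient vectors")
    xs = [a + b for a, b in zip(beta_a, beta_b)]                    # additive expectation
    ys = [ab - a - b for a, b, ab in zip(beta_a, beta_b, beta_ab)]  # deviation from it
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("additive expectation is zero for every gene")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

# Double mutant identical to each single mutant:
print(epistasis_coefficient([1.0, -2.0, 0.5], [1.0, -2.0, 0.5], [1.0, -2.0, 0.5]))  # -0.5
# Double mutant equal to the sum of the single-mutant effects:
print(epistasis_coefficient([1.0, -2.0], [0.5, 1.0], [1.5, -1.0]))  # 0.0
```

Under this formulation, a value near 0 indicates additive effects. A value of -0.5 arises when the double mutant looks like either single mutant, as expected for two genes acting in one unbranched pathway.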


Subjects
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Epistasis, Genetic , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , Transcriptome , Animals , Caenorhabditis elegans/growth & development
15.
Development ; 143(19): 3632-3637, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27702788

ABSTRACT

In situ hybridization methods are used across the biological sciences to map mRNA expression within intact specimens. Multiplexed experiments, in which multiple target mRNAs are mapped in a single sample, are essential for studying regulatory interactions, but remain cumbersome in most model organisms. Programmable in situ amplifiers based on the mechanism of hybridization chain reaction (HCR) overcome this longstanding challenge by operating independently within a sample, enabling multiplexed experiments to be performed with an experimental timeline independent of the number of target mRNAs. To assist biologists working across a broad spectrum of organisms, we demonstrate multiplexed in situ HCR in diverse imaging settings: bacteria, whole-mount nematode larvae, whole-mount fruit fly embryos, whole-mount sea urchin embryos, whole-mount zebrafish larvae, whole-mount chicken embryos, whole-mount mouse embryos and formalin-fixed paraffin-embedded human tissue sections. In addition to straightforward multiplexing, in situ HCR enables deep sample penetration, high contrast and subcellular resolution, providing an incisive tool for the study of interlaced and overlapping expression patterns, with implications for research communities across the biological sciences.


Subjects
In Situ Hybridization/methods , RNA, Messenger/metabolism , Animals , Drosophila , Embryo, Nonmammalian/metabolism , Humans , Zebrafish
16.
J Am Med Inform Assoc ; 22(6): 1143-7, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26174866

ABSTRACT

The world's genomics data will never be stored in a single repository - rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.


Subjects
Datasets as Topic , Genomics , Translational Research, Biomedical , Computational Biology , Humans , Knowledge Bases , National Institutes of Health (U.S.) , United States
17.
Dev Cell ; 32(6): 765-71, 2015 Mar 23.
Article in English | MEDLINE | ID: mdl-25805138

ABSTRACT

Huang et al. (2013) recently reported that chromatin immunoprecipitation sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi, a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns, as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their data and report that the underlying deep-sequencing dataset does not support the authors' genome-wide conclusions.


Subjects
Argonaute Proteins/genetics , DNA-Binding Proteins/genetics , Drosophila Proteins/genetics , RNA Polymerase II/genetics , Animals , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation , Drosophila melanogaster , Genome , High-Throughput Nucleotide Sequencing , Methyltransferases , RNA Interference , RNA, Small Interfering/genetics , Sequence Analysis, DNA
18.
Cell Stem Cell ; 16(1): 88-101, 2015 Jan 08.
Article in English | MEDLINE | ID: mdl-25575081

ABSTRACT

Cellular reprogramming highlights the epigenetic plasticity of the somatic cell state. Long noncoding RNAs (lncRNAs) have emerging roles in epigenetic regulation, but their potential functions in reprogramming cell fate have been largely unexplored. We used single-cell RNA sequencing to characterize the expression patterns of over 16,000 genes, including 437 lncRNAs, during defined stages of reprogramming to pluripotency. Self-organizing maps (SOMs) were used as an intuitive way to structure and interrogate transcriptome data at the single-cell level. Early molecular events during reprogramming involved the activation of Ras signaling pathways, along with hundreds of lncRNAs. Loss-of-function studies showed that activated lncRNAs can repress lineage-specific genes, while lncRNAs activated in multiple reprogramming cell types can regulate metabolic gene expression. Our findings demonstrate that reprogramming cells activate defined sets of functionally relevant lncRNAs and provide a resource to further investigate how dynamic changes in the transcriptome reprogram cell state.
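Self-organizing maps arrange high-dimensional profiles on a small grid so that similar profiles land on nearby nodes. A generic online Kohonen SOM in plain Python is sketched below. Three choices here are assumptions, not the configuration used in the study:

- what each input vector represents (for example, a cell's expression profile or a gene's profile across cells);
- the grid size;
- the learning schedule.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

def best_matching_unit(grid: List[List[Vector]], x: Vector) -> Tuple[int, int]:
    """Grid coordinates of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), math.inf
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data: List[Vector], rows: int, cols: int, epochs: int = 100,
              lr0: float = 0.5, sigma0: float = 1.0, seed: int = 0) -> List[List[Vector]]:
    """Train a rectangular SOM with the classic online update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize each node's weights from a randomly chosen sample (copied).
    grid = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 1e-3   # neighborhood radius shrinks
            br, bc = best_matching_unit(grid, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                    w = grid[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return grid

data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
som = train_som(data, rows=1, cols=2)
print([best_matching_unit(som, x) for x in data])  # the two groups map to different nodes
```

After training, each profile is summarized by its best-matching node. Nodes can then be colored by expression of a gene or gene set to compare stages at a glance.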


Subjects
Cellular Reprogramming/genetics , RNA, Long Noncoding/genetics , Single-Cell Analysis/methods , Transcriptome/genetics , Animals , Cell Lineage/genetics , Gene Expression Regulation, Developmental , Genes, Developmental , Hematopoiesis/genetics , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Mice , Pluripotent Stem Cells/metabolism , RNA, Long Noncoding/metabolism , Signal Transduction/genetics , ras Proteins/metabolism
19.
BMC Bioinformatics ; 15: 331, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25411051

ABSTRACT

BACKGROUND: Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as the maximal information coefficient (MIC). These measures depend primarily on the degree of association between the RNA levels of two genes and to a lesser extent on their variability. They focus on the similarity of expression value trajectories that change in like manner across samples. However, there are relationships of biological interest for which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single-cell RNA-seq, are experimental settings in which such relationships might be especially pertinent. RESULTS: We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq. CONCLUSION: We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships. RA is scalable and potentially well suited for the analysis of thousands of bulk-RNA or single-cell transcriptomes.
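The ratiometric score can be sketched directly from its description. For each gene pair, take the per-sample expression ratio and compute its coefficient of variation; low values mark ratiometrically stable pairs. Three details are assumptions of this sketch rather than details given in the abstract:

- which gene goes in the numerator;
- how zero counts are handled (here, samples with a zero denominator are skipped);
- the cutoff.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def ratio_cv(x: List[float], y: List[float]) -> float:
    """Coefficient of variation (sample SD / mean) of the per-sample ratio x / y."""
    ratios = [a / b for a, b in zip(x, y) if b > 0]  # assumption: skip zero denominators
    if len(ratios) < 2:
        raise ValueError("need at least two samples with a nonzero denominator")
    mean = sum(ratios) / len(ratios)
    if mean == 0:
        return math.inf  # numerator never expressed: no usable ratio
    var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)
    return math.sqrt(var) / mean

def ra_pairs(expr: Dict[str, List[float]], max_cv: float) -> List[Tuple[str, str, float]]:
    """Gene pairs whose ratio CV is at or below max_cv, most stable first."""
    hits = []
    for g1, g2 in combinations(sorted(expr), 2):
        cv = ratio_cv(expr[g1], expr[g2])
        if cv <= max_cv:
            hits.append((g1, g2, cv))
    return sorted(hits, key=lambda t: t[2])

expr = {
    "a": [10.0, 20.0, 30.0, 40.0],
    "b": [5.0, 10.0, 15.0, 20.0],  # always half of "a": perfectly stable ratio
    "c": [1.0, 9.0, 2.0, 8.0],
}
print(ra_pairs(expr, max_cv=0.1))  # [('a', 'b', 0.0)]
```

The CV of x/y generally differs from the CV of y/x, so a full implementation would need a fixed convention or a symmetric variant. Here the alphabetically first gene is always the numerator.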


Subjects
Gene Expression Profiling/methods , Genetic Association Studies/methods , Sequence Analysis, RNA , Single-Cell Analysis , B-Lymphocytes/metabolism , Cell Line, Transformed , Human Genome Project , Humans