Results 1 - 20 of 55
1.
medRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38746091

ABSTRACT

Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high-resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller. HMMSTR outperforms current signal- and sequence-based callers relative to two assemblies and we show it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease-associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient-derived samples. In these samples we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible, and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.

2.
PLoS One ; 19(3): e0298688, 2024.
Article in English | MEDLINE | ID: mdl-38478504

ABSTRACT

Understanding the functional effects of sequence variation is crucial in genomics. Individual human genomes contain millions of variants that contribute to phenotypic variability and disease risks at the population level. Because variants rarely act in isolation, we must consider potential interactions of neighboring variants to accurately predict functional effects. We can accomplish this using haplotagging, which matches sequencing reads to their parental haplotypes using alleles observed at known heterozygous variants. However, few published tools for haplotagging exist and these share several technical and usability-related shortcomings that limit applicability, in particular a lack of insight or control over error rates, and lack of key metrics on the underlying sources of haplotagging error. Here we present HaplotagLR: a user-friendly tool that haplotags long sequencing reads based on a multinomial model and existing phased variant lists. HaplotagLR is user-configurable and includes a basic error model to control the empirical FDR in its output. We show that HaplotagLR outperforms the leading haplotagging method in simulated datasets, especially at high levels of specificity, and displays 7% greater sensitivity in haplotagging real data. HaplotagLR both advances the immediate utility of haplotagging and paves the way for further improvements to this important method.


Subjects
Human Genome , Genomics , Humans , DNA Sequence Analysis/methods , Genomics/methods , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Algorithms
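
The haplotagging idea described in the abstract above (matching a read to a parental haplotype using the alleles it carries at known heterozygous variants) can be illustrated with a minimal sketch. This is not HaplotagLR's multinomial model or its error model; it is a simple allele-counting stand-in. The function name `haplotag_read`, the input layout, and the `min_margin` parameter are all hypothetical, introduced only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical input layout: position -> (allele on haplotype 1, allele on haplotype 2).
PhasedVariants = Dict[int, Tuple[str, str]]


def haplotag_read(read_alleles: Dict[int, str],
                  phased: PhasedVariants,
                  min_margin: int = 1) -> str:
    """Return 'H1', 'H2', or 'unassigned' for a single read.

    Simplified counting rule (an assumption, not the published method):
    the read is assigned to whichever haplotype it matches at more
    heterozygous sites, provided the difference is at least min_margin.
    """
    h1_matches = 0
    h2_matches = 0
    for pos, base in read_alleles.items():
        if pos not in phased:
            continue  # only known heterozygous sites are informative
        allele1, allele2 = phased[pos]
        if base == allele1:
            h1_matches += 1
        elif base == allele2:
            h2_matches += 1
        # A base matching neither allele is ignored (treated as an error).
    if h1_matches - h2_matches >= min_margin:
        return "H1"
    if h2_matches - h1_matches >= min_margin:
        return "H2"
    return "unassigned"


# Toy phased variant list and one read covering all three sites.
phased_sites = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
tag = haplotag_read({100: "A", 250: "C", 400: "A"}, phased_sites)
```

Raising `min_margin` trades sensitivity for specificity in this toy rule, which loosely mirrors the error-rate control the abstract describes, though the real tool does this through a statistical model rather than a fixed count.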
3.
Cell Genom ; 3(10): 100404, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868037

ABSTRACT

Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit expression quantitative trait loci signatures, suggesting that MPRA could identify yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.

4.
Nat Struct Mol Biol ; 30(8): 1077-1091, 2023 08.
Article in English | MEDLINE | ID: mdl-37460896

ABSTRACT

Conventional dogma presumes that protamine-mediated DNA compaction in sperm is achieved by electrostatic interactions between DNA and the arginine-rich core of protamines. Phylogenetic analysis reveals several non-arginine residues conserved within, but not across species. The significance of these residues and their post-translational modifications is poorly understood. Here, we investigated the role of K49, a rodent-specific lysine residue in protamine 1 (P1) that is acetylated early in spermiogenesis and retained in sperm. In sperm, alanine substitution (P1(K49A)) decreases sperm motility and male fertility, defects that are not rescued by arginine substitution (P1(K49R)). In zygotes, P1(K49A) leads to premature male pronuclear decompaction, altered DNA replication, and embryonic arrest. In vitro, P1(K49A) decreases protamine-DNA binding and alters DNA compaction and decompaction kinetics. Hence, a single amino acid substitution outside the P1 arginine core is sufficient to profoundly alter protein function and developmental outcomes, suggesting that protamine non-arginine residues are essential for reproductive fitness.


Subjects
Amino Acids , Genetic Fitness , Animals , Male , Mice , Amino Acids/metabolism , Arginine/metabolism , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Phylogeny , Protamines/chemistry , Protamines/genetics , Protamines/metabolism , Semen/metabolism , Sperm Motility , Spermatozoa
5.
HGG Adv ; 4(3): 100210, 2023 07 13.
Article in English | MEDLINE | ID: mdl-37305558

ABSTRACT

Understanding the genetic basis for complex, heterogeneous disorders, such as autism spectrum disorder (ASD), is a persistent challenge in human medicine. Owing to their phenotypic complexity, the genetic mechanisms underlying these disorders may be highly variable across individual patients. Furthermore, much of their heritability is unexplained by known regulatory or coding variants. Indeed, there is evidence that much of the causal genetic variation stems from rare and de novo variants arising from ongoing mutation. These variants occur mostly in noncoding regions, likely affecting regulatory processes for genes linked to the phenotype of interest. However, because there is no uniform code for assessing regulatory function, it is difficult to separate these mutations into likely functional and nonfunctional subsets. This makes finding associations between complex diseases and potentially causal de novo single-nucleotide variants (dnSNVs) a difficult task. To date, most published studies have struggled to find any significant associations between dnSNVs from ASD patients and any class of known regulatory elements. We sought to identify the underlying reasons for this and present strategies for overcoming these challenges. We show that, contrary to previous claims, the main reason for failure to find robust statistical enrichments is not only the number of families sampled, but also the quality and relevance to ASD of the annotations used to prioritize dnSNVs, and the reliability of the set of dnSNVs itself. We present a list of recommendations for designing future studies of this sort that will help researchers avoid common pitfalls.


Subjects
Autism Spectrum Disorder , Medicine , Humans , Autism Spectrum Disorder/diagnosis , Reproducibility of Results , Cell Movement , Phenotype
7.
Genome Res ; 33(5): 741-749, 2023 May.
Article in English | MEDLINE | ID: mdl-37156622

ABSTRACT

Recombinant plasmid vectors are versatile tools that have facilitated discoveries in molecular biology, genetics, proteomics, and many other fields. As the enzymatic and bacterial processes used to create recombinant DNA can introduce errors, sequence validation is an essential step in plasmid assembly. Sanger sequencing is the current standard for plasmid validation; however, this method is limited by an inability to sequence through complex secondary structure and lacks scalability when applied to full-plasmid sequencing of multiple plasmids owing to read-length limits. Although high-throughput sequencing does provide full-plasmid sequencing at scale, it is impractical and costly when used outside of library-scale validation. Here, we present Oxford nanopore-based rapid analysis of multiplexed plasmids (OnRamp), an alternative method for routine plasmid validation that combines the advantages of high-throughput sequencing's full-plasmid coverage and scalability with Sanger's affordability and accessibility by leveraging nanopore's long-read sequencing technology. We include customized wet-laboratory protocols for plasmid preparation along with a pipeline designed for analysis of read data obtained using these protocols. This analysis pipeline is deployed on the OnRamp web app, which generates alignments between actual and predicted plasmid sequences, quality scores, and read-level views. OnRamp is designed to be broadly accessible regardless of programming experience to facilitate more widespread adoption of long-read sequencing for routine plasmid validation. Here we describe the OnRamp protocols and pipeline and show our ability to obtain full sequences from pooled plasmids while detecting sequence variation even in regions of high secondary structure at less than half the cost of equivalent Sanger sequencing.


Subjects
Bacterial Genome , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis/methods , Plasmids/genetics , High-Throughput Nucleotide Sequencing/methods , Proteomics
8.
bioRxiv ; 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36712073

ABSTRACT

Understanding the functional effects of sequence variation is among the primary goals of contemporary genomics. Individual human genomes contain millions of variants which are thought to contribute to phenotypic variability and differential disease risks at the population level. However, because variants rarely act in isolation, we cannot accurately predict functional effects without first considering the potential effects of other interacting variants on the same chromosome. This information can be obtained by phasing the read data from sequencing experiments. However, no standalone tools are available to simply phase reads based on known haplotypes. Here we present LRphase: a user-friendly utility for simple phasing of long sequencing reads.

9.
BMC Bioinformatics ; 23(1): 317, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927613

ABSTRACT

MOTIVATION: Aberrant DNA methylation in transcription factor binding sites has been shown to lead to anomalous gene regulation that is strongly associated with human disease. However, the majority of methylation-sensitive positions within transcription factor binding sites remain unknown. Here we introduce SEMplMe, a computational tool to generate predictions of the effect of methylation on transcription factor binding strength in every position within a transcription factor's motif. RESULTS: SEMplMe uses ChIP-seq and whole genome bisulfite sequencing to predict effects of methylation within binding sites. SEMplMe validates known methylation sensitive and insensitive positions within a binding motif, identifies cell type specific transcription factor binding driven by methylation, and outperforms SELEX-based predictions for CTCF. These predictions can be used to identify aberrant sites of DNA methylation contributing to human disease. AVAILABILITY AND IMPLEMENTATION: SEMplMe is available from https://github.com/Boyle-Lab/SEMplMe .


Subjects
DNA Methylation , Transcription Factors , Binding Sites , Gene Expression Regulation , Humans , Protein Binding , Transcription Factors/metabolism
10.
Genome Biol ; 23(1): 105, 2022 04 26.
Article in English | MEDLINE | ID: mdl-35473573

ABSTRACT

BACKGROUND: Revealing the gene targets of distal regulatory elements is challenging yet critical for interpreting regulome data. Experiment-derived enhancer-gene links are restricted to a small set of enhancers and/or cell types, while the accuracy of genome-wide approaches remains elusive due to the lack of a systematic evaluation. We combined multiple spatial and in silico approaches for defining enhancer locations and linking them to their target genes aggregated across >500 cell types, generating 1860 human genome-wide distal enhancer-to-target gene definitions (EnTDefs). To evaluate performance, we used gene set enrichment (GSE) testing on 87 independent ENCODE ChIP-seq datasets of 34 transcription factors (TFs) and assessed concordance of results with known TF Gene Ontology annotations, and other benchmarks. RESULTS: The top ranked 741 (40%) EnTDefs significantly outperform the common, naïve approach of linking distal regions to the nearest genes, and the top 10 EnTDefs perform well when applied to ChIP-seq data of other cell types. The GSE-based ranking of EnTDefs is highly concordant with ranking based on overlap with curated benchmarks of enhancer-gene interactions. Both our top general EnTDef and cell-type-specific EnTDefs significantly outperform seven independent computational and experiment-based enhancer-gene pair datasets. We show that using our top EnTDefs for GSE with either genome-wide DNA methylation or ATAC-seq data is able to better recapitulate the biological processes changed in gene expression data performed in parallel for the same experiment than our lower-ranked EnTDefs. CONCLUSIONS: Our findings illustrate the power of our approach to provide genome-wide interpretation regardless of cell type.


Subjects
Chromatin Immunoprecipitation Sequencing , Nucleic Acid Regulatory Sequences , DNA , Human Genome , Humans , Molecular Sequence Annotation
11.
Nucleic Acids Res ; 50(1): e6, 2022 01 11.
Article in English | MEDLINE | ID: mdl-34648033

ABSTRACT

Understanding the functional consequences of genetic variation in the non-coding regions of the human genome remains a challenge. We introduce here a computational tool, TURF, to prioritize regulatory variants with tissue-specific function by leveraging evidence from functional genomics experiments, including over 3000 functional genomics datasets from the ENCODE project provided in the RegulomeDB database. TURF is able to generate prediction scores at both organism and tissue/organ-specific levels for any non-coding variant in the genome. We show that TURF has top overall performance in prediction, using validated variants from MPRA experiments. We also demonstrate how TURF can pick out the regulatory variants with tissue-specific function from a candidate list from association studies. Furthermore, we found that various GWAS traits showed enrichment of regulatory variants predicted by TURF scores in the trait-relevant organs, which indicates that these variants can be a valuable source for future studies.


Subjects
Human Genome , Genomics/methods , Software , Cell Line , Data Analysis , Humans
12.
Genome Biol ; 22(1): 298, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34706748

ABSTRACT

We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.


Subjects
Deep Learning , Nanopore Sequencing/methods , Bacterial DNA/analysis , Humans , Long Interspersed Nucleotide Elements , Metagenome , Respiratory System/microbiology
13.
Front Genet ; 12: 683394, 2021.
Article in English | MEDLINE | ID: mdl-34220959

ABSTRACT

BACKGROUND: Zebrafish are a foundational model organism for studying the spatio-temporal activity of genes and their regulatory sequences. A variety of approaches are currently available for editing genes and modifying gene expression in zebrafish, including RNAi, Cre/lox, and CRISPR-Cas9. However, the lac operator-repressor system, an E. coli lac operon component which has been adapted for use in many other species and is a valuable, flexible tool for inducible modulation of gene expression studies, has not been previously tested in zebrafish. RESULTS: Here we demonstrate that the lac operator-repressor system robustly decreases expression of firefly luciferase in cultured zebrafish fibroblast cells. Our work establishes the lac operator-repressor system as a promising tool for the manipulation of gene expression in whole zebrafish. CONCLUSION: Our results lay the groundwork for the development of lac-based reporter assays in zebrafish, and add to the tools available for investigating dynamic gene expression in embryogenesis. We believe this work will catalyze the development of new reporter assay systems to investigate uncharacterized regulatory elements and their cell-type specific activities.

14.
Nat Commun ; 12(1): 3586, 2021 06 11.
Article in English | MEDLINE | ID: mdl-34117247

ABSTRACT

Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.


Subjects
CRISPR-Cas Systems , Genomics , Interspersed Repetitive Sequences , Nanopore Sequencing/methods , Cell Line , DNA-Binding Proteins , Human Genome , Humans , Nucleic Acid Repetitive Sequences , Ribonucleoproteins/metabolism , DNA Sequence Analysis
15.
NAR Genom Bioinform ; 3(1): lqab012, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33655209

ABSTRACT

Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic 'continuous' Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall.
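
The two ingredients named in the abstract above, kernel density estimation and a Poisson test against a background expectation, can be sketched in a few lines. This is an illustrative simplification only: it uses a plain Gaussian kernel and the standard discrete Poisson upper tail, not the dynamic 'continuous' Poisson test that F-Seq2 describes, and the function names, bandwidth, and background rate are assumptions made for the example.

```python
import math
from typing import Sequence


def gaussian_kde_at(x: float, positions: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate of read positions, evaluated at x."""
    if not positions:
        return 0.0
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): chance of k or more reads under background."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Toy data: read positions clustered near 1000, plus two scattered reads.
reads = [990.0, 995.0, 1000.0, 1002.0, 1008.0, 1500.0, 2200.0]
density_at_peak = gaussian_kde_at(1000.0, reads, bandwidth=10.0)
density_off_peak = gaussian_kde_at(1800.0, reads, bandwidth=10.0)

# Hypothetical background rate: 0.5 reads expected in the candidate window.
p_value = poisson_upper_tail(5, lam=0.5)
```

In this toy setup the density picks out the candidate summit and the Poisson tail attaches a test statistic to it, which is the property the abstract highlights as making per-summit output comparable across replicates.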

16.
Proc Natl Acad Sci U S A ; 117(48): 30799-30804, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33199612

ABSTRACT

Eukaryotic genomes are pervasively transcribed, yet most transcribed sequences lack conservation or known biological functions. In Arabidopsis thaliana, RNA polymerase V (Pol V) produces noncoding transcripts, which base pair with small interfering RNA (siRNA) and allow specific establishment of RNA-directed DNA methylation (RdDM) on transposable elements. Here, we show that Pol V transcribes much more broadly than previously expected, including subsets of both heterochromatic and euchromatic regions. At already established RdDM targets, Pol V and siRNA work together to maintain silencing. In contrast, some euchromatic sequences do not give rise to siRNA but are covered by low levels of Pol V transcription, which is needed to establish RdDM de novo if a transposon is reactivated. We propose a model where Pol V surveils the genome to make it competent to silence newly activated or integrated transposons. This indicates that pervasive transcription of nonconserved sequences may serve an essential role in maintenance of genome integrity.


Subjects
DNA-Directed RNA Polymerases/metabolism , Genome , Untranslated RNA , Genetic Transcription , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , DNA Transposable Elements , Plant Gene Expression Regulation , Gene Silencing , Biological Models , Multiprotein Complexes/metabolism , Substrate Specificity
17.
BMC Bioinformatics ; 21(1): 416, 2020 Sep 22.
Article in English | MEDLINE | ID: mdl-32962625

ABSTRACT

BACKGROUND: Comparative genomics studies are growing in number partly because of their unique ability to provide insight into shared and divergent biology between species. Of particular interest is the use of phylogenetic methods to infer the evolutionary history of cis-regulatory sequence features, which contribute strongly to phenotypic divergence and are frequently gained and lost in eutherian genomes. Understanding the mechanisms by which cis-regulatory element turnover generates emergent phenotypes is crucial to our understanding of adaptive evolution. Ancestral reconstruction methods can place species-specific cis-regulatory features in their evolutionary context, thus increasing our understanding of the process of regulatory sequence turnover. However, applying these methods to gain and loss of cis-regulatory features historically required complex workflows, preventing widespread adoption by the broad scientific community. RESULTS: MapGL simplifies phylogenetic inference of the evolutionary history of short genomic sequence features by combining the necessary steps into a single piece of software with a simple set of inputs and outputs. We show that MapGL can reliably disambiguate the mechanisms underlying differential regulatory sequence content across a broad range of phylogenetic topologies and evolutionary distances. Thus, MapGL provides the necessary context to evaluate how genomic sequence gain and loss contribute to species-specific divergence. CONCLUSIONS: MapGL makes phylogenetic inference of species-specific sequence gain and loss easy for both expert and non-expert users, making it a powerful tool for gaining novel insights into genome evolution.


Subjects
Molecular Evolution , Genome/genetics , Genomics/methods , Nucleic Acid Regulatory Sequences , Software , Animals , Humans , Mammals/genetics , Phenotype , Phylogeny
18.
Genome Res ; 30(7): 1040-1046, 2020 07.
Article in English | MEDLINE | ID: mdl-32660981

ABSTRACT

Transcription is tightly regulated by cis-regulatory DNA elements where transcription factors (TFs) can bind. Thus, identification of TF binding sites (TFBSs) is key to understanding gene expression and whole regulatory networks within a cell. The standard approaches used for TFBS prediction, such as position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have their drawbacks, including high false-positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns; however, these also have limitations. We have developed a footprinting method to predict TF footprints in active chromatin elements (TRACE) to improve the prediction of TFBS footprints. TRACE incorporates DNase-seq data and PWMs within a multivariate hidden Markov model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that accurately annotates binding sites for specific TFs automatically with no requirement for pregenerated candidate binding sites or ChIP-seq training data. Compared with published footprinting algorithms, TRACE has the best overall performance with the distinct advantage of targeting multiple motifs in a single model.


Subjects
Chromatin/metabolism , DNA Footprinting/methods , DNA Sequence Analysis , Transcription Factors/metabolism , Binding Sites , Cell Line , Deoxyribonucleases , Humans , K562 Cells , Markov Chains , Nucleotide Motifs
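
The abstract above names position weight matrices (PWMs) as a standard approach to TFBS prediction. A minimal sketch of PWM log-odds scanning follows; it illustrates only that baseline technique, not TRACE's multivariate hidden Markov model. The three-column matrix, the uniform background of 0.25, and the function names are hypothetical values chosen for the example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical position probability matrix; each column sums to 1.
PPM: List[Dict[str, float]] = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(site: str, ppm: List[Dict[str, float]]) -> float:
    """Sum of per-position log2 odds (motif probability over background)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(ppm, site))


def best_hit(seq: str, ppm: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring window in seq."""
    width = len(ppm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    scored = [(pwm_score(seq[i:i + width], ppm), i)
              for i in range(len(seq) - width + 1)]
    score, start = max(scored)
    return start, score


start, score = best_hit("TTAGCTT", PPM)
```

Scanning a genome this way reports every window above a threshold, which is one source of the high false-positive rates the abstract attributes to PWMs used alone.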
19.
Nat Commun ; 11(1): 1796, 2020 04 14.
Article in English | MEDLINE | ID: mdl-32286261

ABSTRACT

Chromatin looping is important for gene regulation, and studies of 3D chromatin structure across species and cell types have improved our understanding of the principles governing chromatin looping. However, 3D genome evolution and its relationship with natural selection remain largely unexplored. In mammals, the CTCF protein defines the boundaries of most chromatin loops, and variations in CTCF occupancy are associated with looping divergence. While many CTCF binding sites fall within transposable elements (TEs), their contribution to 3D chromatin structural evolution is unknown. Here we report the relative contributions of TE-driven CTCF binding site expansions to conserved and divergent chromatin looping in human and mouse. We demonstrate that TE-derived CTCF binding divergence may explain a large fraction of variable loops. These variable loops contribute significantly to corresponding gene expression variability across cells and species, possibly by refining sub-TAD-scale loop contacts responsible for cell-type-specific enhancer-promoter interactions.


Subjects
Chromatin/metabolism , DNA Transposable Elements/genetics , Gene Expression Regulation , Genome , Mammals/genetics , Animals , Binding Sites , CCCTC-Binding Factor/metabolism , Cell Cycle Proteins/metabolism , Chromatin/chemistry , Mammalian Chromosomes/genetics , DNA-Binding Proteins/metabolism , Humans , Mice , Insertional Mutagenesis/genetics , Nucleic Acid Conformation , Phylogeny , Species Specificity , Synteny/genetics
20.
NAR Genom Bioinform ; 2(1): lqaa006, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32051932

ABSTRACT

Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability and additional capabilities. To determine gene set enrichment, it models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment.
