1.
Nat Commun ; 10(1): 4613, 2019 10 10.
Article in English | MEDLINE | ID: mdl-31601804

ABSTRACT

Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular-level heterogeneity, e.g., scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements and target genes in single cells, bulk HiChIP can measure such contacts at a higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering) as a method for the joint analysis of various bulk and single-cell data such as HiChIP, RNA-seq and ATAC-seq from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.


Subject(s)
Algorithms , Cluster Analysis , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Animals , Cell Line , Chromatin , Chromatin Immunoprecipitation/statistics & numerical data , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Mice , Promoter Regions, Genetic , Single-Cell Analysis/methods
2.
Pac Symp Biocomput ; 24: 184-195, 2019.
Article in English | MEDLINE | ID: mdl-30864321

ABSTRACT

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly-available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP on publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach to leverage latent patterns across genome-wide epigenetic datasets to infer the biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Polymorphism, Single Nucleotide , Algorithms , Cell Line , Computational Biology/methods , Databases, Nucleic Acid/statistics & numerical data , Epigenesis, Genetic , Genetic Association Studies , Genome, Human , Genome-Wide Association Study/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Lupus Erythematosus, Systemic/genetics , Lymphocytes/metabolism , Receptors, Calcitriol/genetics
3.
BMC Genomics ; 20(1): 6, 2019 Jan 05.
Article in English | MEDLINE | ID: mdl-30611200

ABSTRACT

BACKGROUND: Sequencing data has become a standard measure of diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation sequencing (CLIP-Seq) or RNA immunoprecipitation sequencing (RIP-Seq), DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase or MNase sequencing libraries. The processing of these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from each genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation and analysis. Such complex analysis often relies on multiple programs and is therefore a challenge for those without programming skills. RESULTS: We developed DEBrowser as an R Bioconductor project to interactively visualize every step of the differential analysis, without programming. The application provides a rich and interactive web-based graphical user interface built on R's Shiny infrastructure. DEBrowser allows users to visualize data with various types of graphs that can be explored further by selecting and re-plotting any desired subset of data. Using the visualization approaches provided, users can determine and correct technical variations such as batch effects and sequencing depth that affect differential analysis. We show DEBrowser's ease of use by reproducing the analysis of two previously published data sets. CONCLUSIONS: DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge.
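
The count table and the sequencing-depth correction mentioned in this abstract can be illustrated with a minimal sketch. It applies a generic counts-per-million (CPM) scaling and is not taken from DEBrowser; the `cpm` helper, the locus names and the sample names are hypothetical.

```python
def cpm(count_table):
    """Scale raw read counts to counts per million (CPM) for each sample.

    count_table maps locus -> {sample: raw read count}.
    """
    # Total reads per sample, i.e. the sequencing depth.
    depth = {}
    for counts in count_table.values():
        for sample, n in counts.items():
            depth[sample] = depth.get(sample, 0) + n
    # Divide each count by its sample's depth and rescale to one million.
    return {
        locus: {
            s: (1e6 * n / depth[s]) if depth[s] else 0.0
            for s, n in counts.items()
        }
        for locus, counts in count_table.items()
    }


# Hypothetical toy count table: sample_b was sequenced twice as deeply.
counts = {
    "locus1": {"sample_a": 10, "sample_b": 20},
    "locus2": {"sample_a": 30, "sample_b": 60},
}
normalized = cpm(counts)
```

After scaling, the two samples report the same value at each locus, so the twofold difference in raw counts is attributed to depth rather than to biology.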


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Genome, Human/genetics , Sequence Analysis, RNA/statistics & numerical data , Software , Chromatin/genetics , DNA/genetics , DNA-Binding Proteins/genetics , Data Interpretation, Statistical , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Sequence Analysis, DNA
4.
PLoS Comput Biol ; 14(4): e1006090, 2018 04.
Article in English | MEDLINE | ID: mdl-29684008

ABSTRACT

Genome-wide in vivo protein-DNA interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence-motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding-activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example, consider a ChIP experiment targeting protein X, which binds DNA at its cognate sites, but simultaneously interacts with four other proteins. Each of these proteins also binds to its own specific cognate sites along distant parts of the genome, a scenario consistent with the current view of transcriptional hubs and chromatin loops. Since ChIP will pull down all X-associated regions, the final reported data will be a union of five distinct sets of regions, each containing binding sites of one of the five proteins, respectively. Characterizing all five different motifs and the corresponding sets is important to interpret the ChIP experiment and ultimately, the role of X in regulation. We present DIVERSITY, which attempts exactly this: it partitions the data so that each partition can be characterized with its own de novo motif. DIVERSITY uses a Bayesian approach to identify the optimal number of motifs and the associated partitions, which together explain the entire dataset. This is in contrast to standard motif finders, which report motifs individually enriched in the data, but do not necessarily explain all reported regions. We show that the different motifs and associated regions identified by DIVERSITY give insights into the various complexes that may be forming along the chromatin, something that has so far not been attempted from ChIP data.
Webserver at http://diversity.ncl.res.in/; standalone (Mac OS X/Linux) from https://github.com/NarlikarLab/DIVERSITY/releases/tag/v1.0.0.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Software , Algorithms , Animals , Bayes Theorem , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Computational Biology , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Evolution, Molecular , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Neurons/metabolism , Nucleotide Motifs , Protein Binding , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism
5.
Brief Bioinform ; 19(5): 1069-1081, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334268

ABSTRACT

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction to make effective use of massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state-of-the-art methods were shown through summarizing eight representative motif-finding algorithms along with corresponding challenges, and introducing some important relative functions according to specific biological demands, including discriminative motif finding and cofactor motifs analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were showcased in support of future algorithm development.


Subject(s)
Algorithms , Gene Regulatory Networks , Software , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology/methods , DNA/genetics , DNA/metabolism , Humans , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/metabolism
6.
Nucleic Acids Res ; 43(6): e40, 2015 Mar 31.
Article in English | MEDLINE | ID: mdl-25564527

ABSTRACT

RNA-seq is a sensitive and accurate technique to compare steady-state levels of RNA between different cellular states. However, as it does not provide an account of transcriptional activity per se, other technologies are needed to more precisely determine acute transcriptional responses. Here, we have developed an easy, sensitive and accurate novel computational method, iRNA-seq, for genome-wide assessment of transcriptional activity based on analysis of intron coverage from total RNA-seq data. Comparison of the results derived from iRNA-seq analyses with parallel results derived using current methods for genome-wide determination of transcriptional activity, i.e. global run-on (GRO)-seq and RNA polymerase II (RNAPII) ChIP-seq, demonstrates that iRNA-seq provides similar results in terms of number of regulated genes and their fold change. However, unlike the current methods that are all very labor-intensive and demanding in terms of sample material and technologies, iRNA-seq is cheap and easy and requires very little sample material. In conclusion, iRNA-seq offers an attractive novel alternative to current methods for determination of changes in transcriptional activity at a genome-wide level.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Cell Line , Chromatin Immunoprecipitation/methods , Chromatin Immunoprecipitation/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Genome, Human , Humans , Introns , Sequence Analysis, RNA/statistics & numerical data
7.
Nucleic Acids Res ; 43(6): e38, 2015 Mar 31.
Article in English | MEDLINE | ID: mdl-25539918

ABSTRACT

Genome-wide chromatin immunoprecipitation (ChIP) studies have brought significant insight into the genomic localization of chromatin-associated proteins and histone modifications. The large amount of data generated by these analyses, however, requires approaches that enable rapid validation and analysis of biological relevance. Furthermore, there are still protein and modification targets that are difficult to detect using standard ChIP methods. To address these issues, we developed an immediate chromatin immunoprecipitation procedure which we call ZipChIP. ZipChIP significantly reduces the time and increases sensitivity, allowing for rapid screening of multiple loci. Here we describe how ZipChIP enables detection of histone modifications (H3K4 mono- and trimethylation) and two yeast histone demethylases, Jhd2 and Rph1, which were previously difficult to detect using standard methods. Furthermore, we demonstrate the versatility of ZipChIP by analyzing the enrichment of the histone deacetylase Sir2 at heterochromatin in yeast and enrichment of the chromatin remodeler, PICKLE, at euchromatin in Arabidopsis thaliana.


Subject(s)
Chromatin Immunoprecipitation/methods , Real-Time Polymerase Chain Reaction/methods , Actins/genetics , Actins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation/statistics & numerical data , DNA Helicases/genetics , DNA Helicases/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genes, Fungal , Genes, Plant , Histone Demethylases/genetics , Histone Demethylases/metabolism , Histones/genetics , Histones/metabolism , Jumonji Domain-Containing Histone Demethylases/genetics , Jumonji Domain-Containing Histone Demethylases/metabolism , Open Reading Frames , Promoter Regions, Genetic , Repressor Proteins/genetics , Repressor Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Silent Information Regulator Proteins, Saccharomyces cerevisiae/genetics , Silent Information Regulator Proteins, Saccharomyces cerevisiae/metabolism , Sirtuin 2/genetics , Sirtuin 2/metabolism
8.
Pac Symp Biocomput ; : 320-31, 2013.
Article in English | MEDLINE | ID: mdl-23424137

ABSTRACT

We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data. Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors, ChIPModule can efficiently identify groups of transcription factors, whose binding sites significantly co-occur in the ChIP-seq peak regions. By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors, and predicts new cofactors that are supported by literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.
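
The idea of binding sites that "significantly co-occur" in peak regions can be sketched with a generic hypergeometric test. This is an illustrative assumption and not ChIPModule's actual algorithm, which the abstract does not specify; the function name and the peak counts are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(n_peaks, n_a, n_b, n_both):
    """Upper-tail hypergeometric p-value P(X >= n_both).

    n_peaks: total number of peak regions
    n_a:     peaks containing a site for factor A
    n_b:     peaks containing a site for factor B
    n_both:  peaks containing sites for both factors
    The null model assumes sites of A and B fall in peaks independently.
    """
    upper = min(n_a, n_b)
    total = comb(n_peaks, n_b)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_a, k) * comb(n_peaks - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 peaks, each factor found in 5, all 5 shared.
p_value = cooccurrence_pvalue(10, 5, 5, 5)
```

A small p-value suggests the two factors share peaks more often than independent placement would predict; testing many factor pairs or groups would additionally call for multiple-testing correction.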


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Sequence Analysis/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism , Binding Sites/genetics , Computational Biology , Databases, Genetic/statistics & numerical data , Humans
9.
Brief Bioinform ; 14(2): 225-37, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22517426

ABSTRACT

Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.


Subject(s)
High-Throughput Nucleotide Sequencing/statistics & numerical data , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Consensus Sequence , DNA/genetics , DNA/metabolism , Gene Expression Profiling/statistics & numerical data , Humans
10.
PLoS One ; 7(1): e28272, 2012.
Article in English | MEDLINE | ID: mdl-22238575

ABSTRACT

Chromatin immunoprecipitation (ChIP) profiling detects in vivo protein-DNA binding, and has revealed a large combinatorial complexity in the binding of chromatin-associated proteins and their post-translational modifications. To fully explore the spatial and combinatorial patterns in ChIP-profiling data and detect potentially meaningful patterns, the areas of enrichment must be aligned and clustered, which is an algorithmically and computationally challenging task. We have developed CATCHprofiles, a novel tool for exhaustive pattern detection in ChIP profiling data. CATCHprofiles is built upon a computationally efficient implementation for the exhaustive alignment and hierarchical clustering of ChIP profiling data. The tool features a graphical interface for examination and browsing of the clustering results. CATCHprofiles requires no prior knowledge about functional sites, detects known binding patterns "ab initio", and enables the detection of new patterns from ChIP data at a high resolution, exemplified by the detection of asymmetric histone and histone modification patterns around H2A.Z-enriched sites. CATCHprofiles' capability for exhaustive analysis combined with its ease of use makes it an invaluable tool for explorative research based on ChIP profiling data. CATCHprofiles and the CATCH algorithm run on all platforms and are available for free through the CATCH website: http://catch.cmbi.ru.nl/. User support is available by subscribing to the mailing list catch-users@bioinformatics.org.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Data Interpretation, Statistical , Microarray Analysis/statistics & numerical data , Sequence Alignment , Software , Algorithms , Base Sequence , Cells, Cultured , Chromatin Immunoprecipitation/methods , Cluster Analysis , Computational Biology/methods , Efficiency , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Models, Biological , Molecular Sequence Data , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data
11.
Biostatistics ; 13(1): 113-28, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21914728

ABSTRACT

Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) is a powerful technique that is being used in a wide range of biological studies including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data and biases introduced by sequencing and/or genome mapping pose new challenges and call for effective methods and fast computer programs for statistical analysis. To systematically model ChIP-seq data, we build a dynamic signal profile for each chromosome and then model the profile using a fully Bayesian hidden Ising model. The proposed model naturally takes into account spatial dependency and global and local distributions of sequence tags. It can be used for one-sample and two-sample analyses. Through model diagnosis, the proposed method can detect falsely enriched regions caused by sequencing and/or mapping errors, which is usually not offered by the existing hypothesis-testing-based methods. The proposed method is illustrated using 3 transcription factor (TF) ChIP-seq data sets and 2 mixed ChIP-seq data sets and compared with 4 popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting TF binding sites with false discovery rate at a much lower level.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Models, Statistical , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Bayes Theorem , Binding Sites/genetics , Biotechnology , DNA/genetics , DNA/metabolism , Data Interpretation, Statistical , Databases, Nucleic Acid , Humans , Markov Chains , Transcription Factors/metabolism
12.
J Bioinform Comput Biol ; 9(2): 269-82, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21523932

ABSTRACT

New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As the size of the data increases, comparison of multiple experiments on different cell lines under different experimental conditions becomes a big challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. We specifically studied epigenetic regulation of breast cancer and the effect of estrogen using 50 ChIP-sequencing data sets from the Illumina Genome Analyzer II. First, we evaluate the correlation among different experiments focusing on the total number of reads in transcribed and promoter regions of the genome. Then, we adopt the method that is used to identify the most stable genes in RT-PCR experiments to understand background signal across all of the experiments and to identify the most variable transcribed and promoter regions of the genome. We observed that the most variable genes for transcribed regions and promoter regions are very distinct. Gene ontology and function enrichment analysis on these most variable genes demonstrate the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles over multiple experiments using real data points without any normalization among the samples.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line , Cell Line, Tumor , Computational Biology , Epigenesis, Genetic , Female , Genome, Human , Humans , Protein Binding
13.
Hum Genomics ; 5(2): 117-23, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21296745

ABSTRACT

Chromatin immunoprecipitation followed by massively parallel next-generation sequencing (ChIP-seq) is a valuable experimental strategy for assaying protein-DNA interaction over the whole genome. Many computational tools have been designed to find the peaks of the signals corresponding to protein binding sites. In this paper, three computational methods, ChIP-seq processing pipeline (spp), PeakSeq and CisGenome, used in ChIP-seq data analysis are reviewed. There is also a comparison of how they agree and disagree on finding peaks using the publicly available Signal Transducers and Activators of Transcription protein 1 (STAT1) and RNA polymerase II (PolII) datasets with corresponding negative controls.


Subject(s)
Chromatin Immunoprecipitation/methods , Sequence Analysis, DNA , Software , Chromatin Immunoprecipitation/statistics & numerical data , Humans , Protein Binding , RNA Polymerase II/genetics , Research Design , STAT1 Transcription Factor/genetics
14.
Biometrics ; 66(4): 1284-94, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20128774

ABSTRACT

ChIP-chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein-DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP-chip data is that the intensity measurements of probes are spatially correlated because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple, but powerful Bayesian hierarchical approach to ChIP-chip data through an Ising model with high-order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from Affymetrix and Agilent platforms, and compared with three alternative Bayesian methods, namely, Bayesian hierarchical model, hierarchical gamma mixture model, and Tilemap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods for the data from Affymetrix tiling arrays, but significantly outperforms the other three methods for the data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivities and false discovery rates under various scenarios.


Subject(s)
Bayes Theorem , Chromatin Immunoprecipitation/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Humans , Methods , Sensitivity and Specificity
15.
Genome Biol ; 10(12): R142, 2009.
Article in English | MEDLINE | ID: mdl-20028542

ABSTRACT

We present CSDeconv, a computational method that determines locations of transcription factor binding from ChIP-seq data. CSDeconv differs from prior methods in that it uses a blind deconvolution approach that allows closely-spaced binding sites to be called accurately. We apply CSDeconv to novel ChIP-seq data for DosR binding in Mycobacterium tuberculosis and to existing data for GABP in humans and show that it can discriminate binding sites separated by as few as 40 bp.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology/methods , Software , Transcription Factors/metabolism , Binding Sites/genetics , Humans , Mycobacterium tuberculosis/genetics , Transcription Factors/genetics
16.
Methods Mol Biol ; 521: 255-78, 2009.
Article in English | MEDLINE | ID: mdl-19563111

ABSTRACT

Chromatin immunoprecipitation (ChIP) is a widely used method to study the interactions between proteins and discrete chromosomal loci in vivo. Originally, ChIP was developed for analysis of protein associations with DNA sequences known or suspected to bind the protein of interest. The advent of DNA microarrays has enabled the identification of all DNA sequences enriched by ChIP, providing a genomic view of protein binding. This powerful approach, termed ChIP-chip, is broadly applicable and has been particularly valuable in DNA replication studies to map replication origins in Saccharomyces cerevisiae based on the association of replication proteins with these chromosomal elements. We present a detailed ChIP-chip protocol for S. cerevisiae that uses oligonucleotide DNA microarrays printed on polylysine-coated glass slides and can also be easily adapted for commercially available high-density tiling microarrays from NimbleGen. We also outline general protocols for data analysis; however, microarray data analyses usually must be tailored specifically for individual studies, depending on experimental design, microarray format, and data quality.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/metabolism , DNA Replication , DNA-Binding Proteins/metabolism , Oligonucleotide Array Sequence Analysis/methods , Chromatin Immunoprecipitation/statistics & numerical data , Cross-Linking Reagents , DNA, Fungal/biosynthesis , DNA, Fungal/isolation & purification , Data Interpretation, Statistical , Fluorescent Dyes , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Replication Origin , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
17.
Biometrics ; 65(4): 1087-95, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19210737

ABSTRACT

We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.


Subject(s)
Biometry/methods , Chromatin Immunoprecipitation/statistics & numerical data , Genomics/statistics & numerical data , Models, Statistical , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Base Sequence , Bayes Theorem , Binding Sites/genetics , DNA, Fungal/genetics , DNA, Fungal/metabolism , Markov Chains , Monte Carlo Method , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Shelterin Complex , Telomere-Binding Proteins/metabolism , Transcription Factors/metabolism
18.
PLoS Comput Biol ; 4(10): e1000201, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18927605

ABSTRACT

Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.


Subject(s)
Chromatin/genetics , Genome, Human , Models, Genetic , Models, Statistical , Artificial Intelligence , Chromatin/metabolism , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Enhancer Elements, Genetic , HeLa Cells , Histones/chemistry , Histones/genetics , Histones/metabolism , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Promoter Regions, Genetic , Protein Processing, Post-Translational , Transcription Initiation Site
19.
Curr Opin Biotechnol ; 19(1): 50-4, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18207385

ABSTRACT

Changes in transcript levels are assessed by microarray analysis on an individual basis, essentially resulting in long lists of genes that were found to have significantly changed transcript levels. However, in biology these changes do not occur as independent events as such lists suggest, but in a highly coordinated and interdependent manner. Understanding the biological meaning of the observed changes requires elucidating such biological interdependencies. The most common way to achieve this is to project the gene lists onto distinct biological processes often represented in the form of gene-ontology (GO) categories or metabolic and regulatory pathways as derived from literature analysis. This review focuses on different approaches and tools employed for this task, starting from GO-ranking methods, covering pathway mappings, finally converging on biological network analysis. A brief outlook of the application of such approaches to the newest microarray-based technologies (Chromatin-ImmunoPrecipitation, ChIP-on-chip) concludes the review.


Subject(s)
Oligonucleotide Array Sequence Analysis/statistics & numerical data , Biotechnology , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Data Interpretation, Statistical , Databases, Genetic
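
Illustrative note on record 19: the abstract describes projecting a gene list onto GO categories and ranking them, but does not name a specific test. One widely used choice is a hypergeometric over-representation test, sketched below as an assumption rather than as the review's method. The function names `hypergeom_enrichment_p` and `rank_categories` and the toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_category, n_list, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genome   : number of genes in the background
    n_category : background genes annotated to the category
    n_list     : size of the changed-gene list
    n_overlap  : changed genes that fall in the category
    """
    upper = min(n_category, n_list)
    total = comb(n_genome, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rank_categories(background, gene_list, categories):
    """Return (category, p-value) pairs sorted by ascending p-value."""
    background = set(background)
    gene_list = set(gene_list) & background
    ranked = []
    for name, members in categories.items():
        members = set(members) & background
        overlap = len(members & gene_list)
        p = hypergeom_enrichment_p(
            len(background), len(members), len(gene_list), overlap
        )
        ranked.append((name, p))
    return sorted(ranked, key=lambda pair: pair[1])


# Toy data (hypothetical): 10 background genes, 2 categories.
background = [f"g{i}" for i in range(10)]
categories = {
    "category_A": ["g0", "g1", "g2", "g3", "g4"],
    "category_B": ["g5", "g6", "g7", "g8", "g9"],
}
ranking = rank_categories(background, ["g0", "g1"], categories)
```

In this toy case both changed genes fall in `category_A`, so it ranks first. A real analysis would also correct for testing many categories, which this sketch omits.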
20.
Pac Symp Biocomput ; : 515-26, 2008.
Article in English | MEDLINE | ID: mdl-18229712

ABSTRACT

Whole genome tiling arrays at a user specified resolution are becoming a versatile tool in genomics. Chromatin immunoprecipitation on microarrays (ChIP-chip) is a powerful application of these arrays. Although there is an increasing number of methods for analyzing ChIP-chip data, perhaps the most simple and commonly used one, due to its computational efficiency, is testing with a moving average statistic. Current moving average methods assume exchangeability of the measurements within an array. They are not tailored to deal with the issues due to array designs such as overlapping probes that result in correlated measurements. We investigate the correlation structure of data from such arrays and propose an extension of the moving average testing via a robust and rapid method called CMARRT. We illustrate the pitfalls of ignoring the correlation structure in simulations and a case study. Our approach is implemented as an R package called CMARRT and can be used with any tiling array platform.


Subject(s)
Algorithms , Chromatin Immunoprecipitation/statistics & numerical data , Microarray Analysis/statistics & numerical data , Computational Biology , Data Interpretation, Statistical , Markov Chains , Models, Statistical , Regression Analysis , Software
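
Illustrative note on record 20: the abstract says that standard moving average tests treat probe measurements as exchangeable, and that overlapping probes produce correlated measurements. The sketch below shows why that matters, assuming a simple Gaussian approximation in which the variance of a window sum is scaled by the sum of pairwise autocorrelations. It is not the CMARRT package's code or API; the function names and the toy data are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rho = [1.0]
    for lag in range(1, max_lag + 1):
        cov = sum(
            (x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)
        ) / n
        rho.append(cov / var)
    return rho


def moving_average_z(x, half_window, rho=None):
    """Standardized moving-sum statistic for each probe position.

    rho=None assumes independent probes, so Var(sum) = w * sd^2.
    Otherwise Var(sum) = sd^2 * sum over probe pairs (j, k) of rho[|j - k|],
    which requires rho to cover lags 0..2 * half_window.
    """
    if rho is not None and len(rho) < 2 * half_window + 1:
        raise ValueError("rho must cover lags 0..2*half_window")
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    z = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        w = hi - lo
        total = sum(x[lo:hi]) - w * mean
        if rho is None:
            corr_sum = float(w)
        else:
            corr_sum = sum(
                rho[abs(j - k)] for j in range(w) for k in range(w)
            )
        # Guard against a degenerate (near-zero) variance estimate.
        z.append(total / (sd * math.sqrt(max(corr_sum, 1e-12))))
    return z


# Toy probe intensities (hypothetical) with a bump at positions 3..5.
probes = [0.1, 0.3, 0.2, 2.0, 2.2, 1.9, 0.2, 0.1, 0.0, 0.3]
z_naive = moving_average_z(probes, half_window=1)
z_adjusted = moving_average_z(probes, half_window=1, rho=[1.0, 0.5, 0.25])
```

With positively correlated probes the adjusted statistic is smaller in magnitude than the naive one, so ignoring the correlation overstates significance. In practice `rho` could be estimated with `autocorrelation(probes, 2 * half_window)` instead of being fixed by hand.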