Results 1 - 17 of 17
1.
J Proteome Res ; 17(10): 3431-3444, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30125121

ABSTRACT

Cellular control of gene expression is a complex process that is subject to multiple levels of regulation, but ultimately it is the protein produced that determines the biosynthetic state of the cell. One way that a cell can regulate the protein output from each gene is by expressing alternate isoforms with distinct amino acid sequences. These isoforms may exhibit differences in localization and binding interactions that can have profound functional implications. High-throughput liquid chromatography tandem mass spectrometry proteomics (LC-MS/MS) relies on enzymatic digestion and has lower coverage and sensitivity than transcriptomic profiling methods such as RNA-seq. Digestion results in predictable fragmentation of a protein, which can limit the generation of peptides capable of distinguishing between isoforms. Here we exploit transcript-level expression from RNA-seq to set prior likelihoods and enable protein isoform abundances to be directly estimated from LC-MS/MS, an approach derived from the principle that most genes appear to be expressed as a single dominant isoform in a given cell type or tissue. Through this deep integration of RNA-seq and LC-MS/MS data from the same sample, we show that a principal isoform can be identified in >80% of gene products in homogeneous HEK293 cell culture and >70% of proteins detected in complex human brain tissue. We demonstrate that the incorporation of translatome data from ribosome profiling further refines this process. Defining isoforms in experiments with matched RNA-seq/translatome and proteomic data increases the functional relevance of such data sets and will further broaden our understanding of multilevel control of gene expression.
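To illustrate how transcript-level priors can guide isoform assignment from peptides, the sketch below runs a generic expectation-maximization loop. RNA-seq isoform fractions act as pseudocounts, and the intensity of a peptide shared by several isoforms is split among them in proportion to their current estimated abundance. This is not the published model. The function name `estimate_isoforms`, the `prior_weight` parameter, and the input layout (each peptide's intensity paired with the set of isoforms it maps to, all of which must appear in the prior) are hypothetical choices made for this example.

```python
def estimate_isoforms(peptides, prior, prior_weight=1.0, n_iter=100):
    """Hypothetical EM estimate of isoform fractions for a single gene.

    peptides: list of (intensity, set of isoform ids the peptide maps to)
    prior: dict of isoform id -> RNA-seq expression fraction (sums to 1)
    """
    theta = dict(prior)  # start from the transcript-level estimate
    for _ in range(n_iter):
        # Accumulator seeded with prior pseudocounts (RNA-seq evidence)
        counts = {k: prior_weight * p for k, p in prior.items()}
        for intensity, isoforms in peptides:
            z = sum(theta[k] for k in isoforms)
            for k in isoforms:
                # E-step: share the peptide intensity by current abundance,
                # falling back to an even split if every candidate is at zero
                share = theta[k] / z if z > 0 else 1.0 / len(isoforms)
                counts[k] += intensity * share
        # M-step: renormalize to fractions
        total = sum(counts.values())
        theta = {k: c / total for k, c in counts.items()}
    return theta


theta = estimate_isoforms(
    [(10.0, {"iso1", "iso2"}), (50.0, {"iso2"})],
    {"iso1": 0.8, "iso2": 0.2},
)
principal = max(theta, key=theta.get)  # candidate principal isoform
```

When every peptide is shared, the estimate stays at the RNA-seq prior. A strongly detected isoform-unique peptide, as in the usage line, can override a weak prior.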


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Proteome/metabolism , Proteomics/methods , Algorithms , Alternative Splicing , Chromatography, Liquid/methods , HEK293 Cells , Humans , Protein Biosynthesis/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteome/genetics , Reproducibility of Results , Ribosomes/genetics , Ribosomes/metabolism , Tandem Mass Spectrometry/methods
2.
J Extracell Vesicles ; 4: 27497, 2015.
Article in English | MEDLINE | ID: mdl-26320941

ABSTRACT

The large diversity and volume of extracellular RNA (exRNA) data that will form the basis of the exRNA Atlas generated by the Extracellular RNA Communication Consortium pose a substantial data integration challenge. Here we present the strategy that is being implemented by the exRNA Data Management and Resource Repository, which employs metadata, biomedical ontologies and Linked Data technologies, such as Resource Description Framework, to integrate a diverse set of exRNA profiles into an exRNA Atlas and enable integrative exRNA analysis. We focus on the following three specific data integration tasks: (a) selection of samples from a virtual biorepository for exRNA profiling and for inclusion in the exRNA Atlas; (b) retrieval of a data slice from the exRNA Atlas for integrative analysis and (c) interpretation of exRNA analysis results in the context of pathways and networks. As exRNA profiling gains wide adoption in the research community, we anticipate that the strategies discussed here will increasingly be required to enable data reuse and to facilitate integrative analysis of exRNA data.

3.
Nat Neurosci ; 17(11): 1491-9, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25349915

ABSTRACT

The immense intercellular and intracellular heterogeneity of the CNS presents major challenges for high-throughput omic analyses. Transcriptional, translational and post-translational regulatory events are localized to specific neuronal cell types or subcellular compartments, resulting in discrete patterns of protein expression and activity. A spatial and quantitative knowledge of the neuroproteome is therefore critical to understanding both normal and pathological aspects of the functional genomics and anatomy of the CNS. Improvements in mass spectrometry allow the profiling of proteins at a sufficient depth to complement results from high-throughput genomic and transcriptomic assays. However, there are challenges in integrating proteomic data with other data modalities and even greater challenges in obtaining comprehensive neuroproteomic data with cell-type specificity. Here we discuss how proteomics should be exploited to enhance high-throughput functional genomic analysis by tighter integration of data analyses. We also discuss experimental strategies to achieve finer cellular and subcellular resolution in transcriptomic and proteomic studies of neural tissues.


Subject(s)
Central Nervous System/metabolism , Proteins/metabolism , Proteomics , Animals , Central Nervous System/anatomy & histology , Gene Expression Profiling , Genomics/methods , Humans , Mass Spectrometry , Proteins/genetics , Proteomics/methods
5.
Genome Biol ; 11(10): R104, 2010.
Article in English | MEDLINE | ID: mdl-20964841

ABSTRACT

We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including eight cancers with and without known rearrangements.


Subject(s)
Computational Biology/methods , Gene Fusion , Neoplasms/genetics , Sequence Analysis, RNA/methods , Base Sequence , Cell Line, Tumor , Expressed Sequence Tags , Gene Expression Profiling , Gene Rearrangement , Humans , Male , Molecular Sequence Data , Prostatic Neoplasms/genetics , RNA, Neoplasm/genetics , Reverse Transcriptase Polymerase Chain Reaction
6.
Nucleic Acids Res ; 35(15): e99, 2007.
Article in English | MEDLINE | ID: mdl-17686789

ABSTRACT

A generic DNA microarray design applicable to any species would greatly benefit comparative genomics. We have addressed the feasibility of such a design by leveraging the high feature densities and relatively unbiased nature of genomic tiling microarrays. Specifically, we first divided each Homo sapiens RefSeq-derived gene's spliced nucleotide sequence into all of its possible contiguous 25 nt subsequences. For each of these 25 nt subsequences, we searched a recent human transcript mapping experiment's probe design for the 25 nt probe sequence having the fewest mismatches with the subsequence, but that did not match the subsequence exactly. Signal intensities measured with each gene's nearest-neighbor features were subsequently averaged to predict their gene expression levels in each of the experiment's thirty-three hybridizations. We examined the fidelity of this approach in terms of both sensitivity and specificity for detecting actively transcribed genes, for transcriptional consistency between exons of the same gene, and for reproducibility between tiling array designs. Taken together, our results provide proof-of-principle for probing nucleic acid targets with off-target, nearest-neighbor features.
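The probe-matching step described above can be sketched as a brute-force search. This minimal version assumes that every probe has the same length `k` as the subsequences and uses Hamming distance (mismatches only, no gaps) as the similarity measure. It also stores probe signal in a plain dict. The function names are illustrative, and a linear scan like this is far too slow for a genome-scale probe set, where some form of index would be needed.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_probe(subseq, probes):
    """Probe with the fewest mismatches to subseq, excluding exact matches."""
    best, best_d = None, None
    for p in probes:
        d = hamming(subseq, p)
        if d == 0:
            continue  # exact matches are deliberately skipped
        if best_d is None or d < best_d:
            best, best_d = p, d
    return best, best_d


def predict_expression(spliced_seq, probe_signal, k=25):
    """Average the signal of each k-mer's nearest-neighbor probe."""
    signals = []
    for i in range(len(spliced_seq) - k + 1):
        probe, _ = nearest_neighbor_probe(spliced_seq[i:i + k], probe_signal)
        if probe is not None:
            signals.append(probe_signal[probe])
    return sum(signals) / len(signals) if signals else float("nan")
```

With toy 4 nt probes `{"ACGT": 10.0, "ACGA": 2.0, "TTTT": 5.0}`, the subsequence `ACGT` skips its exact match and is assigned the `ACGA` signal.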


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Probes/chemistry , Genome, Human , Humans , Sequence Analysis, DNA , Transcription, Genetic
7.
Genome Res ; 17(6): 669-81, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567988

ABSTRACT

While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.


Subject(s)
Chromosome Mapping , Genome, Human , Human Genome Project , Open Reading Frames/genetics , RNA, Untranslated/genetics , Chromosome Mapping/history , History, 19th Century , History, 20th Century , History, 21st Century , Human Genome Project/history , Human Genome Project/organization & administration , Humans
8.
Genome Res ; 17(6): 732-45, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567993

ABSTRACT

For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.


Subject(s)
Chromosome Mapping , Gene Expression Profiling , Gene Expression Regulation/physiology , Genome, Human/physiology , Quantitative Trait Loci/genetics , Transcription, Genetic/physiology , Base Sequence , Humans , Molecular Sequence Data
9.
Genome Res ; 17(6): 898-909, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17568005

ABSTRACT

Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.
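Merging the two target lists amounts to taking the union of their genomic intervals. The minimal sketch below assumes that each method reports closed `(start, end)` intervals on a single chromosome, and that overlapping or touching intervals should be fused into one region. The function name and coordinates are illustrative.

```python
def merge_regions(*region_sets):
    """Union of closed (start, end) intervals from several methods."""
    intervals = sorted(iv for regions in region_sets for iv in regions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


chip_chip = [(100, 200), (500, 600)]
chip_pet = [(150, 250), (800, 900)]
combined = merge_regions(chip_chip, chip_pet)
```

For real data the intervals would first be grouped by chromosome, and each group would be merged separately.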


Subject(s)
Chromatin Immunoprecipitation , Genome, Human , Microfluidic Analytical Techniques , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Animals , Binding Sites/genetics , HeLa Cells , Humans , STAT1 Transcription Factor/genetics
10.
Bioinformatics ; 23(8): 988-97, 2007 Apr 15.
Article in English | MEDLINE | ID: mdl-17387113

ABSTRACT

MOTIVATION: Increases in microarray feature density allow the construction of so-called tiling microarrays. These arrays, or sets of arrays, contain probes targeting regions of sequenced genomes at regular genomic intervals. The unbiased nature of this approach allows for the identification of novel transcribed sequences, the localization of transcription factor binding sites (ChIP-chip), and high resolution comparative genomic hybridization, among other uses. These applications are quickly growing in popularity as tiling microarrays become more affordable. To reach maximum utility, the tiling microarray platform needs to be developed to the point that 1 nt resolution is achieved and that we have confidence in individual measurements taken at such fine resolution. Any biases in tiling array signals must be systematically removed to achieve this goal. RESULTS: Towards this end, we investigated the effect of probe sequence composition on the efficacy of tiling microarrays for identifying novel transcription and transcription factor binding sites. We found that intensities are highly sequence dependent and that this dependence can greatly influence results. We developed three metrics for assessing this sequence dependence and used them to evaluate existing sequence-based normalizations from the tiling microarray literature. In addition, we applied three new techniques for addressing this problem: one method, adapted from similar work on GeneChip brand microarrays, is based on modeling array signal as a linear function of probe sequence; the second method extends this approach by iterative weighting and re-fitting of the model; and the third technique extends the popular quantile normalization algorithm for between-array normalization to probe sequence space. These three methods compare favorably with existing strategies, based on the metrics defined here. AVAILABILITY: http://tiling.gersteinlab.org/sequence_effects/
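The first normalization idea, modeling signal as a linear function of probe sequence, can be illustrated in its simplest form. This sketch is a deliberate simplification. It regresses log intensity on a single sequence feature, the probe's G+C count, by ordinary least squares and then removes the fitted sequence effect. A fuller model would typically use position-specific base indicators. The function names are hypothetical.

```python
def gc_count(probe):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in probe.upper())


def normalize_by_gc(probes, log_signals):
    """Remove a linear G+C effect from log intensities (illustrative only)."""
    xs = [gc_count(p) for p in probes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(log_signals) / n
    # Ordinary least squares fit: log_signal ~ intercept + slope * gc_count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log_signals))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    # Residual after removing the sequence-predicted signal, re-centered on the mean
    return [y - (intercept + slope * x) + mean_y for x, y in zip(xs, log_signals)]


corrected = normalize_by_gc(["AAAA", "AAGC", "GCGC"], [1.0, 2.0, 3.0])
```

In the usage line all of the variation is explained by G+C content, so every corrected value collapses to the mean log intensity.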


Subject(s)
Algorithms , DNA Probes/genetics , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/standards , Reproducibility of Results , Sensitivity and Specificity
11.
Genome Res ; 17(6): 886-97, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17119069

ABSTRACT

Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. In addition, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.
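Segmenting a tiling array into transcribed and nontranscribed regions is often done by thresholding probe scores and then joining nearby positive probes. The sketch below uses the common maxgap/minrun heuristic as an assumed stand-in for the scoring and segmentation schemes developed in the paper. Positive probes closer than `maxgap` bases are joined, and segments spanning fewer than `minrun` bases are discarded. Positions are assumed to be sorted, and segment length is measured from the first to the last positive probe.

```python
def segment_transfrags(positions, scores, threshold, maxgap, minrun):
    """Return (start, end) spans of joined above-threshold probes."""
    regions = []
    start = end = None
    for pos, score in zip(positions, scores):
        if score < threshold:
            continue  # only the distance between positive probes matters
        if start is not None and pos - end <= maxgap:
            end = pos  # close to the previous positive probe: extend the run
        else:
            # Gap too large: keep the current run if it is long enough
            if start is not None and end - start >= minrun:
                regions.append((start, end))
            start = end = pos
    if start is not None and end - start >= minrun:
        regions.append((start, end))
    return regions


transfrags = segment_transfrags(
    positions=[0, 10, 20, 30, 100, 110, 300],
    scores=[5, 5, 0, 5, 5, 5, 5],
    threshold=1.0, maxgap=20, minrun=10,
)
```

In this toy run, the isolated positive probe at position 300 is dropped because it cannot reach the minimum run length.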


Subject(s)
Chromosome Mapping , Chromosomes, Human, Pair 22/genetics , Genome, Human/physiology , Oligonucleotide Array Sequence Analysis , Quantitative Trait Loci/physiology , Transcription, Genetic/physiology , Cell Line , Gene Expression Profiling , Humans
12.
Bioinformatics ; 22(24): 3016-24, 2006 Dec 15.
Article in English | MEDLINE | ID: mdl-17038339

ABSTRACT

MOTIVATION: Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion mainly based on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for doing this. It has the advantage of explicitly incorporating validated biological knowledge into the model and allowing for formal training and testing. METHODOLOGY: In particular, we use a hidden Markov model (HMM) framework, which is capable of explicitly modeling the dependency between neighboring probes and whose extended version (the generalized HMM) also allows explicit description of state duration density. We introduce a formal definition of the tiling-array analysis problem, and explain how we can use this to describe sampling small genomic regions for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotation or known promoters as positives for transcription or ChIP-chip data, respectively). RESULTS: For the practical sampling and training strategies, we show how the size and noise in the validated training data affect the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework is able to efficiently process tiling array data as well as or better than previous approaches.
For the idealized sampling strategies, we show how we can assess their performance in a simulation framework and how a maximum entropy approach, which samples sub-regions with very different signal intensities, gives the maximally performing gold-standard. This latter result has strong implications for the optimum way medium-scale validation experiments should be carried out to verify the results of the genome-scale tiling array experiments.
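To make the HMM framework concrete, the sketch below uses the Viterbi algorithm to decode the most likely inactive/active state path for a run of probe signals. It assumes two states with Gaussian emissions and hand-set parameters. The supervised training on validated regions and the state-duration modeling of the generalized HMM described above are not shown, and all transition and initial probabilities must be nonzero because the computation is done in log space.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(signal, means, sds, trans, init):
    """Most likely state path; state 0 = inactive, 1 = active (by convention)."""
    n_states = len(means)
    logv = [math.log(init[s]) + gaussian_logpdf(signal[0], means[s], sds[s])
            for s in range(n_states)]
    back = []  # back[t][j]: best previous state for state j at position t + 1
    for x in signal[1:]:
        ptr, new = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: logv[i] + math.log(trans[i][j]))
            ptr.append(best_i)
            new.append(logv[best_i] + math.log(trans[best_i][j])
                       + gaussian_logpdf(x, means[j], sds[j]))
        back.append(ptr)
        logv = new
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: logv[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


path = viterbi(
    [0.1, -0.2, 0.0, 3.1, 2.9, 3.2, 0.1, 0.0],
    means=[0.0, 3.0], sds=[1.0, 1.0],
    trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5],
)
```

The sticky transition matrix penalizes state switches, which is how the model captures dependency between neighboring probes rather than classifying each probe independently.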


Subject(s)
Chromosome Mapping/methods , Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Transcription, Genetic/physiology , Artificial Intelligence , Biology/methods , Information Storage and Retrieval , Markov Chains , Pattern Recognition, Automated/methods , Transcription Factors/genetics
13.
Methods Enzymol ; 411: 282-311, 2006.
Article in English | MEDLINE | ID: mdl-16939796

ABSTRACT

A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In doing so, we assume that the reader has little or no prior training in statistics. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.
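Quantile normalization is one widely used intensity-normalization technique for putting several arrays on a common scale. A minimal sketch follows. It assumes that every array measures the same probes in the same order. As a simplification, tied values are ranked by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution (illustrative sketch)."""
    n_arrays = len(arrays)
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest value across arrays
    reference = [sum(s[i] for s in sorted_arrays) / n_arrays for i in range(n_probes)]
    normalized = []
    for a in arrays:
        # Replace each value by the reference value of the same rank
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization both arrays contain exactly the values 1.5, 3.5 and 5.5, each at the positions of its original ranks.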


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Protein Array Analysis/methods , Protein Array Analysis/statistics & numerical data , Animals , Data Interpretation, Statistical , Humans
14.
Genome Res ; 16(2): 271-81, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16365382

ABSTRACT

A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.
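A simplified version of the tile-path problem can be solved with dynamic programming. This sketch assumes a per-base repeat mask. It defines the objective as maximizing the number of bases covered by non-overlapping tiles whose lengths fall in `[min_len, max_len]` and that contain no repeat-masked bases. The paper's exact objective and its oligonucleotide similarity search are not reproduced. Running time is O(n * (max_len - min_len)), which is linear in sequence length for a fixed range of tile sizes. The function name is hypothetical.

```python
def optimal_tile_path(is_repeat, min_len, max_len):
    """Max bases coverable by non-overlapping, repeat-free tiles, plus the tiles."""
    n = len(is_repeat)
    best = [0] * (n + 1)       # best[i]: max bases covered within sequence[:i]
    choice = [None] * (n + 1)  # length of the tile ending at i, or None
    run = 0                    # consecutive non-repeat bases ending at i - 1
    for i in range(1, n + 1):
        run = 0 if is_repeat[i - 1] else run + 1
        best[i] = best[i - 1]  # option: leave base i - 1 uncovered
        # A tile of this length ending at i lies entirely in non-repeat sequence
        for length in range(min_len, min(max_len, run) + 1):
            if best[i - length] + length > best[i]:
                best[i] = best[i - length] + length
                choice[i] = length
    # Trace back the chosen tiles as half-open (start, end) intervals
    tiles, i = [], n
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            tiles.append((i - choice[i], i))
            i -= choice[i]
    return best[n], tiles[::-1]


mask = [False] * 10 + [True] * 3 + [False] * 4
covered, tiles = optimal_tile_path(mask, min_len=3, max_len=5)
```

In this example, the 10-base unique stretch is covered completely by three tiles, and the 4-base stretch after the repeat takes a single tile.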


Subject(s)
Algorithms , Gene Expression Profiling , Genome, Human , Interspersed Repetitive Sequences , Oligonucleotide Array Sequence Analysis , Animals , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genome, Human/genetics , Humans , Interspersed Repetitive Sequences/genetics , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/standards , Reproducibility of Results , Sensitivity and Specificity
15.
Trends Genet ; 21(8): 466-75, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15979196

ABSTRACT

Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Algorithms , Base Sequence , Computational Biology , DNA/genetics , Humans , Molecular Probes , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Transcription, Genetic
16.
Science ; 306(5705): 2242-6, 2004 Dec 24.
Article in English | MEDLINE | ID: mdl-15539566

ABSTRACT

Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.


Subject(s)
Genome, Human , Oligonucleotide Array Sequence Analysis/methods , Transcription, Genetic , Animals , Base Sequence , Computational Biology , Conserved Sequence , CpG Islands , DNA, Complementary , DNA, Intergenic , Databases, Genetic , Exons , Humans , Introns , Mice , Nucleic Acid Hybridization , Oligonucleotide Probes , Proteins/chemistry , Proteins/genetics , RNA, Messenger/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sequence Homology, Nucleic Acid
17.
Dev Cell ; 6(6): 791-800, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15177028

ABSTRACT

Many anatomical differences exist between males and females; these are manifested on a molecular level by different hormonal environments. Although several molecular differences in adult tissues have been identified, a comprehensive investigation of the gene expression differences between males and females has not been performed. We surveyed the expression patterns of 13,977 mouse genes in male and female hypothalamus, kidney, liver, and reproductive tissues. Extensive differential gene expression was observed not only in the reproductive tissues, but also in the kidney and liver. The differentially expressed genes function in drug and steroid metabolism and osmotic regulation, or have as yet unresolved cellular roles. In contrast, very few molecular differences were observed between the male and female hypothalamus in both mice and humans. We conclude that there are persistent differences in gene expression between adult males and females. These molecular differences have important implications for the physiological differences between males and females.


Subject(s)
Gene Expression Regulation/genetics , Genitalia/metabolism , Kidney/metabolism , Liver/metabolism , Pharmaceutical Preparations/metabolism , Sex Characteristics , Animals , DNA/analysis , DNA/genetics , Female , Gene Expression Profiling , Genitalia/cytology , Humans , Hypothalamus/cytology , Hypothalamus/metabolism , Kidney/cytology , Liver/cytology , Male , Metabolic Clearance Rate/genetics , Mice , Oligonucleotide Array Sequence Analysis , Organ Specificity , Ovary/cytology , Ovary/metabolism , Receptors, Cell Surface/genetics , Serpins , Testis/cytology , Testis/metabolism , Transcortin
...