Results 1-7 of 7
1.
Genome Res ; 17(6): 746-59, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567994

ABSTRACT

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minor splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.


Subject(s)
Chromosome Mapping , Exons , Genome, Human , Promoter Regions, Genetic , Quantitative Trait Loci , Transcription, Genetic/physiology , DNA, Complementary/genetics , Human Genome Project , Humans , Open Reading Frames
2.
Science ; 316(5830): 1484-8, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17510325

ABSTRACT

Significant fractions of eukaryotic genomes give rise to RNA, much of which is unannotated and has reduced protein-coding potential. The genomic origins and the associations of human nuclear and cytosolic polyadenylated RNAs longer than 200 nucleotides (nt) and whole-cell RNAs less than 200 nt were investigated in this genome-wide study. Subcellular addresses for nucleotides present in detected RNAs were assigned, and their potential processing into short RNAs was investigated. Taken together, these observations suggest a novel role for some unannotated RNAs as primary transcripts for the production of short RNAs. Three potentially functional classes of RNAs have been identified, two of which are syntenically conserved and correlate with the expression state of protein-coding genes. These data support a highly interleaved organization of the human transcriptome.


Subject(s)
Genome, Human , RNA Precursors/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA/genetics , Transcription, Genetic , Animals , Cell Line, Tumor , Cell Nucleus/metabolism , Cytosol/metabolism , Exons , Gene Expression , Genome , HeLa Cells , Humans , Mice , Promoter Regions, Genetic , RNA/metabolism , RNA Precursors/metabolism , Synteny , Terminator Regions, Genetic
3.
Nat Genet ; 38(10): 1151-8, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16951679

ABSTRACT

Many animal and plant genomes are transcribed much more extensively than current annotations predict. However, the biological function of these unannotated transcribed regions is largely unknown. Approximately 7% and 23% of the detected transcribed nucleotides during D. melanogaster embryogenesis map to unannotated intergenic and intronic regions, respectively. Based on computational analysis of coordinated transcription, we conservatively estimate that 29% of all unannotated transcribed sequences function as missed or alternative exons of well-characterized protein-coding genes. We estimate that 15.6% of intergenic transcribed regions function as missed or alternative transcription start sites (TSS) used by 11.4% of the expressed protein-coding genes. Identification of P element mutations within or near newly identified 5' exons provides a strategy for mapping previously uncharacterized mutations to their respective genes. Collectively, these data indicate that at least 85% of the fly genome is transcribed and processed into mature transcripts representing at least 30% of the fly genome.


Subject(s)
Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Transcription, Genetic , Amino Acid Sequence , Animals , DNA, Intergenic , Drosophila Proteins/genetics , Embryo, Nonmammalian , Exons , Genome, Insect , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis , Transcription Initiation Site
4.
Genome Res ; 15(7): 987-97, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15998911

ABSTRACT

Recently, we mapped the sites of transcription across approximately 30% of the human genome and elucidated the structures of several hundred novel transcripts. In this report, we describe a novel combination of techniques including the rapid amplification of cDNA ends (RACE) and tiling array technologies that was used to further characterize transcripts in the human transcriptome. This technical approach allows several important pieces of information to be gathered about each array-detected transcribed region, including strand of origin, start and termination positions, and the exonic structures of spliced and unspliced coding and noncoding RNAs. In this report, the structures of transcripts from 14 transcribed loci, representing both known genes and unannotated transcripts taken from the several hundred randomly selected unannotated transcripts described in our previous work, are presented as examples of the complex organization of the human transcriptome. As a consequence of this complexity, it is not unusual that a single base pair can be part of an intricate network of multiple isoforms of overlapping sense and antisense transcripts, the majority of which are unannotated. Some of these transcripts follow the canonical splicing rules, whereas others combine the exons of different genes or represent other types of noncanonical transcripts. These results have important implications concerning the correlation of genotypes to phenotypes, the regulation of complex interlaced transcriptional patterns, and the definition of a gene.


Subject(s)
Nucleic Acid Amplification Techniques , Oligonucleotide Array Sequence Analysis , Transcription, Genetic , Cell Line , Gene Expression Profiling , Humans , Jurkat Cells , Models, Genetic , Molecular Sequence Data , Nucleic Acid Amplification Techniques/methods , Oligonucleotide Array Sequence Analysis/methods , Protein Isoforms/genetics , Tumor Cells, Cultured
5.
Science ; 308(5725): 1149-54, 2005 May 20.
Article in English | MEDLINE | ID: mdl-15790807

ABSTRACT

Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.


Subject(s)
Chromosomes, Human/genetics , Genome, Human , RNA, Messenger/analysis , Transcription, Genetic , Cell Line , Cell Line, Tumor , Cell Nucleus/metabolism , Chromosomes, Human, Pair 13/genetics , Chromosomes, Human, Pair 14/genetics , Chromosomes, Human, Pair 19/genetics , Chromosomes, Human, Pair 20/genetics , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Chromosomes, Human, Pair 6/genetics , Chromosomes, Human, Pair 7/genetics , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Computational Biology , Cytosol/metabolism , DNA, Complementary , DNA, Intergenic , Exons , Female , Humans , Introns , Male , Molecular Sequence Data , Nucleic Acid Amplification Techniques , Oligonucleotide Array Sequence Analysis , Physical Chromosome Mapping , RNA Splicing
6.
Genome Res ; 14(12): 2424-9, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15574821

ABSTRACT

The completion of the mouse and other mammalian genome sequences will provide necessary, but not sufficient, knowledge for an understanding of much of mouse biology at the molecular level. As a requisite next step in this process, the genes in mouse and their structure must be elucidated. In particular, knowledge of the transcriptional start site of these genes will be necessary for further study of their regulatory regions. To assess the current state of mouse genome annotation to support this activity, we identified several hundred gene predictions in mouse with varying levels of supporting evidence and tested them using RACE-PCR. Modifications were made to the procedure allowing pooling of RNA samples, resulting in a scalable procedure. The results illustrate potential errors or omissions in the current 5' end annotations in 58% of the genes detected. In testing experimentally unsupported gene predictions, we were able to identify 58 that are not usually annotated as genes but produced spliced transcripts (approximately 25% success rate). In addition, in many genes we were able to detect novel exons not predicted by any gene prediction algorithms. In 19.8% of the genes detected in this study, multiple transcript species were observed. These data show an urgent need to provide direct experimental validation of gene annotations. Moreover, these results show that direct validation using RACE-PCR can be an important component of genome-wide validation. This approach can be a useful tool in the ongoing efforts to increase the quality of gene annotations, especially transcriptional start sites, in complex genomes.


Subject(s)
Genes/genetics , Genome , Mice/genetics , Open Reading Frames/genetics , Transcription Initiation Site , Animals , Base Sequence , CpG Islands/genetics , DNA Primers , DNA, Complementary/genetics , Exons/genetics , Molecular Sequence Data , Polymerase Chain Reaction/methods , Sequence Analysis, DNA
7.
Science ; 302(5653): 2115-7, 2003 Dec 19.
Article in English | MEDLINE | ID: mdl-14684820

ABSTRACT

Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion site sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.


Subject(s)
DNA Methylation , Genome, Plant , Sequence Analysis, DNA/methods , Zea mays/genetics , Algorithms , Chromosomes, Artificial, Bacterial , Cloning, Molecular , Computational Biology , Conserved Sequence , Contig Mapping , CpG Islands , DNA Transposable Elements , DNA, Chloroplast/genetics , DNA, Complementary , DNA, Mitochondrial/genetics , DNA, Plant/genetics , Databases, Nucleic Acid , Escherichia coli/genetics , Exons , Expressed Sequence Tags , Genes, Plant , Genomic Library , Oryza/genetics , Repetitive Sequences, Nucleic Acid , Retroelements , Reverse Transcriptase Polymerase Chain Reaction