Results 1 - 8 of 8
1.
Nucleic Acids Res ; 33(Database issue): D71-4, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608288

ABSTRACT

Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.


Subject(s)
Databases, Genetic , Expressed Sequence Tags/chemistry , Genomics , Animals , Base Sequence , Consensus Sequence , Databases, Genetic/trends , Eukaryotic Cells/metabolism , Genome , Humans , Internet , Sequence Analysis, DNA , Software
2.
Science ; 302(5653): 2118-20, 2003 Dec 19.
Article in English | MEDLINE | ID: mdl-14684821

ABSTRACT

Approximately 80% of the maize genome comprises highly repetitive sequences interspersed with single-copy, gene-rich sequences, and standard genome sequencing strategies are not readily adaptable to this type of genome. Methodologies that enrich for genic sequences might more rapidly generate useful results from complex genomes. Equivalent numbers of clones from maize selected by techniques called methylation filtering and High C0t selection were sequenced to generate approximately 200,000 reads (approximately 132 megabases), which were assembled into contigs. Combination of the two techniques resulted in a sixfold reduction in the effective genome size and a fourfold increase in the gene identification rate in comparison to a nonenriched library.


Subject(s)
Genes, Plant , Genome, Plant , Sequence Analysis, DNA/methods , Zea mays/genetics , Chromosomes, Plant/genetics , Cloning, Molecular , Computational Biology , Contig Mapping , DNA Methylation , DNA, Plant/genetics , Databases, Nucleic Acid , Expressed Sequence Tags , Gene Dosage , Gene Library , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Retroelements , Sequence Alignment , Transcription, Genetic
3.
Genome Res ; 11(4): 626-30, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11282978

ABSTRACT

An essential component of functional genomics studies is the sequence of DNA expressed in tissues of interest. To provide a resource of bovine-specific expressed sequence data and facilitate this powerful approach in cattle research, four normalized cDNA libraries were produced and arrayed for high-throughput sequencing. The libraries were made with RNA pooled from multiple tissues to increase efficiency of normalization and maximize the number of independent genes for which sequence data were obtained. Target tissues included those with highest likelihood to have impact on production parameters of animal health, growth, reproductive efficiency, and carcass merit. Success of normalization and inter- and intralibrary redundancy were assessed by collecting 6000-23,000 sequences from each of the libraries (68,520 total sequences deposited in GenBank). Sequence comparison and assembly of these sequences was performed in combination with 56,500 other bovine EST sequences present in the GenBank dbEST database to construct a cattle Gene Index (available from The Institute for Genomic Research at http://www.tigr.org/tdb/tgi.shtml). The 124,381 bovine ESTs present in GenBank at the time of the analysis form 16,740 assemblies that are listed and annotated on the Web site. Analysis of individual library sequence data indicates that the pooled-tissue approach was highly effective in preparing libraries for efficient deep sequencing.


Subject(s)
Gene Library , Oligonucleotide Array Sequence Analysis/methods , Animals , Cattle , Databases, Factual , Expressed Sequence Tags , Female , Fetus , Gene Expression Profiling/methods , Organ Specificity/genetics , Pregnancy
4.
Nucleic Acids Res ; 29(1): 159-64, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125077

ABSTRACT

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.


Subject(s)
Databases, Factual , Expressed Sequence Tags , Animals , Base Sequence , Genes/genetics , Humans , Internet , Molecular Sequence Data , Sequence Alignment , Sequence Homology, Nucleic Acid , Species Specificity
5.
Genome Biol ; 2(11): SOFTWARE0002, 2001.
Article in English | MEDLINE | ID: mdl-16173164

ABSTRACT

Microarray expression analysis is providing unprecedented data on gene expression in humans and mammalian model systems. Although such studies provide a tremendous resource for understanding human disease states, one of the significant challenges is cross-referencing the data derived from different species, across diverse expression analysis platforms, in order to properly derive inferences regarding gene expression and disease state. To address this problem, we have developed RESOURCERER, a microarray-resource annotation and cross-reference database built using the analysis of expressed sequence tags (ESTs) and gene sequences provided by the TIGR Gene Index (TGI) and TIGR Orthologous Gene Alignment (TOGA) databases [now called Eukaryotic Gene Orthologs (EGO)].


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Expressed Sequence Tags , Humans , Internet , Mice , Rats , Systems Integration
6.
Nucleic Acids Res ; 28(18): 3657-65, 2000 Sep 15.
Article in English | MEDLINE | ID: mdl-10982889

ABSTRACT

The vast body of Expressed Sequence Tag (EST) data in the public databases provides an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of genomic sequences. We have developed a rigorous protocol for reconstructing the sequences of transcribed genes from EST and gene sequence fragments. A key element in developing this protocol has been the evaluation of a number of sequence assembly programs to determine which most faithfully reproduce transcript sequences from EST data. The TIGR Gene Indices constructed using this protocol for human, mouse, rat and a variety of other plant and animal models have demonstrated their utility in a variety of applications and are freely available to the scientific research community.


Subject(s)
Expressed Sequence Tags , Sequence Analysis, DNA/methods , Algorithms , Animals , Consensus Sequence , Databases, Factual , Humans , Multigene Family , Rats
7.
Nat Genet ; 25(2): 239-40, 2000 Jun.
Article in English | MEDLINE | ID: mdl-10835646

ABSTRACT

Although sequencing of the human genome will soon be completed, gene identification and annotation remains a challenge. Early estimates suggested that there might be 60,000-100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome, although their data suggests there may be additional genes. The nearly 2,000,000 human ESTs in dbEST provide an important resource for gene identification and genome annotation, but these single-pass sequences must be carefully analysed to remove contaminating sequences, including those from genomic DNA, spurious transcription, and vector and bacterial sequences. We have developed a highly refined and rigorously tested protocol for cleaning, clustering and assembling EST sequences to produce high-fidelity consensus sequences for the represented genes (F.L. et al., manuscript submitted) and used this to create the TIGR Gene Indices, databases of expressed genes for human, mouse, rat and other species (http://www.tigr.org/tdb/tgi.html). Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.


Subject(s)
Expressed Sequence Tags , Genes , Genome, Human , Algorithms , Chromosomes, Human, Pair 22/genetics , Computational Biology , Consensus Sequence/genetics , Databases, Factual , Humans , Internet , Physical Chromosome Mapping , Reproducibility of Results , Software
8.
Nucleic Acids Res ; 28(1): 141-5, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592205

ABSTRACT

Expressed sequence tags (ESTs) have provided a first glimpse of the collection of transcribed sequences in a variety of organisms. However, a careful analysis of this sequence data can provide significant additional functional, structural and evolutionary information. Our analysis of the public EST sequences, available through the TIGR Gene Indices (TGI; http://www.tigr.org/tdb/tdb.html), is an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed for selected organisms by first clustering, then assembling EST and annotated gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, and to provide links between orthologous and paralogous genes.


Subject(s)
Databases, Factual , Expressed Sequence Tags , Base Sequence , DNA , Database Management Systems , Human Genome Project , Humans , Internet , Molecular Sequence Data