Search | VHL Regional Portal

Genome annotation assessment in Drosophila melanogaster.

Reese, M G; Hartzell, G; Harris, N L; Ohler, U; Abril, J F; Lewis, S E.

Genome Res ; 10(4): 483-501, 2000 Apr.

Article in English | MEDLINE | ID: mdl-10779488

ABSTRACT

Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for >40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only identified by the ab initio techniques. This experiment also presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region. We discovered that the promoter predictors' high false-positive rates make their predictions difficult to use. 
Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications but discovers less than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that contributes to the value of ongoing large-scale annotation projects and should guide further research in genome informatics.
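The nucleotide-level result quoted in this abstract (over 95% of coding nucleotides correctly identified) can be read as a sensitivity figure. The sketch below shows one way such a measure might be computed. It assumes exons are given as half-open (start, end) intervals; the function names and toy coordinates are hypothetical and are not taken from the GASP evaluation itself.

```python
def coding_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Sensitivity is TP / (TP + FN): the fraction of annotated coding
    bases that were predicted. Specificity is taken here as
    TP / (TP + FP), the fraction of predicted coding bases that are
    annotated (an assumption about the convention in use).
    """
    truth = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_pos = len(truth & predicted)
    sensitivity = true_pos / len(truth) if truth else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
annotated = [(100, 200), (300, 400)]   # standard set: 200 coding bases
predicted = [(100, 190), (310, 420)]   # gene-finder output: 200 bases
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Expanding intervals into sets keeps the sketch short; for a region of nearly 3 Mb an interval-overlap computation would be more economical.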

Subject(s)

Computational Biology/methods , Drosophila melanogaster/genetics , Genes, Insect , Genome , Alcohol Dehydrogenase/chemistry , Alcohol Dehydrogenase/genetics , Animals , DNA, Complementary , Databases, Factual/trends , Drosophila melanogaster/enzymology , Expressed Sequence Tags , Promoter Regions, Genetic/genetics , Sequence Homology, Amino Acid

An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S.

Genetics ; 153(1): 179-219, 1999 Sep.

Article in English | MEDLINE | ID: mdl-10471707

ABSTRACT

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.

"Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne 1926)
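The density and percentage figures in this abstract follow from the feature counts it reports. The arithmetic check below assumes a region length of 2.9 Mb, as stated in the article title (the abstract says "nearly 3 Mb", so the results are approximate); the variable and function names are illustrative.

```python
REGION_BP = 2_900_000          # 2.9 Mb, from the article title (approximate)
PROTEIN_CODING_GENES = 218     # predicted protein-coding genes
TRANSPOSABLE_ELEMENTS = 17     # transposable element sequences
GENES_WITH_EST_MATCH = 95      # genes matching a Drosophila EST
GENES_WITH_HOMOLOGS = 144      # genes similar to proteins in other organisms


def kb_per_feature(region_bp, count):
    """Average spacing in kilobases between features of one type."""
    return region_bp / count / 1000


gene_spacing_kb = kb_per_feature(REGION_BP, PROTEIN_CODING_GENES)
te_spacing_kb = kb_per_feature(REGION_BP, TRANSPOSABLE_ELEMENTS)
est_percent = 100 * GENES_WITH_EST_MATCH / PROTEIN_CODING_GENES
homolog_percent = 100 * GENES_WITH_HOMOLOGS / PROTEIN_CODING_GENES

print(f"one gene every {gene_spacing_kb:.1f} kb")
print(f"one transposable element every {te_spacing_kb:.1f} kb")
print(f"EST matches: {est_percent:.0f}%, homologs: {homolog_percent:.0f}%")
```

Under this assumption the rounded values agree with the abstract: about 13 kb per gene, about 171 kb per transposable element, 44% and 66%.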

Subject(s)

Alcohol Dehydrogenase/genetics , Drosophila melanogaster/genetics , Genes, Insect/genetics , Genome , Physical Chromosome Mapping , Animals , Base Composition , Chromosome Breakage/genetics , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Evolution, Molecular , Expressed Sequence Tags , Gene Duplication , Genes, Overlapping/genetics , Mutation , Phenotype , RNA, Transfer/genetics , Sequence Analysis, DNA , Transcription, Genetic/genetics

A computer program for aligning a cDNA sequence with a genomic DNA sequence.

Florea, L; Hartzell, G; Zhang, Z; Rubin, G M; Miller, W.

Genome Res ; 8(9): 967-74, 1998 Sep.

Article in English | MEDLINE | ID: mdl-9750195

ABSTRACT

We address the problem of efficiently aligning a transcribed and spliced DNA sequence with a genomic sequence containing that gene, allowing for introns in the genomic sequence and a relatively small number of sequencing errors. A freely available computer program, described herein, solves the problem for a 100-kb genomic sequence in a few seconds on a workstation.

Subject(s)

DNA, Complementary/genetics , DNA/genetics , Genome , Sequence Alignment/methods , Software , Algorithms , Animals , Computational Biology , Drosophila melanogaster/genetics , Expressed Sequence Tags , Humans , Mice , RNA, Messenger/genetics , Sequence Homology, Nucleic Acid

The nucleotide sequence of Saccharomyces cerevisiae chromosome V.

Dietrich, F S; Mulligan, J; Hennessy, K; Yelton, M A; Allen, E; Araujo, R; Aviles, E; Berno, A; Brennan, T; Carpenter, J; Chen, E; Cherry, J M; Chung, E; Duncan, M; Guzman, E; Hartzell, G; Hunicke-Smith, S; Hyman, R W; Kayser, A; Komp, C; Lashkari, D; Lew, H; Lin, D; Mosedale, D; Davis, R W.

Nature ; 387(6632 Suppl): 78-81, 1997 May 29.

Article in English | MEDLINE | ID: mdl-9169868

ABSTRACT

Here we report the sequence of 569,202 base pairs of Saccharomyces cerevisiae chromosome V. Analysis of the sequence revealed a centromere, two telomeres and 271 open reading frames (ORFs) plus 13 tRNAs and four small nuclear RNAs. There are two Ty1 transposable elements, each of which contains an ORF (included in the count of 271). Of the ORFs, 78 (29%) are new, 81 (30%) have potential homologues in the public databases, and 112 (41%) are previously characterized yeast genes.

Subject(s)

Chromosomes, Fungal , Saccharomyces cerevisiae/genetics , Base Sequence , DNA, Fungal , Molecular Sequence Data

Overview of combustion toxicology.

Hartzell, G E.

Toxicology ; 115(1-3): 7-23, 1996 Dec 31.

Article in English | MEDLINE | ID: mdl-9016738

ABSTRACT

Combustion toxicology embraces the nature, the severity, and the time course of adverse effects produced upon exposure to fire-generated toxic species. These species usually consist of narcotic toxicants or asphyxiants, along with those which may produce sensory/upper respiratory and even pulmonary irritation. They all act in concert to compromise the vital systems of those exposed, leading to incapacitation and death generally through various hypoxia-producing mechanisms. Some fire gas toxicants are material-dependent, some are largely dependent on the combustion conditions of the fire, while others may be dependent on both. Since the rates of generation of fire toxicants are powered by the energy release of the fire, the development of toxic hazard is also dependent on the fire itself.

Subject(s)

Fires , Smoke/adverse effects , Animals , Asphyxia/chemically induced , Humans , Irritants , Smoke Inhalation Injury , Toxicity Tests/methods

Laparoscopic versus conventional appendectomy.

Bonanni, F; Reed, J; Hartzell, G; Trostle, D; Boorse, R; Gittleman, M; Cole, A.

J Am Coll Surg ; 179(3): 273-8, 1994 Sep.

Article in English | MEDLINE | ID: mdl-8069421

ABSTRACT

BACKGROUND: The results of recent series suggest remarkable advantages of laparoscopic appendectomy over the conventional open appendectomy. To determine if clear advantages could be established, the charts of all patients admitted to our institution with a presumptive diagnosis of acute appendicitis and subsequent appendectomy were retrospectively reviewed. STUDY DESIGN: From January 1990 through June 1992, there were 300 conventional open appendectomies and 66 laparoscopic appendectomies performed. Data from both groups were compared with respect to anesthesia time, operative time, postoperative morbidity, postoperative pain, time to regular diet, hospitalization period, cost, and return to normal activities. RESULTS: There were no significant differences between the laparoscopic and open appendectomy groups with respect to operative complications, postoperative morbidity, pain medication requirements, and time to regular diet. There were significantly longer anesthesia times, operative times, and operating room costs in the laparoscopic group. For complicated appendicitis, the laparoscopic technique resulted in infectious complications that required readmission in 45.5 percent of the patients. CONCLUSIONS: Laparoscopic appendectomy is a safe alternative to conventional open appendectomy for simple acute appendicitis. However, laparoscopic appendectomy is not superior to the conventional method with regard to operative time, postoperative morbidity, pain medication requirements, time to regular diet, length of stay, cost, or return to normal activity. Laparoscopic appendectomy may be contraindicated in complicated appendicitis (gangrene, perforated with abscess, or peritonitis) due to an increased rate of infectious complications requiring readmission.

Subject(s)

Appendectomy/methods , Appendicitis/surgery , Laparoscopy , Acute Disease , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Humans , Infant , Male , Middle Aged , Postoperative Complications , Retrospective Studies

DNA sequence confidence estimation.

Lipshutz, R J; Taverner, F; Hennessy, K; Hartzell, G; Davis, R.

Genomics ; 19(3): 417-24, 1994 Feb.

Article in English | MEDLINE | ID: mdl-8188283

ABSTRACT

A significant bottleneck in the current DNA sequencing process is the manual editing of trace data generated by automated DNA sequencers. This step is used to correct base calls and to associate to each base call a confidence level. The confidence levels are used in the assembly process to determine overlaps and to resolve discrepancies in determining the consensus sequence. This single step may cost as much as 4 to 8 cents per finished base. We report an approach to automated trace editing using classification trees to detect and exploit context-based patterns in trace peak heights. Local base composition and nearby peak heights account for 80% of the variations in peak heights. Classification algorithms were developed to identify 37% of automated base calls that differ from the consensus sequence. With these algorithms, 12% of the base calls had confidence levels less than 90%.

Subject(s)

Algorithms , Sequence Analysis, DNA , Analysis of Variance , Artifacts , Automation , Consensus Sequence , Cosmids/genetics , Decision Trees , Sequence Alignment , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data

Identification of consensus patterns in unaligned DNA sequences known to be functionally related.

Hertz, G Z; Hartzell, G W; Stormo, G D.

Comput Appl Biosci ; 6(2): 81-92, 1990 Apr.

Article in English | MEDLINE | ID: mdl-2193692

ABSTRACT

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site and each element is determined by the frequency the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix--i.e. the one with the lowest probability of occurring by chance--out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
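The matrix representation described in this abstract can be illustrated on sites that are already aligned. The sketch below is minimal and makes several assumptions: the toy sites and the function names are hypothetical, and it shows only how a frequency matrix is built from one alignment, not the search over all candidate matrices or the significance calculation that the method itself performs.

```python
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sites):
    """Build a base-by-position frequency matrix from aligned sites.

    Each row (dictionary key) is one of the four bases, each column
    (list index) is a position in the site, and each element is the
    frequency of that base at that position.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = {base: [0.0] * length for base in BASES}
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        for base in BASES:
            matrix[base][pos] = counts[base] / len(sites)
    return matrix


def consensus(matrix):
    """Most frequent base at each position (ties go to the earlier base in ACGT)."""
    length = len(matrix["A"])
    return "".join(
        max(BASES, key=lambda base: matrix[base][pos]) for pos in range(length)
    )


# Toy aligned sites, invented for illustration only.
sites = ["CTGTA", "CTGTT", "CTGAA", "ATGTA"]
matrix = frequency_matrix(sites)
for base in BASES:
    print(base, matrix[base])
print("consensus:", consensus(matrix))
```

A consensus string discards the per-position frequencies, which is why a matrix retains more information about the binding site pattern.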

Subject(s)

Base Sequence , DNA , Pattern Recognition, Automated , Serine Endopeptidases , Software , Algorithms , Bacterial Proteins/genetics , Binding Sites , DNA, Bacterial/genetics , Escherichia coli/genetics , Genes, Bacterial , Molecular Sequence Data

Identifying protein-binding sites from unaligned DNA fragments.

Stormo, G D; Hartzell, G W.

Proc Natl Acad Sci U S A ; 86(4): 1183-7, 1989 Feb.

Article in English | MEDLINE | ID: mdl-2919167

ABSTRACT

The ability to determine important features within DNA sequences from the sequences alone is becoming essential as large-scale sequencing projects are being undertaken. We present a method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments, each known to contain somewhere within it a binding site for that protein. Information about the position or orientation of the binding sites within those fragments is not needed. The method compares the "information content" of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern. The specificity of the protein is represented as a matrix, rather than a consensus sequence, allowing patterns that are typical of regulatory protein-binding sites to be identified. The reliability of the method improves as the number of sequences increases, but the time required increases only linearly with the number of sequences. An example, using known cAMP receptor protein-binding sites, illustrates the method.
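The "information content" comparison mentioned in this abstract can be sketched for a single candidate alignment. The abstract does not give the formula, so the version below is an assumption: it uses relative entropy in bits against a uniform background of 0.25 per base, with no small-sample correction. The site sequences and function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, background=0.25):
    """Relative entropy (bits) of one alignment column against the background.

    Bases that do not occur in the column contribute nothing to the sum.
    """
    counts = Counter(column)
    total = len(column)
    info = 0.0
    for count in counts.values():
        freq = count / total
        info += freq * math.log2(freq / background)
    return info


def alignment_information(sites):
    """Sum the per-column information over an alignment of equal-length sites."""
    return sum(column_information(column) for column in zip(*sites))


# Toy alignments, invented for illustration only.
conserved = ["TGTGA", "TGTGA", "TGTGA", "TGTGA"]   # every column invariant
mixed = ["TGTGA", "AGCGT", "CGAGC", "GGGGG"]       # only columns 2 and 4 invariant

print(alignment_information(conserved))
print(alignment_information(mixed))
```

An invariant column scores 2 bits under this background and a column with all four bases equally represented scores 0, so alignments that line up the true sites tend to score higher than arbitrary ones.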

Subject(s)

Cyclic AMP Receptor Protein , DNA/metabolism , Models, Theoretical , Proteins/metabolism , Algorithms , Base Sequence , Binding Sites , Carrier Proteins/metabolism , Information Systems , Molecular Sequence Data , Neoplasm Proteins/metabolism
