ABSTRACT
We describe our experience annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with a flexible, adaptable, and expandable architecture.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genome , Software Design , Animals , Humans , Internet
ABSTRACT
The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
Subject(s)
Drosophila melanogaster/genetics , Genome , Sequence Analysis, DNA , Animals , Biological Transport/genetics , Chromatin/genetics , Cloning, Molecular , Computational Biology , Contig Mapping , Cytochrome P-450 Enzyme System/genetics , DNA Repair/genetics , DNA Replication/genetics , Drosophila melanogaster/metabolism , Euchromatin , Gene Library , Genes, Insect , Heterochromatin/genetics , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/physiology , Nuclear Proteins/genetics , Protein Biosynthesis , Transcription, Genetic
ABSTRACT
We constructed a bacterial artificial chromosome (BAC)-based physical map of chromosomes 2 and 3 of Drosophila melanogaster, which constitute 81% of the genome. Sequence tagged site (STS) content, restriction fingerprinting, and polytene chromosome in situ hybridization approaches were integrated to produce a map spanning the euchromatin. Three of five remaining gaps are in repeat-rich regions near the centromeres. A tiling path of clones spanning this map and STS maps of chromosomes X and 4 was sequenced to low coverage; the maps and tiling path sequence were used to support and verify the whole-genome sequence assembly, and tiling path BACs were used as templates in sequence finishing.