1.
Bioinformatics ; 35(6): 1049-1050, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30165579

ABSTRACT

SUMMARY: The JCVI pan-genome pipeline is a collection of programs to run PanOCT and tools that support and extend the capabilities of PanOCT. PanOCT (pan-genome ortholog clustering tool) is a tool for pan-genome analysis of closely related prokaryotic species or strains. The JCVI Pan-Genome Pipeline wrapper invokes command-line utilities that prepare input genomes, invoke third-party tools such as NCBI Blast+, run PanOCT, generate a consensus pan-genome, annotate features of the pan-genome, detect sets of genes of interest such as antimicrobial resistance (AMR) genes and generate figures, tables and html pages to visualize the results. The pipeline can run in a hierarchical mode, lowering the RAM and compute resources used. AVAILABILITY AND IMPLEMENTATION: Source code, demo data, and detailed documentation are freely available at https://github.com/JCVenterInstitute/PanGenomePipeline.


Subject(s)
Genome, Bacterial , Genome, Microbial , Cluster Analysis , Prokaryotic Cells , Software
2.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
3.
Nucleic Acids Res ; 47(D1): D564-D572, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30364992

ABSTRACT

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to exploring each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.
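To make the step-based definition concrete, here is a minimal Python sketch of scoring one property against the accessions detected in a proteome. The `Step` class, the placeholder accessions, and the YES/PARTIAL/NO rule (all required steps found, some found, none found) are illustrative assumptions, not the Genome Properties release code or its exact thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical property; any listed accession satisfies it."""
    name: str
    evidence: frozenset  # placeholder accessions that count as evidence
    required: bool = True


def evaluate_property(steps, found_accessions):
    """Return 'YES', 'PARTIAL' or 'NO' under the assumed rule above."""
    required = [s for s in steps if s.required]
    # A required step is satisfied if any of its evidence accessions was found.
    hits = sum(1 for s in required if s.evidence & found_accessions)
    if required and hits == len(required):
        return "YES"
    if hits > 0:
        return "PARTIAL"
    return "NO"
```

Optional steps are listed but never affect the call in this sketch; a real scoring scheme may weigh them differently.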


Subject(s)
Databases, Protein , Genome , Proteins/genetics , Genome, Microbial , Metabolic Networks and Pathways/genetics , Multiprotein Complexes/genetics , Proteins/metabolism , Proteome
4.
F1000Res ; 7: 521, 2018.
Article in English | MEDLINE | ID: mdl-30430006

ABSTRACT

Background: The predominant species in clinical Enterobacter isolates is E. hormaechei. Many articles, clinicians, and GenBank submissions misname these strains as E. cloacae. The lack of sequenced type strains or named species/subspecies for some clades in the E. cloacae complex complicates the issue. Methods: The genomes of the type strains for Enterobacter hormaechei subsp. oharae, E. hormaechei subsp. steigerwaltii, and E. xiangfangensis, and two strains from Hoffmann clusters III and IV of the E. cloacae complex were sequenced. These genomes, the E. hormaechei subsp. hormaechei type strain, and other available Enterobacter type strains were analysed in conjunction with all extant Enterobacter genomes in NCBI's RefSeq using Average Nucleotide Identity (ANI). Results: There were five recognizable subspecies of E. hormaechei: E. hormaechei subsp. hoffmannii subsp. nov., E. hormaechei subsp. xiangfangensis comb. nov., and the three previously known subspecies. One of the strains sequenced from the E. cloacae complex was not a novel E. hormaechei subspecies but rather a member of a clade of a novel species: E. roggenkampii sp. nov. E. muelleri was determined to be a later heterotypic synonym of E. asburiae, which should take precedence. Conclusion: The phylogeny of the Enterobacter genus, particularly the cloacae complex, was re-evaluated based on the type strain genome sequences and all other available Enterobacter genomes in RefSeq.


Subject(s)
Bacterial Typing Techniques/methods , Computational Biology , Enterobacter/classification , Genome, Bacterial , Enterobacter/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , Species Specificity
5.
Article in English | MEDLINE | ID: mdl-30012762

ABSTRACT

Burkholderia multivorans is a member of the Burkholderia cepacia complex, a group of >20 related species of nosocomial pathogens that commonly infect individuals suffering from cystic fibrosis. β-Lactam antibiotics are recommended as therapy for infections due to B. multivorans, which possesses two β-lactamase genes, blapenA and blaAmpC. PenA is a carbapenemase with a substrate profile similar to that of the Klebsiella pneumoniae carbapenemase (KPC); in addition, expression of PenA is inducible by β-lactams in B. multivorans. Here, we characterize AmpC from B. multivorans ATCC 17616. AmpC possesses only 38 to 46% protein identity with non-Burkholderia AmpC proteins (e.g., PDC-1 and CMY-2). Among 49 clinical isolates of B. multivorans, we identified 27 different AmpC variants. Some variants possessed single amino acid substitutions within critical active-site motifs (Ω loop and R2 loop). Purified AmpC1 demonstrated minimal measurable catalytic activity toward β-lactams (i.e., nitrocefin and cephalothin). Moreover, avibactam was a poor inhibitor of AmpC1 (Ki app > 600 µM), and acyl-enzyme complex formation with AmpC1 was slow, likely due to lack of productive interactions with active-site residues. Interestingly, immunoblotting using a polyclonal anti-AmpC antibody revealed that protein expression of AmpC1 was inducible in B. multivorans ATCC 17616 after growth in subinhibitory concentrations of imipenem (1 µg/ml). AmpC is a unique inducible class C cephalosporinase that may play an ancillary role in B. multivorans compared to PenA, which is the dominant β-lactamase in B. multivorans ATCC 17616.


Subject(s)
Anti-Bacterial Agents/pharmacology , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Burkholderia/drug effects , Burkholderia/enzymology , beta-Lactamases/chemistry , beta-Lactamases/metabolism , beta-Lactams/pharmacology , Amino Acid Sequence , Azabicyclo Compounds/pharmacology , Cephalosporinase/chemistry , Cephalosporinase/metabolism , Cephalosporins/pharmacology , Cephalothin/pharmacology , Imipenem/pharmacology , Microbial Sensitivity Tests , Protein Structure, Secondary
6.
Diagn Microbiol Infect Dis ; 92(3): 253-258, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29983287

ABSTRACT

Multidrug-resistant gram-negative pathogens are a significant health threat. Burkholderia spp. encompass a complex subset of gram-negative bacteria with a wide range of biological functions that include human, animal, and plant pathogens. The treatment of infections caused by Burkholderia spp. is problematic due to their inherent resistance to multiple antibiotics. The major β-lactam resistance determinant expressed in Burkholderia spp. is a class A β-lactamase of the PenA family. In this study, significant amino acid sequence heterogeneity was discovered in PenA (37 novel variants) within a panel of 48 different strains of Burkholderia multivorans isolated from individuals with cystic fibrosis. Phylogenetic analysis distributed the 37 variants into 5 groups based on their primary amino acid sequences. Amino acid substitutions were present throughout the entire β-lactamase and did not congregate to specific regions of the protein. The PenA variants possessed 5 to 17 single amino acid changes. The N189S and S286I substitutions were most prevalent and found in all variants. Due to the sequence heterogeneity in PenA, a highly conserved peptide (18 amino acids) within PenA was chosen as the antigen for polyclonal antibody production in order to measure expression of PenA within the 48 clinical isolates of B. multivorans. Characterization of the anti-PenA peptide antibody, using immunoblotting approaches, exposed several unique features of this antibody (i.e., detected <500 pg of purified PenA, all 37 PenA variants in B. multivorans, and Pen-like β-lactamases from other species within the Burkholderia cepacia complex). The significant sequence heterogeneity found in PenA may have occurred due to selective pressure (e.g., exposure to antimicrobial therapy) within the host. The contribution of these changes warrants further investigation.


Subject(s)
Bacterial Proteins/genetics , Burkholderia Infections/microbiology , Burkholderia/genetics , Genetic Variation , beta-Lactamases/genetics , Amino Acid Sequence , Amino Acid Substitution , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/chemistry , Burkholderia/classification , Burkholderia/drug effects , Genome, Bacterial , Humans , Microbial Sensitivity Tests , Models, Molecular , Mutation , Protein Conformation , beta-Lactam Resistance , beta-Lactamases/chemistry
7.
F1000Res ; 7: 297, 2018.
Article in English | MEDLINE | ID: mdl-29707202

ABSTRACT

Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.

8.
Gigascience ; 7(3): 1-13, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29329394

ABSTRACT

Background: The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. Results: The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. Conclusions: The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.
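Contig N50 is the length L such that contigs of length L or longer together hold at least half of the assembled bases. A minimal Python sketch follows; the function name `n50` is hypothetical, and it assumes contig lengths are supplied as positive integers.

```python
def n50(contig_lengths):
    """Return the N50 of a list of positive contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    if any(length <= 0 for length in contig_lengths):
        raise ValueError("contig lengths must be positive")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases until
    # half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, since the two longest contigs (6 + 5 = 11 bases) already cover half of the 20-base total.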


Subject(s)
Aedes/virology , Virus Replication/genetics , Zika Virus Infection/genetics , Zika Virus/genetics , Aedes/genetics , Animals , Base Sequence/genetics , Cell Line , Genome, Insect/genetics , Humans , Larva/genetics , Larva/virology , Mosquito Vectors/genetics , Mosquito Vectors/virology , Zika Virus/growth & development , Zika Virus Infection/virology
9.
F1000Res ; 7: 98, 2018.
Article in English | MEDLINE | ID: mdl-31231504

ABSTRACT

The human cell lines HepG2, HuH-7, and Jurkat are commonly used for amplification of the RNA viruses present in environmental samples. To assist with assays by RNAseq, we sequenced these cell lines and developed a subtraction database that contains sequences expected in sequence data from uninfected cells. RNAseq data from cell lines infected with Sendai virus were analyzed to test host subtraction. The process of mapping RNAseq reads to our subtraction database vastly reduced the number of non-viral reads in the dataset to allow for efficient secondary analyses.
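Host subtraction, as described above, removes reads that match a database of host sequences. As a rough stand-in for a real read mapper, the Python sketch below drops any read whose k-mers mostly occur in the host sequences on either strand. The k-mer approach, the function names, and the `min_fraction` cutoff are assumptions for illustration, not the authors' pipeline.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_host_kmers(host_seqs, k):
    """Collect every k-mer of the host sequences on both strands."""
    kmers = set()
    for seq in host_seqs:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                kmers.add(strand[i:i + k])
    return kmers


def subtract_host(reads, host_kmers, k, min_fraction=0.5):
    """Keep reads whose fraction of host k-mers is below `min_fraction`."""
    kept = []
    for read in reads:
        n_kmers = len(read) - k + 1
        if n_kmers <= 0:
            kept.append(read)  # too short to judge; pass it downstream
            continue
        upper = read.upper()
        hits = sum(1 for i in range(n_kmers) if upper[i:i + k] in host_kmers)
        if hits / n_kmers < min_fraction:
            kept.append(read)
    return kept
```

A k-mer set keeps the sketch dependency-free; a production workflow would more likely align reads with an established mapper, which tolerates sequencing errors and variants that exact k-mer matching misses.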


Subject(s)
Databases, Genetic , Cell Line , DNA Viruses , High-Throughput Nucleotide Sequencing , Humans , Viruses
10.
Front Microbiol ; 8: 1661, 2017.
Article in English | MEDLINE | ID: mdl-28932211

ABSTRACT

Pneumococcal pneumonia has decreased significantly since the implementation of the pneumococcal conjugate vaccine (PCV); nevertheless, in many developing countries pneumonia mortality in infants remains high. We have undertaken a study of the nasopharyngeal (NP) microbiome during the first year of life in infants from The Philippines and South Africa. The study entailed the determination of the Streptococcus sp. carriage using a lytA qPCR assay, whole metagenomic sequencing, and in silico serotyping of Streptococcus pneumoniae, as well as 16S rRNA amplicon based community profiling. The lytA carriage in both populations increased with infant age and lytA+ samples ranged from 24 to 85% of the samples at each sampling time point. We next developed informatic tools for determining Streptococcus community composition and pneumococcal serotype from metagenomic sequences derived from a subset of longitudinal lytA-positive Streptococcus enrichment cultures from The Philippines (n = 26 infants, 50% vaccinated) and South Africa (n = 7 infants, 100% vaccinated). NP samples from infants were passaged in enrichment media, and metagenomic DNA was purified and sequenced. In silico capsular serotyping of these 51 metagenomic assemblies assigned known serotypes in 28 samples, and the co-occurrence of serotypes in 5 samples. Eighteen samples were not typeable using known serotypes but did encode capsule biosynthetic cluster genes similar to non-encapsulated reference sequences. In addition, we performed metagenomic assembly and 16S rRNA amplicon profiling to understand co-colonization dynamics of Streptococcus sp. and other NP genera, revealing the presence of multiple Streptococcus species as well as potential respiratory pathogens in healthy infants. A range of virulence and drug-resistance elements were identified as circulating in the NP microbiomes of these infants. This study revealed the frequent co-occurrence of multiple S. pneumoniae strains along with Streptococcus sp. and other potential pathogens such as S. aureus in the NP microbiome of these infants. In addition, the in silico serotype analysis proved powerful in determining the serotypes in S. pneumoniae carriage, and may lead to developing better targeted vaccines to prevent invasive pneumococcal disease (IPD) in these countries. These findings suggest that NP colonization by S. pneumoniae during the first years of life is a dynamic process involving multiple serotypes and species.

11.
F1000Res ; 6: 688, 2017.
Article in English | MEDLINE | ID: mdl-28721204

ABSTRACT

The CP 96-1252 cultivar of sugarcane is a complex hybrid of commercial importance. DNA was extracted from lab-grown leaf tissue and sequenced. The raw Illumina DNA sequencing results provide 101 Gbp of genome sequence reads. The dataset is available from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA345486/.

12.
Science ; 328(5981): 994-9, 2010 May 21.
Article in English | MEDLINE | ID: mdl-20489017

ABSTRACT

The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.


Subject(s)
Genome, Bacterial , Metagenome/genetics , Sequence Analysis, DNA , Bacteria/classification , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Biodiversity , Computational Biology , Databases, Genetic , Gastrointestinal Tract/microbiology , Genes, Bacterial , Genetic Variation , Genome, Archaeal , Humans , Metagenomics/methods , Metagenomics/standards , Mouth/microbiology , Peptides/chemistry , Peptides/genetics , Phylogeny , Respiratory System/microbiology , Sequence Analysis, DNA/standards , Skin/microbiology , Urogenital System/microbiology
13.
Nature ; 464(7288): 592-6, 2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20228792

ABSTRACT

The freshwater cnidarian Hydra was first described in 1702 and has been the object of study for 300 years. Experimental studies of Hydra between 1736 and 1744 culminated in the discovery of asexual reproduction of an animal by budding, the first description of regeneration in an animal, and successful transplantation of tissue between animals. Today, Hydra is an important model for studies of axial patterning, stem cell biology and regeneration. Here we report the genome of Hydra magnipapillata and compare it to the genomes of the anthozoan Nematostella vectensis and other animals. The Hydra genome has been shaped by bursts of transposable element expansion, horizontal gene transfer, trans-splicing, and simplification of gene structure and gene content that parallel simplification of the Hydra life cycle. We also report the sequence of the genome of a novel bacterium stably associated with H. magnipapillata. Comparisons of the Hydra genome to the genomes of other animals shed light on the evolution of epithelia, contractile tissues, developmentally regulated transcription factors, the Spemann-Mangold organizer, pluripotency genes and the neuromuscular junction.


Subject(s)
Genome/genetics , Hydra/genetics , Animals , Anthozoa/genetics , Comamonadaceae/genetics , DNA Transposable Elements/genetics , Gene Transfer, Horizontal/genetics , Genome, Bacterial/genetics , Hydra/microbiology , Hydra/ultrastructure , Molecular Sequence Data , Neuromuscular Junction/ultrastructure
14.
Bioinformatics ; 23(4): 500-1, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17158514

ABSTRACT

UNLABELLED: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd. produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. AVAILABILITY: http://www.bcgsc.ca/bioinfo/software/ssake.
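The core idea described above, extending a contig by the read with the longest overlap to its 3' end, can be sketched in Python. This is a simplified greedy version under stated assumptions: a dictionary keyed on each read's first `min_overlap` bases stands in for SSAKE's prefix tree, reverse complements and coverage checks are ignored, and the function name `greedy_extend` is hypothetical.

```python
from collections import defaultdict


def greedy_extend(seed, reads, min_overlap):
    """Greedily extend `seed` to the right using the longest read overlaps.

    Simplified illustration only: single strand, no consensus building,
    and ties between equally long overlaps are broken alphabetically.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Index reads by their first `min_overlap` bases (stand-in for a prefix tree).
    index = defaultdict(list)
    for read in reads:
        if len(read) >= min_overlap:
            index[read[:min_overlap]].append(read)
    max_len = max((len(r) for r in reads), default=0)

    contig = seed
    used = {seed}
    while True:
        extension = None
        # Try the longest possible overlap first, down to the minimum.
        for olen in range(min(len(contig), max_len - 1), min_overlap - 1, -1):
            suffix = contig[-olen:]
            for read in sorted(index.get(suffix[:min_overlap], [])):
                # The read must start with the contig suffix and add new bases.
                if read not in used and len(read) > olen and read.startswith(suffix):
                    extension = (read, olen)
                    break
            if extension:
                break
        if extension is None:
            return contig
        read, olen = extension
        used.add(read)
        contig += read[olen:]
```

With reads `ACGTTGCA`, `TTGCAGGA` and `CAGGATCC`, seed `ACGTTGCA` and `min_overlap=4`, the sketch joins them into `ACGTTGCAGGATCC` through two 5-base overlaps; raising `min_overlap` to 6 leaves the seed unextended, which mirrors the stringency trade-off the abstract describes.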


Subject(s)
Algorithms , Chromosome Mapping/methods , Contig Mapping/methods , Sequence Analysis, DNA/methods , Software , Base Sequence , Molecular Sequence Data
15.
Nature ; 428(6982): 493-521, 2004 Apr 01.
Article in English | MEDLINE | ID: mdl-15057822

ABSTRACT

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.


Subject(s)
Evolution, Molecular , Genome , Genomics , Rats, Inbred BN/genetics , Animals , Base Composition , Centromere/genetics , Chromosomes, Mammalian/genetics , CpG Islands/genetics , DNA Transposable Elements/genetics , DNA, Mitochondrial/genetics , Gene Duplication , Humans , Introns/genetics , Male , Mice , Models, Molecular , Mutagenesis , Polymorphism, Single Nucleotide/genetics , RNA Splice Sites/genetics , RNA, Untranslated/genetics , Rats , Regulatory Sequences, Nucleic Acid/genetics , Retroelements/genetics , Sequence Analysis, DNA , Telomere/genetics
16.
Proc Natl Acad Sci U S A ; 101(7): 1916-21, 2004 Feb 17.
Article in English | MEDLINE | ID: mdl-14769938

ABSTRACT

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.


Subject(s)
Computational Biology , Genome, Human , Human Genome Project , Computational Biology/standards , Contig Mapping/standards , Humans , RNA, Messenger/analysis , Software
17.
Proc Natl Acad Sci U S A ; 100(22): 12984-8, 2003 Oct 28.
Article in English | MEDLINE | ID: mdl-14566062

ABSTRACT

The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.


Subject(s)
Archaea/genetics , Biological Evolution , Genome, Archaeal , Arabidopsis/microbiology , Archaea/classification , Archaea/pathogenicity , DNA, Archaeal/genetics , Gene Library , Phylogeny
18.
Genetica ; 117(2-3): 227-37, 2003 Mar.
Article in English | MEDLINE | ID: mdl-12723702

ABSTRACT

Whole genome shotgun assemblies have proven remarkably successful in reconstructing the bulk of euchromatic genes, with the only limit appearing to be determined by the sequencing depth. For genes imbedded in heterochromatin, however, the low cloning efficiency of repetitive sequences, combined with the computational challenges, demand that additional clues be used to annotate the sequences. One approach that has proven very successful in identifying protein coding genes in Y-linked heterochromatin of Drosophila melanogaster has been to make a BLASTable database of the small, unmapped contigs and fragments leftover at the end of a shotgun assembly, and to attempt to capture these by blasting with an appropriate query sequence. This approach often yields a staggered alignment of contigs from the unmapped set to the query sequence, as though the disjoint contigs represent small portions of the gene. Further inspection frequently shows that the contigs are broken by very large, heterochromatic introns. Methods of this sort are being expanded to make best use of all available clues to determine which unmapped contigs are associated with genes. These include use of EST libraries, and, in the case of the Y chromosome, testing of male specific genes and reduced shotgun depth of relevant contigs. It appears much more hopeful than anyone would have imagined that whole genome shotgun assemblies can recover the great bulk of even heterochromatic genes.


Subject(s)
Drosophila melanogaster/genetics , Heterochromatin/genetics , Y Chromosome/genetics , Animals , Databases, Nucleic Acid , Expressed Sequence Tags , Sequence Analysis, DNA/methods
20.
Science ; 298(5591): 149-59, 2002 Oct 04.
Article in English | MEDLINE | ID: mdl-12364792

ABSTRACT

Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small "microsyntenic" clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.


Subject(s)
Anopheles/genetics , Drosophila melanogaster/genetics , Genome , Proteome , Animals , Anopheles/chemistry , Anopheles/physiology , Biological Evolution , Chromosome Inversion , Chromosomes/genetics , Cluster Analysis , Dosage Compensation, Genetic , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila Proteins/physiology , Drosophila melanogaster/chemistry , Drosophila melanogaster/physiology , Exons , Gene Order , Genes, Insect , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/physiology , Introns , Physical Chromosome Mapping , Protein Structure, Tertiary , Pseudogenes , Sequence Homology, Nucleic Acid , Species Specificity , Synteny