Search | VHL Regional Portal

De novo likelihood-based measures for comparing genome assemblies.

Ghodsi, Mohammadreza; Hill, Christopher M; Astrovskaya, Irina; Lin, Henry; Sommer, Dan D; Koren, Sergey; Pop, Mihai.

BMC Res Notes ; 6: 334, 2013 Aug 22.

Article in English | MEDLINE | ID: mdl-23965294

ABSTRACT

BACKGROUND: The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments "read" by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These "gold standards" can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics.
RESULTS: We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly "bake-offs" with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled.
CONCLUSION: Likelihood-based measures, such as ours proposed here, will become the new standard for de novo assembly evaluation.

Subject(s)

Contig Mapping/statistics & numerical data , Genome, Bacterial , Rhodobacter sphaeroides/genetics , Software , Staphylococcus aureus/genetics , Staphylococcus epidermidis/genetics , Algorithms , Genomics/methods , Likelihood Functions , Sequence Analysis, DNA
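
The abstract above defines assembly quality as the conditional probability of observing the sequenced reads given the assembled sequence, and notes that the score can be estimated from a small sample of reads. The following is a minimal Python sketch of that idea under strong simplifying assumptions that are not taken from the paper: reads are single-stranded and unpaired, a read is equally likely to start at any position, and the only errors are substitutions at a fixed rate. The function names and the `error_rate` and `floor` parameters are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def read_probability(read, assembly, error_rate=0.01):
    """P(read | assembly) under a toy model (assumption, not the paper's model).

    The read is placed at every possible start position with equal
    probability; each base matches with probability 1 - error_rate and
    mismatches with probability error_rate / 3.
    """
    n, m = len(assembly), len(read)
    if m == 0 or m > n:
        return 0.0
    placements = n - m + 1
    total = 0.0
    for start in range(placements):
        p = 1.0
        for a, r in zip(assembly[start:start + m], read):
            p *= (1.0 - error_rate) if a == r else error_rate / 3.0
        total += p
    return total / placements


def log_likelihood(reads, assembly, error_rate=0.01, floor=1e-300):
    """Sum of log P(read | assembly) over all reads, treated as independent.

    Probabilities are floored so that an impossible read gives a very low
    score instead of a math domain error.
    """
    return sum(
        math.log(max(read_probability(r, assembly, error_rate), floor))
        for r in reads
    )


def estimate_log_likelihood(reads, assembly, sample_size, seed=0, error_rate=0.01):
    """Estimate the full score from a random sample of reads.

    The mean per-read log probability in the sample is scaled up to the
    total number of reads.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(reads), min(sample_size, len(reads)))
    if not sample:
        return 0.0
    return log_likelihood(sample, assembly, error_rate) * len(reads) / len(sample)


# Toy data: every 6-mer of a short "true" genome serves as the read set.
genome = "ACGTACGGTCAAGCTT"
reads = [genome[i:i + 6] for i in range(len(genome) - 5)]
collapsed = "ACGTACAAGCTT"  # same sequence with the segment GGTC deleted

score_true = log_likelihood(reads, genome)
score_collapsed = log_likelihood(reads, collapsed)
print(score_true > score_collapsed)
```

In this toy case the complete sequence scores higher than the collapsed one because reads that span the deleted segment can no longer be placed without mismatches. A realistic implementation would also need to handle both strands, paired reads, indels, and read alignment at scale, none of which is attempted here.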

Next generation sequence assembly with AMOS.

Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai.

Curr Protoc Bioinformatics ; Chapter 11: Unit 11.8, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21400694

ABSTRACT

A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality.

Subject(s)

Genomics/methods , Sequence Analysis, DNA/methods , Software , Databases, Genetic

Suppression subtractive hybridization PCR isolation of cDNAs from a Caribbean soft coral

Lopez, Jose V; Ledger, Angela; Santiago-Vázquez, Lory Z; Pop, Mihai; Sommer, Dan D; Ranzer, Llanie K; Feldman, Robert A; Russell, G. Kerr.

Electron. j. biotechnol ; 14(1): 8-9, Jan. 2011. illus, tab

Article in English | LILACS | ID: lil-591926

ABSTRACT

Transcriptomic studies of marine organisms are still in their infancy. A partial, subtracted expressed sequence tag (EST) library of the Caribbean octocoral Erythropodium caribaeorum and the sea fan Gorgonia ventalina has been analyzed in order to find novel genes or differences in gene expression related to potential secondary metabolite production or symbioses. This approach entails enrichment for potential non-housekeeping genes using the suppression subtractive hybridization (SSH) polymerase chain reaction (PCR) method. More than 500 expressed sequence tags (ESTs) were generated after cloning SSH products, which yielded at least 53 orthologous groups of proteins (COGs) and Pfam clusters, including transcription factors (Drosophila Big Brother), catalases, reverse transcriptases, ferritins and various hypothetical protein sequences. A total of 591 EST sequences were deposited into GenBank [dbEST: FL512138 - FL512331, GH611838, and HO061755-HO062154]. The results represent proof of concept for enrichment of unique transcripts over housekeeping genes, such as actin or ribosomal genes, which comprised approximately 17 percent of the total dataset. Due to the gene and sequence diversity of some ESTs, such sequences can find utility as molecular markers in current and future studies of this species and other soft coral biogeography, chemical ecology, phylogenetics, and evolution.

Subject(s)

Animals , DNA, Complementary/analysis , DNA, Complementary/physiology , Anthozoa/genetics , Anthozoa/chemistry , /analysis , Polymerase Chain Reaction/methods

Genomic characterization of the Yersinia genus.

Chen, Peter E; Cook, Christopher; Stewart, Andrew C; Nagarajan, Niranjan; Sommer, Dan D; Pop, Mihai; Thomason, Brendan; Thomason, Maureen P Kiley; Lentz, Shannon; Nolan, Nichole; Sozhamannan, Shanmuga; Sulakvelidze, Alexander; Mateczun, Alfred; Du, Lei; Zwick, Michael E; Read, Timothy D.

Genome Biol ; 11(1): R1, 2010 Jan 04.

Article in English | MEDLINE | ID: mdl-20047673

ABSTRACT

BACKGROUND: New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced. However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently found in soil and water environments.
RESULTS: We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares the same core set of approximately 2,500 genes as the other members of the species, whose genomes range in size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of Cluster of Orthologous Groups families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.
CONCLUSIONS: Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost on multiple occasions in the history of the genus.

Subject(s)

Genome, Bacterial , Yersinia/genetics , Chromosome Mapping/methods , Cluster Analysis , Genetic Techniques , Genetic Variation , Multigene Family , Phylogeny , Sequence Analysis, DNA , Species Specificity , Virulence , Yersinia enterocolitica/genetics , Yersinia pestis/genetics
