Search | VHL Regional Portal

Using populations of human and microbial genomes for organism detection in metagenomes.

Ames, Sasha K; Gardner, Shea N; Marti, Jose Manuel; Slezak, Tom R; Gokhale, Maya B; Allen, Jonathan E.

Genome Res ; 25(7): 1056-67, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25926546

ABSTRACT

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.

Subject(s)

Genome, Microbial , Metagenome , Metagenomics/methods , Microbiota , Computational Biology/methods , Databases, Nucleic Acid , Humans , ROC Curve

Scalable metagenomic taxonomy classification using a reference genome database.

Ames, Sasha K; Hysom, David A; Gardner, Shea N; Lloyd, G Scott; Gokhale, Maya B; Allen, Jonathan E.

Bioinformatics ; 29(18): 2253-60, 2013 Sep 15.

Article in English | MEDLINE | ID: mdl-23828782

ABSTRACT

MOTIVATION: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. RESULTS: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. AVAILABILITY: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat CONTACT: allen99@llnl.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Metagenomics/methods , Phylogeny , Algorithms , Classification/methods , Databases, Nucleic Acid , Genome , High-Throughput Nucleotide Sequencing , Software

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL