Results 1 - 16 of 16
1.
Elife ; 13, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38696239

ABSTRACT

The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as a Python module, and both options are designed to be run locally or on a compute cluster. Benchmarks show that MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.


Subject(s)
Genome, Bacterial , Metabolic Networks and Pathways , Software , Metabolic Networks and Pathways/genetics , Computational Biology/methods , Machine Learning , Bacteria/genetics , Bacteria/metabolism , Bacteria/classification
2.
mBio ; 12(3): e0339620, 2021 06 29.
Article in English | MEDLINE | ID: mdl-34060330

ABSTRACT

Although often neglected in gut microbiota studies, recent evidence suggests that imbalanced, or dysbiotic, gut mycobiota (fungal microbiota) communities in infancy coassociate with states of bacterial dysbiosis linked to inflammatory diseases such as asthma. In the present study, we (i) characterized the infant gut mycobiota at 3 months and 1 year of age in 343 infants from the CHILD Cohort Study, (ii) defined associations among gut mycobiota community composition and environmental factors for the development of inhalant allergic sensitization (atopy) at age 5 years, and (iii) built a predictive model for inhalant atopy status at age 5 years using these data. We show that in Canadian infants, fungal communities shift dramatically in composition over the first year of life. Early-life environmental factors known to affect gut bacterial communities were also associated with differences in gut fungal community alpha diversity, beta diversity, and/or the relative abundance of specific fungal taxa. Moreover, these metrics differed among healthy infants and those who developed inhalant allergic sensitization (atopy) by age 5 years. Using a rationally selected set of early-life environmental factors in combination with fungal community composition at 1 year of age, we developed a machine learning logistic regression model that predicted inhalant atopy status at 5 years of age with 81% accuracy. Together, these data suggest an important role for the infant gut mycobiota in early-life immune development and indicate that early-life behavioral or therapeutic interventions have the potential to modify infant gut fungal communities, with implications for an infant's long-term health. IMPORTANCE: Recent evidence suggests an immunomodulatory role for commensal fungi (mycobiota) in the gut, yet little is known about the composition and dynamics of early-life gut fungal communities. In this work, we show for the first time that the composition of the gut mycobiota of Canadian infants changes dramatically over the course of the first year of life, is associated with environmental factors such as geographical location, diet, and season of birth, and can be used in conjunction with knowledge of a small number of key early-life factors to predict inhalant atopy status at age 5 years. Our study highlights the importance of considering fungal communities as indicators or inciters of immune dysfunction preceding the onset of allergic disease and can serve as a benchmark for future studies aiming to examine infant gut fungal communities across birth cohorts.


Subject(s)
Environment , Fungi/genetics , Gastrointestinal Microbiome/genetics , Hypersensitivity/etiology , Hypersensitivity/microbiology , Mycobiome/genetics , Asthma/etiology , Asthma/microbiology , Child, Preschool , Cohort Studies , Dysbiosis , Feces/microbiology , Female , Fungi/classification , Gastrointestinal Microbiome/physiology , Humans , Hypersensitivity/complications , Infant , Male , Mycobiome/physiology
3.
Sci Data ; 4: 170035, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28398290

ABSTRACT

Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.


Subject(s)
Databases, Genetic , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Crohn Disease/microbiology , Diabetes Mellitus, Type 2/microbiology , Geography, Medical , Humans , Inflammatory Bowel Diseases/microbiology , Metagenome , Metagenomics
4.
Bioinformatics ; 32(23): 3535-3542, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27515739

ABSTRACT

MOTIVATION: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events. RESULTS: Here, we present a statistic called LCA* inspired by Information and Voting theories that uses the NCBI Taxonomic Database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing entropy changes to the observed taxonomic distribution resulting in improved statistical properties. Moreover, we apply results from the order-statistic literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets, we empirically demonstrate that voting-based methods, majority vote and LCA*, in the presence of known reference annotations, are consistently more accurate in identifying contig taxonomy than the lowest common ancestor algorithm popularized by MEGAN, and that LCA* taxonomy strikes a balance between specificity and confidence to provide an estimate appropriate to the available information in the data. AVAILABILITY AND IMPLEMENTATION: The LCA* has been implemented as a stand-alone Python library compatible with the MetaPathways pipeline; both of which are available on GitHub with installation instructions and use-cases (http://www.github.com/hallamlab/LCAStar/). CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Metagenome , Phylogeny , Entropy , Models, Statistical
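
The contrast drawn in the abstract above, between the strict lowest common ancestor and voting-based assignment, can be illustrated with a minimal Python sketch. It is not the LCA* statistic and does not use the LCAStar library: the function names, the `threshold` parameter, and the toy lineages are hypothetical, and each ORF hit on a contig is assumed to arrive as a root-to-leaf list of taxon names.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None."""
    lca = None
    # zip(*lineages) walks the lineages rank by rank, from the root down.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # first rank where the hits disagree
        lca = ranks[0]
    return lca


def majority_taxon(lineages, threshold=0.5):
    """Return the deepest taxon supported by more than `threshold` of hits.

    `threshold` is an illustrative parameter, not one taken from LCA*.
    """
    total = len(lineages)
    support = Counter()
    for lineage in lineages:
        # Every prefix of a lineage counts as one vote for that ancestor.
        for depth in range(1, len(lineage) + 1):
            support[tuple(lineage[:depth])] += 1
    supported = [p for p, votes in support.items() if votes / total > threshold]
    if not supported:
        return None
    # With threshold >= 0.5 at most one prefix per depth can qualify.
    return max(supported, key=len)[-1]


# Hypothetical taxonomic hits for the ORFs of a single contig.
orf_hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
    ["Bacteria", "Firmicutes", "Bacilli"],
]
print(lowest_common_ancestor(orf_hits))  # Bacteria
print(majority_taxon(orf_hits))          # Proteobacteria
```

In this toy case a single outlying hit pulls the strict common ancestor up to the domain level, while the majority rule still resolves the contig to phylum. That is the kind of specificity gain the abstract attributes to voting-based methods; the entropy criterion and hypothesis test of LCA* are not reproduced here.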
5.
Curr Opin Microbiol ; 31: 209-216, 2016 06.
Article in English | MEDLINE | ID: mdl-27183115

ABSTRACT

A revolution is unfolding in microbial ecology where petabytes of 'multi-omics' data are produced using next generation sequencing and mass spectrometry platforms. This cornucopia of biological information has enormous potential to reveal the hidden metabolic powers of microbial communities in natural and engineered ecosystems. However, to realize this potential, the development of new technologies and interpretative frameworks grounded in ecological design principles are needed to overcome computational and analytical bottlenecks. Here we explore the relationship between microbial ecology and information science in the era of cloud-based computation. We consider microorganisms as individual information processing units implementing a distributed metabolic algorithm and describe developments in ecoinformatics and ubiquitous computing with the potential to eliminate bottlenecks and empower knowledge creation and translation.


Subject(s)
Ecological and Environmental Phenomena , Electronic Data Processing/methods , Information Science/methods , Information Services , Microbial Consortia/genetics , Ecosystem , High-Throughput Nucleotide Sequencing , Internet
6.
Bioinformatics ; 31(20): 3345-7, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26076725

ABSTRACT

UNLABELLED: Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards through integration of a weighted taxonomic distance and enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface allows for keyword annotation query and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database. AVAILABILITY AND IMPLEMENTATION: MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2. CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval , Molecular Sequence Annotation , Phylogeny , Software , Algorithms , Databases, Genetic , Humans , Sequence Analysis, DNA/methods
7.
Appl Environ Microbiol ; 80(21): 6807-18, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25172853

ABSTRACT

Despite recent advances in metagenomic and single-cell genomic sequencing to investigate uncultivated microbial diversity and metabolic potential, fundamental questions related to population structure, interactions, and biogeochemical roles of candidate divisions remain. Numerous molecular surveys suggest that stratified ecosystems manifesting anoxic, sulfidic, and/or methane-rich conditions are enriched in these enigmatic microbes. Here we describe diversity, abundance, and cooccurrence patterns of uncultivated microbial communities inhabiting the permanently stratified waters of meromictic Sakinaw Lake, British Columbia, Canada, using 454 sequencing of the small-subunit rRNA gene with three-domain resolution. Operational taxonomic units (OTUs) were affiliated with 64 phyla, including more than 25 candidate divisions. Pronounced trends in community structure were observed for all three domains with eukaryotic sequences vanishing almost completely below the mixolimnion, followed by a rapid and sustained increase in methanogen-affiliated (∼10%) and unassigned (∼60%) archaeal sequences as well as bacterial OTUs affiliated with Chloroflexi (∼22%) and candidate divisions (∼28%). Network analysis revealed highly correlated, depth-dependent cooccurrence patterns between Chloroflexi, candidate divisions WWE1, OP9/JS1, OP8, and OD1, methanogens, and unassigned archaeal OTUs indicating niche partitioning and putative syntrophic growth modes. Indeed, pathway reconstruction using recently published Sakinaw Lake single-cell genomes affiliated with OP9/JS1 and OP8 revealed complete coverage of the Wood-Ljungdahl pathway with potential to drive syntrophic acetate oxidation to hydrogen and carbon dioxide under methanogenic conditions. Taken together, these observations point to previously unrecognized syntrophic networks in meromictic lake ecosystems with the potential to inform design and operation of anaerobic methanogenic bioreactors.


Subject(s)
Archaea/classification , Bacteria/classification , Biota , Eukaryota/classification , Lakes/microbiology , Archaea/genetics , Bacteria/genetics , British Columbia , Cluster Analysis , DNA, Ribosomal/chemistry , DNA, Ribosomal/genetics , Eukaryota/genetics , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA
8.
BMC Genomics ; 15: 619, 2014 Jul 22.
Article in English | MEDLINE | ID: mdl-25048541

ABSTRACT

BACKGROUND: A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. RESULTS: Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. CONCLUSIONS: This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.


Subject(s)
Genomics/methods , Metabolic Networks and Pathways , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation
9.
ISME J ; 8(2): 455-68, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24030600

ABSTRACT

Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation to oxygen-deficiency. Here, we expand a previous survey of MGA diversity in O2-deficient waters of the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with seasonal O2 gradients and periodic sulfide accumulation. Phylogenetic analysis of small subunit ribosomal RNA (16S rRNA) gene clone libraries recovered five previously described MGA subgroups and defined three novel subgroups (SHBH1141, SHBH391, and SHAN400) in SI. To discern the functional properties of MGA residing along gradients of O2 in the NESAP and SI, we identified and sequenced to completion 14 fosmids harboring MGA-associated 16S rRNA genes from a collection of 46 fosmid libraries sourced from NESAP and SI waters. Comparative analysis of these fosmids, in addition to four publicly available MGA-associated large-insert DNA fragments from Hawaii Ocean Time-series and Monterey Bay, revealed widespread genomic differentiation proximal to the ribosomal RNA operon that did not consistently reflect subgroup partitioning patterns observed in 16S rRNA gene clone libraries. Predicted protein-coding genes associated with adaptation to O2-deficiency and sulfur-based energy metabolism were detected on multiple fosmids, including polysulfide reductase (psrABC), implicated in dissimilatory polysulfide reduction to hydrogen sulfide and dissimilatory sulfur oxidation. These results posit a potential role for specific MGA subgroups in the marine sulfur cycle.


Subject(s)
Bacteria/classification , Bacteria/genetics , Biodiversity , Phylogeny , Aquatic Organisms/classification , Aquatic Organisms/genetics , Aquatic Organisms/metabolism , Bacteria/metabolism , Genome, Bacterial/genetics , Genomics , Molecular Sequence Data , Oxygen/analysis , Pacific Ocean , RNA, Ribosomal, 16S/genetics , Seawater/chemistry
10.
Environ Sci Technol ; 47(18): 10708-17, 2013 Sep 17.
Article in English | MEDLINE | ID: mdl-23889694

ABSTRACT

Oil in subsurface reservoirs is biodegraded by resident microbial communities. Water-mediated, anaerobic conversion of hydrocarbons to methane and CO2, catalyzed by syntrophic bacteria and methanogenic archaea, is thought to be one of the dominant processes. We compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced twelve metagenomes to characterize their metabolic potential. Although anaerobic communities were common, cores from oil sands and coal beds had unexpectedly high proportions of aerobic hydrocarbon-degrading bacteria. Likewise, most metagenomes had high proportions of genes for enzymes involved in aerobic hydrocarbon metabolism. Hence, although HREs may have been strictly anaerobic and typically methanogenic for much of their history, this may not hold today for coal beds and for the Alberta oil sands, one of the largest remaining oil reservoirs in the world. This finding may influence strategies to recover energy or chemicals from these HREs by in situ microbial processes.


Subject(s)
Archaea/genetics , Bacteria/genetics , Oil and Gas Fields/microbiology , RNA, Archaeal/genetics , Aerobiosis , Alberta , Archaea/classification , Archaea/metabolism , Bacteria/classification , Bacteria/metabolism , Genes, Archaeal , Genes, Bacterial , Hydrocarbons/metabolism , Metagenomics , RNA, Archaeal/metabolism , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics
11.
BMC Bioinformatics ; 14: 202, 2013 Jun 21.
Article in English | MEDLINE | ID: mdl-23800136

ABSTRACT

BACKGROUND: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. RESULTS: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. CONCLUSIONS: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. The MetaPathways software package, installation instructions, and example data can be obtained from http://hallam.microbiology.ubc.ca/MetaPathways.


Subject(s)
Databases, Genetic , Environment , Software , Algorithms , Animals , Databases, Nucleic Acid , Ecosystem , Forecasting , Genomics , Humans , Phylogeny
12.
BMC Genomics ; 14 Suppl 1: S3, 2013.
Article in English | MEDLINE | ID: mdl-23368516

ABSTRACT

BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality -- a statistical luxury dissociated from most real world datasets. RESULTS: To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.


Subject(s)
Software , Algorithms , Computational Biology , Female , Humans , Internet , Intestines/microbiology , Male , Metagenome , Mouth/microbiology , Saccharomyces cerevisiae/genetics , Skin/microbiology , User-Computer Interface
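
To make the statistic behind the FASTLSA abstract above more concrete, the following Python sketch computes a local similarity score for one pair of time series using the dynamic-programming recurrence commonly given for LSA. It is an assumption-laden illustration, not the FASTLSA implementation (which is written in C) and not its asymptotic p-value bound: the function name `local_similarity` and the `max_delay` parameter are hypothetical, and the inputs are assumed to be already normalized and of equal length.

```python
def local_similarity(x, y, max_delay):
    """Signed local similarity score of two equal-length series.

    Scans all aligned segments whose start offsets differ by at most
    `max_delay` and returns the largest accumulated product, divided by n.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    # pos/neg accumulate positively / negatively correlated runs.
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # outside the allowed time lag
            prod = x[i - 1] * y[j - 1]
            # Restart the run at zero whenever the running sum goes negative.
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / n


series = [1.0, -1.0, 1.0, -1.0]
print(local_similarity(series, series, max_delay=0))                 # 1.0
print(local_similarity(series, [-v for v in series], max_delay=0))   # -1.0
```

A permutation test would repeat this scan p times for each of the roughly m² series pairs, which is where the factor p in the O(pm²n) cost quoted in the abstract comes from; replacing the permutations with an analytic bound removes that factor.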
13.
ISME J ; 7(2): 256-68, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23151638

ABSTRACT

Marine Group A (MGA) is a candidate phylum of Bacteria that is ubiquitous and abundant in the ocean. Despite being prevalent, the structural and functional properties of MGA populations remain poorly constrained. Here, we quantified MGA diversity and population structure in relation to nutrients and O2 concentrations in the oxygen minimum zone (OMZ) of the Northeast subarctic Pacific Ocean using a combination of catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) and 16S small subunit ribosomal RNA (16S rRNA) gene sequencing (clone libraries and 454-pyrotags). Estimates of MGA abundance as a proportion of total bacteria were similar across all three methods although estimates based on CARD-FISH were consistently lower in the OMZ (5.6%±1.9%) than estimates based on 16S rRNA gene clone libraries (11.0%±3.9%) or pyrotags (9.9%±1.8%). Five previously defined MGA subgroups were recovered in 16S rRNA gene clone libraries and five novel subgroups were defined (HF770D10, P262000D03, P41300E03, P262000N21 and A714018). Rarefaction analysis of pyrotag data indicated that the ultimate richness of MGA was very nearly sampled. Spearman's rank analysis of MGA abundances by CARD-FISH and O2 concentrations resulted in significant correlation. Analyzed in more detail by 16S rRNA pyrotag sequencing, MGA operational taxonomic units affiliated with subgroups Arctic95A-2 and A714018 comprised 0.3-2.4% of total bacterial sequences and displayed strong correlations with decreasing O2 concentration. This study is the first comprehensive description of MGA diversity using complementary techniques. These results provide a phylogenetic framework for interpreting future studies on ecotype selection among MGA subgroups, and suggest a potentially important role for MGA in the ecology and biogeochemistry of OMZs.


Subject(s)
Bacteria/classification , Biodiversity , Phylogeny , Seawater/microbiology , Bacteria/genetics , Base Sequence , DNA, Bacterial/genetics , Gene Library , Molecular Sequence Data , Pacific Ocean , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Water Microbiology
14.
Nat Rev Microbiol ; 10(6): 381-94, 2012 May 14.
Article in English | MEDLINE | ID: mdl-22580367

ABSTRACT

Dissolved oxygen concentration is a crucial organizing principle in marine ecosystems. As oxygen levels decline, energy is increasingly diverted away from higher trophic levels into microbial metabolism, leading to loss of fixed nitrogen and to production of greenhouse gases, including nitrous oxide and methane. In this Review, we describe current efforts to explore the fundamental factors that control the ecological and microbial biodiversity in oxygen-starved regions of the ocean, termed oxygen minimum zones. We also discuss how recent advances in microbial ecology have provided information about the potential interactions in distributed co-occurrence and metabolic networks in oxygen minimum zones, and we provide new insights into coupled biogeochemical processes in the ocean.


Subject(s)
Biota , Energy Metabolism , Oxygen/metabolism , Seawater/microbiology , Greenhouse Effect , Methane/metabolism , Nitrous Oxide/metabolism
15.
Proc Natl Acad Sci U S A ; 109(20): 7665-70, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22547789

ABSTRACT

We present a programmable droplet-based microfluidic device that combines the reconfigurable flow-routing capabilities of integrated microvalve technology with the sample compartmentalization and dispersion-free transport that is inherent to droplets. The device allows for the execution of user-defined multistep reaction protocols in 95 individually addressable nanoliter-volume storage chambers by consecutively merging programmable sequences of picoliter-volume droplets containing reagents or cells. This functionality is enabled by "flow-controlled wetting," a droplet docking and merging mechanism that exploits the physics of droplet flow through a channel to control the precise location of droplet wetting. The device also allows for automated cross-contamination-free recovery of reaction products from individual chambers into standard microfuge tubes for downstream analysis. The combined features of programmability, addressability, and selective recovery provide a general hardware platform that can be reprogrammed for multiple applications. We demonstrate this versatility by implementing multiple single-cell experiment types with this device: bacterial cell sorting and cultivation, taxonomic gene identification, and high-throughput single-cell whole genome amplification and sequencing using common laboratory strains. Finally, we apply the device to genome analysis of single cells and microbial consortia from diverse environmental samples including a marine enrichment culture, deep-sea sediments, and the human oral cavity. The resulting datasets capture genotypic properties of individual cells and illuminate known and potentially unique partnerships between microbial community members.


Subject(s)
Hydrodynamics , Metagenome/genetics , Microfluidic Analytical Techniques/instrumentation , Microfluidic Analytical Techniques/methods , Base Sequence , DNA Primers/genetics , Genotype , Geologic Sediments/microbiology , Humans , Image Processing, Computer-Assisted , Metagenomics/methods , Microscopy, Fluorescence , Molecular Sequence Data , Mouth/microbiology , Polymerase Chain Reaction , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Surface-Active Agents , Wettability
16.
Int J Bioinform Res Appl ; 1(2): 145-61, 2005.
Article in English | MEDLINE | ID: mdl-18048126

ABSTRACT

String barcoding is a recently introduced technique for genomic based identification of microorganisms. In this paper, we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size, on a well equipped workstation. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.


Subject(s)
Algorithms , Genomics
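
The distinguisher selection problem described in the string barcoding abstract above can be sketched as a greedy set cover over genome pairs. This is an illustration only and not the authors' scalable algorithm: candidate distinguishers are assumed here to be fixed-length substrings (k-mers), and the function name `greedy_barcode`, the parameter `k`, and the toy sequences are hypothetical.

```python
from itertools import combinations


def greedy_barcode(genomes, k):
    """Greedily pick k-mers until every pair of genomes is told apart.

    `genomes` maps a name to a sequence string. Returns the chosen k-mers
    and any pairs that no candidate k-mer can separate.
    """
    names = sorted(genomes)
    kmers = {
        name: {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for name, seq in genomes.items()
    }
    candidates = sorted(set().union(*kmers.values()))
    unresolved = set(combinations(names, 2))
    chosen = []

    def gain(kmer):
        # Number of still-unresolved pairs in which exactly one genome has kmer.
        return sum((kmer in kmers[a]) != (kmer in kmers[b]) for a, b in unresolved)

    while unresolved and candidates:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # remaining pairs cannot be separated with these candidates
        chosen.append(best)
        unresolved = {
            (a, b) for a, b in unresolved
            if (best in kmers[a]) == (best in kmers[b])
        }
    return chosen, unresolved


# Toy sequences, not real genomic data.
toy = {"a": "AAC", "b": "AAG", "c": "CCG"}
chosen, unresolved = greedy_barcode(toy, k=2)
print(chosen)      # ['AA', 'AC']
print(unresolved)  # set()
```

Each genome's barcode is then its presence/absence pattern over the chosen distinguishers (here `11`, `10`, and `00`). Telling n genomes apart needs at least ceil(log2 n) binary distinguishers, so two is the minimum for this three-genome toy case.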