Search | VHL Regional Portal

Advances in biodiversity: metagenomics and the unveiling of biological dark matter.

Robbins, Robert J; Krishtalka, Leonard; Wooley, John C.

Stand Genomic Sci ; 11(1): 69, 2016.

Article in English | MEDLINE | ID: mdl-27617059

ABSTRACT

BACKGROUND: Efforts to harmonize genomic data standards used by the biodiversity and metagenomic research communities have shown that prokaryotic data cannot be understood or represented in a traditional, classical biological context for conceptual reasons, not technical ones. RESULTS: Biology, like physics, has a fundamental duality-the classical macroscale eukaryotic realm vs. the quantum microscale microbial realm-with the two realms differing profoundly, and counter-intuitively, from one another. Just as classical physics is emergent from and cannot explain the microscale realm of quantum physics, so classical biology is emergent from and cannot explain the microscale realm of prokaryotic life. Classical biology describes the familiar, macroscale realm of multi-cellular eukaryotic organisms, which constitute a highly derived and constrained evolutionary subset of the biosphere, unrepresentative of the vast, mostly unseen, microbial world of prokaryotic life that comprises at least half of the planet's biomass and most of its genetic diversity. The two realms occupy fundamentally different mega-niches: eukaryotes interact primarily mechanically with the environment, prokaryotes primarily physiologically. Further, many foundational tenets of classical biology simply do not apply to prokaryotic biology. CONCLUSIONS: Classical genetics one held that genes, arranged on chromosomes like beads on a string, were the fundamental units of mutation, recombination, and heredity. Then, molecular analysis showed that there were no fundamental units, no beads, no string. Similarly, classical biology asserts that individual organisms and species are fundamental units of ecology, evolution, and biodiversity, composing an evolutionary history of objectively real, lineage-defined groups in a single-rooted tree of life. Now, metagenomic tools are forcing a recognition that there are no completely objective individuals, no unique lineages, and no one true tree. The newly revealed biosphere of microbial dark matter cannot be understood merely by extending the concepts and methods of eukaryotic macrobiology. The unveiling of biological dark matter is allowing us to see, for the first time, the diversity of the entire biosphere and, to paraphrase Darwin, is providing a new view of life. Advancing and understanding that view will require major revisions to some of the most fundamental concepts and theories in biology.

Expansion of the protein repertoire in newly explored environments: human gut microbiome specific protein families.

Ellrott, Kyle; Jaroszewski, Lukasz; Li, Weizhong; Wooley, John C; Godzik, Adam.

PLoS Comput Biol ; 6(6): e1000798, 2010 Jun 03.

Article in English | MEDLINE | ID: mdl-20532204

ABSTRACT

The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These not only include many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well being.

Subject(s)

Bacterial Proteins/genetics , Computational Biology/methods , Gastrointestinal Tract/microbiology , Genome, Bacterial , Metagenome , Cluster Analysis , Databases, Protein , Humans

A primer on metagenomics.

Wooley, John C; Godzik, Adam; Friedberg, Iddo.

PLoS Comput Biol ; 6(2): e1000667, 2010 Feb 26.

Article in English | MEDLINE | ID: mdl-20195499

ABSTRACT

Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

Subject(s)

Metagenomics , Computational Biology , Sequence Analysis, DNA

Extending Standards for Genomics and Metagenomics Data: A Research Coordination Network for the Genomic Standards Consortium (RCN4GSC).

Wooley, John C; Field, Dawn; Glöckner, Frank-Oliver.

Stand Genomic Sci ; 1(1): 87-90, 2009 Jul 20.

Article in English | MEDLINE | ID: mdl-21304642

ABSTRACT

Through a newly established Research Coordination Network for the Genomic Standards Consortium (RCN4GSC), the GSC will continue its leadership in establishing and integrating genomic standards through community-based efforts. These efforts, undertaken in the context of genomic and metagenomic research aim to ensure the electronic capture of all genomic data and to facilitate the achievement of a community consensus around collecting and managing relevant contextual information connected to the sequence data. The GSC operates as an open, inclusive organization, welcoming inspired biologists with a commitment to community service. Within the collaborative framework of the ongoing, international activities of the GSC, the RCN will expand the range of research domains engaged in these standardization efforts and sustain scientific networking to encourage active participation by the broader community. The RCN4GSC, funded for five years by the US National Science Foundation, will primarily support outcome-focused working meetings and the exchange of early-career scientists between GSC research groups in order to advance key standards contributions such as GCDML. Focusing on the timely delivery of the extant GSC core projects, the RCN will also extend the pioneering efforts of the GSC to engage researchers active in developing ecological, environmental and biodiversity data standards. As the initial goals of the GSC are increasingly achieved, promoting the comprehensive use of effective standards will be essential to ensure the effective use of sequence and associated data, to provide access for all biologists to all of the information, and to create interdisciplinary opportunities for discovery. The RCN will facilitate these implementation activities through participation in major scientific conferences and presentations on scientific advances enabled by community usage of genomic standards.

Metagenomics: Facts and Artifacts, and Computational Challenges*

Wooley, John C; Ye, Yuzhen.

J Comput Sci Technol ; 25(1): 71-81, 2009 Jan.

Article in English | MEDLINE | ID: mdl-20648230

ABSTRACT

Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.

Probing metagenomics by rapid cluster analysis of very large datasets.

Li, Weizhong; Wooley, John C; Godzik, Adam.

PLoS One ; 3(10): e3375, 2008.

Article in English | MEDLINE | ID: mdl-18846219

ABSTRACT

BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.

Subject(s)

Cluster Analysis , Databases, Protein , Proteins/analysis , Proteins/classification , Sequence Analysis, Protein/methods , Algorithms , Genome , Oceans and Seas , Open Reading Frames , Proteins/genetics , Seawater/chemistry , Seawater/microbiology , Software

The grand challenge: facing up to proteomics.

Wooley, John C.

Trends Biotechnol ; 20(8): 316-7, 2002 Aug.

Article in English | MEDLINE | ID: mdl-12127269

Subject(s)

Databases, Protein , Gene Expression Regulation , Genome, Human , Proteomics/trends , Sequence Analysis, Protein , Humans , Structure-Activity Relationship

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL