Results 1 - 20 of 56
1.
Microbiol Spectr ; : e0069524, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38912828

ABSTRACT

Amplicon sequencing stands as a cornerstone in microbiome profiling, yet concerns persist regarding its resolution and accuracy. The enhancement of reference databases and annotations marks a new era for 16S rRNA-based profiling. Capitalizing on this potential, we introduce PM-profiler, a novel tool for profiling amplicon short reads. PM-profiler is implemented by C++-based advanced algorithms, such as pre-allocated hash for reference construction, hybrid and dynamic short-read matching, big-data-guided dual-mode hierarchical taxonomy annotation strategy, and full-procedure parallel computing. This tool delivers species-level resolution and ultrafast speed for large-scale microbiomes, surpassing alignment-based approaches and the Naïve-Bayesian model. Furthermore, recognizing the global uneven distribution of microbes, we delineate optimal annotation strategies for each sampling habitat based on microbial patterns over 270,000 microbiomes. Integrated with the established workflow of Parallel-Meta Suite and the latest curated reference databases, this endeavor offers a swift and dependable solution for high-precision microbiome surveys. IMPORTANCE Our study introduces PM-profiler, a new tool that deciphers the complexity of microbial communities. With advanced algorithms, flexible annotation strategies, and well-organized big data, PM-profiler provides a faster and more accurate way to study microbiomes, paving the way for discoveries that could improve our understanding of microbiomes and their impact on the world.

2.
BMC Bioinformatics ; 25(1): 214, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38877401

ABSTRACT

BACKGROUND: The exploration of gene-disease associations is crucial for understanding the mechanisms underlying disease onset and progression, with significant implications for prevention and treatment strategies. Advances in high-throughput biotechnology have generated a wealth of data linking diseases to specific genes. While graph representation learning has recently introduced groundbreaking approaches for predicting novel associations, existing studies always overlooked the cumulative impact of functional modules such as protein complexes and the incompletion of some important data such as protein interactions, which limits the detection performance. RESULTS: Addressing these limitations, here we introduce a deep learning framework called ModulePred for predicting disease-gene associations. ModulePred performs graph augmentation on the protein interaction network using L3 link prediction algorithms. It builds a heterogeneous module network by integrating disease-gene associations, protein complexes and augmented protein interactions, and develops a novel graph embedding for the heterogeneous module network. Subsequently, a graph neural network is constructed to learn node representations by collectively aggregating information from topological structure, and gene prioritization is carried out by the disease and gene embeddings obtained from the graph neural network. Experimental results underscore the superiority of ModulePred, showcasing the effectiveness of incorporating functional modules and graph augmentation in predicting disease-gene associations. This research introduces innovative ideas and directions, enhancing the understanding and prediction of gene-disease relationships.


Subject(s)
Algorithms , Deep Learning , Humans , Computational Biology/methods , Protein Interaction Maps/genetics , Genetic Predisposition to Disease/genetics , Neural Networks, Computer , Genetic Association Studies/methods
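
The graph-augmentation step mentioned in the ModulePred abstract relies on L3 link prediction. The following is a minimal sketch of the commonly cited L3 score (degree-normalized paths of length three), not ModulePred's own code; the function name `l3_scores` and the toy edge list are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Score each unconnected node pair by degree-normalized paths of length 3.

    For a candidate pair (x, y), every path x-u-v-y contributes
    1 / sqrt(degree(u) * degree(v)) to the score.
    """
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # the pair is already linked
        total = 0.0
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:  # u-v edge closes the length-3 path
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


# Hypothetical toy interaction network: a simple chain A-B-C-D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_scores = l3_scores(toy_edges)
```

In the chain above, only the pair (A, D) is joined by a path of length three, so it is the only candidate interaction that receives a score. High-scoring pairs would be the ones added to the network during augmentation.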
3.
Spectrochim Acta A Mol Biomol Spectrosc ; 318: 124454, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-38788500

ABSTRACT

For species identification analysis, methods based on deep learning are becoming prevalent due to their data-driven and task-oriented nature. The most commonly used convolutional neural network (CNN) model has been well applied in Raman spectra recognition. However, when faced with similar molecules or functional groups, the features of overlapping peaks and weak peaks may not be fully extracted using the CNN model, which can potentially hinder accurate species identification. Based on these practical challenges, the fusion of multi-modal data can effectively meet the comprehensive and accurate analysis of actual samples when compared with single-modal data. In this study, we propose a double-branch CNN model by integrating Raman and image multi-modal data, named SI-DBNet. In addition, we have developed a one-dimensional convolutional neural network combining dilated convolutions and efficient channel attention mechanisms for spectral branching. The effectiveness of the model has been demonstrated using the Grad-CAM method to visualize the key regions concerned by the model. When compared to single-modal and multi-modal classification methods, our SI-DBNet model achieved superior performance with a classification accuracy of 98.8%. The proposed method provided a new reference for species identification based on multi-modal data fusion.

4.
BMC Genomics ; 25(1): 515, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796435

ABSTRACT

BACKGROUND: The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate the genomic variation in the natural populations of many plant species. With the rapid advancements in long-read sequencing and genome assembly technologies, high-quality genome sequences are available for a group of varieties for many plant species. These genome sequences are expected to help researchers comprehensively investigate any type of genomic variants that are missed by the WGS technology. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes. RESULTS: To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 Maize (Zea mays) de novo assembled genome sequences and compared them with the previously published short-read variant calling results. MGA identified more single nucleotide variants (SNVs) and long INDELs than did previously published WGS variant callings. Additionally, ACMGA detected significantly more SNVs and long INDELs in repetitive regions and the whole genome than did Cactus. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called using short-read. These two MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline. CONCLUSIONS: Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short-read. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.


Subject(s)
Arabidopsis , Genome, Plant , Zea mays , Arabidopsis/genetics , Zea mays/genetics , Sequence Alignment , INDEL Mutation , Genomics/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Software
5.
Bioinform Adv ; 4(1): vbae013, 2024.
Article in English | MEDLINE | ID: mdl-38371919

ABSTRACT

Motivation: The human microbiome, found throughout various body parts, plays a crucial role in health dynamics and disease development. Recent research has highlighted microbiome disparities between patients with different diseases and healthy individuals, suggesting the microbiome's potential in recognizing health states. Traditionally, microbiome-based status classification relies on pre-trained machine learning (ML) models. However, most ML methods overlook microbial relationships, limiting model performance. Results: To address this gap, we propose PM-CNN (Phylogenetic Multi-path Convolutional Neural Network), a novel phylogeny-based neural network model for multi-status classification and disease detection using microbiome data. PM-CNN organizes microbes based on their phylogenetic relationships and extracts features using a multi-path convolutional neural network. An ensemble learning method then fuses these features to make accurate classification decisions. We applied PM-CNN to human microbiome data for status and disease detection, demonstrating its significant superiority over existing ML models. These results provide a robust foundation for microbiome-based state recognition and disease prediction in future research and applications. Availability and implementation: PM-CNN software is available at https://github.com/qdu-bioinfo/PM_CNN.

6.
ISME J ; 18(1)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38365232

ABSTRACT

Ammonia-oxidizing archaea (AOA) are among the most ubiquitous and abundant archaea on Earth, widely distributed in marine, terrestrial, and geothermal ecosystems. However, the genomic diversity, biogeography, and evolutionary process of AOA populations in subsurface environments are vastly understudied compared to those in marine and soil systems. Here, we report a novel AOA order Candidatus (Ca.) Nitrosomirales which forms a sister lineage to the thermophilic Ca. Nitrosocaldales. Metagenomic and 16S rRNA gene-read mapping demonstrates the abundant presence of Nitrosomirales AOA in various groundwater environments and their widespread distribution across a range of geothermal, terrestrial, and marine habitats. Terrestrial Nitrosomirales AOA show the genetic capacity of using formate as a source of reductant and using nitrate as an alternative electron acceptor. Nitrosomirales AOA appear to have acquired key metabolic genes and operons from other mesophilic populations via horizontal gene transfer, including genes encoding urease, nitrite reductase, and V-type ATPase. The additional metabolic versatility conferred by acquired functions may have facilitated their radiation into a variety of subsurface, marine, and soil environments. We also provide evidence that each of the four AOA orders spans both marine and terrestrial habitats, which suggests a more complex evolutionary history for major AOA lineages than previously proposed. Together, these findings establish a robust phylogenomic framework of AOA and provide new insights into the ecology and adaptation of this globally abundant functional guild.


Subject(s)
Ammonia , Archaea , Ammonia/metabolism , Ecosystem , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 16S/metabolism , Oxidation-Reduction , Phylogeny , Soil , Soil Microbiology
7.
Front Microbiol ; 14: 1291010, 2023.
Article in English | MEDLINE | ID: mdl-37915854

ABSTRACT

Selenium (Se) is an essential trace element that plays a vital role in various physiological functions of the human body, despite its small proportion. Due to the inability of the human body to synthesize selenium, there has been increasing concern regarding its nutritional value and adequate intake as a micronutrient. The efficiency of selenium absorption varies depending on individual biochemical characteristics and living environments, underscoring the importance of accurately estimating absorption efficiency to prevent excessive or inadequate intake. As a crucial digestive organ in the human body, the gut harbors a complex and diverse microbiome, which has been found to have a significant correlation with the host's overall health status. To investigate the relationship between the gut microbiome and selenium absorption, a two-month intervention experiment was conducted among Chinese adult cohorts. Results indicated that selenium supplementation had minimal impact on the overall diversity of the gut microbiome but was associated with specific subsets of microorganisms. More importantly, these dynamics exhibited variations across regions and sequencing batches, which complicated the interpretation and utilization of gut microbiome data. To address these challenges, we proposed a hybrid predictive modeling method, utilizing refined gut microbiome features and host variable encoding. This approach accurately predicts individual selenium absorption efficiency by revealing hidden microbial patterns while minimizing differences in sequencing data across batches and regions. These efforts provide new insights into the interaction between micronutrients and the gut microbiome, as well as a promising direction for precise nutrition in the future.

8.
Microbiol Spectr ; 11(3): e0056323, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37102867

ABSTRACT

The 16S rRNA gene works as a rapid and effective marker for the identification of microorganisms in complex communities; hence, a huge number of microbiomes have been surveyed by 16S amplicon-based sequencing. The resolution of the 16S rRNA gene is always considered only at the genus level; however, it has not been verified on a wide range of microbes yet. To fully explore the ability and potential of the 16S rRNA gene in microbial profiling, here, we propose Qscore, a comprehensive method to evaluate the performance of amplicons by integrating the amplification rate, multitier taxonomic annotation, sequence type, and length. Our in silico assessment by a "global view" of 35,889 microbe species across multiple reference databases summarizes the optimal sequencing strategy for 16S short reads. On the other hand, since microbes are unevenly distributed according to their habitats, we also provide the recommended configuration for 16 typical ecosystems based on the Qscores of 157,390 microbiomes in the Microbiome Search Engine (MSE). Detailed data simulation further proves that the 16S amplicons produced with Qscore-suggested parameters exhibit high precision in microbiome profiling, which is close to that of shotgun metagenomes under CAMI metrics. Therefore, by reconsidering the precision of 16S-based microbiome profiling, our work not only enables the high-quality reusability of massive sequence legacy that has already been produced but is also significant for guiding microbiome studies in the future. We have implemented the Qscore as an online service at http://qscore.single-cell.cn to parse the recommended sequencing strategy for specific habitats or expected microbial structures. IMPORTANCE 16S rRNA has long been used as a biomarker to identify distinct microbes from complex communities. However, due to the influence of the amplification region, sequencing type, sequence processing, and reference database, the accuracy of 16S rRNA has not been fully verified on a global range. More importantly, the microbial composition of different habitats varies greatly, and it is necessary to adopt different strategies according to the corresponding target microbes to achieve optimal analytical performance. Here, we developed Qscore, which evaluates the comprehensive performance of 16S amplicons from multiple perspectives, thus providing the best sequencing strategies for common ecological environments by using big data.


Subject(s)
Microbiota , RNA, Ribosomal, 16S/genetics , Genes, rRNA , Phylogeny , Microbiota/genetics , Metagenome , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
9.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36946295

ABSTRACT

MOTIVATION: Beta-diversity quantitatively measures the difference among microbial communities thus enlightening the association between microbiome composition and environment properties or host phenotypes. The beta-diversity analysis mainly relies on distances among microbiomes that are calculated by all microbial features. However, in some cases, only a small fraction of members in a community plays crucial roles. Such a tiny proportion is insufficient to alter the overall distance, which is always missed by end-to-end comparison. On the other hand, beta-diversity pattern can also be interfered due to the data sparsity when only focusing on nonabundant microbes. RESULTS: Here, we develop Flex Meta-Storms (FMS) distance algorithm that implements the "local alignment" of microbiomes for the first time. Using a flexible extraction that considers the weighted phylogenetic and functional relations of microbes, FMS produces a normalized phylogenetic distance among members of interest for microbiome pairs. We demonstrated the advantage of FMS in detecting the subtle variations of microbiomes among different states using artificial and real datasets, which were neglected by regular distance metrics. Therefore, FMS effectively discriminates microbiomes with higher sensitivity and flexibility, thus contributing to in-depth comprehension of microbe-host interactions, as well as promoting the utilization of microbiome data such as disease screening and prediction. AVAILABILITY AND IMPLEMENTATION: FMS is implemented in C++, and the source code is released at https://github.com/qdu-bioinfo/flex-meta-storms.


Subject(s)
Microbiota , Phylogeny , Software , Algorithms
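
The FMS abstract argues that a small fraction of community members can change without moving a whole-community distance. The sketch below illustrates only that idea: it compares a plain Bray-Curtis dissimilarity on full profiles with the same measure restricted to a few members of interest and renormalized. It is a simplified stand-in, not the FMS algorithm (which weights phylogenetic and functional relations); the taxon names and abundances are made up for illustration.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def local_profile(profile, members):
    """Keep only the members of interest and renormalize them to sum to 1."""
    kept = {t: profile.get(t, 0.0) for t in members}
    total = sum(kept.values())
    return {t: v / total for t, v in kept.items()} if total else kept


# Hypothetical relative abundances: two dominant taxa are identical,
# two rare taxa swap their abundances between the samples.
sample_a = {"taxon_1": 0.60, "taxon_2": 0.38, "taxon_3": 0.019, "taxon_4": 0.001}
sample_b = {"taxon_1": 0.60, "taxon_2": 0.38, "taxon_3": 0.001, "taxon_4": 0.019}
members_of_interest = ["taxon_3", "taxon_4"]

global_distance = bray_curtis(sample_a, sample_b)
local_distance = bray_curtis(
    local_profile(sample_a, members_of_interest),
    local_profile(sample_b, members_of_interest),
)
```

With these toy numbers the whole-community dissimilarity is about 0.018, while the dissimilarity restricted to the two rare taxa is about 0.9, which is the kind of contrast a "local alignment" of microbiomes is meant to expose.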
11.
Imeta ; 1(1): e1, 2022 Mar.
Article in English | MEDLINE | ID: mdl-38867729

ABSTRACT

Massive microbiome sequencing data has been generated, which elucidates associations between microbes and their environmental phenotypes such as host health or ecosystem status. Outstanding bioinformatic tools are the basis to decipher the biological information hidden under microbiome data. However, most approaches are difficult for nonprofessional users to access. On the other hand, computing throughput has become a significant bottleneck of many analytical pipelines in processing large-scale datasets. In this study, we introduce Parallel-Meta Suite (PMS), an interactive software package for fast and comprehensive microbiome data analysis, visualization, and interpretation. It covers a wide array of functions for data preprocessing, statistics, and visualization by state-of-the-art algorithms in a user-friendly graphical interface, which is accessible to diverse users. To meet the rapidly increasing computational demands, the entire procedure of PMS has been optimized by a parallel computing scheme, enabling the rapid processing of thousands of samples. PMS is compatible with multiple platforms, and an installer has been integrated for fully automatic installation.

12.
mSystems ; 6(4): e0036321, 2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34402645

ABSTRACT

Quantitative comparison among microbiomes can link microbial beta-diversity to environmental features, thus enabling prediction of ecosystem properties or dissection of host-microbiome interaction. However, to compute beta-diversity, current methods mainly employ the entire community profiles of taxa or functions, which can miss the subtle differences caused by low-abundance community members that may play crucial roles in the properties of interest. In this work, I review the distance metrics and search engines that we developed to match microbiomes at a large scale based on whole-community-level similarities, as well as their limitations in tackling the microbiome changes caused by less abundant community features. Then I propose the concept of microbiome "local alignment," including an algorithm to measure microbiome similarity on specific fractions of biodiversity and an indexing strategy for rapidly fetching microbiome local-alignment matches from the data repository.

13.
mSystems ; 6(4): e0039421, 2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34254819

ABSTRACT

Microbiomes are inherently linked by their structural similarity, yet the global features of such similarity are not clear. Here, we propose as a solution a search-based microbiome transition network. By traversing a composition-similarity-based network of 177,022 microbiomes, we show that although the compositions are distinct by habitat, each microbiome is on-average only seven neighbors from any other microbiome on Earth, indicating the inherent homology of microbiomes at the global scale. This network is scale-free, suggesting a high degree of stability and robustness in microbiome transition. By tracking the minimum spanning tree in this network, a global roadmap of microbiome dispersal was derived that tracks the potential paths of formulating and propagating microbiome diversity. Such search-based global microbiome networks, reconstructed within hours on just one computing node, provide a readily expanded reference for tracing the origin and evolution of existing or new microbiomes. IMPORTANCE It remains unclear whether and how compositional changes at the "community to community" level among microbiomes are linked to the origin and evolution of global microbiome diversity. Here we propose a microbiome transition model and a network-based analysis framework to describe and simulate the variation and dispersal of the global microbial beta-diversity across multiple habitats. The traversal of a transition network with 177,022 samples shows the inherent homology of microbiome at the global scale. Then a global roadmap of microbiome dispersal derived from the network tracks the potential paths of formulating and propagating microbiome diversity. Such search-based microbiome network provides a readily expanded reference for tracing the origin and evolution of existing or new microbiomes at the global scale.
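
The dispersal roadmap in this abstract is derived by tracking the minimum spanning tree of a similarity-based network. A minimal sketch of that step, assuming pairwise distances are already available, is Kruskal's algorithm with a union-find structure. The function name and the toy distances below are hypothetical; the study itself builds its network from search-based composition similarity.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm over nodes 0..n-1.

    edges is an iterable of (distance, i, j) tuples; the result is the list
    of (i, j, distance) edges kept in the tree, in order of increasing distance.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for d, i, j in sorted(edges):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j, d))
    return tree


# Hypothetical distances (1 - similarity) among four microbiomes.
toy_network = [
    (0.1, 0, 1),
    (0.4, 0, 2),
    (0.2, 1, 2),
    (0.7, 2, 3),
    (0.9, 1, 3),
]
toy_tree = minimum_spanning_tree(4, toy_network)
```

For the toy network the tree keeps the three shortest edges that do not close a cycle: 0-1, 1-2, and 2-3. Reading such a tree outward from a chosen sample gives one possible path of compositional transition between microbiomes.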

14.
Front Microbiol ; 12: 673349, 2021.
Article in English | MEDLINE | ID: mdl-34177856

ABSTRACT

In selective RNA processing and stabilization (SRPS) operons, stem-loops (SLs) located at the 3'-UTR region of selected genes can control the stability of the corresponding transcripts and determine the stoichiometry of the operon. Here, for such operons, we developed a computational approach named SLOFE (stem-loop free energy) that identifies the SRPS operons and predicts their transcript- and protein-level stoichiometry at the whole-genome scale using only the genome sequence via the minimum free energy (ΔG) of specific SLs in the intergenic regions within operons. As validated by the experimental approach of differential RNA-Seq, SLOFE identifies genome-wide SRPS operons in Clostridium cellulolyticum with 80% accuracy and reveals that the SRPS mechanism contributes to diverse cellular activities. Moreover, in the identified SRPS operons, SLOFE predicts the transcript- and protein-level stoichiometry, including those encoding cellulosome complexes, ATP synthases, ABC transporter family proteins, and ribosomal proteins. Its accuracy exceeds those of existing in silico approaches in C. cellulolyticum, Clostridium acetobutylicum, Clostridium thermocellum, and Bacillus subtilis. The ability to identify genome-wide SRPS operons and predict their stoichiometry via DNA sequence in silico should facilitate studying the function and evolution of SRPS operons in bacteria.

15.
Comput Struct Biotechnol J ; 19: 2742-2749, 2021.
Article in English | MEDLINE | ID: mdl-34093989

ABSTRACT

Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies always focused on well-designed cohorts, in which each sample was exactly labeled by a single status type. However, an individual may in fact be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step of ML towards multi-label classification that potentially solves the aforementioned problem, including a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.

16.
mBio ; 12(2)2021 03 09.
Article in English | MEDLINE | ID: mdl-33688007

ABSTRACT

Most adults experience episodes of gingivitis, which can progress to the irreversible, chronic state of periodontitis, yet roles of plaque in gingivitis onset and progression to periodontitis remain elusive. Here, we longitudinally profiled the plaque metagenome, the plaque metabolome, and salivary cytokines in 40 adults who transited from naturally occurring gingivitis (NG) to healthy gingivae (baseline) and then to experimental gingivitis (EG). During EG, rapid and consistent alterations in plaque microbiota, metabolites, and salivary cytokines emerged as early as 24 to 72 h after oral-hygiene pause, defining an asymptomatic suboptimal health (SoH) stage of the gingivae. SoH features a swift, full activation of 11 salivary cytokines but a steep synergetic decrease of plaque-derived betaine and Rothia spp., suggesting an anti-gum inflammation mechanism by health-promoting symbionts. Global, cross-cohort meta-analysis revealed, at SoH, a greatly elevated microbiome-based periodontitis index driven by its convergence of both taxonomical and functional profiles toward the periodontitis microbiome. Finally, post-SoH gingivitis development accelerates oral microbiota aging by over 1 year within 28 days, with Rothia spp. depletion and Porphyromonas gingivalis elevation as hallmarks. Thus, the microbiome-defined, transient gum SoH stage is a crucial link among gingivitis, periodontitis, and aging. IMPORTANCE A significant portion of world population still fails to brush teeth daily. As a result, the majority of the global adult population is afflicted with chronic gingivitis, and if it is left untreated, some of them will eventually suffer from periodontitis. Here, we identified periodontitis-like microbiome dysbiosis in an asymptomatic SoH stage as early as 24 to 72 h after oral-hygiene pause. SoH features a swift, full activation of multiple salivary cytokines but a steep synergetic decrease of plaque-derived betaine and Rothia spp. The microbial ecology during early gingivitis is highly similar to that in periodontitis under both taxonomical and functional contexts. Unexpectedly, exposures to gingivitis can accelerate over 10-fold the normal rate of oral microbiota aging. Our findings underscore the importance of intervening at the SoH stage of gingivitis via proper oral-hygiene practices on a daily basis, so as to maintain a periodontitis-preventive plaque and ensure the healthy aging of the oral ecosystem.


Subject(s)
Aging , Cytokines/analysis , Gingiva/microbiology , Gingivitis/microbiology , Metagenome , Microbiota , Periodontitis/microbiology , Cohort Studies , Cytokines/immunology , Dysbiosis , Genomics , Gingiva/pathology , Humans , Longitudinal Studies , Metabolomics , Proteomics , Saliva/immunology
17.
mSystems ; 6(1)2021 Jan 19.
Article in English | MEDLINE | ID: mdl-33468706

ABSTRACT

Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird's-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.

18.
BMC Genomics ; 22(1): 9, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407112

ABSTRACT

BACKGROUND: Due to their much lower costs in experiment and computation than metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, due to the potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, resulting in misleading results. RESULTS: Here we present Meta-Apo, which greatly reduces or even eliminates such deviation, and thus deduces much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on > 5000 16S-rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced by using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis, e.g. accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, with the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. CONCLUSIONS: This suggests that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of 16S-amplicon strategy, without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub ( https://github.com/qibebt-bioinfo/meta-apo ) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training, and outputs the calibrated functional profiles for the much larger number of 16S-amplicon samples.


Subject(s)
Bacteria , Microbiota , Bacteria/genetics , Metagenome , Metagenomics , Microbiota/genetics , RNA, Ribosomal, 16S/genetics
19.
Bioinform Adv ; 1(1): vbab003, 2021.
Article in English | MEDLINE | ID: mdl-36700101

ABSTRACT

Functional beta-diversity analysis on numerous microbiomes interprets the linkages between metabolic functions and their meta-data. To evaluate the microbiome beta-diversity, widely used distance metrics only count overlapped gene families but omit their inherent relationships, resulting in erroneous distances due to the sparsity of high-dimensional function profiles. Here we propose Hierarchical Meta-Storms (HMS) to tackle this problem. HMS contains two core components: (i) a dissimilarity algorithm that comprehensively measures functional distances among microbiomes using multi-level metabolic hierarchy and (ii) a fast Principal Co-ordinates Analysis (PCoA) implementation that deduces the beta-diversity pattern optimized by parallel computing. Results showed that HMS can detect variations of microbial functions in upper-level metabolic pathways that are always missed by other methods. In addition, HMS accomplished the pairwise distance matrix and PCoA for 20 000 microbiomes in 3.9 h on a single computing node, which was 23 times faster and consumed 80% less RAM than existing methods, enabling in-depth data mining among microbiomes at high resolution. HMS takes microbiome functional profiles as input and produces their pairwise distance matrix and PCoA coordinates. Availability and implementation: It is coded in C/C++ with parallel computing and released in two alternative forms: a standalone software (https://github.com/qdu-bioinfo/hierarchical-meta-storms) and an equivalent R package (https://github.com/qdu-bioinfo/hrms). Supplementary information: Supplementary data are available at Bioinformatics Advances online.

20.
Comput Struct Biotechnol J ; 18: 2075-2080, 2020.
Article in English | MEDLINE | ID: mdl-32802279

ABSTRACT

During the past decade, a tremendous amount of microbiome sequencing data has been generated to study the dynamic associations between microbial profiles and environments. How to precisely and efficiently decipher large-scale microbiome data and further take advantage of it has become one of the most essential bottlenecks for microbiome research at present. In this mini-review, we focus on the three key steps of analyzing cross-study microbiome datasets, including microbiome profiling, data integrating and data mining. By introducing the current bioinformatics approaches and discussing their limitations, we outline the opportunities in developing computational methods for the three steps, and propose promising solutions to multi-omics data analysis for comprehensive understanding and rapid investigation of microbiomes from different angles, which could potentially promote data-driven research by providing a broader view of the "microbiome data space".
