1.
J Genet Genomics ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38417547

ABSTRACT

The molecular clock model is fundamental for inferring species divergence times from molecular sequences. However, its direct application may introduce significant biases due to sequencing errors, recombination events, and inaccurately labeled sampling times. Improving accuracy necessitates rigorous quality control measures to identify and remove potentially erroneous sequences. Furthermore, while not all branches of a phylogenetic tree may exhibit a clear temporal signal, specific branches may still adhere to the assumptions, with varying evolutionary rates. Supporting a relaxed molecular clock model better aligns with the complexities of evolution. The root-to-tip regression method has been widely used to analyze the temporal signal in phylogenetic studies and can be generalized for detecting other phylogenetic signals. Despite its utility, there remains a lack of corresponding software implementations for broader applications. To address this gap, we present shinyTempSignal, an interactive web application implemented with the shiny framework, available as an R package and publicly accessible at https://github.com/YuLab-SMU/shinyTempSignal. This tool facilitates the analysis of temporal and other phylogenetic signals under both strict and relaxed models. By extending the root-to-tip regression method to diverse signals, shinyTempSignal helps in the detection of evolving features or traits, thereby laying the foundation for deeper insights and subsequent analyses.
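The root-to-tip regression described above can be sketched in a few lines of Python. This is a minimal illustration under a strict-clock assumption, not the shinyTempSignal implementation: the function name `root_to_tip_regression` and the sample data are hypothetical. The slope of the fitted line estimates the evolutionary rate, the x-intercept estimates the date of the root, and R² gives a rough measure of temporal signal.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Hypothetical helper for illustration; not part of any published package.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three (date, distance) pairs")
    x_bar, y_bar = mean(dates), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, distances))
    syy = sum((y - y_bar) ** 2 for y in distances)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx                      # substitutions per site per year
    intercept = y_bar - slope * x_bar
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    # The x-intercept is the date at which the fitted distance is zero.
    root_date = -intercept / slope if slope != 0 else None
    return {
        "rate": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "root_date": root_date,
    }


# Made-up tips: sampling year and distance from the root (subs/site).
dates = [2000.0, 2005.0, 2010.0, 2015.0]
distances = [0.010, 0.015, 0.020, 0.025]
fit = root_to_tip_regression(dates, distances)
print(fit)
```

Tips whose residuals from the fitted line are unusually large are natural candidates for the quality-control checks the abstract mentions, such as mislabeled sampling dates.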

2.
Gut Microbes ; 15(1): 2223349, 2023.
Article in English | MEDLINE | ID: mdl-37306408

ABSTRACT

The gut metabolome acts as an intermediary between the gut microbiota and host, and has tremendous diagnostic and therapeutic potential. Several studies have utilized bioinformatic tools to predict metabolites based on the different aspects of the gut microbiome. Although these tools have contributed to a better understanding of the relationship between the gut microbiota and various diseases, most of them have focused on the impact of microbial genes on the metabolites and the relationship between microbial genes. In contrast, relatively little is known regarding the effect of metabolites on the microbial genes or the relationship between these metabolites. In this study, we constructed a computational framework of Microbe-Metabolite INteractions-based metabolic profiles Predictor (MMINP), based on the Two-Way Orthogonal Partial Least Squares (O2-PLS) algorithm to predict the metabolic profiles associated with gut microbiota. We demonstrated the predictive value of MMINP relative to that of similar methods. Additionally, we identified the features that would profoundly impact the prediction performance of data-driven methods (O2-PLS, MMINP, MelonnPan, and ENVIM), including the training sample size, host disease state, and the upstream data processing methods of the different technical platforms. We suggest that when using data-driven methods, similar host disease states and preprocessing methods, and a sufficient number of training samples are necessary to achieve accurate prediction.


MMINP fully considers internal and mutual correlations in metabolites and microbial genes and infers metabolite information through their real joint parts. The feasibility of predicting metabolic profiles using gut microbiome data should be based on the premise of similar host disease states, similar preprocessing methods, and a sufficient number of training samples. Although the accuracy of predicted specific metabolites is affected by multiple factors, the systematic conclusions presented for predicted metabolites at higher levels (e.g., class level) are accurate, allowing metabolite prediction to be applied to the discovery of potential metabolite markers.


Subject(s)
Gastrointestinal Microbiome , Least-Squares Analysis , Algorithms , Computational Biology , Metabolome
3.
Innovation (Camb) ; 4(2): 100388, 2023 Mar 13.
Article in English | MEDLINE | ID: mdl-36895758

ABSTRACT

The data output from microbiome research is growing at an accelerating rate, yet mining the data quickly and efficiently remains difficult. There is still a lack of an effective data structure to represent and manage data, as well as flexible and composable analysis methods. In response to these two issues, we designed and developed the MicrobiotaProcess package. It provides a comprehensive data structure, MPSE, to better integrate the primary and intermediate data, which improves the integration and exploration of the downstream data. Around this data structure, the downstream analysis tasks are decomposed and a set of functions are designed under a tidy framework. These functions independently perform simple tasks and can be combined to perform complex tasks. This gives users the ability to explore data, conduct personalized analyses, and develop analysis workflows. Moreover, MicrobiotaProcess can interoperate with other packages in the R community, which further expands its analytical capabilities. This article demonstrates the MicrobiotaProcess for analyzing microbiome data as well as other ecological data through several examples. It connects upstream data, provides flexible downstream analysis components, and provides visualization methods to assist in presenting and interpreting results.

4.
Curr Protoc ; 2(10): e585, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36286622

ABSTRACT

In many aspects of life, epigenetics, or the altering of phenotype without changes in sequences, plays an essential role in biological function. A vast number of epigenomic datasets are emerging as a result of the advent of next-generation sequencing. Annotation, comparison, visualization, and interpretation of epigenomic datasets remain key aspects of computational biology. ChIPseeker is a Bioconductor package for performing these analyses among variable epigenomic datasets. The fundamental functions of ChIPseeker, including data preparation, annotation, comparison, and visualization, are explained in this article. ChIPseeker is a freely available open-source package that may be found at https://www.bioconductor.org/packages/ChIPseeker. © 2022 Wiley Periodicals LLC.

Basic Protocol 1: ChIPseeker and epigenomic dataset preparation
Basic Protocol 2: Annotation of epigenomic datasets
Basic Protocol 3: Comparison of epigenomic datasets
Basic Protocol 4: Visualization of annotated results
Basic Protocol 5: Functional analysis of epigenomic datasets
Basic Protocol 6: Genome-wide and locus-specific distribution of epigenomic datasets
Basic Protocol 7: Heatmaps and metaplots of epigenomic datasets


Subject(s)
Epigenomics , Software , Epigenomics/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Genome
5.
Front Microbiol ; 13: 951774, 2022.
Article in English | MEDLINE | ID: mdl-36051757

ABSTRACT

The toxin-antitoxin (TA) system is a widely distributed group of genetic modules that play important roles in the life of prokaryotes, with mobile genetic elements (MGEs) contributing to the dissemination of antibiotic resistance genes (ARGs). The diversity and richness of TA systems in Pseudomonas aeruginosa, one of the bacterial species carrying ARGs, have not yet been completely demonstrated. In this study, we explored the TA systems from the public genomic sequencing data and genome sequences. Genomic sequencing data from a small set of 281 isolates were selected from the NCBI SRA database, and reassembling the genomes of these isolates revealed abundant TA homologs. Furthermore, remapping these identified TA modules onto 5,437 genomes or draft genomes uncovered a great diversity of TA modules in P. aeruginosa. Moreover, manual inspection revealed several TA systems not previously reported in P. aeruginosa, including hok-sok, cptA-cptB, cbeA-cbtA, tomB-hha, and ryeA-sdsR. Additional annotation revealed that a large number of MGEs were located close to TA modules. Also, 16% of ARGs are located relatively close to TA modules. Our work confirmed a wealth of TA genes in the unexplored P. aeruginosa pan-genomes, expanded the knowledge on P. aeruginosa, and provided methodological tips on large-scale data mining for future studies. The co-occurrence of MGEs, ARGs, and TA modules may indicate a potential interaction in their dissemination.

6.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35671504

ABSTRACT

The identification of the conserved and variable regions in the multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignment is no longer sufficient. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignment and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also design a new visualization method for genome alignments in multiple alignment format to explore the pattern of within- and between-species variation. Combining these visual representations with prior knowledge, ggmsa assists researchers in exploring MSAs and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).


Subject(s)
Genome , Software , Amino Acid Sequence , Position-Specific Scoring Matrices , Sequence Alignment
7.
Front Genet ; 12: 774846, 2021.
Article in English | MEDLINE | ID: mdl-34795698

ABSTRACT

With the rapid increase of large-scale datasets, biomedical data visualization is facing challenges. The data may be large, span different orders of magnitude, contain extreme values, and have an unclear distribution. Here we present an R package, ggbreak, that allows users to create broken axes using ggplot2 syntax. It makes effective use of the plotting area to deal with large datasets (especially long sequential data), data with different magnitudes, and data containing outliers. The ggbreak package increases the available visual space for a better presentation of the data and detailed annotation, thus improving our ability to interpret the data. The ggbreak package is fully compatible with ggplot2, and it is easy to superpose additional layers and apply scales and themes to adjust the plot using the ggplot2 syntax. The ggbreak package is open-source software released under the Artistic-2.0 license, and it is freely available on CRAN (https://CRAN.R-project.org/package=ggbreak) and GitHub (https://github.com/YuLab-SMU/ggbreak).

8.
Innovation (Camb) ; 2(3): 100141, 2021 Aug 28.
Article in English | MEDLINE | ID: mdl-34557778

ABSTRACT

Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.

9.
Mol Biol Evol ; 38(9): 4039-4042, 2021 08 23.
Article in English | MEDLINE | ID: mdl-34097064

ABSTRACT

We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. The ggtreeExtra package is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available to visualize and interpret in the evolutionary context.


Subject(s)
Phylogeny , Software
10.
J Cardiovasc Transl Res ; 14(5): 912-920, 2021 10.
Article in English | MEDLINE | ID: mdl-33409962

ABSTRACT

Left atrial sphericity index (LASI) is one significant geometric remodeling parameter to evaluate the prognosis of atrial fibrillation (AF). We aimed to determine whether transthoracic echocardiography (TTE)-derived LASI may help predict the outcomes following AF radiofrequency catheter ablation (RFCA). This prospective study enrolled 190 consecutive AF patients who underwent TTE 24 h before RFCA. LASI was calculated as the ratio of left atrial maximum volume to spherical volume. After 1-year follow-up, 56 patients (29.5%) relapsed. Multivariate Cox regression showed that LASI (hazard ratio = 1.48, 95% CI 1.15-1.92, P = 0.003) was an independent predictor of AF recurrence. Stratifying patients into four subgroups by left atrial volume index (LAVI) showed that a high LASI value indicated a high risk of recurrence, especially in patients with mildly and moderately enlarged atria (the recurrence rate was 0% vs. 26.3%, P = 0.049; 9.5% vs. 40.9%, P = 0.018, respectively). In conclusion, TTE-derived LASI may be useful to predict AF recurrence after RFCA.


Subject(s)
Atrial Fibrillation , Catheter Ablation , Atrial Fibrillation/diagnostic imaging , Atrial Fibrillation/surgery , Catheter Ablation/adverse effects , Heart Atria/diagnostic imaging , Heart Atria/surgery , Humans , Prospective Studies , Recurrence , Treatment Outcome
11.
Sci Total Environ ; 741: 140423, 2020 Nov 01.
Article in English | MEDLINE | ID: mdl-32615432

ABSTRACT

With the growing body of research on the role of gut microbiota in human health and disease, an appropriate method for storing fecal samples at ambient temperature would make it convenient to obtain precise and reliable microbiota results. Nevertheless, the limited choice of stabilizers that are cost-efficient and suitable for longer preservation periods has hindered large-scale metagenomics studies. Here, we evaluated the efficacy of a guanidine isothiocyanate-based reagent method, EffcGut, and compared it with other storage methods already in use by means of 16S rRNA gene sequencing. We found that the guanidine isothiocyanate-based reagent method at ambient temperature was not inferior to OMNIgene·GUT OM-200 and that it could retain a bacterial community similar to that of -80 °C storage within 24 weeks. Furthermore, bacterial diversity and community structure were compared among different sample fractions (supernatant, suspension and precipitate) preserved in EffcGut and at -80 °C. We found that the supernatant preserved in EffcGut retained a community structure and composition similar to those obtained with the low-temperature preservation method.


Subject(s)
Microbiota , Specimen Handling , Cost-Benefit Analysis , Feces , Humans , RNA, Ribosomal, 16S , Temperature
12.
Aging (Albany NY) ; 12(9): 8583-8604, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32392181

ABSTRACT

Reduced bone mineral density (BMD) is associated with an altered microbiota in senile osteoporosis. However, the relationship among gut microbiota, BMD and bone metabolic indexes remains unknown in postmenopausal osteoporosis. In this study, fecal microbiota profiles for 106 postmenopausal individuals with osteopenia (n=33) or osteoporosis (n=42) or with normal BMD (n=31) were determined. An integrated 16S rRNA gene sequencing and LC-MS-based metabolomics approach was applied to explore the association of estrogen-reduced osteoporosis with the gut microbiota and fecal metabolic phenotype. Adjustments were made using several statistical models for potential confounding variables identified from the literature. The results demonstrated decreased bacterial richness and diversity in postmenopausal osteoporosis. Additionally, significant differences in abundance levels among phyla and genera in the gut microbial community were found. Moreover, postmenopausal osteopenia-enriched N-acetylmannosamine correlated negatively with BMD, and distinguishing metabolites were closely associated with gut bacterial variation. Both serum procollagen type I N propeptide (P1NP) and C-terminal telopeptide of type I collagen (CTX-1) correlated positively with osteopenia-enriched Allisonella, Klebsiella and Megasphaera. However, we did not find a significant correlation between bacterial diversity and estrogen. These observations will lead to a better understanding of the relationship between bone homeostasis and the microbiota in postmenopausal osteoporosis.


Subject(s)
Bone Density , Bone and Bones/physiology , Gastrointestinal Microbiome , Osteoporosis, Postmenopausal/metabolism , Absorptiometry, Photon , Bone Remodeling , Collagen Type I/metabolism , Feces/microbiology , Female , Humans , Mass Spectrometry , Metabolomics , Middle Aged , Osteoporosis, Postmenopausal/diagnosis , Osteoporosis, Postmenopausal/microbiology , RNA, Ribosomal, 16S/genetics
13.
Front Microbiol ; 11: 383, 2020.
Article in English | MEDLINE | ID: mdl-32265857

ABSTRACT

Dysbiosis of gut microbiota during the progression of HBV-related liver disease is not well understood, as there are very few reports that discuss the featured bacterial taxa in different stages. The aim of this study was to reveal the featured bacterial species whose abundances are directly associated with HBV disease progression, that is, progression from healthy subjects to chronic HBV infection, chronic hepatitis B (CHB), and liver cirrhosis (LC). Approximately 400 fecal samples were collected, and 97 samples were subjected to 16S rRNA gene sequencing after age and BMI matching. Compared with the healthy individuals, significant gut microbiota alterations were associated with the progression of liver disease. LEfSe results showed that the HBV-infected patients had higher Fusobacteria, Veillonella, and Haemophilus abundance, while the healthy individuals had higher levels of Prevotella and Phascolarctobacterium. Indicator analysis revealed that 57 OTUs changed as the disease progressed, and their combination produced an AUC value of 90% (95% CI: 86-94%) between the LC and non-LC groups. In addition, the abundances of OTU51 (Dialister succinatiphilus) and OTU50 (Alistipes onderdonkii) decreased as the disease progressed, and these results were further verified by qPCR. The LC patients had higher bacterial network complexity, which was accompanied by a lower abundance of potentially beneficial bacterial taxa, such as Dialister and Alistipes, and a higher abundance of pathogenic species within Actinobacteria. The compositional and network changes in the gut microbiota across CHB stages suggest potential contributions of the gut microbiota to CHB disease progression.

14.
Mol Biol Evol ; 37(2): 599-603, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31633786

ABSTRACT

Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.


Subject(s)
Computational Biology/methods , Data Mining/methods , Internet , Phylogeny , Software
15.
Front Genet ; 10: 447, 2019.
Article in English | MEDLINE | ID: mdl-31191599

ABSTRACT

Colorectal cancer (CRC) ranks second in cancer-associated mortality and third in incidence worldwide. Most CRCs follow the adenoma-carcinoma sequence and have a more than 90% chance of survival if diagnosed at an early stage. But the recommended screening by colonoscopy is invasive, expensive, and poorly adhered to. Recently, several studies reported that fecal bacteria might provide non-invasive biomarkers for CRC and precancerous tumors. Therefore, we collected and uniformly re-analyzed these published fecal 16S rDNA sequencing datasets to verify the association and identify biomarkers to classify and predict colorectal tumors by the random forest method. A total of 1674 samples (330 CRC, 357 advanced adenoma, 141 adenoma, and 846 control) from 7 studies were analyzed in this study. Using a random effects model and a fixed effects model, we observed significant differences in alpha-diversity and beta-diversity between individuals with CRC and those with a normal colon, but not between adenoma and normal. We identified various bacterial genera with significant odds ratios for colorectal tumors at different stages. By building random forest models with 10-fold cross-validation as well as new test datasets, we classified individuals with CRC, advanced adenoma, adenoma and normal colon. All approaches obtained comparable performance at the entire OTU level, the entire genus level, and the common genus level as measured using AUC. When all samples were combined, the AUC of the random forest model based on 12 common genera reached 0.846 for CRC, although the prediction performed poorly for advanced adenoma and adenoma.
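Since classifier performance here is reported as AUC, a short sketch of how that number can be computed from predicted scores may help. The function below uses the rank interpretation of AUC (the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half). It is an illustration only, not the authors' pipeline, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = case, 0 = control).

    Computed as the fraction of case/control pairs in which the case
    has the higher score; tied pairs contribute 0.5.
    """
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical classifier output for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; for datasets of the size described above a rank-based implementation from an established library would be the usual choice.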

16.
Mar Genomics ; 32: 71-78, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28089131

ABSTRACT

Circular RNA (circRNA) was first reported over thirty years ago. With the development of high-throughput sequencing technologies, circRNA has been identified in an increasing number of species. However, few studies on circRNA have been reported in teleost fish. Accumulating transcriptome and phenotype data enable us to probe the biological functions of circRNA in fish species. Here, we report the identification of circRNAs from RNA sequencing (RNA-seq) data in large yellow croaker (Larimichthys crocea), a commercially important marine fish in China and East Asia. Using computational identification, 975 circular RNAs were detected, of which three were validated by experiments. GO and KEGG analyses revealed that the biological functions of genes hosting the circRNAs were enriched in the progression of translation initiation, macromolecule metabolism and binding. Notably, we found that many circRNAs in large yellow croaker had abundant microRNA-binding sites. A total of 363 of the identified circRNAs had more than five miRNA-binding sites, among which twenty-two had more than ten binding sites for the miRNA-430 and the let-7 family. Our study confirmed the presence of circRNAs in large yellow croaker for the first time, providing a valuable reference for circRNA identification in fish species. Meanwhile, this work confirmed that the RNA-seq data from the traditional linear transcriptome library could be used for preliminary circRNA identification, which may offer an important reference for preliminary circRNA investigations in other species.


Subject(s)
DNA, Circular/genetics , Perciformes/genetics , Transcriptome , Animals , DNA, Circular/metabolism , Perciformes/metabolism , Polymerase Chain Reaction/veterinary , Sequence Analysis, DNA
17.
BMC Genomics ; 16: 670, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-26336087

ABSTRACT

BACKGROUND: Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East Asia. The annual production of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selection of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in the BioNano Genomics Irys system. RESULTS: For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing the IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with the large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. CONCLUSION: Using the state-of-the-art whole-genome mapping technique in the Irys system, the first whole-genome map for large yellow croaker has been constructed, which greatly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by whole-genome mapping for aquatic organisms. Our study demonstrates a promising application of whole-genome mapping to genome map construction for other non-model organisms in a fast and reliable manner.
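For readers unfamiliar with the N50 statistic quoted above, the following sketch shows the standard calculation: sort the map lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions; the study's real map lengths are not given in the abstract.

```python
def n50(lengths):
    """Return (N50 length, number of maps needed to reach half the total).

    Maps are taken from longest to shortest until their cumulative
    length is at least 50% of the summed length of all maps.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example: five maps with a total length of 15 units.
toy_lengths = [5, 4, 3, 2, 1]
print(n50(toy_lengths))
```

Read this way, the reported figures say that the 126 longest of the 686 maps together cover at least half of the 727 Mb assembly, and the shortest of those 126 is about 1.7 Mb long.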


Subject(s)
Chromosome Mapping/methods , Genomics/methods , Nanotechnology/methods , Perciformes/genetics , Animals , Base Sequence , DNA/metabolism , Deoxyribonuclease I/metabolism , Reproducibility of Results