Results 1 - 5 of 5
1.
NAR Genom Bioinform ; 6(1): lqad110, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38187087

ABSTRACT

Sparse feature tables, in which many features are present in very few samples, are common in big biological data (e.g., metagenomics). Ignoring issues of zero-laden datasets can result in biased statistical estimates and decreased power in downstream analyses. Zeros are also a particular issue for compositional data analysis using log-ratios since the log of zero is undefined. Researchers typically deal with this issue by removing low-frequency features, but the thresholds for removal differ markedly between studies with little or no justification. Here, we present CurvCut, an unsupervised data-driven approach with human confirmation for rare-feature removal. CurvCut implements two distinct approaches for determining natural breaks in the feature distributions: a method based on curvature analysis borrowed from thermodynamics and the Fisher-Jenks statistical method. Our results show that CurvCut rapidly identifies data-specific breaks in these distributions that can be used as cutoff points for low-frequency feature removal that maximizes feature retention. We show that CurvCut works across different biological data types and rapidly generates clear visual results that allow researchers to confirm and apply feature removal cutoffs to individual datasets.
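To illustrate the idea of a natural break in a feature distribution, here is a minimal sketch of a two-class Fisher-Jenks style split: it searches every split point of the sorted values and keeps the one with the smallest total within-class sum of squared deviations. This is not CurvCut's implementation; the function name `two_class_break` and the prevalence values are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def two_class_break(values):
    """Split sorted values into two classes with minimal within-class variance.

    Exhaustive search over split points; fine for small inputs.
    """
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")

    def ssd(chunk):
        # Sum of squared deviations from the chunk mean.
        m = fmean(chunk)
        return sum((x - m) ** 2 for x in chunk)

    best_i = min(range(1, len(xs)), key=lambda i: ssd(xs[:i]) + ssd(xs[i:]))
    return xs[:best_i], xs[best_i:]


# Hypothetical prevalence: number of samples in which each feature is present.
prevalence = [1, 1, 2, 2, 3, 40, 42, 45, 47]
rare, kept = two_class_break(prevalence)
cutoff = kept[0]  # smallest prevalence retained under this split
```

In this toy case the break falls between the rare features and the common ones, so a cutoff proposed this way would still need the human confirmation step the abstract describes.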

2.
PLoS One ; 19(1): e0291801, 2024.
Article in English | MEDLINE | ID: mdl-38206953

ABSTRACT

Phylogenetic analysis of protein sequences provides a powerful means of identifying novel protein functions and subfamilies, and for identifying and resolving annotation errors. However, automation of functional clustering based on phylogenetic trees has been challenging and most of it is done manually. Clustering phylogenetic trees usually requires the delineation of tree-based thresholds (e.g., distances), leading to an ad hoc problem. We propose a new phylogenetic clustering approach, AutoPhy, that identifies clusters without using ad hoc distances or other pre-defined values. Our workflow combines uniform manifold approximation and projection (UMAP) with Gaussian mixture models as a k-means-like procedure to automatically group sequences into clusters. We then apply a "second pass" clade identification algorithm to resolve non-monophyletic groups. We tested our approach with several well-curated protein families (outer membrane porins, acyltransferase, and nuclear receptors) and showed our automated methods recapitulated known subfamilies. We also applied our methods to a broad range of different protein families from multiple databases, including Pfam, PANTHER, and UniProt, and to alignments of RNA viral genomes. Our results showed that AutoPhy rapidly generated monophyletic clusters (subfamilies) within phylogenetic trees evolving at very different rates both within and among phylogenies. The phylogenetic clusters generated by AutoPhy resolved misannotations and identified new protein functional groups and novel viral strains.
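The notion of a non-monophyletic group can be made concrete with a small check: a cluster is monophyletic in a rooted tree if some subtree contains exactly its members and nothing else. The sketch below assumes a rooted tree written as nested tuples with string leaf names; it is not AutoPhy's clade identification algorithm, and `toy_tree` and both function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree has exactly the leaves in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: ((A, B), C) is sister to (D, E).
toy_tree = ((("A", "B"), "C"), ("D", "E"))

ab_is_clade = is_monophyletic(toy_tree, {"A", "B"})   # a proper clade
ac_is_clade = is_monophyletic(toy_tree, {"A", "C"})   # excludes B, so not a clade
```

A cluster that fails this check would be a candidate for the kind of second-pass resolution the abstract mentions.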


Subject(s)
Algorithms, Proteins, Phylogeny, Proteins/genetics, Porins/genetics, Amino Acid Sequence
3.
Microorganisms ; 9(9), 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34576881

ABSTRACT

Anaerobic fungi are emerging biotechnology platforms with genomes rich in biosynthetic potential. Yet, the heterologous expression of their biosynthetic pathways has had limited success in model hosts like E. coli. We find one reason for this is that the genome composition of anaerobic fungi like P. indianae is extremely AT-biased, with a particular preference for rare and semi-rare AT-rich tRNAs in E. coli, which are not explicitly predicted by standard codon adaptation indices (CAI). Native P. indianae genes with these extreme biases create drastic growth defects in E. coli (up to 69% reduction in growth), which are not seen in genes from other organisms with similar CAIs. However, codon optimization rescues growth, allowing for gene evaluation. In this manner, we demonstrate that anaerobic fungal homologs such as PI.atoB are more active than S. cerevisiae homologs in a hybrid pathway, increasing the production of mevalonate up to 2.5 g/L (more than two-fold) and reducing waste carbon to acetate by ~90% under the conditions tested. This work demonstrates the bioproduction potential of anaerobic fungal enzyme homologs and how the analysis of codon utilization enables the study of otherwise difficult-to-express genes that have applications in biocatalysis and natural product discovery.
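For readers unfamiliar with the codon adaptation index, the following sketch computes it in its usual form: each codon gets a weight equal to its count in a reference gene set divided by the count of the most-used synonymous codon, and the CAI of a gene is the geometric mean of those weights. The reference sequence, the choice to skip codons absent from the reference, and the function names are assumptions made for illustration; this is not the analysis pipeline used in the study.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons, ignoring a trailing remainder."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight of each codon relative to the most-used synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    best = Counter()
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def cai(seq, weights):
    """Geometric mean of codon weights; Met, Trp, and stop codons are excluded.

    Assumption: codons never seen in the reference are skipped, not penalized.
    """
    scored = [weights[c] for c in codons(seq)
              if c in weights and CODON_TABLE[c] not in "MW*"]
    if not scored:
        raise ValueError("no scorable codons")
    return exp(sum(log(w) for w in scored) / len(scored))


# Hypothetical reference: alanine encoded as GCT three times and GCA once.
weights = relative_adaptiveness(["GCTGCTGCTGCA"])
preferred = cai("GCTGCT", weights)  # uses only the most frequent codon
mixed = cai("GCTGCA", weights)      # one preferred codon, one less-used codon
```

Because the index averages over all codons, two genes can share a similar CAI while differing in how heavily they draw on a few rare tRNAs, which is consistent with the abstract's point that CAI alone did not predict the growth defects.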

4.
Front Microbiol ; 12: 617949, 2021.
Article in English | MEDLINE | ID: mdl-34079525

ABSTRACT

Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Culture-independent methods, such as next-generation sequencing (NGS) of bacterial 16S amplicon and shotgun metagenomic libraries, have greatly expanded our understanding of PD biodiversity, identified novel PD microbial associations, and shown that PD biodiversity increases with pocket depth. NGS studies have also found PD communities to be highly host-specific in terms of both biodiversity and the response of microbial communities to periodontal treatment. As with most microbiome work, the majority of PD microbiome studies use standard data normalization procedures that do not account for the compositional nature of NGS microbiome data. Here, we apply recently developed compositional data analysis (CoDA) approaches and software tools to reanalyze multiomics (16S, metagenomics, and metabolomics) data generated from previously published periodontal disease studies. CoDA methods, such as centered log-ratio (clr) transformation, compensate for the compositional nature of these data, which can not only remove spurious correlations but also allow for the identification of novel associations between microbial features and disease conditions. We validated many of the studies' original findings, but also identified new features associated with periodontal disease, including the genera Schwartzia and Aerococcus and the cytokine C-reactive protein (CRP). Furthermore, our network analysis revealed a lower connectivity among taxa in deeper periodontal pockets, potentially indicative of a more "random" microbiome. Our findings illustrate the utility of CoDA techniques in multiomics compositional data analysis of the oral microbiome.
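The centered log-ratio transformation mentioned above can be sketched in a few lines: each value is replaced by its logarithm minus the mean logarithm of the sample, so the result depends only on ratios between features and not on sequencing depth. The pseudocount used here to avoid taking the log of zero is an assumption for illustration, not necessarily how the study handled zeros, and the counts are hypothetical.

```python
from math import log


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    Assumption: a constant pseudocount is added so zero counts have a defined log.
    """
    shifted = [c + pseudocount for c in counts]
    logs = [log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical read counts for four taxa in one sample.
sample = [120, 30, 0, 6]
transformed = clr(sample)

# The same composition sequenced ten times deeper gives the same clr values
# when no pseudocount is needed.
shallow = clr([1, 2, 4], pseudocount=0)
deep = clr([10, 20, 40], pseudocount=0)
```

The transformed values of a sample always sum to zero, which is a quick sanity check when applying the transform to a real feature table.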

5.
Biotechnol Biofuels ; 11: 293, 2018.
Article in English | MEDLINE | ID: mdl-30386430

ABSTRACT

BACKGROUND: Plant biomass is an abundant but underused feedstock for bioenergy production due to its complex and variable composition, which resists breakdown into fermentable sugars. These feedstocks, however, are routinely degraded by many uncommercialized microbes such as anaerobic gut fungi. These gut fungi express a broad range of carbohydrate active enzymes and are native to the digestive tracts of ruminants and hindgut fermenters. In this study, we examine gut fungal performance on these substrates as a function of composition, and the ability of this isolate to degrade inhibitory high syringyl lignin-containing forestry residues.

RESULTS: We isolated a novel fungal specimen from a donkey in Independence, Indiana, United States. Phylogenetic analysis of the Internal Transcribed Spacer 1 sequence classified the isolate as a member of the genus Piromyces within the phylum Neocallimastigomycota (Piromyces sp. UH3-1, strain UH3-1). The isolate penetrates the substrate with an extensive rhizomycelial network and secretes many cellulose-binding enzymes, which are active on various components of lignocellulose. These activities enable the fungus to hydrolyze at least 58% of the glucan and 28% of the available xylan in untreated corn stover within 168 h and support growth on crude agricultural residues, food waste, and energy crops. Importantly, UH3-1 hydrolyzes high syringyl lignin-containing poplar that is inhibitory to many fungi with efficiencies equal to that of low syringyl lignin-containing poplar with no reduction in fungal growth. This behavior is correlated with slight remodeling of the fungal secretome whose composition adapts with substrate to express an enzyme cocktail optimized to degrade the available biomass.

CONCLUSIONS: Piromyces sp. UH3-1, a newly isolated anaerobic gut fungus, grows on diverse untreated substrates through production of a broad range of carbohydrate active enzymes that are robust to variations in substrate composition. Additionally, UH3-1 and potentially other anaerobic fungi are resistant to inhibitory lignin composition possibly due to changes in enzyme secretion with substrate. Thus, anaerobic fungi are an attractive platform for the production of enzymes that efficiently use mixed feedstocks of variable composition for second generation biofuels. More importantly, our work suggests that the study of anaerobic fungi may reveal naturally evolved strategies to circumvent common hydrolytic inhibitors that hinder biomass usage.
