1.
mSystems ; : e0049724, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940518

ABSTRACT

Relationships between bacterial taxa are traditionally defined using 16S rRNA nucleotide similarity or average nucleotide identity. Improvements in sequencing technology provide additional pairwise information on genome sequences, which may provide valuable information on genomic relationships. Mapping orthologous gene locations between genome pairs, known as synteny, is typically implemented in the discovery of new species and has not been systematically applied to bacterial genomes. Using a data set of 378 bacterial genomes, we developed and tested a new measure of synteny similarity between a pair of genomes, which was scaled onto 16S rRNA distance using covariance matrices. Based on the input gene functions used (i.e., core, antibiotic resistance, and virulence), we observed varying topological arrangements of bacterial relationship networks by applying (i) complete linkage hierarchical clustering and (ii) K-nearest neighbor graph structures to synteny-scaled 16S data. Our metric improved clustering quality compared with state-of-the-art average nucleotide identity metrics while preserving clustering assignments for the highest similarity relationships. Our findings indicate that syntenic relationships provide more granular and interpretable relationships for within-genera taxa compared to pairwise similarity measures, particularly in functional contexts. IMPORTANCE: Given the prevalence and necessity of the 16S rRNA measure in bacterial identification and analysis, this additional analysis adds a functional and synteny-based layer to the identification of relatives and clustering of bacterial genomes. It is also of computational interest to model the bacterial genome as a graph structure, which presents new avenues of genomic analysis for bacteria and their closely related strains and species.
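
As a rough illustration of the two structures named in this abstract, the sketch below applies complete linkage hierarchical clustering and a K-nearest neighbor graph to a small pairwise distance matrix. It is a minimal sketch only: the function names and the four-genome distance matrix are hypothetical, and it does not reproduce the authors' synteny measure or the covariance-based scaling onto 16S rRNA distance.

```python
from itertools import combinations


def complete_linkage(dist, n_clusters):
    """Agglomerative clustering where the distance between two clusters
    is the largest pairwise distance between their members."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose farthest members are closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


def knn_graph(dist, k):
    """Map each node to its k nearest other nodes (ties broken by index)."""
    n = len(dist)
    return {
        i: sorted((j for j in range(n) if j != i),
                  key=lambda j: (dist[i][j], j))[:k]
        for i in range(n)
    }


# Hypothetical symmetric distances between four genomes (not from the paper).
toy = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
clusters = complete_linkage(toy, 2)
graph = knn_graph(toy, 1)
```

On this toy input the clustering groups genomes 0 and 1 together and genomes 2 and 3 together, and each genome's single nearest neighbor is its cluster partner.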

2.
bioRxiv ; 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38645008

ABSTRACT

Relationships between bacterial taxa are traditionally defined using 16S rRNA nucleotide similarity or average nucleotide identity. Improvements in sequencing technology provide additional pairwise information on genome sequences, which may provide valuable information on genomic relationships. Mapping orthologous gene locations between genome pairs, known as synteny, is typically implemented in the discovery of new species and has not been systematically applied to bacterial genomes. Using a dataset of 378 bacterial genomes, we developed and tested a new measure of synteny similarity between a pair of genomes, which was scaled onto 16S rRNA distance using covariance matrices. Based on the input gene functions used (i.e., core, antibiotic resistance, and virulence), we observed varying topological arrangements of bacterial relationship networks by applying (1) complete linkage hierarchical clustering and (2) KNN graph structures to syntenic-scaled 16S data. Our metric improved clustering quality compared with state-of-the-art ANI metrics while preserving clustering assignments for the highest similarity relationships. Our findings indicate that syntenic relationships provide more granular and interpretable relationships for within-genera taxa compared to pairwise similarity measures, particularly in functional contexts.

3.
Methods Mol Biol ; 2744: 335-345, 2024.
Article in English | MEDLINE | ID: mdl-38683329

ABSTRACT

Classification is a technique that labels subjects based on the characteristics of the data. It often includes using prior learned information from preexisting data drawn from the same distribution or data type to make informed decisions for each given subject. The method presented here, the Characteristic Attribute Organization System (CAOS), uses a character-based approach to molecular sequence classification. Using a set of aligned sequences (either nucleotide or amino acid) and a maximum parsimony tree, CAOS will generate classification rules for the sequences based on tree structure and provide more interpretable results than other classification or sequence analysis protocols. The code is accessible at https://github.com/JuliaHealth/CAOS.jl/ .


Subject(s)
Phylogeny , Software , Computational Biology/methods , Algorithms , Sequence Alignment/methods
4.
Bioinformatics ; 38(17): 4172-4177, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801940

ABSTRACT

MOTIVATION: Microbiome datasets are often constrained by sequencing limitations. GenBank is the largest collection of publicly available DNA sequences, which is maintained by the National Center for Biotechnology Information (NCBI). The metadata of GenBank records are a largely understudied resource and may be uniquely leveraged to access the sum of prior studies focused on microbiome composition. Here, we developed a computational pipeline to analyze GenBank metadata, containing data on hosts, microorganisms and their place of origin. This work provides the first opportunity to leverage the totality of GenBank to shed light on compositional data practices that shape how microbiome datasets are formed as well as examine host-microbiome relationships. RESULTS: The collected dataset contains multiple kingdoms of microorganisms, consisting of bacteria, viruses, archaea, protozoa, fungi, and invertebrate parasites, and hosts of multiple taxonomical classes, including mammals, birds and fish. A human data subset of this dataset provides insights into gaps in current microbiome data collection, which is biased towards clinically relevant pathogens. Clustering and phylogenetic analysis reveals the potential to use these data to model host taxonomy and evolution, revealing groupings formed by host diet, environment and coevolution. AVAILABILITY AND IMPLEMENTATION: GenBank Host-Microbiome Pipeline is available at https://github.com/bcbi/genbank_holobiome. The GenBank loader is available at https://github.com/bcbi/genbank_loader. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microbiota , Viruses , Animals , Humans , Databases, Nucleic Acid , Software , Microbiota/genetics , Metadata , Mammals
5.
JCO Precis Oncol ; 6: e2100477, 2022 05.
Article in English | MEDLINE | ID: mdl-35584350

ABSTRACT

PURPOSE: Colorectal carcinomas (CRCs) with microsatellite instability (MSI) are enriched for oncogenic kinase fusions (KFs), including NTRK1, RET, and BRAF, but the mechanism underlying this finding is unclear. METHODS: The genomic profiles of 32,218 advanced CRC tumor specimens were analyzed to assess the fusion breakpoints of oncogenic alterations including KFs in microsatellite-stable and microsatellite-unstable CRC. Genomic contexts of such alterations were analyzed to obtain mechanistic insights. RESULTS: Genomic analysis demonstrated that oncogenic fusion breakpoints in MSI tumors do not preferentially involve repetitive or low-complexity sequences. Instead, their junction regions showed pronounced guanine and cytosine bias and elevated mutation frequency at G:C contexts. Elevated mutation frequency at G:C bases in relevant introns predicted prevalence of associated oncogenic fusions in MSI CRCs. CRCs harboring mismatch repair signatures had enrichment of butyrate-producing microbial species, reported to be associated with induction of 8-oxoguanine lesions in the intestine. CONCLUSION: Detailed analysis of breakpoints in MSI-associated KFs supports a model in which inefficient repair and/or processing of microbiome-induced clustered 8-oxoguanine damage in MSI CRC contributes to the increased incidence of specific oncogenic fusions.


Subject(s)
Colorectal Neoplasms , Carcinogenesis/genetics , Colorectal Neoplasms/genetics , Gene Fusion , Guanine , Humans , Microsatellite Instability , Microsatellite Repeats , Mutation
6.
Int J Cancer ; 148(7): 1778-1788, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33336398

ABSTRACT

Based on the approvals of crizotinib and entrectinib by the Food and Drug Administration for the treatment of ROS1-positive non-small cell lung cancer (NSCLC), we sought to examine the mutational profile of a variety of solid tumors (excluding sarcomas) with ROS1 fusions that underwent comprehensive genomic profiling. A review of our database was performed to extract all nonsarcoma patients with ROS1 fusions that were discovered by the hybrid capture-based DNA-only sequencing assays. We examined the coalterations representing potentially targetable biomarkers, resistance alterations and other alterations in these cases. In addition, we examined the histologic characteristics and protein expression with immunohistochemistry (IHC). From a series of clinically advanced nonsarcoma solid tumors, 356 unique cases with ROS1 fusions included 275 (77.2%) NSCLC and 81 (22.8%) non-NSCLC. Ten novel ROS1 fusions were discovered. Importantly, the NSCLC ROS1 fusion-positive tumors had a higher PD-L1 IHC expression positivity when compared to the NSCLC ROS1 fusion-negative population (P = .012, chi-squared). The frequency of known and likely anti-ROS1 targeted therapy resistance genomic alterations in NSCLC was 7.3% (20/275) and in non-NSCLC was 4.9% (4/81). Overall, the coalteration profile of ROS1 fusion-positive NSCLC and non-NSCLC was similar with only three genes altered significantly more frequently in non-NSCLC vs NSCLC: TERT, PTEN, APC. In our study, we characterized a large cohort of ROS1 fusion-positive NSCLC and non-NSCLC solid tumors and discovered 10 novel ROS1 fusions.
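
The proportions reported in this abstract can be checked directly from the stated counts. The snippet below is a minimal sketch of that arithmetic; the helper name `pct` is an illustrative assumption, and only the counts given in the abstract are used.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


total = 356                      # unique cases with ROS1 fusions
nsclc, non_nsclc = 275, 81       # NSCLC and non-NSCLC cases
assert nsclc + non_nsclc == total

# Share of each tumor group among all ROS1 fusion cases.
shares = (pct(nsclc, total), pct(non_nsclc, total))

# Frequency of resistance alterations within each group (20/275 and 4/81).
resistance = (pct(20, nsclc), pct(4, non_nsclc))
```

The computed values agree with the reported 77.2% and 22.8% group shares and the 7.3% and 4.9% resistance alteration frequencies.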


Subject(s)
Biomarkers, Tumor/genetics , Lung Neoplasms/genetics , Oncogene Fusion/genetics , Protein-Tyrosine Kinases/genetics , Proto-Oncogene Proteins/genetics , Aged , B7-H1 Antigen/metabolism , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Cohort Studies , Databases, Genetic , Female , Genomics , Humans , Immunohistochemistry , Lung Neoplasms/pathology , Male , Middle Aged , Mutation , Protein-Tyrosine Kinases/antagonists & inhibitors , Protein-Tyrosine Kinases/metabolism , Proto-Oncogene Proteins/antagonists & inhibitors , Proto-Oncogene Proteins/metabolism , Retrospective Studies
7.
PLoS Comput Biol ; 14(1): e1005939, 2018 01.
Article in English | MEDLINE | ID: mdl-29338008

ABSTRACT

Microbiota contribute to many dimensions of host phenotype, including disease. To link specific microbes to specific phenotypes, microbiome-wide association studies compare microbial abundances between two groups of samples. Abundance differences, however, reflect not only direct associations with the phenotype, but also indirect effects due to microbial interactions. We found that microbial interactions could easily generate a large number of spurious associations that provide no mechanistic insight. Using techniques from statistical physics, we developed a method to remove indirect associations and applied it to the largest dataset on pediatric inflammatory bowel disease. Our method corrected the inflation of p-values in standard association tests and showed that only a small subset of associations is directly linked to the disease. Direct associations had a much higher accuracy in separating cases from controls and pointed to immunomodulation, butyrate production, and the brain-gut axis as important factors in inflammatory bowel disease.


Subject(s)
Crohn Disease/microbiology , Gastrointestinal Microbiome , Inflammatory Bowel Diseases/microbiology , Algorithms , Child , Computational Biology , Genome-Wide Association Study , Humans , Microbial Interactions , Models, Statistical , Phenotype , Regression Analysis , Reproducibility of Results , Software