1.
Front Microbiol ; 15: 1328083, 2024.
Article in English | MEDLINE | ID: mdl-38440141

ABSTRACT

Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. Few genomes of cultured cyanobacteria from Antarctica are available, so metagenome-assembled genomes (MAGs) from Antarctic cyanobacterial microbial mats provide an opportunity to explore the distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria-enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer-based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79, in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was found only in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns, ranging from locally restricted to global distribution across the cold biosphere and other climatic zones.
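
The idea of searching for a genome in metagenomes by shared k-mers can be illustrated with a toy example. The sketch below is not the pipeline used in the study, which the abstract does not describe in detail; it assumes exact, in-memory k-mer sets, and the function names, sample names, and threshold are hypothetical. Each k-mer is stored in canonical form (the lexicographically smaller of the k-mer and its reverse complement) so that matches do not depend on read orientation.

```python
# Toy k-mer containment search (illustrative only, not the study's method).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of canonical k-mers: min(k-mer, reverse complement of k-mer)."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.add(min(kmer, reverse_complement(kmer)))
    return out


def search(query_genome, metagenomes, k, min_containment):
    """Rank samples by the fraction of query k-mers found in their reads.

    `metagenomes` maps a (hypothetical) sample name to a list of reads.
    Only samples at or above `min_containment` are reported.
    """
    query = canonical_kmers(query_genome, k)
    hits = []
    for name, reads in metagenomes.items():
        sample = set()
        for read in reads:
            sample |= canonical_kmers(read, k)
        containment = len(query & sample) / len(query) if query else 0.0
        if containment >= min_containment:
            hits.append((name, containment))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

At the scale of hundreds of thousands of metagenomes, exact k-mer sets like these would be far too large to hold in memory, which is why searches of this kind rely on compact sketches rather than full sets.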

2.
Genome Res ; 33(7): 1061-1068, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37344105

ABSTRACT

Sketching methods offer computational biologists scalable techniques to analyze data sets that continue to grow in size. MinHash is one such technique to estimate set similarity that has enjoyed recent broad application. However, traditional MinHash has previously been shown to perform poorly when applied to sets of very dissimilar sizes. FracMinHash was recently introduced as a modification of MinHash to compensate for this lack of performance when set sizes differ. This approach has been successfully applied to metagenomic taxonomic profiling in the widely used tool sourmash gather. Although experimental evidence has been encouraging, FracMinHash has not yet been analyzed from a theoretical perspective. In this paper, we perform such an analysis to derive various statistics of FracMinHash, and prove that although FracMinHash is not unbiased (in the sense that its expected value is not equal to the quantity it attempts to estimate), this bias is easily corrected for both the containment and Jaccard index versions. Next, we show how FracMinHash can be used to compute point estimates as well as confidence intervals for evolutionary mutation distance between a pair of sequences by assuming a simple mutation model. We also investigate edge cases in which these analyses may fail, so that users of FracMinHash can be warned about the likelihood of such cases. Our analyses show that FracMinHash estimates the containment of a genome in a large metagenome more accurately and more precisely than traditional MinHash, and that the point estimates and confidence intervals perform significantly better in estimating mutation distances.
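
A minimal sketch of the FracMinHash idea may help make the abstract concrete. It keeps every k-mer whose hash falls below a fixed fraction s = 1/scaled of the hash space, so the sketch size grows with the set size rather than being fixed. The hash function, the function names, and the `scaled` parameter name are assumptions for illustration and do not reproduce sourmash's implementation. The correction factor 1 - (1 - s)^|A| used here is the debiasing term as recalled from the paper and should be checked against it before being relied on. Reverse complements are ignored for brevity.

```python
# Toy FracMinHash containment estimate (illustrative, not sourmash's code).
import hashlib

MAX_HASH = 2 ** 64  # size of the 64-bit hash space used below


def kmers(seq, k):
    """All distinct k-mers of a sequence (no canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_sketch(kmer_set, scaled):
    """Keep hashes below MAX_HASH / scaled, i.e. a fraction s = 1/scaled."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmer_set) if h < threshold}


def containment(sketch_a, sketch_b, n_kmers_a, scaled):
    """Estimate |A ∩ B| / |A| from two sketches, with bias correction.

    `n_kmers_a` is the number of distinct k-mers in A. The correction
    factor is an assumption taken from the paper as recalled here.
    """
    if not sketch_a:
        return 0.0
    raw = len(sketch_a & sketch_b) / len(sketch_a)
    s = 1.0 / scaled
    correction = 1.0 - (1.0 - s) ** n_kmers_a
    return raw / correction
```

With `scaled=1` every hash is kept and the estimate reduces to the exact containment; larger values of `scaled` trade accuracy for smaller sketches. For genome-sized sets the correction factor is essentially 1, so it matters mainly for very small sets.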


Subjects
Biological Evolution , Mutation Rate , Confidence Intervals , Metagenome , Metagenomics/methods
4.
BMC Bioinformatics ; 23(1): 541, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36513983

ABSTRACT

BACKGROUND: Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic classification and profiling. The development of long-read specific tools for taxonomic classification is accelerating, yet there is a lack of information regarding their relative performance. Here, we perform a critical benchmarking study using 11 methods, including five methods designed specifically for long reads. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technologies sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. RESULTS: Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, two long-read methods (BugSeq and MEGAN-LR & DIAMOND) and one generalized method (sourmash) displayed high precision and recall without any filtering required. Furthermore, in the PacBio HiFi datasets these methods detected all species down to the 0.1% abundance level with high precision. Some long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives and approach the precision and recall of the top-performing methods. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets. We also found that long-read datasets with a large proportion of shorter reads (< 2 kb length) resulted in lower precision and worse abundance estimates, relative to length-filtered datasets. Finally, for classification methods, we found that the long-read datasets produced significantly better results than short-read datasets, demonstrating clear advantages for long-read metagenomic sequencing. CONCLUSIONS: Our critical assessment of available methods provides best-practice recommendations for current research using long reads and establishes a baseline for future benchmarking studies.
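
The trade-off between filtering, precision, and recall described above can be shown with a small calculation. This is a sketch under assumed inputs, not the benchmark's evaluation code: the function name, the taxon labels, and the abundance values are hypothetical. Taxa reported below a minimum relative abundance are discarded before comparison with the known mock community.

```python
# Toy detection metrics against a mock community (illustrative only).
def detection_metrics(predicted, truth, min_abundance=0.0):
    """Return (precision, recall) after abundance filtering.

    `predicted` maps taxon name to estimated relative abundance;
    `truth` is the set of taxa actually present in the mock community.
    """
    called = {taxon for taxon, ab in predicted.items() if ab >= min_abundance}
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical profile: three real species plus two low-abundance
# false positives.
profile = {"sp_A": 0.5, "sp_B": 0.3, "sp_C": 0.001,
           "sp_X": 0.0005, "sp_Y": 0.0002}
community = {"sp_A", "sp_B", "sp_C"}
```

In this made-up profile, a moderate threshold removes the false positives without losing any true species, while a heavier threshold also removes the rare true species, so precision stays high but recall drops.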


Subjects
Metagenome , Metagenomics , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods , Benchmarking , Sequence Analysis, DNA/methods
5.
Front Microbiol ; 13: 887310, 2022.
Article in English | MEDLINE | ID: mdl-35663905

ABSTRACT

Genomics has put prokaryotic rank-based taxonomy on a solid phylogenetic foundation. However, most taxonomic ranks were set long before the advent of DNA sequencing and genomics. In this concept paper, we thus ask the following question: should prokaryotic classification schemes besides the current phylum-to-species ranks be explored, developed, and incorporated into scientific discourse? Could such alternative schemes provide better solutions to the basic need of science and society for which taxonomy was developed, namely, precise and meaningful identification? A neutral genome-similarity based framework is then described that could allow alternative classification schemes to be explored, compared, and translated into each other without having to choose only one as the gold standard. Classification schemes could thus continue to evolve and be selected according to their benefits and based on how well they fulfill the need for prokaryotic identification.

6.
Mol Ecol ; 30(23): 6403-6416, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34003535

ABSTRACT

Reproductive isolation is often achieved when genes that are neutral or beneficial in their genomic background become functionally incompatible in a foreign genomic background, causing inviability, sterility or other forms of low fitness in hybrids. Recent studies suggest that mitonuclear interactions are among the initial incompatibilities to evolve at early stages of population divergence across taxa. Yet, the genomic architecture of mitonuclear incompatibilities has rarely been elucidated. We employ an experimental evolution approach starting with low-fitness F2 interpopulation hybrids of the copepod Tigriopus californicus, in which frequencies of compatible and incompatible nuclear alleles change in response to an alternative mitochondrial background. After about nine generations, we observe a generalized increase in population size and in survivorship, suggesting efficiency of selection against maladaptive phenotypes. Whole genome sequencing of evolved populations showed some consistent allele frequency changes across three replicates of each reciprocal cross, but markedly different patterns between mitochondrial backgrounds. In only a few regions (~6.5% of the genome), the same parental allele was overrepresented irrespective of the mitochondrial background. About 33% of the genome showed allele frequency changes consistent with divergent selection, with the location of these genomic regions strongly differing between mitochondrial backgrounds. In 87% and 89% of these genomic regions, the dominant nuclear allele matched the associated mitochondrial background, consistent with mitonuclear co-adaptation. These results suggest that mitonuclear incompatibilities have a complex polygenic architecture that differs between populations, potentially generating genome-wide barriers to gene flow between closely related taxa.


Subjects
Copepoda , Reproductive Isolation , Alleles , Animals , Cell Nucleus/genetics , Copepoda/genetics , Hybridization, Genetic , Mitochondria/genetics
7.
Gigascience ; 10(1), 2021 Jan 13.
Article in English | MEDLINE | ID: mdl-33438730

ABSTRACT

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
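
The conditional execution that workflow systems provide can be sketched in a few lines. The example below is a toy illustration, not the API of any real workflow system: the `Rule` class and the `is_stale` and `run` functions are hypothetical names, rules are assumed to be listed in dependency order, and staleness is judged only by file modification times.

```python
# Toy make-like conditional execution (illustrative only).
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Rule:
    """One analysis step: input files, one output file, and an action."""
    output: Path
    inputs: List[Path]
    action: Callable[[List[Path], Path], None]


def is_stale(rule):
    """A step must run if its output is missing or older than any input."""
    if not rule.output.exists():
        return True
    out_mtime = rule.output.stat().st_mtime
    return any(p.stat().st_mtime > out_mtime for p in rule.inputs)


def run(rules):
    """Run stale rules in the given order; return names of rebuilt outputs."""
    executed = []
    for rule in rules:
        if is_stale(rule):
            rule.action(rule.inputs, rule.output)
            executed.append(rule.output.name)
    return executed
```

Calling `run` a second time without changing any input rebuilds nothing. Real workflow systems add what this sketch omits, such as resolving the dependency graph, managing software environments, and scheduling steps on compute resources.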


Subjects
Computational Biology , Software , Data Analysis , High-Throughput Nucleotide Sequencing , Workflow