Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 10 de 10
Filtrar
Mais filtros










Base de dados
Tipo de estudo
Intervalo de ano de publicação
1.
Microb Genom ; 10(5)2024 May.
Artigo em Inglês | MEDLINE | ID: mdl-38785221

RESUMO

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1 % frequency, results were more reliable above a 5 % threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.


Assuntos
COVID-19 , Genoma Viral , SARS-CoV-2 , Águas Residuárias , Águas Residuárias/virologia , SARS-CoV-2/genética , SARS-CoV-2/classificação , COVID-19/virologia , COVID-19/epidemiologia , Humanos , Biologia Computacional/métodos , Genômica/métodos , Vigilância Epidemiológica Baseada em Águas Residuárias , Filogenia
2.
BMC Bioinformatics ; 25(1): 126, 2024 Mar 23.
Artigo em Inglês | MEDLINE | ID: mdl-38521945

RESUMO

BACKGROUND: Metagenomic profiling algorithms commonly rely on genomic differences between lineages, strains, or species to infer the relative abundances of sequences present in a sample. This observation plays an important role in the analysis of diverse microbial communities, where targeted sequencing of 16S and 18S rRNA, both well-known hypervariable genomic regions, have led to insights into microbial diversity and the discovery of novel organisms. However, the variable nature of discriminatory regions can also act as a double-edged sword, as the sought-after variability can make it difficult to design primers for their amplification through PCR. Moreover, the most variable regions are not necessarily the most informative regions for the purpose of differentiation; one should focus on regions that maximize the number of lineages that can be distinguished. RESULTS: Here we present AmpliDiff, a computational tool that simultaneously finds highly discriminatory genomic regions in viral genomes of a single species, as well as primers allowing for the amplification of these regions. We show that regions and primers found by AmpliDiff can be used to accurately estimate relative abundances of SARS-CoV-2 lineages, for example in wastewater sequencing data. We obtain errors that are comparable with using whole genome information to estimate relative abundances. Furthermore, our results show that AmpliDiff is robust against incomplete input data and that primers designed by AmpliDiff also bind to genomes sampled months after the primers were selected. CONCLUSIONS: With AmpliDiff we provide an effective, cost-efficient alternative to whole genome sequencing for estimating lineage abundances in viral metagenomes.


Assuntos
Metagenoma , Microbiota , Primers do DNA/genética , Algoritmos , Análise de Sequência de DNA/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , RNA Ribossômico 16S/genética
3.
Genome Biol ; 23(1): 236, 2022 11 08.
Artigo em Inglês | MEDLINE | ID: mdl-36348471

RESUMO

Effectively monitoring the spread of SARS-CoV-2 mutants is essential to efforts to counter the ongoing pandemic. Predicting lineage abundance from wastewater, however, is technically challenging. We show that by sequencing SARS-CoV-2 RNA in wastewater and applying algorithms initially used for transcriptome quantification, we can estimate lineage abundance in wastewater samples. We find high variability in signal among individual samples, but the overall trends match those observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in mutant prevalence in situations where clinical sequencing is unavailable.


Assuntos
COVID-19 , SARS-CoV-2 , Humanos , SARS-CoV-2/genética , Águas Residuárias , RNA Viral/genética , Transcriptoma
4.
Nat Comput ; 21(1): 81-108, 2022 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-36969737

RESUMO

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations-thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.

5.
medRxiv ; 2021 Sep 02.
Artigo em Inglês | MEDLINE | ID: mdl-34494031

RESUMO

Effectively monitoring the spread of SARS-CoV-2 variants is essential to efforts to counter the ongoing pandemic. Wastewater monitoring of SARS-CoV-2 RNA has proven an effective and efficient technique to approximate COVID-19 case rates in the population. Predicting variant abundances from wastewater, however, is technically challenging. Here we show that by sequencing SARS-CoV-2 RNA in wastewater and applying computational techniques initially used for RNA-Seq quantification, we can estimate the abundance of variants in wastewater samples. We show by sequencing samples from wastewater and clinical isolates in Connecticut U.S.A. between January and April 2021 that the temporal dynamics of variant strains broadly correspond. We further show that this technique can be used with other wastewater sequencing techniques by expanding to samples taken across the United States in a similar timeframe. We find high variability in signal among individual samples, and limited ability to detect the presence of variants with clinical frequencies <10%; nevertheless, the overall trends match what we observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in variant prevalence in situations where clinical sequencing is unavailable or impractical.

6.
Genome Biol ; 21(1): 31, 2020 02 07.
Artigo em Inglês | MEDLINE | ID: mdl-32033589

RESUMO

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands-or even millions-of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.


Assuntos
Ciência de Dados/métodos , Genômica/métodos , RNA-Seq/métodos , Análise de Célula Única/métodos , Animais , Humanos
7.
Bioinformatics ; 35(24): 5086-5094, 2019 12 15.
Artigo em Inglês | MEDLINE | ID: mdl-31147688

RESUMO

MOTIVATION: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent ('de novo') approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs. RESULTS: We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is, optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers. AVAILABILITY AND IMPLEMENTATION: Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Quase-Espécies , Algoritmos , Genoma , Haplótipos , Sequenciamento de Nucleotídeos em Larga Escala , Análise de Sequência de DNA , Software
8.
Bioinformatics ; 35(21): 4281-4289, 2019 11 01.
Artigo em Inglês | MEDLINE | ID: mdl-30994902

RESUMO

MOTIVATION: Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity. RESULTS: We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. AVAILABILITY AND IMPLEMENTATION: POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Diploide , Poliploidia , Algoritmos , Genoma , Haplótipos , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Análise de Sequência de DNA
9.
Genome Res ; 27(5): 835-848, 2017 05.
Artigo em Inglês | MEDLINE | ID: mdl-28396522

RESUMO

A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index-based data structures or ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges reflect that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE on two deep-coverage samples of patients infected by the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Genoma Viral , Genômica/métodos , Análise de Sequência de DNA/métodos , Software , Mapeamento de Sequências Contíguas/normas , Genômica/normas , Haplótipos , Hepacivirus/genética , Polimorfismo Genético , Padrões de Referência , Análise de Sequência de DNA/normas , Zika virus/genética
10.
Nat Commun ; 7: 12989, 2016 10 06.
Artigo em Inglês | MEDLINE | ID: mdl-27708267

RESUMO

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously under reported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.


Assuntos
Genoma Humano , Variação Estrutural do Genoma , Genômica , Algoritmos , Cromossomos/ultraestrutura , Biologia Computacional , Deleção de Genes , Genótipo , Haplótipos , Humanos , Mutação INDEL , Desequilíbrio de Ligação , Países Baixos , Reação em Cadeia da Polimerase , Polimorfismo de Nucleotídeo Único , RNA/metabolismo , Análise de Sequência de DNA , Análise de Sequência de RNA , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...