Results 1 - 11 of 11
1.
Nat Biotechnol; 40(7): 1075-1081, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35228706

ABSTRACT

Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
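For readers unfamiliar with the data structure this abstract refers to, the following is a minimal sketch of a de Bruijn graph built from reads. It is not LJA's implementation: the function name `build_de_bruijn_graph`, the toy reads, and the choice to ignore reverse complements and sequencing errors are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with edge multiplicities.

    Every k-mer in a read contributes one edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Reverse complements are ignored (assumption).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {node: dict(successors) for node, successors in graph.items()}


# Hypothetical toy reads that overlap by five nucleotides.
toy_reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn_graph(toy_reads, k=4)
```

With these toy reads the two k-mers shared by both reads produce edges of multiplicity 2. A plain dictionary of this kind grows with the number of distinct k-mers, which is one reason construction for large genomes and large k is the challenge the abstract describes.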


Subjects
Algorithms, Human Genome, Human Genome/genetics, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis, Software
2.
Microbiome; 9(1): 149, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34183047

ABSTRACT

BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools (which analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS: Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of a greater diversity of IPGs from (meta)genomes. CONCLUSIONS: We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes "hidden" in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify even more novel IPGs for phenotypic testing than were previously accessible through traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes. Video abstract.
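The idea that a gene split across contigs can be recovered by following assembly-graph edges can be sketched as follows. This is an illustration only, not ORFograph's algorithm: the helper names, the toy contigs, and the simplifications (forward strand only, contigs that abut without overlap, paths of exactly two contigs) are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=9):
    """Return (start, end) of forward-strand ORFs: ATG up to an in-frame stop (end exclusive)."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def orfs_across_edges(contigs, edges, min_len=9):
    """For each graph edge (u, v), join the two contigs and keep ORFs that cross the junction."""
    found = []
    for u, v in edges:
        joined = contigs[u] + contigs[v]
        junction = len(contigs[u])
        for start, end in find_orfs(joined, min_len):
            if start < junction < end:
                found.append((u, v, joined[start:end]))
    return found


# Hypothetical toy graph: c1 holds the start codon, c2 holds the stop codon.
contigs = {"c1": "CCATGAAAC", "c2": "CCGGTTAAGG", "c3": "TTTTTTTTT"}
edges = [("c1", "c2"), ("c1", "c3")]
hits = orfs_across_edges(contigs, edges)
```

In this toy graph neither `c1` nor `c2` contains a complete ORF on its own, so a tool that scans contigs one at a time would report at most a partial gene, while the edge-aware scan reports the full ORF along `c1` to `c2`.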


Subjects
Insecticides, Algorithms, Genomics, Metagenome, Metagenomics
3.
Genome Biol; 20(1): 226, 2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31672156

ABSTRACT

As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing.


Subjects
Genomic Library, High-Throughput Nucleotide Sequencing, Metagenomics/methods, Animals, Benchmarking, Gastrointestinal Microbiome, Humans, Mice
4.
Bioinformatics; 35(14): i61-i70, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31510642

ABSTRACT

MOTIVATION: The recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although some new barcoding protocols are emerging and the range of SLR applications is being expanded, the existing SLR assemblers are optimized for a narrow range of parameters and are not easily extendable to new barcoding technologies and new applications such as metagenomics or hybrid assembly. RESULTS: We describe the algorithmic challenge of the SLR assembly and present a cloudSPAdes algorithm for SLR assembly that is based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies/applications and demonstrated that it improves on the state-of-the-art SLR assemblers in accuracy and speed. AVAILABILITY AND IMPLEMENTATION: Source code and installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Cloud Computing, DNA Sequence Analysis, Software, High-Throughput Nucleotide Sequencing, Metagenomics
5.
Cell Syst; 7(2): 192-200.e3, 2018 Aug 22.
Article in English | MEDLINE | ID: mdl-30056005

ABSTRACT

Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm does not make any assumptions on the distribution of the frequencies of genomes within a metagenome (as in parametric methods) and does not require a large database that covers the total diversity (as in non-parametric methods). We estimate that genomes comprising a human gut metagenome have total length varying from 1.3 to 3.5 billion nucleotides, with genomes responsible for 50% of total abundance having total length varying from only 25 to 61 million nucleotides. In contrast, genomes comprising an aquifer sediment metagenome have more than two orders of magnitude larger total length (≈840 billion nucleotides).


Subjects
Gastrointestinal Microbiome, Bacterial Genome, Metagenomics/methods, Algorithms, Bacteria/genetics, Genetic Variation, Humans, Metagenome, DNA Sequence Analysis
6.
Genome Biol; 17(1): 211, 2016 Oct 11.
Article in English | MEDLINE | ID: mdl-27802837

ABSTRACT

BACKGROUND: There are three main dietary groups in mammals: carnivores, omnivores, and herbivores. Currently, there is limited comparative genomics insight into the evolution of dietary specializations in mammals. Due to recent advances in sequencing technologies, we were able to perform in-depth whole genome analyses of representatives of these three dietary groups. RESULTS: We investigated the evolution of carnivory by comparing 18 representative genomes from across Mammalia with carnivorous, omnivorous, and herbivorous dietary specializations, focusing on Felidae (domestic cat, tiger, lion, cheetah, and leopard), Hominidae, and Bovidae genomes. We generated a new high-quality leopard genome assembly, as well as two wild Amur leopard whole genomes. In addition to a clear contraction in gene families for starch and sucrose metabolism, the carnivore genomes showed evidence of shared evolutionary adaptations in genes associated with diet, muscle strength, agility, and other traits responsible for successful hunting and meat consumption. Additionally, an analysis of highly conserved regions at the family level revealed molecular signatures of dietary adaptation in each of Felidae, Hominidae, and Bovidae. However, unlike carnivores, omnivores and herbivores showed fewer shared adaptive signatures, indicating that carnivores are under strong selective pressure related to diet. Finally, felids showed recent reductions in genetic diversity associated with decreased population sizes, which may be due to the inflexible nature of their strict diet, highlighting their vulnerability and critical conservation status. CONCLUSIONS: Our study provides a large-scale family level comparative genomic analysis to address genomic changes associated with dietary specialization. Our genomic analyses also provide useful resources for diet-related genetic and health research.


Subjects
Genetic Variation, Genome, Panthera/genetics, DNA Sequence Analysis, Physiological Adaptation/genetics, Animals, Biological Evolution, Cats, Herbivory/genetics, Mammals/genetics, Molecular Sequence Annotation, Phylogeny
7.
Nat Methods; 13(3): 248-50, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26828418

ABSTRACT

The recently introduced TruSeq synthetic long read (TSLR) technology generates long and accurate virtual reads from an assembly of barcoded pools of short reads. The TSLR method provides an attractive alternative to existing sequencing platforms that generate long but inaccurate reads. We describe the truSPAdes algorithm (http://bioinf.spbau.ru/spades) for TSLR assembly and show that it results in a dramatic improvement in the quality of metagenomics assemblies.


Subjects
Algorithms, Chromosome Mapping/methods, Taxonomic DNA Barcoding/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Base Sequence, Gene Library, Molecular Sequence Data, Sequence Alignment/methods
8.
J Comput Biol; 22(6): 528-45, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25734602

ABSTRACT

While the number of sequenced diploid genomes has been steadily increasing in the last few years, assembly of highly polymorphic (HP) diploid genomes remains challenging. As a result, there is a shortage of tools for assembling HP genomes from next-generation sequencing (NGS) data. The initial approaches to assembling HP genomes were proposed in the pre-NGS era and are not well suited for NGS projects. To address this limitation, we developed the first de Bruijn graph assembler, dipSPAdes, for HP genomes that significantly improves on the state-of-the-art assemblers for HP diploid genomes.


Subjects
Genome/genetics, DNA Sequence Analysis/methods, Algorithms, Computational Biology/methods, Diploidy, High-Throughput Nucleotide Sequencing/methods, Software
9.
Bioinformatics; 30(12): i293-301, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931996

ABSTRACT

Next-generation sequencing (NGS) technologies have raised a challenging de novo genome assembly problem that is further amplified in recently emerged single-cell sequencing projects. While various NGS assemblers can use information from several libraries of read-pairs, most of them were originally developed for a single library and do not fully benefit from multiple libraries. Moreover, most assemblers assume uniform read coverage, a condition that does not hold for single-cell projects, where utilization of read-pairs is even more challenging. We have developed an exSPAnder algorithm that accurately resolves repeats in the case of both single and multiple libraries of read-pairs in both standard and single-cell assembly projects. AVAILABILITY AND IMPLEMENTATION: http://bioinf.spbau.ru/en/spades


Subjects
Algorithms, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Actinomycetales/genetics, DNA/chemistry, Gene Library, Bacterial Genome, Humans, Repetitive Nucleic Acid Sequences, Staphylococcus aureus/genetics
10.
J Comput Biol; 20(10): 714-37, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24093227

ABSTRACT

Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license.


Subjects
Contig Mapping/methods, Bacterial DNA/genetics, Concatenated DNA/genetics, Algorithms, Base Composition, Computational Biology, Escherichia coli/genetics, Gene Library, Bacterial Genome, High-Throughput Nucleotide Sequencing, Nucleic Acid Amplification Techniques, Pedobacter/genetics, Prochlorococcus/genetics, DNA Sequence Analysis, Single-Cell Analysis
11.
J Comput Biol; 19(5): 455-77, 2012 May.
Article in English | MEDLINE | ID: mdl-22506599

ABSTRACT

The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.


Subjects
Algorithms, Bacteria/genetics, Bacterial Genome, Metagenomics/methods, Single-Cell Analysis/methods, DNA Sequence Analysis/methods