1.
Viruses ; 14(9)2022 08 23.
Article in English | MEDLINE | ID: mdl-36146653

ABSTRACT

Bacteriophages play key roles in the dynamics of the human microbiome. By far the most abundant components of the human gut virome are tailed bacteriophages of the realm Duplodnaviria, in particular, crAss-like phages. However, apart from duplodnaviruses, the gut virome has not been dissected in detail. Here we report a comprehensive census of a minor component of the gut virome, the tailless bacteriophages of the realm Varidnaviria. Tailless phages are primarily represented in the gut by prophages that are mostly integrated in genomes of Alphaproteobacteria and Verrucomicrobia and belong to the order Vinavirales, which currently consists of the families Corticoviridae and Autolykiviridae. Phylogenetic analysis of the major capsid proteins (MCP) suggests that at least three new families should be established within Vinavirales to accommodate the diversity of prophages from the human gut virome. Previously, only the MCP and packaging ATPase genes were reported as conserved core genes of Vinavirales. Here we report an extended core set of 12 proteins, including MCP, packaging ATPase, and previously undetected lysis enzymes, that are shared by most of these viruses. We further demonstrate that replication system components are frequently replaced in the genomes of Vinavirales, suggestive of selective pressure for escape from yet unknown host defenses or avoidance of incompatibility with coinfecting related viruses. The results of this analysis show that, in sharp contrast to marine viromes, varidnaviruses are a minor component of the human gut virome. Moreover, they are primarily represented by prophages, as indicated by the analysis of the flanking genes, suggesting that there are few, if any, lytic varidnavirus infections in the gut at any given time. These findings complement the existing knowledge of the human gut virome by exploring a group of viruses that has been virtually overlooked in previous work.


Subjects
Bacteriophages , Viruses , Adenosine Triphosphatases/genetics , Bacteriophages/genetics , Capsid Proteins/genetics , Humans , Intestines , Phylogeny , Prophages/genetics
2.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subjects
Metagenome , Metagenomics , Archaea/genetics , Metagenomics/methods , Reproducibility of Results , DNA Sequence Analysis , Software
3.
Nat Biotechnol ; 40(7): 1075-1081, 2022 07.
Article in English | MEDLINE | ID: mdl-35228706

ABSTRACT

Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.


Subjects
Algorithms , Human Genome , Human Genome/genetics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis , Software
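
The LJA abstract above centres on de Bruijn graphs built from k-mers. As a rough illustration only, and not LJA's implementation (which the abstract says relies on a Bloom filter, sparse de Bruijn graphs and disjointig generation), the following Python sketch builds a plain de Bruijn graph from error-free reads. The function name `build_de_bruijn` and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a de Bruijn graph as {(k-1)-mer: set of successor (k-1)-mers}.

    Each distinct k-mer in the reads contributes one edge from its
    (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy, error-free reads (illustrative data, not taken from the paper).
reads = ["ACGTAC", "GTACGG"]
graph = build_de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the node `ACG` has two successors, which is the kind of branching that a larger k can sometimes resolve. Storing every k-mer of a large genome this way is memory-hungry, which is presumably part of the construction problem the abstract describes.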
4.
Genome Biol ; 23(1): 57, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35189932

ABSTRACT

Although the use of long-read sequencing improves the contiguity of assembled viral genomes compared to short-read methods, assembling complex viral communities remains an open problem. We describe the viralFlye tool for identification and analysis of metagenome-assembled viruses in long-read assemblies. We show it significantly improves viral assemblies and demonstrate that long reads result in a much larger array of predicted virus-host associations as compared to short-read assemblies. We demonstrate that the identification of novel CRISPR arrays in bacterial genomes from a newly assembled metagenomic sample provides information for predicting novel hosts for novel viruses.


Subjects
Metagenomics , Viruses , Bacterial Genome , Metagenome , Metagenomics/methods , DNA Sequence Analysis/methods , Viruses/genetics
5.
Microbiome ; 9(1): 78, 2021 03 29.
Article in English | MEDLINE | ID: mdl-33781338

ABSTRACT

BACKGROUND: Double-stranded DNA bacteriophages (dsDNA phages) play pivotal roles in structuring human gut microbiomes; yet, the gut virome is far from being fully characterized, and additional groups of phages, including highly abundant ones, continue to be discovered by metagenome mining. A multilevel framework for taxonomic classification of viruses was recently adopted, facilitating the classification of phages into evolutionarily informative taxonomic units based on hallmark genes. Together with advanced approaches for sequence assembly and powerful methods of sequence analysis, this revised framework offers the opportunity to discover and classify unknown phage taxa in the human gut. RESULTS: A search of human gut metagenomes for circular contigs encoding phage hallmark genes resulted in the identification of 3738 apparently complete phage genomes that represent 451 putative genera. Several of these phage genera are only distantly related to previously identified phages and are likely to found new families. Two of the candidate families, "Flandersviridae" and "Quimbyviridae", include some of the most common and abundant members of the human gut virome that infect Bacteroides, Parabacteroides, and Prevotella. The third proposed family, "Gratiaviridae," consists of less abundant phages that are distantly related to the families Autographiviridae, Drexlerviridae, and Chaseviridae. Analysis of CRISPR spacers indicates that phages of all three putative families infect bacteria of the phylum Bacteroidetes. Comparative genomic analysis of the three candidate phage families revealed features without precedent in phage genomes. Some "Quimbyviridae" phages possess Diversity-Generating Retroelements (DGRs) that generate hypervariable target genes nested within defense-related genes, whereas the previously known targets of phage-encoded DGRs are structural genes. Several "Flandersviridae" phages encode enzymes of the isoprenoid pathway, a lipid biosynthesis pathway that so far has not been known to be manipulated by phages. The "Gratiaviridae" phages encode a HipA-family protein kinase and glycosyltransferase, suggesting these phages modify the host cell wall, preventing superinfection by other phages. Hundreds of phages in these three and other families are shown to encode catalases and iron-sequestering enzymes that can be predicted to enhance cellular tolerance to reactive oxygen species. CONCLUSIONS: Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse, and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes.


Subjects
Bacteriophages , Gastrointestinal Microbiome , Microbiota , Bacteria/genetics , Bacteriophages/genetics , Gastrointestinal Microbiome/genetics , Viral Genome/genetics , Humans , Metagenome , Phylogeny
6.
Nat Commun ; 12(1): 1044, 2021 02 16.
Article in English | MEDLINE | ID: mdl-33594055

ABSTRACT

CrAssphage is the most abundant human-associated virus and the founding member of a large group of bacteriophages, discovered in animal-associated and environmental metagenomes, that infect bacteria of the phylum Bacteroidetes. We analyze 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identify nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs. Phylogenetic analysis of conserved genes demonstrates the monophyly of crAss-like phages, a putative virus order, and of 5 branches, potential families within that order, two of which have not been identified previously. The phage genomes in one of these families are almost twofold larger than the crAssphage genome (145-192 kilobases), with high density of self-splicing introns and inteins. Many crAss-like phages encode suppressor tRNAs that enable read-through of UGA or UAG stop codons, mostly in late phage genes. A distinct feature of the crAss-like phages is the recurrent switch of the phage DNA polymerase type between A and B families. Thus, comparative genomic analysis of the expanded assemblage of crAss-like phages reveals aspects of genome architecture and expression as well as phage biology that were not apparent from the previous work on phage genomics.


Subjects
Bacteriophages/genetics , Gastrointestinal Microbiome/genetics , Viral Genome , Metagenome , Codon/genetics , Conserved Sequence , DNA-Directed DNA Polymerase/metabolism , Humans , Inteins , Introns/genetics , Open Reading Frames/genetics , Phylogeny , RNA Splicing/genetics , Genetic Transcription , Virome/genetics
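
The crAss-like phage abstract above mentions suppressor tRNAs that enable read-through of UGA or UAG stop codons. A minimal Python sketch of what read-through means for translation follows, assuming the standard genetic code and DNA-alphabet input (TGA, TAG). The function `translate` and its `readthrough` parameter are hypothetical, and the residue inserted at a suppressed stop is an assumption, since the abstract does not specify it; a placeholder `X` is used here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, readthrough=None):
    """Translate a coding DNA sequence until the first effective stop codon.

    readthrough maps a stop codon (e.g. "TAG") to the residue inserted
    instead of terminating. Which residue a given suppressor tRNA carries
    is an assumption the caller must supply.
    """
    readthrough = readthrough or {}
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        aa = readthrough.get(codon, CODON_TABLE[codon])
        if aa == "*":
            break  # an unsuppressed stop codon ends translation
        protein.append(aa)
    return "".join(protein)


gene = "ATGGCTTAGAAATGA"  # ATG GCT TAG AAA TGA (toy sequence)
print(translate(gene))                            # terminates at TAG
print(translate(gene, readthrough={"TAG": "X"}))  # continues past TAG to TGA
```

Without read-through the toy gene yields a two-residue product; with TAG suppressed, translation continues to the next stop codon. Gene predictors that assume the standard code would likewise truncate such genes, which may be one reason these features went unnoticed.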
7.
BMC Bioinformatics ; 21(Suppl 12): 302, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32703149

ABSTRACT

BACKGROUND: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged possibility of generating long RNA reads, with technologies such as PacBio and Oxford Nanopore, may dramatically improve the assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. RESULTS: In this work we present a novel method that enables high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information derived from long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data. CONCLUSION: To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms compared to the case when only short-read data is used.


Subjects
Algorithms , Transcriptome/genetics , Genetic Databases , Humans , MCF-7 Cells , Nanopores , RNA-Seq , Reproducibility of Results
8.
BMC Bioinformatics ; 21(Suppl 12): 306, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32703258

ABSTRACT

BACKGROUND: Graph-based representation of genome assemblies has been recently used in different contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications heavily utilize the alignment of long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments have been released only recently and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot perform alignment of amino acid sequences, which could prove useful in various contexts, in particular the analysis of metagenomic sequencing data. RESULTS: In this work we present a novel SPAligner (Saint-Petersburg Aligner) tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. CONCLUSIONS: Our work will help accelerate the development of graph-based approaches to solving the sequence-to-genome-assembly alignment problem. SPAligner is implemented as a part of the SPAdes tools library and is available on GitHub.


Subjects
Algorithms , Genetic Variation , Sequence Alignment , Base Sequence , Haplotypes/genetics , Humans , Software , Statistics as Topic , beta-Lactamases/chemistry
9.
Curr Protoc Bioinformatics ; 70(1): e102, 2020 06.
Article in English | MEDLINE | ID: mdl-32559359

ABSTRACT

SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC.

Basic Protocol 1: Assembling isolate bacterial datasets
Basic Protocol 2: Assembling metagenomic datasets
Basic Protocol 3: Assembling sets of putative plasmids
Basic Protocol 4: Assembling transcriptomes
Basic Protocol 5: Assembling putative biosynthetic gene clusters
Support Protocol 1: Installing SPAdes
Support Protocol 2: Providing input via command line
Support Protocol 3: Providing input data via YAML format
Support Protocol 4: Restarting previous run
Support Protocol 5: Determining strand-specificity of RNA-seq data


Subjects
Algorithms , DNA Sequence Analysis/methods , Bacteria/genetics , Biosynthetic Pathways/genetics , Genetic Databases , Metagenome , Multigene Family , Plasmids/genetics , RNA-Seq , Transcriptome/genetics
10.
Bioinformatics ; 36(14): 4126-4129, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32413137

ABSTRACT

MOTIVATION: Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth's virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies. RESULTS: We describe MetaviralSPAdes, a tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked MetaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models and demonstrated that it improves on the state-of-the-art viral identification pipelines. AVAILABILITY AND IMPLEMENTATION: MetaviralSPAdes includes ViralAssembly, ViralVerify and ViralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/. CONTACT: d.antipov@spbu.ru. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Viruses , Algorithms , Metagenome , Metagenomics , DNA Sequence Analysis , Viruses/genetics
11.
Gigascience ; 8(9)2019 09 01.
Article in English | MEDLINE | ID: mdl-31494669

ABSTRACT

BACKGROUND: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. RESULTS: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. CONCLUSIONS: Based on the comparison between different assembly methods, we infer that it is not possible to identify an absolute leader across all quality metrics and all datasets used. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average compared to the closest competitors.


Subjects
Algorithms , RNA-Seq , Transcriptome , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Humans , Mice , Zea mays/genetics
12.
Genome Res ; 29(6): 961-968, 2019 06.
Article in English | MEDLINE | ID: mdl-31048319

ABSTRACT

Although plasmids are important for bacterial survival and adaptation, plasmid detection and assembly from genomic, let alone metagenomic, samples remain challenging. The recently developed plasmidSPAdes assembler addressed some of these challenges in the case of isolate genomes but stopped short of detecting plasmids in metagenomic assemblies, an untapped source of yet to be discovered plasmids. We present the metaplasmidSPAdes tool for plasmid assembly in metagenomic data sets that reduced the false positive rate of plasmid detection compared with the state-of-the-art approaches. We assembled plasmids in diverse data sets and have shown that thousands of plasmids remained below the radar in already completed genomic and metagenomic studies. Our analysis revealed the extreme variability of plasmids and has led to the discovery of many novel plasmids (including many plasmids carrying antibiotic-resistance genes) without significant similarities to currently known ones.


Subjects
Computational Biology , Genomics , Metagenome , Metagenomics , Plasmids/genetics , Computational Biology/methods , Genetic Databases , Datasets as Topic , Genomics/methods , Humans , Metagenomics/methods , Molecular Sequence Annotation
13.
Bioinformatics ; 34(13): i142-i150, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949969

ABSTRACT

Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference. Availability and implementation: http://cab.spbu.ru/software/quast-lg. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Animals , Genomics/methods , Humans , Saccharomyces cerevisiae/genetics
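
The QUAST-LG abstract above refers to quality metrics for assemblies without naming them. One contiguity metric widely reported for genome assemblies is N50; the Python sketch below illustrates that metric as an assumption about the kind of statistic involved. It is not QUAST-LG's code, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in base pairs (illustrative only).
print(n50([100, 80, 50, 30, 20]))
```

N50 rewards long contigs but says nothing about whether they are correct, which is one reason to also compare an assembly against a reference, as the abstract describes.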
14.
Bioinformatics ; 32(22): 3380-3387, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27466620

ABSTRACT

MOTIVATION: Plasmids are stably maintained extra-chromosomal genetic elements that replicate independently from the host cell's chromosomes. Although plasmids harbor biomedically important genes (such as genes involved in virulence and antibiotic resistance), there is a shortage of specialized software tools for extracting and assembling plasmid data from whole genome sequencing projects. RESULTS: We present the plasmidSPAdes algorithm and software tool for assembling plasmids from whole genome sequencing data and benchmark its performance on a diverse set of bacterial genomes. AVAILABILITY AND IMPLEMENTATION: plasmidSPAdes is publicly available at http://spades.bioinf.spbau.ru/plasmidSPAdes/ CONTACT: d.antipov@spbu.ru SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bacterial Genome , Plasmids/genetics , Algorithms , DNA Sequence Analysis , Software
15.
Bioinformatics ; 32(14): 2210-2, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153654

ABSTRACT

The ability to generate large RNA-Seq datasets has created a demand for both de novo and reference-based transcriptome assemblers. However, while many transcriptome assemblers are now available, there is still no unified quality assessment tool for RNA-Seq assemblies. We present rnaQUAST, a tool for evaluating RNA-Seq assembly quality and benchmarking transcriptome assemblers using a reference genome and gene database. rnaQUAST calculates various metrics that demonstrate completeness and correctness levels of the assembled transcripts, and outputs them in a user-friendly report. AVAILABILITY AND IMPLEMENTATION: rnaQUAST is implemented in Python and is freely available at http://bioinf.spbau.ru/en/rnaquast CONTACT: ap@bioinf.spbau.ru SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , RNA Sequence Analysis , Software , Transcriptome
16.
Bioinformatics ; 32(7): 1009-15, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26589280

ABSTRACT

MOTIVATION: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost. RESULTS: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. AVAILABILITY AND IMPLEMENTATION: hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades CONTACT: d.antipov@spbu.ru SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , DNA Sequence Analysis , Base Sequence , Chromosome Mapping , Genome
17.
J Comput Biol ; 20(10): 714-37, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24093227

ABSTRACT

Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license.


Subjects
Contig Mapping/methods , Bacterial DNA/genetics , Concatenated DNA/genetics , Algorithms , Base Composition , Computational Biology , Escherichia coli/genetics , Gene Library , Bacterial Genome , High-Throughput Nucleotide Sequencing , Nucleic Acid Amplification Techniques , Pedobacter/genetics , Prochlorococcus/genetics , DNA Sequence Analysis , Single-Cell Analysis
18.
J Comput Biol ; 20(4): 359-71, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22803627

ABSTRACT

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.


Subjects
Algorithms , Contig Mapping/methods , Genome/genetics , DNA Sequence Analysis/methods , Genetic Databases , Escherichia coli/genetics
19.
J Comput Biol ; 19(5): 455-77, 2012 May.
Article in English | MEDLINE | ID: mdl-22506599

ABSTRACT

The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.


Subjects
Algorithms , Bacteria/genetics , Bacterial Genome , Metagenomics/methods , Single-Cell Analysis/methods , DNA Sequence Analysis/methods