Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 3 de 3
Filter
Add more filters










Database
Language
Publication year range
1.
PLoS One ; 6(8): e23455, 2011.
Article in English | MEDLINE | ID: mdl-21858125

ABSTRACT

MOTIVATION: Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. RESULTS: We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequently to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed best results regarding contig size, error rate and runtime. CONCLUSION: LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.


Subject(s)
Computational Biology/methods , Genome/genetics , Models, Genetic , Sequence Analysis, DNA/methods , Arabidopsis/genetics , Base Sequence , DNA, Plant/analysis , DNA, Plant/genetics , Genetic Variation , Genome, Plant/genetics , INDEL Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results
2.
Bioinformatics ; 27(16): 2187-93, 2011 Aug 15.
Article in English | MEDLINE | ID: mdl-21712251

ABSTRACT

MOTIVATION: Next-generation sequencing technologies have facilitated the study of organisms on a genome-wide scale. A recent method called restriction site associated DNA sequencing (RAD-seq) allows to sample sequence information at reduced complexity across a target genome using the Illumina platform. Single-end RAD-seq has proven to provide a large number of informative genetic markers in reference as well as non-reference organisms. RESULTS: Here, we present a method for de novo assembly of paired-end RAD-seq data in order to produce extended contigs flanking a restriction site. We were able to reconstruct one-tenth of the guppy genome represented by 200-500 bp contigs associated to EcoRI recognition sites. In addition, these contigs were used as reference allowing the detection of thousands of new polymorphic markers that are informative for mapping and population genetic studies in the guppy. AVAILABILITY: A perl and C++ implementation of the method demonstrated in this article is available under http://guppy.weigelworld.org/weigeldatabases/radMarkers/ as package RApiD. CONTACT: christine.dreyer@tuebingen.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Markers , Sequence Analysis, DNA/methods , Animals , Chromosome Mapping , Female , Genome , Male , Poecilia/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/standards
3.
Proc Natl Acad Sci U S A ; 108(25): 10249-54, 2011 Jun 21.
Article in English | MEDLINE | ID: mdl-21646520

ABSTRACT

We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.


Subject(s)
Arabidopsis/genetics , Genome, Plant , Algorithms , Base Sequence , Polymorphism, Genetic , Sequence Alignment , Sequence Analysis, DNA
SELECTION OF CITATIONS
SEARCH DETAIL
...