Results 1 - 3 of 3
1.
PLoS One ; 8(3): e59484, 2013.
Article in English | MEDLINE | ID: mdl-23544073

ABSTRACT

BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported that this technology cannot reconstruct the entire genome even at very high depth of coverage. We investigated this limitation from the perspective of information theory, evaluating the effect of repeats on short-read genome assembly using idealized (error-free) reads of different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) as the entropy of sequencing reads at read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats on the reconstruction of a whole genome from sequences of length k. In our experiments, we found that entropy loss correlates well with the de novo assembly coverage of a genome, and a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths needed to achieve ΔH(k)<1% differ among organisms and are independent of genome size. For example, to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for sequencing the human genome (3.2×10^9 bp) and 320 bp for sequencing the fruit fly genome (1.8×10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and that the entropy of sequencing reads provides a measure of the repeat structure. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation of the limitations of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful for tuning sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
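The abstract does not give the formula for H(k) or ΔH(k), so the following is only a minimal sketch under stated assumptions: H(k) is taken to be the Shannon entropy of the distribution of all length-k substrings of a single-stranded reference, and ΔH(k) is taken to be the shortfall from the repeat-free maximum log2(N), where N is the number of read positions. The function names `read_entropy` and `relative_entropy_loss` are hypothetical, and reverse complements are ignored for simplicity.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (in bits) of all error-free reads of length k.

    Assumption: one read is taken at every start position of a
    single-stranded, linear sequence.
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must be between 1 and the genome length")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Assumed form of the relative entropy loss: (H_max - H(k)) / H_max.

    H_max = log2(n) is reached only when every read is unique,
    i.e. when no repeat of length >= k exists in the sequence.
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must be between 1 and the genome length")
    h_max = math.log2(n)
    if h_max == 0.0:  # a single read carries no entropy to lose
        return 0.0
    return (h_max - read_entropy(genome, k)) / h_max


if __name__ == "__main__":
    toy = "ACGTACGT"  # contains the 4-mer ACGT twice
    for k in (2, 4, 6):
        print(k, round(relative_entropy_loss(toy, k), 4))
```

Under these assumptions, longer reads span the repeat and the loss falls toward zero, which mirrors the abstract's idea of finding the smallest k that keeps ΔH(k) below 1%.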


Subjects
Entropy , Genome/genetics , Nucleic Acid Repetitive Sequences/genetics , DNA Sequence Analysis/methods , Animals , Bacteria/genetics , Base Pairing/genetics , Base Sequence , Chromosomes/genetics , Bacterial Artificial Chromosomes/genetics , Humans , Prokaryotic Cells/metabolism
2.
Bioinformatics ; 29(8): 1004-10, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23457040

ABSTRACT

MOTIVATION: High-accuracy de novo assembly of the short sequencing reads from RNA-Seq technology is very challenging. We introduce a de novo assembly algorithm, EBARDenovo, which stands for Extension, Bridging And Repeat-sensing Denovo. This algorithm uses an efficient chimera-detection function to abrogate the effect of aberrant chimeric reads in RNA-Seq data. RESULTS: EBARDenovo resolves the complications of RNA-Seq assembly arising from sequencing errors, repetitive sequences and aberrant chimeric amplicons. In a series of assembly experiments, our algorithm is the most accurate among the examined programs, including de Bruijn graph assemblers, Trinity and Oases. AVAILABILITY AND IMPLEMENTATION: EBARDenovo is available at http://ebardenovo.sourceforge.net/. This software package (with patent pending) is free of charge for academic use only. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Profiling/methods , RNA Sequence Analysis/methods , RNA/chemistry , Nucleic Acid Repetitive Sequences , Software
3.
BMC Genomics ; 13 Suppl 7: S5, 2012.
Article in English | MEDLINE | ID: mdl-23282223

ABSTRACT

BACKGROUND: Mitochondrial dysfunction is associated with various aging-related diseases. The copy number of mtDNA in human cells may therefore be a potential biomarker for the diagnosis of aging. Here we propose a new computational method for the accurate assessment of mtDNA copy number from whole genome sequencing data. RESULTS: Human whole genome sequencing datasets for two families, from the HapMap and the 1000 Genomes projects, were used for the accurate counting of mitochondrial DNA copy numbers. The results revealed that the parental mitochondrial DNA copy numbers are significantly lower than those of their children in these samples. There are 8% to 21% more copies of mtDNA in samples from the children than in samples from their parents. The experiment demonstrated possible correlations between the quantity of mitochondrial DNA and aging-related diseases. CONCLUSIONS: Since next-generation sequencing technology strives to deliver affordable and unbiased sequencing results, accurate assessment of mtDNA copy numbers can be achieved effectively from the output of whole genome sequencing. We implemented the method as a software package, MitoCounter, with the source code and user's guide available to the public at http://sourceforge.net/projects/mitocounter/.
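The abstract does not describe how MitoCounter counts mtDNA copies, so the sketch below is not its algorithm. It illustrates a widely used depth-ratio approximation instead, assuming that read depth is proportional to template abundance and that the nuclear genome is diploid. The function names, the `ploidy` parameter, and the depth values in the usage lines are hypothetical.

```python
def estimate_mtdna_copies(mt_mean_depth: float,
                          autosomal_mean_depth: float,
                          ploidy: int = 2) -> float:
    """Approximate mtDNA copies per cell from sequencing depth.

    Assumption (not taken from the abstract): copies per cell equal
    ploidy * (mean mitochondrial depth / mean autosomal depth).
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal_mean_depth must be positive")
    if mt_mean_depth < 0:
        raise ValueError("mt_mean_depth must be non-negative")
    return ploidy * mt_mean_depth / autosomal_mean_depth


def percent_more_copies(child_copies: float, parent_copies: float) -> float:
    """Percentage by which the child's estimate exceeds the parent's."""
    if parent_copies <= 0:
        raise ValueError("parent_copies must be positive")
    return 100.0 * (child_copies - parent_copies) / parent_copies


if __name__ == "__main__":
    # Illustrative depths only; these are not data from the study.
    parent = estimate_mtdna_copies(3000.0, 30.0)
    child = estimate_mtdna_copies(3300.0, 30.0)
    print(parent, child, percent_more_copies(child, parent))
```

The second helper expresses a child-versus-parent comparison in the same form as the percentage range reported in the abstract.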


Subjects
Mitochondrial DNA/metabolism , Human Genome , Mitochondria/genetics , Adult , Child , Genetic Databases , Female , Humans , Male , DNA Sequence Analysis , Software
...