Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Sci Rep ; 13(1): 15978, 2023 09 25.
Artigo em Inglês | MEDLINE | ID: mdl-37749195

RESUMO

DNA is a promising candidate for long-term data storage due to its high density and endurance. The key challenge in DNA storage today is the cost of synthesis. In this work, we propose composite motifs, a framework that uses a mixture of prefabricated motifs as building blocks to reduce synthesis cost by scaling logical density. To write data, we introduce Bridge Oligonucleotide Assembly, an enzymatic ligation technique for synthesizing oligos based on composite motifs. To sequence data, we introduce Direct Oligonucleotide Sequencing, a nanopore-based technique to sequence short oligos, eliminating common preparatory steps like DNA assembly, amplification and end-prep. To decode data, we introduce Motif-Search, a novel consensus caller that provides accurate reconstruction despite synthesis and sequencing errors. Using the proposed methods, we present an end-to-end experiment where we store the text "HelloWorld" at a logical density of 84 bits/cycle (14-42× improvement over state-of-the-art).


Assuntos
DNA , Nanoporos , Oligonucleotídeos , Consenso , Estado Nutricional
2.
BMC Bioinformatics ; 22(1): 257, 2021 May 20.
Artigo em Inglês | MEDLINE | ID: mdl-34016035

RESUMO

BACKGROUND: Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally-intensive task due to the dependence on edit distance for dealing with INDELs and mismatches introduced by sequencing. All modern aligners use seed-filter-extend methodology and rely on filtration heuristics to reduce the overhead of edit distance computation. However, filtering has inherent performance-accuracy trade-offs that limits its effectiveness. RESULTS: Motivated by algorithmic advances in randomized low-distortion embedding, we introduce SEE, a new methodology for developing sequence mappers and aligners. While SFE focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from edit distance regime to the Hamming regime by embedding them using a randomized algorithm, and uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align an SEE-based short-read sequence mapper and aligner that is 3-12[Formula: see text] faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy. CONCLUSIONS: As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a much broader scope as they can be used for other applications like graph alignment, multiple sequence alignment, and sequence assembly.


Assuntos
Sequenciamento de Nucleotídeos em Larga Escala , Software , Algoritmos , Alinhamento de Sequência , Análise de Sequência de DNA
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...