Search | BVS Regional Portal (test)

Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing.

Smith, Michael; Chan, Rachel; Khurram, Maaz; Gordon, Paul M K.

PLoS Comput Biol ; 17(9): e1009350, 2021 09.

Article in English | MEDLINE | ID: mdl-34506479

ABSTRACT

Nanopore sequencing device analysis systems simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. These signals, known as squiggles, are a noisy and time-distorted representation of an underlying nucleotide sequence (the "gold standard model") owing to experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features distorted by standard, linear-averaging approaches. We compared the ability of DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of the RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard model. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals with the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the increase in the match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: first, the least squares warped distance between the final consensus and the gold model decreases; second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high entropy signals.
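
As a rough illustration of why warped-space comparison suits squiggles, the following minimal Python sketch computes a classic dynamic time warping (DTW) distance with a squared-error local cost and contrasts it with a point-by-point distance. It is not the authors' implementation and does not perform DBA, MM or SSG averaging; the function names `dtw_distance` and `pointwise_distance` and the toy signal values are assumptions made for this example only.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with a squared-error local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pointwise_distance(a, b):
    """Sum of squared differences over the overlapping prefix (no warping)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical step-current signals: identical levels, different dwell times.
reference = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]
stretched = [1.0, 1.0, 1.0, 3.0, 2.0, 2.0, 2.0]

print("DTW distance:      ", dtw_distance(reference, stretched))
print("Pointwise distance:", pointwise_distance(reference, stretched))
```

For these toy signals the DTW distance is 0.0 while the pointwise distance is 4.0: the two traces visit the same current levels in the same order and differ only in how long each level is held, which is the kind of time distortion a linear average would blur.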

Subjects

Algorithms; Nanopore Sequencing/statistics & numerical data; Computational Biology; Computer Simulation; Consensus Sequence; Phosphopyruvate Hydratase/genetics; RNA/genetics; Signal-To-Noise Ratio; Stochastic Processes

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur.

Brief Bioinform ; 20(4): 1542-1559, 2019 07 19.

Article in English | MEDLINE | ID: mdl-29617724

ABSTRACT

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) The read-to-read overlap-finding tools GraphMap and Minimap perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of the bottlenecks we have identified, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

Subjects

Genomics/methods; Nanopore Sequencing/methods; Animals; Chromosome Mapping; Computational Biology; Escherichia coli/genetics; Genome, Bacterial; Genomics/statistics & numerical data; Genomics/trends; Humans; Nanopore Sequencing/statistics & numerical data; Nanopore Sequencing/trends; Sequence Analysis, DNA; Software
