Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.

Ferreira, Miguel; Roma, Nuno; Russo, Luis M S.

BMC Bioinformatics; 15: 165, 2014 May 30.

Article in English | MEDLINE | ID: mdl-24884826

ABSTRACT

BACKGROUND: HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with the Intel SSE2 instruction set extension. RESULTS: A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. In addition to this alternative vectorization scheme, the proposed implementation introduces a new partitioning of the Markov model that exploits cache locality significantly more efficiently. This optimization, together with an improved loading of the emission scores, yields a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. CONCLUSIONS: The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder on DNA and protein datasets, and proved to be a competitive alternative implementation. The proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation was always faster than the already highly optimized ViterbiFilter implementation of HMMER3; it provides a constant throughput and a speedup of up to two times, depending on the model's size.
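
The inter-task idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the COPS or HMMER3 code: it assumes a generic HMM given as log-probability tables (not HMMER's profile HMM with match, insert and delete states), uses plain Python lists in place of SSE2 registers, and all function and parameter names are hypothetical. Each "lane" holds a different sequence, and all lanes advance through the recurrence in lock-step, which is the arrangement a SIMD implementation would map onto vector elements.

```python
import math


def viterbi_score(seq, log_init, log_trans, log_emit):
    """Best-path log score of one non-empty sequence under a generic HMM."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[s][seq[0]] for s in range(n_states)]
    for sym in seq[1:]:
        score = [
            max(score[p] + log_trans[p][s] for p in range(n_states))
            + log_emit[s][sym]
            for s in range(n_states)
        ]
    return max(score)


def viterbi_score_lanes(seqs, log_init, log_trans, log_emit):
    """Inter-task variant: one sequence per lane, all lanes step together."""
    n_states = len(log_init)
    # score[s][lane]: each row stands in for one vector register.
    score = [
        [log_init[s] + log_emit[s][seq[0]] for seq in seqs]
        for s in range(n_states)
    ]
    for t in range(1, max(len(seq) for seq in seqs)):
        new = [row[:] for row in score]  # finished lanes keep their scores
        for s in range(n_states):
            for lane, seq in enumerate(seqs):
                if t < len(seq):
                    best = max(
                        score[p][lane] + log_trans[p][s]
                        for p in range(n_states)
                    )
                    new[s][lane] = best + log_emit[s][seq[t]]
        score = new
    return [
        max(score[s][lane] for s in range(n_states))
        for lane in range(len(seqs))
    ]


if __name__ == "__main__":
    # Toy two-state model over the symbols "A" and "C" (made-up numbers).
    log = math.log
    init = [log(0.5), log(0.5)]
    trans = [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]
    emit = [{"A": log(0.7), "C": log(0.3)}, {"A": log(0.1), "C": log(0.9)}]
    print(viterbi_score_lanes(["AAC", "CC", "ACCA"], init, trans, emit))
```

Because every lane works on an independent sequence, no data has to move between lanes inside the recurrence; the trade-off is that lanes holding short sequences sit idle until the longest one finishes.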

Subject(s)

Computational Biology/methods; Markov Chains; Sequence Analysis/methods; Algorithms; Databases, Genetic; Databases, Protein; Humans; Sequence Alignment

Efficient alignment of pyrosequencing reads for re-sequencing applications.

Fernandes, Francisco; da Fonseca, Paulo G S; Russo, Luis M S; Oliveira, Arlindo L; Freitas, Ana T.

BMC Bioinformatics; 12: 163, 2011 May 16.

Article in English | MEDLINE | ID: mdl-21672185

ABSTRACT

BACKGROUND: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties, mostly due to the huge volume of data produced, but also because of some of their specific characteristics, such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. RESULTS: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach exploits the characteristics of the data in these re-sequencing applications and uses state-of-the-art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm that needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on execution time. CONCLUSIONS: The proposed methodology was implemented in a software tool called TAPyR (Tool for the Alignment of Pyrosequencing Reads), which is publicly available from http://www.tapyr.net.
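
As a rough illustration of what a seed-based approach looks like in general, the sketch below indexes the reference and lets exact seeds from the read vote for a candidate start position. It is not TAPyR's algorithm: a plain k-mer hash table stands in for the indexing techniques the abstract refers to, fixed-length seeds replace the flexible seeding, the final alignment step is omitted, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k):
    """Rank implied read start positions by the number of supporting seeds."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            # A seed at read offset j matching reference offset i implies
            # that the read starts at reference position i - j.
            votes[i - j] += 1
    return votes.most_common()


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATTACAGGCT"  # made-up toy reference
    read = reference[8:18]
    index = build_kmer_index(reference, 4)
    print(candidate_positions(read, index, 4))
```

A read with a few sequencing errors still collects votes from its error-free seeds, so the true position usually remains the top candidate; a full aligner would then verify each candidate with a local alignment that tolerates the insertions and deletions typical of pyrosequencing data.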

Subject(s)

Sequence Analysis, DNA/methods; Algorithms; Animals; Base Sequence; High-Throughput Nucleotide Sequencing; Humans; Sequence Alignment; Software
