Search | VHL Regional Portal

A hybrid short read mapping accelerator.

Chen, Yupeng; Schmidt, Bertil; Maskell, Douglas L.

BMC Bioinformatics ; 14: 67, 2013 Feb 26.

Article in English | MEDLINE | ID: mdl-23441908

ABSTRACT

BACKGROUND: The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are "inherently parallel". RESULTS: We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for pair-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed. CONCLUSIONS: This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance bottleneck for the short read mapping problem can be changed from the alignment stage to the seed generation stage, which provides an additional requirement for the future development of short read aligners.

Subject(s)

Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Chromosome Mapping , Genome , Software

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform.

Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L.

Bioinformatics ; 28(14): 1830-7, 2012 Jul 15.

Article in English | MEDLINE | ID: mdl-22576173

ABSTRACT

MOTIVATION: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. RESULTS: We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to achieve fast speed. Our algorithm uses a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units achieves significant speedups in terms of execution time, while yielding comparable or even better alignment quality for paired-end alignments compared with three popular BWT-based aligners: Bowtie, BWA and SOAP2. CUSHAW also delivers competitive performance in terms of single-nucleotide polymorphism calling for an Escherichia coli test dataset. AVAILABILITY: http://cushaw.sourceforge.net

Subject(s)

Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology/methods , Genome, Human , Humans , Polymorphism, Single Nucleotide

Parallelized short read assembly of large genomes using de Bruijn graphs.

Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L.

BMC Bioinformatics ; 12: 354, 2011 Aug 25.

Article in English | MEDLINE | ID: mdl-21867511

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have given rise to the explosive increase in DNA sequencing throughput, and have promoted the recent development of de novo short read assemblers. However, existing assemblers require high execution times and a large amount of compute resources to assemble large genomes from quantities of short reads. RESULTS: We present PASHA, a parallelized short read assembler using de Bruijn graphs, which takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. Evaluation using three small-scale real paired-end datasets shows that PASHA is able to produce more contiguous high-quality assemblies in shorter time compared to three leading assemblers: Velvet, ABySS and SOAPdenovo. PASHA's scalability for large genome datasets is demonstrated with human genome assembly. Compared to ABySS, PASHA achieves competitive assembly quality with faster execution speed on the same compute resources, yielding an NG50 contig size of 503 with the longest correct contig size of 18,252, and an NG50 scaffold size of 2,294. Moreover, the human assembly is completed in about 21 hours with only modest compute resources. CONCLUSIONS: Developing parallel assemblers for large genomes has been garnering significant research efforts due to the explosive size growth of high-throughput short read datasets. By employing hybrid parallelism consisting of multi-threading on multi-core CPUs and message passing on compute clusters, PASHA is able to assemble the human genome with high quality and in reasonable time using modest compute resources.

Subject(s)

Computational Biology/methods , High-Throughput Nucleotide Sequencing , Bacteria/genetics , Genome , Genome, Human , Humans , Software

DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI.

Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L.

BMC Bioinformatics ; 12: 85, 2011 Mar 29.

Article in English | MEDLINE | ID: mdl-21447171

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for the de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. RESULTS: We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to the existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers. CONCLUSIONS: DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.

Subject(s)

Algorithms , Sequence Analysis, DNA/methods , Software , Computer Simulation , Reproducibility of Results

CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments.

Kuttippurathu, Lakshmi; Hsing, Michael; Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L; Lee, Kyungjoon; He, Aibin; Pu, William T; Kong, Sek Won.

Bioinformatics ; 27(5): 715-7, 2011 Mar 01.

Article in English | MEDLINE | ID: mdl-21183585

ABSTRACT

UNLABELLED: CompleteMOTIFs (cMOTIFs) is an integrated web tool developed to facilitate systematic discovery of overrepresented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. Comprehensive annotations and Boolean logic operations on multiple peak locations enable users to focus on genomic regions of interest for de novo motif discovery using tools such as MEME, Weeder and ChIPMunk. The pipeline incorporates a scanning tool for known motifs from TRANSFAC and JASPAR databases, and performs an enrichment test using local or precalculated background models that significantly improve the motif scanning result. Furthermore, using the cMOTIFs pipeline, we demonstrated that multiple transcription factors could cooperatively bind to the upstream of important stem cell differentiation regulators. AVAILABILITY: http://cmotifs.tchlab.org.

Subject(s)

DNA/metabolism , Transcription Factors/metabolism , Algorithms , Binding Sites , DNA/genetics , Databases, Nucleic Acid , Internet , Models, Statistical , Molecular Sequence Annotation , Protein Binding , Protein Interaction Domains and Motifs , Sequence Analysis, DNA , Transcription Factors/genetics

MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.

Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L.

Bioinformatics ; 26(16): 1958-64, 2010 Aug 15.

Article in English | MEDLINE | ID: mdl-20576627

ABSTRACT

MOTIVATION: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. RESULTS: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile-profile alignment, are incorporated to improve alignment accuracy. Assessed using the popular benchmarks: BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners. AVAILABILITY: The source code of MSAProbs, written in C++, is freely and publicly available from http://msaprobs.sourceforge.net.

Subject(s)

Algorithms , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computational Biology/methods , Markov Chains , Probability , Software

CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L.

BMC Res Notes ; 3: 93, 2010 Apr 06.

Article in English | MEDLINE | ID: mdl-20370891

ABSTRACT

BACKGROUND: Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. FINDINGS: This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cells update per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. CONCLUSIONS: CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.

Liu, Yongchao; Maskell, Douglas L; Schmidt, Bertil.

BMC Res Notes ; 2: 73, 2009 May 06.

Article in English | MEDLINE | ID: mdl-19416548

ABSTRACT

BACKGROUND: The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. FINDINGS: Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. CONCLUSION: CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

High speed biological sequence analysis with hidden Markov models on reconfigurable platforms.

Oliver, Timothy F; Schmidt, Bertil; Jakop, Yanto; Maskell, Douglas L.

IEEE Trans Inf Technol Biomed ; 13(5): 740-6, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19273034

ABSTRACT

Molecular biologists use hidden Markov models (HMMs) as a popular tool to statistically describe biological sequence families. This statistical description can then be used for sensitive and selective database scanning, e.g., new protein sequences are compared with a set of HMMs to detect functional similarities. Efficient dynamic-programming algorithms exist for solving this problem; however, current solutions still require significant scan times. These scan time requirements are likely to become even more severe due to the rapid growth in the size of these databases. This paper shows how reconfigurable architectures can be used to derive an efficient fine-grained parallelization of the dynamic programming calculation. We describe how this technique leads to significant runtime savings for HMM database scanning on a standard off-the-shelf field-programmable gate array (FPGA).

Subject(s)

Computational Biology/methods , Markov Chains , Pattern Recognition, Automated/methods , Sequence Analysis, Protein/methods , Algorithms , Sequence Alignment/methods

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL