1.
Genes (Basel) ; 11(1)2020 01 03.
Article in English | MEDLINE | ID: mdl-31947774

ABSTRACT

The rapid proliferation of low-cost RNA-seq data has resulted in a growing interest in RNA analysis techniques for various applications, ranging from identifying genotype-phenotype relationships to validating discoveries from other analyses. However, many practical applications in this field are limited by the available computational resources and the long computing time needed to perform the analysis. GATK has a popular best practices pipeline specifically designed for variant calling in RNA-seq analysis. Some tools in this pipeline are not optimized to scale the analysis to multiple processors or compute nodes efficiently, thereby limiting their ability to process large datasets. In this paper, we present SparkRA, an Apache Spark-based pipeline to efficiently scale up the GATK RNA-seq variant calling pipeline on multiple cores in one node or in a large cluster. On a single node with 20 hyper-threaded cores, the original pipeline runs for more than 5 h to process a dataset of 32 GB. In contrast, SparkRA is able to reduce the overall computation time of the pipeline on the same single node by about 4×, bringing it down to 1.3 h. On a cluster with 16 nodes (each with eight single-threaded cores), SparkRA is able to further reduce this computation time by 7.7× compared to a single node. Compared to other scalable state-of-the-art solutions, SparkRA is 1.2× faster while achieving the same accuracy of the results.
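The speedup figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 5 h baseline is a lower bound (the abstract says "more than 5 h"), and the efficiency figure assumes that the 7.7× cluster speedup is measured against one node of the same 16-node cluster, which the abstract does not state explicitly.

```python
def speedup(baseline_hours: float, improved_hours: float) -> float:
    """Ratio of the baseline runtime to the improved runtime."""
    return baseline_hours / improved_hours


def parallel_efficiency(observed_speedup: float, num_nodes: int) -> float:
    """Fraction of ideal linear scaling achieved on num_nodes nodes."""
    return observed_speedup / num_nodes


if __name__ == "__main__":
    # Single node: original pipeline (> 5 h, so 5.0 is a lower bound) vs. SparkRA (1.3 h).
    print(f"single-node speedup: at least {speedup(5.0, 1.3):.2f}x")
    # Cluster: 7.7x on 16 nodes, assumed to be relative to one node of that cluster.
    print(f"cluster scaling efficiency: {parallel_efficiency(7.7, 16):.0%}")
```

With a baseline of exactly 5 h the ratio is roughly 3.85, consistent with the "about 4×" stated in the abstract given that the real baseline is somewhat longer.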


Subject(s)
Databases, Nucleic Acid , RNA-Seq , Sequence Analysis, RNA , Software
2.
PLoS One ; 14(12): e0224784, 2019.
Article in English | MEDLINE | ID: mdl-31805063

ABSTRACT

Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has increased in using data generated from NGS to diagnose genetic diseases. However, the data generated by NGS technology is usually on the order of hundreds of gigabytes per experiment, thus requiring efficient and scalable programs to perform data analysis quickly. This paper presents SparkGA2, a memory-efficient, production-quality framework for high-performance DNA analysis in the cloud, which can scale according to the available computational resources by increasing the number of nodes. Our framework uses Apache Spark's ability to cache data in memory to speed up processing, while also allowing the user to run the framework on systems with less memory at the cost of slightly lower performance. To manage the memory footprint, we implement an on-the-fly compression method for intermediate data and reduce memory requirements by up to 3x. Our framework also uses a streaming approach to gradually stream input data as processing is taking place. This makes our framework faster than other state-of-the-art approaches while at the same time allowing users to adapt it to run on clusters with less memory. Compared to the state of the art, SparkGA2 is up to 22% faster on a large cluster of 67 nodes and up to 9% faster on a smaller cluster of 6 nodes. Including the streaming solution, where data pre-processing is considered, SparkGA2 is 51% faster on a 6-node cluster. The source code of SparkGA2 is publicly available at https://github.com/HamidMushtaq/SparkGA2.


Subject(s)
Cloud Computing , Genomics/methods , Software , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
3.
BMC Bioinformatics ; 20(1): 597, 2019 Nov 19.
Article in English | MEDLINE | ID: mdl-31744474

ABSTRACT

Following publication of the original article [1], the author requested changes to Figures 4, 7, 8, 9, 12 and 14 to align them with the text. The corrected figures are supplied below.

4.
BMC Bioinformatics ; 20(1): 520, 2019 Oct 25.
Article in English | MEDLINE | ID: mdl-31653208

ABSTRACT

BACKGROUND: Due to the computational complexity of sequence alignment algorithms, various accelerated solutions have been proposed to speed up this analysis. NVBIO is the only available GPU library that accelerates sequence alignment of high-throughput NGS data, but has limited performance. In this article we present GASAL2, a GPU library for aligning DNA and RNA sequences that outperforms existing CPU and GPU libraries. RESULTS: The GASAL2 library provides specialized, accelerated kernels for local, global and all types of semi-global alignment. Pairwise sequence alignment can be performed with and without traceback. GASAL2 outperforms the fastest CPU-optimized SIMD implementations such as SeqAn and Parasail, as well as NVIDIA's own GPU-based library known as NVBIO. GASAL2 is unique in performing sequence packing on GPU, which is up to 750x faster than NVBIO. Overall, on a GeForce GTX 1080 Ti GPU, GASAL2 is up to 21x faster than Parasail on a dual-socket hyper-threaded Intel Xeon system with 28 cores and up to 13x faster than NVBIO with a query length of up to 300 bases and 100 bases, respectively. GASAL2 alignment functions are asynchronous/non-blocking and allow full overlap of CPU and GPU execution. The paper shows how to use GASAL2 to accelerate BWA-MEM, speeding up the local alignment by 20x, which gives an overall application speedup of 1.3x vs. CPU with up to 12 threads. CONCLUSIONS: The library provides high-performance APIs for local, global and semi-global alignment that can be easily integrated into various bioinformatics tools.
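The gap between the 20x local-alignment speedup and the 1.3x overall BWA-MEM speedup is what Amdahl's law predicts when only part of a program is accelerated. The sketch below is a back-of-the-envelope illustration, not a figure from the paper: it assumes Amdahl's law holds exactly, with no CPU/GPU overlap and no data-transfer overhead, and the function names are hypothetical. Since the abstract mentions overlapped CPU and GPU execution, the real fraction may differ.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction of the runtime is sped up by kernel_speedup."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)


def implied_fraction(overall_speedup: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: the runtime fraction that must have been accelerated."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Figures from the abstract: 20x on local alignment, 1.3x overall.
    p = implied_fraction(overall_speedup=1.3, kernel_speedup=20.0)
    print(f"implied share of runtime in local alignment: {p:.1%}")
    # Round trip: plugging the fraction back in recovers the overall speedup.
    print(f"check: overall speedup = {amdahl_speedup(p, 20.0):.2f}x")
```

Under these assumptions, local alignment would account for roughly a quarter of the original CPU runtime, which is why even a 20x kernel speedup yields only a modest end-to-end gain.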


Subject(s)
Gene Library , High-Throughput Nucleotide Sequencing , Sequence Alignment , Software , Algorithms , Computational Biology , DNA/genetics , RNA/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA
5.
Eur J Gastroenterol Hepatol ; 25(7): 850-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23411866

ABSTRACT

BACKGROUND AND AIMS: Performing endoscopic ultrasound (EUS) before endoscopic retrograde cholangiopancreatography (ERCP) has been described as useful in cases of suspected biliary obstruction, where EUS can triage patients for ERCP. We aimed to determine the diagnostic accuracy of EUS and its impact on ERCP burden in real clinical practice. We also evaluated the safety and efficacy of EUS+ERCP in a single endoscopic session. PATIENTS AND METHODS: Four hundred and eighteen consecutive patients with suspected but unexplained biliary obstruction referred for EUS before possible ERCP were evaluated. The diagnostic accuracy of EUS and its value in predicting the need for ERCP were determined. EUS established whether pancreaticobiliary disorder (PBD) was present and whether therapeutic ERCP was required. These decisions were matched with ERCP findings, histology, clinical course, and follow-up. Where ERCP was indicated, it was performed in the same endoscopic session. RESULTS: EUS was performed in 412/418 patients (feasibility 98.5%), and ERCP was considered necessary in 64% (ERCP avoided in 36%). Single-session EUS and ERCP was safe and effective (264 patients). The diagnostic accuracy of EUS was as follows: choledocholithiasis 99%, malignant strictures 90%, and benign strictures 92%. EUS showed pathology in 42% of patients who had a nondilated biliary system at initial investigations. When EUS indicated a normal common bile duct (n=119), this had a 100% positive predictive value for non-necessity for ERCP. The median overall follow-up period was 12 months (range 6-34 months). CONCLUSION: EUS demonstrated high diagnostic accuracy in this mixed group of PBD. This accurately guided ERCP need and avoided unnecessary ERCP in 36%. EUS and ERCP in the same endoscopic session for the evaluation and management of PBD is technically feasible, with safety and efficacy profiles equivalent to those of each procedure performed independently in different sessions.


Subject(s)
Bile Ducts/diagnostic imaging , Bile Ducts/surgery , Cholangiopancreatography, Endoscopic Retrograde , Cholestasis/diagnosis , Cholestasis/surgery , Endosonography , Patient Selection , Adolescent , Adult , Aged , Aged, 80 and over , Chi-Square Distribution , Cholangiopancreatography, Endoscopic Retrograde/adverse effects , Cholestasis/diagnostic imaging , Cholestasis/etiology , Constriction, Pathologic , Endosonography/adverse effects , Feasibility Studies , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Unnecessary Procedures , Young Adult