Search | VHL Regional Portal

Benchmarking distributed data warehouse solutions for storing genomic variant information.

Wiewiórka, Marek S; Wysakowicz, Dawid P; Okoniewski, Michal J; Gambin, Tomasz.

Database (Oxford) ; 2017, 2017 01 01.

Article in English | MEDLINE | ID: mdl-29220442

ABSTRACT

Database URL: https://github.com/ZSI-Bio/variantsdwh.

Subject(s)

Data Warehousing , Database Management Systems , Databases, Genetic , Genomics/methods , Benchmarking , Humans , Precision Medicine

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision.

Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michal J.

Bioinformatics ; 30(18): 2652-3, 2014 Sep 15.

Article in English | MEDLINE | ID: mdl-24845651

ABSTRACT

Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them interactively. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running analyses of sequencing datasets. Tests of SparkSeq also show that cache use and HDFS block size can be tuned for optimal performance on multiple worker nodes. AVAILABILITY AND IMPLEMENTATION: Available under the open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/.

Subject(s)

Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Internet , Nucleotides/genetics , Software , Statistics as Topic/methods , Algorithms , Time Factors
