1.
Front Neuroinform; 10: 18, 2016.
Article in English | MEDLINE | ID: mdl-27375472

ABSTRACT

Recent advances in neurological imaging and sensing technologies have led to a rapid increase in the volume, rate of generation, and variety of neuroscience data. This "neuroscience Big Data" represents a significant opportunity for the biomedical research community to design experiments that use data with longer timescales, larger numbers of attributes, and statistically significant sample sizes. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model the long-term effects of brain injuries, and provide new insights into the dynamics of brain networks. However, many existing neuroinformatics data processing and analysis tools were not built to manage large volumes of data, which makes it difficult for researchers to effectively leverage the available data to advance their research. We introduce a new toolkit called NeuroPigPen, developed using Apache Hadoop and the Pig data flow language, to address the challenges posed by large-scale electrophysiological signal data. NeuroPigPen is a modular toolkit that can process large volumes of electrophysiological signal data, such as electroencephalogram (EEG), electrocardiogram (ECG), and blood oxygen level (SpO2) recordings, using a new distributed storage model called Cloudwave Signal Format (CSF) that supports easy partitioning and storage of signal data on commodity hardware. NeuroPigPen was developed with three design principles: (a) Scalability - the ability to efficiently process increasing volumes of data; (b) Adaptability - the toolkit can be deployed across different computing configurations; and (c) Ease of programming - the toolkit can be used to compose multi-step data processing pipelines using high-level programming constructs. The NeuroPigPen toolkit was evaluated using 750 GB of electrophysiological signal data on a variety of Hadoop cluster configurations ranging from 3 to 30 Data Nodes. The evaluation results demonstrate that the toolkit is highly scalable and adaptable, which makes it suitable for use in neuroscience applications as a scalable data processing toolkit. As part of the ongoing extension of NeuroPigPen, we are developing new modules that support statistical functions for analyzing signal data in brain connectivity research. In addition, the toolkit is being extended to allow integration with scientific workflow systems. NeuroPigPen is released under the BSD license at: https://sites.google.com/a/case.edu/neuropigpen/.
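
The partition-then-parallelize idea behind CSF can be illustrated with a minimal, single-machine Python sketch: a signal is split into fixed-length segments and each segment is processed independently, in parallel. This is only an analogy for how NeuroPigPen distributes work across Hadoop Data Nodes via Pig scripts; the function names (segment_signal, mean_power) and parameters below are hypothetical, not part of the toolkit's API.

    # Illustrative sketch only, not NeuroPigPen code: segment a signal the way
    # CSF partitions recordings, then process segments in parallel, as Hadoop
    # would distribute them across Data Nodes.
    from multiprocessing import Pool

    import numpy as np


    def segment_signal(samples, segment_len):
        """Split a 1-D signal into fixed-length segments (the unit of storage)."""
        n = len(samples) - len(samples) % segment_len
        return [samples[i:i + segment_len] for i in range(0, n, segment_len)]


    def mean_power(segment):
        """Per-segment step: average signal power, computed independently."""
        return float(np.mean(np.square(segment)))


    if __name__ == "__main__":
        eeg = np.random.randn(256 * 600)          # 10 min of synthetic "EEG" at 256 Hz
        segments = segment_signal(eeg, 256 * 10)  # 10-second segments
        with Pool() as pool:                      # each segment handled in parallel
            powers = pool.map(mean_power, segments)
        print(len(powers), "segment power values")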

2.
Front Neuroinform; 9: 4, 2015.
Article in English | MEDLINE | ID: mdl-25852536

ABSTRACT

Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data generated by sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing multi-modal neuroscience data to advance research on serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signal data on distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with the large volumes of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused, extensible data representation format called Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow was evaluated on a 30-node cluster running the open-source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce the total data processing time. The Cloudwave data flow is a template for developing highly scalable neuroscience data processing pipelines that use MapReduce to support a variety of user applications.
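
The MapReduce pattern the abstract refers to can be sketched in a few lines of pure Python: a map step emits (key, value) pairs per signal segment, a shuffle groups them by key, and a reduce step aggregates each group independently. This is a toy stand-in for the actual Cloudwave Hadoop jobs; the channel names and the per-channel statistic below are made up for illustration.

    # Minimal map/shuffle/reduce sketch (pure Python, not Cloudwave code)
    # computing a per-channel peak amplitude from signal segments.
    from collections import defaultdict


    def map_segment(channel, samples):
        """Map step: emit one (key, value) pair per segment."""
        yield channel, max(abs(s) for s in samples)


    def reduce_channel(channel, values):
        """Reduce step: aggregate all values shuffled to this channel."""
        return channel, max(values)


    records = [("C3", [12.0, -40.5, 7.1]), ("C4", [3.2, 9.9]), ("C3", [55.0, -2.0])]

    # Shuffle: group mapped pairs by key, as the Hadoop framework would.
    grouped = defaultdict(list)
    for channel, samples in records:
        for key, value in map_segment(channel, samples):
            grouped[key].append(value)

    results = [reduce_channel(k, v) for k, v in grouped.items()]
    print(results)   # e.g. [('C3', 55.0), ('C4', 9.9)]

Because every reduce group is independent, adding nodes shortens total processing time in roughly the way the 30-node evaluation describes.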

3.
Article in English | MEDLINE | ID: mdl-27069752

ABSTRACT

Insight is a Semantic Web technology-based platform that supports large-scale secondary analysis of healthcare data for neurology clinical research. Insight features the novel use of: (1) provenance metadata, which describes the history or origin of patient data, in clinical research analysis, and (2) support for patient cohort queries across multiple institutions conducting research in epilepsy, one of the most common neurological disorders, affecting 50 million persons worldwide. Insight is being developed as a healthcare informatics infrastructure to support a national network of eight epilepsy research centers across the U.S. funded by the U.S. Centers for Disease Control and Prevention (CDC). This paper describes the use of the World Wide Web Consortium (W3C) PROV recommendation for provenance metadata, which allows researchers to create patient cohorts based on the provenance of the research studies. In addition, the paper describes the use of a description logic-based OWL2 epilepsy ontology for cohort queries with "expansion of query expression" using ontology reasoning. Finally, evaluation results for data integration and query performance are reported using data from three research studies with 180 epilepsy patients. The experimental results demonstrate that Insight is a scalable approach to using semantic provenance metadata for context-based data analysis in healthcare informatics.
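
The combination of PROV-based provenance filtering and ontology-driven query expansion can be illustrated with a small rdflib example in Python. The ex: namespace, class names, and study URIs below are hypothetical and do not reflect Insight's actual schema or ontology; the sketch only shows the mechanism: an rdfs:subClassOf* property path expands the diagnosis class, while a prov:wasDerivedFrom triple restricts the cohort to records from a given study.

    # Toy sketch (not Insight's data model): tag patient records with W3C PROV
    # provenance and an ontology class, then select a cohort whose diagnosis
    # falls under a parent class and whose data came from a given study.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    g = Graph()
    g.add((EX.FocalEpilepsy, RDFS.subClassOf, EX.Epilepsy))   # mini ontology
    g.add((EX.patient1, RDF.type, EX.FocalEpilepsy))
    g.add((EX.patient1, PROV.wasDerivedFrom, EX.study_A))     # provenance link
    g.add((EX.patient2, RDF.type, EX.Epilepsy))
    g.add((EX.patient2, PROV.wasDerivedFrom, EX.study_B))

    # "Query expansion" is approximated here by the rdfs:subClassOf* path,
    # which pulls in subclasses of ex:Epilepsy; the PROV triple restricts
    # the cohort to patients whose records were derived from study_A.
    cohort = g.query(
        """
        SELECT ?patient WHERE {
            ?patient a/rdfs:subClassOf* ex:Epilepsy ;
                     prov:wasDerivedFrom ex:study_A .
        }
        """,
        initNs={"ex": EX, "prov": PROV, "rdfs": RDFS},
    )
    for row in cohort:
        print(row.patient)   # only ex:patient1 matches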
