ABSTRACT
Multiclass classification is an important technique to many complex bioinformatics problems. However, their performance is limited by the computation power. Based on the Apache Hadoop design framework, this study proposes a two layer architecture that exploits the inherent parallelism of GA-SVM classification to speed up the work. The performance evaluations on an mRNA benchmark cancer dataset have reduced 86.55% features and raised accuracy from 97.53% to 98.03%. With a user-friendly web interface, the system provides researchers an easy way to investigate the unrevealed secrets in the fast-growing repository of bioinformatics data.
Subject(s)
Computational Biology/methods , RNA, Messenger/analysis , Algorithms , Humans , Models, Theoretical , RNA, Messenger/genetics , Time FactorsABSTRACT
Biomedical data analytic system has played an important role in doing the clinical diagnosis for several decades. Today, it is an emerging research area of analyzing these big data to make decision support for physicians. This paper presents a parallelized web-based tool with cloud computing service architecture to analyze the epilepsy. There are many modern analytic functions which are wavelet transform, genetic algorithm (GA), and support vector machine (SVM) cascaded in the system. To demonstrate the effectiveness of the system, it has been verified by two kinds of electroencephalography (EEG) data, which are short term EEG and long term EEG. The results reveal that our approach achieves the total classification accuracy higher than 90%. In addition, the entire training time accelerate about 4.66 times and prediction time is also meet requirements in real time.