SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data / Genomics, Proteomics & Bioinformatics (English Edition)
Genomics, Proteomics & Bioinformatics; (4): 201-210, 2019.
Article in English | WPRIM | ID: wpr-772939
ABSTRACT
Clustering is a widely used approach for analyzing single-cell RNA sequencing (scRNA-seq) data, but rapidly expanding data volumes can make it computationally challenging. New methods that are both accurate and efficient are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves the clustering accuracy, robustness, and computational efficiency of various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory, compared with the widely used software package SC3. Compared with k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
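The subsampling-clustering-classification idea named in the abstract can be illustrated with a minimal Python sketch. This is one reading of the name, not the sscClust implementation: every function name, parameter, and the toy data below are hypothetical. Plain k-means and a nearest-centroid rule stand in for whatever clustering and classification algorithms the package actually uses, ranks ignore ties, and the random-projection variant mentioned in the abstract is not shown.

```python
import numpy as np

def rank_rows(x):
    # Ordinal ranks within each row (assumption: ties broken by position,
    # not averaged as in a textbook Spearman correlation).
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)

def spearman_features(x, ref):
    # Spearman correlation of every row of x with every row of ref,
    # computed as the Pearson correlation of the rank vectors.
    rx, rr = rank_rows(x), rank_rows(ref)
    rx -= rx.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    rx /= np.linalg.norm(rx, axis=1, keepdims=True)
    rr /= np.linalg.norm(rr, axis=1, keepdims=True)
    return rx @ rr.T

def nearest_center(x, centers):
    # Index of the closest center (squared Euclidean distance) per row.
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans(x, k, n_iter=50, seed=0):
    # Plain Lloyd's k-means with random initial centers (stand-in clusterer).
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def sscc_sketch(expr, k, n_sub, seed=0):
    # expr: cells x genes matrix. Returns one cluster label per cell.
    rng = np.random.default_rng(seed)
    sub = rng.choice(expr.shape[0], size=n_sub, replace=False)  # 1. subsample
    feats_sub = spearman_features(expr[sub], expr[sub])         # 2. features
    centers = kmeans(feats_sub, k, seed=seed)                   # 3. cluster
    feats_all = spearman_features(expr, expr[sub])              # 4. project all cells
    return nearest_center(feats_all, centers)                   # 5. classify

# Toy data: two synthetic "cell types" with opposite expression gradients.
rng = np.random.default_rng(1)
n_genes = 50
profile_a = np.linspace(0.0, 5.0, n_genes)
profile_b = profile_a[::-1]
truth = np.repeat([0, 1], 100)
expr = np.where(truth[:, None] == 0, profile_a, profile_b)
expr = expr + rng.normal(0.0, 0.5, size=(200, n_genes))
labels = sscc_sketch(expr, k=2, n_sub=40, seed=0)
```

In this sketch only the 40 subsampled cells go through the clustering step. The remaining cells are assigned by comparing their Spearman feature vectors with the cluster centers, which is where a speed-up over clustering all cells directly would come from.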
Full text: Available
Index: WPRIM (Western Pacific)
Main subject: Algorithms / Software / Cluster Analysis / Sequence Analysis, RNA / Statistics, Nonparametric / Computational Biology / Databases as Topic / Gene Expression Profiling / Single-Cell Analysis / Methods
Limits: Animals / Humans
Language: English
Journal: Genomics, Proteomics & Bioinformatics
Year: 2019
Type: Article