jSRC: a flexible and accurate joint learning algorithm for clustering of single-cell RNA-sequencing data.
Brief Bioinform; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: covidwho-1061210
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) profiles the transcriptome at the level of individual cells, shedding light on the heterogeneity and dynamics of cell populations. Advances in biotechnology make it possible to generate scRNA-seq profiles for very large numbers of cells, which requires effective and efficient clustering algorithms to identify cell types and informative genes. Although great effort has been devoted to clustering scRNA-seq data, the accuracy, scalability and interpretability of available algorithms remain unsatisfactory. In this study, we address these problems by developing a joint learning algorithm, joint sparse representation and clustering (jSRC), in which dimension reduction (DR) and clustering are integrated. Specifically, DR provides scalability and joint learning improves accuracy. To increase the interpretability of patterns, we assume that cells of the same type have similar expression patterns, and impose a sparse representation on features. We formulate clustering of scRNA-seq data as an optimization problem and derive update rules to optimize the jSRC objective. Fifteen scRNA-seq datasets from various tissues and organisms are used to validate the performance of jSRC, with the number of single cells ranging from 49 to 110 824. The experimental results demonstrate that jSRC significantly outperforms 12 state-of-the-art methods on various measures (an average improvement of 20.29%) with less running time. Furthermore, jSRC is efficient and robust across scRNA-seq datasets from various tissues. Finally, jSRC also accurately identifies dynamic cell types associated with the progression of COVID-19. The proposed model and methods provide an effective strategy for analyzing scRNA-seq data (the software is coded in MATLAB and is free for academic use; https://github.com/xkmaxidian/jSRC).
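The abstract names the ingredients of jSRC (dimension reduction, sparse representation, clustering) but gives neither the objective function nor the update rules. The Python sketch below is therefore only a rough illustration of those ingredients, and it is not the authors' algorithm: it runs the three steps one after another instead of optimizing them jointly, and it places the sparsity penalty on cell-to-cell coefficients, which is an assumption. The function names (`reduce_dims`, `sparse_self_representation`, `spectral_labels`, `cluster_cells`, `make_toy_data`), the nonnegative ISTA solver, the spectral step, the parameter values and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def reduce_dims(X, d):
    """Project cells (columns of X, genes x cells) onto the top-d left singular vectors."""
    Xc = X - X.mean(axis=1, keepdims=True)          # center each gene
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d].T @ Xc                          # shape (d, n_cells)

def sparse_self_representation(Y, lam=1.0, n_iter=200):
    """Proximal gradient (ISTA) for
    min_Z 0.5*||Y - Y Z||_F^2 + lam*sum(Z)  s.t.  Z >= 0, diag(Z) = 0,
    so each cell is a sparse nonnegative combination of other cells."""
    n = Y.shape[1]
    G = Y.T @ Y
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)     # 1 / Lipschitz constant
    Z = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ Z - G                            # gradient of the smooth term
        Z = np.maximum(Z - step * (grad + lam), 0)  # nonnegative soft threshold
        np.fill_diagonal(Z, 0.0)                    # no cell represents itself
    return Z

def spectral_labels(Z, k, n_iter=50):
    """Spectral clustering on the affinity Z + Z^T with a small k-means."""
    W = Z + Z.T
    deg = W.sum(axis=1) + 1e-12
    L = np.eye(len(W)) - W / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)                     # eigenvalues in ascending order
    E = vecs[:, :k]
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-point initialization of the k centers.
    centers = [E[0]]
    for _ in range(1, k):
        d2 = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

def cluster_cells(X, k, d=5, lam=1.0):
    """Sequential pipeline: reduce dimensions, fit sparse codes, cluster."""
    Y = reduce_dims(X, d)
    Z = sparse_self_representation(Y, lam)
    return spectral_labels(Z, k), Z

def make_toy_data(seed=0):
    """Toy matrix: 30 genes x 30 cells, 3 cell types with 10 marker genes each."""
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((30, 30))
    for g in range(3):
        X[10 * g:10 * (g + 1), 10 * g:10 * (g + 1)] += 5.0
    return X

labels, Z = cluster_cells(make_toy_data(), k=3)
```

On the toy matrix, `labels` assigns one cluster index per cell and `Z` holds the nonnegative coefficients that link each cell to the cells used to represent it. A joint method such as jSRC would instead update the projection, the representation and the cluster assignment within one objective; for that formulation, consult the paper and the MATLAB code in the repository cited in the abstract.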
Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Algorithms / Sequence Analysis, RNA / Single-Cell Analysis / Machine Learning
Type of study: Prognostic study
Language: English
Journal subject: Biology / Medical Informatics
Year: 2021
Document Type: Article