Search | VHL Regional Portal

scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering.

Wang, Yunhe; Yu, Zhuohan; Li, Shaochuan; Bian, Chuang; Liang, Yanchun; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics; 39(2), 2023 Feb 14.

Article in English | MEDLINE | ID: mdl-36734596

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data; it enables accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on deep autoencoders and ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders do not learn the latent representations of scRNA-seq data sufficiently well, and obtaining consensus partitions from these feature representations remains under-explored. RESULTS: To address this challenge, we propose scBGEDA, a single-cell deep clustering model that combines a dual denoising autoencoder with bipartite graph ensemble clustering to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network projects the data into a compressed low-dimensional space and learns feature representations by jointly optimizing a zero-inflated negative binomial (ZINB) reconstruction loss and a denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedding space through a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets and demonstrated its scalability to large-scale scRNA-seq datasets.
Moreover, scBGEDA was able to identify cell-type-specific marker genes and support functional genomic analysis by quantifying the influence of genes on cell clusters, offering new insights into identifying cell types and characterizing scRNA-seq data from different perspectives. AVAILABILITY AND IMPLEMENTATION: The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
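To make the scBGEDA training objective more concrete, the sketch below evaluates a zero-inflated negative binomial (ZINB) negative log-likelihood for observed counts, given a per-entry mean `mu`, inverse dispersion `theta` and dropout probability `pi` (in an autoencoder these would come from the decoder), and adds a mean-squared denoising reconstruction term. It is a minimal sketch under stated assumptions: the function names, the weight `lam` and the choice of MSE for the denoising term are hypothetical and are not taken from the scBGEDA code.

```python
import math

import numpy as np


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and inverse dispersion theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1.0)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(counts, mu, theta, pi, eps=1e-10):
    """Mean ZINB negative log-likelihood over flattened entries (hypothetical helper)."""
    total = 0.0
    for x, m, t, p in zip(counts, mu, theta, pi):
        m = max(m, eps)  # guard against log(0) when the predicted mean is zero
        if x == 0:
            # A zero can come from dropout (probability p) or from the NB component itself.
            nb_zero = (t / (t + m)) ** t
            total -= math.log(p + (1.0 - p) * nb_zero + eps)
        else:
            # A positive count can only come from the NB component.
            total -= math.log(1.0 - p + eps) + nb_log_pmf(x, m, t)
    return total / len(counts)


def dual_loss(x_clean, x_recon, counts, mu, theta, pi, lam=1.0):
    """Hypothetical combined objective: ZINB term plus lam times an MSE denoising term."""
    diff = np.asarray(x_clean, dtype=float) - np.asarray(x_recon, dtype=float)
    mse = float(np.mean(diff ** 2))
    return zinb_nll(counts, mu, theta, pi) + lam * mse
```

With `pi = 0` the ZINB term reduces to an ordinary negative binomial likelihood; a larger `pi` makes observed zeros cheaper, which is how the model absorbs dropout events.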

Subject(s)

Algorithms, Gene Expression Profiling, Sequence Analysis, RNA/methods, Gene Expression Profiling/methods, Software, Single-Cell Analysis/methods, Cluster Analysis
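The bipartite graph ensemble step in scBGEDA can be illustrated with one standard construction, the hybrid bipartite graph formulation: cells and the clusters of every base partition form the two sides of a graph, each cell is linked to the clusters it belongs to, and a spectral cut of that graph yields the consensus. The sketch below assumes this formulation with a degree-normalized SVD embedding and a small deterministic k-means; all names are hypothetical, and scBGEDA's actual graph-based consensus function may differ.

```python
import numpy as np


def bipartite_matrix(partitions):
    """Cells x clusters incidence matrix: entry is 1 if the cell belongs to that base cluster."""
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def bipartite_consensus(partitions, k):
    """Spectral cut of the cell-cluster bipartite graph (hypothetical helper)."""
    B = bipartite_matrix(partitions)
    d_cells = B.sum(axis=1)     # each cell has one cluster per base partition
    d_clusters = B.sum(axis=0)  # base cluster sizes
    B_norm = B / np.sqrt(d_cells)[:, None] / np.sqrt(d_clusters)[None, :]
    U, _, _ = np.linalg.svd(B_norm, full_matrices=False)
    # Leading left singular vectors, rescaled by the cell degrees, embed the cells.
    embedding = U[:, :k] / np.sqrt(d_cells)[:, None]
    return kmeans(embedding, k)
```

Because clusters are graph nodes rather than being collapsed into pairwise cell similarities, a cell that disagrees with one base partition (such as cell 3 below) is still placed with the group it shares most clusters with.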

scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections.

Bian, Chuang; Wang, Xubin; Su, Yanchi; Wang, Yunhe; Wong, Ka-Chun; Li, Xiangtao.

Comput Struct Biotechnol J; 20: 2181-2197, 2022.

Article in English | MEDLINE | ID: mdl-35615016

ABSTRACT

With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool for revealing the wide heterogeneity between cells. Clustering is a fundamental task in this analysis for disclosing the transcriptomic profiles of single cells, and it is one of the key computational problems that has received widespread attention. Recently, many clustering algorithms have been developed for scRNA-seq data. Nevertheless, these computational models often suffer from practical restrictions such as numerical instability, high dimensionality and limited computational scalability. Moreover, growing cell numbers and high dropout rates pose a huge computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real single-cell RNA-seq datasets. Based on this evaluation, we then propose an accurate single-cell data analysis method via Ensemble Feature Selection based Clustering, called scEFSC. The algorithm employs several unsupervised feature selection methods to remove genes that do not contribute significantly to the scRNA-seq data. Different single-cell RNA-seq clustering algorithms are then applied to the data filtered by each of these unsupervised feature selections, and the clustering results are combined using weighted meta-clustering. We applied scEFSC to the fourteen real single-cell RNA-seq datasets, and the experimental results demonstrated that scEFSC outperformed the other scRNA-seq clustering algorithms on several evaluation metrics. In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC.
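The scEFSC pipeline filters genes with several unsupervised selectors and then merges the resulting base clusterings with a weighted consensus. The sketch below assumes two simple selectors (highest variance and highest variance-to-mean ratio) and a weighted co-association matrix cut at a majority threshold; the selectors, the weights and the thresholding rule are hypothetical stand-ins, not the specific feature selection or meta-clustering methods that scEFSC uses.

```python
import numpy as np


def select_by_variance(X, n_genes):
    """Hypothetical selector 1: indices of the n_genes most variable genes (X is cells x genes)."""
    return np.argsort(X.var(axis=0))[::-1][:n_genes]


def select_by_dispersion(X, n_genes, eps=1e-8):
    """Hypothetical selector 2: indices of the n_genes genes with the highest variance-to-mean ratio."""
    dispersion = X.var(axis=0) / (X.mean(axis=0) + eps)
    return np.argsort(dispersion)[::-1][:n_genes]


def weighted_coassociation(partitions, weights):
    """Weighted fraction of base clusterings that put each pair of cells in the same cluster."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(partitions[0])
    C = np.zeros((n, n))
    for wi, labels in zip(w, partitions):
        labels = np.asarray(labels)
        C += wi * (labels[:, None] == labels[None, :])
    return C


def consensus_labels(C, threshold=0.5):
    """Connected components of the graph linking cells whose co-association exceeds threshold."""
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.nonzero(C[i] > threshold)[0]:
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

In a full pipeline each selector's gene subset would feed one or more clustering algorithms, and each resulting partition would receive a weight (for example from an internal quality score, again an assumption here). Raising one partition's weight can change the consensus, as the second test case shows.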

Elucidating transcriptomic profiles from single-cell RNA sequencing data using nature-inspired compressed sensing.

Yu, Zhuohan; Bian, Chuang; Liu, Genggeng; Zhang, Shixiong; Wong, Ka-Chun; Li, Xiangtao.

Brief Bioinform; 22(5), 2021 Sep 2.

Article in English | MEDLINE | ID: mdl-33855366

ABSTRACT

Gene-expression profiling can define the cell state and gene-expression pattern of cells at the genetic level in a high-throughput manner. With the development of transcriptome techniques, processing high-dimensional genetic data has become a major challenge in expression profiling. Building on the recent widespread use of matrix decomposition methods in bioinformatics, a computational framework based on compressed sensing has been adopted to reduce dimensionality. However, compressed sensing requires an optimization strategy to learn the modular dictionaries and activity levels from low-dimensional random composite measurements in order to reconstruct the high-dimensional gene-expression data. Here, we introduce and compare four compressed sensing frameworks based on nature-inspired optimization algorithms (CSCS, ABCCS, BACS and FACS) to improve the quality of the decompression process. Several experiments establish that the three proposed methods outperform benchmark methods on nine different datasets, especially the FACS method. We therefore illustrate the robustness and convergence of FACS in various respects; notably, time complexity and parameter analyses highlight the properties of FACS. Furthermore, differential gene-expression analysis, cell-type clustering, gene ontology enrichment and pathology analysis are conducted, which bring novel insights into cell-type identification and characterization mechanisms from different perspectives. All algorithms are written in Python and available at https://github.com/Philyzh8/Nature-inspired-CS.
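The compressed sensing setting can be sketched as follows: a genes x cells matrix X is observed only through m random composite measurements Y = A X, and, given a module dictionary U, decompression estimates activity levels W so that X is approximately U W. In this minimal sketch U is assumed to be already known and W is found by ordinary least squares, a swapped-in stand-in for the nature-inspired optimizers described above; the function names and the Gaussian measurement matrix are assumptions for illustration.

```python
import numpy as np


def composite_measurements(X, n_measurements, seed=0):
    """Simulate m random composite measurements Y = A X of a genes x cells matrix X."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_measurements, X.shape[0]))  # assumed Gaussian mixing weights
    return A, A @ X


def decompress(Y, A, U):
    """Recover X ~ U W from Y by least squares on the activity levels W (U assumed known)."""
    # Solve min_W ||Y - (A U) W||_F^2; each column of Y is one cell.
    W, *_ = np.linalg.lstsq(A @ U, Y, rcond=None)
    return U @ W, W
```

Exact recovery in this noise-free toy setting relies on `A @ U` having full column rank. With real data the dictionary itself must be learned and the activity levels are constrained, which is why the framework needs an optimization strategy to learn both.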

Subject(s)

Algorithms, Computational Biology/methods, Gene Expression Profiling/methods, RNA-Seq/methods, Single-Cell Analysis/methods, Transcriptome, Animals, Cluster Analysis, Gene Regulatory Networks/genetics, Humans, Molecular Sequence Annotation/methods, Reproducibility of Results, Signal Transduction/genetics, Time Factors
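FACS pairs compressed sensing with a firefly algorithm. A generic firefly minimizer, in which each candidate moves toward every brighter (lower-loss) candidate with attractiveness `beta0 * exp(-gamma * r^2)` plus a shrinking random step, can be sketched as below. The parameter values, the 0.97 decay factor and the clipping to `bounds` are illustrative assumptions rather than the settings used in the paper, and `firefly_minimize` is a hypothetical name.

```python
import numpy as np


def firefly_minimize(loss, dim, n_fireflies=20, n_iter=100, alpha=0.2,
                     beta0=1.0, gamma=1.0, bounds=(-5.0, 5.0), seed=0):
    """Generic firefly algorithm; returns (best point, best loss, best-so-far history)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(n_fireflies, dim))
    fit = np.array([loss(p) for p in pop])
    best = pop[np.argmin(fit)].copy()
    history = [float(fit.min())]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] < fit[i]:  # j is brighter (lower loss), so i moves toward j
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    step = alpha * (rng.random(dim) - 0.5) * (hi - lo)
                    pop[i] = np.clip(pop[i] + beta * (pop[j] - pop[i]) + step, lo, hi)
                    fit[i] = loss(pop[i])  # keep fitness in sync with the moved position
        alpha *= 0.97  # gradually reduce the random exploration
        k = int(np.argmin(fit))
        if fit[k] < history[-1]:
            best = pop[k].copy()
        history.append(min(history[-1], float(fit[k])))
    return best, history[-1], history
```

For decompression, `loss` could be the squared residual `||y - A U w||^2` for one cell's activity vector `w`; tracking the best-so-far point means the returned loss never increases across iterations.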
