Pesquisa | Portal Regional da BVS

A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation.

Lakkis, Justin; Schroeder, Amelia; Su, Kenong; Lee, Michelle Y Y; Bashore, Alexander C; Reilly, Muredach P; Li, Mingyao.

Nat Mach Intell ; 4(11): 940-952, 2022 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-36873621

RESUMO

CITE-seq, a single-cell multi-omics technology that measures RNA and protein expression simultaneously in single cells, has been widely applied in biomedical research, especially in immune related disorders and other diseases such as influenza and COVID-19. Despite the proliferation of CITE-seq, it is still costly to generate such data. Although data integration can increase information content, this raises computational challenges. First, combining multiple datasets is prone to batch effects that need to be addressed. Secondly, it is difficult to combine multiple CITE-seq datasets because the protein panels in different datasets may only partially overlap. Integrating multiple CITE-seq and single-cell RNA-seq (scRNA-seq) datasets is important because this allows the utilization of as many data as possible to uncover cell population heterogeneity. To overcome these challenges, we present sciPENN, a multi-use deep learning approach that supports CITE-seq and scRNA-seq data integration, protein expression prediction for scRNA-seq, protein expression imputation for CITE-seq, quantification of prediction and imputation uncertainty, and cell type label transfer from CITE-seq to scRNA-seq. Comprehensive evaluations spanning multiple datasets demonstrate that sciPENN outperforms other current state-of-the-art methods.

A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics.

Lakkis, Justin; Wang, David; Zhang, Yuanchao; Hu, Gang; Wang, Kui; Pan, Huize; Ungar, Lyle; Reilly, Muredach P; Li, Xiangjie; Li, Mingyao.

Genome Res ; 31(10): 1753-1766, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-34035047

RESUMO

Recent developments of single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch effects in a low-dimensional embedding space. Although useful for clustering, batch effects are still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effects. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Methods such as Seurat 3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effects in gene expression, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here, we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting batch effects both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC outperforms Scanorama, DCA + Combat, scVI, and MNN. With CarDEC denoising, non-highly variable genes offer as much signal for clustering as the highly variable genes (HVGs), suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC's denoised and batch-corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effects. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.

Assuntos

Aprendizado Profundo , Transcriptoma , Algoritmos , Análise por Conglomerados , Perfilação da Expressão Gênica/métodos , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA