Characterizing the impacts of dataset imbalance on single-cell data integration.

Maan, Hassaan; Zhang, Lin; Yu, Chengxin; Geuenich, Michael J; Campbell, Kieran R; Wang, Bo.

Nat Biotechnol ; 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38429430

ABSTRACT

Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
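
The abstract describes Iniquitate as perturbing the degree of imbalance between datasets and then re-running integration. One simple way such a perturbation could be expressed is to downsample a single cell type within a single batch to a chosen fraction of its cells. The sketch below illustrates that idea under this assumption only; it is not the pipeline's actual code, and the function `downsample_cell_type` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def downsample_cell_type(
    cell_types: Sequence[str],
    batches: Sequence[str],
    target_type: str,
    target_batch: str,
    keep_fraction: float,
    seed: int = 0,
) -> List[int]:
    """Hypothetical helper: return indices of cells kept after downsampling.

    Cells of `target_type` in `target_batch` are randomly subsampled to
    roughly `keep_fraction` of their original count; every other cell is kept.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    if len(cell_types) != len(batches):
        raise ValueError("cell_types and batches must have the same length")

    # Split cell indices into the perturbed group and everything else.
    target, others = [], []
    for i, (t, b) in enumerate(zip(cell_types, batches)):
        (target if t == target_type and b == target_batch else others).append(i)

    # Seeded RNG so each perturbation level is reproducible across runs.
    n_keep = round(len(target) * keep_fraction)
    rng = random.Random(seed)
    kept_target = rng.sample(target, n_keep)
    return sorted(others + kept_target)


# Toy example: halve the T cells in batch "b1", leave batch "b2" untouched.
types = ["T", "T", "T", "T", "B", "B", "T", "B"]
batches = ["b1", "b1", "b1", "b1", "b1", "b2", "b2", "b2"]
kept = downsample_cell_type(types, batches, "T", "b1", 0.5, seed=1)
```

Sweeping `keep_fraction` (and repeating with different seeds) would give a series of increasingly imbalanced inputs whose integration results can then be compared.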

The impacts of active and self-supervised learning on efficient annotation of single-cell expression data.

Geuenich, Michael J; Gong, Dae-Won; Campbell, Kieran R.

Nat Commun ; 15(1): 1014, 2024 Feb 03.

Article in English | MEDLINE | ID: mdl-38307875

ABSTRACT

A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data, including a marker-aware version, that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader .

Subject(s)

Algorithms , Machine Learning , Technology , Awareness , Supervised Machine Learning , Single-Cell Analysis
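
Entropy-based uncertainty sampling is one common active learning strategy of the general kind benchmarked here: the cells whose predicted labels the classifier is least certain about are sent to the annotator next. The sketch below is a generic illustration under that assumption. It is not the paper's adaptive reweighting heuristic and not the API of the `leader` package; the names `entropy` and `select_most_uncertain` are hypothetical.

```python
import math
from typing import AbstractSet, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(
    predicted_probs: Sequence[Sequence[float]],
    labeled: AbstractSet[int],
    batch_size: int,
) -> List[int]:
    """Return the `batch_size` unlabeled cells with the highest predictive entropy."""
    candidates = [i for i in range(len(predicted_probs)) if i not in labeled]
    # Highest entropy first: these are the cells the classifier is least sure about.
    candidates.sort(key=lambda i: entropy(predicted_probs[i]), reverse=True)
    return candidates[:batch_size]


# Toy classifier outputs over three cell types for four cells.
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]
chosen = select_most_uncertain(probs, labeled={0}, batch_size=2)
```

In a full loop, the chosen cells would be labeled, the classifier retrained, and selection repeated until the label budget is spent. Pure uncertainty sampling can over-select cells from ambiguous boundaries between similar types, which is one reason imbalance and type similarity matter in such benchmarks.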

Automated assignment of cell identity from single-cell multiplexed imaging and proteomic data.

Geuenich, Michael J; Hou, Jinyu; Lee, Sunyun; Ayub, Shanza; Jackson, Hartland W; Campbell, Kieran R.

Cell Syst ; 12(12): 1173-1186.e5, 2021 12 15.

Article in English | MEDLINE | ID: mdl-34536381

ABSTRACT

A major challenge in the analysis of highly multiplexed imaging data is the assignment of cells to a priori known cell types. Existing approaches typically solve this by clustering cells followed by manual annotation. However, these often require several subjective choices and cannot explicitly assign cells to an uncharacterized type. To help address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast inference, allowing for annotations at the million-cell scale in the absence of a previously annotated reference. We apply Astir to over 2.4 million cells from suspension and imaging datasets and demonstrate its scalability, robustness to sample composition, and interpretable uncertainty estimates. We envision deployment of Astir either for a first broad cell type assignment or to accurately annotate cells that may serve as biomarkers in multiple disease contexts. A record of this paper's transparent peer review process is included in the supplemental information.

Subject(s)

Neural Networks, Computer , Proteomics , Cluster Analysis
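
To illustrate the general idea of assigning cells from prior marker knowledge, including an explicit type for cells that match no known profile, the sketch below scores one cell against binary marker vectors under a toy Gaussian model and returns posterior probabilities. This is a deliberately simplified assumption, not Astir's model: Astir learns its parameters and uses recognition networks for inference, whereas here `baseline`, `delta` and `sigma` are fixed by hand. The function `assign_probabilities` and the marker panel are hypothetical.

```python
import math
from typing import Dict, Sequence


def assign_probabilities(
    log_expr: Sequence[float],
    markers: Dict[str, Sequence[int]],
    baseline: float = 0.0,
    delta: float = 2.0,
    sigma: float = 1.0,
) -> Dict[str, float]:
    """Toy posterior P(type | cell) from binary marker vectors.

    Expected log-expression of protein j for a type is
    baseline + delta * marker[j]. An 'Other' type with no markers is
    added so cells resembling no known type can still be assigned.
    Uniform prior over types; isotropic Gaussian noise with std `sigma`.
    """
    if "Other" in markers:
        raise ValueError("'Other' is reserved for the unassigned type")
    types = dict(markers)
    types["Other"] = [0] * len(log_expr)

    log_lik = {}
    for name, rho in types.items():
        if len(rho) != len(log_expr):
            raise ValueError(f"marker vector for {name!r} has the wrong length")
        # Gaussian normalising constants are identical across types (shared
        # sigma), so they cancel in the posterior and are omitted here.
        sq = sum((y - (baseline + delta * r)) ** 2 for y, r in zip(log_expr, rho))
        log_lik[name] = -sq / (2 * sigma ** 2)

    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_lik.values())
    z = sum(math.exp(v - m) for v in log_lik.values())
    return {name: math.exp(v - m) / z for name, v in log_lik.items()}


# Hypothetical panel: proteins ordered as [CD3, CD19, CD68].
panel = {"T cell": [1, 0, 0], "B cell": [0, 1, 0], "Macrophage": [0, 0, 1]}
probs = assign_probabilities([2.1, 0.1, -0.2], panel)  # high CD3 only
unknown = assign_probabilities([0.1, -0.1, 0.0], panel)  # no marker expressed
```

Returning a full probability vector rather than a hard label is what makes uncertainty interpretable: a cell split between two types, or dominated by "Other", can be flagged for review instead of being forced into a known class.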

MAX Functions as a Tumor Suppressor and Rewires Metabolism in Small Cell Lung Cancer.

Augert, Arnaud; Mathsyaraja, Haritha; Ibrahim, Ali H; Freie, Brian; Geuenich, Michael J; Cheng, Pei-Feng; Alibeckoff, Sydney P; Wu, Nan; Hiatt, Joseph B; Basom, Ryan; Gazdar, Adi; Sullivan, Lucas B; Eisenman, Robert N; MacPherson, David.

Cancer Cell ; 38(1): 97-114.e7, 2020 07 13.

Article in English | MEDLINE | ID: mdl-32470392

ABSTRACT

Small cell lung cancer (SCLC) is a highly aggressive and lethal neoplasm. To identify candidate tumor suppressors we applied CRISPR/Cas9 gene inactivation screens to a cellular model of early-stage SCLC. Among the top hits was MAX, the obligate heterodimerization partner for MYC family proteins that is mutated in human SCLC. Max deletion increases growth and transformation in cells and dramatically accelerates SCLC progression in an Rb1/Trp53-deleted mouse model. In contrast, deletion of Max abrogates tumorigenesis in MYCL-overexpressing SCLC. Max deletion in SCLC resulted in derepression of metabolic genes involved in serine and one-carbon metabolism. By increasing serine biosynthesis, Max-deleted cells exhibit resistance to serine depletion. Thus, Max loss results in metabolic rewiring and context-specific tumor suppression.

Subject(s)

Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics , Disease Models, Animal , Lung Neoplasms/genetics , Small Cell Lung Carcinoma/genetics , Tumor Suppressor Proteins/genetics , Animals , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/metabolism , Cells, Cultured , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , HEK293 Cells , Hep G2 Cells , Humans , K562 Cells , Kaplan-Meier Estimate , Lung Neoplasms/metabolism , Mice, Knockout , Mice, Transgenic , Small Cell Lung Carcinoma/metabolism , Tumor Suppressor Proteins/metabolism
