Results 1 - 15 of 15
1.
Cell ; 187(10): 2343-2358, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729109

ABSTRACT

As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
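Once query cells have been mapped into a reference embedding, annotations are often transferred with a k-nearest-neighbour vote. The sketch below assumes the query cells are already embedded in the reference latent space. The function name, the choice of k, and the "Unknown" threshold are illustrative assumptions and are not taken from any specific mapping tool.

```python
import math
from collections import Counter


def knn_label_transfer(ref_emb, ref_labels, query_emb, k=3, max_uncertainty=0.5):
    """Assign each query cell the majority label of its k nearest reference cells.

    Uncertainty is 1 minus the fraction of neighbours that agree with the
    majority label; cells above `max_uncertainty` are reported as "Unknown".
    """
    results = []
    for q in query_emb:
        # Euclidean distance from this query cell to every reference cell
        dists = sorted(
            (math.dist(q, r), label) for r, label in zip(ref_emb, ref_labels)
        )
        neighbour_labels = [label for _, label in dists[:k]]
        label, votes = Counter(neighbour_labels).most_common(1)[0]
        uncertainty = 1 - votes / k
        if uncertainty > max_uncertainty:
            label = "Unknown"  # flag cells the reference cannot label confidently
        results.append((label, uncertainty))
    return results
```

Flagging high-uncertainty cells instead of forcing a label is one simple way to surface query states that are absent from the reference.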


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Data Analysis , Animals , Cluster Analysis
2.
Brief Bioinform ; 25(2), 2024 01 22.
Article in English | MEDLINE | ID: mdl-38483255

ABSTRACT

Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
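The overdispersion and zero inflation mentioned above are commonly handled with a zero-inflated negative binomial (ZINB) likelihood for counts. The following is a minimal sketch of that standard likelihood, written for illustration; it is not a method proposed in this review, and the parameter names (mean `mu`, inverse dispersion `theta`, dropout probability `pi`) are assumptions of the sketch.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is an extra (structural) zero."""
    if x == 0:
        # A zero can come from the inflation component or from the NB itself
        return math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1 - pi) + nb_log_pmf(x, mu, theta)
```

Smaller `theta` gives a larger variance relative to the mean (overdispersion), while `pi` adds probability mass at zero beyond what the NB alone predicts.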


Subject(s)
Deep Learning , Algorithms , Databases, Factual , Gene Expression Profiling , Machine Learning
3.
Nat Methods ; 21(1): 50-59, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37735568

ABSTRACT

RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
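RNA velocity methods build on the splicing kinetics model du/dt = α − βu, ds/dt = βu − γs, where u and s are unspliced and spliced abundances. The sketch below evaluates that velocity and summarises its spread over sampled (β, γ) pairs as a crude per-cell uncertainty proxy. The sampled parameters are hypothetical inputs; veloVI's variational inference and time-dependent extensions are not reproduced here.

```python
import statistics


def spliced_velocity(u, s, beta, gamma):
    """RNA velocity of one gene under du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s.

    Only ds/dt is needed for velocity, so alpha does not appear here.
    """
    return beta * u - gamma * s


def velocity_mean_and_sd(u, s, param_samples):
    """Mean and standard deviation of velocity over sampled (beta, gamma) pairs,
    e.g. hypothetical draws from an approximate posterior (needs >= 2 samples)."""
    v = [spliced_velocity(u, s, b, g) for b, g in param_samples]
    return statistics.mean(v), statistics.stdev(v)
```

A large standard deviation relative to the mean would indicate that the sign or size of the velocity for that cell and gene is poorly determined.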


Subject(s)
RNA , Transcriptome , RNA/genetics , Learning
4.
Nat Methods ; 20(11): 1683-1692, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37813989

ABSTRACT

The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli to population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings, revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration, facilitating not only atlas use but also atlas interpretation by means of multi-scale analyses.
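The abstract does not spell out how labels are transferred. One common open-world strategy is nearest-prototype classification, where each cell type is summarised by a prototype in the latent space and query cells take the label of the closest prototype. The sketch below illustrates that generic idea with mean embeddings as prototypes; it is an assumption for illustration and is not presented as scPoli's implementation.

```python
import math
from collections import defaultdict


def build_prototypes(embeddings, labels):
    """Prototype of each cell type = mean embedding of its labelled cells."""
    sums, counts = {}, defaultdict(int)
    for z, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(z)
        sums[label] = [a + b for a, b in zip(sums[label], z)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def nearest_prototype(z, prototypes):
    """Label a query cell with the cell type whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(z, prototypes[label]))
```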


Subject(s)
Genomics , Leukocytes, Mononuclear , Humans , Chromatin/genetics , Single-Cell Analysis
5.
NAR Genom Bioinform ; 5(3): lqad070, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37502708

ABSTRACT

Single-cell genomics is now producing an ever-increasing number of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such large-scale atlases increase the scale and generalizability of analyses and enable combining knowledge generated by individual studies. Specifically, individual studies often differ regarding cell annotation terminology and depth, with different groups specializing in different cell type compartments, often using distinct terminology. Understanding how these distinct sets of annotations are related and complement each other would mark a major step towards a consensus-based cell-type annotation reflecting the latest knowledge in the field. Whereas recent computational techniques, referred to as 'reference mapping' methods, facilitate the usage and expansion of existing reference atlases by mapping new datasets (i.e. queries) onto an atlas, a systematic approach towards harmonizing dataset-specific cell-type terminology and annotation depth is still lacking. Here, we present 'treeArches', a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets. We demonstrate various use cases for treeArches, from automatically resolving relations between reference and query cell types to identifying unseen cell types absent in the reference, such as disease-associated cell states. We envision treeArches enabling data-driven construction of consensus atlas-level cell-type hierarchies and facilitating efficient usage of reference atlases.

6.
Nat Med ; 29(6): 1563-1577, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37291214

ABSTRACT

Single-cell technologies have transformed our understanding of human tissues. Yet, studies typically capture only a limited number of donors and disagree on cell type definitions. Integrating many single-cell datasets can address these limitations of individual studies and capture the variability present in the population. Here we present the integrated Human Lung Cell Atlas (HLCA), combining 49 datasets of the human respiratory system into a single atlas spanning over 2.4 million cells from 486 individuals. The HLCA presents a consensus cell type re-annotation with matching marker genes, including annotations of rare and previously undescribed cell types. Leveraging the number and diversity of individuals in the HLCA, we identify gene modules that are associated with demographic covariates such as age, sex and body mass index, as well as gene modules changing expression along the proximal-to-distal axis of the bronchial tree. Mapping new data to the HLCA enables rapid data annotation and interpretation. Using the HLCA as a reference for the study of disease, we identify shared cell states across multiple lung diseases, including SPP1+ profibrotic monocyte-derived macrophages in COVID-19, pulmonary fibrosis and lung carcinoma. Overall, the HLCA serves as an example for the development and use of large-scale, cross-dataset organ atlases within the Human Cell Atlas.


Subject(s)
COVID-19 , Lung Neoplasms , Pulmonary Fibrosis , Humans , Lung , Lung Neoplasms/genetics , Macrophages
7.
Mol Syst Biol ; 19(6): e11517, 2023 06 12.
Article in English | MEDLINE | ID: mdl-37154091

ABSTRACT

Recent advances in multiplexed single-cell transcriptomics experiments facilitate the high-throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally infeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep-learning approaches for single-cell response modeling. CPA learns to predict, in silico, transcriptional perturbation responses at the single-cell level for unseen dosages, cell types, time points, and species. Using newly generated single-cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single-cell Perturb-seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level and thus accelerate therapeutic applications using single-cell technologies.
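The "interpretability of linear models" refers to composing a cell's latent state additively from a basal state, perturbation embeddings scaled by dose, and covariate embeddings. The sketch below shows only that additive composition. The Hill-type dose curve with fixed `ec50` and `hill`, and all argument names, are hypothetical stand-ins; CPA learns its dose scaling and embeddings, and its encoder and decoder are omitted here.

```python
def dose_response(dose, ec50=1.0, hill=1.0):
    """Hypothetical sigmoid (Hill) dose scaling: 0 at dose 0 and 0.5 at ec50."""
    return dose**hill / (ec50**hill + dose**hill)


def compose_latent(z_basal, perturbations, embeddings, covariate_shift):
    """Additive latent: basal state + dose-scaled perturbation embeddings
    + covariate (e.g. cell-type) embedding. `perturbations` maps name -> dose."""
    z = list(z_basal)
    for name, dose in perturbations.items():
        scale = dose_response(dose)
        z = [zi + scale * ei for zi, ei in zip(z, embeddings[name])]
    return [zi + ci for zi, ci in zip(z, covariate_shift)]
```

Because the terms add, an unseen drug combination can be represented by summing embeddings that were each learned from other conditions, which is what makes combinatorial prediction possible in this kind of model.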


Subject(s)
Computational Biology , Gene Expression Profiling , High-Throughput Screening Assays , Single-Cell Gene Expression Analysis
8.
Heliyon ; 9(5): e15694, 2023 May.
Article in English | MEDLINE | ID: mdl-37144199

ABSTRACT

Prostate cancer (PCa) is one of the two solid malignancies in which a higher T cell infiltration in the tumor microenvironment (TME) corresponds with a worse prognosis for the tumor. The inability of T cells to eliminate tumor cells despite an increase in their number reinforces the possibility of impaired antigen presentation. In this study, we investigated the TME at single-cell resolution to understand the molecular function and communication of dendritic cells (DCs) (as professional antigen-presenting cells). According to our data, tumor cells stimulate the migration of immature DCs to the tumor site by inducing inflammatory chemokines. Many signaling pathways, such as TNF-α/NF-κB, IL2/STAT5, and E2F, were up-regulated after DCs entered the tumor location. In addition, some molecules such as GPR34 and SLCO2B1 decreased on the surface of DCs. The analysis of molecular and signaling alterations in DCs revealed some suppression mechanisms of tumors, such as removing mature DCs, reducing DC survival, inducing anergy or exhaustion in the effector T cells, and enhancing the differentiation of T cells to Th2 and Tregs. In addition, we investigated the cellular and molecular communication between DCs and macrophages in the tumor site and found three molecular pairs including CCR5/CCL5, CD52/SIGLEC10, and HLA-DPB1/TNFSF13B. These molecular pairs are involved in the migration of immature DCs to the TME and disrupt the antigen-presenting function of DCs. Furthermore, we presented new therapeutic targets by constructing a gene co-expression network. These data increase our knowledge of the heterogeneity and the role of DCs in PCa TME.
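The abstract does not state how the gene co-expression network was built. A simple generic construction connects two genes when the absolute Pearson correlation of their expression across cells exceeds a threshold. The sketch below shows that construction under stated assumptions (threshold value, toy gene names, non-constant expression vectors); it is not the study's pipeline.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def coexpression_edges(expr, threshold=0.8):
    """expr maps gene -> expression values across the same cells.

    Returns (gene1, gene2, r) for every pair with |r| >= threshold.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```

Highly connected genes in such a graph (hubs) are a typical starting point when proposing candidate targets, though real analyses usually add significance testing and correction for multiple comparisons.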

9.
Nat Cell Biol ; 25(2): 337-350, 2023 02.
Article in English | MEDLINE | ID: mdl-36732632

ABSTRACT

The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for each gene program is learned while the programs themselves are refined and de novo programs are learned. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
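Tying latent dimensions to named gene programs can be achieved with a linear decoder whose weights are masked by prior program membership, so that each program can only reconstruct its member genes. The sketch below illustrates that general idea for one cell; the argument layout (gene-by-program matrices as nested lists) is an assumption, and the encoder, training, and de novo programs are not shown.

```python
def masked_linear_decode(z, weights, mask):
    """Reconstruct gene expression from gene-program activities.

    z:       activity of each gene program for one cell (length P)
    weights: G x P decoder weights
    mask:    G x P binary matrix; mask[g][p] = 1 if gene g belongs to program p
    Masked-out weights contribute nothing, so each latent dimension only
    affects the genes of its own program.
    """
    return [
        sum(w * m * zp for w, m, zp in zip(w_row, m_row, z))
        for w_row, m_row in zip(weights, mask)
    ]
```

Changing a masked-out weight leaves the reconstruction unchanged, which is why a program's activity can be read directly as the contribution of that annotated gene set.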


Subject(s)
COVID-19 , Deep Learning , Humans , COVID-19/genetics , Single-Cell Analysis
10.
Nat Methods ; 19(2): 171-178, 2022 02.
Article in English | MEDLINE | ID: mdl-35102346

ABSTRACT

Spatial omics data are advancing the study of tissue organization and cellular communication at an unprecedented scale. Flexible tools are required to store, integrate and visualize the large diversity of spatial omics data. Here, we present Squidpy, a Python framework that brings together tools from omics and image analysis to enable scalable description of spatial molecular data, such as transcriptome or multivariate proteins. Squidpy provides efficient infrastructure and numerous analysis methods that allow users to efficiently store, manipulate and interactively visualize spatial omics data. Squidpy is extensible and can be interfaced with a variety of already existing libraries for the scalable analysis of spatial omics data.
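A basic building block of spatial analysis is a neighbour graph over cell coordinates, from which one can count how often cluster labels sit next to each other. The dependency-free sketch below builds a k-nearest-neighbour spatial graph and tallies neighbouring label pairs. It is an illustration of the kind of computation such frameworks perform, not Squidpy's code or API, and `k` is an assumed parameter.

```python
import math
from collections import Counter


def spatial_knn_graph(coords, k=2):
    """Connect each cell to its k nearest cells in tissue coordinates."""
    edges = set()
    for i, ci in enumerate(coords):
        nearest = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


def neighbour_pair_counts(edges, clusters):
    """Count unordered cluster-label pairs across spatial edges."""
    return Counter(tuple(sorted((clusters[i], clusters[j]))) for i, j in edges)
```

Raw counts depend on cluster sizes, so enrichment analyses typically compare them against counts obtained after randomly permuting the labels.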


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Proteomics/methods , Software , Animals , Data Visualization , Databases, Factual , Humans , Image Processing, Computer-Assisted , Mice , Programming Languages , Workflow
12.
Nat Biotechnol ; 40(1): 121-130, 2022 01.
Article in English | MEDLINE | ID: mdl-34462589

ABSTRACT

Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
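The "architectural surgery" idea can be illustrated on a single conditional linear layer whose input is the gene vector concatenated with a one-hot study label: adding a new study means appending one weight column, and fine-tuning updates only that new column while the reference weights stay frozen. The toy sketch below shows this for one layer with a gradient supplied by the caller. All names are hypothetical, and nothing here is scArches' code.

```python
def layer_forward(weights, x, batch_onehot):
    """Linear conditional layer: weights is H x (G + B); input is genes + one-hot batch."""
    inp = list(x) + list(batch_onehot)
    return [sum(w * v for w, v in zip(row, inp)) for row in weights]


def add_batch_column(weights, init=0.0):
    """'Surgery': extend each row with a weight for one new query batch.
    Existing (reference) weights are copied unchanged."""
    return [list(row) + [init] for row in weights]


def update_new_columns(weights, grads, n_new, lr=0.1):
    """Gradient step that touches only the last n_new columns (the new batch
    weights); every reference weight stays frozen."""
    updated = []
    for row, grad_row in zip(weights, grads):
        cut = len(row) - n_new
        updated.append(row[:cut] + [w - lr * g for w, g in zip(row[cut:], grad_row[cut:])])
    return updated
```

Because reference cells carry a zero in the new one-hot position, their outputs are unchanged by the surgery, and only the small set of new weights has to be shared or trained.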


Subject(s)
Datasets as Topic/standards , Deep Learning , Organ Specificity , Single-Cell Analysis/standards , Animals , COVID-19/pathology , Humans , Mice , Reference Standards , SARS-CoV-2/pathogenicity
13.
Cell Syst ; 12(6): 522-537, 2021 06 16.
Article in English | MEDLINE | ID: mdl-34139164

ABSTRACT

Cell biology is fundamentally limited in its ability to collect complete data on cellular phenotypes and the wide range of responses to perturbation. Areas such as computer vision and speech recognition have addressed this problem of characterizing unseen or unlabeled conditions with the combined advances of big data, deep learning, and computing resources in the past 5 years. Similarly, recent advances in machine learning approaches enabled by single-cell data start to address prediction tasks in perturbation response modeling. We first define objectives in learning perturbation response in single-cell omics; survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert); and discuss how a perturbation atlas can enable deep learning models to construct an informative perturbation latent space. We then examine future avenues toward more powerful and explainable modeling using deep neural networks, which enable the integration of disparate information sources and an understanding of heterogeneous, complex, and unseen systems.


Subject(s)
Machine Learning , Neural Networks, Computer
14.
Bioinformatics ; 36(Suppl_2): i610-i617, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381839

ABSTRACT

MOTIVATION: While generative models have shown great success in generating high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their out-of-distribution generation poses fundamental problems due to the difficulty of learning compact joint distribution across conditions. The canonical example of the conditional variational autoencoder (CVAE), for instance, does not explicitly relate conditions during training and, hence, has no explicit incentive to learn such a compact representation. RESULTS: We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%. AVAILABILITY AND IMPLEMENTATION: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
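Maximum mean discrepancy (MMD) measures how far apart two sets of samples are by comparing average kernel similarities within and across the sets. In the setting described above, the two sets would be decoder-layer activations from two conditions. The sketch below computes a biased estimate of squared MMD with a single Gaussian kernel; the bandwidth `sigma` and the single-kernel choice are simplifying assumptions, not trVAE's exact configuration.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma**2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared maximum mean discrepancy between samples xs and ys."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Adding this quantity to the training loss penalises representations in which the two conditions occupy different regions, which is the regularisation effect the abstract describes.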


Subject(s)
Reproducibility of Results , RNA-Seq , Exome Sequencing
15.
Nat Methods ; 16(8): 715-721, 2019 08.
Article in English | MEDLINE | ID: mdl-31363220

ABSTRACT

Accurately modeling cellular response to perturbations is a central goal of computational biology. While such modeling has been based on statistical, mechanistic and machine learning models in specific settings, no generalization of predictions to phenomena absent from training data (out-of-sample) has yet been demonstrated. Here, we present scGen (https://github.com/theislab/scgen), a model combining variational autoencoders and latent space vector arithmetics for high-dimensional single-cell gene expression data. We show that scGen accurately models perturbation and infection response of cells across cell types, studies and species. In particular, we demonstrate that scGen learns cell-type and species-specific responses implying that it captures features that distinguish responding from non-responding genes and cells. With the upcoming availability of large-scale atlases of organs in a healthy state, we envision scGen to become a tool for experimental design through in silico screening of perturbation response in the context of disease and drug treatment.
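The latent space vector arithmetic can be summarised as: estimate a perturbation vector δ as the difference between mean latent codes of perturbed and control training cells, then add δ to the latent codes of unseen control cells. The sketch below shows only this arithmetic on already-encoded vectors; the trained VAE that would encode cells and decode the shifted codes back to expression is omitted, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def predict_perturbed_latent(z_ctrl_train, z_stim_train, z_ctrl_query):
    """Shift unseen control cells by delta = mean(stimulated) - mean(control),
    with delta estimated on training cells only."""
    delta = [s - c for s, c in zip(mean_vector(z_stim_train), mean_vector(z_ctrl_train))]
    return [[z + d for z, d in zip(cell, delta)] for cell in z_ctrl_query]
```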


Subject(s)
Algorithms , Computational Biology/methods , Leukocytes, Mononuclear/metabolism , Machine Learning , Phagocytes/metabolism , Single-Cell Analysis/methods , Transcriptome , Animals , Computer Simulation , Gene Expression Profiling , Humans , Leukocytes, Mononuclear/cytology , Mice , Phagocytes/cytology , Species Specificity