Results 1 - 6 of 6
1.
ArXiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38827453

ABSTRACT

Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear-time computation of OT distances. Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology. Software is available at http://wassersteinwormhole.readthedocs.io/en/latest/.
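The central idea above, an embedding in which Euclidean distances stand in for OT distances, can be illustrated in the one special case where such an embedding is exact. This is a sketch under stated assumptions, not the Wormhole model (which learns an approximate embedding with a transformer autoencoder): it assumes 1-D distributions that all have the same number n of equally weighted samples. In that case the vector of sorted samples scaled by 1/sqrt(n) is an isometric embedding for the 2-Wasserstein distance. The helper names (quantile_embedding, w2_brute_force) are hypothetical, and the brute-force check is only feasible for tiny n.

```python
import itertools
import math
import random


def quantile_embedding(samples):
    """Map a 1-D empirical distribution with n equally weighted samples to its
    sorted samples scaled by 1/sqrt(n). Assumes all distributions share n."""
    n = len(samples)
    return [x / math.sqrt(n) for x in sorted(samples)]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def w2_brute_force(xs, ys):
    """Exact 2-Wasserstein distance between two equal-size, uniformly weighted
    empirical distributions. For uniform weights an optimal coupling is a
    permutation, so all n! matchings are searched (tiny n only)."""
    n = len(xs)
    best = min(
        sum((x - ys[j]) ** 2 for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)


# A small hypothetical cohort of 1-D point clouds, each with n = 5 samples.
random.seed(0)
cohort = [[random.gauss(mu, 1.0) for _ in range(5)] for mu in (0.0, 0.5, 2.0, -1.0)]

# Embed each distribution once; pairwise OT distances then reduce to
# Euclidean distances between the embeddings.
embeddings = [quantile_embedding(c) for c in cohort]
for i, j in itertools.combinations(range(len(cohort)), 2):
    exact = w2_brute_force(cohort[i], cohort[j])
    via_embedding = euclidean(embeddings[i], embeddings[j])
    print(f"pair ({i},{j}): W2={exact:.4f}  embedding distance={via_embedding:.4f}")
```

Each distribution is embedded once, after which every pairwise distance is a single vector difference. Here that property is obtained analytically because 1-D OT reduces to matching sorted samples; in higher dimensions no such closed form exists, which is the gap a learned embedding is meant to fill.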

2.
Nat Biotechnol ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565973

ABSTRACT

A key challenge of analyzing data from high-resolution spatial profiling technologies is to suitably represent the features of cellular neighborhoods or niches. Here we introduce the covariance environment (COVET), a representation that leverages the gene-gene covariate structure across cells in the niche to capture the multivariate nature of cellular interactions within it. We define a principled optimal transport-based distance metric between COVET niches that scales to millions of cells. Using COVET to encode spatial context, we developed environmental variational inference (ENVI), a conditional variational autoencoder that jointly embeds spatial and single-cell RNA sequencing data into a latent space. ENVI includes two decoders: one to impute gene expression across the spatial modality and a second to project spatial information onto single-cell data. ENVI can confer spatial context to genomics data from single dissociated cells and outperforms alternatives for imputing gene expression on diverse spatial datasets.
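A minimal sketch of a COVET-style niche representation and an OT distance between two niches, restricted to two genes so that every matrix operation has a closed form. Several choices here are assumptions, not taken from the abstract: the covariance is centered on a sample-wide reference mean, each niche is treated as a zero-mean Gaussian so that the exact OT distance is the Bures-Wasserstein formula, and the cheaper surrogate shown is the Frobenius distance between matrix square roots. Whether that surrogate matches the approximation used for COVET is itself an assumption; all function names and data are hypothetical.

```python
import math


def niche_covariance(cells, ref_mean):
    """2x2 gene-gene covariance of the cells in one niche, centered on a
    reference mean (assumed here to be the mean over the whole sample).
    cells: list of (gene1, gene2) expression pairs."""
    n = len(cells)
    d = [(g1 - ref_mean[0], g2 - ref_mean[1]) for g1, g2 in cells]
    s11 = sum(a * a for a, _ in d) / n
    s12 = sum(a * b for a, b in d) / n
    s22 = sum(b * b for _, b in d) / n
    return ((s11, s12), (s12, s22))


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def sqrtm2(m):
    """Principal square root of a nonzero symmetric PSD 2x2 matrix via the
    closed form sqrt(M) = (M + s I) / t, s = sqrt(det M), t = sqrt(tr M + 2 s)."""
    s = math.sqrt(max(det2(m), 0.0))
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return (((m[0][0] + s) / t, m[0][1] / t), (m[1][0] / t, (m[1][1] + s) / t))


def bures_wasserstein(a, b):
    """Exact W2 between N(0, a) and N(0, b):
    W2^2 = tr a + tr b - 2 tr (a^1/2 b a^1/2)^1/2.
    For a 2x2 PSD matrix M, tr sqrt(M) = sqrt(tr M + 2 sqrt(det M)), and here
    tr M = tr(ab), det M = det(a) det(b)."""
    tr_ab = sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))
    cross = math.sqrt(max(tr_ab + 2 * math.sqrt(max(det2(a) * det2(b), 0.0)), 0.0))
    w2_sq = a[0][0] + a[1][1] + b[0][0] + b[1][1] - 2 * cross
    return math.sqrt(max(w2_sq, 0.0))


def sqrt_frobenius(a, b):
    """Surrogate ||a^1/2 - b^1/2||_F: equal to the exact distance when a and b
    commute, and an upper bound on it otherwise."""
    ra, rb = sqrtm2(a), sqrtm2(b)
    return math.sqrt(sum((ra[i][j] - rb[i][j]) ** 2 for i in range(2) for j in range(2)))


# Hypothetical expression of two genes in six cells, split into two niches.
cells_all = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (0.5, 0.5), (2.5, 3.5), (1.5, 0.0)]
ref = (sum(c[0] for c in cells_all) / len(cells_all),
       sum(c[1] for c in cells_all) / len(cells_all))
niche_a = niche_covariance(cells_all[:3], ref)
niche_b = niche_covariance(cells_all[3:], ref)
print("exact W2:", bures_wasserstein(niche_a, niche_b))
print("sqrt-Frobenius surrogate:", sqrt_frobenius(niche_a, niche_b))
```

The surrogate only needs each niche's square-root matrix, so those can be computed once per niche and then compared as flat vectors. That kind of per-niche precomputation is what lets a niche distance scale to many cells, at the cost of exactness when the covariance matrices do not commute.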

3.
Bioinformatics ; 39(39 Suppl 1): i394-i403, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387147

ABSTRACT

MOTIVATION: Transcriptional dynamics are governed by the action of regulatory proteins and are fundamental to systems ranging from normal development to disease. RNA velocity methods for tracking phenotypic dynamics ignore information on the regulatory drivers of gene expression variability through time. RESULTS: We introduce scKINETICS (Key regulatory Interaction NETwork for Inferring Cell Speed), a dynamical model of gene expression change that is fit by simultaneously learning per-cell transcriptional velocities and a governing gene regulatory network. Fitting is accomplished through an expectation-maximization approach designed to learn the impact of each regulator on its target genes, leveraging biologically motivated priors from epigenetic data, gene-gene coexpression, and constraints on cells' future states imposed by the phenotypic manifold. Applying this approach to an acute pancreatitis dataset recapitulates a well-studied axis of acinar-to-ductal transdifferentiation while proposing novel regulators of this process, including factors with previously appreciated roles in driving pancreatic tumorigenesis. In benchmarking experiments, we show that scKINETICS successfully extends and improves existing velocity approaches to generate interpretable, mechanistic models of gene regulatory dynamics. AVAILABILITY AND IMPLEMENTATION: All Python code and an accompanying Jupyter notebook with demonstrations are available at http://github.com/dpeerlab/scKINETICS.


Subject(s)
Pancreatitis, Humans, Acute Disease, Transcriptome, Gene Expression Profiling, Benchmarking

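The expectation-maximization fitting described in the scKINETICS abstract can be illustrated with a toy alternation. This is a hypothetical simplification, not the scKINETICS model: it assumes a linear regulatory model (velocity = W x), a boolean prior mask standing in for epigenetic priors, and candidate future states given as neighbor lists standing in for the phenotypic manifold. The E-step softly assigns each cell to its candidate future states under the current W; the M-step refits W by ridge regression restricted to the mask. All names and parameters (sigma, ridge, n_iter) are illustrative.

```python
import math


def solve(A, b):
    """Solve the small dense system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def predict_velocity(W, x):
    # Hypothetical linear model: velocity of gene g = sum_r W[g][r] * x[r].
    return [sum(w_gr * x_r for w_gr, x_r in zip(row, x)) for row in W]


def e_step(X, W, neighbors, sigma):
    """Soft-assign each cell to its candidate future states (neighbors) and
    return the expected displacement of each cell."""
    U = []
    for i, x in enumerate(X):
        pred = [xg + vg for xg, vg in zip(x, predict_velocity(W, x))]
        logits = [-sum((X[j][g] - pred[g]) ** 2 for g in range(len(x))) / (2 * sigma ** 2)
                  for j in neighbors[i]]
        top = max(logits)  # subtract max for a numerically stable softmax
        resp = [math.exp(l - top) for l in logits]
        z = sum(resp)
        U.append([sum(p * (X[j][g] - x[g]) for p, j in zip(resp, neighbors[i])) / z
                  for g in range(len(x))])
    return U


def m_step(X, U, mask, ridge):
    """Masked ridge regression: each target gene's expected velocity is
    regressed only on the regulators its prior mask allows."""
    G = len(X[0])
    W = [[0.0] * G for _ in range(G)]
    for g in range(G):
        regs = [r for r in range(G) if mask[g][r]]
        if not regs:
            continue
        A = [[sum(x[r] * x[s] for x in X) + (ridge if r == s else 0.0) for s in regs] for r in regs]
        b = [sum(x[r] * u[g] for x, u in zip(X, U)) for r in regs]
        for r, w in zip(regs, solve(A, b)):
            W[g][r] = w
    return W


def fit(X, neighbors, mask, n_iter=10, sigma=1.0, ridge=1e-6):
    G = len(X[0])
    W = [[0.0] * G for _ in range(G)]
    for _ in range(n_iter):
        U = e_step(X, W, neighbors, sigma)
        W = m_step(X, U, mask, ridge)
    return W


# Synthetic 2-gene trajectory: gene 0 grows 10% per step and drives gene 1.
X = [[1.0, 0.0]]
for _ in range(9):
    x0, x1 = X[-1]
    X.append([1.1 * x0, x1 + 0.5 * x0])
# Candidate future state of each cell: the next cell; the last cell points to itself.
neighbors = [[i + 1] for i in range(len(X) - 1)] + [[len(X) - 1]]
# mask[g][r]: may regulator r act on target g? Here only gene 0 is a regulator.
mask = [[True, False], [True, False]]
W = fit(X, neighbors, mask)
print(W)
```

With a single candidate per cell, as in this synthetic trajectory, the E-step is trivial; with several neighbors per cell the responsibilities depend on W, which is what makes the alternation non-trivial. The self-pointing last cell contributes a zero displacement and slightly shrinks the fitted weights.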
4.
Science ; 380(6645): eadd5327, 2023 05 12.
Article in English | MEDLINE | ID: mdl-37167403

ABSTRACT

The response to tumor-initiating inflammatory and genetic insults can vary among morphologically indistinguishable cells, suggesting as yet uncharacterized roles for epigenetic plasticity during early neoplasia. To investigate the origins and impact of such plasticity, we performed single-cell analyses on normal, inflamed, premalignant, and malignant tissues in autochthonous models of pancreatic cancer. We reproducibly identified heterogeneous cell states that are primed for diverse, late-emerging neoplastic fates and linked these to chromatin remodeling at cell-cell communication loci. Using an inference approach, we revealed signaling gene modules and tissue-level cross-talk, including a neoplasia-driving feedback loop between discrete epithelial and immune cell populations that was functionally validated in mice. Our results uncover a neoplasia-specific tissue-remodeling program that may be exploited for pancreatic cancer interception.


Subject(s)
Carcinogenesis, Epigenesis, Genetic, Pancreas, Pancreatic Neoplasms, Animals, Mice, Carcinogenesis/genetics, Carcinogenesis/pathology, Cell Communication, Cell Transformation, Neoplastic/genetics, Cell Transformation, Neoplastic/pathology, Pancreas/pathology, Pancreatic Neoplasms/genetics, Pancreatic Neoplasms/pathology

5.
bioRxiv ; 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37131616

ABSTRACT

The tsunami of new multiplexed spatial profiling technologies has opened a range of computational challenges focused on leveraging these powerful data for biological discovery. A key underlying challenge is finding a suitable representation of the features of cellular niches. Here, we develop the covariance environment (COVET), a representation that can capture the rich, continuous multivariate nature of cellular niches by capturing the gene-gene covariate structure across cells in the niche, which can reflect the cell-cell communication between them. We define a principled optimal transport-based distance metric between COVET niches and develop a computationally efficient approximation to this metric that can scale to millions of cells. Using COVET to encode spatial context, we develop environmental variational inference (ENVI), a conditional variational autoencoder that jointly embeds spatial and single-cell RNA-seq data into a latent space. Two distinct decoders either impute gene expression across the spatial modality or project spatial information onto dissociated single-cell data. We show that ENVI not only outperforms alternatives at imputing gene expression but can also infer spatial context for dissociated single-cell genomics data.

6.
Nature ; 590(7844): 67-73, 2021 02.
Article in English | MEDLINE | ID: mdl-33536657

ABSTRACT

Fundamental mathematical constants such as e and π are ubiquitous in diverse fields of science, from abstract mathematics and geometry to physics, biology and chemistry [1,2]. Nevertheless, for centuries new mathematical formulas relating fundamental constants have been scarce and usually discovered sporadically [3-6]. Such discoveries are often considered an act of mathematical ingenuity or profound intuition by great mathematicians such as Gauss and Ramanujan [7]. Here we propose a systematic approach that leverages algorithms to discover mathematical formulas for fundamental constants and helps to reveal the underlying structure of the constants. We call this approach 'the Ramanujan Machine'. Our algorithms find dozens of well known formulas as well as previously unknown ones, such as continued fraction representations of π, e, Catalan's constant, and values of the Riemann zeta function. Several conjectures found by our algorithms were (in retrospect) simple to prove, whereas others remain as yet unproved. We present two algorithms that proved useful in finding conjectures: a variant of the meet-in-the-middle algorithm and a gradient descent optimization algorithm tailored to the recurrent structure of continued fractions. Both algorithms are based on matching numerical values; consequently, they conjecture formulas without providing proofs or requiring prior knowledge of the underlying mathematical structure, making this methodology complementary to automated theorem proving [8-13]. Our approach is especially attractive when applied to discover formulas for fundamental constants for which no mathematical structure is known, because it reverses the conventional usage of sequential logic in formal proofs. Instead, our work supports a different conceptual framework for research: computer algorithms use numerical data to unveil mathematical structures, thus trying to replace the mathematical intuition of great mathematicians and providing leads to further mathematical research.
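A simplified meet-in-the-middle search in the spirit of the description above: one side enumerates Möbius transforms (p*c + q)/(r*c + s) of a constant c, the other enumerates polynomial continued fractions, and candidates are matched by their values rounded to a fixed number of digits. The specific families (a_n = alpha*n + beta, b_n = gamma*n^2, small integer coefficients) and all names are assumptions made for illustration, not the Ramanujan Machine's actual search space. Verification here uses double precision only; a real search would re-check candidates with arbitrary-precision arithmetic.

```python
import itertools
import math


def eval_cf(alpha, beta, gamma, depth):
    """Evaluate a_0 + b_1/(a_1 + b_2/(a_2 + ...)) truncated at `depth`, with
    a_n = alpha*n + beta and b_n = gamma*n**2, from the tail backwards.
    Returns None if a zero denominator or non-finite value is hit."""
    value = float(alpha * depth + beta)
    try:
        for n in range(depth - 1, -1, -1):
            value = alpha * n + beta + gamma * (n + 1) ** 2 / value
    except ZeroDivisionError:
        return None
    return value if math.isfinite(value) else None


def mitm_search(constant, coeff_range=range(-4, 5), cf_range=range(0, 4), digits=8, depth=200):
    # Left-hand side: Mobius transforms of the constant, hashed by rounded value.
    table = {}
    for p, q, r, s in itertools.product(coeff_range, repeat=4):
        if p * s - q * r == 0:
            continue  # degenerate transform (constant or undefined)
        key = round((p * constant + q) / (r * constant + s), digits)
        table.setdefault(key, []).append((p, q, r, s))

    # Right-hand side: continued fractions, looked up in the table.
    matches = []
    for alpha, beta, gamma in itertools.product(cf_range, cf_range, range(1, 4)):
        value = eval_cf(alpha, beta, gamma, depth)
        if value is None:
            continue
        candidates = table.get(round(value, digits), [])
        if not candidates:
            continue
        # Re-evaluate at a deeper truncation to filter unstable or coincidental hits.
        deeper = eval_cf(alpha, beta, gamma, 2 * depth)
        for p, q, r, s in candidates:
            target = (p * constant + q) / (r * constant + s)
            if deeper is not None and abs(deeper - target) < 1e-12:
                matches.append({"cf": (alpha, beta, gamma), "lhs": (p, q, r, s)})
    return matches


for m in mitm_search(math.pi):
    print(m)
```

With c = π, the search should recover 4/π = 1 + 1/(3 + 4/(5 + 9/(7 + ...))), the classical continued fraction for arctan(1) = π/4, reported as cf = (2, 1, 1) together with equivalent left-hand sides such as (0, 4, 1, 0). Hashing one side and probing it with the other is what turns an all-pairs comparison into a single pass over each side.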
