Results 1 - 20 of 133
1.
Bioinformatics ; 40(Supplement_1): i218-i227, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940122

ABSTRACT

MOTIVATION: Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria which replicate, even in nondividing cells, by means of a relatively error-prone process resulting in somatic mutations in their genome. Because of the higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model "heteroplasmy," which is the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provide a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree. While deconvolution of somatic mutations from a mixture of evolutionarily related genomes has been extensively studied in the context of bulk sequencing of cancer tumor samples, the problem of mitochondrial deconvolution has the additional constraint that the mitochondrial clone tree must be concordant with the cell lineage tree. RESULTS: We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the Nested Perfect Phylogeny Mixture (NPPM) problem. We derive a combinatorial characterization of the solutions to the NPPM problem, and formulate an algorithm, MERLIN, to solve this problem exactly using a mixed integer linear program. We show on simulated data that MERLIN outperforms existing methods that do not model mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. 
We use MERLIN to analyze single-cell whole-genome sequencing data of 5220 cells of a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree compared to existing methods. AVAILABILITY AND IMPLEMENTATION: https://github.com/raphael-group/MERLIN.
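
The perfect phylogeny condition behind the NPPM problem can be illustrated on a plain binary cell-by-mutation matrix. The following is a minimal sketch of the classical pairwise test, assuming a mutation-free root and error-free binary calls. It is not MERLIN's mixed integer linear program and it ignores heteroplasmy; the function name `admits_perfect_phylogeny` is hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if a binary cell-by-mutation matrix is conflict-free.

    Assumption: the root carries no mutations, so a rooted perfect
    phylogeny exists exactly when, for every pair of mutations, the sets
    of cells carrying them are either disjoint or nested.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    # For each mutation (column), collect the indices of cells carrying it.
    carriers = [
        {cell for cell, row in enumerate(matrix) if row[mutation]}
        for mutation in range(n_mutations)
    ]
    for first, second in combinations(carriers, 2):
        overlapping = bool(first & second)
        nested = first <= second or second <= first
        if overlapping and not nested:
            return False
    return True


# Rows are cells, columns are mutations (illustrative data only).
conflict_free = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
conflicting = [[1, 1], [1, 0], [0, 1]]
```

Here `conflict_free` passes the test, while `conflicting` fails because its two mutations share one cell but neither carrier set contains the other.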


Subjects
Cell Lineage, Mitochondria, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Cell Lineage/genetics, Mitochondria/genetics, Mutation, Mitochondrial Genome, Algorithms, Molecular Evolution
2.
Bioinformatics ; 40(Supplement_1): i481-i489, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940134

ABSTRACT

MOTIVATION: Cell-cell interactions (CCIs) consist of cells exchanging signals with themselves and neighboring cells by expressing ligand and receptor molecules and play a key role in cellular development, tissue homeostasis, and other critical biological functions. Since direct measurement of CCIs is challenging, multiple methods have been developed to infer CCIs by quantifying correlations between the gene expression of the ligands and receptors that mediate CCIs, originally from bulk RNA-sequencing data and more recently from single-cell or spatially resolved transcriptomics (SRT) data. SRT has a particular advantage over single-cell approaches, since ligand-receptor correlations can be computed between cells or spots that are physically close in the tissue. However, the transcript counts of individual ligands and receptors in SRT data are generally low, complicating the inference of CCIs from expression correlations. RESULTS: We introduce Copulacci, a count-based model for inferring CCIs from SRT data. Copulacci uses a Gaussian copula to model dependencies between the expression of ligands and receptors from nearby spatial locations even when the transcript counts are low. On simulated data, Copulacci outperforms existing CCI inference methods based on the standard Spearman and Pearson correlation coefficients. Using several real SRT datasets, we show that Copulacci discovers biologically meaningful ligand-receptor interactions that are lowly expressed and undiscoverable by existing CCI inference methods. AVAILABILITY AND IMPLEMENTATION: Copulacci is implemented in Python and available at https://github.com/raphael-group/copulacci.
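
To make the idea of a Gaussian copula over low counts concrete, the sketch below generates a correlated ligand/receptor count pair: a latent bivariate normal is mapped to uniforms and then to Poisson counts. This is only an illustration of the general construction under assumed Poisson marginals. It is not Copulacci's inference procedure, and the names `poisson_quantile` and `sample_pair` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    u = min(u, 1.0 - 1e-9)  # guard against a non-terminating loop near u = 1
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def sample_pair(rho, mean_ligand, mean_receptor, rng):
    """Draw one (ligand, receptor) count pair from a Gaussian copula.

    Assumption: both marginals are Poisson; rho is the correlation of the
    latent Gaussian variables, not of the observed counts.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    standard_normal = NormalDist()
    u1, u2 = standard_normal.cdf(z1), standard_normal.cdf(z2)
    return poisson_quantile(u1, mean_ligand), poisson_quantile(u2, mean_receptor)


rng = random.Random(0)
pairs = [sample_pair(0.9, 2.0, 3.0, rng) for _ in range(5)]
```

Separating the dependence parameter `rho` from the count marginals is what lets a copula describe association even when most observed counts are 0 or 1.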


Subjects
Cell Communication, Transcriptome, Transcriptome/genetics, Humans, Gene Expression Profiling/methods, Single-Cell Analysis/methods, Algorithms, Computational Biology/methods, Ligands
3.
Bioinformatics ; 40(Supplement_1): i228-i236, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940146

ABSTRACT

MOTIVATION: Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations. Spatial lineage trees are related to phylogeographic models that have been well-studied in the phylogenetics literature. We demonstrate that standard phylogeographic models based on Brownian motion are inadequate to describe the spatial symmetric displacement (SD) of cells during cell division. RESULTS: We introduce a new model for cell motility, the SD model, that includes symmetric displacements of daughter cells from the parental cell followed by independent diffusion of daughter cells. We show that this model more accurately describes the locations of cells in a real spatial lineage tracing of mouse embryonic stem cells. Combining the spatial SD model with an evolutionary model of DNA mutations, we obtain a phylogeographic model for spatial lineage tracing. Using this model, we devise a maximum likelihood framework, MOLLUSC (Maximum Likelihood Estimation Of Lineage and Location Using Single-Cell Spatial Lineage tracing Data), to co-estimate time-resolved branch lengths, spatial diffusion rate, and mutation rate. On both simulated and real data, we show that MOLLUSC accurately estimates all parameters. In contrast, the Brownian motion model overestimates spatial diffusion rate in all test cases. In addition, the inclusion of spatial information improves accuracy of branch length estimation compared to sequence data alone.
On real data, we show that spatial information has more signal than sequence data for branch length estimation, suggesting augmenting lineage tracing technologies with spatial information is useful to overcome the limitations of genome-editing in developmental systems. AVAILABILITY AND IMPLEMENTATION: The Python implementation of MOLLUSC is available at https://github.com/raphael-group/MOLLUSC.
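
The two-step motion described in the abstract, a symmetric displacement at division followed by independent diffusion, can be sketched as a small simulation. The Gaussian form of the displacement, the variance scaling `diffusion_rate * branch_time`, and the function name `simulate_division` are illustrative assumptions, not the parametrization used by MOLLUSC.

```python
import math
import random


def simulate_division(parent_xy, displacement_sd, diffusion_rate, branch_time, rng):
    """Simulate the 2D positions of two daughter cells after one division.

    Step 1 (symmetric displacement): draw one vector d and place the
    daughters at parent + d and parent - d.
    Step 2 (independent diffusion): each daughter then takes its own
    Brownian step with per-axis variance diffusion_rate * branch_time.
    """
    dx = rng.gauss(0.0, displacement_sd)
    dy = rng.gauss(0.0, displacement_sd)
    step_sd = math.sqrt(diffusion_rate * branch_time)
    daughters = []
    for sign in (1, -1):
        x = parent_xy[0] + sign * dx + rng.gauss(0.0, step_sd)
        y = parent_xy[1] + sign * dy + rng.gauss(0.0, step_sd)
        daughters.append((x, y))
    return daughters


rng = random.Random(1)
first, second = simulate_division((3.0, -2.0), 1.5, 0.2, 1.0, rng)
```

With `diffusion_rate` set to zero the daughters are exact mirror images about the parent, which is the correlation between siblings that a pure Brownian motion model does not capture.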


Subjects
Cell Division, Cell Lineage, Cell Movement, Animals, Mice, Likelihood Functions, Phylogeography, Mutation, Phylogeny
4.
Genome Biol ; 25(1): 130, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773520

ABSTRACT

Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.


Subjects
Algorithms, DNA Copy Number Variations, Haplotypes, Prostatic Neoplasms, Humans, Prostatic Neoplasms/genetics, Male, DNA Sequence Analysis/methods, Neoplasms/genetics, Gene Frequency, Single-Cell Analysis
5.
bioRxiv ; 2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38712136

ABSTRACT

A key challenge in cancer genomics is understanding the functional relationships and dependencies between combinations of somatic mutations that drive cancer development. Such driver mutations frequently exhibit patterns of mutual exclusivity or co-occurrence across tumors, and many methods have been developed to identify such dependency patterns from bulk DNA sequencing data of a cohort of patients. However, while mutual exclusivity and co-occurrence are described as properties of driver mutations, existing methods do not explicitly disentangle functional, driver mutations from neutral, passenger mutations. In particular, nearly all existing methods evaluate mutual exclusivity or co-occurrence at the gene level, marking a gene as mutated if any mutation (driver or passenger) is present. Since some genes have a large number of passenger mutations, existing methods either restrict their analyses to a small subset of suspected driver genes, limiting their ability to identify novel dependencies, or make spurious inferences of mutual exclusivity and co-occurrence involving genes with many passenger mutations. We introduce DIALECT, an algorithm to identify dependencies between pairs of driver mutations from somatic mutation counts. We derive a latent variable mixture model for drivers and passengers that combines existing probabilistic models of passenger mutation rates with a latent variable describing the unknown status of a mutation as a driver or passenger. We use an expectation maximization (EM) algorithm to estimate the parameters of our model, including the rates of mutual exclusivity and co-occurrence between drivers. We demonstrate that DIALECT more accurately infers mutual exclusivity and co-occurrence between driver mutations compared to existing methods on both simulated mutation data and somatic mutation data from 5 cancer types in The Cancer Genome Atlas (TCGA).

6.
bioRxiv ; 2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38496660

ABSTRACT

Spatially resolved transcriptomics (SRT) measures mRNA transcripts at thousands of locations within a tissue slice, revealing spatial variations in gene expression and distribution of cell types. In recent studies, SRT has been applied to tissue slices from multiple timepoints during the development of an organism. Alignment of this spatiotemporal transcriptomics data can provide insights into the gene expression programs governing the growth and differentiation of cells over space and time. We introduce DeST-OT (Developmental SpatioTemporal Optimal Transport), a method to align SRT slices from pairs of developmental timepoints using the framework of optimal transport (OT). DeST-OT uses semi-relaxed optimal transport to precisely model cellular growth, death, and differentiation processes that are not well-modeled by existing alignment methods. We demonstrate the advantage of DeST-OT on simulated slices. We further introduce two metrics to quantify the plausibility of a spatiotemporal alignment: a growth distortion metric which quantifies the discrepancy between the inferred and the true cell type growth rates, and a migration metric which quantifies the distance traveled between ancestor and descendant cells. DeST-OT outperforms existing methods on these metrics in the alignment of spatiotemporal transcriptomics data from the development of axolotl brain.

7.
bioRxiv ; 2024 Mar 23.
Article in English | MEDLINE | ID: mdl-38496496

ABSTRACT

Recent dynamic lineage tracing technologies combine CRISPR-based genome editing with single-cell sequencing to track cell divisions during development. A key computational problem in dynamic lineage tracing is to infer a cell lineage tree from the measured CRISPR-induced mutations. Three features of dynamic lineage tracing data distinguish this problem from standard phylogenetic tree inference. First, the CRISPR-editing process modifies a genomic location exactly once. This non-modifiable property is not well described by the time-reversible models commonly used in phylogenetics. Second, as a consequence of non-modifiability, the number of mutations per time unit decreases over time. Third, CRISPR-based genome-editing and single-cell sequencing result in high rates of both heritable and non-heritable (dropout) missing data. To model these features, we introduce the Probabilistic Mixed-type Missing (PMM) model. We describe an algorithm, LAML (Lineage Analysis via Maximum Likelihood), to search for the maximum likelihood (ML) tree under the PMM model. LAML combines an Expectation Maximization (EM) algorithm with a heuristic tree search to jointly estimate tree topology, branch lengths and missing data parameters. We derive a closed-form solution for the M-step in the case of no heritable missing data, and a block coordinate ascent approach in the general case which is more efficient than the standard General Time Reversible (GTR) phylogenetic model. On simulated data, LAML infers more accurate tree topologies and branch lengths than existing methods, with greater advantages on datasets with higher ratios of heritable to non-heritable missing data. We show that LAML provides unbiased time-scaled estimates of branch lengths.
In contrast, we demonstrate that maximum parsimony methods for lineage tracing data not only underestimate branch lengths, but also yield branch lengths which are not proportional to time, due to the nonlinear decay in the number of mutations on branches further from the root. On lineage tracing data from a mouse model of lung adenocarcinoma, we show that LAML infers phylogenetic distances that are more concordant with gene expression data compared to distances derived from maximum parsimony. The LAML tree topology is more plausible than existing published trees, with fewer total cell migrations between distant metastases and fewer reseeding events where cells migrate back to the primary tumor. Crucially, we identify three distinct time epochs of metastasis progression, which include a burst of metastasis events to various anatomical sites during a single month.
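
The claim that non-modifiability makes the number of mutations per time unit decrease can be checked with a short calculation. The sketch below assumes, purely for illustration, that each target site is edited independently after an exponentially distributed waiting time with rate `edit_rate` and is never edited again. This is an assumed toy model, not the PMM model itself, and `expected_new_edits` is a hypothetical helper.

```python
import math


def expected_new_edits(n_sites, edit_rate, t_start, t_end):
    """Expected number of sites first edited in the window [t_start, t_end].

    Under the assumed model a site is still unedited at time t with
    probability exp(-edit_rate * t), so the expected count of new edits is
    n_sites * (exp(-edit_rate * t_start) - exp(-edit_rate * t_end)).
    """
    return n_sites * (
        math.exp(-edit_rate * t_start) - math.exp(-edit_rate * t_end)
    )


# Expected new edits in five consecutive unit-length time windows.
per_window = [expected_new_edits(30, 0.5, t, t + 1) for t in range(5)]
```

Each successive window yields fewer new edits because fewer unedited sites remain, so counting mutations on a branch, as parsimony does, gives lengths that are not proportional to elapsed time.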

8.
Genome Biol ; 24(1): 272, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38037115

ABSTRACT

A tumor contains a diverse collection of somatic mutations that reflect its past evolutionary history and that range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). However, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs, complicating the inference of tumor phylogenies. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers but constrains losses of SNVs according to clusters of cells. We derive an algorithm, ConDoR, that infers phylogenies from targeted scDNA-seq data using this model. We demonstrate the advantages of ConDoR on simulated and real scDNA-seq data.


Subjects
Neoplasms, Humans, Animals, Phylogeny, Neoplasms/genetics, Mutation, Algorithms, DNA Sequence Analysis, Birds/genetics, DNA Copy Number Variations
9.
Cell Syst ; 14(12): 1113-1121.e9, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38128483

ABSTRACT

CRISPR-Cas9-based genome editing combined with single-cell sequencing enables the tracing of the history of cell divisions, or cellular lineage, in tissues and whole organisms. Although standard phylogenetic approaches may be applied to reconstruct cellular lineage trees from this data, the unique features of the CRISPR-Cas9 editing process motivate the development of specialized models that describe the evolution of CRISPR-Cas9-induced mutations. Here, we introduce the "star homoplasy" evolutionary model that constrains a phylogenetic character to mutate at most once along a lineage, capturing the "non-modifiability" property of CRISPR-Cas9 mutations. We derive a combinatorial characterization of star homoplasy phylogenies and use this characterization to develop an algorithm, "Startle", that computes a maximum parsimony star homoplasy phylogeny. We demonstrate that Startle infers more accurate phylogenies on simulated lineage tracing data compared with existing methods and finds parsimonious phylogenies with fewer metastatic migrations on lineage tracing data from mouse metastatic lung adenocarcinoma.
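
The star homoplasy constraint, that a character mutates at most once along a lineage, can be expressed as a simple check on a fully labeled tree. The sketch below assumes state 0 denotes the unedited state, an unedited root, and no missing data. The tree encoding and the name `satisfies_star_homoplasy` are hypothetical, and this is a validity check rather than the Startle algorithm.

```python
def satisfies_star_homoplasy(children, states, root):
    """Check that no character changes again after leaving state 0.

    children: dict mapping a node to a list of its child nodes.
    states:   dict mapping a node to a tuple of character states,
              where 0 is assumed to mean "unedited".
    The same edited state may arise on several separate edges
    (homoplasy), but an edited character must be inherited unchanged.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            for parent_state, child_state in zip(states[node], states[child]):
                if parent_state != 0 and child_state != parent_state:
                    return False
            stack.append(child)
    return True


# Illustrative tree: root -> (a, b), a -> (c, d).
children = {"root": ["a", "b"], "a": ["c", "d"]}
valid_states = {
    "root": (0, 0), "a": (1, 0), "b": (0, 2), "c": (1, 2), "d": (1, 0),
}
```

In `valid_states`, state 2 of the second character arises independently on two edges, which the model permits. Relabeling node `c` as `(3, 0)` would violate the constraint because the first character would change from 1 to 3.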


Subjects
CRISPR-Cas Systems, Gene Editing, Animals, Mice, CRISPR-Cas Systems/genetics, Phylogeny, Gene Editing/methods, Cell Lineage/genetics, Mutation
10.
Nature ; 623(7986): 432-441, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37914932

ABSTRACT

Chromatin accessibility is essential in regulating gene expression and cellular identity, and alterations in accessibility have been implicated in driving cancer initiation, progression and metastasis [1-4]. Although the genetic contributions to oncogenic transitions have been investigated, epigenetic drivers remain less understood. Here we constructed a pan-cancer epigenetic and transcriptomic atlas using single-nucleus chromatin accessibility data (using single-nucleus assay for transposase-accessible chromatin) from 225 samples and matched single-cell or single-nucleus RNA-sequencing expression data from 206 samples. With over 1 million cells from each platform analysed through the enrichment of accessible chromatin regions, transcription factor motifs and regulons, we identified epigenetic drivers associated with cancer transitions. Some epigenetic drivers appeared in multiple cancers (for example, regulatory regions of ABCC1 and VEGFA; GATA6 and FOX-family motifs), whereas others were cancer specific (for example, regulatory regions of FGF19, ASAP2 and EN1, and the PBX3 motif). Among epigenetically altered pathways, TP53, hypoxia and TNF signalling were linked to cancer initiation, whereas oestrogen response, epithelial-mesenchymal transition and apical junction were tied to metastatic transition. Furthermore, we revealed a marked correlation between enhancer accessibility and gene expression and uncovered cooperation between epigenetic and genetic drivers. This atlas provides a foundation for further investigation of epigenetic dynamics in cancer transitions.


Subjects
Genetic Epigenesis, Neoplastic Gene Expression Regulation, Neoplasms, Humans, Cell Hypoxia, Cell Nucleus, Chromatin/genetics, Chromatin/metabolism, Genetic Enhancer Elements/genetics, Genetic Epigenesis/genetics, Epithelial-Mesenchymal Transition, Estrogens/metabolism, Gene Expression Profiling, GTPase-Activating Proteins/metabolism, Neoplasm Metastasis, Neoplasms/classification, Neoplasms/genetics, Neoplasms/pathology, Nucleic Acid Regulatory Sequences/genetics, Single-Cell Analysis, Transcription Factors/metabolism
11.
PLoS Comput Biol ; 19(11): e1011590, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37943952

ABSTRACT

MOTIVATION: New low-coverage single-cell DNA sequencing technologies enable the measurement of copy number profiles from thousands of individual cells within tumors. From this data, one can infer the evolutionary history of the tumor by modeling transformations of the genome via copy number aberrations. Copy number aberrations alter multiple adjacent genomic loci, violating the standard phylogenetic assumption that loci evolve independently. Thus, specialized models to infer copy number phylogenies have been introduced. A widely used model is the copy number transformation (CNT) model in which a genome is represented by an integer vector and a copy number aberration is an event that either increases or decreases the number of copies of a contiguous segment of the genome. The CNT distance between a pair of copy number profiles is the minimum number of events required to transform one profile to another. While this distance can be computed efficiently, no efficient algorithm has been developed to find the most parsimonious phylogeny under the CNT model. RESULTS: We introduce the zero-agnostic copy number transformation (ZCNT) model, a simplification of the CNT model that allows the amplification or deletion of regions with zero copies. We derive a closed form expression for the ZCNT distance between two copy number profiles and show that, unlike the CNT distance, the ZCNT distance forms a metric. We leverage the closed-form expression for the ZCNT distance and an alternative characterization of copy number profiles to derive polynomial time algorithms for two natural relaxations of the small parsimony problem on copy number profiles. While the alteration of zero copy number regions allowed under the ZCNT model is not biologically realistic, we show on both simulated and real datasets that the ZCNT distance is a close approximation to the CNT distance. 
Extending our polynomial time algorithm for the ZCNT small parsimony problem, we develop an algorithm, Lazac, for solving the large parsimony problem on copy number profiles. We demonstrate that Lazac outperforms existing methods for inferring copy number phylogenies on both simulated and real data.
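
The effect of letting events alter zero-copy regions can be seen in a small calculation. If any contiguous segment may be incremented or decremented by one regardless of its current copy number, each event changes exactly two entries of the first-difference vector of the zero-padded profile difference, so the minimum number of events is half that vector's L1 norm. The sketch below implements this counting argument for a single chromosome. It is offered as an illustration of the relaxation, and the paper's exact closed-form expression may differ in details such as chromosome or allele handling. The name `interval_event_distance` is hypothetical.

```python
def interval_event_distance(profile_u, profile_v):
    """Minimum number of +1/-1 contiguous-segment events turning u into v.

    Assumption: events may act on any segment, including zero-copy
    regions, and both profiles describe one chromosome with the same bins.
    """
    if len(profile_u) != len(profile_v):
        raise ValueError("profiles must have the same number of bins")
    diff = [v - u for u, v in zip(profile_u, profile_v)]
    padded = [0] + diff + [0]
    # First differences of the padded vector; these always sum to zero.
    deltas = [padded[i] - padded[i - 1] for i in range(1, len(padded))]
    # Each event adds +1 to one delta and -1 to another.
    return sum(abs(d) for d in deltas) // 2


single_gain = interval_event_distance([2, 2, 2, 2], [2, 3, 3, 2])
two_gains = interval_event_distance([1, 1, 1], [2, 1, 2])
```

Because the result depends only on the difference of the two profiles, it is symmetric in its arguments, which is consistent with the abstract's remark that the ZCNT distance forms a metric.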


Subjects
DNA Copy Number Variations, Neoplasms, Humans, Phylogeny, DNA Copy Number Variations/genetics, Neoplasms/genetics, Genomics/methods, Genome, Algorithms
12.
bioRxiv ; 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37873258

ABSTRACT

Spatially resolved transcriptomics technologies provide high-throughput measurements of gene expression in a tissue slice, but the sparsity of this data complicates the analysis of spatial gene expression patterns such as gene expression gradients. We address these issues by deriving a topographic map of a tissue slice, analogous to a map of elevation in a landscape, using a novel quantity called the isodepth. Contours of constant isodepth enclose spatial domains with distinct cell type composition, while gradients of the isodepth indicate spatial directions of maximum change in gene expression. We develop GASTON, an unsupervised and interpretable deep learning algorithm that simultaneously learns the isodepth, spatial gene expression gradients, and piecewise linear functions of the isodepth that model both continuous gradients and discontinuous spatial variation in the expression of individual genes. We validate GASTON by showing that it accurately identifies spatial domains and marker genes across several biological systems. In SRT data from the brain, GASTON reveals gradients of neuronal differentiation and firing, and in SRT data from a tumor sample, GASTON infers gradients of metabolic activity and epithelial-mesenchymal transition (EMT)-related gene expression in the tumor microenvironment.
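
A piecewise linear function of the isodepth, with one slope and intercept per spatial domain, can be sketched as follows. The breakpoints, coefficients, and the name `piecewise_linear_expression` are illustrative assumptions. GASTON learns such quantities with a neural network, which this sketch does not attempt.

```python
from bisect import bisect_right


def piecewise_linear_expression(isodepth, breakpoints, slopes, intercepts):
    """Evaluate a piecewise linear expression model at one isodepth value.

    breakpoints: sorted isodepth values separating adjacent domains.
    slopes, intercepts: one pair per domain (len(breakpoints) + 1 each).
    Slopes model continuous gradients within a domain; a jump between
    adjacent pieces models a discontinuity at a domain boundary.
    """
    if len(slopes) != len(breakpoints) + 1 or len(intercepts) != len(slopes):
        raise ValueError("need exactly one slope and intercept per domain")
    domain = bisect_right(breakpoints, isodepth)
    return slopes[domain] * isodepth + intercepts[domain]


# Two hypothetical domains separated at isodepth 1.0.
inside_first = piecewise_linear_expression(0.5, [1.0], [2.0, -1.0], [0.0, 5.0])
inside_second = piecewise_linear_expression(2.0, [1.0], [2.0, -1.0], [0.0, 5.0])
```

In this example the first piece approaches 2.0 at the boundary while the second starts at 4.0, so the function has a jump at isodepth 1.0 in addition to opposite gradients on either side.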

13.
NAR Cancer ; 5(3): zcad045, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37636316

ABSTRACT

Androgen receptor (AR) inhibition is standard of care for advanced prostate cancer (PC). However, efficacy is limited by progression to castration-resistant PC (CRPC), usually due to AR re-activation via mechanisms that include AR amplification and structural rearrangement. These two classes of AR alterations often co-occur in CRPC tumors, but it is unclear whether this reflects intercellular or intracellular heterogeneity of AR. Resolving this is important for developing new therapies and predictive biomarkers. Here, we analyzed 41 CRPC tumors and 6 patient-derived xenografts (PDXs) using linked-read DNA-sequencing, and identified 7 tumors that developed complex, multiply-rearranged AR gene structures in conjunction with very high AR copy number. Analysis of PDX models by optical genome mapping and fluorescence in situ hybridization showed that AR residing on extrachromosomal DNA (ecDNA) was an underlying mechanism, and was associated with elevated levels and diversity of AR expression. This study identifies co-evolution of AR gene copy number and structural complexity via ecDNA as a mechanism associated with endocrine therapy resistance.

14.
Genome Res ; 33(7): 1124-1132, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37553263

ABSTRACT

Spatially resolved transcriptomics (SRT) technologies measure messenger RNA (mRNA) expression at thousands of locations in a tissue slice. However, nearly all SRT technologies measure expression in two-dimensional (2D) slices extracted from a 3D tissue, thus losing information that is shared across multiple slices from the same tissue. Integrating SRT data across multiple slices can help recover this information and improve downstream expression analyses, but multislice alignment and integration remains a challenging task. Existing methods for integrating SRT data either do not use spatial information or assume that the morphology of the tissue is largely preserved across slices, an assumption that is often violated because of biological or technical reasons. We introduce PASTE2, a method for partial alignment and 3D reconstruction of multislice SRT data sets, allowing only partial overlap between aligned slices and/or slice-specific cell types. PASTE2 formulates a novel partial fused Gromov-Wasserstein optimal transport problem, which we solve using a conditional gradient algorithm. PASTE2 includes a model selection procedure to estimate the fraction of overlap between slices, and optionally uses information from histological images that accompany some SRT experiments. We show on both simulated and real data that PASTE2 obtains more accurate alignments than existing methods. We further use PASTE2 to reconstruct a 3D map of gene expression in a Drosophila embryo from a 16 slice Stereo-seq data set. PASTE2 produces accurate alignments of multislice data sets from multiple SRT technologies, enabling detailed studies of spatial gene expression across a wide range of biological applications.


Subjects
Algorithms, Transcriptome
15.
bioRxiv ; 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37502835

ABSTRACT

Multi-region DNA sequencing of primary tumors and metastases from individual patients helps identify somatic aberrations driving cancer development. However, most methods to infer copy-number aberrations (CNAs) analyze individual samples. We introduce HATCHet2 to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 introduces a novel statistic, the mirrored haplotype B-allele frequency (mhBAF), to identify mirrored-subclonal CNAs having different numbers of copies of parental haplotypes in different tumor clones. HATCHet2 also has high accuracy in identifying focal CNAs and extends the earlier HATCHet method in several directions. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 50 prostate cancer samples from 10 patients reveals previously-unreported mirrored-subclonal CNAs affecting cancer genes.

16.
Cancer Res Commun ; 3(4): 564-575, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37066022

ABSTRACT

Osteosarcoma is an aggressive malignancy characterized by high genomic complexity. Identification of few recurrent mutations in protein coding genes suggests that somatic copy-number aberrations (SCNA) are the genetic drivers of disease. Models around genomic instability conflict: it is unclear whether osteosarcomas result from pervasive ongoing clonal evolution with continuous optimization of the fitness landscape or an early catastrophic event followed by stable maintenance of an abnormal genome. We address this question by investigating SCNAs in >12,000 tumor cells obtained from human osteosarcomas using single-cell DNA sequencing, with a degree of precision and accuracy not possible when inferring single-cell states using bulk sequencing. Using the CHISEL algorithm, we inferred allele- and haplotype-specific SCNAs from this whole-genome single-cell DNA sequencing data. Surprisingly, despite extensive structural complexity, these tumors exhibit a high degree of cell-cell homogeneity with little subclonal diversification. Longitudinal analysis of patient samples obtained at distant therapeutic timepoints (diagnosis, relapse) demonstrated remarkable conservation of SCNA profiles over tumor evolution. Phylogenetic analysis suggests that the majority of SCNAs were acquired early in the oncogenic process, with relatively few structure-altering events arising in response to therapy or during adaptation to growth in metastatic tissues. These data further support the emerging hypothesis that early catastrophic events, rather than sustained genomic instability, give rise to structural complexity, which is then preserved over long periods of tumor developmental time. Significance: Chromosomally complex tumors are often described as genomically unstable.
However, determining whether complexity arises from remote time-limited events that give rise to structural alterations or a progressive accumulation of structural events in persistently unstable tumors has implications for diagnosis, biomarker assessment, mechanisms of treatment resistance, and represents a conceptual advance in our understanding of intratumoral heterogeneity and tumor evolution.


Subjects
Bone Neoplasms, Osteosarcoma, Humans, Phylogeny, DNA Copy Number Variations/genetics, Local Neoplasm Recurrence, Osteosarcoma/genetics, Genomic Instability/genetics, Bone Neoplasms/genetics
17.
bioRxiv ; 2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37090633

ABSTRACT

Motivation: New low-coverage single-cell DNA sequencing technologies enable the measurement of copy number profiles from thousands of individual cells within tumors. From this data, one can infer the evolutionary history of the tumor by modeling transformations of the genome via copy number aberrations. A widely used model to infer such copy number phylogenies is the copy number transformation (CNT) model in which a genome is represented by an integer vector and a copy number aberration is an event that either increases or decreases the number of copies of a contiguous segment of the genome. The CNT distance between a pair of copy number profiles is the minimum number of events required to transform one profile to another. While this distance can be computed efficiently, no efficient algorithm has been developed to find the most parsimonious phylogeny under the CNT model. Results: We introduce the zero-agnostic copy number transformation (ZCNT) model, a simplification of the CNT model that allows the amplification or deletion of regions with zero copies. We derive a closed form expression for the ZCNT distance between two copy number profiles and show that, unlike the CNT distance, the ZCNT distance forms a metric. We leverage the closed-form expression for the ZCNT distance and an alternative characterization of copy number profiles to derive polynomial time algorithms for two natural relaxations of the small parsimony problem on copy number profiles. While the alteration of zero copy number regions allowed under the ZCNT model is not biologically realistic, we show on both simulated and real datasets that the ZCNT distance is a close approximation to the CNT distance. Extending our polynomial time algorithm for the ZCNT small parsimony problem, we develop an algorithm, Lazac, for solving the large parsimony problem on copy number profiles. We demonstrate that Lazac outperforms existing methods for inferring copy number phylogenies on both simulated and real data.

18.
Nature ; 616(7955): 113-122, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36922587

ABSTRACT

Emerging spatial technologies, including spatial transcriptomics and spatial epigenomics, are becoming powerful tools for profiling of cellular states in the tissue context [1-5]. However, current methods capture only one layer of omics information at a time, precluding the possibility of examining the mechanistic relationship across the central dogma of molecular biology. Here, we present two technologies for spatially resolved, genome-wide, joint profiling of the epigenome and transcriptome by cosequencing chromatin accessibility and gene expression, or histone modifications (H3K27me3, H3K27ac or H3K4me3) and gene expression on the same tissue section at near-single-cell resolution. These were applied to embryonic and juvenile mouse brain, as well as adult human brain, to map how epigenetic mechanisms control transcriptional phenotype and cell dynamics in tissue. Although highly concordant tissue features were identified by either spatial epigenome or spatial transcriptome, we also observed distinct patterns, suggesting their differential roles in defining cell states. Linking epigenome to transcriptome pixel by pixel allows the uncovering of new insights in spatial epigenetic priming, differentiation and gene regulation within the tissue architecture. These technologies are of great interest in life science and biomedical research.


Subjects
Chromatin , Epigenome , Mammals , Transcriptome , Animals , Humans , Mice , Chromatin/genetics , Chromatin/metabolism , Epigenesis, Genetic , Epigenomics , Gene Expression Profiling , Gene Expression Regulation , Mammals/genetics , Histones/chemistry , Histones/metabolism , Single-Cell Analysis , Organ Specificity , Brain/embryology , Brain/metabolism , Aging/genetics

19.
bioRxiv ; 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36711528

ABSTRACT

Tumors consist of subpopulations of cells that harbor distinct collections of somatic mutations. These mutations range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). While many approaches infer tumor phylogenies using SNVs as phylogenetic markers, CNAs that overlap SNVs may lead to erroneous phylogenetic inference. Specifically, an SNV may be lost in a cell due to a deletion of the genomic segment containing the SNV. Unfortunately, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs. For instance, recent targeted scDNA-seq technologies, such as Mission Bio Tapestri, measure SNVs with high fidelity in individual cells, but yield much less reliable measurements of CNAs. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers and partial information about CNAs in the form of a clustering of cells with similar copy-number profiles. This copy-number clustering constrains where loss of SNVs can occur in the phylogeny. We develop ConDoR (Constrained Dollo Reconstruction), an algorithm to infer tumor phylogenies from targeted scDNA-seq data using the constrained k-Dollo model. We show that ConDoR outperforms existing methods on simulated data. We use ConDoR to analyze a new multi-region targeted scDNA-seq dataset of 2153 cells from a pancreatic ductal adenocarcinoma (PDAC) tumor and produce a phylogeny that is more plausible than those from existing methods and that conforms to histological results for the tumor from a previous study. We also analyze a metastatic colorectal cancer dataset, deriving a phylogeny that is more parsimonious than previously published analyses and that has a simpler monoclonal origin of metastasis than the one reported in the original study. Code availability: Software is available at https://github.com/raphael-group/constrained-Dollo.
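
As a rough illustration of the kind of constraint described above, the following sketch checks a labeled tree against one possible reading of the constrained k-Dollo model. It is not ConDoR's code. The function name `satisfies_constrained_dollo` and the data layout are hypothetical, and the rule that a loss may only sit on an edge whose endpoints belong to different copy-number clusters is an assumption about how the clustering constrains losses. The check also does not verify that each loss lies below the corresponding gain in the tree.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def satisfies_constrained_dollo(gains: Dict[Edge, Set[str]],
                                losses: Dict[Edge, Set[str]],
                                cluster: Dict[str, int],
                                k: int) -> bool:
    """Check a simplified, assumed version of the constrained k-Dollo rules."""
    gain_count = Counter(m for ms in gains.values() for m in ms)
    loss_count = Counter(m for ms in losses.values() for m in ms)
    # Dollo: every mutation that appears is gained exactly once.
    if any(count != 1 for count in gain_count.values()):
        return False
    # A mutation cannot be lost if it was never gained.
    if any(m not in gain_count for m in loss_count):
        return False
    # k-Dollo: every mutation is lost at most k times.
    if any(count > k for count in loss_count.values()):
        return False
    # Assumed constraint: losses only on edges that cross copy-number clusters.
    for (parent, child), ms in losses.items():
        if ms and cluster[parent] == cluster[child]:
            return False
    return True


# Chain root -> a -> b -> c; only node c is in a different copy-number cluster.
cluster = {"root": 0, "a": 0, "b": 0, "c": 1}
gains = {("root", "a"): {"m1"}}
allowed = satisfies_constrained_dollo(gains, {("b", "c"): {"m1"}}, cluster, k=1)
blocked = satisfies_constrained_dollo(gains, {("a", "b"): {"m1"}}, cluster, k=1)
print(allowed, blocked)
```

The second call is rejected because the loss of `m1` is placed on an edge inside a single copy-number cluster, where under this reading no deletion is available to explain it.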

20.
bioRxiv ; 2023 Jan 08.
Article in English | MEDLINE | ID: mdl-36711750

ABSTRACT

Spatially resolved transcriptomics (SRT) technologies measure mRNA expression at thousands of locations in a tissue slice. However, nearly all SRT technologies measure expression in two-dimensional slices extracted from a three-dimensional tissue, thus losing information that is shared across multiple slices from the same tissue. Integrating SRT data across multiple slices can help recover this information and improve downstream expression analyses, but multi-slice alignment and integration remain challenging tasks. Existing methods for integrating SRT data either do not use spatial information or assume that the morphology of the tissue is largely preserved across slices, an assumption that is often violated for biological or technical reasons. We introduce PASTE2, a method for partial alignment and 3D reconstruction of multi-slice SRT datasets, allowing only partial overlap between aligned slices and/or slice-specific cell types. PASTE2 formulates a novel partial Fused Gromov-Wasserstein Optimal Transport problem, which we solve using a conditional gradient algorithm. PASTE2 includes a model selection procedure to estimate the fraction of overlap between slices, and optionally uses information from histological images that accompany some SRT experiments. We show on both simulated and real data that PASTE2 obtains more accurate alignments than existing methods. We further use PASTE2 to reconstruct a 3D map of gene expression in a Drosophila embryo from a 16-slice Stereo-seq dataset. PASTE2 produces accurate alignments of multi-slice datasets from multiple SRT technologies, enabling detailed studies of spatial gene expression across a wide range of biological applications. Code availability: Software is available at https://github.com/raphael-group/paste2.
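
To make the optimization problem named above more concrete, here is a minimal sketch of a generic fused Gromov-Wasserstein objective together with a partial-transport feasibility check. It is written in plain Python and is not the PASTE2 implementation. The function names `fused_gw_cost` and `is_partial_plan`, the weighting convention `(1 - alpha) * expression term + alpha * spatial term`, and the toy matrices are all assumptions for illustration. Here `C` holds expression dissimilarities between spots of two slices, `D1` and `D2` hold within-slice spatial distances, and `s` is the fraction of mass that must be transported.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def fused_gw_cost(C: Matrix, D1: Matrix, D2: Matrix, pi: Matrix,
                  alpha: float) -> float:
    """Generic fused Gromov-Wasserstein objective for a transport plan pi."""
    n, m = len(pi), len(pi[0])
    # Linear term: expression dissimilarity between matched spots.
    linear = sum(C[i][j] * pi[i][j] for i in range(n) for j in range(m))
    # Quadratic term: distortion of within-slice spatial distances.
    quadratic = sum(
        (D1[i][k] - D2[j][l]) ** 2 * pi[i][j] * pi[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * linear + alpha * quadratic


def is_partial_plan(pi: Matrix, a: Sequence[float], b: Sequence[float],
                    s: float, tol: float = 1e-9) -> bool:
    """Check partial-transport constraints: marginals at most a and b, total mass s."""
    rows = [sum(row) for row in pi]
    cols = [sum(pi[i][j] for i in range(len(pi))) for j in range(len(pi[0]))]
    nonnegative = all(x >= -tol for row in pi for x in row)
    rows_ok = all(r <= ai + tol for r, ai in zip(rows, a))
    cols_ok = all(c <= bj + tol for c, bj in zip(cols, b))
    return nonnegative and rows_ok and cols_ok and abs(sum(rows) - s) <= tol


# Toy example: two slices with two spots each.
C = [[0.0, 1.0], [1.0, 0.0]]
D1 = [[0.0, 1.0], [1.0, 0.0]]
D2 = [[0.0, 1.0], [1.0, 0.0]]
a = b = [0.5, 0.5]

matched = [[0.5, 0.0], [0.0, 0.5]]   # each spot mapped to its counterpart
swapped = [[0.0, 0.5], [0.5, 0.0]]   # spots mapped crosswise
half = [[0.5, 0.0], [0.0, 0.0]]      # only half of the mass transported

cost_matched = fused_gw_cost(C, D1, D2, matched, alpha=0.1)
cost_swapped = fused_gw_cost(C, D1, D2, swapped, alpha=0.1)
print(cost_matched, cost_swapped)
print(is_partial_plan(half, a, b, s=0.5), is_partial_plan(half, a, b, s=1.0))
```

In the toy example the crosswise plan preserves spatial distances but pays the expression cost, so it scores worse than the matched plan. The plan `half` is feasible only when the required overlap fraction `s` is 0.5, which mirrors the idea of aligning slices that overlap only partially.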

...