Results 1 - 14 of 14
1.
Bioinformatics ; 39(39 Suppl 1): i204-i212, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387177

ABSTRACT

MOTIVATION: The acquisition of somatic mutations by a tumor can be modeled by a type of evolutionary tree. However, it is impossible to observe this tree directly. Instead, numerous algorithms have been developed to infer such a tree from different types of sequencing data. But such methods can produce conflicting trees for the same patient, making it desirable to have approaches that can combine several such tumor trees into a consensus or summary tree. We introduce The Weighted m-Tumor Tree Consensus Problem (W-m-TTCP) to find a consensus tree among multiple plausible tumor evolutionary histories, each assigned a confidence weight, given a specific distance measure between tumor trees. We present an algorithm called TuELiP that is based on integer linear programming which solves the W-m-TTCP, and unlike other existing consensus methods, allows the input trees to be weighted differently. RESULTS: On simulated data we show that TuELiP outperforms two existing methods at correctly identifying the true underlying tree used to create the simulations. We also show that the incorporation of weights can lead to more accurate tree inference. On a Triple-Negative Breast Cancer dataset, we show that including confidence weights can have important impacts on the consensus tree identified. AVAILABILITY: An implementation of TuELiP and simulated datasets are available at https://bitbucket.org/oesperlab/consensus-ilp/src/main/.
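To show how confidence weights enter a consensus objective of this kind, here is a small derivation that relies on assumptions not taken from the abstract. All trees are taken to be rooted trees on the same n mutations, and d is taken to be the parent-child distance, meaning the size of the symmetric difference of the edge sets. The abstract does not say that TuELiP uses this distance or this reduction.

```latex
\begin{aligned}
T^{*} &= \arg\min_{T} \sum_{i=1}^{m} w_i \, d(T, T_i),
\qquad d(T, T_i) = \lvert E(T) \,\triangle\, E(T_i) \rvert \\
\lvert E(T) \,\triangle\, E(T_i) \rvert &= 2(n-1) - 2\,\lvert E(T) \cap E(T_i) \rvert
\quad \text{(every tree has } n-1 \text{ edges)} \\
\Rightarrow\; T^{*} &= \arg\max_{T} \sum_{e \in E(T)} \omega(e),
\qquad \omega(e) = \sum_{i \,:\, e \in E(T_i)} w_i
\end{aligned}
```

When all w_i = 1, ω(e) is just the number of input trees that contain edge e. Unequal weights shift the optimum toward edges found in high-confidence trees.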


Subject(s)
Algorithms , Triple Negative Breast Neoplasms , Humans , Consensus , Biological Evolution , Programming, Linear
2.
Article in English | MEDLINE | ID: mdl-33031032

ABSTRACT

We consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider do not have the same set of labels applied to the leaves of each tree. We describe several distance measures between these tumor trees. Our GraPhyC algorithm solves the consensus problem using a weighted directed graph where vertices are sets of mutations and edges are weighted based on the number of times a parental relationship is observed between their constituent mutations in the input trees. We find a minimum weight spanning arborescence in this graph and prove that it minimizes the total distance to all input trees for one of our distance measures. We also describe several extensions of our GraPhyC approach. On simulated data we show that GraPhyC outperforms a baseline method and demonstrate that GraPhyC can be an effective means of computing centroids in k-medians clustering. We analyze two real sequencing datasets and find that GraPhyC is able to identify a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.
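The sketch below illustrates the graph-based idea described above under stated simplifications. Each vertex is a single mutation rather than a set of mutations. The consensus is found by exhaustive search, which only works for tiny inputs, in place of an efficient arborescence algorithm such as Edmonds'. The search maximizes the total parent-child count. This plays the role of GraPhyC's minimum-weight search on the assumption that an edge's weight decreases as its count grows. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def count_parent_edges(trees):
    """Count how many input trees contain each (parent, child) edge.

    Each tree is a dict mapping child -> parent, with the root mapped to None.
    """
    counts = Counter()
    for tree in trees:
        for child, parent in tree.items():
            if parent is not None:
                counts[(parent, child)] += 1
    return counts


def is_arborescence(parent_of, root):
    """True if every node reaches `root` by following parent pointers (no cycles)."""
    for start in parent_of:
        seen, current = set(), start
        while current != root:
            if current in seen:
                return False  # cycle that never reaches the root
            seen.add(current)
            current = parent_of[current]
    return True


def best_arborescence(nodes, root, counts):
    """Exhaustively find the spanning arborescence with the largest total edge count.

    Exponential in the number of nodes: a stand-in for Edmonds' algorithm on tiny inputs.
    """
    others = [v for v in nodes if v != root]
    best, best_score = None, float("-inf")
    for parents in product(nodes, repeat=len(others)):
        parent_of = dict(zip(others, parents))
        if any(child == parent for child, parent in parent_of.items()):
            continue  # skip self-loops
        if not is_arborescence(parent_of, root):
            continue
        score = sum(counts[(parent, child)] for child, parent in parent_of.items())
        if score > best_score:
            best, best_score = parent_of, score
    return best, best_score


trees = [
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "A", "C": "A"},
    {"A": None, "B": "A", "C": "B"},
]
counts = count_parent_edges(trees)
consensus, score = best_arborescence(["A", "B", "C"], "A", counts)
print(consensus, score)  # {'B': 'A', 'C': 'B'} 5
```

In this toy input the edge A→B appears in all three trees and B→C appears in two of them, so the chain A→B→C is selected.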


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Consensus , Humans , Neoplasms/genetics , Phylogeny
3.
Pac Symp Biocomput ; 27: 397-401, 2022.
Article in English | MEDLINE | ID: mdl-34890166

ABSTRACT

Cancer results from an evolutionary process that yields a heterogeneous tumor with distinct subpopulations and varying sets of somatic mutations. This perspective discusses computational methods to infer models of evolutionary processes in cancer that aim to improve our understanding of tumorigenesis and ultimately enhance current clinical practice.


Subject(s)
Computational Biology , Neoplasms , Humans , Mutation , Neoplasms/genetics
4.
Bioinformatics ; 36(7): 2090-2097, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31750900

ABSTRACT

MOTIVATION: There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. RESULTS: Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. AVAILABILITY AND IMPLEMENTATION: Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
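To make the common-ancestor-set idea concrete, here is a minimal sketch of a CASet-style distance built on several assumptions. A mutation's ancestor set includes the mutation itself. Two common-ancestor sets are compared with the Jaccard distance. The result is averaged over unordered pairs of distinct mutations present in both trees. The published normalization and the set of pairs used in the paper may differ, DISC is not sketched here, and the function names are hypothetical.

```python
from itertools import combinations


def ancestor_sets(parent_of):
    """Map each mutation to the mutations on its path from the root, itself included."""
    sets = {}
    for node in parent_of:
        path, current = set(), node
        while current is not None:
            path.add(current)
            current = parent_of[current]
        sets[node] = path
    return sets


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def caset_like_distance(tree1, tree2):
    """Mean Jaccard distance between common-ancestor sets over pairs of shared mutations."""
    anc1, anc2 = ancestor_sets(tree1), ancestor_sets(tree2)
    shared = sorted(set(anc1) & set(anc2))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard_distance(anc1[i] & anc1[j], anc2[i] & anc2[j]) for i, j in pairs
    )
    return total / len(pairs)


linear = {"A": None, "B": "A", "C": "B"}
branching = {"A": None, "B": "A", "C": "A"}
print(caset_like_distance(linear, branching))  # 0.16666666666666666
```

Only the pair (B, C) contributes. Its common ancestors are {A, B} in the linear tree and {A} in the branching tree, which gives a Jaccard distance of 0.5. Averaged over three pairs, the result is 1/6.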


Subject(s)
Algorithms , Neoplasms , Biological Evolution , Humans , Phylogeny
5.
BMC Med Genomics ; 12(Suppl 10): 184, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31865909

ABSTRACT

BACKGROUND: Accurate inference of the evolutionary history of a tumor has important implications for understanding and potentially treating the disease. While a number of methods have been proposed to reconstruct the evolutionary history of a tumor from DNA sequencing data, it is not clear how aspects of the sequencing data and tumor itself affect these reconstructions. METHODS: We investigate when and how well these histories can be reconstructed from multi-sample bulk sequencing data when considering only single nucleotide variants (SNVs). Specifically, we examine the space of all possible tumor phylogenies under the infinite sites assumption (ISA) using several approaches for enumerating phylogenies consistent with the sequencing data. RESULTS: On noisy simulated data, we find that the ISA is often violated and that low coverage and high noise make it more difficult to identify phylogenies. Additionally, we find that evolutionary trees with branching topologies are easier to reconstruct accurately. We also apply our reconstruction methods to both chronic lymphocytic leukemia and clear cell renal cell carcinoma datasets and confirm that ISA violations are common in practice, especially in lower-coverage sequencing data. Nonetheless, we show that an ISA-based approach can be relaxed to produce high-quality phylogenies. CONCLUSIONS: Consideration of practical aspects of sequencing data such as coverage or the model of tumor evolution (branching, linear, etc.) is essential to effectively using the output of tumor phylogeny inference methods. Additionally, these factors should be considered in the development of new inference methods.
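The infinite sites assumption, with no mutation losses, implies a necessary condition on the data: an ancestral mutation must be present in at least as large a fraction of cells as each of its descendants in every sample. The sketch below applies that condition to list the ancestor-descendant orderings the data permit. It does not enumerate full phylogenies. The function name, the input layout (cancer cell fractions rather than raw read counts) and the optional `tolerance` for noise are assumptions made for illustration.

```python
def ancestry_graph(cell_fractions, tolerance=0.0):
    """Return ordered pairs (i, j) where i may be an ancestor of j under the ISA.

    cell_fractions maps mutation -> list of per-sample cancer cell fractions.
    i may precede j only if its fraction is >= j's (minus `tolerance`) in every sample.
    """
    edges = set()
    for i, fi in cell_fractions.items():
        for j, fj in cell_fractions.items():
            if i != j and all(a >= b - tolerance for a, b in zip(fi, fj)):
                edges.add((i, j))
    return edges


fractions = {"m1": [0.9, 0.8], "m2": [0.5, 0.6], "m3": [0.4, 0.7]}
print(sorted(ancestry_graph(fractions)))  # [('m1', 'm2'), ('m1', 'm3')]
```

Here neither m2 nor m3 can precede the other, because their ordering flips between samples, so they must sit on separate branches below m1. Noisy, low-coverage estimates can break the condition for true ancestor pairs, which is why a `tolerance` is exposed.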


Subject(s)
Computational Biology/methods , Evolution, Molecular , Neoplasms/genetics , Phylogeny , Gene Frequency , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
6.
Bioinformatics ; 34(2): 346-352, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29186385

ABSTRACT

MOTIVATION: The traditional view of cancer evolution states that a cancer genome accumulates a sequential ordering of mutations over a long period of time. However, in recent years it has been suggested that a cancer genome may instead undergo a one-time catastrophic event, such as chromothripsis, where a large number of mutations instead occur simultaneously. A number of potential signatures of chromothripsis have been proposed. In this work, we provide a rigorous formulation and analysis of the 'ability to walk the derivative chromosome' signature originally proposed by Korbel and Campbell. In particular, we show that this signature, as originally envisioned, may not always be present in a chromothripsis genome and we provide a precise quantification of under what circumstances it would be present. We also propose a variation on this signature, the H/T alternating fraction, which allows us to overcome some of the limitations of the original signature. RESULTS: We apply our measure to both simulated data and a previously analyzed real cancer dataset and find that the H/T alternating fraction may provide useful signal for distinguishing genomes having acquired mutations simultaneously from those acquired in a sequential fashion. AVAILABILITY AND IMPLEMENTATION: An implementation of the H/T alternating fraction is available at https://bitbucket.org/oesperlab/ht-altfrac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
Cell Syst ; 3(1): 43-53, 2016 07.
Article in English | MEDLINE | ID: mdl-27467246

ABSTRACT

Phylogenetic techniques are increasingly applied to infer the somatic mutational history of a tumor from DNA sequencing data. However, standard phylogenetic tree reconstruction techniques do not account for the fact that bulk sequencing data measures mutations in a population of cells. We formulate and solve the multi-state perfect phylogeny mixture deconvolution problem of reconstructing a phylogenetic tree given mixtures of its leaves, under the multi-state perfect phylogeny, or infinite alleles model. Our somatic phylogeny reconstruction using combinatorial enumeration (SPRUCE) algorithm uses this model to construct phylogenetic trees jointly from single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). We show that SPRUCE addresses complexities in simultaneous analysis of SNVs and CNAs. In particular, there are often many possible phylogenetic trees consistent with the data, but the ambiguity decreases considerably with an increasing number of samples. These findings have implications for tumor sequencing strategies, suggest caution in drawing strong conclusions based on a single tree reconstruction, and explain difficulties faced by applying existing phylogenetic techniques to tumor sequencing data.


Subject(s)
Mutation , Neoplasms , Algorithms , Humans , Phylogeny , Sequence Analysis, DNA
8.
Bioinformatics ; 31(12): i62-70, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072510

ABSTRACT

MOTIVATION: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor. RESULTS: We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the variant allele frequency (VAF) factorization problem. We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAF factorization problem in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence VAFs. AVAILABILITY AND IMPLEMENTATION: An implementation of AncesTree is available at: http://compbio.cs.brown.edu/software.
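The check below is a minimal sketch of what a factorization-style test on a candidate tree can look like. It is not the AncesTree ILP. It assumes diploid, heterozygous mutations, so a mutation's cell fraction is twice its VAF. Under that assumption, each clone's usage in a sample equals its cell fraction minus the summed cell fractions of its children. A tree is consistent with the data if every usage is non-negative and the usages in each sample sum to at most 1. The function names are hypothetical.

```python
def clone_usages(vaf, parent_of):
    """Per-sample usage of each clone implied by a tree, assuming cell fraction = 2 * VAF.

    vaf: dict mutation -> list of per-sample VAFs.
    parent_of: dict mutation -> parent mutation (None for the root).
    """
    n_samples = len(next(iter(vaf.values())))
    children = {m: [] for m in parent_of}
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    return {
        m: [2 * vaf[m][p] - sum(2 * vaf[c][p] for c in children[m]) for p in range(n_samples)]
        for m in parent_of
    }


def satisfies_factorization(vaf, parent_of, eps=1e-9):
    """True if all usages are non-negative and each sample's usages sum to at most 1."""
    usage = clone_usages(vaf, parent_of)
    n_samples = len(next(iter(vaf.values())))
    for p in range(n_samples):
        column = [usage[m][p] for m in usage]
        if min(column) < -eps or sum(column) > 1 + eps:
            return False
    return True


vaf = {"A": [0.45, 0.45], "B": [0.20, 0.10], "C": [0.15, 0.25]}
branching = {"A": None, "B": "A", "C": "A"}
linear = {"A": None, "B": "A", "C": "B"}
print(satisfies_factorization(vaf, branching), satisfies_factorization(vaf, linear))  # True False
```

In the linear tree, C sits below B, yet C's cell fraction in the second sample (0.5) exceeds B's (0.2). That would give B a negative usage, so only the branching tree is consistent with these toy VAFs.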


Subject(s)
Algorithms , Clonal Evolution/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Neoplasms/classification , Neoplasms/genetics , Sequence Analysis, DNA/methods , Gene Frequency , Humans , Models, Statistical
9.
Bioinformatics ; 30(24): 3532-40, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25297070

ABSTRACT

MOTIVATION: Most tumor samples are a heterogeneous mixture of cells, including admixture by normal (non-cancerous) cells and subpopulations of cancerous cells with different complements of somatic aberrations. This intra-tumor heterogeneity complicates the analysis of somatic aberrations in DNA sequencing data from tumor samples. RESULTS: We describe an algorithm called THetA2 that infers the composition of a tumor sample, including not only tumor purity but also the number and content of tumor subpopulations, directly from both whole-genome (WGS) and whole-exome (WXS) high-throughput DNA sequencing data. This algorithm builds on our earlier Tumor Heterogeneity Analysis (THetA) algorithm in several important directions. These include improved ability to analyze highly rearranged genomes using a variety of data types: both WGS sequencing (including low ∼7× coverage) and WXS sequencing. We apply our improved THetA2 algorithm to WGS (including low-pass) and WXS sequence data from 18 samples from The Cancer Genome Atlas (TCGA). We find that the improved algorithm is substantially faster and identifies numerous tumor samples containing subclonal populations in the TCGA data, including in one highly rearranged sample for which other tumor purity estimation algorithms were unable to estimate tumor purity.


Subject(s)
Algorithms , Exome , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Breast Neoplasms/genetics , Female , Gene Frequency , Genomics , Humans , Lung Neoplasms/genetics , Models, Statistical
10.
Genome Med ; 6(1): 5, 2014.
Article in English | MEDLINE | ID: mdl-24479672

ABSTRACT

High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.
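One published score for the de novo, mutual-exclusivity-based approaches mentioned above is the weight used by the Dendrix algorithm: W(M) = 2|Γ(M)| − Σ_{g∈M} |Γ(g)|. Here Γ(g) is the set of samples in which gene g is mutated, and Γ(M) is the union of these sets over the gene set M. The sketch below only computes that weight; the search over gene sets is omitted, and the function and data names are hypothetical.

```python
def exclusivity_weight(gene_set, mutated_samples):
    """Dendrix-style weight: 2 * |samples covered| - sum of per-gene sample counts.

    mutated_samples maps gene -> set of sample IDs in which the gene is mutated.
    """
    covered = set()
    total = 0
    for gene in gene_set:
        samples = mutated_samples.get(gene, set())
        covered |= samples
        total += len(samples)
    return 2 * len(covered) - total


mutated = {"G1": {"s1", "s2"}, "G2": {"s3"}, "G3": {"s1", "s3"}}
print(exclusivity_weight(["G1", "G2"], mutated))  # 3
print(exclusivity_weight(["G1", "G3"], mutated))  # 2
```

G1 and G2 are mutated in disjoint samples, so their weight equals their coverage (3). G1 and G3 cover the same three samples but overlap in s1, and that overlap lowers the score by one.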

11.
BMC Genomics ; 15 Suppl 6: S4, 2014.
Article in English | MEDLINE | ID: mdl-25572114

ABSTRACT

BACKGROUND: The evolution of a cancer genome has traditionally been described as a sequential accumulation of mutations - including chromosomal rearrangements - over a period of time. Recent research suggests, however, that numerous rearrangements may be acquired simultaneously during a single cataclysmic event, leading to the proposal of new mechanisms of rearrangement such as chromothripsis and chromoplexy. RESULTS: We introduce two measures, open adjacency rate (OAR) and copy-number asymmetry enrichment (CAE), that assess the prevalence of simultaneously formed breakpoints, or k-breaks with k > 2, compared to the sequential accumulation of standard rearrangements, or 2-breaks. We apply the OAR and the CAE to genome sequencing data from 121 cancer genomes from two different studies. CONCLUSIONS: We find that the OAR and CAE correlate well with previous analyses of chromothripsis/chromoplexy but make differing predictions on a small subset of genomes. These results lend support to the existence of simultaneous rearrangements, but also demonstrate the difficulty of characterizing such rearrangements using different criteria.


Subject(s)
Chromosome Breakpoints , Genome , Models, Genetic , Neoplasms/genetics , Translocation, Genetic , Algorithms , Animals , Humans
12.
Genome Biol ; 14(7): R80, 2013 Jul 29.
Article in English | MEDLINE | ID: mdl-23895164

ABSTRACT

Tumor samples are typically heterogeneous, containing admixture by normal, non-cancerous cells and one or more subpopulations of cancerous cells. Whole-genome sequencing of a tumor sample yields reads from this mixture, but does not directly reveal the cell of origin for each read. We introduce THetA (Tumor Heterogeneity Analysis), an algorithm that infers the most likely collection of genomes and their proportions in a sample, for the case where copy number aberrations distinguish subpopulations. THetA successfully estimates normal admixture and recovers clonal and subclonal copy number aberrations in real and simulated sequencing data. THetA is available at http://compbio.cs.brown.edu/software/.
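The sketch below shows a mixture model of the kind described above, built on several assumptions. Reads are assumed to land in each genomic interval in proportion to the interval's length times its proportion-weighted copy number. A candidate mixture is then scored by the multinomial log-likelihood of the observed read counts. THetA's actual search over copy-number profiles and proportions is not reproduced, and the function names and toy numbers are hypothetical.

```python
import math


def expected_read_fractions(copy_numbers, proportions, lengths):
    """Expected fraction of reads per interval for a mixture of genomes.

    copy_numbers[k][j]: copy number of interval j in genome k (e.g. genome 0 = normal).
    proportions[k]: fraction of cells from genome k (should sum to 1).
    lengths[j]: interval length; reads are assumed proportional to length * copies.
    """
    raw = [
        lengths[j] * sum(mu * cn[j] for mu, cn in zip(proportions, copy_numbers))
        for j in range(len(lengths))
    ]
    total = sum(raw)
    return [x / total for x in raw]


def multinomial_log_likelihood(read_counts, fractions):
    """Log-likelihood of read counts, omitting the constant multinomial coefficient.

    A zero fraction paired with a positive count raises ValueError (impossible model).
    """
    return sum(r * math.log(p) for r, p in zip(read_counts, fractions) if r > 0)


normal, tumor = [2, 2], [2, 4]
lengths, reads = [1.0, 1.0], [400, 600]
mixed = expected_read_fractions([normal, tumor], [0.5, 0.5], lengths)
pure_normal = expected_read_fractions([normal, tumor], [1.0, 0.0], lengths)
print(mixed)  # [0.4, 0.6]
print(multinomial_log_likelihood(reads, mixed) > multinomial_log_likelihood(reads, pure_normal))  # True
```

The 50/50 mixture with a tumor amplification in the second interval explains the 400/600 read split better than a purely normal sample does, so it receives the higher likelihood.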


Subject(s)
Breast Neoplasms/genetics , Genetic Heterogeneity , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Computer Simulation , Female , Genome, Human/genetics , Humans , Likelihood Functions , Statistics as Topic
13.
BMC Bioinformatics ; 13 Suppl 6: S10, 2012 Apr 19.
Article in English | MEDLINE | ID: mdl-22537039

ABSTRACT

BACKGROUND: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. RESULTS: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. CONCLUSIONS: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. 
An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.


Subject(s)
Algorithms , Genome, Human , Mutation , Ovarian Neoplasms/genetics , Chromosome Aberrations , DNA Copy Number Variations , Female , Humans , Sequence Analysis, DNA/methods
14.
Source Code Biol Med ; 6: 7, 2011 Apr 07.
Article in English | MEDLINE | ID: mdl-21473782

ABSTRACT

BACKGROUND: When biological networks are studied, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To understand the biological meaning of a cluster, the user usually has to sift through many textual annotations that are associated with biological entities. FINDINGS: The WordCloud Cytoscape plugin generates a visual summary of these annotations by displaying them as a tag cloud, where more frequent words are displayed using a larger font size. Word co-occurrence in a phrase can be visualized by arranging words in clusters or as a network. CONCLUSIONS: WordCloud provides a concise visual summary of annotations which is helpful for network analysis and interpretation. WordCloud is freely available at http://baderlab.org/Software/WordCloudPlugin.
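The abstract says that more frequent words are drawn in a larger font. One simple way to do that is a linear mapping between a minimum and a maximum font size, sketched below. The plugin's actual scaling rule is not given here, so the function name and the `min_size`/`max_size` defaults are hypothetical.

```python
from collections import Counter


def font_sizes(words, min_size=12, max_size=48):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, count in counts.items():
        if hi == lo:
            sizes[word] = (min_size + max_size) / 2  # all words equally frequent
        else:
            scale = (count - lo) / (hi - lo)
            sizes[word] = min_size + scale * (max_size - min_size)
    return sizes


words = ["kinase", "kinase", "kinase", "binding", "binding", "receptor"]
print(font_sizes(words))  # {'kinase': 48.0, 'binding': 30.0, 'receptor': 12.0}
```

A linear scale is easy to read, but a single very frequent word can squeeze all the others toward the minimum size. A logarithmic scale is a common alternative in that case.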
