Fast intratumor heterogeneity inference from single-cell sequencing data.

Kizilkale, Can; Rashidi Mehrabadi, Farid; Sadeqi Azer, Erfan; Pérez-Guijarro, Eva; Marie, Kerrie L; Lee, Maxwell P; Day, Chi-Ping; Merlino, Glenn; Ergün, Funda; Buluç, Aydin; Sahinalp, S Cenk; Malikic, Salem.

Nat Comput Sci ; 2(9): 577-583, 2022 Sep.

Article in English | MEDLINE | ID: mdl-38177468

ABSTRACT

We introduce HUNTRESS, a computational method for mutational intratumor heterogeneity inference from noisy genotype matrices derived from single-cell sequencing data, the running time of which is linear with the number of cells and quadratic with the number of mutations. We prove that, under reasonable conditions, HUNTRESS computes the true progression history of a tumor with high probability. On simulated and real tumor sequencing data, HUNTRESS is demonstrated to be faster than available alternatives with comparable or better accuracy. Additionally, the progression histories of tumors inferred by HUNTRESS on real single-cell sequencing datasets agree with the best known evolution scenarios for the associated tumors.

Subject(s)

Neoplasms , Humans , Neoplasms/genetics , Sequence Analysis , Mutation

PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem.

Sadeqi Azer, Erfan; Rashidi Mehrabadi, Farid; Malikic, Salem; Li, Xuan Cindy; Bartok, Osnat; Litchfield, Kevin; Levy, Ronen; Samuels, Yardena; Schäffer, Alejandro A; Gertz, E Michael; Day, Chi-Ping; Pérez-Guijarro, Eva; Marie, Kerrie; Lee, Maxwell P; Merlino, Glenn; Ergun, Funda; Sahinalp, S Cenk.

Bioinformatics ; 36(Suppl_1): i169-i176, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657358

ABSTRACT

MOTIVATION: Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS: We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION: https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
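
The abstract above refers to a perfect phylogeny on a binary genotype matrix. As a minimal sketch (not the PhISCS-BnB algorithm itself), the standard three-gametes test checks whether a cells-by-mutations matrix already admits a perfect phylogeny rooted at the all-zero genotype: no pair of mutation columns may show all three patterns (1,1), (1,0) and (0,1). The function names and the toy matrices are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def has_three_gametes_conflict(col_a, col_b):
    """Return True if two mutation columns show (1,1), (1,0) and (0,1)."""
    seen = set(zip(col_a, col_b))
    return {(1, 1), (1, 0), (0, 1)} <= seen


def is_conflict_free(matrix):
    """Check a binary cells-by-mutations matrix for a perfect phylogeny.

    Rows are cells and columns are mutations (an assumed layout). The matrix
    is conflict-free when no pair of columns has a three-gametes conflict.
    """
    columns = list(zip(*matrix))
    return not any(
        has_three_gametes_conflict(a, b) for a, b in combinations(columns, 2)
    )


# Toy data (hypothetical): 4 cells x 3 mutations, consistent with a tree.
tree_like = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Toy data (hypothetical): the two mutations conflict with each other.
conflicting = [
    [1, 0],
    [1, 1],
    [0, 1],
]

print(is_conflict_free(tree_like))    # True
print(is_conflict_free(conflicting))  # False
```

This pairwise check takes time quadratic in the number of mutations and linear in the number of cells. It only tests a given matrix; deciding which entries of a noisy matrix to correct so that the test passes is the hard part that methods such as the one described in the abstract address.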

Subject(s)

Algorithms , Neoplasms , Humans , Markov Chains , Neoplasms/genetics , Phylogeny , Sequence Analysis , Software

Identifying uniformly mutated segments within repeats.

Sahinalp, S Cenk; Eichler, Evan; Goldberg, Paul; Berenbrink, Petra; Friedetzky, Tom; Ergun, Funda.

J Bioinform Comput Biol ; 2(4): 657-68, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15617159

ABSTRACT

Given a long string of characters from a constant size alphabet we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible n-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the n coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability of a coin change is much lower than the probability of using the same coin repeatedly. We present a procedure to evaluate the likelihood of an n-coin model for given S, subject to a uniform prior distribution over the parameters of the model (that represent mutation rates and probabilities of copying events). In the absence of detailed prior knowledge of these parameters, the algorithm can be used to determine whether the a posteriori probability for n=1 is higher than for any other n>1. Our algorithm runs in time O(l^4 log l), where l is the length of S, through a dynamic programming approach which exploits the assumed convexity of the a posteriori probability for n. Our test can be used in the analysis of long alignments between pairs of genomic sequences in a number of ways. For example, functional regions in genome sequences exhibit much lower mutation rates than non-functional regions. Because our test provides means for determining variations in the mutation rate, it may be used to distinguish functional regions from non-functional ones. Another application is in determining whether two highly similar, thus evolutionarily related, genome segments are the result of a single copy event or of a complex series of copy events. This is particularly an issue in evolutionary studies of genome regions rich with repeat segments (especially tandemly repeated segments).
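
To illustrate what evaluating a coin model under a uniform prior involves, here is a deliberately simplified sketch, not the dynamic program described in the abstract. For a single coin with unknown bias p and a uniform prior on p, the marginal likelihood of a binary string with k ones among l bits is the Beta integral k!(l-k)!/(l+1)!. The sketch also scores a fixed segmentation in which each segment has its own coin. It ignores the coin-change probabilities of the random walk, which the full model includes; the function names and the example string are hypothetical.

```python
from math import lgamma


def log_marginal_one_coin(bits):
    """Log marginal likelihood of bits under one coin with a uniform prior.

    Integrating p^k (1-p)^(l-k) over p in [0, 1] gives k!(l-k)!/(l+1)!.
    """
    length = len(bits)
    ones = sum(bits)
    return lgamma(ones + 1) + lgamma(length - ones + 1) - lgamma(length + 2)


def log_marginal_segments(bits, breakpoints):
    """Log marginal likelihood when each segment uses its own coin.

    `breakpoints` lists the indices where a new segment starts. Simplifying
    assumption: the probability of switching coins is not modeled here.
    """
    edges = [0, *breakpoints, len(bits)]
    return sum(
        log_marginal_one_coin(bits[start:end])
        for start, end in zip(edges, edges[1:])
    )


# Hypothetical string: ten ones followed by ten zeros.
bits = [1] * 10 + [0] * 10

single = log_marginal_one_coin(bits)
split = log_marginal_segments(bits, [10])
print(split > single)  # True: two coins explain this string better
```

In the full model a penalty for each coin change would offset part of the gain from adding segments, which is why this comparison on its own is not a test for n=1 versus n>1.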

Subject(s)

Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Repetitive Sequences, Nucleic Acid/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Gene Dosage , Models, Genetic , Models, Statistical , Sequence Homology, Nucleic Acid
