Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
bioRxiv ; 2024 May 02.
Artigo em Inglês | MEDLINE | ID: mdl-38746464

RESUMO

The abundant discordance between evolutionary relationships across the genome has rekindled interest in ways of comparing and averaging trees on a shared leaf set. However, most attempts at reconciling trees have focused on tree topology, producing metrics for comparing topologies and methods for computing median tree topologies. Using branch lengths, however, has been more elusive, due to several challenges. Species tree branch lengths can be measured in many units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times. Thus, branch lengths are highly heterogeneous across both the genome and the tree. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we make progress on the question of mapping one tree to another, incorporating both topology and branch length. We define a series of computational problems to formalize finding the best transformation of one tree to another while maintaining its topology and other constraints. We show that all these problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. Our formulations lead to convex optimization problems, with efficient and theoretically optimal solutions. While many applications can be imagined for this framework, we apply it to measure species tree branch lengths in the unit of the expected number of substitutions per site while allowing divergence from ultrametricity across the tree. In these applications, our method matches or surpasses other methods designed directly for solving those problems. Thus, our approach provides a versatile toolkit that finds applications in similar evolutionary questions. Code availability: The software is available at https://github.com/shayesteh99/TCMM.git . Data availability: Data are available on Github https://github.com/shayesteh99/TCMM-Data.git .

2.
J Comput Biol ; 30(11): 1146-1181, 2023 11.
Artigo em Inglês | MEDLINE | ID: mdl-37902986

RESUMO

We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.


Assuntos
Algoritmos , Modelos Genéticos , Filogenia , Simulação por Computador
3.
Bioinformatics ; 39(39 Suppl 1): i185-i193, 2023 06 30.
Artigo em Inglês | MEDLINE | ID: mdl-37387151

RESUMO

MOTIVATION: Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern phylogenomic analyses often use methods that account for the heterogeneity of evolutionary histories across the genome due to processes such as incomplete lineage sorting. However, these methods typically do not generate branch lengths in units that are usable by downstream applications, forcing phylogenomic analyses to resort to alternative shortcuts such as estimating branch lengths by concatenating gene alignments into a supermatrix. Yet, concatenation and other available approaches for estimating branch lengths fail to address heterogeneity across the genome. RESULTS: In this article, we derive expected values of gene tree branch lengths in substitution units under an extension of the multispecies coalescent (MSC) model that allows substitutions with varying rates across the species tree. We present CASTLES, a new technique for estimating branch lengths on the species tree from estimated gene trees that uses these expected values, and our study shows that CASTLES improves on the most accurate prior methods with respect to both speed and accuracy. AVAILABILITY AND IMPLEMENTATION: CASTLES is available at https://github.com/ytabatabaee/CASTLES.


Assuntos
Evolução Biológica , Neoplasias Epiteliais e Glandulares , Humanos , Filogenia , Movimento Celular , Genômica
4.
Bioinform Adv ; 3(1): vbad015, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-36789293

RESUMO

Motivation: Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. The estimation of species trees from sets of gene family trees is challenging, and the estimation of rooted species trees presents additional analytical challenges. Two of the methods developed for this problem are STRIDE, which roots species trees by considering GDL events, and Quintet Rooting (QR), which roots species trees by considering ILS. Results: We present DISCO+QR, a new approach to rooting species trees that first uses DISCO to address GDL and then uses QR to perform rooting in the presence of ILS. DISCO+QR operates by taking the input gene family trees and decomposing them into single-copy trees using DISCO and then roots the given species tree using the information in the single-copy gene trees using QR. We show that the relative accuracy of STRIDE and DISCO+QR depend on the properties of the dataset (number of species, genes, rate of gene duplication, degree of ILS and gene tree estimation error), and that each provides advantages over the other under some conditions. Availability and implementation: DISCO and QR are available in github. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

5.
Bioinformatics ; 38(Suppl 1): i109-i117, 2022 06 24.
Artigo em Inglês | MEDLINE | ID: mdl-35758805

RESUMO

MOTIVATION: Rooted species trees are a basic model with multiple applications throughout biology, including understanding adaptation, biodiversity, phylogeography and co-evolution. Because most species tree estimation methods produce unrooted trees, methods for rooting these trees have been developed. However, most rooting methods either rely on prior biological knowledge or assume that evolution is close to clock-like, which is not usually the case. Furthermore, most prior rooting methods do not account for biological processes that create discordance between gene trees and species trees. RESULTS: We present Quintet Rooting (QR), a method for rooting species trees based on a proof of identifiability of the rooted species tree under the multi-species coalescent model established by Allman, Degnan and Rhodes (J. Math. Biol., 2011). We show that QR is generally more accurate than other rooting methods, except under extreme levels of gene tree estimation error. AVAILABILITY AND IMPLEMENTATION: Quintet Rooting is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting. The simulated datasets used in this study are from a prior study and are available at https://www.ideals.illinois.edu/handle/2142/55319. The biological dataset used in this study is also from a prior study and is available at http://gigadb.org/dataset/101041. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Algoritmos , Modelos Genéticos , Filogenia
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...