Search | BVS Regional Portal

1.

Brauer and partition diagram models for phylogenetic trees and forests.

Francis, Andrew; Jarvis, Peter D.

Proc Math Phys Eng Sci ; 478(2262): 20220044, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35702594

Abstract

We introduce a correspondence between phylogenetic trees and Brauer diagrams, inspired by links between binary trees and matchings described by Diaconis and Holmes (1998 Proc. Natl Acad. Sci. USA 95, 14600-14602. (doi:10.1073/pnas.95.25.14600)). This correspondence gives rise to a range of semigroup structures on the set of phylogenetic trees and opens the prospect of many applications. We further extend the Diaconis-Holmes correspondence from binary trees to non-binary trees and to forests, showing, for instance, that the set of all forests is in bijection with the set of partitions of finite sets.
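As a rough illustration of the Diaconis-Holmes step that this work builds on, the Python sketch below maps rooted binary trees with leaves labelled 1..n to perfect matchings on {1, ..., 2n-2}. It assumes the usual labelling rule: among internal nodes whose two children are already labelled, the one with the smallest child label receives the next label n+1, n+2, ..., and the sibling pairs then form the matching. The helper names (`insert_leaf`, `all_trees`, `tree_to_matching`) are hypothetical, and the sketch covers only binary trees, not the Brauer-diagram, non-binary or forest constructions of the paper.

```python
def insert_leaf(tree, k):
    """Yield every tree obtained by attaching leaf k to an edge of `tree`,
    including a new edge above the current root."""
    yield (tree, k)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insert_leaf(left, k):
            yield (t, right)
        for t in insert_leaf(right, k):
            yield (left, t)


def all_trees(n):
    """All rooted binary trees on leaves 1..n (n >= 2), as nested tuples."""
    trees = [(1, 2)]
    for k in range(3, n + 1):
        trees = [t for tree in trees for t in insert_leaf(tree, k)]
    return trees


def internal_nodes(tree):
    """Yield every internal node (a 2-tuple) of the tree."""
    if isinstance(tree, tuple):
        yield tree
        yield from internal_nodes(tree[0])
        yield from internal_nodes(tree[1])


def tree_to_matching(tree, n):
    """Diaconis-Holmes style map from a tree on leaves 1..n to a perfect
    matching on {1, ..., 2n-2}, returned as a frozenset of 2-element sets."""
    label = {leaf: leaf for leaf in range(1, n + 1)}
    unlabelled = set(internal_nodes(tree))
    pairs = set()
    next_label = n + 1
    while unlabelled:
        # Internal nodes whose two children already carry labels.
        ready = [v for v in unlabelled if v[0] in label and v[1] in label]
        # Pick the one whose smaller child label is smallest.
        v = min(ready, key=lambda node: min(label[node[0]], label[node[1]]))
        pairs.add(frozenset((label[v[0]], label[v[1]])))
        label[v] = next_label  # the root ends up with 2n-1, outside the matching
        next_label += 1
        unlabelled.remove(v)
    return frozenset(pairs)


matchings = [tree_to_matching(t, 4) for t in all_trees(4)]
print(len(matchings), len(set(matchings)))  # 15 15
```

There are (2n-3)!! rooted binary trees on n labelled leaves and (2n-3)!! perfect matchings of 2n-2 points, so obtaining 15 distinct matchings for n = 4 is consistent with the map being a bijection.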

2.

Correction to: Matrix group structure and Markov invariants in the strand symmetric phylogenetic substitution model.

Jarvis, Peter D; Sumner, Jeremy G.

J Math Biol ; 82(7): 68, 2021 Jun 08.

Article in English | MEDLINE | ID: mdl-34101022

3.

Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants.

Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D.

J Math Biol ; 75(6-7): 1619-1654, 2017 12.

Article in English | MEDLINE | ID: mdl-28434023

Abstract

Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both approaches give rise to polynomial functions of sequence site patterns that, in expectation, either vanish for particular evolutionary trees (phylogenetic invariants) or have well-understood transformation properties (Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, or to what extent they can be used as practical tools for inferring phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we view these two polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison with other methods, our proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest because, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication is that, for models with more than two states (for example, DNA sequence alignments with four-state models), methods that rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent of the phylogenetic invariants.

Subjects

Phylogeny, Biostatistics/methods, Computer Simulation, DNA/genetics, Molecular Evolution, Markov Chains, Mathematical Concepts, Genetic Models, Sequence Alignment

4.

Maximum likelihood estimates of pairwise rearrangement distances.

Serdoz, Stuart; Egri-Nagy, Attila; Sumner, Jeremy; Holland, Barbara R; Jarvis, Peter D; Tanaka, Mark M; Francis, Andrew R.

J Theor Biol ; 423: 31-40, 2017 06 21.

Article in English | MEDLINE | ID: mdl-28435014

Abstract

Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. Distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. Corresponding corrections for genome rearrangement distances fall into three categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here, we introduce a maximum likelihood estimator (MLE) for the inversion distance between a pair of genomes, using a recently introduced group-theoretic approach to modelling inversions. This MLE functions as a corrected distance: in particular, we show that, because of the way sequences of inversions interact with each other, the minimal distance and the MLE distance can order the distances of two genomes from a third differently. The second aspect of this work tackles the problem of accounting for the symmetries of circular arrangements. Whereas a frame of reference is generally fixed and all computation made accordingly, this work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. This approach of accounting for symmetries can be applied to any existing correction method, and examples are offered.
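To make the frame-of-reference point concrete, the Python sketch below generates the 2n images of a circular arrangement of n ≥ 3 distinct genes under the dihedral group (n rotations and n reflections) and takes the minimum of an arbitrary distance over those images. It assumes unsigned gene orders and uses a positional Hamming count purely as a placeholder distance; the paper's MLE of inversion distance is not implemented here, and its way of combining the dihedral images may differ from taking a minimum. All function names are hypothetical.

```python
def dihedral_images(arrangement):
    """All rotations and reflections of a circular arrangement (as tuples)."""
    seq = list(arrangement)
    rotations = [tuple(seq[i:] + seq[:i]) for i in range(len(seq))]
    reflections = [tuple(reversed(r)) for r in rotations]
    return set(rotations + reflections)


def hamming(x, y):
    """Placeholder distance: number of positions holding different genes."""
    return sum(a != b for a, b in zip(x, y))


def frame_free_distance(x, y, distance=hamming):
    """Distance from x to the closest dihedral image of y, so the result does
    not depend on where the circle was cut or in which direction it was read."""
    return min(distance(tuple(x), g) for g in dihedral_images(y))


reference = (1, 2, 3, 4, 5)
print(len(dihedral_images(reference)))                  # 10
print(frame_free_distance(reference, (3, 4, 5, 1, 2)))  # 0 (a rotation)
print(frame_free_distance(reference, (5, 4, 3, 2, 1)))  # 0 (a reflection)
```

For signed gene orders, a reflection would also need to flip the orientation of every gene; the unsigned version above keeps the group action as simple as possible.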

Subjects

Molecular Evolution, Genome/genetics, Phylogeny, Likelihood Functions, Spatial Analysis

5.

Matrix group structure and Markov invariants in the strand symmetric phylogenetic substitution model.

Jarvis, Peter D; Sumner, Jeremy G.

J Math Biol ; 73(2): 259-82, 2016 08.

Article in English | MEDLINE | ID: mdl-26660305

Abstract

We consider the continuous-time presentation of the strand symmetric phylogenetic substitution model (in which rate parameters are unchanged under nucleotide permutations given by Watson-Crick base conjugation). Algebraic analysis of the model's underlying structure as a matrix group leads to a change of basis where the rate generator matrix is given by a two-part block decomposition. We apply representation theoretic techniques and, for any (fixed) number of phylogenetic taxa L and polynomial degree D of interest, provide the means to classify and enumerate the associated Markov invariants. In particular, in the quadratic and cubic cases we prove there are precisely [Formula: see text] and [Formula: see text] linearly independent Markov invariants, respectively. Additionally, we give the explicit polynomial forms of the Markov invariants for (i) the quadratic case with any number of taxa L, and (ii) the cubic case in the special case of a three-taxon phylogenetic tree. We close by showing our results are of practical interest since the quadratic Markov invariants provide independent estimates of phylogenetic distances based on (i) substitution rates within Watson-Crick conjugate pairs, and (ii) substitution rates across conjugate base pairs.
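The block decomposition can be seen in a small numerical check. Order the bases A, C, G, T and let the conjugation map A↔T, C↔G. Strand symmetry means Q[conj(i)][conj(j)] = Q[i][j], so Q commutes with the conjugation permutation and therefore preserves its +1 eigenspace (spanned by A+T and C+G) and its -1 eigenspace (spanned by A-T and C-G). The Python sketch below, with arbitrary random rates and hypothetical helper names, builds such a Q and shows that in this orthonormal basis the off-diagonal 2×2 blocks vanish. This is one basis with that property and is not claimed to be the exact basis used by the authors.

```python
import math
import random

# Base order A, C, G, T; Watson-Crick conjugation swaps A<->T and C<->G.
CONJ = [3, 2, 1, 0]


def strand_symmetric_rate_matrix(seed=1):
    """Random rate matrix with Q[CONJ[i]][CONJ[j]] == Q[i][j] and rows summing to 0."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j and q[i][j] == 0.0:
                rate = rng.uniform(0.1, 1.0)
                q[i][j] = rate
                q[CONJ[i]][CONJ[j]] = rate  # same rate for the conjugate substitution
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


s = 1 / math.sqrt(2)
# Rows: A+T, C+G (symmetric under conjugation), A-T, C-G (antisymmetric).
B = [[s, 0, 0, s],
     [0, s, s, 0],
     [s, 0, 0, -s],
     [0, s, -s, 0]]

Q = strand_symmetric_rate_matrix()
M = matmul(matmul(B, Q), transpose(B))  # Q expressed in the new basis
for row in M:
    # Entries in rows 0-1/columns 2-3 and rows 2-3/columns 0-1 are zero up to rounding.
    print(" ".join(f"{x:8.4f}" for x in row))
```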

Subjects

Classification/methods, Genetic Models, Phylogeny, Algorithms

6.

Lie Markov models with purine/pyrimidine symmetry.

Fernández-Sánchez, Jesús; Sumner, Jeremy G; Jarvis, Peter D; Woodhams, Michael D.

J Math Biol ; 70(4): 855-91, 2015 Mar.

Article in English | MEDLINE | ID: mdl-24723068

Abstract

Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of "Lie Markov models" which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras, and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines, that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.
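The Lie-algebra property can be tested mechanically: a linear space of rate matrices spanned by a basis is closed under the Lie bracket exactly when every commutator [A, B] = AB - BA of basis elements lies back in the span. The Python sketch below checks this with exact rational rank computations. The two examples are toy choices for illustration, not the purine/pyrimidine models classified in the paper: an F81-style family (the rate into state j depends only on j), for which [A_j - I, A_k - I] = (A_k - I) - (A_j - I), so the span is closed; and a space spanned by a single A→C rate and a single C→G rate, whose commutator introduces an A→G entry outside the span. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

N = 4  # states ordered A, C, G, T


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(N)] for i in range(N)]


def rank(vectors):
    """Exact rank of a list of equal-length vectors via Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def flat(m):
    return [x for row in m for x in row]


def is_lie_closed(basis):
    """True if the span of `basis` contains all pairwise commutators."""
    spans = [flat(m) for m in basis]
    brackets = [flat(commutator(a, b)) for a, b in combinations(basis, 2)]
    return rank(spans + brackets) == rank(spans)


# F81-style basis: A_j - I, where every row of A_j equals the unit vector e_j.
f81_basis = [[[(1 if c == j else 0) - (1 if r == c else 0) for c in range(N)] for r in range(N)]
             for j in range(N)]

# Toy basis: a single A->C rate and a single C->G rate (rows sum to 0).
a_to_c = [[-1, 1, 0, 0], [0] * 4, [0] * 4, [0] * 4]
c_to_g = [[0] * 4, [0, -1, 1, 0], [0] * 4, [0] * 4]

print(is_lie_closed(f81_basis))         # True
print(is_lie_closed([a_to_c, c_to_g]))  # False
```

Checking only pairs of basis elements is enough because the commutator is bilinear.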

Subjects

Genetic Models, Purine Nucleotides/genetics, Pyrimidine Nucleotides/genetics, Animals, DNA/genetics, Molecular Evolution, Humans, Markov Chains, Mathematical Concepts, Phylogeny, Stochastic Processes

7.

A tensorial approach to the inversion of group-based phylogenetic models.

Sumner, Jeremy G; Jarvis, Peter D; Holland, Barbara R.

BMC Evol Biol ; 14: 236, 2014 Dec 04.

Article in English | MEDLINE | ID: mdl-25472897

Abstract

BACKGROUND: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called "edge length" and "sequence" spectra on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications, not only for inference but also as an important conceptual tool for thinking about molecular data, leading to generalizations beyond strictly tree-like evolutionary modelling. RESULTS: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through the use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes clear why the inversion is possible: under their usual definition, group-based models are defined for abelian groups only. CONCLUSION: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.
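For the simplest group-based model, the two-state symmetric model, the classic Hadamard conjugation reads s = H⁻¹ exp(Hγ) and γ = H⁻¹ ln(Hs), where H is the 2^(n-1) × 2^(n-1) Sylvester matrix with entries (-1)^|A∩B| over subsets of the first n-1 taxa, γ is the edge-length spectrum (with γ_∅ set to minus the total tree length) and s is the sequence spectrum. The Python sketch below applies this Fourier-style form (not the tensorial derivation of the paper) to a quartet 12|34 with made-up edge lengths and recovers them, including zeros for the two splits absent from the tree. Function names and edge values are hypothetical.

```python
import math


def hadamard(m):
    """Sylvester Hadamard matrix of size 2**m: H[a][b] = (-1)**|a & b| on bitmasks."""
    size = 2 ** m
    return [[(-1) ** bin(a & b).count("1") for b in range(size)] for a in range(size)]


def apply(h, v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def edge_to_sequence_spectrum(gamma, n_taxa):
    """s = H^-1 exp(H gamma); H^-1 = H / 2**(n-1) since H is symmetric and H H = 2**(n-1) I."""
    h = hadamard(n_taxa - 1)
    g = list(gamma)
    g[0] = -sum(g[1:])  # gamma_empty = minus the total tree length
    return [x / len(h) for x in apply(h, [math.exp(y) for y in apply(h, g)])]


def sequence_to_edge_spectrum(s, n_taxa):
    """gamma = H^-1 ln(H s), the inverse map."""
    h = hadamard(n_taxa - 1)
    return [x / len(h) for x in apply(h, [math.log(y) for y in apply(h, s)])]


# Quartet 12|34 with taxon 4 as reference; splits indexed by bitmasks over taxa 1, 2, 3.
gamma = [0.0] * 8
gamma[0b001] = 0.10  # pendant edge of taxon 1
gamma[0b010] = 0.20  # pendant edge of taxon 2
gamma[0b100] = 0.15  # pendant edge of taxon 3
gamma[0b111] = 0.05  # pendant edge of taxon 4 (split {1,2,3}|{4})
gamma[0b011] = 0.30  # internal edge 12|34

s = edge_to_sequence_spectrum(gamma, 4)
recovered = sequence_to_edge_spectrum(s, 4)
print(sum(s))                                # site-pattern probabilities sum to 1
print([round(x, 6) for x in recovered[1:]])  # matches gamma[1:]; splits 13|24 and 23|14 give 0
```

Reading off the near-zero entries at bitmasks 0b101 and 0b110 is what makes the spectrum useful for identifying the tree.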

Subjects

Genetic Models, Phylogeny, Biological Evolution, Markov Chains

8.

Low-parameter phylogenetic inference under the general Markov model.

Holland, Barbara R; Jarvis, Peter D; Sumner, Jeremy G.

Syst Biol ; 62(1): 78-92, 2013 Jan 01.

Article in English | MEDLINE | ID: mdl-22914976

Abstract

In their 2008 and 2009 articles, Sumner and colleagues introduced the "squangles", a small set of Markov invariants for phylogenetic quartets. The squangles are consistent with the general Markov (GM) model and can be used to infer quartets without the need to explicitly estimate all parameters. As the GM model is inhomogeneous and hence nonstationary, the squangles are expected to perform well compared with standard approaches when there are changes in base composition among species. However, the GM model assumes constant rates across sites, so the squangles should be confounded by data generated with invariant sites or other forms of rate variation across sites. Here we implement the squangles in a least-squares setting that returns quartets weighted by either confidence or internal edge lengths, and we show how these weighted quartets can be used as input to a variety of supertree and supernetwork methods. For the first time, we quantitatively investigate the robustness of the squangles to breaking of the constant-rates-across-sites assumption on both simulated and real data sets, and we suggest a modification that improves the performance of the squangles in the presence of invariant sites. Our conclusion is that the squangles provide a novel tool for phylogenetic estimation that is complementary to methods that explicitly account for rate variation across sites but rely on homogeneous (and hence stationary) models.

Subjects

Classification/methods, Genetic Models, Phylogeny, Animals, Computer Simulation, Least-Squares Analysis, Mammals/classification, Mammals/genetics, Markov Chains, Reproducibility of Results

9.

Is the general time-reversible model bad for molecular phylogenetics?

Sumner, Jeremy G; Jarvis, Peter D; Fernández-Sánchez, Jesús; Kaine, Bodie T; Woodhams, Michael D; Holland, Barbara R.

Syst Biol ; 61(6): 1069-74, 2012 Dec 01.

Article in English | MEDLINE | ID: mdl-22442193

Subjects

Classification/methods, Theoretical Models, Phylogeny

10.

Markov invariants for phylogenetic rate matrices derived from embedded submodels.

Jarvis, Peter D; Sumner, Jeremy G.

IEEE/ACM Trans Comput Biol Bioinform ; 9(3): 828-36, 2012.

Article in English | MEDLINE | ID: mdl-22331860

Abstract

We consider novel phylogenetic models with rate matrices that arise via the embedding of a progenitor model on a small number of character states into a target model on a larger number of character states. Adapting representation-theoretic results from recent investigations of Markov invariants for the general rate matrix model, we give a prescription for identifying and counting Markov invariants for such "symmetric embedded" models, and we provide enumerations of these for the first few cases with a small number of character states. The simplest example is a target model on three states, constructed from a general two-state model: the "2 → 3" embedding. We show that for two taxa there exist two invariants of quadratic degree that can be used to directly infer pairwise distances from observed sequences under this model. A simple simulation study verifies their theoretical expected values and suggests that, given the appropriateness of the model class, they have better statistical properties than the standard (log) Det invariant (which is of cubic degree for this case).
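The remark that the Det invariant is cubic in the three-state case can be checked directly: for two aligned sequences over k states, the k × k divergence matrix F counts how often state i in one sequence faces state j in the other, and det F is a homogeneous polynomial of degree k in those counts, so doubling every count multiplies it by 2^3 = 8 when k = 3. The Python sketch below (hypothetical function names, made-up toy sequences over states 'a', 'b', 'c') shows this; it does not implement the quadratic invariants of the paper.

```python
def divergence_matrix(x, y, states):
    """F[i][j] = number of sites where x has states[i] and y has states[j]."""
    index = {s: i for i, s in enumerate(states)}
    f = [[0] * len(states) for _ in states]
    for a, b in zip(x, y):
        f[index[a]][index[b]] += 1
    return f


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


x = "aabbccabcabc"
y = "abbbcaabcacc"
F = divergence_matrix(x, y, "abc")           # [[3, 1, 0], [0, 3, 1], [1, 0, 3]]
F2 = divergence_matrix(x + x, y + y, "abc")  # every count doubled
print(det3(F), det3(F2))                     # 28 224
```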

Subjects

Genetic Models, Phylogeny, Markov Chains
