Results 1 - 12 of 12
1.
Article in English | MEDLINE | ID: mdl-17051697

ABSTRACT

In practice, one is often faced with incomplete phylogenetic data, such as a collection of partial trees or partial splits. This paper poses the problem of inferring a phylogenetic super-network from such data and provides an efficient algorithm for doing so, called the Z-closure method. Additionally, the questions of assigning lengths to the edges of the network and how to restrict the "dimensionality" of the network are addressed. Applications to a set of five published partial gene trees relating different fungal species and to six published partial gene trees relating different grasses illustrate the usefulness of the method and an experimental study confirms its potential. The method is implemented as a plug-in for the program SplitsTree4.


Subject(s)
Computational Biology/methods , Phylogeny , Algorithms , Evolution, Molecular , Genes, Fungal , Genes, Plant , Genetic Speciation , Models, Genetic , Models, Statistical , Models, Theoretical
3.
Mol Biol Evol ; 15(9): 1183-8, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9729882

ABSTRACT

The aims of the work were (1) to develop statistical tests to identify whether substitution takes place under a covariotide model in sequences used for phylogenetic inference and (2) to determine the influence of covariotide substitution on phylogenetic trees inferred for photosynthetic and other organisms. (Covariotide and covarion models are ones in which sites that are variable in some parts of the underlying tree are invariable in others and vice versa.) Two tests were developed. The first was a contingency test, and the second was an inequality test comparing the expected number of variable sites in two groups with the observed number. Application of these tests to 16S rDNA and tufA sequences from a range of nonphotosynthetic prokaryotes and oxygenic photosynthetic prokaryotes and eukaryotes suggests the occurrence of a covariotide mechanism. The degree of support for partitioning of taxa in reconstructed trees involving these organisms was determined in the presence or absence of sites showing particular substitution patterns. This analysis showed that the support for splits between (1) photosynthetic eukaryotes and prokaryotes and (2) photosynthetic and nonphotosynthetic organisms could be accounted for by patterns arising from covariotide substitution. We show that the additional problem of compositional bias in sequence data needs to be considered in the context of patterns of covariotide/covarion substitution. We argue that while covariotide or covarion substitution may give rise to phylogenetically informative patterns in sequence data, this may not always be so.


Subject(s)
Models, Genetic , Oxygen/metabolism , Photosynthesis/genetics , Phylogeny , Markov Chains
4.
Mol Phylogenet Evol ; 8(3): 398-414, 1997 Dec.
Article in English | MEDLINE | ID: mdl-9417897

ABSTRACT

A series of new results useful to the study of DNA sequences using Markov models of substitution are presented with proofs. General time-reversible distances can be extended to accommodate any fixed distribution of rates across sites by replacing the logarithmic function of a matrix with the inverse of a moment generating function. Estimators are presented assuming a gamma distribution, the inverse Gaussian distribution, or a mixture of either of these with invariant sites. Also considered are the different ways invariant sites may be removed and how these differences may affect estimated distances. Through collaboration, we implemented these distances into PAUP in 1994. The variance of these new distances is approximated via the delta method. It is also shown how to predict the divergence expected for a pair of sequences given a rate matrix and a distribution of rates across sites, allowing iterated ML estimates of distances under any reversible model. A simple test of whether a rate matrix is time reversible is also presented. These new methods are used to estimate the divergence time of humans and chimps from mtDNA sequence data. These analyses support suggestions that the human lineage has an enhanced transition rate relative to other hominoids. These studies also show that transversion distances differ substantially from the overall distances which are dominated by transitions. Transversions alone apparently suggest a very recent divergence time for humans versus chimps and/or a very old (> 16 myr) divergence time for humans versus orangutans. This work illustrates graphically ways to interpret the reliability of distance-based transformations, using the corrected transition to transversion ratio returned for pairs of sequences which are successively more diverged.
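As a hedged illustration of replacing the logarithm with the inverse of a moment generating function, the sketch below uses the simplest special case, the Jukes-Cantor model, rather than the general time-reversible estimator the abstract describes. It assumes rates across sites follow a gamma distribution with mean 1 and shape parameter alpha; the function names and the example value p = 0.3 are illustrative only and do not come from the paper.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance assuming every site evolves at the same rate.

    p is the observed proportion of sites that differ between two sequences.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (mean 1, shape alpha).

    The logarithm is replaced by the inverse of the gamma moment generating
    function M(x) = (1 - x/alpha)**(-alpha), whose inverse is
    alpha * (1 - y**(-1/alpha)).
    """
    y = 1.0 - 4.0 * p / 3.0
    return 0.75 * alpha * (y ** (-1.0 / alpha) - 1.0)


p = 0.3  # assumed example: 30% of sites differ
equal_rates = jc_distance(p)
strong_heterogeneity = jc_gamma_distance(p, alpha=0.5)
nearly_equal_rates = jc_gamma_distance(p, alpha=1e6)
```

With a very large alpha the gamma distance approaches the equal-rates distance, while a small alpha (strong rate heterogeneity) gives a larger corrected distance for the same observed difference.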


Subject(s)
Models, Genetic , Animals , DNA, Mitochondrial/genetics , Humans , Markov Chains , Primates/genetics
5.
FEBS Lett ; 385(3): 193-6, 1996 May 06.
Article in English | MEDLINE | ID: mdl-8647249

ABSTRACT

We investigate the evolutionary relationships between photosynthetic reaction center proteins (D1, D2, L and M) and demonstrate that the pattern of nucleotide substitution in these is more complicated than has been assumed in previous phylogenetic analyses. We show that there are serious violations of methodological assumptions in previous published studies. We conclude that there is equal support for hypotheses indicating (i) a single gene duplication of an ancestral reaction center protein followed by diversification and (ii) two independent gene duplications giving rise to proteins in oxygenic and anoxygenic systems.


Subject(s)
Evolution, Molecular , Multigene Family , Photosynthetic Reaction Center Complex Proteins/genetics , Amino Acid Sequence , Animals , Bacteria/chemistry , Bacteria/genetics , Codon/genetics , Cyanobacteria/chemistry , Cyanobacteria/genetics , Euglena/chemistry , Euglena/genetics , Gene Deletion , Genes, Bacterial , Genes, Plant , Molecular Sequence Data , Mutation/genetics , Photosynthetic Reaction Center Complex Proteins/chemistry , Phylogeny , Plants/chemistry , Plants/genetics , Rhodophyta/chemistry , Rhodophyta/genetics , Sequence Alignment , Software
6.
J Comput Biol ; 2(1): 39-47, 1995.
Article in English | MEDLINE | ID: mdl-7497119

ABSTRACT

Linear invariants are useful tools for testing phylogenetic hypotheses from aligned DNA/RNA sequences, particularly when the sites evolve at different rates. Here we give a simple, graph theoretic classification for each phylogenetic tree T, of its associated vector space I(T) of linear invariants under the Jukes-Cantor one-parameter model of nucleotide substitution. We also provide an easily described basis for I(T), and show that if T is a binary (fully resolved) phylogenetic tree with n sequences at its leaves then: dim[I(T)] = 4^n - F_{2n-2}, where F_n is the nth Fibonacci number. Our method applies a recently developed Hadamard matrix-based technique to describe elements of I(T) in terms of edge-disjoint packings of subtrees in T, and thereby complements earlier more algebraic treatments.
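The dimension formula can be evaluated for small n with a short sketch. The abstract does not state which Fibonacci indexing it uses, so the convention below (F_1 = 1, F_2 = 2) is an assumption. It was chosen because it gives 14 for n = 2, which matches 16 site patterns minus the two pattern classes (identical and different) that two sequences have under the Jukes-Cantor model. The function names are illustrative.

```python
def fibonacci(k):
    """Return F_k under the ASSUMED convention F_0 = 1, F_1 = 1, F_2 = 2."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def invariant_dimension(n):
    """Evaluate 4^n - F_{2n-2} for a binary tree with n leaves (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 4 ** n - fibonacci(2 * n - 2)


# Dimensions for trees with 2 to 5 leaves, under the assumed convention.
dimensions = {n: invariant_dimension(n) for n in range(2, 6)}
```

Under a different indexing convention the subtracted term shifts by one position in the sequence, so the values should be checked against the paper before being relied on.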


Subject(s)
Base Sequence , DNA/chemistry , Markov Chains , Models, Genetic , Phylogeny , RNA/chemistry , Consensus Sequence , DNA/genetics , RNA/genetics
7.
Proc Natl Acad Sci U S A ; 91(8): 3339-43, 1994 Apr 12.
Article in English | MEDLINE | ID: mdl-8159749

ABSTRACT

Discrete Fourier transformations have recently been developed to model the evolution of two-state characters (the Cavender/Farris model). We report here the extension of these transformations to provide invertible relationships between a phylogenetic tree T (with three probability parameters of nucleotide substitution on each edge corresponding to Kimura's 3ST model) and the expected frequencies of the nucleotide patterns in the sequences. We refer to these relationships as spectral analysis. In either model with independent and identically distributed site substitutions, spectral analysis allows a global correction for all multiple substitutions (second- and higher-order interactions), independent of any particular tree. From these corrected data we use a least-squares selection procedure, the closest tree algorithm, to infer an evolutionary tree. Other selection criteria such as parsimony or compatibility analysis could also be used; each of these criteria will be statistically consistent for these models. The closest tree algorithm selects a unique best-fit phylogenetic tree together with independent edge length parameters for each edge. The method is illustrated with an analysis of some primate hemoglobin sequences.


Subject(s)
Phylogeny , Pseudogenes , Sequence Analysis/methods , Animals , DNA/genetics , Fourier Analysis , Globins/genetics , Humans , Mutation , Primates
8.
J Comput Biol ; 1(2): 153-63, 1994.
Article in English | MEDLINE | ID: mdl-8790461

ABSTRACT

For a sequence of colors independently evolving on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the "sequence spectrum", that is, the expected frequencies of the various leaf colorations. This is relevant for phylogenetic analysis (where colors represent nucleotides or amino acids; leaves represent extant taxa) as the sequence spectrum is estimated directly from a collection of aligned sequences. Allowing the rate of the evolutionary process to vary across sites is an important extension over most previous studies. We show that, given suitable restrictions on the rate distribution, the true tree (up to the placement of its root) is uniquely identified by its sequence spectrum. However, if the rate distribution is unknown and arbitrary, then, for simple models, it is possible for every tree to produce the same sequence spectrum. Hence there is a logical barrier to accurate, consistent phylogenetic inference for these models when assumptions about the rate distribution are not made. This result exploits a novel theorem on the action of polynomials with non-negative coefficients on sequences.


Subject(s)
Models, Biological , Phylogeny , Sequence Analysis , Markov Chains , Reproducibility of Results , Sequence Alignment
9.
Mol Biol Evol ; 11(4): 605-12, 1994 Jul.
Article in English | MEDLINE | ID: mdl-19391266

ABSTRACT

We report a new transformation, the LogDet, that is consistent for sequences with differing nucleotide composition and that have arisen under simple but asymmetric stochastic models of evolution. This transformation is required because existing methods tend to group sequences on the basis of their nucleotide composition, irrespective of their evolutionary history. This effect of differing nucleotide frequencies is illustrated by using a tree-selection criterion on a simple distance measure defined solely on the basis of base composition, independent of the actual sequences. The new LogDet transformation uses determinants of the observed divergence matrices and works because multiplication of determinants (real numbers) is commutative, whereas multiplication of matrices is not, except in special symmetric cases. The use of determinants thus allows more general models of evolution with asymmetric rates of nucleotide change. The transformation is illustrated on a theoretical data set (where existing methods select the wrong tree) and with three biological data sets: chloroplasts, birds/mammals (nuclear), and honeybees (mitochondrial). The LogDet transformation reinforces the logical distinction between transformations on the data and tree-selection criteria. The overall conclusions from this study are that irregular A,C,G,T compositions are an important and possible general cause of patterns that can mislead tree-reconstruction methods, even when high bootstrap values are obtained. Consequently, many published studies may need to be reexamined.
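The determinant property that the abstract relies on can be checked numerically. The sketch below is not the full LogDet estimator: it omits any normalization and does not use observed divergence matrices. It only shows that, for two asymmetric substitution matrices P and Q (made-up values, assumed for illustration), the matrix products PQ and QP differ while the negative log-determinant is the same in either order and adds across the two steps.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    size = len(m)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def neg_log_det(matrix):
    """Return -ln(det(matrix)); requires a positive determinant."""
    return -math.log(det(matrix))


# Hypothetical asymmetric transition matrices (each row sums to 1).
P = [[0.90, 0.05, 0.03, 0.02],
     [0.01, 0.94, 0.02, 0.03],
     [0.04, 0.01, 0.93, 0.02],
     [0.02, 0.06, 0.01, 0.91]]
Q = [[0.85, 0.10, 0.03, 0.02],
     [0.02, 0.90, 0.05, 0.03],
     [0.01, 0.04, 0.92, 0.03],
     [0.05, 0.02, 0.03, 0.90]]

PQ = matmul(P, Q)
QP = matmul(Q, P)
order_matters_for_matrices = PQ != QP
d_pq = neg_log_det(PQ)
d_qp = neg_log_det(QP)
d_sum = neg_log_det(P) + neg_log_det(Q)
```

Here `d_pq`, `d_qp`, and `d_sum` agree up to floating-point error, which is the additivity that makes a determinant-based quantity usable as a tree distance even when the underlying matrices do not commute.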


Subject(s)
Base Sequence , Evolution, Molecular , Models, Genetic , Phylogeny , Animals , Bees/genetics , DNA, Mitochondrial/genetics , Humans , Models, Statistical , RNA, Ribosomal, 18S/genetics
10.
Nature ; 364(6436): 440-2, 1993 Jul 29.
Article in English | MEDLINE | ID: mdl-8332213

ABSTRACT

The reliable construction of evolutionary trees from nucleotide sequences often depends on randomization tests such as the bootstrap and PTP (cladistic permutation tail probability) tests. The genomes of bacteria, viruses, animals and plants, however, vary widely in their nucleotide frequencies. Where genomes have independently acquired similar G+C base compositions, signals in the data arise that cause methods of evolutionary tree reconstruction to estimate the wrong tree by grouping together sequences with similar G+C content. Under these conditions randomization tests can lead to both the rejection of the correct evolutionary hypothesis and acceptance of an incorrect hypothesis (such as with the contradictory inferences from the photosynthetic rbcS and rbcL sequences). We have proposed one approach to testing for the G+C content problem. Here we present a formalization of this method, a frequency-dependent significance test, which has general application.


Subject(s)
Biological Evolution , Statistics as Topic , Base Sequence , Classification , Models, Statistical
11.
Trends Ecol Evol ; 7(3): 73-9, 1992 Mar.
Article in English | MEDLINE | ID: mdl-21235960

ABSTRACT

Evolutionists dream of a tree-reconstruction method that is efficient (fast), powerful, consistent, robust and falsifiable. These criteria are at present conflicting in that the fastest methods are weak (in their use of information in the sequences) and inconsistent (even with very long sequences they may lead to an incorrect tree). But there has been exciting progress in new approaches to tree inference, in understanding general properties of methods, and in developing ideas for estimating the reliability of trees. New phylogenetic invariant methods allow selected parameters of the underlying model to be estimated directly from sequences. There is still a need for more theoretical understanding and assistance in applying what is already known.

12.
Nature ; 336(6195): 118, 1988 Nov 10.
Article in English | MEDLINE | ID: mdl-3185733
...