Results 1 - 18 of 18
1.
Bioinformatics ; 38(23): 5182-5190, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36227122

ABSTRACT

MOTIVATION: The multispecies coalescent model is now widely accepted as an effective model for incorporating variation in the evolutionary histories of individual genes into methods for phylogenetic inference from genome-scale data. However, because model-based analysis under the coalescent can be computationally expensive for large datasets, a variety of inferential frameworks and corresponding algorithms have been proposed for estimation of species-level phylogenies and associated parameters, including speciation times and effective population sizes. RESULTS: We consider the problem of estimating the timing of speciation events along a phylogeny in a coalescent framework. We propose a maximum a posteriori estimator based on composite likelihood (MAPCL) for inferring these speciation times under a model of DNA sequence evolution for which exact site-pattern probabilities can be computed under the assumption of a constant θ throughout the species tree. We demonstrate that the MAPCL estimates are statistically consistent and asymptotically normally distributed, and we show how this result can be used to estimate their asymptotic variance. We also provide a more computationally efficient estimator of the asymptotic variance based on the non-parametric bootstrap. We evaluate the performance of our method using simulation and by application to an empirical dataset for gibbons. AVAILABILITY AND IMPLEMENTATION: The method has been implemented in the PAUP* program, freely available at https://paup.phylosolutions.com for Macintosh, Windows and Linux operating systems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Phylogeny , Computer Simulation , Probability , Models, Genetic , Genetic Speciation
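
The non-parametric bootstrap mentioned in the abstract above can be illustrated with a minimal sketch. It does not reproduce the MAPCL estimator or the PAUP* implementation; a toy estimator (the proportion of differing sites between two sequences, here called `p_distance`) stands in for the speciation-time estimate, and the sequences, function names, and replicate count are all assumptions made for illustration.

```python
import random
import statistics


def p_distance(columns):
    """Toy estimator: fraction of alignment columns where two sequences differ."""
    return sum(a != b for a, b in columns) / len(columns)


def bootstrap_variance(columns, estimator, replicates=500, seed=1):
    """Estimate the variance of `estimator` by resampling columns with replacement."""
    rng = random.Random(seed)
    n = len(columns)
    estimates = [
        estimator([columns[rng.randrange(n)] for _ in range(n)])
        for _ in range(replicates)
    ]
    return statistics.variance(estimates)


# Hypothetical two-sequence alignment (20 sites, 2 differences).
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGTACGAACGTACTTACGT"
columns = list(zip(seq1, seq2))

point_estimate = p_distance(columns)
variance = bootstrap_variance(columns, p_distance)
print(point_estimate, variance)
```

Any estimator that takes a list of alignment columns could be passed in place of `p_distance`; the resampling step itself does not depend on the model.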
2.
Syst Biol ; 68(6): 1052-1061, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31034053

ABSTRACT

BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.


Subject(s)
Classification/methods , Software/standards , Data Interpretation, Statistical , Phylogeny
3.
Nat Commun ; 9(1): 5451, 2018 12 21.
Article in English | MEDLINE | ID: mdl-30575731

ABSTRACT

Interactions between fungi and plants, including parasitism, mutualism, and saprotrophy, have been invoked as key to their respective macroevolutionary success. Here we evaluate the origins of plant-fungal symbioses and saprotrophy using a time-calibrated phylogenetic framework that reveals linked and drastic shifts in diversification rates of each kingdom. Fungal colonization of land was associated with at least two origins of terrestrial green algae and preceded embryophytes (as evidenced by losses of fungal flagellum, ca. 720 Ma), likely facilitating terrestriality through endomycorrhizal and possibly endophytic symbioses. The largest radiation of fungi (Leotiomyceta), the origin of arbuscular mycorrhizae, and the diversification of extant embryophytes occurred ca. 480 Ma. This was followed by the origin of extant lichens. Saprotrophic mushrooms diversified in the Late Paleozoic as forests of seed plants started to dominate the landscape. The subsequent diversification and explosive radiation of Agaricomycetes, and eventually of ectomycorrhizal mushrooms, were associated with the evolution of Pinaceae in the Mesozoic, and establishment of angiosperm-dominated biomes in the Cretaceous.


Subject(s)
Biological Evolution , Embryophyta , Fungi , Symbiosis
4.
Biochim Biophys Acta Rev Cancer ; 1867(2): 101-108, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27810337

ABSTRACT

Despite decades of research and an enormity of resultant data, cancer remains a significant public health problem. New tools and fresh perspectives are needed to obtain fundamental insights, to develop better prognostic and predictive tools, and to identify improved therapeutic interventions. With increasingly common genome-scale data, one suite of algorithms and concepts with potential to shed light on cancer biology is phylogenetics, a scientific discipline used in diverse fields. From grouping subsets of cancer samples to tracing subclonal evolution during cancer progression and metastasis, the use of phylogenetics is a powerful systems biology approach. Well-developed phylogenetic applications provide fast, robust approaches to analyze high-dimensional, heterogeneous cancer data sets. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.


Subject(s)
Biomarkers, Tumor/genetics , Cell Transformation, Neoplastic/genetics , Evolution, Molecular , Genetic Fitness , Neoplasms/genetics , Phylogeny , Adaptation, Physiological , Algorithms , Animals , Biomarkers, Tumor/metabolism , Cell Transformation, Neoplastic/metabolism , Cell Transformation, Neoplastic/pathology , Gene Expression Regulation, Neoplastic , Genetic Predisposition to Disease , Genomics/methods , Heredity , Humans , Models, Genetic , Mutation , Neoplasms/drug therapy , Neoplasms/metabolism , Neoplasms/pathology , Pedigree , Phenotype , Signal Transduction/genetics , Systems Biology , Time Factors
5.
Proc Natl Acad Sci U S A ; 113(29): 8049-56, 2016 07 19.
Article in English | MEDLINE | ID: mdl-27432945

ABSTRACT

Phylogeographic analysis can be described as the study of the geological and climatological processes that have produced contemporary geographic distributions of populations and species. Here, we attempt to understand how the dynamic process of landscape change on Madagascar has shaped the distribution of a targeted clade of mouse lemurs (genus Microcebus) and, conversely, how phylogenetic and population genetic patterns in these small primates can reciprocally advance our understanding of Madagascar's prehuman environment. The degree to which human activity has impacted the natural plant communities of Madagascar is of critical and enduring interest. Today, the eastern rainforests are separated from the dry deciduous forests of the west by a large expanse of presumed anthropogenic grassland savanna, dominated by the Family Poaceae, that blankets most of the Central Highlands. Although there is firm consensus that anthropogenic activities have transformed the original vegetation through agricultural and pastoral practices, the degree to which closed-canopy forest extended from the east to the west remains debated. Phylogenetic and population genetic patterns in a five-species clade of mouse lemurs suggest that longitudinal dispersal across the island was readily achieved throughout the Pleistocene, apparently ending at ∼55 ka. By examining patterns of both inter- and intraspecific genetic diversity in mouse lemur species found in the eastern, western, and Central Highland zones, we conclude that the natural environment of the Central Highlands would have been mosaic, consisting of a matrix of wooded savanna that formed a transitional zone between the extremes of humid eastern and dry western forest types.


Subject(s)
Cheirogaleidae/genetics , Animals , DNA, Mitochondrial/genetics , Forests , Madagascar , Phylogeny , Phylogeography
6.
Am Nat ; 185(3): 433-42, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25674696

ABSTRACT

A fern from the French Pyrenees, ×Cystocarpium roskamianum, is a recently formed intergeneric hybrid between parental lineages that diverged from each other approximately 60 million years ago (mya; 95% highest posterior density: 40.2-76.2 mya). This is an extraordinarily deep hybridization event, roughly akin to an elephant hybridizing with a manatee or a human with a lemur. In the context of other reported deep hybrids, this finding suggests that populations of ferns, and other plants with abiotically mediated fertilization, may evolve reproductive incompatibilities more slowly, perhaps because they lack many of the premating isolation mechanisms that characterize most other groups of organisms. This conclusion implies that major features of Earth's biodiversity, such as the relatively small number of species of ferns compared to those of angiosperms, may be, in part, an indirect by-product of this slower "speciation clock" rather than a direct consequence of adaptive innovations by the more diverse lineages.


Subject(s)
Ferns/genetics , Genetic Speciation , Hybridization, Genetic , Biological Evolution , France , Molecular Sequence Data , Phylogeny , Reproduction , Sequence Analysis, Protein
7.
Syst Biol ; 64(3): 525-31, 2015 May.
Article in English | MEDLINE | ID: mdl-25577605

ABSTRACT

Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length.


Subject(s)
Classification/methods , Phylogeny , Software , Algorithms , Bayes Theorem , Chlorophyta/classification , Chlorophyta/genetics
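
For reference, the Harmonic Mean estimator that the abstract above describes as widely used but less accurate can be sketched in a few lines. This is not Phycas code, and it does not show Thermodynamic Integration or Generalized Steppingstone; the function name and the log-likelihood values are made up for illustration. The estimate is log(n) minus the log of the sum of inverse likelihoods, computed on the log scale to avoid overflow.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Log marginal likelihood estimate: log(n) - log(sum(exp(-logL_i)))."""
    n = len(log_likelihoods)
    neg = [-x for x in log_likelihoods]
    shift = max(neg)  # subtract the largest term before exponentiating
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in neg))
    return math.log(n) - log_sum


# Made-up log-likelihoods of four posterior samples.
log_likelihoods = [-1002.3, -1000.1, -1001.7, -999.8]
estimate = log_harmonic_mean(log_likelihoods)
print(estimate)
```

Because the sum is dominated by the lowest-likelihood samples, a few poor samples can shift the estimate strongly, which is one common explanation for the estimator's instability.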
8.
Syst Biol ; 61(1): 170-3, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21963610

ABSTRACT

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.


Subject(s)
Computational Biology/methods , Phylogeny , Software , Algorithms , Computing Methodologies , Evolution, Molecular , Genome
9.
Genome Res ; 21(6): 850-62, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21518738

ABSTRACT

Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years.


Subject(s)
Evolution, Molecular , Genes, X-Linked/genetics , Lemur/genetics , Phylogeny , RNA, Untranslated/genetics , Animals , Base Sequence , Chromosomes, Artificial, Bacterial , Computational Biology , DNA, Complementary/genetics , Humans , In Situ Hybridization, Fluorescence , Likelihood Functions , Models, Genetic , Molecular Sequence Data , Polymerase Chain Reaction , RNA, Long Noncoding , Sequence Analysis, DNA , Species Specificity
10.
Cladistics ; 27(4): 417-427, 2011 Aug.
Article in English | MEDLINE | ID: mdl-34875790

ABSTRACT

Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.

12.
PLoS Biol ; 6(8): e206, 2008 Aug 26.
Article in English | MEDLINE | ID: mdl-18752347

ABSTRACT

Inosine monophosphate dehydrogenase (IMPDH) catalyzes an essential step in the biosynthesis of guanine nucleotides. This reaction involves two different chemical transformations, an NAD-linked redox reaction and a hydrolase reaction, that utilize mutually exclusive protein conformations with distinct catalytic residues. How did Nature construct such a complicated catalyst? Here we employ a "Wang-Landau" metadynamics algorithm in hybrid quantum mechanical/molecular mechanical (QM/MM) simulations to investigate the mechanism of the hydrolase reaction. These simulations show that the lowest energy pathway utilizes Arg418 as the base that activates water, in remarkable agreement with previous experiments. Surprisingly, the simulations also reveal a second pathway for water activation involving a proton relay from Thr321 to Glu431. The energy barrier for the Thr321 pathway is similar to the barrier observed experimentally when Arg418 is removed by mutation. The Thr321 pathway dominates at low pH when Arg418 is protonated, which predicts that the substitution of Glu431 with Gln will shift the pH-rate profile to the right. This prediction is confirmed in subsequent experiments. Phylogenetic analysis suggests that the Thr321 pathway was present in the ancestral enzyme, but was lost when the eukaryotic lineage diverged. We propose that the primordial IMPDH utilized the Thr321 pathway exclusively, and that this mechanism became obsolete when the more sophisticated catalytic machinery of the Arg418 pathway was installed. Thus, our simulations provide an unanticipated window into the evolution of a complex enzyme.


Subject(s)
Amino Acids/metabolism , IMP Dehydrogenase/chemistry , Models, Biological , Water/metabolism , Amino Acid Substitution , Catalysis , Computer Simulation , Hydrolases/metabolism , IMP Dehydrogenase/metabolism , Phylogeny , Quantum Theory , Thermodynamics
13.
Bioinformatics ; 24(4): 581-3, 2008 Feb 15.
Article in English | MEDLINE | ID: mdl-17766271

ABSTRACT

A key element to a successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations (convergence diagnostics) in phylogenetics is still uncommon. Here, we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using trace-plots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation. AVAILABILITY: The program is available as source under the GNU General Public License and as a web application at http://ceb.scs.fsu.edu/awty.


Subject(s)
Computational Biology/methods , Computer Graphics , Markov Chains , Monte Carlo Method , Phylogeny , Software , Bayes Theorem
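
The idea of tracking how posterior split probabilities converge, as described in the abstract above, can be sketched as a running frequency over the sampled trees. This is an illustrative reconstruction, not the tool's own code; the representation of each sampled tree as a set of splits, the function name, and the sample data are assumptions.

```python
def cumulative_split_frequency(samples, split):
    """Running posterior-probability estimate of `split` after each sampled tree.

    samples: list of sets of splits, one set per sampled tree, in chain order
    split:   frozenset of taxon names on one side of the bipartition
    """
    seen = 0
    trace = []
    for i, tree_splits in enumerate(samples, start=1):
        seen += split in tree_splits  # True counts as 1
        trace.append(seen / i)
    return trace


# Hypothetical chain of eight sampled trees on taxa A, B, C, D.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
samples = [{ac}, {ab}, {ab}, {ac}, {ab}, {ab}, {ab}, {ab}]

trace = cumulative_split_frequency(samples, ab)
print(trace)
```

Plotting `trace` against the sample index gives a curve that should flatten if the chain has converged for that split; a curve that is still drifting suggests the run is too short.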
14.
Mol Biol Evol ; 22(6): 1386-92, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15758203

ABSTRACT

Almost all studies that estimate phylogenies from DNA sequence data under the maximum-likelihood (ML) criterion employ an approximate approach. Most commonly, model parameters are estimated on some initial phylogenetic estimate derived using a rapid method (neighbor-joining or parsimony). Parameters are then held constant during a tree search, and ideally, the procedure is repeated until convergence is achieved. However, the effectiveness of this approximation has not been formally assessed, in part because doing so requires computationally intensive, full-optimization analyses. Here, we report both indirect and direct evaluations of the effectiveness of successive approximations. We obtained an indirect evaluation by comparing the results of replicate runs on real data that use random trees to provide initial parameter estimates. For six real data sets taken from the literature, all replicate iterative searches converged to the same joint estimates of topology and model parameters, suggesting that the approximation is not starting-point dependent, as long as the heuristic searches of tree space are rigorous. We conducted a more direct assessment using simulations in which we compared the accuracy of phylogenies estimated using full optimization of all model parameters on each tree evaluated to the accuracy of trees estimated via successive approximations. There is no significant difference between the accuracy of the approximation searches relative to full-optimization searches. Our results demonstrate that successive approximation is reliable and provide reassurance that this much faster approach is safe to use for ML estimation of topology.


Subject(s)
Computational Biology/methods , Models, Genetic , Phylogeny , Algorithms , Databases, Genetic , Evolution, Molecular , Likelihood Functions , Models, Theoretical , Software , Time Factors
16.
Mol Phylogenet Evol ; 33(2): 440-51, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15336677

ABSTRACT

Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well established, but highly unresolved phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.


Subject(s)
Bias , Invertebrates/classification , Phylogeny , RNA, Ribosomal, 18S/classification , Animals , DNA, Ribosomal/classification , Invertebrates/genetics , Likelihood Functions , RNA, Ribosomal, 18S/genetics
17.
Curr Protoc Bioinformatics ; Chapter 6: Unit 6.4, 2003 Feb.
Article in English | MEDLINE | ID: mdl-18428704

ABSTRACT

This unit provides a general description of reconstructing evolutionary trees using PAUP* 4.0. The protocol takes users through an example analysis of mitochondrial DNA sequence data using the parsimony and the likelihood criteria to infer optimal trees. The protocol also discusses searching options available in PAUP* and demonstrates how to import non-NEXUS formats. Finally, a general discussion is given regarding the pros and cons of the "model-free" and "model-based" methods used throughout the protocol.


Subject(s)
DNA Mutational Analysis/methods , Evolution, Molecular , Genetic Variation/genetics , Models, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Computer Simulation , Phylogeny
18.
Cladistics ; 13(1-2): 153-159, 1997 Mar.
Article in English | MEDLINE | ID: mdl-34920631

ABSTRACT

We provide three simple examples demonstrating that Wheeler and Nixon's method of recoding "stepmatrix" characters can fail to yield most parsimonious reconstructions of character evolution under specified cost (transformation-weight) schemes. These examples variously indicate undercounting or overcounting of tree lengths due to an inappropriate assumption of independence among the recoded characters. Their method is therefore not equivalent to Sankoff's dynamic programming algorithm, contrary to their claim.
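
For readers unfamiliar with the dynamic programming algorithm named in the abstract above, a minimal sketch follows. It computes the minimum length of a single character on a rooted tree under an arbitrary cost matrix. The tree encoding (nested tuples), the function names, and the transition/transversion cost scheme are assumptions chosen for illustration; they are not taken from the article.

```python
import math


def sankoff(tree, leaf_states, cost, states):
    """Return the minimum total cost of character change on a rooted tree.

    tree:        nested tuples of leaf names, e.g. (("t1", "t2"), "t3")
    leaf_states: dict mapping leaf name to observed state
    cost:        dict mapping (from_state, to_state) to transformation cost
    """
    def node_costs(node):
        if isinstance(node, str):
            # A leaf costs 0 for its observed state and infinity otherwise.
            return {s: 0 if s == leaf_states[node] else math.inf for s in states}
        child_vectors = [node_costs(child) for child in node]
        # For each state at this node, add the cheapest change to each child.
        return {
            s: sum(min(cost[s, t] + vec[t] for t in states) for vec in child_vectors)
            for s in states
        }

    return min(node_costs(tree).values())


states = "ACGT"
purines = {"A", "G"}


def step(a, b):
    """Hypothetical stepmatrix: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if (a in purines) == (b in purines) else 2


cost = {(a, b): step(a, b) for a in states for b in states}
tree = (("t1", "t2"), ("t3", "t4"))
leaf_states = {"t1": "A", "t2": "G", "t3": "C", "t4": "T"}

length = sankoff(tree, leaf_states, cost, states)
print(length)  # 4: two transitions plus one transversion
```

With every change costing 1, the same tree has length 3, so the cost matrix changes the tree length even when the topology and data are fixed.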
