Results 1 - 20 of 30
1.
Syst Biol ; 69(3): 579-592, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31747023

ABSTRACT

Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a data set in order to resolve recalcitrant relationships and, importantly, identify what the data set is unable to resolve. These procedures build on methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant data set. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific data set to address deep phylogenetic relationships while also identifying the inferential boundaries of the data set. [Angiosperms; coalescent; gene-tree conflict; genomics; phylogenetics; phylogenomics.].


Subject(s)
Classification/methods , Phylogeny , Plants/classification , Genes, Plant/genetics , Plants/genetics
2.
PeerJ ; 7: e6334, 2019.
Article in English | MEDLINE | ID: mdl-30886768

ABSTRACT

Comparative methods allow researchers to make inferences about evolutionary processes and patterns from phylogenetic trees. In Bayesian phylogenetics, estimating a phylogeny requires specifying priors on parameters characterizing the branching process and rates of substitution among lineages, in addition to others. Accordingly, characterizing the effect of prior selection on phylogenies is an active area of research. The choice of priors may systematically bias phylogenetic reconstruction and, subsequently, affect conclusions drawn from the resulting phylogeny. Here, we focus on the impact of priors in Bayesian phylogenetic inference and evaluate how they affect the estimation of parameters in macroevolutionary models of lineage diversification. Specifically, we simulate trees under combinations of tree priors and molecular clocks, simulate sequence data, estimate trees, and estimate diversification parameters (e.g., speciation and extinction rates) from these trees. When substitution rate heterogeneity is large, diversification rate estimates deviate substantially from those estimated under the simulation conditions when not captured by an appropriate choice of relaxed molecular clock. However, in general, we find that the choice of tree prior and molecular clock has relatively little impact on the estimation of diversification rates insofar as the sequence data are sufficiently informative and substitution rate heterogeneity among lineages is low-to-moderate.

3.
PLoS Comput Biol ; 15(2): e1006493, 2019 02.
Article in English | MEDLINE | ID: mdl-30768597

ABSTRACT

Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed phylogenomic hypotheses, is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments. 
Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.
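The five RCC-5 base relations can be illustrated with a minimal sketch that reduces each node concept to the plain set of terminal taxa it contains. That reduction is an assumption made for illustration only: the approach described above models intensional (property-based) definitions and relaxes coverage constraints, which a purely set-based comparison cannot capture. The function name `rcc5_relation` and the taxon labels are hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class RCC5(Enum):
    """The five RCC-5 base relations, written with their usual symbols."""
    CONGRUENT = "=="
    PROPER_INCLUSION = "<"
    INVERSE_INCLUSION = ">"
    OVERLAP = "><"
    EXCLUSION = "!"


def rcc5_relation(a: FrozenSet[str], b: FrozenSet[str]) -> RCC5:
    """Classify concept a against concept b by comparing their taxon sets.

    Hypothetical, extensional simplification: a concept is just the set of
    taxa it includes, so differentially sampled children are NOT reconciled.
    """
    if a == b:
        return RCC5.CONGRUENT
    if a < b:  # strict subset
        return RCC5.PROPER_INCLUSION
    if a > b:  # strict superset
        return RCC5.INVERSE_INCLUSION
    if a & b:  # shared members, but neither contains the other
        return RCC5.OVERLAP
    return RCC5.EXCLUSION


if __name__ == "__main__":
    # Hypothetical concepts from two sources that carry the same clade name.
    s1_clade = frozenset({"t1", "t2", "t3"})
    s2_clade = frozenset({"t2", "t3", "t4"})
    print(rcc5_relation(s1_clade, s2_clade).value)  # prints "><"
```

Under this simplification, any pair of same-named concepts with unequal taxon sets shows up as one of the four non-congruent relations.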


Subject(s)
Computational Biology/methods , Genomics/methods , Phylogeny , Animals , Birds/genetics , Computer Simulation , Humans , Language
4.
Mol Biol Evol ; 36(1): 112-126, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30371871

ABSTRACT

Several plant lineages have evolved adaptations that allow survival in extreme and harsh environments, including many families within the plant clade Portulacineae (Caryophyllales) such as the Cactaceae, Didiereaceae, and Montiaceae. Here, using newly generated transcriptomic data, we reconstructed the phylogeny of Portulacineae and examined potential correlates between molecular evolution and adaptation to harsh environments. Our phylogenetic results were largely congruent with previous analyses, but we identified several early diverging nodes characterized by extensive gene tree conflict. For particularly contentious nodes, we present detailed information about the phylogenetic signal for alternative relationships. We also analyzed the frequency of gene duplications, confirmed previously identified whole genome duplications (WGD), and proposed a previously unidentified WGD event within the Didiereaceae. We found that the WGD events were typically associated with shifts in climatic niche but did not find a direct association between WGDs and diversification rate shifts. Diversification shifts occurred within the Portulacaceae, Cactaceae, and Anacampserotaceae, and although these lineages did not experience WGDs, the Cactaceae experienced extensive gene duplications. We examined gene family expansion and molecular evolutionary patterns with a focus on genes associated with environmental stress responses and found evidence for significant gene family expansion in genes associated with stress adaptation and in clades found in extreme environments. These results provide important directions for further and deeper examination of the potential links between molecular evolutionary patterns and adaptation to harsh environments.


Subject(s)
Adaptation, Biological , Biological Evolution , Caryophyllales/genetics , Cold Temperature , Droughts , Multigene Family , Polyploidy
5.
Syst Biol ; 67(5): 916-924, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29893968

ABSTRACT

Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that fewer than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Herein, we examined two data sets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each data set. When removed from each data set, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate data set have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant data set did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting. Herein, we explored a framework for quickly examining alternative edges, and their support, in large phylogenomic data sets without assuming a single topology for all genes. For both data sets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic data sets by asking more targeted edge-based questions.
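One common way to screen for such influential genes (an assumption here; the abstract does not name the exact statistic) is to compare, gene by gene, the log-likelihood of two alternative topologies and look for genes whose difference dwarfs the rest. The sketch below uses a median/MAD cutoff as an illustrative rule; per-gene log-likelihoods would come from a separate tree-inference program, and all function names and numbers are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List


def delta_lnl(lnl_t1: Dict[str, float], lnl_t2: Dict[str, float]) -> Dict[str, float]:
    """Per-gene lnL(T1) - lnL(T2); positive values favour topology T1."""
    return {gene: lnl_t1[gene] - lnl_t2[gene] for gene in lnl_t1}


def flag_outliers(deltas: Dict[str, float], k: float = 5.0) -> List[str]:
    """Return genes whose delta lies more than k scaled MADs from the median.

    Illustrative rule only, not taken from the study. The factor 1.4826
    scales the MAD to match a standard deviation under normality.
    """
    values = list(deltas.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # Any deviation is infinitely many MADs: flag anything off-centre.
        return [g for g, v in deltas.items() if v != centre]
    scale = 1.4826 * mad
    return [g for g, v in deltas.items() if abs(v - centre) / scale > k]


def total_support(deltas: Dict[str, float], exclude: Iterable[str] = ()) -> float:
    """Summed delta across genes: a rough proxy for the supermatrix preference."""
    skipped = set(exclude)
    return sum(v for g, v in deltas.items() if g not in skipped)


if __name__ == "__main__":
    # Hypothetical per-gene deltas in which two genes dominate the sum.
    deltas = {"g1": -2.0, "g2": -1.5, "g3": -1.0, "g4": -0.5, "g5": 0.0,
              "g6": 0.5, "g7": -1.0, "g8": -2.5, "g9": 60.0, "g10": 85.0}
    outliers = flag_outliers(deltas)
    print(outliers)                         # ['g9', 'g10']
    print(total_support(deltas))            # 137.0
    print(total_support(deltas, outliers))  # -8.0
```

With the two hypothetical outliers removed, the summed preference flips from T1 to T2, which is the kind of effect the abstract describes for real data.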


Subject(s)
Caryophyllales/classification , Genomics , Phylogeny , Vertebrates/classification , Animals , Caryophyllales/genetics , Models, Genetic , Vertebrates/genetics
6.
PLoS One ; 13(5): e0197433, 2018.
Article in English | MEDLINE | ID: mdl-29772020

ABSTRACT

Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. 
We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
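The three filtering criteria can be sketched as a simple ranking. This is a minimal illustration under stated assumptions, not SortaDate's code: clock-likeness is approximated by the variance of root-to-tip distances, "reasonable" tree length by user-chosen lower and upper bounds, and conflict by the number of bipartitions a gene tree shares with the focal species tree. The `GeneSummary` record, the `shop_genes` function, the priority order, and all values are hypothetical.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Sequence


@dataclass
class GeneSummary:
    name: str
    root_to_tip: Sequence[float]  # root-to-tip path length for each tip
    tree_length: float            # sum of all branch lengths
    shared_bipartitions: int      # bipartitions also found in the species tree

    @property
    def clock_variance(self) -> float:
        # Lower variance among root-to-tip distances = more clock-like.
        return pvariance(self.root_to_tip)


def shop_genes(genes: List[GeneSummary], n: int,
               min_length: float = 0.0,
               max_length: float = float("inf")) -> List[GeneSummary]:
    """Drop genes outside the tree-length bounds, then rank by
    (most shared bipartitions, lowest clock variance, longest tree)."""
    eligible = [g for g in genes if min_length <= g.tree_length <= max_length]
    ranked = sorted(
        eligible,
        key=lambda g: (-g.shared_bipartitions, g.clock_variance, -g.tree_length),
    )
    return ranked[:n]


if __name__ == "__main__":
    genes = [
        GeneSummary("geneA", [0.10, 0.10, 0.10], 0.50, 5),
        GeneSummary("geneB", [0.05, 0.20, 0.10], 0.60, 5),
        GeneSummary("geneC", [0.10, 0.10, 0.10], 0.01, 7),  # too short
        GeneSummary("geneD", [0.10, 0.12, 0.11], 0.40, 3),
    ]
    print([g.name for g in shop_genes(genes, n=2, min_length=0.1)])
    # ['geneA', 'geneB']
```

Sorting on a tuple makes the priority order explicit; swapping the tuple elements is all it takes to rank clock-likeness ahead of topological agreement.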


Subject(s)
Genomics/methods , Animals , Evolution, Molecular , Humans , Models, Genetic , Phylogeny
7.
Am J Bot ; 105(3): 385-403, 2018 03.
Article in English | MEDLINE | ID: mdl-29746719

ABSTRACT

PREMISE OF THE STUDY: Phylogenetic support has been difficult to evaluate within the green plant tree of life partly because conflicted branches are not distinguished from poorly informed ones. As data sets continue to expand in both breadth and depth, new support measures are needed that are more efficient and informative. METHODS: We describe the Quartet Sampling (QS) method, a quartet-based evaluation system that synthesizes several phylogenetic and genomic analytical approaches. QS characterizes discordance in large-sparse and genome-wide data sets, overcoming issues of alignment sparsity and distinguishing strong conflict from weak support. We tested QS with simulations and recent plant phylogenies inferred from variously sized data sets. KEY RESULTS: QS scores demonstrated convergence with increasing replicates and were not strongly affected by branch depth. Patterns of QS support from different phylogenies led to a coherent understanding of ancestral branches defining key disagreements, including the relationships of Ginkgo to cycads, magnoliids to monocots and eudicots, and mosses to liverworts. The relationships of ANA-grade angiosperms (Amborella, Nymphaeales, Austrobaileyales), major monocot groups, bryophytes, and fern families are likely highly discordant in their evolutionary histories, rather than poorly informed. QS can also detect discordance due to introgression in phylogenomic data. CONCLUSIONS: Quartet Sampling is an efficient synthesis of phylogenetic tests that offers more comprehensive and specific information on branch support than conventional measures. The QS method corroborates growing evidence that phylogenomic investigations that incorporate discordance testing are warranted when reconstructing complex evolutionary histories, in particular those surrounding ANA-grade, monocots, and nonvascular plants.


Subject(s)
Biological Evolution , DNA, Plant/analysis , Genome, Plant , Genomics/methods , Phylogeny , Viridiplantae/genetics , Bryophyta/genetics , Computer Simulation , Cycadopsida/genetics , Ferns/genetics , Ginkgo biloba/genetics , Hepatophyta/genetics , Magnoliopsida/genetics , Reproducibility of Results
8.
Am J Bot ; 105(3): 302-314, 2018 03.
Article in English | MEDLINE | ID: mdl-29746720

ABSTRACT

PREMISE OF THE STUDY: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. METHODS: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. KEY RESULTS: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. CONCLUSIONS: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step, and more developments are needed to add data, better incorporate underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.


Subject(s)
Biological Evolution , Phylogeny , Plants/genetics , Seeds , Classification , Cluster Analysis , Ecology
9.
Syst Biol ; 67(2): 340-353, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28945912

ABSTRACT

Divergence time estimation (the calibration of a phylogeny to geological time) is an integral first step in modeling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors. 
These results indicate that there is inadequate information in the data to over-rule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudodata present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modeling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently.
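The recommended prior/posterior check can be sketched as follows, assuming one already has MCMC samples of a node age from a run that samples from the prior alone (data removed) and from the full analysis. The overlap coefficient of binned densities used here is only one of several reasonable summaries (overlaid histograms are the more usual visual check); the function names, bin count, and simulated numbers are hypothetical.

```python
import random
from typing import List, Sequence


def _binned_density(samples: Sequence[float], lo: float, width: float,
                    bins: int) -> List[float]:
    counts = [0] * bins
    for x in samples:
        # Clamp so that the maximum value falls into the last bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return [c / len(samples) for c in counts]


def prior_posterior_overlap(prior: Sequence[float], posterior: Sequence[float],
                            bins: int = 50) -> float:
    """Overlap coefficient (sum of bin-wise minima) of two sample histograms.

    Values near 1 suggest the data barely moved this parameter away from
    its joint prior; values near 0 suggest the data are informative.
    """
    lo = min(min(prior), min(posterior))
    hi = max(max(prior), max(posterior))
    width = (hi - lo) / bins or 1.0  # guard against all-identical samples
    p = _binned_density(prior, lo, width, bins)
    q = _binned_density(posterior, lo, width, bins)
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical node-age samples (Ma) from a prior-only run and a full run.
    prior_ages = [rng.gauss(200.0, 15.0) for _ in range(5000)]
    posterior_ages = [rng.gauss(198.0, 14.0) for _ in range(5000)]
    print(round(prior_posterior_overlap(prior_ages, posterior_ages), 2))
```

A high overlap for a fossil-calibrated node is the situation described above, where the posterior for that node largely restates the calibration density.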


Subject(s)
Models, Biological , Phylogeny , Bayes Theorem , Biological Evolution , Fossils , Magnoliopsida/classification , Magnoliopsida/genetics , Time
10.
New Phytol ; 217(2): 836-854, 2018 01.
Article in English | MEDLINE | ID: mdl-28892163

ABSTRACT

The role played by whole genome duplication (WGD) in plant evolution is actively debated. WGDs have been associated with advantages such as superior colonization, various adaptations, and increased effective population size. However, the lack of a comprehensive mapping of WGDs within a major plant clade has led to uncertainty regarding the potential association of WGDs and higher diversification rates. Using seven chloroplast and nuclear ribosomal genes, we constructed a phylogeny of 5036 species of Caryophyllales, representing nearly half of the extant species. We phylogenetically mapped putative WGDs as identified from analyses on transcriptomic and genomic data and analyzed these in conjunction with shifts in climatic occupancy and lineage diversification rate. Thirteen putative WGDs and 27 diversification shifts could be mapped onto the phylogeny. Of these, four WGDs were concurrent with diversification shifts, with other diversification shifts occurring at more recent nodes than WGDs. Five WGDs were associated with shifts to colder climatic occupancy. While we find that many diversification shifts occur after WGDs, it is difficult to consider diversification and duplication to be tightly correlated. Our findings suggest that duplications may often occur along with shifts in diversification rate, climatic occupancy, or rate of evolution.


Subject(s)
Caryophyllales/genetics , Gene Duplication , Genetic Variation , Caryophyllales/classification , Climate , Genome, Plant , Phylogeny
11.
Proc Biol Sci ; 284(1864), 2017 10 11.
Article in English | MEDLINE | ID: mdl-29021179

ABSTRACT

Puttick et al. (2017 Proc. R. Soc. B 284, 20162290 (doi:10.1098/rspb.2016.2290)) performed a simulation study to compare accuracy among methods of inferring phylogeny from discrete morphological characters. They report that a Bayesian implementation of the Mk model (Lewis 2001 Syst. Biol. 50, 913-925 (doi:10.1080/106351501753462876)) was most accurate (but with low resolution), while a maximum-likelihood (ML) implementation of the same model was least accurate. They conclude by strongly advocating that Bayesian implementations of the Mk model should be the default method of analysis for such data. While we appreciate the authors' attempt to investigate the accuracy of alternative methods of analysis, their conclusion is based on an inappropriate comparison of the ML point estimate, which does not consider confidence, with the Bayesian consensus, which incorporates estimation credibility into the summary tree. Using simulation, we demonstrate that ML and Bayesian estimates are concordant when confidence and credibility are comparably reflected in summary trees, a result expected from statistical theory. We therefore disagree with the conclusions of Puttick et al. and consider their prescription of any default method to be poorly founded. Instead, we recommend caution and thoughtful consideration of the model or method being applied to a morphological dataset.
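The like-for-like comparison argued for here, with uncertainty summarized on both sides, can be illustrated with a clade-frequency summary that applies equally to ML bootstrap replicates and to a Bayesian posterior sample. In this minimal sketch each tree is assumed to arrive already as a set of clades (sets of tip labels); tree parsing is out of scope, and the function names and example trees are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_frequencies(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of trees (bootstrap replicates or posterior draws) containing each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: k / len(trees) for clade, k in counts.items()}


def majority_rule(trees: List[Set[Clade]], threshold: float = 0.5) -> Set[Clade]:
    """Clades found in more than `threshold` of the trees.

    With threshold >= 0.5 the retained clades are mutually compatible,
    so they define a (possibly unresolved) consensus tree.
    """
    return {c for c, f in clade_frequencies(trees).items() if f > threshold}


if __name__ == "__main__":
    # Hypothetical sample of three trees on tips A-D.
    sample = [
        {frozenset("AB"), frozenset("ABC")},
        {frozenset("AB"), frozenset("ABD")},
        {frozenset("AB"), frozenset("ABC")},
    ]
    for clade in sorted(majority_rule(sample), key=len):
        print("".join(sorted(clade)))  # AB, then ABC
```

Because the same function summarizes either kind of sample, neither summary is a bare point estimate, and their resolution can be compared directly.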


Subject(s)
Bayes Theorem , Phylogeny , Likelihood Functions , Phenotype , Uncertainty
12.
Mol Phylogenet Evol ; 116: 69-77, 2017 11.
Article in English | MEDLINE | ID: mdl-28797692

ABSTRACT

Recent developments in phylogenetic methods and data acquisition have allowed for the construction of large and comprehensive phylogenies. Published phylogenies represent an enormous resource that not only facilitates the resolution of questions related to comparative biology, but also provides a basis for gauging how concordance develops across the tree of life. From the Open Tree of Life, we gathered 290 avian phylogenies representing all major groups that have been published over the last few decades and analyzed how concordance and conflict develop among these trees through time. Nine large-scale phylogenetic hypotheses (including a new synthetic tree from this study) were used for comparisons. We found that conflicts were over-represented both along the backbone (higher-level neoavian relationships) and within the oscine Passeriformes. Importantly, although we have made major strides in the resolution of major clades, recently published comprehensive trees, as well as trees of individual clades, continue to contribute significantly to the resolution of relationships throughout the avian phylogeny. Our analyses highlight the need for continued research into the resolution of avian relationships.


Subject(s)
Birds/classification , Animals , Consensus , Models, Biological , Phylogeny
13.
Bioinformatics ; 33(12): 1886-1888, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28174903

ABSTRACT

SUMMARY: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any one time), and hence phyx is capable of efficiently processing very large datasets. AVAILABILITY AND IMPLEMENTATION: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. CONTACT: eebsmith@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
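The stream-centric style described above can be imitated in a few lines: read one Newick tree per line from standard input, emit one result per tree, and hold only the current tree in memory. This is a hypothetical Python illustration of the style, not part of phyx. It assumes one tree per line, no quoted labels or comments containing commas, and no single-child (unary) nodes; under those assumptions the number of tips equals the number of commas plus one.

```python
import sys
from typing import Iterable, Iterator, TextIO


def count_tips(newick: str) -> int:
    # Each comma separates two siblings; with no unary nodes, tips = commas + 1.
    return newick.count(",") + 1


def tip_counts(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield one tip count per non-empty line (one tree in memory at a time)."""
    for line in lines:
        line = line.strip()
        if line:
            yield count_tips(line)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    for n in tip_counts(stdin):
        stdout.write(f"{n}\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `count_tips.py`, it can be chained with other filters, e.g. `python count_tips.py < trees.tre | sort -n | uniq -c`.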


Subject(s)
Genomics/methods , Phylogeny , Software
14.
Mol Phylogenet Evol ; 105: 193-199, 2016 12.
Article in English | MEDLINE | ID: mdl-27601346

ABSTRACT

New World Vultures are large-bodied carrion feeding birds in the family Cathartidae, currently consisting of seven species from five genera with geographic distributions in North and South America. No study to date has included all cathartid species in a single phylogenetic analysis. In this study, we investigated the phylogenetic relationships among all cathartid species using five nuclear (nuc; 4060bp) and two mitochondrial (mt; 2165bp) DNA loci with fossil calibrated gene tree (27 outgroup taxa) and coalescent-based species tree (2 outgroup taxa) analyses. We also included an additional four nuclear loci (2578bp) for the species tree analysis to explore changes in nodal support values. Although the stem lineage is inferred to have originated ∼69 million years ago (Ma; 74.5-64.9 credible interval), a more recent basal split within Cathartidae was recovered at ∼14Ma (17.1-11.1 credible interval). Two primary clades were identified: (1) Black Vulture (Coragyps atratus) together with the three Cathartes species (Lesser C. burrovianus and Greater C. melambrotus Yellow-headed Vultures, and Turkey Vulture C. aura), and (2) King Vulture (Sarcoramphus papa), California (Gymnogyps californianus) and Andean (Vultur gryphus) Condors. Support for taxon relationships within the two basal clades was inconsistent between analyses, with the exception of Black Vulture sister to a monophyletic Cathartes clade. Increased support for a yellow-headed vulture clade was recovered in the species tree analysis using the four additional nuclear loci. Overall, these results are in agreement with cathartid life history (e.g. olfaction ability and behavior) and contrasting habitat affinities among sister taxa with overlapping geographic distributions. More research is needed using additional molecular loci to further resolve the phylogenetic relationships within the two basal cathartid clades, as speciation appeared to have occurred in a relatively short period of time.


Subject(s)
Birds/classification , Animals , Birds/genetics , California , DNA , DNA, Mitochondrial/genetics , Phylogeny , Sequence Analysis, DNA , South America
15.
BMC Evol Biol ; 15: 150, 2015 Aug 05.
Article in English | MEDLINE | ID: mdl-26239519

ABSTRACT

BACKGROUND: The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict has been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets. RESULTS: Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone. CONCLUSION: This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. 
Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts (https://bitbucket.org/blackrim/phyparts), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
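The concordance/conflict bookkeeping can be sketched for rooted trees, assuming each tree is already reduced to its set of clades (sets of tip labels) and that poorly supported gene-tree clades have been collapsed beforehand. This is a simplified illustration and not phyparts itself: it does not compute ICA scores, map duplications, or record which alternative relationship a conflicting gene supports, and the function names and toy trees are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Clade = FrozenSet[str]
GeneTree = Tuple[FrozenSet[str], Set[Clade]]  # (taxa present, clades)


def classify(species_clade: Clade, gene_taxa: FrozenSet[str],
             gene_clades: Set[Clade]) -> str:
    """Classify one gene tree against one species-tree clade (rooted trees)."""
    target = species_clade & gene_taxa  # restrict to taxa the gene actually has
    if len(target) < 2 or target == gene_taxa:
        return "uninformative"  # clade is trivial for this gene's taxon sample
    if target in gene_clades:
        return "concordant"
    for clade in gene_clades:
        # Rooted clades are compatible only if nested or disjoint.
        if clade & target and not (clade <= target or target <= clade):
            return "conflict"
    return "uninformative"  # e.g. the relevant region is unresolved


def summarize(species_clades: List[Clade],
              gene_trees: List[GeneTree]) -> Dict[Clade, Dict[str, int]]:
    summary = {}
    for clade in species_clades:
        counts = {"concordant": 0, "conflict": 0, "uninformative": 0}
        for taxa, clades in gene_trees:
            counts[classify(clade, taxa, clades)] += 1
        summary[clade] = counts
    return summary


if __name__ == "__main__":
    ab = frozenset("AB")
    genes = [
        (frozenset("ABCD"), {frozenset("AB"), frozenset("ABC")}),  # concordant
        (frozenset("ABCD"), {frozenset("AC"), frozenset("ABC")}),  # conflict
        (frozenset("ABD"), {frozenset("BD")}),                     # conflict
        (frozenset("ACD"), {frozenset("CD")}),                     # B missing
        (frozenset("ABCD"), set()),                                # unresolved
    ]
    print(summarize([ab], genes)[ab])
    # {'concordant': 1, 'conflict': 2, 'uninformative': 2}
```

Restricting each species-tree clade to the taxa present in a gene tree is what lets genes with missing taxa still count toward concordance or conflict.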


Subject(s)
Gene Duplication , Gene Transfer, Horizontal , Magnoliopsida/genetics , Animals , Genomics , Magnoliopsida/physiology , Phylogeny , Software , Wasps/classification , Wasps/genetics
16.
Mol Phylogenet Evol ; 92: 155-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26140861

ABSTRACT

The phylogeny of Galliformes (landfowl) has been studied extensively; however, the associated chronologies have been criticized recently due to misplaced or misidentified fossil calibrations. As a consequence, it is unclear whether any crown-group lineages arose in the Cretaceous and survived the Cretaceous-Paleogene (K-Pg; 65.5 Ma) mass extinction. Using Bayesian phylogenetic inference on an alignment spanning 14,539 bp of mitochondrial and nuclear DNA sequence data, four fossil calibrations, and a combination of uncorrelated lognormally distributed relaxed-clock and strict-clock models, we inferred a time-calibrated molecular phylogeny for 225 of the 291 extant Galliform taxa. These analyses suggest that crown Galliformes diversified in the Cretaceous and that three stem lineages survived the K-Pg mass extinction. Ideally, characterizing the tempo and mode of diversification involves a taxonomically complete phylogenetic hypothesis. We used simple constraint structures to incorporate 66 data-deficient taxa and inferred the first taxon-complete phylogenetic hypothesis for the Galliformes. Diversification analyses conducted on 10,000 timetrees sampled from the posterior distribution of candidate trees show that the evolutionary history of the Galliformes is best explained by a rate-shift model including 1-3 clade-specific increases in diversification rate. We further show that the tempo and mode of diversification in the Galliformes conforms to a three-pulse model, with three stem lineages arising in the Cretaceous and inter- and intrafamilial diversification occurring after the K-Pg mass extinction, in the Paleocene-Eocene (65.5-33.9 Ma) or in association with the Eocene-Oligocene transition (33.9 Ma).


Subject(s)
Galliformes/genetics , Phylogeny , Animals , Bayes Theorem , Calibration , Fossils , Time Factors
17.
New Phytol ; 207(2): 454-467, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26053261

ABSTRACT

Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been highly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations', that is, increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period.


Subject(s)
Biodiversity , Biological Evolution , Genome, Plant , Magnoliopsida/genetics , Phylogeny , Evolution, Molecular , Models, Genetic
18.
Bioinformatics ; 31(17): 2794-800, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25940563

ABSTRACT

MOTIVATION: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. RESULTS: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. AVAILABILITY AND IMPLEMENTATION: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from https://github.com/OpenTreeOfLife/phylesystem. A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree. CONTACT: mtholder@gmail.com.


Subject(s)
Computational Biology/methods , Databases, Factual , Information Storage and Retrieval , Phylogeny , Software , Humans , Internet , Programming Languages , Reproducibility of Results , User-Computer Interface
19.
Bioinformatics ; 30(15): 2216-8, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24728855

ABSTRACT

SUMMARY: Phylogenetic comparative methods are essential for addressing evolutionary hypotheses with interspecific data. The scale and scope of such data have increased dramatically in the past few years. Many existing approaches are either computationally infeasible or inappropriate for data of this size. To address both of these problems, we present geiger v2.0, a complete overhaul of the popular R package geiger. We have reimplemented existing methods with more efficient algorithms and have developed several new approaches for accommodating heterogeneous models and data types. AVAILABILITY AND IMPLEMENTATION: This R package is available on the CRAN repository http://cran.r-project.org/web/packages/geiger/. All source code is also available on GitHub at http://github.com/mwpennell/geiger-v2. geiger v2.0 depends on the ape package. CONTACT: mwpennell@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Evolution , Computational Biology/methods , Models, Biological , Phylogeny , Programming Languages , Algorithms , Bayes Theorem , Likelihood Functions
20.
New Phytol ; 202(4): 1382-1397, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24611540

ABSTRACT

Succulent plants are widely distributed, reaching their highest diversity in arid and semi-arid regions. Their origin and diversification are thought to be associated with a global expansion of aridity. We test this hypothesis by investigating the tempo and pattern of Cactaceae diversification. Our results contribute to the understanding of the evolution of New World Succulent Biomes. We use the most taxonomically complete dataset currently available for Cactaceae. We estimate divergence times and use Bayesian and maximum likelihood methods that account for nonrandom taxonomic sampling, possible extinction scenarios and phylogenetic uncertainty to analyze diversification rates and the evolution of growth form and pollination syndrome. Cactaceae originated shortly after the Eocene-Oligocene global drop in CO2, and the radiation of its richest genera coincided with the expansion of aridity in North America during the late Miocene. A significant correlation between growth form and pollination syndrome was found, as well as a clear state dependence of diversification rate on both pollination syndrome and growth form. This study suggests a complex picture underlying the diversification of Cactaceae: the family responded not only to the availability of new niches resulting from aridification, but also to the correlated evolution of novel growth forms and reproductive strategies.


Subject(s)
Cactaceae/genetics , Biodiversity , Biological Evolution , Cactaceae/physiology , Phylogeny