Search | VHL Regional Portal

1.

Efficient Bayesian inference under the multispecies coalescent with migration.

Flouri, Tomás; Jiao, Xiyun; Huang, Jun; Rannala, Bruce; Yang, Ziheng.

Proc Natl Acad Sci U S A ; 120(44): e2310708120, 2023 Oct 31.

Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.

Subject(s)

Algorithms , Gene Flow , Animals , Phylogeny , Computer Simulation , Bayes Theorem , Likelihood Functions , Models, Genetic

2.

Phylogenies increase power to detect highly transmissible viral genome variants.

May, Michael R; Rannala, Bruce.

medRxiv ; 2023 Aug 05.

Article in English | MEDLINE | ID: mdl-37577556

ABSTRACT

As demonstrated by the SARS-CoV-2 pandemic, the emergence of novel viral strains with increased transmission rates poses a significant threat to global health. Viral genome sequences, combined with statistical models of sequence evolution, may provide a critical tool for early detection of these strains. Using a novel statistical model that links transmission rates to the entire viral genome sequence, we study the power of phylogenetic methods (which use a phylogenetic tree relating viral samples) and count-based methods (which use case counts of variants over time) to detect increased transmission rates and to identify causative mutations. We find that phylogenies in particular can detect novel variants very soon after their origin, and may facilitate the development of early detection systems for outbreak surveillance.

3.

Bayesian phylogenetic inference of HIV latent lineage ages using serial sequences.

Nagel, Anna A; Rannala, Bruce.

J R Soc Interface ; 20(201): 20230022, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37073519

ABSTRACT

HIV evolves rapidly within individuals, allowing phylogenetic studies to infer histories of viral lineages on short time scales. Latent HIV sequences are an exception to this rapid evolution, as their transcriptional inactivity leads to negligible mutation rates compared with non-latent HIV lineages. This difference in mutation rates generates potential information about the times at which sequences entered the latent reservoir, providing insight into the dynamics of the latent reservoir. A Bayesian phylogenetic method is developed to infer integration times of latent HIV sequences. The method uses informative priors to incorporate biologically sensible bounds on inferences (such as requiring sequences to become latent before being sampled) that many existing methods lack. A new simulation method is also developed, based on widely used epidemiological models of within-host viral dynamics, and applied to evaluate the new method, showing that its point estimates and credible intervals are often more accurate than those of existing methods. Accurate estimates of latent integration dates are crucial in relating integration times to key events during HIV infection, such as treatment initiation. The method is applied to publicly available sequence data from four HIV patients, providing new insights regarding the temporal pattern of latent integration.

Subject(s)

HIV Infections , Humans , Phylogeny , HIV Infections/epidemiology , Bayes Theorem , Computer Simulation

4.

Estimation of species divergence times in presence of cross-species gene flow.

Tiley, George P; Flouri, Tomás; Jiao, Xiyun; Poelstra, Jelmer W; Xu, Bo; Zhu, Tianqi; Rannala, Bruce; Yoder, Anne D; Yang, Ziheng.

Syst Biol ; 72(4): 820-836, 2023 Aug 07.

Article in English | MEDLINE | ID: mdl-36961245

ABSTRACT

Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets, this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target-enrichment loci from baobabs (Adansonia), and another of 1000 transcriptome loci from 14 species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times, but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. [divergence time; gene flow; hybridization; introgression; MSci model; multispecies coalescent].

Subject(s)

Gene Flow , Hybridization, Genetic , Phylogeny , Models, Genetic

5.

Model misspecification misleads inference of the spatial dynamics of disease outbreaks.

Gao, Jiansi; May, Michael R; Rannala, Bruce; Moore, Brian R.

Proc Natl Acad Sci U S A ; 120(11): e2213913120, 2023 Mar 14.

Article in English | MEDLINE | ID: mdl-36897983

ABSTRACT

Epidemiology has been transformed by the advent of Bayesian phylodynamic models that allow researchers to infer the geographic history of pathogen dispersal over a set of discrete geographic areas [1, 2]. These models provide powerful tools for understanding the spatial dynamics of disease outbreaks, but contain many parameters that are inferred from minimal geographic information (i.e., the single area in which each pathogen was sampled). Consequently, inferences under these models are inherently sensitive to our prior assumptions about the model parameters. Here, we demonstrate that the default priors used in empirical phylodynamic studies make strong and biologically unrealistic assumptions about the underlying geographic process. We provide empirical evidence that these unrealistic priors strongly (and adversely) impact commonly reported aspects of epidemiological studies, including 1) the relative rates of dispersal between areas; 2) the importance of dispersal routes for the spread of pathogens among areas; 3) the number of dispersal events between areas; and 4) the ancestral area in which a given outbreak originated. We offer strategies to avoid these problems, and develop tools to help researchers specify more biologically reasonable prior models that will realize the full potential of phylodynamic methods to elucidate pathogen biology and, ultimately, inform surveillance and monitoring policies to mitigate the impacts of disease outbreaks.

Subject(s)

Disease Outbreaks , Phylogeny , Bayes Theorem

6.

PrioriTree: a utility for improving phylodynamic analyses in BEAST.

Gao, Jiansi; May, Michael R; Rannala, Bruce; Moore, Brian R.

Bioinformatics ; 39(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36592035

ABSTRACT

SUMMARY: Phylodynamic methods are central to studies of the geographic and demographic history of disease outbreaks. Inference under discrete-geographic phylodynamic models, which involve many parameters that must be inferred from minimal information, is inherently sensitive to our prior beliefs about the model parameters. We present an interactive utility, PrioriTree, to help researchers identify and accommodate prior sensitivity in discrete-geographic inferences. Specifically, PrioriTree provides a suite of functions to generate input files for, and summarize output from, BEAST analyses for performing robust Bayesian inference, data-cloning analyses, and assessing the relative and absolute fit of candidate discrete-geographic (prior) models to empirical datasets. AVAILABILITY AND IMPLEMENTATION: PrioriTree is distributed as an R package available at https://github.com/jsigao/prioritree, with a comprehensive user manual provided at https://bookdown.org/jsigao/prioritree_manual/.

Subject(s)

Disease Outbreaks , Software , Bayes Theorem

7.

An efficient exact algorithm for identifying hybrids using population genomic sequences.

Chakraborty, Sneha; Rannala, Bruce.

Genetics ; 223(4), 2023 Apr 06.

Article in English | MEDLINE | ID: mdl-36708142

ABSTRACT

The identification of individuals that have a recent hybrid ancestry (between populations or species) has been a goal of naturalists for centuries. Since the 1960s, codominant genetic markers have been used with statistical and computational methods to identify F1 hybrids and backcrosses. Existing hybrid inference methods assume that alleles at different loci undergo independent assortment (are unlinked or in population linkage equilibrium). Genomic datasets include thousands of markers that are located on the same chromosome and are in population linkage disequilibrium, which violates this assumption. Existing methods may therefore be viewed as composite likelihoods when applied to genomic datasets, and their performance in identifying hybrid ancestry (which is a model-choice problem) is unknown. Here, we develop a new program, Mongrail, that implements a full-likelihood Bayesian hybrid inference method that explicitly models linkage and recombination, generating the posterior probabilities of different F1 or F2 hybrid, or backcross, genealogical classes. We use simulations to compare the statistical performance of Mongrail with that of an existing composite likelihood method (NewHybrids) and apply the method to analyze genome sequence data for hybridizing species of barred and spotted owls.

Subject(s)

Genetics, Population , Metagenomics , Bayes Theorem , Genomics

8.

New Phylogenetic Models Incorporating Interval-Specific Dispersal Dynamics Improve Inference of Disease Spread.

Gao, Jiansi; May, Michael R; Rannala, Bruce; Moore, Brian R.

Mol Biol Evol ; 39(8), 2022 Aug 03.

Article in English | MEDLINE | ID: mdl-35861314

ABSTRACT

Phylodynamic methods reveal the spatial and temporal dynamics of viral geographic spread, and have featured prominently in studies of the COVID-19 pandemic. Virtually all such studies are based on phylodynamic models that assume, despite direct and compelling evidence to the contrary, that rates of viral geographic dispersal are constant through time. Here, we: (1) extend phylodynamic models to allow both the average and relative rates of viral dispersal to vary independently between pre-specified time intervals; (2) implement methods to infer the number and timing of viral dispersal events between areas; and (3) develop statistics to assess the absolute fit of discrete-geographic phylodynamic models to empirical datasets. We first validate our new methods using simulations, and then apply them to a SARS-CoV-2 dataset from the early phase of the COVID-19 pandemic. We show that: (1) under simulation, failure to accommodate interval-specific variation in the study data will severely bias parameter estimates; (2) in practice, our interval-specific discrete-geographic phylodynamic models can significantly improve the relative and absolute fit to empirical data; and (3) the increased realism of our interval-specific models provides qualitatively different inferences regarding key aspects of the COVID-19 pandemic, revealing significant temporal variation in global viral dispersal rates, viral dispersal routes, and the number of viral dispersal events between areas, and alters interpretations regarding the efficacy of intervention measures to mitigate the pandemic.

Subject(s)

COVID-19 , Pandemics , COVID-19/epidemiology , Humans , Phylogeny , Phylogeography , SARS-CoV-2/genetics

9.

Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent.

Flouri, Tomás; Huang, Jun; Jiao, Xiyun; Kapli, Paschalia; Rannala, Bruce; Yang, Ziheng.

Mol Biol Evol ; 39(8), 2022 Aug 03.

Article in English | MEDLINE | ID: mdl-35907248

ABSTRACT

The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes-Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock, and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.

Subject(s)

Models, Genetic , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method , Phylogeny
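
The Jukes-Cantor (JC) substitution model assumed by the clock-JC analyses described above can be illustrated with its standard pairwise distance correction, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below is a textbook illustration in Python, not code from bpp; it assumes two gap-free aligned sequences of equal length, and the function names are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (gaps are not handled here)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(seq1: str, seq2: str) -> float:
    """Expected substitutions per site under the Jukes-Cantor (1969) model."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The log argument is <= 0: the distance is undefined (saturation).
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, two 10-site sequences differing at one site give p = 0.1 and a JC distance of about 0.107, slightly larger than p because the correction accounts for multiple substitutions at the same site.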

10.

Haplotype analysis of the internationally distributed BRCA1 c.3331_3334delCAAG founder mutation reveals a common ancestral origin in Iberia.

Tuazon, Anna Marie De Asis; Lott, Paul; Bohórquez, Mabel; Benavides, Jennyfer; Ramirez, Carolina; Criollo, Angel; Estrada-Florez, Ana; Mateus, Gilbert; Velez, Alejandro; Carmona, Jenny; Olaya, Justo; Garcia, Elisha; Polanco-Echeverry, Guadalupe; Stultz, Jacob; Alvarez, Carolina; Tapia, Teresa; Ashton-Prolla, Patricia; Vega, Ana; Lazaro, Conxi; Tornero, Eva; Martinez-Bouzas, Cristina; Infante, Mar; De La Hoya, Miguel; Diez, Orland; Browning, Brian L; Rannala, Bruce; Teixeira, Manuel R; Carvallo, Pilar; Echeverry, Magdalena; Carvajal-Carmona, Luis G.

Breast Cancer Res ; 22(1): 108, 2020 Oct 21.

Article in English | MEDLINE | ID: mdl-33087180

ABSTRACT

BACKGROUND: The BRCA1 c.3331_3334delCAAG founder mutation has been reported in hereditary breast and ovarian cancer families from multiple Hispanic groups. We aimed to evaluate BRCA1 c.3331_3334delCAAG haplotype diversity in cases of European, African, and Latin American ancestry. METHODS: BC mutation carrier cases from Colombia (n = 32), Spain (n = 13), Portugal (n = 2), Chile (n = 10), Africa (n = 1), and Brazil (n = 2) were genotyped with genome-wide single nucleotide polymorphism (SNP) arrays to evaluate haplotype diversity around BRCA1 c.3331_3334delCAAG. Additional Portuguese (n = 13) and Brazilian (n = 18) BC mutation carriers were genotyped for 15 informative SNPs surrounding BRCA1. Data were phased using SHAPEIT2, and identical-by-descent regions were determined using BEAGLE and GERMLINE. DMLE+ was used to date the mutation in Colombia and Iberia. RESULTS: Haplotype reconstruction revealed a 264.4-kb region shared among carriers from all six countries. The mutation age was estimated at ~100 generations in Iberia, with introduction to South America early in the European colonization period. CONCLUSIONS: Our results suggest that this mutation originated in Iberia and was later introduced to Colombia and South America at the time of Spanish colonization in the early 1500s. We also found that Colombian mutation carriers had higher European ancestry than controls on chromosome 17, which harbors BRCA1, further supporting the European origin of the mutation. Understanding founder mutations in diverse populations has implications for implementing cost-effective, ancestry-informed screening.

Subject(s)

BRCA1 Protein/genetics , Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Genetic Predisposition to Disease , Germ-Line Mutation , Haplotypes , Polymorphism, Single Nucleotide , Africa/epidemiology , Brazil/epidemiology , Chile/epidemiology , Chromosomes, Human, Pair 17/genetics , Colombia/epidemiology , Female , Founder Effect , Genome-Wide Association Study/methods , Humans , Portugal/epidemiology , Spain/epidemiology

11.

The Impact of Cross-Species Gene Flow on Species Tree Estimation.

Jiao, Xiyun; Flouri, Tomás; Rannala, Bruce; Yang, Ziheng.

Syst Biol ; 69(5): 830-847, 2020 Sep 01.

Article in English | MEDLINE | ID: mdl-31977022

ABSTRACT

Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multilocus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times, and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny. [IM; introgression; migration; MSci; multispecies coalescent; species tree].

Subject(s)

Classification/methods , Gene Flow , Models, Biological , Phylogeny , Animal Migration , Animals , Anopheles/classification , Anopheles/genetics
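
For the three-species case studied in The Impact of Cross-Species Gene Flow on Species Tree Estimation, the majority-vote and UPGMA methods both reduce to choosing which pair of species is joined first. The Python sketch below illustrates that reduction under assumed, hypothetical input formats (each gene tree summarized by the pair of species it groups together, and coalescent times stored per locus in a dict keyed by species pair); it is not the authors' code.

```python
from collections import Counter
from statistics import mean


def majority_vote(gene_tree_pairs):
    """Return the species pair grouped together by the most gene trees.

    gene_tree_pairs: one frozenset per locus naming the two species that
    the rooted three-species gene tree groups together.
    Ties are broken by first occurrence, which is an arbitrary choice.
    """
    pair, _count = Counter(gene_tree_pairs).most_common(1)[0]
    return pair


def upgma_first_join(coal_times):
    """Return the species pair with the smallest average coalescent time.

    coal_times: one dict per locus mapping each species pair to the
    coalescent time between those two species at that locus.
    With three species, UPGMA's first join fixes the rooted topology.
    """
    pairs = coal_times[0].keys()
    averages = {p: mean(locus[p] for locus in coal_times) for p in pairs}
    return min(averages, key=averages.get)
```

With species A, B, and C, the pairs would be frozenset("AB"), frozenset("AC"), and frozenset("BC"). Because majority vote uses only topologies while UPGMA averages coalescent times, a few loci with very recent cross-species coalescences can pull the UPGMA averages toward the wrong pair without changing the most frequent topology.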

12.

A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis.

Flouri, Tomás; Jiao, Xiyun; Rannala, Bruce; Yang, Ziheng.

Mol Biol Evol ; 37(4): 1211-1223, 2020 Apr 01.

Article in English | MEDLINE | ID: mdl-31825513

ABSTRACT

Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using genomic sequence data from six mosquito species in the Anopheles gambiae species complex; the estimate varies considerably across the genome, likely driven by differential selection against introgressed alleles.

Subject(s)

Genetic Introgression , Models, Genetic , Phylogeny , Animals , Anopheles/genetics , Bayes Theorem , Picea/genetics , Saccharomycetales/genetics

13.

The Spectre of Too Many Species.

Leaché, Adam D; Zhu, Tianqi; Rannala, Bruce; Yang, Ziheng.

Syst Biol ; 68(1): 168-181, 2019 Jan 01.

Article in English | MEDLINE | ID: mdl-29982825

ABSTRACT

Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the bpp program have suggested that bpp may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in bpp tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in bpp provide much more reliable inference under the gdi than the approximate method phrapl. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.

Subject(s)

Classification/methods , Genetic Speciation , Models, Biological , Bayes Theorem , Computer Simulation

14.

Species Tree Inference with BPP Using Genomic Sequences and the Multispecies Coalescent.

Flouri, Tomás; Jiao, Xiyun; Rannala, Bruce; Yang, Ziheng.

Mol Biol Evol ; 35(10): 2585-2593, 2018 Oct 01.

Article in English | MEDLINE | ID: mdl-30053098

ABSTRACT

The multispecies coalescent provides a natural framework for accommodating ancestral genetic polymorphism and coalescent processes that can cause different genomic regions to have different genealogical histories. The Bayesian program BPP includes a full-likelihood implementation of the multispecies coalescent, using transmodel Markov chain Monte Carlo to calculate the posterior probabilities of different species trees. BPP is suitable for analyzing multilocus sequence data sets, and it accommodates the heterogeneity of gene trees (both the topology and branch lengths) among loci and gene tree uncertainties due to limited phylogenetic information at each locus. Here, we provide a practical guide to the use of BPP in species tree estimation. BPP is a command-line program that runs on Linux, macOS, and Windows. This protocol shows how to use both BPP 3.4 (http://abacus.gene.ucl.ac.uk/software/) and BPP 4.0 (https://github.com/bpp/).

Subject(s)

Genetic Techniques , Phylogeny , Software , Animals , Bayes Theorem , Humans , Ranidae

15.

Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses.

Yang, Ziheng; Rannala, Bruce.

Mol Ecol ; 26(11): 3028-3036, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28281309

ABSTRACT

DNA barcoding methods use a single locus (usually the mitochondrial COI gene) to assign unidentified specimens to known species in a library based on a genetic distance threshold that distinguishes between-species divergence from within-species diversity. Recently developed species delimitation methods based on the multispecies coalescent (MSC) model offer an alternative approach to individual assignment using either single-locus or multilocus sequence data. Here, we use simulations to demonstrate three features of an MSC method implemented in the program bpp. First, we show that with one locus, MSC can accurately assign individuals to species without the need for arbitrarily determined distance thresholds (as required for barcoding methods). We provide an example in which no single threshold or barcoding gap exists that can be used to assign all specimens without incurring high error rates. Second, we show that bpp can identify cryptic species that may be misidentified as a single species within the library, potentially improving the accuracy of barcoding libraries. Third, we show that taxon rarity does not present any particular problems for species assignments using bpp and that accurate assignments can be achieved even when only one or a few loci are available. Thus, concerns that MSC methods may have problems analysing rare taxa (singletons) are unfounded. Currently, barcoding methods enjoy a huge computational advantage over MSC methods and may be the only approach feasible for massively large data sets, but MSC methods may offer a more stringent test for species that are tentatively assigned by barcoding.

Subject(s)

DNA Barcoding, Taxonomic , Models, Genetic , Bayes Theorem , Computer Simulation , Gene Library , Genes, Mitochondrial
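
The distance-threshold assignment used by conventional DNA barcoding, which the study above contrasts with MSC-based assignment in bpp, can be sketched as nearest-neighbour matching with a cutoff. This Python sketch assumes uncorrected p-distances and gap-free aligned sequences; the library layout, function names, and the default 3% threshold are hypothetical choices for illustration, and it is not the bpp method.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_by_threshold(query, library, threshold=0.03):
    """Assign a query to the species of its closest library sequence.

    library: dict mapping species name -> list of reference sequences.
    Returns (species, distance); species is None when even the closest
    match exceeds the threshold, i.e. the specimen is left unassigned.
    """
    best_species, best_dist = None, float("inf")
    for species, refs in library.items():
        for ref in refs:
            d = p_distance(query, ref)
            if d < best_dist:
                best_species, best_dist = species, d
    if best_dist > threshold:
        return None, best_dist
    return best_species, best_dist
```

The outcome depends entirely on the chosen threshold: the same query can be assigned under a 10% cutoff and left unassigned under a 3% cutoff, which is the arbitrariness the abstract says MSC-based assignment avoids.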

16.

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent.

Rannala, Bruce; Yang, Ziheng.

Syst Biol ; 66(5): 823-842, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28053140

ABSTRACT

We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. 
[Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR].

Subject(s)

Classification/methods , Models, Biological , Phylogeny , Algorithms , Animals , Bayes Theorem , Computer Simulation , Crotalus/classification , Crotalus/genetics , Shrews/classification , Shrews/genetics

17.

Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures.

Moore, Brian R; Höhna, Sebastian; May, Michael R; Rannala, Bruce; Huelsenbeck, John P.

Proc Natl Acad Sci U S A ; 113(34): 9569-74, 2016 Aug 23.

Article in English | MEDLINE | ID: mdl-27512038

ABSTRACT

Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which is used to estimate the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies: posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM.

Subject(s)

Biological Coevolution , Extinction, Biological , Genetic Speciation , Phylogeny , Whales/classification , Animals , Bayes Theorem , Biodiversity , Likelihood Functions , Poisson Distribution , Whales/genetics

18.

Conceptual issues in Bayesian divergence time estimation.

Rannala, Bruce.

Philos Trans R Soc Lond B Biol Sci ; 371(1699), 2016 Jul 19.

Article in English | MEDLINE | ID: mdl-27325831

ABSTRACT

Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree, causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue 'Dating species divergences using rocks and clocks'.

Subject(s)

Evolution, Molecular , Fossils , Phylogeny , Bayes Theorem , Time Factors

19.

A Glance at Recombination Hotspots in the Domestic Cat.

Alhaddad, Hasan; Zhang, Chi; Rannala, Bruce; Lyons, Leslie A.

PLoS One ; 11(2): e0148710, 2016.

Article in English | MEDLINE | ID: mdl-26859385

ABSTRACT

Recombination has essential roles in increasing genetic variability within a population and in ensuring successful meiotic events. The objective of this study is to (i) infer the population-scaled recombination rate (ρ), and (ii) identify and characterize regions of increased recombination rate for the domestic cat, Felis silvestris catus. SNPs (n = 701) were genotyped in twenty-two East Asian feral cats (random bred). The SNPs covered ten different chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) with an average region size of 850 kb and an average SNP density of 70 SNPs/region. The Bayesian method in the program inferRho was used to infer regional population recombination rates and hotspot localities. The regions exhibited variable population recombination rates, and four decisive recombination hotspots were identified in the A2, D1, and E2 chromosomal regions. In characterizing the identified hotspots, no correlation was detected between GC content and the locality of recombination hotspots, and the hotspots enclosed L2 LINE elements and MIR and tRNA-Lys SINE elements.

Subject(s)

Cats/genetics , Recombination, Genetic , Animals , Base Composition , Bayes Theorem , China , Female , Genetics, Population , Long Interspersed Nucleotide Elements , Male , Models, Genetic , Polymorphism, Single Nucleotide , Short Interspersed Nucleotide Elements

20.

Unguided species delimitation using DNA sequence data from multiple Loci.

Yang, Ziheng; Rannala, Bruce.

Mol Biol Evol ; 31(12): 3125-35, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25274273

ABSTRACT

A method was developed for simultaneous Bayesian inference of species delimitation and species phylogeny using the multispecies coalescent model. The method eliminates the need for a user-specified guide tree in species delimitation and incorporates phylogenetic uncertainty in a Bayesian framework. The nearest-neighbor interchange algorithm was adapted to propose changes to the species tree, with the gene trees for multiple loci altered in the proposal to avoid conflicts with the newly proposed species tree. We also modify our previous scheme for specifying priors for species delimitation models to construct joint priors for models of species delimitation and species phylogeny. As in our earlier method, the modified algorithm integrates over gene trees, taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. We conducted a simulation study to examine the statistical properties of the method using six populations (two sequences each) and a true number of three species, with values of divergence times and ancestral population sizes that are realistic for recently diverged species. The results suggest that the method tends to be conservative with high posterior probabilities being a confident indicator of species status. Simulation results also indicate that the power of the method to delimit species increases with an increase of the divergence times in the species tree, and with an increased number of gene loci. Reanalyses of two data sets of cavefish and coast horned lizards suggest considerable phylogenetic uncertainty even though the data are informative about species delimitation. We discuss the impact of the prior on models of species delimitation and species phylogeny and of the prior on population size parameters (θ) on Bayesian species delimitation.

Subject(s)

Models, Genetic , Multilocus Sequence Typing , Algorithms , Animals , Bayes Theorem , Computer Simulation , Fishes/genetics , Lizards/genetics , Markov Chains , Monte Carlo Method , Phylogeny
