Results 1 - 20 of 36
1.
PeerJ ; 12: e17276, 2024.
Article in English | MEDLINE | ID: mdl-38699195

ABSTRACT

In this article, we study the distance matrix as a representation of a phylogeny by way of hierarchical clustering. Defining a multivariate normal distribution on (a subset of) the entries of such a matrix allows us to represent a distribution over rooted time trees. Here, we demonstrate that a number of published tree distributions can be represented accurately this way. Although such a representation does not map to unique trees, restricting it to a subspace, in particular one we call a "cube", makes the representation bijective at the cost of not being able to represent all possible trees. We introduce an algorithm, "cubeVB", specifically for cubes and show through a well-calibrated simulation study that it is possible to recover parameters of interest such as tree height and length. Although a cube cannot represent all of tree space, it is a great improvement over a single summary tree, and it opens up exciting new opportunities for scaling up Bayesian phylogenetic inference. We also demonstrate how to use a matrix representation of a tree distribution to obtain better summary trees than the commonly used maximum clade credibility trees. An open-source implementation of the cubeVB algorithm is available from https://github.com/rbouckaert/cubevb as the cubevb package for BEAST 2.
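The starting point of this abstract, recovering a rooted time tree from a matrix by hierarchical clustering, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cubeVB code: it assumes each matrix entry holds the height of the most recent common ancestor of a pair of taxa, and it uses complete linkage; the function name `cluster_to_newick` is hypothetical.

```python
from itertools import combinations

def cluster_to_newick(names, height):
    """Rebuild a rooted time tree from a matrix of pairwise MRCA heights.

    height[i][j] is assumed to be the height of the most recent common
    ancestor of taxa i and j (an ultrametric matrix; the diagonal is ignored).
    Clusters are merged greedily by smallest complete-linkage height.
    """
    # Each cluster is (set of taxon indices, Newick string, node height).
    clusters = [({i}, names[i], 0.0) for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the largest pairwise height between the two clusters.
            link = max(height[i][j] for i in clusters[a][0] for j in clusters[b][0])
            if best is None or link < best[0]:
                best = (link, a, b)
        h, a, b = best
        (set_a, nwk_a, h_a), (set_b, nwk_b, h_b) = clusters[a], clusters[b]
        # Each child's branch length is the parent height minus the child height.
        merged = (set_a | set_b, f"({nwk_a}:{h - h_a:g},{nwk_b}:{h - h_b:g})", h)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][1] + ";"
```

For an exactly ultrametric matrix, single, average and complete linkage all recover the same tree, because every pair of taxa drawn from two disjoint clades shares the same MRCA height; the linkage choice only matters once the matrix is perturbed, for example by draws from a multivariate normal.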


Subject(s)
Algorithms , Bayes Theorem , Phylogeny , Cluster Analysis , Computer Simulation
2.
bioRxiv ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38496513

ABSTRACT

The spread of infectious diseases is shaped by spatial and temporal factors, such as host population structure or changes over time in the transmission rate or the number of infected individuals. These spatiotemporal dynamics are imprinted in pathogen genomes and can be recovered from them using phylodynamic methods. However, phylodynamic methods typically quantify either the temporal or the spatial transmission dynamics, which can introduce biases of unclear magnitude, because one potentially cannot be inferred without the other. Here, we address this challenge by introducing a structured coalescent skyline approach, MASCOT-Skyline, that allows us to jointly infer spatial and temporal transmission dynamics of infectious diseases using Markov chain Monte Carlo inference. To do so, we model the effective population size dynamics in different locations using a non-parametric function, which allows us to approximate a range of population size dynamics. Using a range of viral outbreak datasets, we show potential issues with phylogeographic methods. We then use these viral datasets to motivate simulations of outbreaks that illuminate the nature of the biases present in the different phylogeographic methods. We show that spatial and temporal dynamics should be modeled jointly even if one seeks to recover just one of the two. Further, we identify conditions, in particular different subsampling approaches, under which phylogeographic analyses can be expected to be biased, and we provide recommendations on when they can be expected to perform well. We implemented MASCOT-Skyline as part of the open-source software package MASCOT for the Bayesian phylodynamics platform BEAST2.

3.
Nucleic Acids Res ; 52(2): 558-571, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38048305

ABSTRACT

How genetic information gained its exquisite control over chemical processes needed to build living cells remains an enigma. Today, the aminoacyl-tRNA synthetases (AARS) execute the genetic codes in all living systems. But how did the AARS that emerged over three billion years ago as low-specificity, protozymic forms then spawn the full range of highly specific enzymes that distinguish between 22 diverse amino acids? A phylogenetic reconstruction of extant AARS genes, enhanced by analysing modular acquisitions, reveals six AARS with distinct bacterial, archaeal, eukaryotic, or organellar clades, resulting in a total of 36 families of AARS catalytic domains. Small structural modules that differentiate one AARS family from another played pivotal roles in discriminating between amino acid side chains, thereby expanding the genetic code and refining its precision. The resulting model shows a tendency for less elaborate enzymes, with simpler catalytic domains, to activate amino acids that were not synthesised until later in the evolution of the code. The most probable evolutionary route for an emergent amino acid type to establish a place in the code was by recruiting older, less specific AARS, rather than adapting contemporary lineages. This process, retrofunctionalisation, differs from previously described mechanisms through which amino acids would enter the code.


Subject(s)
Amino Acyl-tRNA Synthetases , Evolution, Molecular , Genetic Code , Amino Acids/genetics , Amino Acids/metabolism , Amino Acyl-tRNA Synthetases/chemistry , Amino Acyl-tRNA Synthetases/genetics , Amino Acyl-tRNA Synthetases/metabolism , Bacteria/enzymology , Bacteria/genetics , Phylogeny , Archaea/enzymology , Archaea/genetics , Eukaryota/enzymology , Eukaryota/genetics
4.
Commun Med (Lond) ; 3(1): 97, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37443390

ABSTRACT

BACKGROUND: The emergence of highly transmissible SARS-CoV-2 variants has led to surges in cases and the need for global genomic surveillance. While some variants rapidly spread worldwide, other variants only persist nationally. There is a need for more fine-scale analysis to understand transmission dynamics at a country scale. For instance, the Mu variant of interest, also known as lineage B.1.621, was first detected in Colombia and was responsible for a large local wave but only a few sporadic cases elsewhere. METHODS: To better understand the epidemiology of SARS-CoV-2 variants in Colombia, we used 14,049 complete SARS-CoV-2 genomes from the 32 states of Colombia. We performed Bayesian phylodynamic analyses to estimate the time of introduction of each variant, its effective reproduction number and effective population size, and the impact of disease control measures. RESULTS: Here, we detect a total of 188 SARS-CoV-2 Pango lineages circulating in Colombia since the pandemic's start. We show that the effective reproduction number oscillated drastically throughout the first two years of the pandemic, with Mu showing the highest transmissibility (based on Re and growth rate estimates). CONCLUSIONS: Our results reinforce that genomic surveillance programs are essential for countries to make evidence-driven interventions in response to the emergence and circulation of novel SARS-CoV-2 variants.


Colombia reported its first COVID-19 case on 6th March 2020. By April 2022, the country had reported over 6 million infections and over 135,000 deaths. Here, we aim to understand how SARS-CoV-2, the virus that causes COVID-19, spread through Colombia over this time and how the predominant version of the virus (variant) changed over time. We found that there were multiple introductions of different variants from other countries into Colombia during the first two years of the pandemic. The Gamma variant was dominant earlier in 2021 but was replaced by the Delta variant. The Mu variant had the highest potential to be transmitted. Our findings provide valuable insights into the pandemic in Colombia and highlight the importance of continued surveillance of the virus to guide the public health response.

5.
Science ; 381(6656): eabg0818, 2023 07 28.
Article in English | MEDLINE | ID: mdl-37499002

ABSTRACT

The origins of the Indo-European language family are hotly disputed. Bayesian phylogenetic analyses of core vocabulary have produced conflicting results, with some supporting a farming expansion out of Anatolia ~9000 years before present (yr B.P.), while others support a spread with horse-based pastoralism out of the Pontic-Caspian Steppe ~6000 yr B.P. Here we present an extensive database of Indo-European core vocabulary that eliminates past inconsistencies in cognate coding. Ancestry-enabled phylogenetic analysis of this dataset indicates that few ancient languages are direct ancestors of modern clades and produces a root age of ~8120 yr B.P. for the family. Although this date is not consistent with the Steppe hypothesis, it does not rule out an initial homeland south of the Caucasus, with a subsequent branch northward onto the steppe and then across Europe. We reconcile this hybrid hypothesis with recently published ancient DNA evidence from the steppe and the northern Fertile Crescent.


Subject(s)
Language , Bayes Theorem , Europe , Farms , Language/history , Phylogeny
6.
Nat Commun ; 14(1): 3557, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37322028

ABSTRACT

At over 0.6% of the population, Peru has one of the highest SARS-CoV-2 mortality rates in the world. Considerable effort has been devoted to sequencing genomes in this country since mid-2020. However, an adequate analysis of the dynamics of the variants of concern and interest (VOCIs) is missing. We investigated the dynamics of the COVID-19 pandemic in Peru with a focus on the second wave, which had the greatest case fatality rate. The second wave in Peru was dominated by Lambda and Gamma. Analysis of the origin of Lambda shows that it most likely emerged in Peru before the second wave (June-November, 2020). After its emergence, it spread from Peru to Argentina and Chile, where it was transmitted locally. During the second wave in Peru, we identify the coexistence of two Lambda and three Gamma sublineages. Lambda sublineages emerged in the center of Peru, whereas the Gamma sublineages more likely originated in the north-east and mid-east. Importantly, the center of Peru played a prominent role in transmitting SARS-CoV-2 to other regions within Peru.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Pandemics , Peru/epidemiology , Argentina
7.
Proc Natl Acad Sci U S A ; 119(32): e2112853119, 2022 08 09.
Article in English | MEDLINE | ID: mdl-35914165

ABSTRACT

The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed "break-away" geographical diffusion model, specially designed for modeling migrations, with "augmented" geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and genetic data to test hypotheses about large language family expansions. We compare four hypotheses: an early major split north of the rainforest; a migration through the Sangha River Interval corridor around 2,500 BP; a coastal migration around 4,000 BP; and a migration through the rainforest before the corridor opening, at 4,000 BP. Our results produce a topology and timeline for the Bantu language family, which supports the hypothesis of an expansion through Central African tropical forests at 4,420 BP (4,040 to 5,000 95% highest posterior density interval), well before the Sangha River Interval was open.


Subject(s)
Language , Rainforest , Africa, Central , Bayes Theorem , Black People , Human Migration , Humans , Phylogeography , Rivers
8.
Commun Biol ; 5(1): 755, 2022 07 28.
Article in English | MEDLINE | ID: mdl-35902726

ABSTRACT

We introduce a widely applicable species delimitation method based on the multispecies coalescent model that is more efficient and more biologically realistic than existing methods. We extend a threshold-based method to allow the ancestral speciation rate to vary through time as a smooth piecewise function. Furthermore, we introduce the cutting-edge proposal kernels of StarBeast3 to this model, thus enabling rapid species delimitation on large molecular datasets and allowing the use of relaxed molecular clock models. We validate these methods with genomic sequence data and SNP data, and show they are more efficient than existing methods at achieving parameter convergence during Bayesian MCMC. Lastly, we apply these methods to two datasets (Hemidactylus and Galagidae) and find inconsistencies with the published literature. Our methods are powerful for rapid quantitative testing of species boundaries in large multilocus datasets and are implemented as an open source BEAST 2 package called SPEEDEMON.


Subject(s)
Lizards , Animals , Bayes Theorem , Computer Simulation , Genome , Phylogeny
9.
Syst Biol ; 71(6): 1549-1560, 2022 10 12.
Article in English | MEDLINE | ID: mdl-35212733

ABSTRACT

We present a two-headed approach called Bayesian Integrated Coalescent Epoch PlotS (BICEPS) for efficient inference of coalescent epoch models. Firstly, we integrate out population size parameters, and secondly, we introduce a set of more powerful Markov chain Monte Carlo (MCMC) proposals for flexing and stretching trees. Even though population sizes are integrated out and not explicitly sampled through MCMC, we are still able to generate samples from the population size posteriors. This allows demographic reconstruction through time, including estimation of the timing and magnitude of population bottlenecks and of full population histories. Altogether, BICEPS can be considered a more muscular version of the popular Bayesian skyline model. We demonstrate its power and correctness by a well-calibrated simulation study. Furthermore, we demonstrate with an application to SARS-CoV-2 genomic data that some analyses that have trouble converging with the traditional Bayesian skyline prior and standard MCMC proposals can do well with the BICEPS approach. BICEPS is available as an open-source package for BEAST 2 under the GPL license and has a user-friendly graphical user interface. [Bayesian phylogenetics; BEAST 2; BICEPS; coalescent model.]
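The claim that population sizes can be integrated out and still sampled afterwards follows from conjugacy. As a sketch, assume (for illustration only) that the population size θ is constant within an epoch and has an inverse-gamma prior with shape α and scale β. If an epoch contains k coalescent events and S is the sum over its intervals of C(n, 2) times the interval length (with n lineages present), the coalescent likelihood is proportional to θ^(-k) exp(-S/θ). The marginal is then β^α Γ(α+k) / (Γ(α) (β+S)^(α+k)), and the conditional posterior of θ is inverse-gamma(α+k, β+S). The hypothetical Python helpers below compute both; they are not taken from the BICEPS source.

```python
import math
import random

def epoch_stats(intervals):
    """Summarise the coalescent intervals that fall inside one epoch.

    intervals: list of (n_lineages, length, ends_in_coalescence) tuples.
    Returns (k, s): the number of coalescent events and the sum of
    C(n, 2) * length over the intervals.
    """
    k = sum(1 for _, _, coalescence in intervals if coalescence)
    s = sum(n * (n - 1) / 2 * length for n, length, _ in intervals)
    return k, s

def log_marginal(k, s, alpha, beta):
    """Log of the integral over theta of InvGamma(theta; alpha, beta) * theta**-k * exp(-s / theta)."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + k) - (alpha + k) * math.log(beta + s))

def sample_theta(k, s, alpha, beta, rng):
    """Draw theta from its conditional posterior InvGamma(alpha + k, beta + s)."""
    # If X ~ Gamma(shape a, scale 1/b), then 1/X ~ InvGamma(a, b).
    return 1.0 / rng.gammavariate(alpha + k, 1.0 / (beta + s))
```

In an MCMC run under these assumptions, `log_marginal` would stand in for the coalescent prior term of each epoch, and `sample_theta` could be applied to logged trees afterwards to recover population size trajectories.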


Subject(s)
COVID-19 , Software , Algorithms , Bayes Theorem , Humans , Markov Chains , Models, Genetic , Monte Carlo Method , Phylogeny , SARS-CoV-2
10.
Syst Biol ; 71(4): 901-916, 2022 06 16.
Article in English | MEDLINE | ID: mdl-35176772

ABSTRACT

As genomic sequence data become increasingly available, inferring the species phylogeny as the phylogeny of concatenated genomic data can be enticing. However, this approach yields a biased estimator of branch lengths and substitution rates and an inconsistent estimator of tree topology. Bayesian multispecies coalescent (MSC) methods address these issues by constraining a set of gene trees within a species tree and jointly inferring both under a Bayesian framework. However, this approach comes at the cost of increased computational demand. Here, we introduce StarBeast3, a software package for efficient Bayesian inference under the MSC model via Markov chain Monte Carlo. We gain efficiency by introducing cutting-edge proposal kernels and adaptive operators, and StarBeast3 is particularly efficient when a relaxed clock model is applied. Furthermore, gene-tree inference is parallelized, allowing the software to scale with the size of the problem. We validated our software and benchmarked its performance using three real and two synthetic data sets. Our results indicate that StarBeast3 is up to one-and-a-half orders of magnitude faster than StarBeast2, and therefore more than two orders of magnitude faster than *BEAST, depending on the data set and on the parameter, and can achieve convergence on large data sets with hundreds of genes. StarBeast3 is open-source and is easy to set up with a friendly graphical user interface. [Adaptive; Bayesian inference; BEAST 2; effective population sizes; high performance; multispecies coalescent; parallelization; phylogenetics.]


Subject(s)
Models, Genetic , Software , Bayes Theorem , Markov Chains , Monte Carlo Method , Phylogeny
11.
Int J Mol Sci ; 23(3)2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35163448

ABSTRACT

The role of aminoacyl-tRNA synthetases (aaRS) in the emergence and evolution of genetic coding poses challenging questions concerning their provenance. We seek evidence about their ancestry from curated structure-based multiple sequence alignments of a structurally invariant "scaffold" shared by all 10 canonical Class I aaRS. Three uncorrelated phylogenetic metrics (mutation frequency, its uniformity, and row-by-row cladistic congruence) imply that the Class I scaffold is a mosaic assembled from successive genetic sources. Metrics for different modules vary in accordance with their presumed functionality. Sequences derived from the ATP- and amino-acid-binding sites exhibit specific two-way coupling to those derived from Connecting Peptide 1, a third module whose metrics suggest later acquisition. The data help validate: (i) experimental fragmentations of the canonical Class I structure into three partitions that retain catalytic activities in proportion to their length; and (ii) evidence that the ancestral Class I aaRS gene also encoded a Class II ancestor in frame on the opposite strand. A 46-residue Class I "protozyme" roots the Class I tree prior to the adaptive radiation of the Rossmann dinucleotide binding fold that refined substrate discrimination. Such rooting implies near-simultaneous emergence of genetic coding and the origin of the proteome, resolving a conundrum posed by previous inferences that Class I aaRS evolved after the genetic code had been implemented in an RNA world. Further, pinpointing discontinuous enhancements of aaRS fidelity establishes a timeline for the growth of coding from a binary amino acid alphabet.


Subject(s)
Amino Acyl-tRNA Synthetases/chemistry , Amino Acyl-tRNA Synthetases/genetics , Mutation , Benchmarking , Binding Sites , Evolution, Molecular , Genetic Code , Models, Molecular , Phylogeny , Protein Conformation , Sequence Homology, Amino Acid , Structural Homology, Protein
12.
Nature ; 599(7886): 616-621, 2021 11.
Article in English | MEDLINE | ID: mdl-34759322

ABSTRACT

The origin and early dispersal of speakers of Transeurasian languages (that is, Japanese, Korean, Tungusic, Mongolic and Turkic) is among the most disputed issues of Eurasian population history [1-3]. A key problem is the relationship between linguistic dispersals, agricultural expansions and population movements [4,5]. Here we address this question by 'triangulating' genetics, archaeology and linguistics in a unified perspective. We report wide-ranging datasets from these disciplines, including a comprehensive Transeurasian agropastoral and basic vocabulary; an archaeological database of 255 Neolithic-Bronze Age sites from Northeast Asia; and a collection of ancient genomes from Korea, the Ryukyu islands and early cereal farmers in Japan, complementing previously published genomes from East Asia. Challenging the traditional 'pastoralist hypothesis' [6-8], we show that the common ancestry and primary dispersals of Transeurasian languages can be traced back to the first farmers moving across Northeast Asia from the Early Neolithic onwards, but that this shared heritage has been masked by extensive cultural interaction since the Bronze Age. As well as marking considerable progress in the three individual disciplines, by combining their converging evidence we show that the early spread of Transeurasian speakers was driven by agriculture.


Subject(s)
Agriculture/history , Archaeology , Genetics, Population , Human Migration/history , Language/history , Linguistics , China , Datasets as Topic , Geographic Mapping , History, Ancient , Humans , Japan , Korea , Mongolia
13.
Virus Evol ; 7(2): veab052, 2021.
Article in English | MEDLINE | ID: mdl-34527282

ABSTRACT

New Zealand, Australia, Iceland, and Taiwan all saw success in controlling their first waves of Coronavirus Disease 2019 (COVID-19). As islands, they make excellent case studies for exploring the effects of international travel and human movement on the spread of COVID-19. We employed a range of robust phylodynamic methods and genome subsampling strategies to infer the epidemiological history of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in these four countries. We compared these results to transmission clusters identified by the New Zealand Ministry of Health through contact tracing. We estimated the effective reproduction number of COVID-19 as 1-1.4 during the early stages of the pandemic and showed that it declined below 1 as human movement was restricted. We also showed that the disease was introduced many times into each country and that introductions slowed markedly following the reduction of international travel in mid-March 2020. Finally, we confirmed that New Zealand transmission clusters identified via standard health surveillance strategies largely agree with those defined by genomic data. We have demonstrated how genomic data and computational biology methods can assist health officials in characterising the epidemiology of viral epidemics and in contact tracing.

14.
Emerg Infect Dis ; 27(9): 2361-2368, 2021 09.
Article in English | MEDLINE | ID: mdl-34424164

ABSTRACT

Since severe acute respiratory syndrome coronavirus 2 was first eliminated in New Zealand in May 2020, a total of 13 known coronavirus disease (COVID-19) community outbreaks have occurred, 2 of which led health officials to issue stay-at-home orders. These outbreaks originated at the border via isolating returnees, airline workers, and cargo vessels. Because the public health system was informed by real-time viral genomic sequencing, and complete genomes were typically available within 12 hours of a positive community-based COVID-19 test result, every outbreak was well contained. A total of 225 community cases resulted in 3 deaths. Real-time genomics was essential for establishing links between cases when epidemiologic data could not do so and for identifying when concurrent outbreaks had different origins.


Subject(s)
COVID-19 , Viruses , Genomics , Humans , New Zealand/epidemiology , SARS-CoV-2
15.
PLoS Comput Biol ; 17(2): e1008322, 2021 02.
Article in English | MEDLINE | ID: mdl-33529184

ABSTRACT

Relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. Under the (uncorrelated) relaxed clock model, tree branches are associated with molecular substitution rates which are independently and identically distributed. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and we explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).
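The Bactrian kernel named in this abstract replaces the usual bell-shaped random-walk step with a bimodal one, so that proposals are rarely tiny moves. Below is a minimal Python sketch on a toy one-dimensional target. The spike parameter m = 0.95 is an assumed default, the function names are hypothetical, and nothing here models relaxed clock rates or reproduces the ORC operators.

```python
import math
import random

def bactrian_step(rng, scale=1.0, m=0.95):
    """Draw a random-walk step from a Bactrian kernel.

    The unscaled step is a 50:50 mixture of N(+m, 1 - m^2) and N(-m, 1 - m^2),
    so it has mean 0 and variance 1 but very little mass near 0.
    """
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return scale * (sign * m + math.sqrt(1.0 - m * m) * rng.gauss(0.0, 1.0))

def metropolis(log_target, x0, n_iter, scale, rng):
    """Random-walk Metropolis using Bactrian steps.

    The kernel is symmetric about zero, so no Hastings correction is needed.
    Returns the chain and the acceptance rate.
    """
    x, lp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = x + bactrian_step(rng, scale)
        lq = log_target(y)
        if rng.random() < math.exp(min(0.0, lq - lp)):
            x, lp = y, lq
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter
```

For example, `metropolis(lambda x: -0.5 * x * x, 0.0, 50000, 2.0, random.Random(7))` samples a standard normal; in a real analysis the step would typically be applied to a transformed parameter (such as a log rate), which this sketch leaves out.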


Subject(s)
Evolution, Molecular , Models, Genetic , Phylogeny , Algorithms , Animals , Bayes Theorem , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Likelihood Functions , Markov Chains , Monte Carlo Method , Mutation Rate , Software , Time Factors
16.
Syst Biol ; 70(1): 145-161, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33005955

ABSTRACT

We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]


Subject(s)
Algorithms , Models, Genetic , Bayes Theorem , Computer Simulation , Phylogeny , Probability
17.
PeerJ ; 8: e9473, 2020.
Article in English | MEDLINE | ID: mdl-32995072

ABSTRACT

With ever more complex models used to study evolutionary patterns, approaches that facilitate efficient inference under such models are needed. Metropolis-coupled Markov chain Monte Carlo (MCMC) has long been used to speed up phylogenetic analyses and to make use of multi-core CPUs. Metropolis-coupled MCMC essentially runs multiple MCMC chains in parallel. All chains are heated except for one cold chain that explores the posterior probability space like a regular MCMC chain. This heating allows chains to make bigger jumps in phylogenetic state space. The heated chains can then be used to propose new states for other chains, including the cold chain. One of the practical challenges in using this approach is finding temperatures for the heated chains that explore state spaces efficiently. We here provide an adaptive Metropolis-coupled MCMC scheme for Bayesian phylogenetics, in which the temperature difference between heated chains is automatically tuned to achieve a target acceptance probability for exchanging states between individual chains. We first show the validity of this approach by comparing inferences of adaptive Metropolis-coupled MCMC to MCMC on several datasets. We then explore where Metropolis-coupled MCMC provides benefits over MCMC. We implemented this adaptive Metropolis-coupled MCMC approach as an open-source package, licensed under GPL 3.0, for the Bayesian phylogenetics software BEAST 2, available from https://github.com/nicfel/CoupledMCMC.

18.
PeerJ ; 8: e9460, 2020.
Article in English | MEDLINE | ID: mdl-32832259

ABSTRACT

BACKGROUND: Bayesian analyses offer many benefits for phylogenetics and have been popular for the analysis of amino acid alignments. Such analyses require a substitution model and a site model to be specified, and these models, which are typically of no interest to the analysis overall, are often chosen with an ad hoc or likelihood-based method. METHODS: We present a method called OBAMA that averages over substitution models and site models, thus letting the data inform model choices and taking model uncertainty into account. It uses trans-dimensional Markov chain Monte Carlo (MCMC) proposals to switch between various empirical substitution models for amino acids, such as Dayhoff, WAG, and JTT. Furthermore, it switches between the base frequencies of these substitution models and base frequencies estimated from the alignment. Finally, it switches between using gamma rate heterogeneity or not, and between using a proportion of invariable sites or not. RESULTS: We show that the model performs well in a simulation study. By using appropriate priors, we demonstrate that both the proportion of invariable sites and the shape parameter for gamma rate heterogeneity can be estimated. The OBAMA method allows model uncertainty to be taken into account, thus reducing bias in phylogenetic estimates. The method is implemented in the OBAMA package in BEAST 2, which is open source licensed under LGPL and allows joint tree inference under a wide range of models.
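The simplest part of the model averaging described here, jumping between empirical substitution models such as Dayhoff, WAG and JTT, can be sketched as a Metropolis step over a discrete model index. This Python sketch assumes that the tree and all other parameters are fixed and that the per-model log-likelihoods are already computed; the function name is hypothetical. Because these empirical models have no free parameters, switching among them does not change the dimension of the state. The trans-dimensional moves that OBAMA uses to toggle gamma rate heterogeneity and invariable sites are not shown.

```python
import math
import random

def model_switch_chain(log_lik, n_iter, rng, log_prior=None):
    """MCMC over a discrete set of substitution models.

    log_lik: dict mapping model name -> log-likelihood of the alignment
    (tree and other parameters held fixed). Each step proposes a different
    model uniformly at random; this proposal is symmetric, so the acceptance
    ratio is just the ratio of posterior probabilities.
    Returns the fraction of iterations spent in each model.
    """
    if log_prior is None:
        log_prior = {m: 0.0 for m in log_lik}  # uniform prior over models
    models = list(log_lik)
    current = models[0]
    counts = {m: 0 for m in models}
    for _ in range(n_iter):
        proposal = rng.choice([m for m in models if m != current])
        log_r = (log_lik[proposal] + log_prior[proposal]
                 - log_lik[current] - log_prior[current])
        if rng.random() < math.exp(min(0.0, log_r)):
            current = proposal
        counts[current] += 1
    return {m: c / n_iter for m, c in counts.items()}
```

With a uniform prior, the visit frequencies approach exp(log L_m) / Σ_k exp(log L_k), which are the posterior model probabilities under these simplifying assumptions.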

19.
PLoS Comput Biol ; 15(8): e1007189, 2019 08.
Article in English | MEDLINE | ID: mdl-31386651

ABSTRACT

Model-based phylodynamic approaches have recently employed generalized linear models (GLMs) to uncover potential predictors of viral spread. Very recently, some of these models have allowed both the predictors and their coefficients to be time-dependent. However, these studies mainly focused on predictors that are assumed to be constant through time. Here we inferred the phylodynamics of avian influenza A virus H9N2 isolated in 12 Asian countries and regions under both discrete trait analysis (DTA) and structured coalescent (MASCOT) approaches. Using MASCOT, we applied a new time-dependent GLM to uncover the underlying factors behind H9N2 spread. We curated a rich set of time-series predictors, including annual international live poultry trade and national poultry production figures. This time-dependent phylodynamic prediction model was compared to commonly employed time-independent alternatives. Additionally, the time-dependent MASCOT model allowed the estimation of viral effective sub-population sizes and their changes through time, and these effective population dynamics within each country were predicted by a GLM. International annual poultry trade is a strongly supported predictor of virus migration rates. There was also strong support for geographic proximity as a predictor of migration rate in all GLMs investigated. In time-dependent MASCOT models, national poultry production was also identified as a predictor of virus genetic diversity through time, and this signal was particularly clear in mainland China. Our application of recently introduced time-dependent GLM predictors integrated rich time-series data into Bayesian phylodynamic prediction. We demonstrated the contribution of poultry trade and geographic proximity (potentially unheralded wild bird movements) to avian influenza spread in Asia. To gain a better understanding of the drivers of H9N2 spread, we suggest increased surveillance of the H9N2 virus in countries that are currently under-sampled as well as in wild bird populations in the most affected countries.


Subject(s)
Influenza A Virus, H9N2 Subtype , Influenza in Birds/transmission , Models, Biological , Animal Migration , Animals , Animals, Wild/virology , Asia/epidemiology , Bayes Theorem , Birds/virology , Commerce , Computational Biology , Environmental Monitoring , Influenza A Virus, H9N2 Subtype/classification , Influenza A Virus, H9N2 Subtype/genetics , Influenza in Birds/epidemiology , Influenza in Birds/virology , Linear Models , Phylogeography/statistics & numerical data , Population Dynamics , Poultry/virology , Spatio-Temporal Analysis
20.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued apace in recent years, with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data in a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subject(s)
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Evolution, Molecular , Humans , Markov Chains , Models, Genetic , Monte Carlo Method