1.
Syst Biol ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38190300

ABSTRACT

The opposing forces of gene flow and isolation are two major processes shaping genetic diversity. Understanding how these vary across space and time is necessary to identify the environmental features that promote diversification. The detection of considerable geographic structure in taxa from the arid Nearctic has prompted research into the drivers of isolation in the region. Several geographic features have been proposed as barriers to gene flow, including the Colorado River, Western Continental Divide, and a hypothetical Mid-Peninsular Seaway in Baja California. However, recent studies suggest that the role of barriers in genetic differentiation may have been overestimated when compared to other mechanisms of divergence. In this study, we infer historical and spatial patterns of connectivity and isolation in Desert Spiny Lizards (Sceloporus magister) and Baja Spiny Lizards (S. zosteromus), which together form a species complex composed of parapatric lineages with wide distributions in arid western North America. Our analyses incorporate mitochondrial sequences, genomic-scale data, and past and present climatic data to evaluate the nature and strength of barriers to gene flow in the region. Our approach relies on estimates of migration under the multispecies coalescent to understand the history of lineage divergence in the face of gene flow. Results show that the S. magister complex is geographically structured, but we also detect instances of gene flow. The Continental Divide is a strong barrier to gene flow, while the Colorado River is more permeable. Analyses yield conflicting results for the catalyst of differentiation of peninsular lineages in S. zosteromus. Our study shows how large-scale genomic data for thoroughly sampled species can shed new light on biogeography. Furthermore, our approach highlights the need for the combined analysis of multiple sources of evidence to adequately characterize the drivers of divergence.

2.
Proc Natl Acad Sci U S A ; 120(44): e2310708120, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.


Subject(s)
Algorithms , Gene Flow , Animals , Phylogeny , Computer Simulation , Bayes Theorem , Likelihood Functions , Models, Genetic
3.
Mol Biol Evol ; 40(8)2023 08 03.
Article in English | MEDLINE | ID: mdl-37552932

ABSTRACT

Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.


Subject(s)
Butterflies , Animals , Likelihood Functions , Bayes Theorem , Butterflies/genetics , Genome , Genomics , Gene Flow , Phylogeny , Hybridization, Genetic
4.
Syst Biol ; 72(4): 820-836, 2023 08 07.
Article in English | MEDLINE | ID: mdl-36961245

ABSTRACT

Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target-enrichment loci from baobabs (Adansonia), and another of 1000 transcriptome loci from 14 species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. [divergence time; gene flow; hybridization; introgression; MSci model; multispecies coalescent].


Subject(s)
Gene Flow , Hybridization, Genetic , Phylogeny , Models, Genetic
5.
Mol Biol Evol ; 39(12)2022 12 05.
Article in English | MEDLINE | ID: mdl-36317198

ABSTRACT

Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-specific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.


Subject(s)
Gene Flow , Genomics , Computer Simulation
6.
Mol Biol Evol ; 39(8)2022 08 03.
Article in English | MEDLINE | ID: mdl-35907248

ABSTRACT

The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes-Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.


Subject(s)
Models, Genetic , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method , Phylogeny
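
The Jukes-Cantor model mentioned in the abstract above has a simple closed-form distance correction, which may help show why it suits closely related species. The sketch below is illustrative only and is not taken from bpp; the function name `jc69_distance` is hypothetical, and it assumes two aligned sequences of equal length.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance in expected substitutions per site.

    Hypothetical helper for illustration; not part of bpp.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have an unambiguous nucleotide.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    # p is the observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    # JC69 correction for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For small p the corrected distance is close to p itself, which is consistent with the abstract's point that the simple model is adequate for shallow trees.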
7.
Mol Biol Evol ; 39(5)2022 05 03.
Article in English | MEDLINE | ID: mdl-35417543

ABSTRACT

Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make an efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.


Subject(s)
Gene Flow , Genomics , Algorithms , Genomics/methods , Models, Genetic , Phylogeny
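
The count of unidentifiable modes stated in the abstract above (two to the power k for k BDI events) can be pictured as every on/off combination of a label switch at each event. The sketch below only enumerates those combinations; it does not reproduce the parameter mapping or the post-processing algorithms in bpp, and the function name and event labels are hypothetical.

```python
from itertools import product


def enumerate_modes(bdi_events):
    """List every label-switching configuration for the given BDI events.

    Each configuration maps an event label to True (labels switched) or
    False (labels as given). With k events there are 2**k configurations.
    Hypothetical illustration; not the bpp implementation.
    """
    return [
        dict(zip(bdi_events, flips))
        for flips in product((False, True), repeat=len(bdi_events))
    ]


# Example with three hypothetical BDI events between species pairs.
modes = enumerate_modes(["A<->B", "C<->D", "E<->F"])
print(len(modes))  # number of configurations for k = 3
```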
8.
Mol Ecol ; 31(10): 2814-2829, 2022 05.
Article in English | MEDLINE | ID: mdl-35313033

ABSTRACT

Phylogenomic analyses under the multispecies coalescent model assume no recombination within locus and free recombination among loci. Yet, in real data sets intralocus recombination causes different sites of the same locus to have different genealogical histories so that the model is misspecified. The impact of recombination on various coalescent-based phylogenomic analyses has not been systematically examined. Here, we conduct a computer simulation to examine the impact of recombination on several Bayesian analyses of multilocus sequence data, including species tree estimation, species delimitation (by Bayesian selection of delimitation models) and estimation of evolutionary parameters such as species divergence and introgression times, population sizes for modern and extinct species, and cross-species introgression probabilities. We found that recombination, at rates comparable to estimates from the human being, has little impact on coalescent-based species tree estimation, species delimitation and estimation of population parameters. At rates 10 times higher than the human rate, recombination may affect parameter estimation, causing positive biases in introgression times and ancestral population sizes, although species divergence times and cross-species introgression probabilities are estimated with little bias. Overall, the simulation suggests that phylogenomic inferences under the multispecies coalescent model are robust to realistic amounts of intralocus recombination.


Subject(s)
Models, Genetic , Recombination, Genetic , Bayes Theorem , Computer Simulation , Humans , Phylogeny , Recombination, Genetic/genetics
9.
Syst Biol ; 71(2): 334-352, 2022 02 10.
Article in English | MEDLINE | ID: mdl-34143216

ABSTRACT

Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences were much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. 
In case the analytical integration algorithm is computationally unfeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.].


Subject(s)
Diploidy , Models, Genetic , Computer Simulation , Heterozygote , Phylogeny
10.
Mol Ecol Resour ; 22(1): 430-438, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34288531

ABSTRACT

A wide range of data types can be used to delimit species and various computer-based tools dedicated to this task are now available. Although these formalized approaches have significantly contributed to increase the objectivity of species delimitation (SD) under different assumptions, they are not routinely used by alpha-taxonomists. One obvious shortcoming is the lack of interoperability among the various independently developed SD programs. Given the frequent incongruences between species partitions inferred by different SD approaches, researchers applying these methods often seek to compare these alternative species partitions to evaluate the robustness of the species boundaries. This procedure is excessively time consuming at present, and the lack of a standard format for species partitions is a major obstacle. Here, we propose a standardized format, SPART, to enable compatibility between different SD tools exporting or importing partitions. This format reports the partitions and describes, for each of them, the assignment of individuals to the "inferred species". The syntax also allows support values to be optionally reported, as well as original trees and the full command lines used in the respective SD analyses. Two variants of this format are proposed, overall using the same terminology but presenting the data either optimized for human readability (matricial SPART) or in a format in which each partition forms a separate block (SPART.XML). ABGD, DELINEATE, GMYC, PTP and TR2 have already been adapted to output SPART files and a new version of LIMES has been developed to import, export, merge and split them.

11.
Genome Biol Evol ; 14(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34849831

ABSTRACT

The southwestern and central United States serve as an ideal region to test alternative hypotheses regarding biotic diversification. Genomic data can now be combined with sophisticated computational models to quantify the impacts of paleoclimate change, geographic features, and habitat heterogeneity on spatial patterns of genetic diversity. In this study, we combine thousands of genotyping-by-sequencing (GBS) loci with mtDNA sequences (ND1) from the Texas horned lizard (Phrynosoma cornutum) to quantify relative support for different catalysts of diversification. Phylogenetic and clustering analyses of the GBS data indicate support for at least three primary populations. The spatial distribution of populations appears concordant with habitat type, with desert populations in AZ and NM showing the largest genetic divergence from the remaining populations. The mtDNA data also support a divergent desert population, but other relationships differ and suggest mtDNA introgression. Genotype-environment association with bioclimatic variables supports divergence along precipitation gradients more than along temperature gradients. Demographic analyses support a complex history, with introgression and gene flow playing an important role during diversification. Bayesian multispecies coalescent analyses with introgression (MSci) analyses also suggest that gene flow occurred between populations. Paleo-species distribution models support two southern refugia that geographically correspond to contemporary lineages. We find that divergence times are underestimated and population sizes are overestimated when introgression occurred and is ignored in coalescent analyses, and furthermore, inference of ancient introgression events and demographic history is sensitive to inclusion of a single recently admixed sample. Our analyses cannot refute the riverine barrier or glacial refugia hypotheses. Results also suggest that populations are continuing to diverge along habitat gradients. 
Finally, the strong evidence of admixture, gene flow, and mtDNA introgression among populations suggests that P. cornutum should be considered a single widespread species under the General Lineage Species Concept.


Subject(s)
Lizards , Animals , Bayes Theorem , DNA, Mitochondrial/genetics , Demography , Genetic Variation , Lizards/genetics , Phylogeny , Phylogeography , United States
12.
Curr Biol ; 31(2): R59-R64, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33497629

ABSTRACT

The effort to reconstruct the tree of life was revolutionized by the use of sequences of proteins and nucleic acids. Phylogenetic trees are now routinely inferred using hundreds of thousands of amino acid or nucleotide characters. It thus seems surprising that many aspects of the tree of life are still controversial; conflicting results between large scale phylogenomic studies show that errors remain common despite large datasets. These errors often result from systematic biases in the way sequences evolve. While the resulting systematic errors are well understood, it requires careful efforts to reduce their effects.


Subject(s)
Data Accuracy , Evolution, Molecular , Phylogeny , Genetic Heterogeneity , Sequence Alignment , Sequence Homology, Nucleic Acid
13.
Natl Sci Rev ; 8(12): nwab127, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34987842

ABSTRACT

Multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergences and the population genetic process of coalescent, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation and summarize recent methodological advances in inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
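
As a small worked case of the discordant gene trees mentioned above, consider three species with one sequence each. Under the standard MSC without gene flow, the gene tree matches the species tree with probability 1 - (2/3)exp(-T), where T is the internal branch length in coalescent units. The sketch below assumes the common parameterization in which the branch length `delta_tau` and the population size parameter `theta` are both measured in expected mutations per site, so that T = 2 * delta_tau / theta; the function name is hypothetical.

```python
import math


def prob_gene_tree_matches(delta_tau: float, theta: float) -> float:
    """Probability that a three-species gene tree matches the species tree.

    delta_tau: length of the internal species-tree branch (mutations per site).
    theta: population size parameter of the ancestral population on that
           branch (4 * N * mu, mutations per site).
    Assumes no gene flow and one sequence per species.
    """
    if theta <= 0 or delta_tau < 0:
        raise ValueError("theta must be positive and delta_tau non-negative")
    # Internal branch length in coalescent units.
    t_coal = 2.0 * delta_tau / theta
    # With probability exp(-t_coal) the two sister lineages fail to coalesce
    # on the internal branch; then all three topologies are equally likely.
    return 1.0 - (2.0 / 3.0) * math.exp(-t_coal)
```

A very short internal branch gives a probability near 1/3, meaning all three topologies are almost equally frequent, which is why multilocus methods are needed for rapid radiations.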

15.
Mol Biol Evol ; 37(11): 3211-3224, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32642765

ABSTRACT

We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.


Subject(s)
Models, Genetic , Phylogeny , Computer Simulation , Genetic Speciation , Population Density
16.
BMC Evol Biol ; 20(1): 64, 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493355

ABSTRACT

BACKGROUND: The latest advancements in DNA sequencing technologies have facilitated the resolution of the phylogeny of insects, yet parts of the tree of Holometabola remain unresolved. The phylogeny of Neuropterida has been extensively studied, but no strong consensus exists concerning the phylogenetic relationships within the order Neuroptera. Here, we assembled a novel transcriptomic dataset to address previously unresolved issues in the phylogeny of Neuropterida and to infer divergence times within the group. We tested the robustness of our phylogenetic estimates by comparing summary coalescent and concatenation-based phylogenetic approaches and by employing different quartet-based measures of phylogenomic incongruence, combined with data permutations. RESULTS: Our results suggest that the order Raphidioptera is sister to Neuroptera + Megaloptera. Coniopterygidae is inferred as sister to all remaining neuropteran families suggesting that larval cryptonephry could be a ground plan feature of Neuroptera. A clade that includes Nevrorthidae, Osmylidae, and Sisyridae (i.e. Osmyloidea) is inferred as sister to all other Neuroptera except Coniopterygidae, and Dilaridae is placed as sister to all remaining neuropteran families. Ithonidae is inferred as the sister group of monophyletic Myrmeleontiformia. The phylogenetic affinities of Chrysopidae and Hemerobiidae were dependent on the data type analyzed, and quartet-based analyses showed only weak support for the placement of Hemerobiidae as sister to Ithonidae + Myrmeleontiformia. Our molecular dating analyses suggest that most families of Neuropterida started to diversify in the Jurassic and our ancestral character state reconstructions suggest a primarily terrestrial environment of the larvae of Neuropterida and Neuroptera. 
CONCLUSION: Our extensive phylogenomic analyses consolidate several key aspects in the backbone phylogeny of Neuropterida, such as the basal placement of Coniopterygidae within Neuroptera and the monophyly of Osmyloidea. Furthermore, they provide new insights into the timing of diversification of Neuropterida. Despite the vast amount of analyzed molecular data, we found that certain nodes in the tree of Neuroptera are not robustly resolved. Therefore, we emphasize the importance of integrating the results of morphological analyses with those of sequence-based phylogenomics. We also suggest that comparative analyses of genomic meta-characters should be incorporated into future phylogenomic studies of Neuropterida.


Subject(s)
Evolution, Molecular , Holometabola/genetics , Phylogeny , Animals , Base Sequence , Genomics , Larva/genetics , Sequence Analysis, DNA , Transcriptome
17.
Syst Biol ; 69(5): 830-847, 2020 09 01.
Article in English | MEDLINE | ID: mdl-31977022

ABSTRACT

Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multilocus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny. [IM; introgression; migration; MSci; multispecies coalescent; species tree.].


Subject(s)
Classification/methods , Gene Flow , Models, Biological , Phylogeny , Animal Migration , Animals , Anopheles/classification , Anopheles/genetics
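
The majority-vote method described in the abstract above takes the most common gene tree topology as the species tree estimate. A minimal sketch follows, assuming each locus has already been reduced to a topology string written in a consistent form; the function name and the example counts are hypothetical and are not data from the study.

```python
from collections import Counter


def majority_vote(gene_tree_topologies):
    """Return the most common topology and its frequency among loci.

    Topologies must be written consistently (for example "((A,B),C)"),
    because they are compared as plain strings. Ties are broken by
    first appearance in the input.
    """
    if not gene_tree_topologies:
        raise ValueError("at least one gene tree is required")
    counts = Counter(gene_tree_topologies)
    topology, n = counts.most_common(1)[0]
    return topology, n / len(gene_tree_topologies)


# Hypothetical counts over ten loci for three species A, B and C.
loci = ["((A,B),C)"] * 5 + ["((A,C),B)"] * 3 + ["((B,C),A)"] * 2
print(majority_vote(loci))
```

Gene flow between an outgroup and one ingroup species inflates the count of one mismatching topology, which is how it can mislead this estimator once that count exceeds the count of the true topology.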
18.
Mol Biol Evol ; 37(1): 291-294, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31432070

ABSTRACT

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest , last accessed September 2, 2019.


Subject(s)
Amino Acid Substitution , Evolution, Molecular , Genetic Techniques , Models, Genetic , Software
19.
Mol Biol Evol ; 37(4): 1211-1223, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31825513

ABSTRACT

Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.


Subject(s)
Genetic Introgression , Models, Genetic , Phylogeny , Animals , Anopheles/genetics , Bayes Theorem , Picea/genetics , Saccharomycetales/genetics
20.
Bioinformatics ; 35(21): 4453-4455, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31070718

ABSTRACT

MOTIVATION: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. RESULTS: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQTree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. AVAILABILITY AND IMPLEMENTATION: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Phylogeny , Software , Likelihood Functions