Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo.

Kapli, P; Lutteropp, S; Zhang, J; Kobert, K; Pavlidis, P; Stamatakis, A; Flouri, T.

Bioinformatics ; 33(11): 1630-1638, 2017 Jun 01.

Article in English | MEDLINE | ID: mdl-28108445

ABSTRACT

MOTIVATION: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet two weaknesses of PTP impact its accuracy and practicality when applied to large datasets: it does not account for divergent intraspecific variation, and it is slow for a large number of sequences. RESULTS: We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It incorporates different levels of intraspecific genetic diversity deriving from differences in either the evolutionary history or sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and popular distance-based methods as it consistently yields more accurate delimitations with respect to the taxonomy (i.e., it identifies more taxonomic species and infers species numbers closer to the taxonomy). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing it to delimit species in large (meta-)barcoding data. In addition, Markov Chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation in just a few seconds for millions of steps, independently of tree size. AVAILABILITY AND IMPLEMENTATION: mPTP is implemented in C and is available for download at http://github.com/Pas-Kapli/mptp under the GNU Affero 3 license. A web service is available at http://mptp.h-its.org. CONTACT: paschalia.kapli@h-its.org, alexandros.stamatakis@h-its.org or tomas.flouri@h-its.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
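
The difference between a single intraspecific rate and one rate per species can be illustrated with a small sketch. It assumes, as an illustration only, that branch lengths within each process are exponentially distributed and that each rate is set to its maximum-likelihood estimate; the function names and the toy branch lengths are hypothetical and are not taken from the mPTP code. The dynamic programming search over delimitations and the MCMC sampling mentioned in the abstract are not shown.

```python
import math

def exp_loglik(branch_lengths):
    """Maximised log-likelihood of branch lengths under one exponential rate.

    Illustrative assumption: lengths are i.i.d. exponential, so the
    maximum-likelihood rate is n / sum(lengths).
    """
    n = len(branch_lengths)
    total = sum(branch_lengths)
    if n == 0 or total <= 0:
        return 0.0  # nothing to score in this toy version
    rate = n / total
    return n * math.log(rate) - rate * total

def delimitation_loglik(speciation, species, multi_rate):
    """Score one candidate delimitation.

    speciation: between-species branch lengths (one shared rate).
    species:    list of lists, within-species branch lengths per species.
    multi_rate: if True, fit one rate per species; otherwise pool all
                within-species branches under a single rate.
    """
    ll = exp_loglik(speciation)
    if multi_rate:
        ll += sum(exp_loglik(lengths) for lengths in species)
    else:
        ll += exp_loglik([x for lengths in species for x in lengths])
    return ll

# Toy data (hypothetical): two species with very different diversity.
speciation = [0.3, 0.4]
species = [[0.001, 0.002, 0.001], [0.02, 0.03, 0.025]]

single = delimitation_loglik(speciation, species, multi_rate=False)
multi = delimitation_loglik(speciation, species, multi_rate=True)
print(single, multi)
```

With per-species rates the fitted log-likelihood can never be lower than with a pooled rate, which is why a real method has to weigh the extra parameters against the gain in fit rather than compare raw likelihoods.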

Subject(s)

Algorithms; Classification/methods; DNA Barcoding, Taxonomic/methods; Markov Chains; Monte Carlo Method; Animals; Electron Transport Complex IV/genetics; Genes, Mitochondrial; Phylogeny

Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations.

Kobert, K; Stamatakis, A; Flouri, T.

Syst Biol ; 66(2): 205-217, 2017 Mar 01.

Article in English | MEDLINE | ID: mdl-27576546

ABSTRACT

The phylogenetic likelihood function (PLF) is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given an alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run time and, with appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 12-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the PLF currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation. [Algorithms; maximum likelihood; phylogenetic likelihood function; phylogenetics].
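
The idea of repeating sites can be sketched as follows. This is a simplified illustration, not the authors' method or data structures: it assumes a rooted binary tree given as nested tuples and an alignment given as a dictionary of equal-length strings, and the name `repeat_classes` is hypothetical. Two sites belong to the same class at a node when the tip states below that node are identical, so only one conditional likelihood entry per class would need to be computed.

```python
def repeat_classes(node, alignment, stats):
    """Return one class id per site for `node`.

    node:      a tip name (str) or a pair (left, right) of subtrees.
    alignment: dict mapping tip name -> sequence string.
    stats:     list collecting (sites, distinct classes) per inner node.
    """
    if isinstance(node, str):
        # At a tip, the class of a site is determined by its character.
        keys = list(alignment[node])
    else:
        left, right = node
        # At an inner node, the class is determined by the pair of
        # child classes observed at that site.
        keys = list(zip(repeat_classes(left, alignment, stats),
                        repeat_classes(right, alignment, stats)))
    ids = {}
    classes = [ids.setdefault(k, len(ids)) for k in keys]
    if not isinstance(node, str):
        stats.append((len(keys), len(ids)))
    return classes

# Toy input (hypothetical): three taxa, six alignment sites.
alignment = {"A": "ACGTAC", "B": "ACGTAA", "C": "ACGAAC"}
tree = ("A", ("B", "C"))

stats = []
root_classes = repeat_classes(tree, alignment, stats)
for sites, distinct in stats:
    print(f"{sites} sites, {distinct} distinct entries to compute")
```

In this toy alignment each inner node has five distinct classes for six sites, so one entry per node could be reused instead of recomputed. On real alignments the savings depend on how many subtree patterns repeat.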

Subject(s)

Classification/methods; Models, Biological; Phylogeny; Algorithms; Evolution, Molecular; Likelihood Functions; Software

INSECT PHYLOGENOMICS. Response to Comment on "Phylogenomics resolves the timing and pattern of insect evolution".

Kjer, K M; Ware, J L; Rust, J; Wappler, T; Lanfear, R; Jermiin, L S; Zhou, X; Aspöck, H; Aspöck, U; Beutel, R G; Blanke, A; Donath, A; Flouri, T; Frandsen, P B; Kapli, P; Kawahara, A Y; Letsch, H; Mayer, C; McKenna, D D; Meusemann, K; Niehuis, O; Peters, R S; Wiegmann, B M; Yeates, D K; von Reumont, B M; Stamatakis, A; Misof, B.

Science ; 349(6247): 487, 2015 Jul 31.

Article in English | MEDLINE | ID: mdl-26228138

ABSTRACT

Tong et al. comment on the accuracy of the dating analysis presented in our work on the phylogeny of insects and provide a reanalysis of our data. They replace log-normal priors with uniform priors and add a "roachoid" fossil as a calibration point. Although the reanalysis provides an interesting alternative viewpoint, we maintain that our choices were appropriate.

Subject(s)

Insect Proteins/classification; Insecta/classification; Phylogeny; Animals

The phylogenetic likelihood library.

Flouri, T; Izquierdo-Carrasco, F; Darriba, D; Aberer, A J; Nguyen, L-T; Minh, B Q; Von Haeseler, A; Stamatakis, A.

Syst Biol ; 64(2): 356-362, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25358969

ABSTRACT

We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter and branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function, together with thorough documentation, provides a framework for rapid development of scalable parallel phylogenetic software. Using two likelihood-based phylogenetic codes as examples, we show that the PLL improves the sequential performance of current software by a factor of 2-10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa, the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL).

Subject(s)

Classification/methods; Phylogeny; Software; Algorithms; Libraries, Digital; Software/standards

An optimal algorithm for computing all subtree repeats in trees.

Flouri, T; Kobert, K; Pissis, S P; Stamatakis, A.

Philos Trans A Math Phys Eng Sci ; 372(2016): 20130140, 2014 May 28.

Article in English | MEDLINE | ID: mdl-24751873

ABSTRACT

Given a labelled tree T, our goal is to group repeating subtrees of T into equivalence classes with respect to their topologies and the node labels. We present an explicit, simple and time-optimal algorithm for solving this problem for unrooted unordered labelled trees and show that the running time of our method is linear with respect to the size of T. By unordered, we mean that the order of the adjacent nodes (children/neighbours) of any node of T is irrelevant. An unrooted tree T does not have a node that is designated as root and can also be referred to as an undirected tree. We show how the presented algorithm can easily be modified to operate on trees that do not satisfy some or any of the aforementioned assumptions on the tree structure; for instance, how it can be applied to rooted, ordered or unlabelled trees.
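
Grouping repeating subtrees into equivalence classes can be illustrated with a short sketch. It is a simplified stand-in, not the algorithm of the paper: it assumes a rooted, unordered, labelled tree given as nested `(label, children)` tuples, and it sorts child identifiers, so it does not achieve the linear running time described above. The name `assign_classes` is hypothetical.

```python
def assign_classes(node, table, out):
    """Assign an equivalence-class id to the subtree rooted at `node`.

    node:  (label, [children]) with children in the same format.
    table: dict mapping (label, sorted child class ids) -> class id.
    out:   list collecting (label, class id) in post-order.
    """
    label, children = node
    # Sorting the child ids makes the key independent of child order,
    # which is what "unordered" means here.
    child_ids = sorted(assign_classes(c, table, out) for c in children)
    key = (label, tuple(child_ids))
    class_id = table.setdefault(key, len(table))
    out.append((label, class_id))
    return class_id

# Toy tree (hypothetical): the two "a" subtrees differ only in child order.
tree = ("r", [
    ("a", [("b", []), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
    ("b", []),
])

table, out = {}, []
assign_classes(tree, table, out)
for label, class_id in out:
    print(label, class_id)
```

Both "a" subtrees receive the same class id because their labels and their multisets of child classes agree, and every "b" leaf shares one class. Handling unrooted trees, as in the paper, requires additional steps that this sketch omits.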
