Results 1 - 16 of 16
1.
Syst Biol ; 68(1): 157-167, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30329125

ABSTRACT

The test for model-to-data fitness is a fundamental principle within the statistical sciences. The purpose of such a test is to assess whether the selected best-fitting model adequately describes the behavior in the data. Despite their broad application across many areas of statistics, goodness of fit tests for phylogenetic models have received much less attention than model selection methods in the last decade. At present a number of approaches have been suggested. However, these are often flawed, with problems ranging from the presence of systematic error in the models themselves to the difficulties presented by the nature of phylogenetic data. Ultimately these problems lead to an inadequate choice of statistic. This is one of the main reasons why goodness of fit assessment is often a neglected step within phylogenetic analysis. We argue not only for the necessity of these goodness of fit measures to test how well the model reflects the data, but additionally for the need for "useful" tests that explain why the model-to-data fit may be inadequate. Such tests are a critical part of the model building process, allowing the model to be adapted to provide a better model-to-data fit or to reject a model class outright due to such an inadequate fit that the intended use of the class may be compromised. Proposed and existing methods in both the maximum likelihood and Bayesian framework will be discussed here, whilst highlighting their strengths and limitations for assessing goodness of fit. The final section discusses some critical open statistical problems in goodness of fit assessment for this field, with the hope of encouraging more research into such a fundamental yet underdeveloped area of phylogenetic inference. [Bayesian phylogenetics; Goodness of fit; maximum likelihood; molecular phylogenetics; outlier detection; residual diagnostics.].


Subject(s)
Classification/methods , Models, Biological , Phylogeny , Data Interpretation, Statistical
2.
Syst Biol ; 68(2): 219-233, 2019 03 01.
Article in English | MEDLINE | ID: mdl-29961836

ABSTRACT

Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable estimates. One of the major challenges in phylogenetics is the estimation of the marginal likelihood. This quantity is commonly used for comparing different evolutionary models, but its calculation, even for simple models, incurs high computational cost. Another interesting challenge relates to the estimation of the posterior distribution. Often, long Markov chains are required to get sufficient samples to carry out parameter inference, especially for tree distributions. In general, these problems are addressed separately by using different procedures. Nested sampling (NS) is a Bayesian computation algorithm, which provides the means to estimate marginal likelihoods together with their uncertainties, and to sample from the posterior distribution at no extra cost. The methods currently used in phylogenetics for marginal likelihood estimation lack practicality because they depend on many tuning parameters and because most implementations, unlike NS, provide no direct way to calculate the uncertainties associated with the estimates. In this article, we introduce NS to phylogenetics. Its performance is analysed under different scenarios and compared to established methods. We conclude that NS is a competitive and attractive algorithm for phylogenetic inference. An implementation is available as a package for BEAST 2 under the LGPL licence, accessible at https://github.com/BEAST2-Dev/nested-sampling.


Subject(s)
Classification/methods , Models, Genetic , Phylogeny , Algorithms
3.
Food Chem ; 208: 326-35, 2016 Oct 01.
Article in English | MEDLINE | ID: mdl-27132857

ABSTRACT

In the wine industry, fining agents are commonly used with many choices now commercially available. Here the influence of pre-fermentation fining on wine aroma chemistry has been explored. Free run and press fraction Sauvignon blanc juices from two vineyards were fined using gelatin, activated carbon, polyvinylpolypyrrolidone (PVPP) and a combination agent which included bentonite, PVPP and isinglass. Over thirty aroma compounds were quantified in the experimental wines. Results showed that activated carbon fining led to a significant (p<0.05) concentration decrease of hexan-1-ol and linalool in the experimental wines when compared to a control, consistent across all vineyard and fraction combinations. Other aroma compounds were also influenced by fining agent, although vineyards and press fractions played a crucial role. This study confirmed that fining agents used pre-fermentation can influence wine aroma profiles and that their use therefore needs to be tailored to the style and origin of the grapes.


Subject(s)
Fermentation , Flavoring Agents/chemistry , Food Handling/methods , Smell , Vitis/chemistry , Wine/analysis , Acyclic Monoterpenes , Hexanols/analysis , Monoterpenes/analysis
4.
Sci Rep ; 5: 14233, 2015 Sep 24.
Article in English | MEDLINE | ID: mdl-26400688

ABSTRACT

Many crops display differential geographic phenotypes and sensorial signatures, encapsulated by the concept of terroir. The drivers behind these differences remain elusive, and the potential contribution of microbes has been ignored until recently. Significant genetic differentiation between microbial communities and populations from different geographic locations has been demonstrated, but crucially it has not been shown whether this correlates with differential agricultural phenotypes or not. Using wine as a model system, we utilize the regionally genetically differentiated population of Saccharomyces cerevisiae in New Zealand and objectively demonstrate that these populations differentially affect wine phenotype, which is driven by a complex mix of chemicals. These findings reveal the importance of microbial populations for the regional identity of wine, and potentially extend to other important agricultural commodities. Moreover, this suggests that long-term implementation of methods maintaining differential biodiversity may have tangible economic imperatives as well as being desirable in terms of employing agricultural practices that increase responsible environmental stewardship.


Subject(s)
Biodiversity , Phenotype , Saccharomyces cerevisiae , Wine , Analysis of Variance , Fermentation , Genotype , New Zealand , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Volatile Organic Compounds
5.
Methods Ecol Evol ; 6(1): 83-91, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25893087

ABSTRACT

Phylogenetic diversity (PD) is a measure of biodiversity based on the evolutionary history of species. Here, we discuss several optimization problems related to the use of PD, and the more general measure split diversity (SD), in conservation prioritization. Depending on the conservation goal and the information available about species, one can construct optimization routines that incorporate various conservation constraints. We demonstrate how this information can be used to select sets of species for conservation action. Specifically, we discuss the use of species' geographic distributions, the choice of candidates under economic pressure, and the use of predator-prey interactions between the species in a community to define viability constraints. Despite such optimization problems falling into the area of NP-hard problems, it is possible to solve them in a reasonable amount of time using integer programming. We apply integer linear programming to a variety of models for conservation prioritization that incorporate the SD measure. We exemplarily show the results for two data sets: the Cape region of South Africa and a Caribbean coral reef community. Finally, we provide user-friendly software at http://www.cibiv.at/software/pda.

6.
PLoS One ; 9(1): e85196, 2014.
Article in English | MEDLINE | ID: mdl-24416362

ABSTRACT

Bayesian inference methods are extensively used to detect the presence of population structure given genetic data. The primary output of software implementing these methods is a set of ancestry profiles of sampled individuals. While these profiles robustly partition the data into subgroups, currently there is no objective method to determine whether the fixed factor of interest (e.g. geographic origin) correlates with inferred subgroups or not, and if so, which populations are driving this correlation. We present ObStruct, a novel tool to objectively analyse the nature of structure revealed in Bayesian ancestry profiles using established statistical methods. ObStruct evaluates the extent of structural similarity between sampled and inferred populations, tests the significance of population differentiation, provides information on the contribution of sampled and inferred populations to the observed structure and crucially determines whether the predetermined factor of interest correlates with inferred population structure. Analyses of simulated and experimental data highlight ObStruct's ability to objectively assess the nature of structure in populations. We show the method is capable of capturing an increase in the level of structure with increasing time since divergence between simulated populations. Further, we applied the method to a highly structured dataset of 1,484 humans from seven continents and a less structured dataset of 179 Saccharomyces cerevisiae from three regions in New Zealand. Our results show that ObStruct provides an objective metric to classify the degree, drivers and significance of inferred structure, as well as providing novel insights into the relationships between sampled populations, and adds a final step to the pipeline for population structure analyses.


Subject(s)
Models, Genetic , Population Dynamics/statistics & numerical data , Racial Groups/genetics , Software , Bayes Theorem , Genetic Variation , Humans , Microsatellite Repeats , New Zealand , Phylogeography , Racial Groups/classification , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics
7.
Algorithms Mol Biol ; 7(1): 36, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23241267

ABSTRACT

Recently one step mutation matrices were introduced to model the impact of substitutions on arbitrary branches of a phylogenetic tree on an alignment site. This concept works nicely for the four-state nucleotide alphabet and provides an efficient procedure conjectured to compute the minimal number of substitutions needed to transform one alignment site into another. The present paper delivers a proof of the validity of this algorithm. Moreover, we provide several mathematical insights into the generalization of the OSM matrix to multi-state alphabets. The construction of the OSM matrix is only possible if the matrices representing the substitution types acting on the character states and the identity matrix form a commutative group with respect to matrix multiplication. We illustrate this approach by looking at Abelian groups over twenty states and critically discuss their biological usefulness when investigating amino acids.
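The group condition stated in this abstract can be checked mechanically for a small case. The sketch below is illustrative only and is not taken from the article: it assumes that the substitution types on the four nucleotides are the three pairwise swaps familiar from Kimura's three-substitution-type model, written as permutation matrices, and the helper names (`perm_matrix`, `matmul`, `is_commutative_group`) are hypothetical. It tests closure, commutativity and the existence of inverses, and does not construct the OSM matrix itself.

```python
# Illustrative check of the "commutative group" condition for a set of
# substitution-type matrices. State order (an assumption): A=0, C=1, G=2, T=3.

def perm_matrix(perm):
    """Permutation matrix M with M[i][perm[i]] = 1."""
    n = len(perm)
    return tuple(tuple(1 if perm[i] == j else 0 for j in range(n))
                 for i in range(n))

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested tuples."""
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n))
                       for j in range(n)) for i in range(n))

def is_commutative_group(matrices):
    """True if the matrices form a commutative group under multiplication."""
    elems = set(matrices)
    identity = perm_matrix(list(range(len(matrices[0]))))
    if identity not in elems:
        return False
    closed = all(matmul(a, b) in elems for a in elems for b in elems)
    commutes = all(matmul(a, b) == matmul(b, a) for a in elems for b in elems)
    inverses = all(any(matmul(a, b) == identity for b in elems) for a in elems)
    return closed and commutes and inverses

# Assumed substitution types: identity, A<->G/C<->T, A<->C/G<->T, A<->T/C<->G.
kimura_types = [perm_matrix(p) for p in
                ([0, 1, 2, 3], [2, 3, 0, 1], [1, 0, 3, 2], [3, 2, 1, 0])]

# A set that fails: two single swaps plus the identity is not closed.
bad_types = [perm_matrix(p) for p in
             ([0, 1, 2, 3], [1, 0, 2, 3], [0, 2, 1, 3])]

print(is_commutative_group(kimura_types))  # True
print(is_commutative_group(bad_types))     # False
```

The same function can be applied to candidate sets of matrices over twenty states, although the check is brute force and says nothing about whether such a group is biologically meaningful.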

8.
Math Biosci ; 237(1-2): 38-48, 2012 May.
Article in English | MEDLINE | ID: mdl-22430560

ABSTRACT

Methods of phylogenetic inference use more and more complex models to generate trees from data. However, even simple models and their implications are not fully understood. Here, we investigate the two-state Markov model on a tripod tree, inferring conditions under which a given set of observations gives rise to such a model. This type of investigation has been undertaken before by several scientists from different fields of research. In contrast to other work we fully analyse the model, presenting conditions under which one can infer a model from the observation or at least get support for the tree-shaped interdependence of the leaves considered. We also present all conditions under which the results can be extended from tripod trees to quartet trees, a step necessary to reconstruct at least a topology. Apart from finding conditions under which such an extension works we discuss example cases for which such an extension does not work.


Subject(s)
Markov Chains , Models, Genetic , Phylogeny
9.
J Math Biol ; 64(1-2): 149-62, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21336622

ABSTRACT

We derive an invertible transform linking two widely used measures of species diversity: phylogenetic diversity and the expected proportions of segregating (non-constant) sites. We assume a bi-allelic (two-state), symmetric, finite site model of substitution. Like the Hadamard transform of Hendy and Penny, the transform can be expressed independently of the underlying phylogeny. Our results bridge work on diversity from two quite distinct scientific communities.


Subject(s)
Biodiversity , Models, Biological , Models, Statistical , Phylogeny
10.
Mol Biol Evol ; 28(1): 143-52, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20643866

ABSTRACT

As models of sequence evolution become more and more complicated, many criteria for model selection have been proposed, and tools are available to select the best model for an alignment under a particular criterion. However, in many instances the selected model fails to explain the data adequately as reflected by large deviations between observed pattern frequencies and the corresponding expectation. We present MISFITS, an approach to evaluate the goodness of fit (http://www.cibiv.at/software/misfits). MISFITS introduces a minimum number of "extra substitutions" on the inferred tree to provide a biologically motivated explanation why the alignment may deviate from expectation. These extra substitutions plus the evolutionary model then fully explain the alignment. We illustrate the method on several examples and then give a survey about the goodness of fit of the selected models to the alignments in the PANDIT database.


Subject(s)
Algorithms , Models, Genetic , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Animals , Base Sequence , DNA, Mitochondrial/analysis , DNA, Mitochondrial/genetics , Databases, Genetic , Evolution, Molecular , Humans , Likelihood Functions , Molecular Sequence Data , Phylogeny , Primates/genetics , Sequence Homology, Nucleic Acid
11.
Nature ; 462(7272): 505-9, 2009 Nov 26.
Article in English | MEDLINE | ID: mdl-19940926

ABSTRACT

Receptor-activator of NF-kappaB ligand (TNFSF11, also known as RANKL, OPGL, TRANCE and ODF) and its tumour necrosis factor (TNF)-family receptor RANK are essential regulators of bone remodelling, lymph node organogenesis and formation of a lactating mammary gland. RANKL and RANK are also expressed in the central nervous system. However, the functional relevance of RANKL/RANK in the brain was entirely unknown. Here we report that RANKL and RANK have an essential role in the brain. In both mice and rats, central RANKL injections trigger severe fever. Using tissue-specific Nestin-Cre and GFAP-Cre rank(floxed) deleter mice, the function of RANK in the fever response was genetically mapped to astrocytes. Importantly, Nestin-Cre and GFAP-Cre rank(floxed) deleter mice are resistant to lipopolysaccharide-induced fever as well as fever in response to the key inflammatory cytokines IL-1beta and TNFalpha. Mechanistically, RANKL activates brain regions involved in thermoregulation and induces fever via the COX2-PGE(2)/EP3R pathway. Moreover, female Nestin-Cre and GFAP-Cre rank(floxed) mice exhibit increased basal body temperatures, suggesting that RANKL and RANK control thermoregulation during normal female physiology. We also show that two children with RANK mutations exhibit impaired fever during pneumonia. These data identify an entirely novel and unexpected function for the key osteoclast differentiation factors RANKL/RANK in female thermoregulation and the central fever response in inflammation.


Subject(s)
Body Temperature Regulation/drug effects , Body Temperature Regulation/physiology , Fever/chemically induced , Fever/metabolism , RANK Ligand/pharmacology , Receptor Activator of Nuclear Factor-kappa B/metabolism , Sex Characteristics , Animals , Astrocytes/drug effects , Astrocytes/metabolism , Child , Dinoprostone/metabolism , Female , Fever/complications , Gene Expression Profiling , Humans , Injections, Intraventricular , Male , Mice , Mice, Inbred C57BL , Pneumonia/complications , Pneumonia/metabolism , RANK Ligand/administration & dosage , RANK Ligand/antagonists & inhibitors , RANK Ligand/metabolism , Rats , Rats, Wistar , Receptor Activator of Nuclear Factor-kappa B/genetics , Receptors, Prostaglandin E/metabolism , Receptors, Prostaglandin E, EP3 Subtype
12.
Article in English | MEDLINE | ID: mdl-19179696

ABSTRACT

In the last 15 years, Phylogenetic Diversity (PD) has gained interest in the community of conservation biologists as a surrogate measure for assessing biodiversity. We have recently proposed two approaches to select taxa for maximizing PD, namely PD with budget constraints and PD on split systems. In this paper, we will unify these two strategies and present a dynamic programming algorithm to solve the unified framework of selecting taxa with maximal PD under budget constraints on circular split systems. An improved algorithm will also be given if the underlying split system is a tree.


Subject(s)
Biodiversity , Computational Biology/methods , Conservation of Natural Resources , Phylogeny , Software , Algorithms , Animals , Conservation of Natural Resources/economics , Conservation of Natural Resources/methods , Extinction, Biological , Systems Integration
13.
Syst Biol ; 58(6): 586-94, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20525611

ABSTRACT

The "phylogenetic diversity" (PD) measure of biodiversity is evaluated using a phylogenetic tree, usually inferred from morphological or molecular data. Consequently, it is vulnerable to errors in that tree, including those resulting from sampling error, model misspecification, or conflicting signals. To improve the robustness of PD, we can evaluate the measure using either a collection (or distribution) of trees or a phylogenetic network. Recently, it has been shown that these 2 approaches are equivalent but that the problem of maximizing PD in the general concept is NP-hard. In this study, we provide an efficient dynamic programming algorithm for maximizing PD when splits in the trees or network form a circular split system. We illustrate our method using a case study of game birds ("Galliformes") and discuss the different choices of taxa based on our approach and PD.


Subject(s)
Algorithms , Biodiversity , Classification/methods , Computational Biology/methods , Models, Genetic , Phylogeny , Animals , Conservation of Natural Resources/methods , Galliformes/genetics , Research Design
14.
Philos Trans R Soc Lond B Biol Sci ; 363(1512): 4041-7, 2008 Dec 27.
Article in English | MEDLINE | ID: mdl-18852110

ABSTRACT

We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
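The two-step process described in this abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `rate` parameter (the Poisson mean) and the example branch names are hypothetical, and the effect of each substitution on an alignment column, which the article describes with the one-step mutation matrix, is not modelled.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def place_substitutions(branch_lengths, rate, rng):
    """Step 1: take the given branch lengths and rescale them to sum to one.
    Step 2: draw a Poisson number of substitutions and put each one on a
    branch with probability proportional to its relative length."""
    names = list(branch_lengths)
    total = sum(branch_lengths.values())
    weights = [branch_lengths[name] / total for name in names]
    n_subs = sample_poisson(rate, rng)
    counts = dict.fromkeys(names, 0)
    for name in rng.choices(names, weights=weights, k=n_subs):
        counts[name] += 1
    return counts

# Hypothetical tree with three branches; a seeded generator keeps runs repeatable.
rng = random.Random(1)
branches = {"b1": 0.2, "b2": 0.5, "b3": 0.3}
print(place_substitutions(branches, rate=4.0, rng=rng))
```

Knuth's method is adequate for small means; for large values of `rate` a library sampler would be a better choice.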


Subject(s)
Amino Acid Substitution/genetics , Evolution, Molecular , Models, Genetic , Phylogeny , Sequence Alignment , Bayes Theorem , Likelihood Functions , Probability
15.
J Comput Biol ; 15(6): 577-91, 2008.
Article in English | MEDLINE | ID: mdl-18631022

ABSTRACT

The geometrical representation of the space of phylogenetic trees implies a metric on the space of weighted trees. This metric, the geodesic distance, is the length of the shortest path through that space. We present an exact algorithm to compute this metric. For biologically reasonable trees, the implementation allows fast computations of the geodesic distance, although the running time of the algorithm is worst-case exponential. The algorithm was applied to pairs of 118 gene trees of the metazoa. The results show that a special path in tree space, the cone path, which can be computed in linear time, is a good approximation of the geodesic distance. The program GeoMeTree is a Python implementation of the geodesic distance and its approximations, and is available from www.cibiv.at/software/geometree.


Subject(s)
Algorithms , Phylogeny
16.
Syst Biol ; 55(5): 769-73, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17060198

ABSTRACT

We consider a (phylogenetic) tree with n labeled leaves, the taxa, and a length for each branch in the tree. For any subset of k taxa, the phylogenetic diversity is defined as the sum of the branch-lengths of the minimal subtree connecting the taxa in the subset. We introduce two time-efficient algorithms (greedy and pruning) to compute a subset of size k with maximal phylogenetic diversity in O(n log k) and O[n + (n-k) log (n-k)] time, respectively. The greedy algorithm is an efficient implementation of the so-called greedy strategy (Steel, 2005; Pardi and Goldman, 2005), whereas the pruning algorithm provides an alternative description of the same problem. Both algorithms compute within seconds a subtree with maximal phylogenetic diversity for trees with 100,000 taxa or more.
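The definition of phylogenetic diversity and the greedy strategy mentioned in this abstract can be illustrated with a deliberately naive sketch. It is not the O(n log k) implementation from the article: each candidate set is rescored from scratch, the selection starts from the most distant pair of taxa (an assumption about how the greedy strategy is seeded), and the function names and the five-branch example tree are hypothetical.

```python
from itertools import combinations

def pd(edges, subset):
    """Sum of branch lengths of the minimal subtree connecting `subset`.
    `edges` is a list of (node, node, length) triples describing a tree."""
    subset = set(subset)
    if len(subset) < 2:
        return 0.0
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    root = next(iter(subset))
    parent = {root: (None, 0.0)}
    order, stack = [root], [root]
    while stack:  # depth-first traversal; parents precede children in `order`
        node = stack.pop()
        for neighbour, w in adj[node]:
            if neighbour not in parent:
                parent[neighbour] = (node, w)
                order.append(neighbour)
                stack.append(neighbour)
    below = {node: (1 if node in subset else 0) for node in order}
    total = 0.0
    for node in reversed(order):  # children are visited before their parents
        par, w = parent[node]
        if par is None:
            continue
        # A branch is needed only if chosen taxa lie on both of its sides.
        if 0 < below[node] < len(subset):
            total += w
        below[par] += below[node]
    return total

def greedy_pd(edges, taxa, k):
    """Naive greedy choice of k >= 2 taxa: start from the most distant pair,
    then repeatedly add the taxon that increases PD the most."""
    taxa = sorted(taxa)
    chosen = set(max(combinations(taxa, 2), key=lambda pair: pd(edges, pair)))
    while len(chosen) < k:
        candidates = [t for t in taxa if t not in chosen]
        chosen.add(max(candidates, key=lambda t: pd(edges, chosen | {t})))
    return chosen

# Hypothetical tree: leaves A-D, internal nodes x and y.
edges = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 1.0),
         ("C", "y", 3.0), ("D", "y", 0.5)]
best = greedy_pd(edges, ["A", "B", "C", "D"], 3)
print(sorted(best), pd(edges, best))  # ['A', 'B', 'C'] 7.0
```

In the example the most distant pair is B and C (path length 6.0); adding A contributes 1.0, whereas adding D would contribute only 0.5.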


Subject(s)
Biodiversity , Phylogeny , Algorithms , Classification/methods , Computer Simulation/standards