Pesquisa | Portal Regional da BVS (teste)

tqDist: a library for computing the quartet and triplet distances between binary or general trees.

Sand, Andreas; Holt, Morten K; Johansen, Jens; Brodal, Gerth Stølting; Mailund, Thomas; Pedersen, Christian N S.

Bioinformatics ; 30(14): 2079-80, 2014 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-24651968

RESUMO

UNLABELLED: tqDist is a software package for computing the triplet and quartet distances between general rooted or unrooted trees, respectively. The program is based on algorithms with running time [Formula: see text] for the triplet distance calculation and [Formula: see text] for the quartet distance calculation, where n is the number of leaves in the trees and d is the degree of the tree with minimum degree. These are currently the fastest algorithms both in theory and in practice. AVAILABILITY AND IMPLEMENTATION: tqDist can be installed on Windows, Linux and Mac OS X. Doing this will install a set of command-line tools together with a Python module and an R package for scripting in Python or R. The software package is freely available under the GNU LGPL licence at http://birc.au.dk/software/tqDist.

Assuntos

Filogenia , Software , Algoritmos , Classificação/métodos

zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm.

Sand, Andreas; Kristiansen, Martin; Pedersen, Christian N S; Mailund, Thomas.

BMC Bioinformatics ; 14: 339, 2013 Nov 22.

Artigo em Inglês | MEDLINE | ID: mdl-24266924

RESUMO

BACKGROUND: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. RESULTS: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models.Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. CONCLUSIONS: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.

Assuntos

Cadeias de Markov , Biblioteca de Peptídeos , Software , Algoritmos , Animais , Simulação por Computador , Gorilla gorilla/genética , Humanos , Funções Verossimilhança , Estudos Observacionais como Assunto , Pan troglodytes/genética , Pongo/genética , Probabilidade , Fatores de Tempo

Hide and seek: placing and finding an optimal tree for thousands of homoplasy-rich sequences.

Radel, Dietrich; Sand, Andreas; Steel, Mike.

Mol Phylogenet Evol ; 69(3): 1186-9, 2013 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-23939134

RESUMO

Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from some search truly is. The problem would seem to be particularly acute when we have many taxa and when that data has high levels of homoplasy, in which the individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a short number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT) but show that it is able to locate the globally optimal tree correctly for 32,768 taxa, even though the characters in the dataset require, on average, 1148 state-changes each to fit on this tree, and the number of characters is only 57.

Assuntos

Biologia Computacional , Modelos Genéticos , Filogenia , Algoritmos

The standard lateral gene transfer model is statistically consistent for pectinate four-taxon trees.

Sand, Andreas; Steel, Mike.

J Theor Biol ; 335: 295-8, 2013 Oct 21.

Artigo em Inglês | MEDLINE | ID: mdl-23859822

RESUMO

Evolutionary events such as incomplete lineage sorting and lateral gene transfers constitute major problems for inferring species trees from gene trees, as they can sometimes lead to gene trees which conflict with the underlying species tree. One particularly simple and efficient way to infer species trees from gene trees under such conditions is to combine three-taxon analyses for several genes using a majority vote approach. For incomplete lineage sorting this method is known to be statistically consistent; however, for lateral gene transfers it was recently shown that a zone of inconsistency exists for a specific four-taxon tree topology, and it was posed as an open question whether inconsistencies could exist for other four-taxon tree topologies? In this letter we analyze all remaining four-taxon topologies and show that no other inconsistencies exist.

Assuntos

Evolução Molecular , Transferência Genética Horizontal/fisiologia , Modelos Genéticos , Filogenia

A practical O(n log2 n) time algorithm for computing the triplet distance on binary trees.

Sand, Andreas; Brodal, Gerth Stølting; Fagerberg, Rolf; Pedersen, Christian N S; Mailund, Thomas.

BMC Bioinformatics ; 14 Suppl 2: S18, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-23368759

RESUMO

The triplet distance is a distance measure that compares two rooted trees on the same set of leaves by enumerating all sub-sets of three leaves and counting how often the induced topologies of the tree are equal or different. We present an algorithm that computes the triplet distance between two rooted binary trees in time O (n log2 n). The algorithm is related to an algorithm for computing the quartet distance between two unrooted binary trees in time O (n log n). While the quartet distance algorithm has a very severe overhead in the asymptotic time complexity that makes it impractical compared to O (n2) time algorithms, we show through experiments that the triplet distance algorithm can be implemented to give a competitive wall-time running time.

Assuntos

Algoritmos , Biologia Computacional/métodos , Simulação por Computador , Filogenia , Software

Algorithms for computing the triplet and quartet distances for binary and general trees.

Sand, Andreas; Holt, Morten K; Johansen, Jens; Fagerberg, Rolf; Brodal, Gerth Stølting; Pedersen, Christian N S; Mailund, Thomas.

Biology (Basel) ; 2(4): 1189-209, 2013 Sep 26.

Artigo em Inglês | MEDLINE | ID: mdl-24833220

RESUMO

Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, respectively, are defined as the number of subsets of three or four leaves, respectively, where the topologies of the induced subtrees differ. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to time complexities at least of the order n3 or n4 just for enumerating the sets. The different topologies can be counte dimplicitly, however, and in this paper, we review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this; one based on dynamic programming and another based oncoloring leaves in one tree and updating a hierarchical decomposition of the other.

Algorithms for hidden markov models restricted to occurrences of regular expressions.

Tataru, Paula; Sand, Andreas; Hobolth, Asger; Mailund, Thomas; Pedersen, Christian N S.

Biology (Basel) ; 2(4): 1282-95, 2013 Nov 08.

Artigo em Inglês | MEDLINE | ID: mdl-24833225

RESUMO

Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA