Search | VHL Regional Portal

Efficient algorithms for knowledge-enhanced supertree and supermatrix phylogenetic problems.

Wehe, André; Burleigh, J Gordon; Eulenstein, Oliver.

IEEE/ACM Trans Comput Biol Bioinform ; 10(6): 1432-41, 2013.

Article in English | MEDLINE | ID: mdl-24407302

ABSTRACT

Phylogenetic inference is a computationally difficult problem, and constructing high-quality phylogenies that can build upon existing phylogenetic knowledge and synthesize insights from new data remains a major challenge. We introduce knowledge-enhanced phylogenetic problems for both supertree and supermatrix phylogenetic analyses. These problems seek an optimal phylogenetic tree that can only be assembled from a user-supplied set of possibly incompatible phylogenetic relationships. We describe exact polynomial time algorithms for the knowledge-enhanced versions of the NP-hard Robinson-Foulds, gene duplication, duplication and loss, and deep coalescence supertree problems. Further, we demonstrate that our algorithms can rapidly improve upon results of local search heuristics for these problems. Finally, we introduce a knowledge-enhanced search heuristic that can be applied to any discrete character data set using the maximum parsimony (MP) phylogenetic problem. Although this approach is not guaranteed to find exact solutions, we show that it can also improve upon solutions from commonly used MP heuristics.
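To make the Robinson-Foulds criterion named in this abstract concrete, the following is a minimal sketch of the pairwise distance that, roughly speaking, the Robinson-Foulds supertree problem totals over the input trees. It is not the authors' algorithm. It assumes trees are written as nested tuples of leaf names (a representation chosen here for illustration) and uses the rooted, clade-based variant of the distance.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root clade is shared by every tree on the same leaf set, so drop it.
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical four-taxon example.
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(rf_distance(t1, t2))
```

The two example trees share the clade {A, B, C} and disagree on one cherry each, so only those two clades contribute to the distance.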

Subjects

Algorithms; Computational Biology/methods; Phylogeny; Artificial Intelligence; Cluster Analysis; Evolution, Molecular; Gene Duplication; Software

Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees.

Burleigh, J Gordon; Bansal, Mukul S; Eulenstein, Oliver; Hartmann, Stefanie; Wehe, André; Vision, Todd J.

Syst Biol ; 60(2): 117-25, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21186249

ABSTRACT

Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run-time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
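The GTP criterion described in this abstract can be illustrated with a small sketch that counts duplications by mapping each gene-tree node to the smallest species-tree clade containing its species (the usual least-common-ancestor mapping). This is a deliberately naive illustration, not the software used in the study. It assumes trees are nested tuples, that each gene-tree leaf is labelled with the species it was sampled from, and that all names are hypothetical.

```python
def leaf_set(node):
    """Set of leaf labels below a node (a str leaf or a tuple of subtrees)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def species_clades(species_tree):
    """All clades of the species tree, including single leaves and the root."""
    out = []

    def walk(node):
        out.append(leaf_set(node))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(species_tree)
    return out


def duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species clade as a child."""
    clade_list = species_clades(species_tree)
    count = 0

    def visit(gene_node):
        nonlocal count
        species = leaf_set(gene_node)
        # Smallest species clade containing every species below this node.
        mapped = min((c for c in clade_list if species <= c), key=len)
        if not isinstance(gene_node, str):
            if mapped in [visit(child) for child in gene_node]:
                count += 1  # parent and child share a mapping: a duplication
        return mapped

    visit(gene_tree)
    return count


def gtp_cost(gene_trees, species_tree):
    """Total duplications needed to reconcile all gene trees with one species tree."""
    return sum(duplications(g, species_tree) for g in gene_trees)


# Hypothetical input: three gene trees and three candidate species trees.
genes = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
best = min(candidates, key=lambda s: gtp_cost(genes, s))
print(best, gtp_cost(genes, best))
```

Scoring every candidate exhaustively, as done here, is only feasible for toy inputs; for realistic numbers of taxa the space of species trees has to be searched heuristically.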

Subjects

Classification/methods; Phylogeny; Plants/classification; Plants/genetics; Algorithms; Expressed Sequence Tags; Genomics

iGTP: a software package for large-scale gene tree parsimony analysis.

Chaudhary, Ruchi; Bansal, Mukul S; Wehe, André; Fernández-Baca, David; Eulenstein, Oliver.

BMC Bioinformatics ; 11: 574, 2010 Nov 23.

Article in English | MEDLINE | ID: mdl-21092314

ABSTRACT

BACKGROUND: The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.

RESULTS: We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results.

CONCLUSIONS: iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.

Subjects

Genomics/methods; Phylogeny; Software; Algorithms; Databases, Genetic; Evolution, Molecular; Gene Duplication; Genome

The gene-duplication problem: near-linear time algorithms for NNI-based local searches.

Bansal, Mukul S; Eulenstein, Oliver; Wehe, André.

IEEE/ACM Trans Comput Biol Bioinform ; 6(2): 221-31, 2009.

Article in English | MEDLINE | ID: mdl-19407347

ABSTRACT

The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the NNI search problem, which is based on the nearest neighbor interchange operation. In this work, we 1) provide a novel near-linear time algorithm for the NNI search problem, 2) introduce extensions that significantly enlarge the search space of the NNI search problem, and 3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the NNI search problem. The exceptional speedup achieved in the extended NNI search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large randomly generated gene trees.
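As a rough illustration of the NNI search problem described in this abstract, the sketch below enumerates every rooted binary tree that is one nearest neighbor interchange away from a given tree and then picks the cheapest one. It is the naive approach that rescans every neighbor, not the near-linear time algorithm of the paper. Trees are assumed to be nested 2-tuples of leaf names, and `toy_cost` is a hypothetical stand-in for a real reconciliation cost.

```python
def nni_neighbors(tree):
    """Yield every rooted binary tree one NNI move away from `tree`.

    A tree is a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return
    left, right = tree
    # Moves across the edge to an internal left child: swap the sibling
    # (right) with one of the two grandchildren.
    if not isinstance(left, str):
        a, b = left
        yield ((right, b), a)
        yield ((a, right), b)
    # The same two moves across the edge to an internal right child.
    if not isinstance(right, str):
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    # Moves that happen entirely inside one of the subtrees.
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def has_cherry(tree, x, y):
    """True if leaves x and y are siblings somewhere in the tree."""
    if isinstance(tree, str):
        return False
    if set(tree) == {x, y}:
        return True
    return any(has_cherry(child, x, y) for child in tree)


def toy_cost(tree):
    """Hypothetical cost: 0 if A and C are siblings, otherwise 1."""
    return 0 if has_cherry(tree, "A", "C") else 1


start = (("A", "B"), "C")
best_neighbor = min(nni_neighbors(start), key=toy_cost)
print(best_neighbor)
```

A rooted binary tree with n leaves has n - 2 internal edges and two moves per edge, so a heuristic built on this enumeration scores 2(n - 2) trees at every step, which is the per-step work that faster NNI search algorithms aim to reduce.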

Subjects

Algorithms; Gene Duplication; Models, Genetic; Animals; Computational Biology; Phylogeny; Sequence Analysis, DNA

The PhyLoTA Browser: processing GenBank for molecular phylogenetics research.

Sanderson, Michael J; Boss, Darren; Chen, Duhong; Cranston, Karen A; Wehe, Andre.

Syst Biol ; 57(3): 335-46, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18570030

ABSTRACT

As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.
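The "data availability matrix" mentioned in this abstract can be pictured with a short sketch that tabulates how many sequences each subtaxon contributes to each cluster. This is only an illustration of the idea, not the PhyLoTA pipeline. The input format (one `(taxon, cluster)` pair per sequence) and all taxon and cluster names are assumptions made for the example.

```python
from collections import Counter


def availability_matrix(records):
    """Build a subtaxa-by-clusters table of sequence counts.

    `records` is a list of (taxon, cluster_id) pairs, one per sequence.
    Returns the sorted row labels, sorted column labels, and the count matrix.
    """
    counts = Counter(records)
    taxa = sorted({taxon for taxon, _ in records})
    clusters = sorted({cluster for _, cluster in records})
    matrix = [[counts[(taxon, cluster)] for cluster in clusters] for taxon in taxa]
    return taxa, clusters, matrix


# Hypothetical records: five sequences, three taxa, two clusters.
records = [
    ("Quercus", "c1"),
    ("Quercus", "c1"),
    ("Quercus", "c2"),
    ("Fagus", "c2"),
    ("Castanea", "c1"),
]
taxa, clusters, matrix = availability_matrix(records)
for taxon, row in zip(taxa, matrix):
    print(taxon, row)
```

Rows with many non-zero cells mark taxa that could anchor a multigene data set, while columns with many non-zero cells mark clusters that are widely sampled across the group.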

Subjects

Databases, Nucleic Acid; Phylogeny; Software; Cluster Analysis; Computational Biology/methods; Internet

DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver.

Bioinformatics ; 24(13): 1540-1, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18474508

ABSTRACT

DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. AVAILABILITY: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an online manual are available at http://genome.cs.iastate.edu/CBL/DupTree

Subjects

Algorithms; Chromosome Mapping/methods; DNA Mutational Analysis/methods; Evolution, Molecular; Models, Genetic; Sequence Analysis, DNA/methods; Software; Base Sequence; Computer Simulation; Molecular Sequence Data; Phylogeny
