Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 2 de 2
Filter
Add more filters










Database
Language
Publication year range
1.
Bioinformatics ; 30(16): 2280-7, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24764459

ABSTRACT

MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.


Subject(s)
Phylogeny , Algorithms , Apicomplexa/genetics , Epichloe/genetics , Gene Transfer, Horizontal , Genes , Sequence Alignment , Software , Statistics, Nonparametric
2.
PLoS Comput Biol ; 2(6): e73, 2006 Jun 23.
Article in English | MEDLINE | ID: mdl-16789815

ABSTRACT

The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.


Subject(s)
Algorithms , Chromosome Mapping/methods , Drosophila/genetics , Genome, Insect/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Animals , Base Sequence , Conserved Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid
SELECTION OF CITATIONS
SEARCH DETAIL
...