1.
Bull Math Biol ; 86(8): 99, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954147

ABSTRACT

Classification of gene trees is an important task both in the analysis of multi-locus phylogenetic data and in assessing the convergence of Markov Chain Monte Carlo (MCMC) analyses used in Bayesian phylogenetic tree reconstruction. The logistic regression model is one of the most popular classification models in statistical learning, thanks to its computational speed and interpretability. However, it is not appropriate to directly apply the standard logistic regression model to a set of phylogenetic trees, as the space of phylogenetic trees is non-Euclidean and thus contradicts the standard assumptions on covariates. It is well-known in tropical geometry and phylogenetics that the space of phylogenetic trees is a tropical linear space in terms of the max-plus algebra. Therefore, in this paper, we propose an analogue of the logistic regression model in the setting of tropical geometry. Our proposed method outperforms classical logistic regression in terms of Area under the ROC Curve in numerical examples, including with data generated by the multi-species coalescent model. Theoretical properties such as statistical consistency have been proved and generalization error rates have been derived. Finally, our classification algorithm is proposed as an MCMC convergence criterion for MrBayes. Unlike the convergence metric used by MrBayes, which depends only on tree topologies, our method is sensitive to branch lengths and therefore provides a more robust metric for convergence. In a test case, it is illustrated that the tropical logistic regression can differentiate between two independently run MCMC chains, even when the standard metric cannot.


Subjects
Algorithms , Bayes Theorem , Markov Chains , Mathematical Concepts , Models, Genetic , Monte Carlo Method , Phylogeny , Logistic Models , ROC Curve , Computer Simulation
2.
Article in English | MEDLINE | ID: mdl-38941208

ABSTRACT

Much evidence from biological theory and empirical data indicates that gene trees, that is, phylogenetic trees reconstructed from different genes (loci), need not have exactly the same tree topologies. Such incongruence between gene trees might be caused by some "unusual" evolutionary events, such as meiotic sexual recombination in eukaryotes or horizontal transfers of genetic material in prokaryotes. However, most of the gene trees are constrained by the tree topology of the underlying species tree, that is, the phylogenetic tree depicting the evolutionary history of the set of species under consideration. In order to discover "outlying" gene trees which do not follow the "main distribution(s)" of trees, we propose to apply the "tropical metric," a well-defined metric over the space of phylogenetic trees under the max-plus algebra from tropical geometry, to non-parametric estimation of the distribution of gene trees over the tree space. The kernel density estimator (KDE) is one of the most popular non-parametric estimators of a distribution from a given sample, and we propose an analogue of the classical KDE in the setting of tropical geometry with the tropical metric, which measures the length of an intrinsic geodesic between trees over the tree space. We estimate the probability of an observed tree by empirical frequencies of nearby trees, with the level of influence determined by the tropical metric. Then, with simulated data generated from the multispecies coalescent model, we show that the non-parametric estimation of the gene tree distribution using the tropical metric performs better, in terms of computational time and accuracy, than the one using the Billera-Holmes-Vogtmann (BHV) metric developed by Weyenberg et al. We then apply it to Apicomplexa data.
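
The idea of scoring each tree by the closeness of its neighbours under the tropical metric can be sketched as follows. This is only a minimal illustration, not the authors' estimator: it assumes each tree is encoded as a vector of pairwise leaf distances, uses the standard tropical metric d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), and the Gaussian-type kernel, the `bandwidth` parameter, and the function names are all hypothetical choices made for the example.

```python
import math

def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

def tropical_kde_scores(trees, bandwidth=1.0):
    """Leave-one-out, unnormalized kernel score for each tree.

    The kernel exp(-(d / bandwidth)^2) is an assumption made for this
    sketch; a low score means the tree has few close neighbours.
    """
    scores = []
    for i, t in enumerate(trees):
        total = 0.0
        for j, s in enumerate(trees):
            if i != j:
                d = tropical_distance(t, s)
                total += math.exp(-((d / bandwidth) ** 2))
        scores.append(total / (len(trees) - 1))
    return scores

# Toy data: three-leaf trees encoded as (d12, d13, d23).
trees = [
    [2.0, 4.0, 4.0],
    [2.2, 4.0, 4.0],
    [1.8, 4.0, 4.0],
    [4.0, 4.0, 1.0],  # a tree with a different topology
]
scores = tropical_kde_scores(trees)
# Index of the tree with the lowest score, i.e. the candidate outlier.
outlier = min(range(len(trees)), key=scores.__getitem__)
```

In this toy sample the fourth tree lies far from the other three under the tropical metric, so it receives the lowest score. A real analysis would need a properly normalized kernel and a data-driven bandwidth.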

3.
Shoulder Elbow ; 16(3): 321-329, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38818100

ABSTRACT

Background: The detailed complexity of the triceps brachii insertional footprint continues to challenge surgeons, as evidenced by continued reports of triceps-associated complications following elbow procedures. The purpose of this study is to describe the three-dimensional footprint of the triceps brachii at its olecranon insertion at the elbow. Methods: Twenty-two cadaveric elbows were dissected leaving only the distal insertion of the triceps intact. The insertion was defined and probed with a three-dimensional digitizer to create a digital three-dimensional footprint allowing width, height, and surface area of the footprint to be recorded relative to the bare area. The insertional soft tissues of tendon versus muscle along with the shape of the footprints were qualitatively described. Results: The mean width and surface area of the lateral segment were greater in males than in females (30.07 mm vs. 24.37 mm, p = 0.0339 and 282.1 mm² vs. 211.56 mm², p = 0.0181, respectively). No other statistically significant differences between the sexes were noted. The triceps insertional footprint was "crescent-shaped" and consisted of three regions: central tendon, medial muscular extension, and lateral muscular extension. Discussion: These findings can help explain the importance of avoiding these muscular structures during triceps-off approaches and provide the framework for future clinical studies. Clinical Relevance: Basic Science, anatomy study, cadaver dissection.

4.
Neural Netw ; 157: 77-89, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36334541

ABSTRACT

Support Vector Machines (SVMs) are among the most popular supervised learning models; they classify data points using a hyperplane in a Euclidean space. Similar to SVMs, tropical SVMs classify data points using a tropical hyperplane under the tropical metric with the max-plus algebra. In this paper, first we show generalization error bounds of tropical SVMs over the tropical projective torus. While the generalization error bounds attained via Vapnik-Chervonenkis (VC) dimensions in a distribution-free manner still depend on the dimension, we also show numerically and theoretically by extreme value statistics that the tropical SVMs for classifying data points from two Gaussian distributions as well as empirical data sets of different neuron types are fairly robust against the curse of dimensionality. Extreme value statistics also underlie the anomalous scaling behaviors of the tropical distance between random vectors with additional noise dimensions. Finally, we define tropical SVMs over a function space with the tropical metric.


Subjects
Support Vector Machine , Normal Distribution , Forecasting
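
To make "classifying with a tropical hyperplane" concrete, here is a minimal sketch under the max-plus convention. A tropical hyperplane with normal vector omega is the set of points x where max_i(omega_i + x_i) is attained at least twice; off the hyperplane, the index attaining the maximum identifies the sector containing x. The function names and the sector-to-label mapping below are hypothetical and are not taken from the paper, which also fits omega by optimization rather than fixing it by hand.

```python
def sector_indices(x, omega, tol=1e-12):
    """Indices i at which omega_i + x_i attains its maximum (within tol)."""
    vals = [w + xi for w, xi in zip(omega, x)]
    top = max(vals)
    return [i for i, v in enumerate(vals) if top - v <= tol]

def on_hyperplane(x, omega, tol=1e-12):
    """True if the maximum is attained at least twice."""
    return len(sector_indices(x, omega, tol)) >= 2

def classify(x, omega, sector_labels):
    """Label of the first maximal sector; sector_labels is a hypothetical
    assignment of a class label to each coordinate index."""
    return sector_labels[sector_indices(x, omega)[0]]

omega = [0.0, 0.0, 0.0]
labels = {0: "A", 1: "B", 2: "B"}   # illustrative assignment only
label = classify([3.0, 0.0, 0.0], omega, labels)
```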
5.
Bioinformatics ; 36(17): 4590-4598, 2020 Nov 01.
Article in English | MEDLINE | ID: mdl-32516398

ABSTRACT

MOTIVATION: Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of phylogenetic trees is not Euclidean, so ordinary machine learning methods cannot be directly applied. In 2019, Yoshida et al. introduced the notion of tropical principal component analysis (PCA), a statistical method for visualization and dimensionality reduction using a tropical polytope with a fixed number of vertices that minimizes the sum of tropical distances between each data point and its tropical projection. However, their work focused on the tropical projective space rather than the space of phylogenetic trees. We focus here on tropical PCA for dimension reduction and visualization over the space of phylogenetic trees. RESULTS: Our main results are 2-fold: (i) theoretical interpretations of the tropical principal components over the space of phylogenetic trees, namely, the existence of a tropical cell decomposition into regions of fixed tree topology; and (ii) the development of a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov Chain Monte Carlo approach. This method performs well with simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes as well as sequences of hemagglutinin for influenza from New York. AVAILABILITY AND IMPLEMENTATION: Dataset: http://polytopes.net/Data.tar.gz. Code: http://polytopes.net/tropica_MCMC_codes.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Markov Chains , Monte Carlo Method , Phylogeny , Principal Component Analysis
6.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1222-1230, 2020.
Article in English | MEDLINE | ID: mdl-30507538

ABSTRACT

Advances in modern genomics have allowed researchers to apply phylogenetic analyses on a genome-wide scale. While large volumes of genomic data can be generated cheaply and quickly, data missingness is a non-trivial and somewhat expected problem. Since the available information is often incomplete for a given set of genetic loci and individual organisms, a large proportion of trees that depict the evolutionary history of a single genetic locus, called gene trees, fail to contain all individuals. Data incompleteness causes difficulties in data collection, information extraction, and gene tree inference. Furthermore, identifying outlying gene trees, which can represent horizontal gene transfers, gene duplications, or hybridizations, is difficult when data is missing from the gene trees. The typical approach is to remove all individuals with missing data from the gene trees, and focus the analysis on individuals whose information is fully available - a huge loss of information. In this work, we propose and design an optimization-based imputation approach to infer the missing distances between leaves in a set of gene trees via a mixed integer non-linear programming model. We also present a new research pipeline, imPhy, that can (i) simulate a set of gene trees with leaves randomly missing in each tree, (ii) impute the missing pairwise distances in each gene tree, (iii) reconstruct the gene trees using the Neighbor Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods, and (iv) analyze and report the efficiency of the reconstruction. To impute the missing leaves, we employ our newly proposed non-linear programming framework, and demonstrate its capability in reconstructing gene trees with incomplete information in both simulated and empirical datasets. 
In the empirical datasets Apicomplexa and lungfish, our imputation has very small normalized mean square errors, even in the extreme case where 50 percent of the individuals in each gene tree are missing. Data, software, and user manuals can be found at https://github.com/yasuiniko/imPhy.


Subjects
Computational Biology/methods , Evolution, Molecular , Phylogeny , Software , Algorithms , Animals , Databases, Genetic , Gene Transfer, Horizontal/genetics , Models, Genetic , Nonlinear Dynamics
7.
Bull Math Biol ; 81(2): 568-597, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30206809

ABSTRACT

Principal component analysis is a widely used method for the dimensionality reduction of a given data set in a high-dimensional Euclidean space. Here we define and analyze two analogues of principal component analysis in the setting of tropical geometry. In one approach, we study the Stiefel tropical linear space of fixed dimension closest to the data points in the tropical projective torus; in the other approach, we consider the tropical polytope with a fixed number of vertices closest to the data points. We then give approximative algorithms for both approaches and apply them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.


Subjects
Phylogeny , Principal Component Analysis , Algorithms , Apicomplexa/classification , Apicomplexa/genetics , Computational Biology , Computer Heuristics , Computer Simulation , Databases, Genetic/statistics & numerical data , Genome, Protozoan , Linear Models , Mathematical Concepts , Models, Genetic , Models, Statistical
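
For the second approach in this abstract (a tropical polytope closest to the data), the quantity being minimized is the sum of tropical distances between each point and its projection onto the polytope. The sketch below assumes the max-plus convention and the usual closed form for the projection onto the tropical convex hull of vertices v_1, ..., v_s: lambda_l = min_i(x_i - v_{l,i}) and pi(x)_i = max_l(lambda_l + v_{l,i}). The function names are hypothetical, and the search over vertex sets, which is the hard part addressed by the paper's algorithms, is not shown.

```python
def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

def tropical_project(x, vertices):
    """Project x onto the max-plus tropical convex hull of `vertices`."""
    proj = None
    for v in vertices:
        # Largest shift of v that stays coordinate-wise below x.
        lam = min(xi - vi for xi, vi in zip(x, v))
        shifted = [lam + vi for vi in v]
        if proj is None:
            proj = shifted
        else:
            proj = [max(a, b) for a, b in zip(proj, shifted)]
    return proj

def projection_cost(points, vertices):
    """Sum of tropical distances from each point to its projection."""
    return sum(tropical_distance(p, tropical_project(p, vertices))
               for p in points)

vertices = [[0, 0, 0], [0, 2, 1]]
x = [0, 1, 3]
px = tropical_project(x, vertices)
```

Projecting a point that already lies in the polytope returns the same point, which gives a quick sanity check for an implementation.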
8.
Article in English | MEDLINE | ID: mdl-30387738

ABSTRACT

Evolutionary hypotheses provide important underpinnings of biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species due to meiotic sexual recombination in eukaryotes, or horizontal transfers of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional "tree space". Unusual evolutionary histories or effects of selection may result in "outlier" genes with phylogenies that fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model, and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the "outlier" gene phylogenies. Also, we applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae, and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set on Clavicipitaceae and simulated data set are available at Mendeley https://data.mendeley.com/datasets/mrxts7wjrr/1.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1359-1365, 2017.
Article in English | MEDLINE | ID: mdl-28113725

ABSTRACT

As costs of genome sequencing have dropped precipitously, development of efficient bioinformatic methods to analyze genome structure and evolution has become ever more urgent. For example, most published phylogenomic studies involve either massive concatenation of sequences, or informal comparisons of phylogenies inferred on a small subset of orthologous genes, neither of which provides a comprehensive overview of evolution or systematic identification of genes with unusual and interesting evolution (e.g., horizontal gene transfers, gene duplication, and subsequent neofunctionalization). We are interested in identifying such "outlying" gene trees from the set of gene trees and estimating the distribution of trees over the "tree space". This paper describes an improvement to the kdetrees algorithm, an adaptation of classical kernel density estimation to the metric space of phylogenetic trees (Billera-Holmes-Vogtmann treespace), whereby the kernel normalizing constants are estimated through the use of the novel holonomic gradient methods. As in the original kdetrees paper, we have applied kdetrees to a set of Apicomplexa genes. The analysis identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. The updated version of the kdetrees software package is available both from CRAN (the official R package system) and from the official development repository on GitHub (github.com/grady/kdetrees).


Subjects
Algorithms , Genome/genetics , Genomics/methods , Phylogeny , Computer Simulation , Sequence Alignment
10.
Biometrika ; 104(4): 901-922, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29422694

ABSTRACT

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the [Formula: see text]th principal component in Euclidean space: the locus of the weighted Fréchet mean of [Formula: see text] vertex trees when the weights vary over the [Formula: see text]-simplex. We establish some basic properties of these objects, in particular showing that they have dimension [Formula: see text], and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.

11.
Mol Biol Evol ; 33(6): 1618-24, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26929244

ABSTRACT

At the present time it is often stated that the maximum likelihood or the Bayesian method of phylogenetic construction is more accurate than the neighbor joining (NJ) method. Our computer simulations, however, have shown that the converse is true if we use p distance in the NJ procedure and the criterion of obtaining the true tree (Pc expressed as a percentage) or the combined quantity (c) of a value of Pc and a value of Robinson-Foulds' average topological error index (dT). This c is given by Pc (1 - dT/dTmax) = Pc (m - 3 - dT/2)/(m - 3), where m is the number of taxa used and dTmax is the maximum possible value of dT, which is given by 2(m - 3). This neighbor joining method with p distance (NJp method) will be shown generally to give the best data-fit model. This c takes a value between 0 and 1, and a tree-making method giving a high value of c is considered to be good. Our computer simulations have shown that the NJp method generally gives a better performance than the other methods and therefore this method should be used in general whether the gene is compositional or it contains the mosaic DNA regions or not.


Subjects
Computational Biology/methods , Models, Genetic , Phylogeny , Sequence Analysis, DNA/methods , Algorithms , Bayes Theorem , Computer Simulation , DNA/genetics , Likelihood Functions , Probability
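
The combined quantity c in this abstract can be evaluated directly from its definition, c = Pc (1 - dT/dTmax) with dTmax = 2(m - 3). The short sketch below does that and checks it against the second form given in the abstract, Pc (m - 3 - dT/2)/(m - 3). It assumes Pc is supplied as a proportion between 0 and 1 (so that c also falls between 0 and 1); the function name and the sample values are hypothetical and are not results from the paper.

```python
def combined_index(pc, d_t, m):
    """c = Pc * (1 - dT / dTmax), where dTmax = 2 * (m - 3).

    pc  : proportion of replicates recovering the true tree (0..1, assumed)
    d_t : average Robinson-Foulds topological error index
    m   : number of taxa (must exceed 3)
    """
    if m <= 3:
        raise ValueError("m must be greater than 3")
    d_t_max = 2 * (m - 3)
    return pc * (1 - d_t / d_t_max)

# Illustrative values only.
c = combined_index(pc=0.8, d_t=2.0, m=8)
# Second form from the abstract, for comparison.
c_alt = 0.8 * (8 - 3 - 2.0 / 2) / (8 - 3)
```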
12.
Bioinformatics ; 30(16): 2280-7, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24764459

ABSTRACT

MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.


Subjects
Phylogeny , Algorithms , Apicomplexa/genetics , Epichloe/genetics , Gene Transfer, Horizontal , Genes , Sequence Alignment , Software , Statistics, Nonparametric
13.
BMC Bioinformatics ; 13: 210, 2012 Aug 21.
Article in English | MEDLINE | ID: mdl-22909268

ABSTRACT

BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. 
The software GeneOut is freely available under the GNU public license.


Subjects
Phylogeny , Sequence Analysis, DNA/methods , Software , Support Vector Machine , Base Sequence , Gene Duplication , Gene Transfer, Horizontal , Genes
14.
ISRN Orthop ; 2012: 256239, 2012.
Article in English | MEDLINE | ID: mdl-24977074

ABSTRACT

Twenty-one patients who underwent bone screw removal from the elbow were retrospectively reviewed with respect to the type of metal, duration of implantation, and the location of the screws about the elbow. Screw failure during extraction was the dependent variable. Five of 21 patients experienced hardware failure during extraction. Fourteen patients had titanium alloy implants. In four cases, titanium screws broke during extraction. Compared to stainless steel, the difference in titanium screw failure during removal was not statistically significant (P = 0.61). Screw removal 12 months after surgery was more likely to result in broken, retained screws in general (P = 0.046) and specifically for titanium alloy (P = 0.003). The chance of fracturing did not differ significantly between bone screws removed from the distal humerus and those removed from the proximal ulna (P = 0.28). There appears to be a time-related association with titanium alloy bone screw failure during hardware removal from the elbow. This may be explained by titanium's properties and osseointegration.

15.
Bull Math Biol ; 73(11): 2627-48, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21373975

ABSTRACT

Balanced minimum evolution (BME) is a statistically consistent distance-based method to reconstruct a phylogenetic tree from an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional over the BME polytope, the convex hull of the BME vectors obtained from Pauplin's formula applied to all binary trees. The BME method is related to the Neighbor Joining (NJ) Algorithm, now known to be a greedy optimization of the BME principle. Further, the NJ and BME algorithms have been studied previously to understand when the NJ Algorithm returns a BME tree for small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and strengthen knowledge of the connection between the BME method and NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parameterized by disjoint clades. We show that these clade-faces are smaller dimensional BME polytopes themselves. Finally, we show that for any order of joining nodes to form a tree, there exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly, we show that the BME cone and every NJ cone associated to a tree T have an intersection of positive measure.


Subjects
Biological Evolution , Algorithms , Animals , Mathematical Concepts , Models, Genetic , Phylogeny
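
The linear functional behind the BME method can be illustrated with Pauplin's formula in its common form: the balanced length of a binary tree T for a dissimilarity map d is the sum over leaf pairs of 2^(1 - p_ij) d_ij, where p_ij is the number of edges on the path between leaves i and j in T. The sketch below is only a worked example for four taxa; the path lengths are entered by hand rather than computed from a tree structure, and the function name and the edge lengths are hypothetical.

```python
def pauplin_length(dist, path_edges):
    """Balanced tree length: sum of 2^(1 - p_ij) * d_ij over leaf pairs."""
    return sum(2.0 ** (1 - path_edges[pair]) * d for pair, d in dist.items())

# Distances generated by the quartet ((1,2),(3,4)) with pendant edge
# lengths 1, 2, 3, 4 and internal edge length 0.5 (illustrative values).
dist = {
    (1, 2): 3.0, (3, 4): 7.0,
    (1, 3): 4.5, (1, 4): 5.5, (2, 3): 5.5, (2, 4): 6.5,
}

# Number of edges between each pair of leaves in two candidate topologies.
true_topology = {(1, 2): 2, (3, 4): 2, (1, 3): 3, (1, 4): 3, (2, 3): 3, (2, 4): 3}
other_topology = {(1, 3): 2, (2, 4): 2, (1, 2): 3, (1, 4): 3, (2, 3): 3, (3, 4): 3}

length_true = pauplin_length(dist, true_topology)
length_other = pauplin_length(dist, other_topology)
```

On the generating topology the formula returns the sum of the edge lengths, and the alternative topology scores higher, so minimizing the balanced length selects the generating tree in this example.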
16.
Bull Math Biol ; 73(4): 829-72, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21181503

ABSTRACT

Evaluating the likelihood function of parameters in highly-structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive. In such cases, one may approximately infer the parameters from summary statistics of the data such as the site-frequency-spectrum (SFS) or its linear combinations. Such methods are known as approximate likelihood or Bayesian computations. Using a controlled lumped Markov chain and computational commutative algebraic methods, we compute the exact likelihood of the SFS and many classical linear combinations of it at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS, we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus based on the SFS to a more powerful version that conditions on the topological information provided by the SFS.


Subjects
Genetics, Population/methods , Models, Genetic , Algorithms , Base Sequence , Bayes Theorem , Computer Simulation , Heterozygote , Likelihood Functions , Markov Chains , Monte Carlo Method , Mutation/genetics , Pedigree , Population Density , Population Growth , Probability , Sequence Alignment , Stochastic Processes
17.
Article in English | MEDLINE | ID: mdl-20802801

ABSTRACT

We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by two input alignments, instead of comparing point estimations of trees. This statistical approach can be applied to gene tree analysis, for example, to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses difference of means to compare two distributions of trees, after mapping trees into a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel method" which speeds up distance calculations when trees are mapped in a high-dimensional feature space, e.g., splits or quartets feature space. In this pilot study, first we test our statistical method on data sets simulated under a coalescence model, to test whether two alignments are generated by congruent gene trees. We follow our simulation results with applications to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.

18.
Front Psychiatry ; 1: 138, 2010.
Article in English | MEDLINE | ID: mdl-21423447

ABSTRACT

Although exchange of genetic information by recombination plays an important role in the evolution of viruses, it is not clear how it generates diversity. Understanding recombination events helps with the study of the evolution of new virus strains or new viruses. Geminiviruses are plant viruses which have ambisense single-stranded circular DNA genomes and are among the most economically important plant viruses in agricultural production. Small circular single-stranded DNA satellites, termed DNA-β, have recently been found to be associated with some geminivirus infections. In this paper we analyze several DNA-β sequences of geminiviruses for recombination events using phylogenetic and statistical analysis, and we find that one strain from ToLCMaB has a recombination pattern and is a recombinant molecule between two strains from two species, PaLCuB-[IN:Chi:05] (major parent) and ToLCB-[IN:CP:04] (minor parent). We propose that this recombination event contributed to the evolution of the strain of ToLCMaB in South India. The hidden Markov model (HMM) method developed by Webb et al. (2009), which estimates phylogenetic trees throughout the whole alignment, provides us with a recombination history of these DNA-β strains. This is the first time that this statistical method has been used in a DNA-β recombination study, and it gives a clear recombination history of DNA-β.

19.
Algorithms Mol Biol ; 3: 5, 2008 Apr 30.
Article in English | MEDLINE | ID: mdl-18447942

ABSTRACT

The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps [equation; see text] to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n

20.
Mol Biol Evol ; 23(3): 491-8, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16280538

ABSTRACT

The "neighbor-joining algorithm" is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods. We have implemented neighbor-joining for subtree weights in a program called MJOIN which is freely available under the GNU Public License at http://bio.math.berkeley.edu/mjoin/.


Subjects
Algorithms , Genetic Variation , Models, Genetic , Phylogeny , Mathematics
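
For reference, the classical pairwise transformation that this abstract generalizes can be sketched as follows. In standard neighbor-joining with n taxa, the criterion is Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and the pair with the smallest Q is joined first. The code shows only this classical step, not the subtree-weight generalization implemented in MJOIN; the function names and the distance matrix are hypothetical.

```python
def nj_q_matrix(d):
    """Classical neighbor-joining Q criterion for a distance matrix d."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q

def pick_pair(d):
    """Return a pair (i, j), i < j, minimizing Q; ties go to the first found."""
    q = nj_q_matrix(d)
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])

# Tree-like distances for four taxa (illustrative values); taxa 0 and 1
# form one cherry, taxa 2 and 3 the other.
d = [
    [0.0, 3.0, 4.5, 5.5],
    [3.0, 0.0, 5.5, 6.5],
    [4.5, 5.5, 0.0, 7.0],
    [5.5, 6.5, 7.0, 0.0],
]
q = nj_q_matrix(d)
first_join = pick_pair(d)
```

Both cherries tie for the minimum Q in this example, so either is a valid first join; the sketch returns whichever it encounters first.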