1.
PLoS One ; 15(3): e0229493, 2020.
Article in English | MEDLINE | ID: mdl-32119689

ABSTRACT

It is standard practice to model site-to-site variability of substitution rates by discretizing a continuous distribution into a small number, K, of equiprobable rate categories. We demonstrate that the variance of this discretized distribution has an upper bound determined solely by the choice of K and the mean of the distribution. This bound can introduce biases into statistical inference, especially when estimating parameters governing site-to-site variability of substitution rates. Applications to two large collections of sequence alignments demonstrate that this upper bound is often reached in analyses of real data. When parameter estimation is of primary interest, additional rate categories or more flexible modeling methods should be considered.
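The bound described above can be illustrated with a small sketch. It assumes rates are non-negative and that each of the K categories has weight 1/K; under those assumptions the variance of the category rates cannot exceed (K - 1) times the squared mean, with equality when one category carries all of the rate. The helper names, the choice of bin means as category rates, and the sampling-based discretization are illustrative assumptions, not the authors' implementation.

```python
import random
import statistics


def discretize_equiprobable(samples, k):
    """Sort samples, split them into k equal-sized bins, return each bin's mean."""
    ordered = sorted(samples)
    size = len(ordered) // k          # drop any remainder so bins are equal-sized
    return [statistics.fmean(ordered[i * size:(i + 1) * size]) for i in range(k)]


def variance_bound(k, mean):
    """Largest variance attainable by k equiprobable non-negative rates with this mean."""
    return (k - 1) * mean ** 2


# Hypothetical example: a gamma distribution with mean 1 and variance 1/alpha = 20.
rng = random.Random(0)
alpha = 0.05
samples = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(100_000)]

rates = discretize_equiprobable(samples, k=4)
mean = statistics.fmean(rates)
var = statistics.pvariance(rates)
print(f"discretized variance {var:.3f} <= bound {variance_bound(4, mean):.3f}")
```

With K = 4 and a mean near 1 the discretized variance cannot exceed about 3, even though the continuous distribution being approximated has variance 20.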


Subjects
Amino Acid Substitution, Genetic Models, DNA Sequence Analysis/methods, Algorithms, Molecular Evolution, Likelihood Functions, Mutation Rate, Phylogeny, Sequence Alignment
2.
Mol Biol Evol ; 37(8): 2430-2439, 2020 Aug 01.
Article in English | MEDLINE | ID: mdl-32068869

ABSTRACT

Most molecular evolutionary studies of natural selection maintain the decades-old assumption that synonymous substitution rate variation (SRV) across sites within genes occurs at levels that are either nonexistent or negligible. However, numerous studies challenge this assumption from a biological perspective and show that SRV is comparable in magnitude to that of nonsynonymous substitution rate variation. We evaluated the impact of this assumption on methods for inferring selection at the molecular level by incorporating SRV into an existing method (BUSTED) for detecting signatures of episodic diversifying selection in genes. Using simulated data we found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. To evaluate the effect of the SRV assumption on actual inferences we compared results of tests with and without the assumption in an empirical analysis of over 13,000 Euteleostomi (bony vertebrate) gene alignments from the Selectome database. This exercise reveals that close to 50% of positive results (i.e., evidence for selection) in empirical analyses disappear when SRV is modeled as part of the statistical analysis and are thus candidates for being false positives. The results from this work add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.


Subjects
Genetic Models, Genetic Selection, Silent Mutation, Animals, Phylogeny, Rhodopsin/genetics, Vertebrates/genetics
3.
Mol Biol Evol ; 37(1): 295-299, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31504749

ABSTRACT

HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.


Subjects
Genetic Techniques, Phylogeny, Software
4.
Mol Biol Evol ; 35(3): 773-777, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29301006

ABSTRACT

Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js.

6.
J Mol Evol ; 73(5-6): 266-72, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22258433

ABSTRACT

While molecular analyses have provided insight into the phylogeny of ciliates, the few studies assessing intraspecific variation have largely relied on just a single locus [e.g., nuclear small subunit rDNA (nSSU-rDNA) or mitochondrial cytochrome oxidase I]. In this study, we characterize the diversity of several nuclear protein-coding genes plus both nSSU-rDNA and mitochondrial small subunit rDNA (mtSSU-rDNA) of five isolates of the ciliate morphospecies Chilodonella uncinata. Although these isolates have nearly identical nSSU-rDNA sequences, they differ by up to 8.0% in mtSSU-rDNA. Comparative analyses of all loci, including β-tubulin paralogs, indicate a lack of recombination between strains, demonstrating that the morphospecies C. uncinata consists of multiple cryptic species. Further, there is considerable variation in substitution rates among loci as some protein-coding domains are nearly identical between isolates, while others differ by up to 13.2% at the amino acid level. Combining insights on macronuclear variation among isolates, the focus of this study, with published data from the micronucleus of two of these isolates, indicates that C. uncinata lineages are able to maintain both highly divergent and highly conserved genes within a rapidly evolving germline genome.


Subjects
Ciliophora/genetics, Ribosomal DNA/genetics, Molecular Evolution, Ciliophora/classification, Genome, Mitochondrial Proteins/genetics, Nuclear Proteins/genetics, Phylogeny, Genetic Recombination/genetics, Species Specificity, Tubulin/genetics
7.
PLoS Comput Biol ; 6(8)2010 Aug 19.
Article in English | MEDLINE | ID: mdl-20808876

ABSTRACT

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.


Subjects
Algorithms, Amino Acid Substitution/genetics, Codon, Genetic Models, Computer Simulation, DNA-Directed DNA Polymerase/genetics, Molecular Evolution, HIV-1/genetics, Hemagglutinins/genetics, Humans, Markov Chains, Sequence Alignment
8.
PLoS One ; 5(7): e11230, 2010 Jul 30.
Article in English | MEDLINE | ID: mdl-20689581

ABSTRACT

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the de facto standard approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the de facto standard estimators.


Subjects
Codon, Statistical Models, Algorithms, Bias
9.
PLoS One ; 5(7): e11587, 2010 Jul 21.
Article in English | MEDLINE | ID: mdl-20657773

ABSTRACT

The single rate codon model of non-synonymous substitution is ubiquitous in phylogenetic modeling. Indeed, the use of a non-synonymous to synonymous substitution rate ratio parameter has facilitated the interpretation of selection pressure on genomes. Although the single rate model has achieved wide acceptance, we argue that the assumption of a single rate of non-synonymous substitution is biologically unreasonable, given observed differences in substitution rates evident from empirical amino acid models. Some have attempted to incorporate amino acid substitution biases into models of codon evolution and have shown improved model performance versus the single rate model. Here, we show that the single rate model of non-synonymous substitution is easily outperformed by a model with multiple non-synonymous rate classes, yet in which amino acid substitution pairs are assigned randomly to these classes. We argue that, since the single rate model is so easy to improve upon, new codon models should not be validated entirely on the basis of improved model fit over this model. Rather, we should strive to both improve on the single rate model and to approximate the general time-reversible model of codon substitution, with as few parameters as possible, so as to reduce model over-fitting. We hint at how this can be achieved with a Genetic Algorithm approach in which rate classes are assigned on the basis of sequence information content.
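The random baseline mentioned above, in which amino acid substitution pairs are assigned to rate classes at random, can be sketched as follows. This is only an illustration under stated assumptions: it enumerates all 190 unordered pairs of the 20 standard amino acids, whereas a codon model may restrict attention to pairs reachable by a single nucleotide change, and the function name `random_rate_classes` is hypothetical.

```python
import random
from itertools import combinations

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_rate_classes(n_classes, seed=None):
    """Assign every unordered amino acid pair to a uniformly random rate class."""
    rng = random.Random(seed)
    return {pair: rng.randrange(n_classes)
            for pair in combinations(AMINO_ACIDS, 2)}


# Hypothetical usage: three non-synonymous rate classes, fixed seed for repeatability.
assignment = random_rate_classes(n_classes=3, seed=1)
print(len(assignment), "pairs assigned to", len(set(assignment.values())), "classes")
```

Each class would then receive its own non-synonymous rate parameter when the model is fitted; the fitting step itself is outside the scope of this sketch.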


Subjects
Codon, Genetic Models, Algorithms
10.
J Virol ; 82(10): 5099-103, 2008 May.
Article in English | MEDLINE | ID: mdl-18321976

ABSTRACT

To understand astrovirus biology, it is essential to understand factors associated with its evolution. The current study reports the genomic sequences of nine novel turkey astrovirus (TAstV) type 2-like clinical isolates. This represents, to our knowledge, the largest genomic-length data set available for any one astrovirus type. The comparison of these TAstV sequences suggests that the TAstV species contains multiple subtypes and that recombination events have occurred across the astrovirus genome. In addition, the analysis of the capsid gene demonstrated evidence for both site-specific positive selection and purifying selection.


Subjects
Avastrovirus/genetics, Animals, Astroviridae Infections/virology, Avastrovirus/isolation & purification, Viral Genome, Phylogeny, Poultry Diseases/virology, Viral RNA/genetics, Genetic Recombination, DNA Sequence Analysis, Sequence Homology, Turkeys, United States
11.
Mol Biol Evol ; 24(1): 159-70, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17038448

ABSTRACT

The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA 1) capture the expected overall rate patterns a priori; 2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized; 3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and 4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates. We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach makes it possible, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.


Subjects
Algorithms, Molecular Evolution, Nucleic Acid Conformation, RNA/chemistry, Animals, Computational Biology, HIV/genetics, Invertebrates/genetics, Likelihood Functions, Mammals/genetics, Genetic Models, RNA/genetics, Response Elements, Sequence Alignment
12.
Mol Biol Evol ; 23(9): 1681-7, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16760419

ABSTRACT

Studies of microbial eukaryotes have been pivotal in the discovery of biological phenomena, including RNA editing, self-splicing RNA, and telomere addition. Here we extend this list by demonstrating that genome architecture, namely the extensive processing of somatic (macronuclear) genomes in some ciliate lineages, is associated with elevated rates of protein evolution. Using newly developed likelihood-based procedures for studying molecular evolution, we investigate 6 genes to compare 1) ciliate protein evolution to that of 3 other clades of eukaryotes (plants, animals, and fungi) and 2) protein evolution in ciliates with extensively processed macronuclear genomes to that of other ciliate lineages. In 5 of the 6 genes, ciliates are estimated to have a higher ratio of nonsynonymous/synonymous substitution rates, consistent with an increase in the rate of protein diversification in ciliates relative to other eukaryotes. Even more striking, there is a significant effect of genome architecture within ciliates as the most divergent proteins are consistently found in those lineages with the most highly processed macronuclear genomes. We propose a model whereby genome architecture (specifically chromosomal processing, amitosis within macronuclei, and epigenetics) allows ciliates to explore protein space in a novel manner. Further, we predict that examination of diverse eukaryotes will reveal additional evidence of the impact of genome architecture on molecular evolution.


Subjects
Ciliophora/genetics, Molecular Evolution, Genetic Variation, Protozoan Genome, Genetic Selection, Animals, Species Specificity
13.
Mol Biol Evol ; 22(12): 2375-85, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16107593

ABSTRACT

We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation in the model can lead to misidentification of sites as positively selected.


Subjects
Amino Acid Substitution, Molecular Evolution, Genetic Models, Genetic Selection, Amino Acids/genetics, Codon, Genetic Variation, Humans, Mutation, Open Reading Frames, Phylogeny
14.
J Mol Evol ; 61(3): 325-32, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16044247

ABSTRACT

We analyze members of the receptor-like kinase (RLK) gene family in Arabidopsis thaliana for positive selection. Likelihood analyses find evidence for positive selection in 12 of the 52 RLK family sequence groups. These 12 groups represent 97 of the 403 sequences analyzed. The majority of genes in groups subject to positive selection have not been functionally characterized, but sites under selection are predominantly located in the extracellular region. The pattern of selection in the extracellular leucine-rich repeat (LRR) motif of groups 14 and 51 is similar to previous studies where positively selected positions are located in a solvent exposed beta-strand that may determine disease specificity, raising the possibility that some RLK genes function in a similar role.


Subjects
Arabidopsis Proteins/classification, Arabidopsis Proteins/genetics, Arabidopsis/classification, Arabidopsis/genetics, Phosphotransferases/classification, Phosphotransferases/genetics, Arabidopsis/enzymology, Leucine/genetics, Phylogeny
15.
Bioinformatics ; 21(9): 2128-9, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15705655

ABSTRACT

SUMMARY: PowerMarker delivers a data-driven, integrated analysis environment (IAE) for genetic data. The IAE integrates data management, analysis and visualization in a user-friendly graphical user interface. It accelerates the analysis lifecycle and enables users to maintain data integrity throughout the process. An ever-growing list of more than 50 different statistical analyses for genetic markers has been implemented in PowerMarker. AVAILABILITY: www.powermarker.net


Subjects
DNA Mutational Analysis/methods, Genetic Markers/genetics, Single Nucleotide Polymorphism/genetics, Software, User-Computer Interface, Algorithms, Computer Graphics, Gene Frequency, Population Genetics/methods, Linkage Disequilibrium/genetics, Oligonucleotide Array Sequence Analysis/methods
16.
Bioinformatics ; 21(5): 676-9, 2005 Mar 01.
Article in English | MEDLINE | ID: mdl-15509596

ABSTRACT

UNLABELLED: The HyPhy package is designed to provide a flexible and unified platform for carrying out likelihood-based analyses on multiple alignments of molecular sequence data, with the emphasis on studies of rates and patterns of sequence evolution. AVAILABILITY: http://www.hyphy.org CONTACT: muse@stat.ncsu.edu SUPPLEMENTARY INFORMATION: HyPhy documentation and tutorials are available at http://www.hyphy.org.


Subjects
Algorithms, Molecular Evolution, Genetic Models, Phylogeny, Sequence Alignment/methods, Protein Sequence Analysis/methods, Software, User-Computer Interface, Computer Simulation
17.
Syst Biol ; 53(5): 685-92, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15545249

ABSTRACT

Likelihood applications have become a central approach for molecular evolutionary analyses since the first computationally tractable treatment two decades ago. Although Felsenstein's original pruning algorithm makes likelihood calculations feasible, it is usually possible to take advantage of repetitive structure present in the data to arrive at even greater computational reductions. In particular, alignment columns with certain similarities have components of the likelihood calculation that are identical and need not be recomputed if columns are evaluated in an optimal order. We develop an algorithm for exploiting this speed improvement via an application of graph theory. The reductions provided by the method depend on both the tree and the data, but typical savings range between 15% and 50%. Real-data examples with time reductions of 80% have been identified. The overhead costs associated with implementing the algorithm are minimal, and they are recovered in all but the smallest data sets. The modifications will provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.
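The idea that columns sharing a subtree pattern share part of the likelihood calculation can be sketched with plain memoization. This is not the graph-theoretic column-ordering algorithm the abstract describes; it is a simpler stand-in that caches partial likelihoods by subtree and leaf pattern. The Jukes-Cantor (JC69) model, the nested-tuple tree encoding `(left, right, t_left, t_right)`, and all names here are assumptions made for illustration.

```python
import math
from functools import lru_cache

STATES = "ACGT"


def jc69(t):
    """JC69 probabilities of (same state, one specific different state) after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def leaves(node):
    """Leaf names under a node, left to right."""
    if isinstance(node, str):
        return (node,)
    left, right, _, _ = node
    return leaves(left) + leaves(right)


def make_site_likelihood(tree):
    """Return (site_likelihood, counter); counter['n'] counts uncached partial evaluations."""
    counter = {"n": 0}

    @lru_cache(maxsize=None)
    def partial(node, pattern):
        counter["n"] += 1
        if isinstance(node, str):  # leaf: indicator vector for the observed base
            return tuple(1.0 if s == pattern[0] else 0.0 for s in STATES)
        left, right, t_left, t_right = node
        n_left = len(leaves(left))
        children = ((partial(left, pattern[:n_left]), t_left),
                    (partial(right, pattern[n_left:]), t_right))
        out = []
        for x in STATES:  # Felsenstein pruning step at this internal node
            acc = 1.0
            for child, t in children:
                same, diff = jc69(t)
                acc *= sum((same if x == y else diff) * child[j]
                           for j, y in enumerate(STATES))
            out.append(acc)
        return tuple(out)

    def site_likelihood(column):
        # Uniform root frequencies under JC69; column lists bases in leaf order.
        return 0.25 * sum(partial(tree, tuple(column)))

    return site_likelihood, counter


# Hypothetical four-taxon tree with arbitrary branch lengths.
tree = (("a", "b", 0.1, 0.2), ("c", "d", 0.1, 0.2), 0.05, 0.05)
site_likelihood, counter = make_site_likelihood(tree)
site_likelihood("AACC")
first = counter["n"]
site_likelihood("AAGG")  # reuses the cached partials for the (a, b) subtree
print(first, counter["n"] - first)
```

The second column triggers fewer evaluations than the first because the left subtree sees the same pattern in both; the published method goes further by choosing a column order that maximizes such reuse.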


Subjects
Algorithms, Classification/methods, Molecular Evolution, Genetic Models, Phylogeny, Base Sequence/genetics, Likelihood Functions
18.
Mol Biol Evol ; 21(3): 555-62, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14694079

ABSTRACT

The accumulation of divergent histone H4 amino acid sequences within and between ciliate lineages challenges traditional views of the evolution of this essential eukaryotic protein. We analyzed histone H4 sequences from 13 species of ciliates and compared these data with sequences from well-sampled eukaryotic clades. Ciliate histone H4s differ from one another at as many as 46% of their amino acids, in contrast with the highly conserved character of this protein in most other eukaryotes. Equally striking, we find paralogs of histone H4 within ciliate genomes that differ by up to 25% of their amino acids, whereas paralogs in other eukaryotes share identical or nearly identical amino acid sequences. Moreover, the most divergent H4 proteins within ciliates are found in the lineages with highly processed macronuclear genomes. Our analyses demonstrate that the dual nature of ciliate genomes (the presence of a "germline" micronucleus and a "somatic" macronucleus within each cell) allowed the dramatic variation in ciliate histone genes by altering functional constraints or enabling adaptive evolution of the histone H4 protein, or both.


Subjects
Ciliophora/genetics, Molecular Evolution, Genetic Variation, Histones/genetics, Amino Acid Substitution, Animals, Genealogy and Heraldry
19.
Evolution ; 56(6): 1110-22, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12144013

ABSTRACT

Ciliates provide a powerful system to analyze the evolution of duplicated alpha-tubulin genes in the context of single-celled organisms. Genealogical analyses of ciliate alpha-tubulin sequences reveal five apparently recent gene duplications. Comparisons of paralogs in different ciliates implicate differing patterns of substitutions (e.g., ratios of replacement/synonymous nucleotides and radical/conservative amino acids) following duplication. Most substitutions between paralogs in Euplotes crassus, Halteria grandinella and Paramecium tetraurelia are synonymous. In contrast, alpha-tubulin paralogs within Stylonychia lemnae and Chilodonella uncinata are evolving at significantly different rates and have higher ratios of both replacement substitutions to synonymous substitutions and radical amino acid changes to conservative amino acid changes. Moreover, the amino acid substitutions in C. uncinata and S. lemnae paralogs are limited to short stretches that correspond to functionally important regions of the alpha-tubulin protein. The topologies of ciliate alpha-tubulin genealogies are inconsistent with taxonomy based on morphology and other molecular markers, which may be due to taxonomic sampling, gene conversion, unequal rates of evolution, or asymmetric patterns of gene duplication and loss.


Subjects
Ciliophora/genetics, Molecular Evolution, Gene Duplication, Tubulin/genetics, Amino Acid Sequence, Animals, Base Sequence, Molecular Cloning, Genetic Variation, Molecular Sequence Data, Phylogeny, Sequence Alignment, Amino Acid Sequence Homology, Nucleic Acid Sequence Homology