Search | Regional Portal of the BVS (test)

1.

Polymorphism-aware models in RevBayes: Species trees, disentangling Balancing Selection and CG-biased gene conversion.

Braichenko, Svitlana; Borges, Rui; Kosiol, Carolin.

Mol Biol Evol ; 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38980178

ABSTRACT

The role of balancing selection is a long-standing evolutionary puzzle. Balancing selection is a crucial evolutionary process that maintains genetic variation (polymorphism) over extended periods of time; however, detecting it poses a significant challenge. Building upon the polymorphism-aware phylogenetic models (PoMos) framework rooted in the Moran model, we introduce the PoMoBalance model. This novel approach is designed to disentangle the interplay of mutation, genetic drift, directional selection (GC-biased gene conversion), along with the previously unexplored balancing selection pressures on ultra-long timescales comparable with species divergence times by analysing multi-individual genomic and phylogenetic divergence data. Implemented in the open-source RevBayes Bayesian framework, PoMoBalance offers a versatile tool for inferring phylogenetic trees as well as quantifying various selective pressures. The novel aspect of our approach in studying balancing selection lies in PoMos' ability to account for ancestral polymorphisms and incorporate parameters that measure frequency-dependent selection, allowing us to determine the strength of the effect and exact frequencies under selection. We implemented validation tests and assessed the model on the data simulated with SLiM and a custom Moran model simulator. Real sequence analysis of Drosophila populations reveals insights into the evolutionary dynamics of regions subject to frequency-dependent balancing selection, particularly in the context of sex-limited colour dimorphism in Drosophila erecta.

2.

The Patterns of Codon Usage between Chordates and Arthropods are Different but Co-evolving with Mutational Biases.

Kotari, Ioanna; Kosiol, Carolin; Borges, Rui.

Mol Biol Evol ; 41(5)2024 May 03.

Article in English | MEDLINE | ID: mdl-38667829

ABSTRACT

Different frequencies amongst codons that encode the same amino acid (i.e. synonymous codons) have been observed in multiple species. Studies focused on uncovering the forces that drive such codon usage showed that a combined effect of mutational biases and translational selection works to produce different frequencies of synonymous codons. However, only few have been able to measure and distinguish between these forces that may leave similar traces on the coding regions. Here, we have developed a codon model that allows the disentangling of mutation, selection on amino acids and synonymous codons, and GC-biased gene conversion (gBGC) which we employed on an extensive dataset of 415 chordates and 191 arthropods. We found that chordates need 15 more synonymous codon categories than arthropods to explain the empirical codon frequencies, which suggests that the extent of codon usage can vary greatly between animal phyla. Moreover, methylation at CpG sites seems to partially explain these patterns of codon usage in chordates but not in arthropods. Despite the differences between the two phyla, our findings demonstrate that in both, GC-rich codons are disfavored when mutations are GC-biased, and the opposite is true when mutations are AT-biased. This indicates that selection on the genomic coding regions might act primarily to stabilize its GC/AT content on a genome-wide level. Our study shows that the degree of synonymous codon usage varies considerably among animals, but is likely governed by a common underlying dynamic.

Subjects

Arthropods , Codon Usage , Genetic Selection , Animals , Arthropods/genetics , Chordates/genetics , Mutation , Molecular Evolution , Codon , Genetic Models , Base Composition , Gene Conversion

3.

Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed).

Vogl, Claus; Karapetiants, Mariia; Yildirim, Burçin; Kjartansdóttir, Hrönn; Kosiol, Carolin; Bergman, Juraj; Majka, Michal; Mikula, Lynette Caitlin.

BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38627634

ABSTRACT

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along the human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.

Subjects

Genome , Genomics , Animals , Humans , Mice , Markov Chains , Base Composition , Probability , Algorithms
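
The abstract of entry 3 describes hidden states that are ordered by their emission means, with transitions permitted only between states that are neighbours in that ordering. The following is a minimal Python sketch of what such a constraint implies for the shape of the transition matrix. It is an illustration only: the function name and the single switching probability `p_switch` are hypothetical and are not taken from the oHMMed package, which infers its parameters from data.

```python
def banded_transition_matrix(n_states, p_switch):
    """Build a row-stochastic matrix in which each hidden state can only
    stay where it is or move to an adjacent state in the ordering.

    p_switch is a hypothetical, shared probability of moving to any one
    neighbouring state; it is an assumption made for this sketch.
    """
    if n_states < 2:
        raise ValueError("need at least two states")
    if not 0.0 <= p_switch <= 0.5:
        raise ValueError("p_switch must lie in [0, 0.5]")

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # Only the states directly below and above i are reachable.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        for j in neighbours:
            matrix[i][j] = p_switch
        # The remaining probability mass stays on the diagonal.
        matrix[i][i] = 1.0 - p_switch * len(neighbours)
    return matrix
```

The resulting matrix is tridiagonal, so a path through the hidden states can never jump from, say, the lowest-mean state to the highest-mean state without passing through the states in between.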

4.

Selection on the Fly: Short-Term Adaptation to an Altered Sexual Selection Regime in Drosophila pseudoobscura.

Barata, Carolina; Snook, Rhonda R; Ritchie, Michael G; Kosiol, Carolin.

Genome Biol Evol ; 15(7)2023 07 03.

Article in English | MEDLINE | ID: mdl-37341535

ABSTRACT

Experimental evolution studies are powerful approaches to examine the evolutionary history of lab populations. Such studies have shed light on how selection changes phenotypes and genotypes. Most of these studies have not examined the time course of adaptation under sexual selection manipulation, by resequencing the populations' genomes at multiple time points. Here, we analyze allele frequency trajectories in Drosophila pseudoobscura where we altered their sexual selection regime for 200 generations and sequenced pooled populations at 5 time points. The intensity of sexual selection was either relaxed in monogamous populations (M) or elevated in polyandrous lines (E). We present a comprehensive study of how selection alters population genetics parameters at the chromosome and gene level. We investigate differences in the effective population size (Ne) between the treatments, and perform a genome-wide scan to identify signatures of selection from the time-series data. We found genomic signatures of adaptation to both regimes in D. pseudoobscura. There are more significant variants in E lines as expected from stronger sexual selection. However, we found that the response on the X chromosome was substantial in both treatments, more pronounced in E and restricted to the more recently sex-linked chromosome arm XR in M. In the first generations of experimental evolution, we estimate Ne to be lower on the X in E lines, which might indicate a swift adaptive response at the onset of selection. Additionally, the third chromosome was affected by elevated polyandry whereby its distal end harbors a region showing a strong signal of adaptive evolution especially in E lines.

Subjects

Drosophila , Sexual Selection , Animals , Drosophila/genetics , Gene Frequency , Population Genetics , Physiological Adaptation/genetics , Genetic Selection , Biological Evolution

5.

Bait-ER: A Bayesian method to detect targets of selection in Evolve-and-Resequence experiments.

Barata, Carolina; Borges, Rui; Kosiol, Carolin.

J Evol Biol ; 36(1): 29-44, 2023 01.

Article in English | MEDLINE | ID: mdl-36544394

ABSTRACT

For over a decade, experimental evolution has been combined with high-throughput sequencing techniques. In so-called Evolve-and-Resequence (E&R) experiments, populations are kept in the laboratory under controlled experimental conditions where their genomes are sampled and allele frequencies monitored. However, identifying signatures of adaptation in E&R datasets is far from trivial, and it is still necessary to develop more efficient and statistically sound methods for detecting selection in genome-wide data. Here, we present Bait-ER - a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that describes several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. Nevertheless, some care must be taken when analysing trajectories where drift largely dominates and starting frequencies are low. We compare our method with other available software and report that ours has generally high accuracy even for trajectories whose complexity goes beyond a classical sweep model. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://doi.org/10.5281/zenodo.7351736.

Subjects

Genetic Selection , Software , Bayes Theorem , Gene Frequency , Physiological Adaptation
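
The abstract of entry 5 builds on the Moran model with a selection coefficient. As a small illustration of how a selection coefficient enters that model, the Python sketch below evaluates the textbook fixation probability of an allele with relative fitness 1 + s in a Moran population. It is not the Bait-ER inference procedure, which estimates selection coefficients from allele frequency trajectories; the function and argument names are hypothetical.

```python
def moran_fixation_probability(n_individuals, n_copies, s):
    """Fixation probability of an allele present in n_copies out of
    n_individuals in a two-allele Moran process without mutation.

    The allele has relative fitness r = 1 + s against a wild type of
    fitness 1. Standard result: (1 - r**-i) / (1 - r**-N), which tends
    to i / N in the neutral limit s -> 0.
    """
    if not 0 <= n_copies <= n_individuals:
        raise ValueError("n_copies must lie between 0 and n_individuals")
    if s <= -1.0:
        raise ValueError("s must be greater than -1")

    if abs(s) < 1e-12:
        # Neutral case: handle separately to avoid dividing zero by zero.
        return n_copies / n_individuals

    r = 1.0 + s
    return (1.0 - r ** (-n_copies)) / (1.0 - r ** (-n_individuals))
```

For a single new copy in a population of 100, the neutral value is 1/100, while a beneficial allele with s = 0.1 fixes with probability close to s / (1 + s). The gap between the two is what makes low starting frequencies under strong drift hard to analyse, as the abstract cautions.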

6.

Nucleotide Usage Biases Distort Inferences of the Species Tree.

Borges, Rui; Boussau, Bastien; Szöllosi, Gergely J; Kosiol, Carolin.

Genome Biol Evol ; 14(1)2022 01 04.

Article in English | MEDLINE | ID: mdl-34983052

ABSTRACT

Despite the importance of natural selection in species' evolutionary history, phylogenetic methods that take into account population-level processes typically ignore selection. The assumption of neutrality is often based on the idea that selection occurs at a minority of loci in the genome and is unlikely to compromise phylogenetic inferences significantly. However, genome-wide processes like GC-bias and some variation segregating at the coding regions are known to evolve in the nearly neutral range. As we are now using genome-wide data to estimate species trees, it is natural to ask whether weak but pervasive selection is likely to blur species tree inferences. We developed a polymorphism-aware phylogenetic model tailored for measuring signatures of nucleotide usage biases to test the impact of selection in the species tree. Our analyses indicate that although the inferred relationships among species are not significantly compromised, the genetic distances are systematically underestimated in a node-height-dependent manner: that is, the deeper nodes tend to be more underestimated than the shallow ones. Such biases have implications for molecular dating. We dated the evolutionary history of 30 worldwide fruit fly populations, and we found signatures of GC-bias considerably affecting the estimated divergence times (up to 23%) in the neutral model. Our findings call for the need to account for selection when quantifying divergence or dating species evolution.

Subjects

Codon Usage , Molecular Evolution , Animals , Codon Usage/genetics , Drosophila , Nucleotides , Phylogeny , Genetic Selection

7.

Consistency and identifiability of the polymorphism-aware phylogenetic models.

Borges, Rui; Kosiol, Carolin.

J Theor Biol ; 486: 110074, 2020 02 07.

Article in English | MEDLINE | ID: mdl-31711991

ABSTRACT

Polymorphism-aware phylogenetic models (PoMo) constitute an alternative approach for species tree estimation from genome-wide data. PoMo builds on the standard substitution models of DNA evolution but expands the classic alphabet of the four nucleotide bases to include polymorphic states. By doing so, PoMo accounts for ancestral and current intra-population variation, while also accommodating population-level processes ruling the substitution process (e.g. genetic drift, mutations, allelic selection). PoMo has been shown to be a valuable tool in several phylogenetic applications but a proof of statistical consistency (and identifiability, a necessary condition for consistency) is lacking. Here, we prove that PoMo is identifiable and, using this result, we further show that the maximum a posteriori (MAP) tree estimator of PoMo is a consistent estimator of the species tree. We complement our theoretical results with a simulated data set mimicking the diversity observed in natural populations exhibiting incomplete lineage sorting. We implemented PoMo in a Bayesian framework and show that the MAP tree easily recovers the true tree for typical numbers of sites that are sampled in genome-wide analyses.

Subjects

Genome-Wide Association Study , Genetic Models , Bayes Theorem , Molecular Evolution , Phylogeny , Genetic Polymorphism
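
The abstract of entry 7 states that PoMo expands the four-letter nucleotide alphabet with polymorphic states. A minimal Python sketch of such an expanded state space follows, under two assumptions that the abstract itself does not spell out: polymorphic states are biallelic, and allele counts are tracked in a virtual population of fixed size. The function name and the tuple representation are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = ("A", "C", "G", "T")


def pomo_states(virtual_pop_size):
    """Enumerate fixed and biallelic polymorphic states.

    Each state is a tuple of (allele, count) pairs whose counts sum to
    virtual_pop_size. Fixed states carry one allele; polymorphic states
    carry two alleles, each with a count between 1 and N - 1.
    """
    n = virtual_pop_size
    if n < 2:
        raise ValueError("virtual population size must be at least 2")

    # Four monomorphic states, one per nucleotide.
    fixed = [((base, n),) for base in NUCLEOTIDES]

    # Six unordered allele pairs, each with N - 1 frequency classes.
    polymorphic = [
        ((a, k), (b, n - k))
        for a, b in combinations(NUCLEOTIDES, 2)
        for k in range(1, n)
    ]
    return fixed + polymorphic
```

Under these assumptions the alphabet has 4 + 6(N - 1) states, so `len(pomo_states(10))` is 58. The state count grows linearly in N, which is one reason the size of the virtual population is a modelling choice rather than a direct copy of the sample size.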

8.

Selection Acting on Genomes.

Kosiol, Carolin; Anisimova, Maria.

Methods Mol Biol ; 1910: 373-397, 2019.

Article in English | MEDLINE | ID: mdl-31278671

ABSTRACT

Populations evolve as mutations arise in individual organisms and, through hereditary transmission, may become "fixed" (shared by all individuals) in the population. Most mutations are lethal or have negative fitness consequences for the organism. Others have essentially no effect on organismal fitness and can become fixed through the neutral stochastic process known as random drift. However, mutations may also produce a selective advantage that boosts their chances of reaching fixation. Regions of genomes where new mutations are beneficial, rather than neutral or deleterious, tend to evolve more rapidly due to positive selection. Genes involved in immunity and defense are a well-known example; rapid evolution in these genes presumably occurs because new mutations help organisms to prevail in evolutionary "arms races" with pathogens. In recent years genome-wide scans for selection have enlarged our understanding of the genome evolution of various species. In this chapter, we will focus on methods to detect selection on the genome. In particular, we will discuss probabilistic models and how they have changed with the advent of new genome-wide data now available.

Subjects

Molecular Evolution , Genome , Genetic Selection , Animals , Codon , Computational Biology/methods , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Mammals/genetics , Genetic Models , Statistical Models , Mutation , Phylogeny , Genetic Polymorphism , Software , Web Browser

9.

Quantifying GC-Biased Gene Conversion in Great Ape Genomes Using Polymorphism-Aware Models.

Borges, Rui; Szöllosi, Gergely J; Kosiol, Carolin.

Genetics ; 212(4): 1321-1336, 2019 08.

Article in English | MEDLINE | ID: mdl-31147380

ABSTRACT

As multi-individual population-scale data become available, more complex modeling strategies are needed to quantify genome-wide patterns of nucleotide usage and associated mechanisms of evolution. Recently, the multivariate neutral Moran model was proposed. However, it was shown insufficient to explain the distribution of alleles in great apes. Here, we propose a new model that includes allelic selection. Our theoretical results constitute the basis of a new Bayesian framework to estimate mutation rates and selection coefficients from population data. We apply the new framework to a great ape dataset, where we found patterns of allelic selection that match those of genome-wide GC-biased gene conversion (gBGC). In particular, we show that great apes have patterns of allelic selection that vary in intensity, a feature that we correlated with great apes' distinct demographies. We also demonstrate that the AT/GC toggling effect decreases the probability of a substitution, promoting more polymorphisms in the base composition of great ape genomes. We further assess the impact of GC-bias in molecular analysis, and find that mutation rates and genetic distances are estimated under bias when gBGC is not properly accounted for. Our results contribute to the discussion on the tempo and mode of gBGC evolution, while stressing the need for gBGC-aware models in population genetics and phylogenetics.

Subjects

Gene Conversion , Hominidae/genetics , Genetic Models , Animals , GC Rich Sequence , Genome , Genetic Polymorphism

10.

The comparative genomics and complex population history of Papio baboons.

Rogers, Jeffrey; Raveendran, Muthuswamy; Harris, R Alan; Mailund, Thomas; Leppälä, Kalle; Athanasiadis, Georgios; Schierup, Mikkel Heide; Cheng, Jade; Munch, Kasper; Walker, Jerilyn A; Konkel, Miriam K; Jordan, Vallmer; Steely, Cody J; Beckstrom, Thomas O; Bergey, Christina; Burrell, Andrew; Schrempf, Dominik; Noll, Angela; Kothe, Maximillian; Kopp, Gisela H; Liu, Yue; Murali, Shwetha; Billis, Konstantinos; Martin, Fergal J; Muffato, Matthieu; Cox, Laura; Else, James; Disotell, Todd; Muzny, Donna M; Phillips-Conroy, Jane; Aken, Bronwen; Eichler, Evan E; Marques-Bonet, Tomas; Kosiol, Carolin; Batzer, Mark A; Hahn, Matthew W; Tung, Jenny; Zinner, Dietmar; Roos, Christian; Jolly, Clifford J; Gibbs, Richard A; Worley, Kim C.

Sci Adv ; 5(1): eaau6947, 2019 01.

Article in English | MEDLINE | ID: mdl-30854422

ABSTRACT

Recent studies suggest that closely related species can accumulate substantial genetic and phenotypic differences despite ongoing gene flow, thus challenging traditional ideas regarding the genetics of speciation. Baboons (genus Papio) are Old World monkeys consisting of six readily distinguishable species. Baboon species hybridize in the wild, and prior data imply a complex history of differentiation and introgression. We produced a reference genome assembly for the olive baboon (Papio anubis) and whole-genome sequence data for all six extant species. We document multiple episodes of admixture and introgression during the radiation of Papio baboons, thus demonstrating their value as a model of complex evolutionary divergence, hybridization, and reticulation. These results help inform our understanding of similar cases, including modern humans, Neanderthals, Denisovans, and other ancient hominins.

Subjects

Biological Evolution , Genomics/methods , Papio/genetics , Animals , Base Sequence , Female , Gene Flow , Haplotypes/genetics , Humans , Genetic Hybridization , Male , Phylogeny , Genetic Polymorphism , Whole Genome Sequencing

11.

Polymorphism-Aware Species Trees with Advanced Mutation Models, Bootstrap, and Rate Heterogeneity.

Schrempf, Dominik; Minh, Bui Quang; von Haeseler, Arndt; Kosiol, Carolin.

Mol Biol Evol ; 36(6): 1294-1301, 2019 06 01.

Article in English | MEDLINE | ID: mdl-30825307

ABSTRACT

Molecular phylogenetics has neglected polymorphisms within present and ancestral populations for a long time. Recently, multispecies coalescent based methods have increased in popularity; however, their application is limited to a small number of species and individuals. We introduced a polymorphism-aware phylogenetic model (PoMo), which overcomes this limitation and scales well with the increasing amount of sequence data while accounting for present and ancestral polymorphisms. PoMo circumvents handling of gene trees and directly infers species trees from allele frequency data. Here, we extend the PoMo implementation in IQ-TREE and integrate search for the statistically best-fit mutation model, the ability to infer mutation rate variation across sites, and assessment of branch support values. We exemplify an analysis of a hundred species with ten haploid individuals each, showing that PoMo can perform inference on large data sets. While PoMo is more accurate than standard substitution models applied to concatenated alignments, it is almost as fast. We also provide bmm-simulate, a software package that allows simulation of sequences evolving under PoMo. The new options consolidate the value of PoMo for phylogenetic analyses with population data.

Subjects

Genetic Models , Mutation Rate , Phylogeny , Genetic Polymorphism , Animals , Humans , Likelihood Functions , Software

12.

Inference in population genetics using forward and backward, discrete and continuous time processes.

Bergman, Juraj; Schrempf, Dominik; Kosiol, Carolin; Vogl, Claus.

J Theor Biol ; 439: 166-180, 2018 02 14.

Article in English | MEDLINE | ID: mdl-29229523

ABSTRACT

A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by e.g., the population size, mutation rates and selection coefficients. A large class of models use forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have been rarely used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials of the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations or from multiple populations that split in the past can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios.

Subjects

Population Genetics/methods , Genetic Models , Algorithms , Biological Evolution , Gene Frequency , Likelihood Functions , Markov Chains , Methods , Population Density , Time

13.

Approximate maximum likelihood estimation for population genetic inference.

Bertl, Johanna; Ewing, Gregory; Kosiol, Carolin; Futschik, Andreas.

Stat Appl Genet Mol Biol ; 16(5-6): 387-405, 2017 11 27.

Article in English | MEDLINE | ID: mdl-29095700

ABSTRACT

In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods became popular. However, these methods such as Approximate Bayesian Computation (ABC) can be inefficient in high-dimensional problems. This led to the development of more sophisticated iterative estimation methods like particle filters. Here, we propose an alternative approach that is based on stochastic approximation. By moving along a simulated gradient or ascent direction, the algorithm produces a sequence of estimates that eventually converges to the maximum likelihood estimate, given a set of observed summary statistics. This strategy does not sample much from low-likelihood regions of the parameter space, and is fast, even when many summary statistics are involved. We put considerable efforts into providing tuning guidelines that improve the robustness and lead to good performance on problems with high-dimensional summary statistics and a low signal-to-noise ratio. We then investigate the performance of our resulting approach and study its properties in simulations. Finally, we re-estimate parameters describing the demographic history of Bornean and Sumatran orang-utans.

Subjects

Population Genetics/methods , Likelihood Functions , Genetic Models , Algorithms , Bayes Theorem , Computer Simulation , Molecular Evolution

14.

Adaptive sequence evolution is driven by biotic stress in a pair of orchid species (Dactylorhiza) with distinct ecological optima.

Balao, Francisco; Trucchi, Emiliano; Wolfe, Thomas M; Hao, Bao-Hai; Lorenzo, Maria Teresa; Baar, Juliane; Sedman, Laura; Kosiol, Carolin; Amman, Fabian; Chase, Mark W; Hedrén, Mikael; Paun, Ovidiu.

Mol Ecol ; 26(14): 3649-3662, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28370647

ABSTRACT

The orchid family is the largest in the angiosperms, but little is known about the molecular basis of the significant variation they exhibit. We investigate here the transcriptomic divergence between two European terrestrial orchids, Dactylorhiza incarnata and Dactylorhiza fuchsii, and integrate these results in the context of their distinct ecologies that we also document. Clear signals of lineage-specific adaptive evolution of protein-coding sequences are identified, notably targeting elements of biotic defence, including both physical and chemical adaptations in the context of divergent pools of pathogens and herbivores. In turn, a substantial regulatory divergence between the two species appears linked to adaptation/acclimation to abiotic conditions. Several of the pathways affected by differential expression are also targeted by deviating post-transcriptional regulation via sRNAs. Finally, D. incarnata appears to suffer from insufficient sRNA control over the activity of RNA-dependent DNA polymerase, resulting in increased activity of class I transposable elements and, over time, in larger genome size than that of D. fuchsii. The extensive molecular divergence between the two species suggests significant genomic and transcriptomic shock in their hybrids and offers insights into the difficulty of coexistence at the homoploid level. Altogether, biological response to selection, accumulated during the history of these orchids, appears governed by their microenvironmental context, in which biotic and abiotic pressures act synergistically to shape transcriptome structure, expression and regulation.

Subjects

Biological Adaptation/genetics , Biological Evolution , Orchidaceae/classification , Transcriptome , DNA Transposable Elements , Ecology , Environment , Plant Genome , Genomics

15.

Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution.

Jónás, Ágnes; Taus, Thomas; Kosiol, Carolin; Schlötterer, Christian; Futschik, Andreas.

Genetics ; 204(2): 723-735, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27542959

ABSTRACT

The effective population size ($N_e$) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term $N_e$. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to $N_e$. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: in addition to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of $N_e$, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate $N_e$ estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide $N_e$ estimates, we extend our method using a recursive partitioning approach to estimate $N_e$ locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their $N_e$ estimates. We present an application to Pool-seq data from experimental evolution with Drosophila and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.

Subjects

Directed Molecular Evolution , Gene Frequency/genetics , Population Density , DNA Sequence Analysis , Alleles , Animals , Drosophila/genetics , Single Nucleotide Polymorphism/genetics
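
The abstract of entry 15 refers to temporal methods that turn allele frequency shifts into a standardized variance and relate it to the effective population size. The Python sketch below shows the classical, uncorrected version of that idea, using the Nei and Tajima standardized variance Fc and the approximation Ne ≈ t / (2 Fc). It deliberately ignores the individual-sampling and pool-sequencing variance that the estimator described in the abstract corrects for, so it is not that estimator; the function and argument names are hypothetical.

```python
def temporal_ne_uncorrected(freqs_start, freqs_end, generations):
    """Naive temporal estimate of the effective population size.

    freqs_start and freqs_end are per-locus allele frequencies at two
    time points separated by `generations` generations. Per locus,
    Fc = (x - y)^2 / ((x + y) / 2 - x * y); the per-locus values are
    averaged and plugged into Ne = t / (2 * Fc).

    Assumption of this sketch: frequencies are known without sampling
    error, which is not true for real Pool-seq data.
    """
    if len(freqs_start) != len(freqs_end):
        raise ValueError("frequency lists must have equal length")
    if generations <= 0:
        raise ValueError("generations must be positive")

    fc_values = []
    for x, y in zip(freqs_start, freqs_end):
        denominator = (x + y) / 2.0 - x * y
        if denominator <= 0.0:
            # Locus fixed for the same allele at both time points.
            continue
        fc_values.append((x - y) ** 2 / denominator)

    if not fc_values:
        raise ValueError("no informative loci")
    mean_fc = sum(fc_values) / len(fc_values)
    if mean_fc == 0.0:
        raise ValueError("no allele frequency change observed")
    return generations / (2.0 * mean_fc)
```

Because sampling noise inflates the observed frequency change, applying this naive form to sequencing data would tend to underestimate the effective population size, which is the motivation for the two-step variance correction the abstract describes.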

16.

Reversible polymorphism-aware phylogenetic models and their application to tree inference.

Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin.

J Theor Biol ; 407: 362-370, 2016 10 21.

Article in English | MEDLINE | ID: mdl-27480613

ABSTRACT

We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby, naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great apes data show that the runtimes of our approach and standard substitution models are comparable but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction.

Subjects

Genetic Models , Phylogeny , Genetic Polymorphism , Animals , Computer Simulation , Diffusion , Hominidae/genetics , Species Specificity

17.

Genetic diversity of species Fowl aviadenovirus D and Fowl aviadenovirus E.

Marek, Ana; Kaján, Gyozo L; Kosiol, Carolin; Benko, Mária; Schachner, Anna; Hess, Michael.

J Gen Virol ; 97(9): 2323-2332, 2016 09.

Article in English | MEDLINE | ID: mdl-27267884

ABSTRACT

Complete genomes of eight reference strains representing different serotypes within the species Fowl aviadenovirus D (FAdV-D) and Fowl aviadenovirus E (FAdV-E) were sequenced. The sequenced genomes of FAdV-D and FAdV-E members comprise 43 287 to 44 336 bp, and have a gene organization identical to that of an earlier sequenced FAdV-D member (strain A-2A). Highest diversity was noticed in the hexon and fiber genes and ORF19. All genomes sequenced in this study contain one fiber gene. Phylogenetic analyses and G+C content support the division of the genus Aviadenovirus into the currently recognized species. Our data also suggest that strain SR48 should be considered as FAdV-11 instead of FAdV-2 and similarly strain HG as FAdV-8b. The present results complete the list of genome sequences of reference strains representing all serotypes in species FAdV-D and FAdV-E.

Subjects

Aviadenovirus/classification , Aviadenovirus/genetics , Genetic Variation , Base Composition , Capsid Proteins/genetics , Cluster Analysis , Viral DNA/chemistry , Viral DNA/genetics , Gene Order , Viral Genome , Phylogeny , DNA Sequence Analysis , Sequence Homology

18.

PoMo: An Allele Frequency-Based Approach for Species Tree Estimation.

De Maio, Nicola; Schrempf, Dominik; Kosiol, Carolin.

Syst Biol ; 64(6): 1018-31, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26209413

ABSTRACT

Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within- and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods in both accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.

Subjects

Classification/methods , Computer Simulation , Gene Frequency , Phylogeny , Animals , Hominidae/classification , Hominidae/genetics , Mutation , Genetic Polymorphism

19.

Gaussian process test for high-throughput sequencing time series: application to experimental evolution.

Topa, Hande; Jónás, Ágnes; Kofler, Robert; Kosiol, Carolin; Honkela, Antti.

Bioinformatics ; 31(11): 1762-70, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-25614471

ABSTRACT

MOTIVATION: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth. RESULTS: We present the beta-binomial Gaussian process model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present simulations exploring different experimental design choices and results on real data from a Drosophila experimental evolution experiment on temperature adaptation. AVAILABILITY AND IMPLEMENTATION: R software implementing the test is available at https://github.com/handetopa/BBGP.

Subjects

Molecular Evolution , Gene Frequency , High-Throughput Nucleotide Sequencing/methods , Alleles , Animals , Drosophila/genetics , Genomics/methods , Statistical Models , Normal Distribution , Single Nucleotide Polymorphism , Software

20.

Complete genome sequences of pigeon adenovirus 1 and duck adenovirus 2 extend the number of species within the genus Aviadenovirus.

Marek, Ana; Kaján, Gyozo L; Kosiol, Carolin; Harrach, Balázs; Schlötterer, Christian; Hess, Michael.

Virology ; 462-463: 107-14, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24971703

ABSTRACT

Complete genomes of the first isolates of pigeon adenovirus 1 (PiAdV-1) and Muscovy duck adenovirus (duck adenovirus 2, DAdV-2) were sequenced. The PiAdV-1 genome is 45,480 bp long, and has a gene organization most similar to turkey adenovirus 1. Near the left end of the genome, it lacks ORF0, ORF1A, ORF1B and ORF1C, and possesses ORF52, whereas six novel genes were found near the right end. The DAdV-2 genome is 43,734 bp long, and has a gene organization similar to that of goose adenovirus 4 (GoAdV-4). It lacks ORF51, ORF1C and ORF54, and possesses ORF55A and five other novel genes. PiAdV-1 and DAdV-2 genomes contain two and one fiber genes, respectively. Genome organization, G+C content, molecular phylogeny and host type confirm the need to establish two novel species (Pigeon aviadenovirus A and Duck aviadenovirus B) within the genus Aviadenovirus. Phylogenetic data show that DAdV-2 is most closely related to GoAdV-4.

Subjects

Aviadenovirus/genetics , Viral DNA/chemistry , Viral DNA/genetics , Viral Genome , Animals , Aviadenovirus/isolation & purification , Base Composition , Cluster Analysis , Columbidae , Ducks , Gene Order , Molecular Sequence Data , Open Reading Frames , Phylogeny , DNA Sequence Analysis , Synteny
