Results 1 - 20 of 71
1.
Article in English | MEDLINE | ID: mdl-38451771

ABSTRACT

We present ViPRA-Haplo, a de novo strain-specific assembly workflow for reconstructing viral haplotypes in a viral population from paired-end next-generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) generates a subset of paths from a De Bruijn graph of reads using the pairing information of reads. The paths generated by ViPRA are an over-estimation of the true contigs. We propose two refinement methods to obtain an optimal set of contigs representing viral haplotypes. The first method clusters paths reconstructed by ViPRA using VSEARCH (Deorowicz et al. 2015) based on sequence similarity, while the second method, MLEHaplo, generates a maximum likelihood estimate of viral populations. We evaluated our pipeline on both simulated and real viral quasispecies data from HIV (and real data from SARS-CoV-2). Experimental results show that ViPRA-Haplo, although it still overestimates the number of true contigs, outperforms the existing tool, PEHaplo, providing up to 9% better genome coverage on HIV real data. In addition, ViPRA-Haplo also retains higher diversity of the viral population as demonstrated by the presence of a higher percentage of contigs less than 1000 base pairs (bps), which also contain k-mers with counts less than 100 (representing rarer sequences), which are absent in PEHaplo. For SARS-CoV-2 sequencing data, ViPRA-Haplo reconstructs contigs that cover more than 90% of the reference genome and was able to validate known SARS-CoV-2 strains in the sequencing data.


Subjects
Algorithms, Viral Genome, High-Throughput Nucleotide Sequencing, SARS-CoV-2, High-Throughput Nucleotide Sequencing/methods, SARS-CoV-2/genetics, Viral Genome/genetics, Humans, Haplotypes/genetics, COVID-19/virology, HIV/genetics, Computational Biology/methods
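
The abstract in entry 1 refers to a De Bruijn graph of reads and to k-mer counts. As a rough illustration of those two data structures only (this is not the ViPRA path-reconstruction algorithm, and the function names `de_bruijn_graph` and `kmer_counts` are hypothetical), a minimal Python sketch might look like this:

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph: each k-mer adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Two toy reads that share a prefix and then diverge (illustrative data only).
reads = ["ACGTT", "ACGAT"]
graph = de_bruijn_graph(reads, k=3)
counts = kmer_counts(reads, k=3)
print(sorted(graph["CG"]))  # the node "CG" branches into two successors
print(counts["ACG"])        # the shared k-mer is seen in both reads
```

A branching node such as `CG` in this toy example is where alternative paths, and hence candidate haplotypes, diverge; low k-mer counts correspond to the rarer sequences the abstract mentions.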
2.
Syst Biol ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38421146

ABSTRACT

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting, introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call MAST. This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of incomplete lineage sorting in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of four Platyrrhine species for which standard concatenated maximum likelihood and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e. the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyse a concatenated alignment using maximum likelihood, while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
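
The abstract in entry 2 describes a mixture model in which each tree has its own weight. A generic mixture likelihood of this kind sums the weighted per-tree likelihoods at each site. The sketch below illustrates only that arithmetic, assuming the per-site likelihoods under each tree have already been computed elsewhere; it is not the IQ-TREE implementation, and `mixture_log_likelihood` is a hypothetical name.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of trees.

    site_likelihoods[i][j] is the likelihood of site i under tree j
    (assumed precomputed); weights[j] is the weight of tree j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("tree weights must sum to 1")
    total = 0.0
    for per_tree in site_likelihoods:
        # Site likelihood is the weighted sum over trees.
        site = sum(w * lik for w, lik in zip(weights, per_tree))
        total += math.log(site)
    return total


# Two sites, two trees, equal weights (illustrative numbers only).
print(mixture_log_likelihood([[0.2, 0.4], [0.5, 0.1]], [0.5, 0.5]))
```

Estimating the weights then amounts to maximising this quantity over the weights (and the other tree-specific parameters), which is the part a full implementation would handle.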

3.
Mol Ecol Resour ; 22(2): 653-663, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34551204

ABSTRACT

The heteroduplex mobility assay (HMA) has proven to be a robust tool for the detection of genetic variation. Here, we describe a simple and rapid application of the HMA by microfluidic capillary electrophoresis, for phylogenetics and population genetic analyses (pgHMA). We show how commonly applied techniques in phylogenetics and population genetics have equivalents with pgHMA: phylogenetic reconstruction with bootstrapping, skyline plots, and mismatch distribution analysis. We assess the performance and accuracy of pgHMA by comparing the results obtained against those obtained using standard methods of analyses applied to sequencing data. The resulting comparisons demonstrate that: (a) there is a significant linear relationship (R² = .992) between heteroduplex mobility and genetic distance, (b) phylogenetic trees obtained by HMA and nucleotide sequences present nearly identical topologies, (c) clades with high pgHMA parametric bootstrap support also have high bootstrap support on nucleotide phylogenies, (d) skyline plots estimated from the UPGMA trees of HMA and Bayesian trees of nucleotide data reveal similar trends, especially for the median trend estimate of effective population size, and (e) optimized mismatch distributions of HMA are closely fitted to the mismatch distributions of nucleotide sequences. In summary, pgHMA is an easily applied method for approximating phylogenetic diversity and population trends.


Subjects
Population Genetics, Heteroduplex Analysis, Base Sequence, Bayes Theorem, Phylogeny
4.
Front Oncol ; 11: 709829, 2021.
Article in English | MEDLINE | ID: mdl-34604049

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) are often associated with distinct phenotypes in cancer. The present study investigated associations of cancer risk and outcomes with SNPs discovered by whole exome sequencing of normal lung tissue DNA of 15 non-small cell lung cancer (NSCLC) patients, 10 early stage and 5 advanced stage. METHODS: DNA extracted from normal lung tissue of the 15 NSCLC patients was subjected to whole genome amplification and sequencing and analyzed for the occurrence of SNPs. The association of SNPs with the risk of lung cancer and survival was surveyed using the OncoArray study dataset of 85,716 patients (29,266 cases and 56,450 cancer-free controls) and the Prostate, Lung, Colorectal and Ovarian study subset of 1,175 lung cancer patients. RESULTS: We identified 4 SNPs exclusive to the 5 patients with advanced stage NSCLC: rs10420388 and rs10418574 in the CLPP gene, and rs11126435 and rs2021725 in the M1AP gene. The variant alleles G of SNP rs10420388 and A of SNP rs10418574 in the CLPP gene were associated with increased risk of squamous cell carcinoma (OR = 1.07 and 1.07; P = 0.013 and 0.016, respectively). The variant allele T of SNP rs11126435 in the M1AP gene was associated with decreased risk of adenocarcinoma (OR = 0.95; P = 0.027). There was no significant association of these SNPs with the overall survival of lung cancer patients (P > 0.05). CONCLUSIONS: SNPs identified in the CLPP and M1AP genes may be useful in risk prediction models for lung cancer. The previously established association of the CLPP gene with cancer progression lends relevance to our findings.

5.
PLoS Comput Biol ; 17(9): e1008949, 2021 09.
Article in English | MEDLINE | ID: mdl-34516547

ABSTRACT

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.


Subjects
Taxonomic DNA Barcoding, Phylogeny, Algorithms, Bayes Theorem, Mitochondrial DNA/genetics, Humans, Markov Chains, Monte Carlo Method, Single Nucleotide Polymorphism
6.
BMC Bioinformatics ; 21(1): 24, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31969110

ABSTRACT

Following publication of the original article [1], the author reported that there are several errors in the original article.

7.
BMC Bioinformatics ; 20(1): 654, 2019 Dec 11.
Article in English | MEDLINE | ID: mdl-31829137

ABSTRACT

BACKGROUND: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of uncovered gaps. Reference guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim at modifying the reference according to the reads dynamically. Such approaches can significantly improve the alignment rate of the reads onto distant references but the processing of insertions and deletions remains challenging. RESULTS: Here, we introduce a new algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach to assemble a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of length comparable to the truth while limiting the error rate when classic approaches fail to recover the correct length. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences. CONCLUSIONS: We introduced an algorithm to perform dynamic alignment of reads on a distant reference. We showed that such an approach can improve the reconstruction of an amplicon compared to classically used bioinformatic pipelines. Although the method is not portable to genomic scale in its current form, we suggested several improvements to be investigated to make it more flexible and allow dynamic alignment to be used for large genome assemblies.


Subjects
High-Throughput Nucleotide Sequencing/methods, Machine Learning, Algorithms, Animals, Base Sequence, Mitochondrial Genome, Macropodidae/genetics, Nucleotides/genetics
8.
BMC Bioinformatics ; 19(1): 389, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30348075

ABSTRACT

BACKGROUND: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively. RESULTS: HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms, Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate. CONCLUSION: HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.'s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.


Subjects
Algorithms, Haplotypes/genetics, Base Sequence, Computer Simulation, Genetic Databases, Humans
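
Entry 8 mentions mixing three sub-samples in known proportions (62.5%, 25% and 12.5%). One property of those numbers is that every combination of sub-samples gives a distinct summed proportion, so an observed allele frequency at a site points to one particular set of sub-samples. The sketch below illustrates only that idea; it is not the Hmmfreq or HaploJuice algorithm, and `closest_subset` is a hypothetical helper.

```python
from itertools import combinations


def closest_subset(allele_freq, proportions):
    """Return the indices of the sub-samples whose summed mixing
    proportion is closest to the observed allele frequency."""
    best, best_err = (), float("inf")
    n = len(proportions)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            err = abs(sum(proportions[i] for i in subset) - allele_freq)
            if err < best_err:
                best, best_err = subset, err
    return best


proportions = [0.625, 0.25, 0.125]  # mixing proportions quoted in the abstract
# An allele seen in about 36% of reads is best explained by the
# second and third sub-samples together (0.25 + 0.125 = 0.375).
print(closest_subset(0.36, proportions))
```

In real data the observed frequencies are noisy and the empirical proportions can drift from the nominal ones, which is the problem the abstract says HaploJuice addresses before assembling the haplotypes.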
9.
PLoS One ; 13(4): e0195090, 2018.
Article in English | MEDLINE | ID: mdl-29621260

ABSTRACT

Next-generation sequencing can be costly and labour intensive. Usually, the sequencing cost per sample is reduced by pooling amplified DNA (amplicons) derived from different individuals on the same sequencing lane. Barcodes unique to each amplicon permit short-read sequences to be assigned appropriately. However, the cost of the library preparation increases with the number of barcodes used. We propose an alternative to barcoding: by using different known proportions of individually-derived amplicons in a pooled sample, each is characterised a priori by an expected depth of coverage. We have developed a Hidden Markov Model that uses these expected proportions to reconstruct the input sequences. We apply this method to pools of mitochondrial DNA amplicons extracted from kangaroo meat, genus Macropus. Our experiments indicate that the sequence coverage can be efficiently used to index the short-reads and that we can reassemble the input haplotypes when secondary factors impacting the coverage are controlled. We therefore demonstrate that, by combining our approach with standard barcoding, the cost of the library preparation is reduced to a third.


Subjects
Haplotypes, High-Throughput Nucleotide Sequencing, Animals, Chromosome Mapping, Computational Biology/methods, Mitochondrial DNA, Mitochondrial Genome, High-Throughput Nucleotide Sequencing/economics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, Macropodidae/genetics, Markov Chains, DNA Sequence Analysis
10.
Microbiome ; 6(1): 80, 2018 04 27.
Article in English | MEDLINE | ID: mdl-29703247

ABSTRACT

BACKGROUND: Most empirical studies tend to focus on microbiome dynamics within hosts or microbiome compositional differences between hosts over short periods. However, there is still a dearth of formal models that allow us to investigate the observed short-term dynamics of microbiomes under a unified ecological and evolutionary framework. In our previous study, we developed a computational agent-based neutral framework that simulates microbiome dynamics spanning many host generations with the added dimension of a genealogy of hosts. Although this long-term framework revealed interesting microbial diversity patterns under a simple but plausible evolutionary process and provided a platform for future elaboration of more complex systems, it does not allow us to explore microbiome dynamics within a single host generation. METHODS: In this paper, we developed a computational, agent-based, forward-time framework of microbiome dynamics within a single host generation. As we have done under our neutral long-term models, we incorporate neutral processes of environmental microbiome assembly and microbe acquisition from parents and environment. We also incorporate a Moran genealogical model of hosts, so that the dynamics of microbiome evolution can be studied within a single host generation. Furthermore, we allow host subpopulation structure and host migration to affect microbiome recruitment. RESULTS: We show that microbiome diversity within hosts increases monotonically with increases in environmental contribution, while microbiome diversity between hosts increases with increasing parental inheritance. Host population division and dispersal limitation under high host contribution further shaped the patterns by elevating microbiome differences between hosts and depressing microbial diversity within hosts. Microbiome diversity within the whole population showed strong temporal stability regardless of the modes of microbiome acquisition and subpopulation structures. 
CONCLUSIONS: We present a computational framework that integrates various processes including host genealogy, microbe recruitment, and host dispersal limitation acting on the short-term dynamics of microbiomes. Our framework demonstrates that the neutral dynamics of microbiomes within a population of hosts is strongly influenced by transmission mode and shared environment.


Subjects
Computational Biology/methods, Computer Simulation, Host Microbial Interactions/physiology, Microbiota/physiology, Biological Models, Biological Evolution, Humans
11.
Mitochondrial DNA B Resour ; 3(1): 175-176, 2018 Feb 09.
Article in English | MEDLINE | ID: mdl-33490494

ABSTRACT

We describe here the first complete genome assembly of the New Zealand green-lipped mussel, Perna canaliculus, mitochondrion. The assembly was performed de novo from a mix of long nanopore sequencing reads and short sequencing reads. The genome is 16,005 bp long. Comparison to other Mytiloidea mitochondrial genomes indicates important gene rearrangements in this family.

12.
Gut Microbes ; 9(3): 202-217, 2018.
Article in English | MEDLINE | ID: mdl-29182421

ABSTRACT

Many studies have demonstrated the effects of host diet on gut microbial membership, metagenomics, and fermentation individually; but few have attempted to interpret the relationship among these biological phenomena with respect to host features (e.g. gut morphology). We quantitatively compare the fecal microbial communities, metabolic pathways, and fermentation products associated with the nutritional intake of frugivorous (fruit-eating) and folivorous (leaf-eating) lemurs. Our results provide a uniquely multidimensional and comparative perspective on the adaptive dynamics between host and microbiome. Shotgun metagenomic sequencing revealed significant differential taxonomic and metabolic pathway enrichment, tailored to digest and detoxify different diets. Frugivorous metagenomes feature pathways to degrade simple carbohydrates and host-derived glycosaminoglycans, while folivorous metagenomes are equipped to break down phytic acid and other phytochemical compounds in an anaerobic environment. We used nuclear magnetic resonance based metabolic profiling of fecal samples to link metabolic pathways to fermentation products, confirming that the dissimilar substrates provided in each diet select for specific microbial functions. Fecal samples from frugivorous lemurs contained significantly different profiles of short chain fatty acids, alcohol fermentation products, amino acids, glucose, and glycerol compared to folivorous lemurs. We present the relationships between these datasets as an integrated visual framework, which we refer to as microbial geometry. We use microbial geometry to compare empirical gut microbial profiles across different feeding strategies, and suggest additional utility as a tool for hypothesis-generation.


Subjects
Diet, Gastrointestinal Tract/microbiology, Lemur/microbiology, Metagenome, Microbiota/physiology, Animals, Bacteria/classification, Bacteria/genetics, Bacteria/metabolism, Biodiversity, Feces/chemistry, Feces/microbiology, Feeding Methods/veterinary, Fermentation, Fruit/chemistry, Fruit/metabolism, Gastrointestinal Tract/physiology, Lemur/metabolism, Metabolic Networks and Pathways, Microbiota/genetics, Plant Leaves/chemistry, Plant Leaves/metabolism, Species Specificity, Strepsirhini/metabolism, Strepsirhini/microbiology
13.
Microb Ecol ; 76(1): 272-284, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29188302

ABSTRACT

Bamboo specialization is one of the most extreme examples of convergent herbivory, yet it is unclear how this specific high-fiber diet might selectively shape the composition of the gut microbiome compared to host phylogeny. To address these questions, we used deep sequencing to investigate the nature and comparative impact of phylogenetic and dietary selection for specific gut microbial membership in three bamboo specialists: the bamboo lemur (Hapalemur griseus, Primates: Lemuridae), giant panda (Ailuropoda melanoleuca, Carnivora: Ursidae), and red panda (Ailurus fulgens, Carnivora: Musteloideadae), as well as two phylogenetic controls: the ringtail lemur (Lemur catta) and the Asian black bear (Ursus thibetanus). We detected significantly higher Shannon diversity in the bamboo lemur (10.029) compared to both the giant panda (8.256; p = 0.0001936) and the red panda (6.484; p = 0.0000029). We also detected significantly enriched bacterial taxa that distinguished each species. Our results complement previous work in finding that phylogeny predominantly governs high-level microbiome community structure. However, we also find that 48 low-abundance OTUs are shared among bamboo specialists, compared to only 8 OTUs shared by the bamboo lemur and its sister species, the ringtail lemur (Lemur catta, a generalist). Our results suggest that deep sequencing is necessary to detect low-abundance bacterial OTUs, which may be specifically adapted to a high-fiber diet. These findings provide a more comprehensive framework for understanding the evolution and ecology of the microbiome as well as the host.


Subjects
Bacteria/classification, Bambusa, Diet, Gastrointestinal Microbiome, Host Microbial Interactions/physiology, Primates/microbiology, Ailuridae/microbiology, Animal Feed, Animals, Bacteria/genetics, Biodiversity, Bacterial DNA/genetics, Feces/microbiology, Female, Herbivory, Phylogeny, 16S Ribosomal RNA/genetics, DNA Sequence Analysis, Species Specificity, Ursidae/microbiology
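
Entry 13 reports Shannon diversity values for each host species. For readers unfamiliar with the index, the sketch below computes it from a vector of OTU counts. The abstract does not state which logarithm base was used, so the base is left as a parameter here; `shannon_diversity` is a hypothetical name and the counts are made up.

```python
import math


def shannon_diversity(otu_counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("at least one OTU must have a positive count")
    h = 0.0
    for count in otu_counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h / math.log(base)  # convert from natural log to the requested base


# Four equally abundant OTUs carry exactly 2 bits of diversity.
print(shannon_diversity([5, 5, 5, 5], base=2))
```

The index rises both with the number of OTUs and with the evenness of their abundances, which is why many low-abundance OTUs can noticeably change the value.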
14.
Microbiome ; 5(1): 127, 2017 09 25.
Article in English | MEDLINE | ID: mdl-28946894

ABSTRACT

BACKGROUND: Numerous empirical studies suggest that hosts and microbes exert reciprocal selective effects on their ecological partners. Nonetheless, we still lack an explicit framework to model the dynamics of both hosts and microbes under selection. In a previous study, we developed an agent-based forward-time computational framework to simulate the neutral evolution of host-associated microbial communities in a constant-sized, unstructured population of hosts. These neutral models allowed offspring to sample microbes randomly from parents and/or from the environment. Additionally, the environmental pool of available microbes was constituted by fixed and persistent microbial OTUs and by contributions from host individuals in the preceding generation. METHODS: In this paper, we extend our neutral models to allow selection to operate on both hosts and microbes. We do this by constructing a phenome for each microbial OTU consisting of a sample of traits that influence host and microbial fitnesses independently. Microbial traits can influence the fitness of hosts ("host selection") and the fitness of microbes ("trait-mediated microbial selection"). Additionally, the fitness effects of traits on microbes can be modified by their hosts ("host-mediated microbial selection"). We simulate the effects of these three types of selection, individually or in combination, on microbiome diversities and the fitnesses of hosts and microbes over several thousand generations of hosts. RESULTS: We show that microbiome diversity is strongly influenced by selection acting on microbes. Selection acting on hosts only influences microbiome diversity when there is near-complete direct or indirect parental contribution to the microbiomes of offspring. Unsurprisingly, microbial fitness increases under microbial selection. 
Interestingly, when host selection operates, host fitness only increases under two conditions: (1) when there is a strong parental contribution to microbial communities or (2) in the absence of a strong parental contribution, when host-mediated selection acts on microbes concomitantly. CONCLUSIONS: We present a computational framework that integrates different selective processes acting on the evolution of microbiomes. Our framework demonstrates that selection acting on microbes can have a strong effect on microbial diversities and fitnesses, whereas selection on hosts can have weaker outcomes.


Subjects
Computer Simulation, Molecular Evolution, Microbiota/genetics, Bacteria/genetics, Bacteria/pathogenicity, Genetic Fitness, Genetic Variation, Humans, Symbiosis
16.
Virus Evol ; 2(2): vew023, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27774306

ABSTRACT

Various factors determine the rate at which mutations are generated and fixed in viral genomes. Viral evolutionary rates may vary over the course of a single persistent infection and can reflect changes in replication rates and selective dynamics. Dedicated statistical inference approaches are required to understand how the complex interplay of these processes shapes the genetic diversity and divergence in viral populations. Although evolutionary models accommodating a high degree of complexity can now be formalized, adequately informing these models by potentially sparse data, and assessing the association of the resulting estimates with external predictors, remains a major challenge. In this article, we present a novel Bayesian evolutionary inference method, which integrates multiple potential predictors and tests their association with variation in the absolute rates of synonymous and non-synonymous substitutions along the evolutionary history. We consider clinical and virological measures as predictors, but also changes in population size trajectories that are simultaneously inferred using coalescent modelling. We demonstrate the potential of our method in an application to within-host HIV-1 sequence data sampled throughout the infection of multiple patients. While analyses of individual patient populations lack statistical power, we detect significant evidence for an abrupt drop in non-synonymous rates in late stage infection and a more gradual increase in synonymous rates over the course of infection in a joint analysis across all patients. The former is predicted by the immune relaxation hypothesis while the latter may be in line with increasing replicative fitness during the asymptomatic stage.

17.
PLoS One ; 11(9): e0162454, 2016.
Article in English | MEDLINE | ID: mdl-27649303

ABSTRACT

Transposable elements (TEs) are DNA sequences that are able to replicate and move within and between host genomes. Their mechanism of replication is also shared with endogenous retroviruses (ERVs), which are also a type of TE that represent an ancient retroviral infection within animal genomes. Two models have been proposed to explain TE proliferation in host genomes: the strict master model (SMM), and the random template (or transposon) model (TM). In SMM only a single copy of a given TE lineage is able to replicate, and all other genomic copies of TEs are derived from that master copy. In TM, any element of a given family is able to replicate in the host genome. In this paper, we simulated ERV phylogenetic trees under variations of SMM and TM. To test whether current phylogenetic programs can recover the simulated ERV phylogenies, DNA sequence alignments were simulated and maximum likelihood trees were reconstructed and compared to the simulated phylogenies. Results indicate that visual inspection of phylogenetic trees alone can be misleading. However, if a set of statistical summaries is calculated, we are able to distinguish between models with high accuracy by using a data mining algorithm that we introduce here. We also demonstrate the use of our data mining algorithm with empirical data for the porcine endogenous retrovirus (PERV), an ERV that is able to replicate in human and pig cells in vitro.


Subjects
Computer Simulation, DNA Transposable Elements, Endogenous Retroviruses/genetics, Genetic Models, Phylogeny, Animals, Data Mining, Molecular Evolution, Humans, Swine
18.
BMC Bioinformatics ; 16: 357, 2015 Nov 04.
Article in English | MEDLINE | ID: mdl-26536860

ABSTRACT

BACKGROUND: Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA, using only the frequencies of nucleotides at each site, without reconstructing the full-length alignment or the phylogeny. RESULTS: We used simulations to evaluate the performance of these algorithms, and our results demonstrate that LS performs poorly because bootstrap 95% Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95% Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values with a rate approximately equivalent to that obtained using BEAST, a program that implements a Bayesian MCMC estimation of evolutionary parameters using full-length sequences. Because there is a loss of information with the use of sitewise nucleotide frequencies alone, the ABC-MCMC 95% HPDs are larger than those obtained by BEAST. CONCLUSION: We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95% HPDs. 
One major advantage of ABC-MCMC is that computational time scales linearly with the number of short-read sequences, and is independent of the number of full-length sequences in the original data. This allows us to perform the analysis on NGS datasets with large numbers of short read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.


Subjects
Molecular Evolution, High-Throughput Nucleotide Sequencing/methods, Algorithms, Base Sequence, Bayes Theorem, Computer Simulation, Confidence Intervals, Humans, Least-Squares Analysis, Markov Chains, Monte Carlo Method, Population Density
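
Entry 18 works from the frequencies of nucleotides at each site rather than from full-length haplotypes. The sketch below shows one way such sitewise frequencies could be tabulated from aligned short reads. It covers only this summary step, not the LS or ABC-MCMC estimators, and the input format (a start position plus a gap-free sequence per read) and the name `sitewise_frequencies` are assumptions made for illustration.

```python
from collections import Counter


def sitewise_frequencies(reads, length):
    """Tabulate A/C/G/T frequencies at each site of a reference of `length` sites.

    `reads` is an iterable of (start, sequence) pairs, where `start` is the
    0-based reference position of the first base (assumed, gap-free alignment).
    Sites with no coverage are reported as None.
    """
    counts = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length and base in "ACGT":
                counts[pos][base] += 1
    freqs = []
    for site in counts:
        depth = sum(site.values())
        freqs.append({b: site[b] / depth for b in "ACGT"} if depth else None)
    return freqs


# Three toy reads over a 5-site reference (illustrative data only).
reads = [(0, "ACG"), (1, "CGT"), (1, "TGT")]
for site, f in enumerate(sitewise_frequencies(reads, 5)):
    print(site, f)
```

Because each read contributes only to the sites it covers, the cost of this step grows with the number of reads, in line with the linear scaling noted in the abstract.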
19.
PLoS Comput Biol ; 11(7): e1004365, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26200800

ABSTRACT

There has been an explosion of research on host-associated microbial communities (i.e., microbiomes). Much of this research has focused on surveys of microbial diversities across a variety of host species, including humans, with a view to understanding how these microbiomes are distributed across space and time, and how they correlate with host health, disease, phenotype, physiology and ecology. Fewer studies have focused on how these microbiomes may have evolved. In this paper, we develop an agent-based framework to study the dynamics of microbiome evolution. Our framework incorporates neutral models of how hosts acquire their microbiomes, and how the environmental microbial community that is available to the hosts is assembled. Most importantly, our framework also incorporates a Wright-Fisher genealogical model of hosts, so that the dynamics of microbiome evolution are studied on an evolutionary timescale. Our results indicate that the extent of parental contribution to microbial availability from one generation to the next significantly impacts the diversity of microbiomes: the greater the parental contribution, the less diverse the microbiomes. In contrast, even when there is only a very small contribution from a constant environmental pool, microbial communities can remain highly diverse. Finally, we show that our models may be used to construct hypotheses about the types of processes that operate to assemble microbiomes over evolutionary time.


Subjects
Biological Evolution, Ecosystem, Genetic Variation/genetics, Host Specificity/genetics, Microbiota/genetics, Genetic Models, Computer Simulation
20.
PLoS One ; 10(5): e0124618, 2015.
Article in English | MEDLINE | ID: mdl-25970595

ABSTRACT

Host fitness is impacted by trillions of bacteria in the gastrointestinal tract that facilitate development and are inextricably tied to life history. During development, microbial colonization primes the gut metabolism and physiology, thereby setting the stage for adult nutrition and health. However, the ecological rules governing microbial succession are poorly understood. In this study, we examined the relationship between host lineage, captive diet, and life stage and gut microbiota characteristics in three primate species (infraorder, Lemuriformes). Fecal samples were collected from captive lemur mothers and their infants, from birth to weaning. Microbial DNA was extracted and the v4 region of 16S rDNA was sequenced on the Illumina platform using protocols from the Earth Microbiome Project. Here, we show that colonization proceeds along different successional trajectories in developing infants from species with differing dietary regimes and ecological profiles: frugivorous (fruit-eating) Varecia variegata, generalist Lemur catta, and folivorous (leaf-eating) Propithecus coquereli. Our analyses reveal community membership and succession patterns consistent with previous studies of human infants, suggesting that lemurs may serve as a useful model of microbial ecology in the primate gut. Each lemur species exhibits distinct species-specific bacterial diversity signatures correlating to life stages and life history traits, implying that gut microbial community assembly primes developing infants at species-specific rates for their respective adult feeding strategies.


Subjects
Bacterial DNA/genetics, Gastrointestinal Microbiome/genetics, Lemur/microbiology, Lemuridae/microbiology, Strepsirhini/microbiology, Animals, Newborn Animals, Bacterial DNA/classification, Diet, Feces/microbiology, Female, Fruit/chemistry, Gastrointestinal Tract/growth & development, Gastrointestinal Tract/microbiology, Gastrointestinal Tract/physiology, Lemur/growth & development, Lemur/physiology, Lemuridae/growth & development, Lemuridae/physiology, Male, Molecular Sequence Annotation, Phylogeny, Plant Leaves/chemistry, 16S Ribosomal RNA/genetics, DNA Sequence Analysis, Species Specificity, Strepsirhini/growth & development, Strepsirhini/physiology, Symbiosis/physiology, Weaning