Results 1 - 5 of 5
1.
PLoS Biol ; 22(3): e3002507, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38451924

ABSTRACT

While the malaria parasite Plasmodium falciparum has low average genome-wide diversity levels, likely due to its recent introduction from a gorilla-infecting ancestor (approximately 10,000 to 50,000 years ago), some genes display extremely high diversity levels. In particular, certain proteins expressed on the surface of human red blood cell-infecting merozoites (merozoite surface proteins (MSPs)) possess exactly 2 deeply diverged lineages that have seemingly not recombined. While of considerable interest, the evolutionary origin of this phenomenon remains unknown. In this study, we analysed the genetic diversity of 2 of the most variable MSPs, DBLMSP and DBLMSP2, which are paralogs (descended from an ancestral duplication). Despite thousands of available Illumina WGS datasets from malaria-endemic countries, diversity in these genes has been hard to characterise as reads containing highly diverged alleles completely fail to align to the reference genome. To solve this, we developed a pipeline leveraging genome graphs, enabling us to genotype them at high accuracy and completeness. Using our newly resolved sequences, we found that both genes exhibit 2 deeply diverged lineages in a specific protein domain (DBL) and that one of the 2 lineages is shared across the genes. We identified clear evidence of nonallelic gene conversion between the 2 genes as the likely mechanism behind sharing, leading us to propose that gene conversion between diverged paralogs, and not recombination suppression, can generate this surprising genealogy; a model that is furthermore consistent with high diversity levels in these 2 genes despite the strong historical P. falciparum transmission bottleneck.


Subject(s)
Hominidae , Malaria, Falciparum , Malaria , Parasites , Animals , Humans , Plasmodium falciparum/metabolism , Parasites/metabolism , Gene Conversion , Antigens, Surface , Malaria/parasitology , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Genetic Variation
2.
Genome Biol ; 23(1): 147, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35791022

ABSTRACT

There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables joint genotyping; we demonstrate on a large (N=13k) M. tuberculosis cohort, building a map of non-synonymous SNPs and indels in a region where all such variants are assumed to cause rifampicin resistance. We quantify the correlation with phenotypic resistance and then replicate in a second cohort (N=10k).
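
The recall and precision claim in this abstract can be illustrated with a toy example. The sketch below is not Minos and does not reproduce its method, which the abstract does not describe; it only shows, on made-up call sets, why pooling the outputs of several callers raises recall, and why a naive union also pools their false positives, so that some adjudication step is needed to keep precision. The function names `merge_calls` and `precision_recall`, the variant tuples, and the truth set are all hypothetical.

```python
# Toy illustration only: NOT the Minos algorithm. All names and data are hypothetical.
# A variant is represented as a (chrom, pos, ref, alt) tuple.

def merge_calls(*call_sets):
    """Return the plain union of variant calls from any number of callers."""
    merged = set()
    for calls in call_sets:
        merged |= set(calls)
    return merged


def precision_recall(calls, truth):
    """Compute (precision, recall) of a call set against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up truth set: two SNPs and one insertion.
truth = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 30, "G", "GA")}

# Caller A misses the insertion; caller B misses one SNP and adds a false positive.
caller_a = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T")}
caller_b = {("chr1", 20, "C", "T"), ("chr1", 30, "G", "GA"), ("chr1", 40, "T", "C")}

merged = merge_calls(caller_a, caller_b)

for name, calls in [("A", caller_a), ("B", caller_b), ("union", merged)]:
    precision, recall = precision_recall(calls, truth)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the union reaches full recall, but its precision is lower than that of caller A alone because it inherits caller B's false positive. Keeping precision while gaining recall therefore requires re-evaluating the pooled candidates rather than simply taking their union.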


Subject(s)
High-Throughput Nucleotide Sequencing , Mycobacterium tuberculosis , Genome, Bacterial , Genotype , Humans , INDEL Mutation , Mycobacterium tuberculosis/genetics , Polymorphism, Single Nucleotide
3.
Genome Biol ; 22(1): 259, 2021 09 06.
Article in English | MEDLINE | ID: mdl-34488837

ABSTRACT

Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this multiscale variation and develop a JSON extension of VCF (jVCF) allowing for variant calls on multiple references, both implemented in our software gramtools. We find gramtools outperforms existing methods for genotyping SNPs overlapping large deletions in M. tuberculosis and is able to genotype on multiple alternate backgrounds in P. falciparum, revealing previously hidden recombination.
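
As a rough illustration of multiscale variation, the sketch below encodes a SNP nested inside a region that can also be deleted as a whole, then enumerates the haplotypes the structure spells. It is a minimal sketch under stated assumptions, not the gramtools data model or the jVCF format: the nested str/list/tuple encoding, the function `expand`, and the sequences are all hypothetical.

```python
# Toy illustration only: NOT the gramtools graph model or jVCF. Encoding is hypothetical.
from itertools import product


def expand(node):
    """Return every sequence spelled by a node.

    A node is one of:
      str   -> an invariant stretch of sequence
      list  -> a concatenation of child nodes, in order
      tuple -> a variant site: exactly one of the alternatives is chosen
    """
    if isinstance(node, str):
        return [node]
    if isinstance(node, tuple):
        sequences = []
        for allele in node:
            sequences.extend(expand(allele))
        return sequences
    # list: combine every choice of each child, preserving order
    parts = [expand(child) for child in node]
    return ["".join(combo) for combo in product(*parts)]


# Outer site: the region "GG[A|C]TT" is either present or deleted ("").
# Inner site: a SNP (A or C) that exists only on the non-deleted background.
graph = ["ACGT", (["GG", ("A", "C"), "TT"], ""), "CCA"]

for haplotype in expand(graph):
    print(haplotype)
```

The inner SNP has no coordinate on the deletion haplotype, which is one way to see why calls on alternate backgrounds need more than a single linear reference to describe them.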


Subject(s)
Algorithms , Genetic Variation , Genome, Human , Alleles , Antigens, Surface/metabolism , Computer Simulation , Genotyping Techniques , Haplotypes/genetics , Humans , Mycobacterium tuberculosis/genetics , Plasmodium falciparum/genetics , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Sequence Deletion
4.
Genome Biol ; 22(1): 267, 2021 09 14.
Article in English | MEDLINE | ID: mdl-34521456

ABSTRACT

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.
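
The idea of approximating a genome as a recombinant of references can be sketched with a small dynamic program. This is a toy under strong assumptions and not pandora's algorithm: it assumes the references and the query are already aligned and of equal length, charges one unit per mismatch, and charges a hypothetical `switch_penalty` for each jump between references. The function name `mosaic_path` and the sequences are made up.

```python
# Toy illustration only: NOT the pandora algorithm. Assumes pre-aligned,
# equal-length sequences; names and the scoring scheme are hypothetical.

def mosaic_path(query, refs, switch_penalty=1):
    """Find the cheapest way to copy `query` from `refs`, position by position.

    Cost = number of mismatches + switch_penalty * number of reference switches.
    Returns (path, total_cost), where path[i] is the index of the reference
    copied at position i.
    """
    n, k = len(query), len(refs)
    cost = [0 if refs[r][0] == query[0] else 1 for r in range(k)]
    back = []  # back[i - 1][r] = reference used at position i - 1 when r is used at i
    for i in range(1, n):
        best_prev = min(range(k), key=lambda r: cost[r])
        new_cost, pointers = [], []
        for r in range(k):
            stay = cost[r]
            switch = cost[best_prev] + switch_penalty
            if stay <= switch:
                prev, base = r, stay
            else:
                prev, base = best_prev, switch
            mismatch = 0 if refs[r][i] == query[i] else 1
            new_cost.append(base + mismatch)
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    end = min(range(k), key=lambda r: cost[r])
    total = cost[end]
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total


refs = ["AAAAAAAA", "CCCCCCCC"]
query = "AAAACCCC"  # first half resembles refs[0], second half refs[1]
path, total = mosaic_path(query, refs)
print(path, total)
```

Against either single reference the query has four mismatches, whereas the mosaic explains it with one switch and none, which is the intuition behind reducing reference bias.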


Subject(s)
Genome, Bacterial , Genomics/methods , Software , Algorithms , Escherichia coli/genetics , Genetic Variation , High-Throughput Nucleotide Sequencing , Nanopore Sequencing , Nucleotides , Sequence Alignment , Sequence Analysis, DNA
5.
F1000Res ; 10: 33, 2021.
Article in English | MEDLINE | ID: mdl-34035898

ABSTRACT

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.


Subject(s)
Data Analysis , Software , Reproducibility of Results , Workflow