Search | BVS Regional Portal

1.

Chromosome-Level Assembly and Annotation of the Pearly Heath Coenonympha arcania Butterfly Genome.

Legeai, Fabrice; Romain, Sandra; Capblancq, Thibaut; Doniol-Valcroze, Paul; Joron, Mathieu; Lemaitre, Claire; Després, Laurence.

Genome Biol Evol ; 16(3), 2024 03 02.

Article in English | MEDLINE | ID: mdl-38491969

ABSTRACT

We present the first chromosome-level genome assembly and annotation of the pearly heath Coenonympha arcania, generated with a PacBio HiFi sequencing approach and complemented with Hi-C data. We additionally compare synteny, gene, and repeat content between C. arcania and other Lepidopteran genomes. This reference genome will enable future population genomics studies with Coenonympha butterflies, a species-rich genus that encompasses some of the most highly endangered butterfly taxa in Europe.

Subjects

Butterflies, Animals, Butterflies/genetics, Genome, Chromosomes/genetics, Synteny, Europe, Molecular Sequence Annotation

2.

MTG-Link: leveraging barcode information from linked-reads to assemble specific loci.

Guichard, Anne; Legeai, Fabrice; Tagu, Denis; Lemaitre, Claire.

BMC Bioinformatics ; 24(1): 284, 2023 Jul 14.

Article in English | MEDLINE | ID: mdl-37452278

ABSTRACT

BACKGROUND: Local assembly with short and long reads has proven to be very useful in many applications: reconstruction of the sequence of a locus of interest, gap-filling in draft assemblies, as well as alternative allele reconstruction of large Structural Variants. Whereas linked-read technologies have a great potential to assemble specific loci as they provide long-range information while maintaining the power and accuracy of short-read sequencing, there is a lack of local assembly tools for linked-read data. RESULTS: We present MTG-Link, a novel local assembly tool dedicated to linked-reads. The originality of the method lies in its read subsampling step which takes advantage of the barcode information contained in linked-reads mapped in flanking regions. We validated our approach on several datasets from different linked-read technologies. We show that MTG-Link is able to assemble successfully large sequences, up to dozens of Kb. We also demonstrate that the read subsampling step of MTG-Link considerably improves the local assembly of specific loci compared to other existing short-read local assembly tools. Furthermore, MTG-Link was able to fully characterize large insertion variants and deletion breakpoints in a human genome and to reconstruct dark regions in clinically-relevant human genes. It also improved the contiguity of a 1.3 Mb locus of biological interest in several individual genomes of the mimetic butterfly Heliconius numata. CONCLUSIONS: MTG-Link is an efficient local assembly tool designed for different linked-read sequencing technologies. MTG-Link source code is available at https://github.com/anne-gcd/MTG-Link and as a Bioconda package.

Subjects

High-Throughput Nucleotide Sequencing, Software, Humans, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Human Genome

3.

SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph.

Romain, Sandra; Lemaitre, Claire.

Bioinformatics ; 39(Suppl 1): i270-i278, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387169

ABSTRACT

MOTIVATION: Structural variation (SV) is a class of genetic diversity whose importance is increasingly revealed by genome resequencing, especially with long-read technologies. One crucial problem when analyzing and comparing SVs in several individuals is their accurate genotyping, that is, determining whether a described SV is present or absent in one sequenced individual, and if present, in how many copies. There are only a few methods dedicated to SV genotyping with long-read data, and all either suffer from a bias toward the reference allele by not representing all alleles equally, or have difficulties genotyping close or overlapping SVs due to a linear representation of the alleles. RESULTS: We present SVJedi-graph, a novel method for SV genotyping that relies on a variation graph to represent in a single data structure all alleles of a set of SVs. The long reads are mapped on the variation graph and the resulting alignments that cover allele-specific edges in the graph are used to estimate the most likely genotype for each SV. Running SVJedi-graph on simulated sets of close and overlapping deletions showed that this graph model prevents the bias toward the reference alleles and allows maintaining high genotyping accuracy whatever the SV proximity, contrary to other state-of-the-art genotypers. On the human gold standard HG002 dataset, SVJedi-graph obtained the best performance, genotyping 99.5% of the high confidence SV callset with an accuracy of 95% in less than 30 min. AVAILABILITY AND IMPLEMENTATION: SVJedi-graph is distributed under an AGPL license and available on GitHub at https://github.com/SandraLouise/SVJedi-graph and as a BioConda package.
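Illustrative note: the abstract states only that alignments covering allele-specific edges are used to estimate the most likely genotype. The sketch below shows one generic way such an estimate could be made, using a simple binomial model over read counts. The function names, the `error_rate` parameter and its default value are assumptions for illustration and are not taken from SVJedi-graph.

```python
import math


def genotype_log_likelihoods(ref_reads, alt_reads, error_rate=0.05):
    """Log-likelihood of each diploid genotype under a simple binomial model.

    ref_reads / alt_reads: number of reads supporting each allele.
    error_rate: assumed probability that a read supports the wrong allele
    (hypothetical value, chosen only for this example).
    The binomial coefficient is omitted because it is identical for all
    three genotypes and does not change their ranking.
    """
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be in (0, 0.5)")
    log_ok = math.log(1 - error_rate)
    log_err = math.log(error_rate)
    log_half = math.log(0.5)
    return {
        "0/0": ref_reads * log_ok + alt_reads * log_err,
        "0/1": (ref_reads + alt_reads) * log_half,
        "1/1": ref_reads * log_err + alt_reads * log_ok,
    }


def most_likely_genotype(ref_reads, alt_reads, error_rate=0.05):
    """Return the genotype label with the highest log-likelihood."""
    scores = genotype_log_likelihoods(ref_reads, alt_reads, error_rate)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for ref, alt in [(20, 0), (10, 10), (0, 20)]:
        print(ref, alt, most_likely_genotype(ref, alt))
```

With balanced support for both alleles the heterozygous genotype scores highest, while strongly one-sided support favours the corresponding homozygous genotype.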

Subjects

Genotype, Humans, Alleles, DNA Sequence Analysis

4.

First chromosome scale genomes of ithomiine butterflies (Nymphalidae: Ithomiini): Comparative models for mimicry genetic studies.

Gauthier, Jérémy; Meier, Joana; Legeai, Fabrice; McClure, Melanie; Whibley, Annabel; Bretaudeau, Anthony; Boulain, Hélène; Parrinello, Hugues; Mugford, Sam T; Durbin, Richard; Zhou, Chenxi; McCarthy, Shane; Wheat, Christopher W; Piron-Prunier, Florence; Monsempes, Christelle; François, Marie-Christine; Jay, Paul; Noûs, Camille; Persyn, Emma; Jacquin-Joly, Emmanuelle; Meslin, Camille; Montagné, Nicolas; Lemaitre, Claire; Elias, Marianne.

Mol Ecol Resour ; 23(4): 872-885, 2023 May.

Article in English | MEDLINE | ID: mdl-36533297

ABSTRACT

The ithomiine butterflies (Nymphalidae: Danainae) represent the largest known radiation of Müllerian mimetic butterflies. They dominate by number the mimetic butterfly communities, which include species such as the iconic neotropical Heliconius genus. Recent studies on the ecology and genetics of speciation in Ithomiini have suggested that sexual pheromones, colour pattern and perhaps hostplant could drive reproductive isolation. However, no reference genome was available for Ithomiini, which has hindered further exploration on the genetic architecture of these candidate traits, and more generally on the genomic patterns of divergence. Here, we generated high-quality, chromosome-scale genome assemblies for two Melinaea species, M. marsaeus and M. menophilus, and a draft genome of the species Ithomia salapia. We obtained genomes with a size ranging from 396 to 503 Mb across the three species and scaffold N50 of 40.5 and 23.2 Mb for the two chromosome-scale assemblies. Using collinearity analyses we identified massive rearrangements between the two closely related Melinaea species. An annotation of transposable elements and gene content was performed, as well as a specialist annotation to target chemosensory genes, which is crucial for host plant detection and mate recognition in mimetic species. A comparative genomic approach revealed independent gene expansions in ithomiines and particularly in gustatory receptor genes. These first three genomes of ithomiine mimetic butterflies constitute a valuable addition and a welcome comparison to existing biological models such as Heliconius, and will enable further understanding of the mechanisms of adaptation in butterflies.
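Illustrative note: the scaffold N50 values quoted in this abstract follow the usual definition, the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that computation is given below; the function name and the toy lengths are hypothetical and unrelated to the Melinaea assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [50, 40, 30, 20, 10]  # toy lengths, arbitrary units
    print(n50(toy_scaffolds))
```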

Subjects

Butterflies, Animals, Butterflies/genetics, Physiological Adaptation, Phenotype, Genomics, Chromosomes/genetics

5.

Genomic evidence for global ocean plankton biogeography shaped by large-scale current systems.

Richter, Daniel J; Watteaux, Romain; Vannier, Thomas; Leconte, Jade; Frémont, Paul; Reygondeau, Gabriel; Maillet, Nicolas; Henry, Nicolas; Benoit, Gaëtan; Da Silva, Ophélie; Delmont, Tom O; Fernàndez-Guerra, Antonio; Suweis, Samir; Narci, Romain; Berney, Cédric; Eveillard, Damien; Gavory, Frederick; Guidi, Lionel; Labadie, Karine; Mahieu, Eric; Poulain, Julie; Romac, Sarah; Roux, Simon; Dimier, Céline; Kandels, Stefanie; Picheral, Marc; Searson, Sarah; Pesant, Stéphane; Aury, Jean-Marc; Brum, Jennifer R; Lemaitre, Claire; Pelletier, Eric; Bork, Peer; Sunagawa, Shinichi; Lombard, Fabien; Karp-Boss, Lee; Bowler, Chris; Sullivan, Matthew B; Karsenti, Eric; Mariadassou, Mahendra; Probert, Ian; Peterlongo, Pierre; Wincker, Patrick; de Vargas, Colomban; Ribera d'Alcalà, Maurizio; Iudicone, Daniele; Jaillon, Olivier.

Elife ; 11, 2022 08 03.

Article in English | MEDLINE | ID: mdl-35920817

ABSTRACT

Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated. Here we assessed the global structure of plankton geography and its relation to the biological, chemical, and physical context of the ocean (the 'seascape') by analyzing metagenomes of plankton communities sampled across oceans during the Tara Oceans expedition, in light of environmental data and ocean current transport. Using a consistent approach across organismal sizes that provides unprecedented resolution to measure changes in genomic composition between communities, we report a pan-ocean, size-dependent plankton biogeography overlying regional heterogeneity. We found robust evidence for a basin-scale impact of transport by ocean currents on plankton biogeography, and on a characteristic timescale of community dynamics going beyond simple seasonality or life history transitions of plankton.

Oceans are brimming with life invisible to our eyes, a myriad of species of bacteria, viruses and other microscopic organisms essential for the health of the planet. These 'marine plankton' are unable to swim against currents and should therefore be constantly on the move, yet previous studies have suggested that distinct species of plankton may in fact inhabit different oceanic regions. However, proving this theory has been challenging; collecting plankton is logistically difficult, and it is often impossible to distinguish between species simply by examining them under a microscope. However, within the last decade, a research schooner called Tara has travelled the globe to gather thousands of plankton samples. At the same time, advances in genomics have made it possible to identify species based only on fragments of their DNA sequence. To understand the hidden geography of plankton communities in Earth's oceans, Richter et al. pored over DNA from the Tara Oceans expedition. This revealed that, despite being unable to resist the flow of water, various planktonic species which live close to the surface manage to occupy distinct, stable provinces shaped by currents. Different sizes of plankton are distributed in different sized provinces, with the smallest organisms tending to inhabit the smallest areas. Comparing DNA similarities and speeds of currents at the ocean surface revealed how these might stretch and mix plankton communities. Plankton play a critical role in the health of the ocean and the chemical cycles of planet Earth. These results could allow deeper investigation by marine modellers, ecologists, and evolutionary biologists. Meanwhile, work is already underway to investigate how climate change might impact this hidden geography.

Subjects

Ecosystem, Plankton, Genomics, Geography, Oceans and Seas, Plankton/genetics

6.

Critical Assessment of Metagenome Interpretation: the second round of challenges.

Meyer, Fernando; Fritz, Adrian; Deng, Zhi-Luo; Koslicki, David; Lesker, Till Robin; Gurevich, Alexey; Robertson, Gary; Alser, Mohammed; Antipov, Dmitry; Beghini, Francesco; Bertrand, Denis; Brito, Jaqueline J; Brown, C Titus; Buchmann, Jan; Buluç, Aydin; Chen, Bo; Chikhi, Rayan; Clausen, Philip T L C; Cristian, Alexandru; Dabrowski, Piotr Wojciech; Darling, Aaron E; Egan, Rob; Eskin, Eleazar; Georganas, Evangelos; Goltsman, Eugene; Gray, Melissa A; Hansen, Lars Hestbjerg; Hofmeyr, Steven; Huang, Pingqin; Irber, Luiz; Jia, Huijue; Jørgensen, Tue Sparholt; Kieser, Silas D; Klemetsen, Terje; Kola, Axel; Kolmogorov, Mikhail; Korobeynikov, Anton; Kwan, Jason; LaPierre, Nathan; Lemaitre, Claire; Li, Chenhao; Limasset, Antoine; Malcher-Miranda, Fabio; Mangul, Serghei; Marcelino, Vanessa R; Marchet, Camille; Marijon, Pierre; Meleshko, Dmitry; Mende, Daniel R; Milanese, Alessio.

Nat Methods ; 19(4): 429-440, 2022 04.

Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.

Subjects

Metagenome, Metagenomics, Archaea/genetics, Metagenomics/methods, Reproducibility of Results, DNA Sequence Analysis, Software

7.

LRez: a C++ API and toolkit for analyzing and managing Linked-Reads data.

Morisse, Pierre; Lemaitre, Claire; Legeai, Fabrice.

Bioinform Adv ; 1(1): vbab022, 2021.

Article in English | MEDLINE | ID: mdl-36700107

ABSTRACT

Motivation: Linked-Reads technologies combine both the high quality and low cost of short-reads sequencing and long-range information, through the use of barcodes tagging reads which originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results: We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance. Availability and implementation: LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
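Illustrative note: one of the operations named in this abstract, counting the barcodes shared by two genomic regions, can be sketched in a few lines. The code below is a plain-Python illustration on in-memory tuples and does not use or reproduce the LRez C++ API; the record layout `(chromosome, position, barcode)`, the function names and the toy data are all assumptions.

```python
def barcodes_in_region(alignments, chrom, start, end):
    """Collect the set of barcodes of alignments starting in [start, end)."""
    return {
        barcode
        for (aln_chrom, pos, barcode) in alignments
        if aln_chrom == chrom and start <= pos < end
    }


def common_barcode_count(alignments, region_a, region_b):
    """Count barcodes observed in both regions.

    Each region is a (chrom, start, end) tuple. A high count suggests that
    reads in the two regions originate from the same long DNA molecules.
    """
    shared = barcodes_in_region(alignments, *region_a) & barcodes_in_region(
        alignments, *region_b
    )
    return len(shared)


if __name__ == "__main__":
    # Toy alignments: (chromosome, start position, barcode)
    toy = [
        ("chr1", 100, "AAAC"),
        ("chr1", 150, "AAAG"),
        ("chr1", 5100, "AAAC"),
        ("chr1", 5200, "AATT"),
        ("chr2", 120, "AAAG"),
    ]
    print(common_barcode_count(toy, ("chr1", 0, 1000), ("chr1", 5000, 6000)))
```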

8.

Towards a better understanding of the low recall of insertion variants with short-read based variant callers.

Delage, Wesley J; Thevenon, Julien; Lemaitre, Claire.

BMC Genomics ; 21(1): 762, 2020 Nov 04.

Article in English | MEDLINE | ID: mdl-33148192

ABSTRACT

BACKGROUND: Since 2009, numerous tools have been developed to detect structural variants using short read technologies. Insertions >50 bp are one of the hardest types to discover and are drastically underrepresented in gold standard variant callsets. The advent of long read technologies has completely changed the situation. In 2019, two independent cross-technology studies published the most complete variant callsets with sequence resolved insertions in human individuals. Among the reported insertions, only 17 to 28% could be discovered with short-read based tools. RESULTS: In this work, we performed an in-depth analysis of these unprecedented insertion callsets in order to investigate the causes of such failures. We have first established a precise classification of insertion variants according to four layers of characterization: the nature and size of the inserted sequence, the genomic context of the insertion site and the breakpoint junction complexity. Because these levels are intertwined, we then used simulations to characterize the impact of each complexity factor on the recall of several structural variant callers. We showed that most reported insertions exhibited characteristics that may interfere with their discovery: 63% were tandem repeat expansions, 38% contained homology larger than 10 bp within their breakpoint junctions and 70% were located in simple repeats. Consequently, the recall of short-read based variant callers was significantly lower for such insertions (6% for tandem repeats vs 56% for mobile element insertions). Simulations showed that the most impactful factor was the insertion type rather than the genomic context, with various difficulties being handled differently among the tested structural variant callers, and they highlighted the lack of sequence resolution for most insertion calls. CONCLUSIONS: Our results explain the low recall by pointing out several difficulty factors among the observed insertion features and provide avenues for improving SV caller algorithms and their combinations.

Subjects

Genome, Genomics, Algorithms, Base Sequence, Humans, Sequence Analysis, DNA Sequence Analysis

9.

Genomic architecture of endogenous ichnoviruses reveals distinct evolutionary pathways leading to virus domestication in parasitic wasps.

Legeai, Fabrice; Santos, Bernardo F; Robin, Stéphanie; Bretaudeau, Anthony; Dikow, Rebecca B; Lemaitre, Claire; Jouan, Véronique; Ravallec, Marc; Drezen, Jean-Michel; Tagu, Denis; Baudat, Frédéric; Gyapay, Gabor; Zhou, Xin; Liu, Shanlin; Webb, Bruce A; Brady, Seán G; Volkoff, Anne-Nathalie.

BMC Biol ; 18(1): 89, 2020 07 24.

Article in English | MEDLINE | ID: mdl-32703219

ABSTRACT

BACKGROUND: Polydnaviruses (PDVs) are mutualistic endogenous viruses inoculated by some lineages of parasitoid wasps into their hosts, where they facilitate successful wasp development. PDVs include the ichnoviruses and bracoviruses that originate from independent viral acquisitions in ichneumonid and braconid wasps respectively. PDV genomes are fully incorporated into the wasp genomes and consist of (1) genes involved in viral particle production, which derive from the viral ancestor and are not encapsidated, and (2) proviral segments harboring virulence genes, which are packaged into the viral particle. To help elucidate the mechanisms that have facilitated viral domestication in ichneumonid wasps, we analyzed the structure of the viral insertions by sequencing the whole genome of two ichnovirus-carrying wasp species, Hyposoter didymator and Campoletis sonorensis. RESULTS: Assemblies with long scaffold sizes allowed us to unravel the organization of the endogenous ichnovirus and revealed considerable dispersion of the viral loci within the wasp genomes. Proviral segments contained species-specific sets of genes and occupied distinct genomic locations in the two ichneumonid wasps. In contrast, viral machinery genes were organized in clusters showing highly conserved gene content and order, with some loci located in collinear wasp genomic regions. This genomic architecture clearly differs from the organization of PDVs in braconid wasps, in which proviral segments are clustered and viral machinery elements are more dispersed. CONCLUSIONS: The contrasting structures of the two types of ichnovirus genomic elements are consistent with their different functions: proviral segments are vehicles for virulence proteins expected to adapt according to different host defense systems, whereas the genes involved in virus particle production in the wasp are likely more stable and may reflect ancestral viral architecture. The distinct genomic architectures seen in ichnoviruses versus bracoviruses reveal different evolutionary trajectories that have led to virus domestication in the two wasp lineages.

Subjects

Molecular Evolution, Viral Genome, Host Microbial Interactions, Polydnaviridae/genetics, Wasps/virology, Animals, Species Specificity, Whole Genome Sequencing

10.

DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics.

Gauthier, Jérémy; Mouden, Charlotte; Suchan, Tomasz; Alvarez, Nadir; Arrigo, Nils; Riou, Chloé; Lemaitre, Claire; Peterlongo, Pierre.

PeerJ ; 8: e9291, 2020.

Article in English | MEDLINE | ID: mdl-32566401

ABSTRACT

Restriction site Associated DNA Sequencing (RAD-Seq) is a technique characterized by the sequencing of specific loci along the genome that is widely employed in the field of evolutionary biology since it allows variant information (mainly single nucleotide polymorphisms, SNPs) to be exploited from entire populations at a reduced cost. Common RAD dedicated tools, such as STACKS or IPyRAD, are based on all-vs-all read alignments, which require considerable time and computing resources. We present an original method, DiscoSnp-RAD, that avoids this pitfall since variants are detected by exploiting specific parts of the assembly graph built from the reads, hence preventing all-vs-all read alignments. We tested the implementation on simulated datasets of increasing size, up to 1,000 samples, and on real RAD-Seq data from 259 specimens of Chiastocheta flies, morphologically assigned to seven species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species genetic structure linked to the geographic distribution. Furthermore, our results show that DiscoSnp-RAD is significantly faster than state-of-the-art tools. The overall results show that DiscoSnp-RAD is suitable to identify variants from RAD-Seq data; it does not require time-consuming parameterization steps, and it stands out from other tools due to its completely different principle, making it substantially faster, in particular on large datasets.

11.

SVJedi: genotyping structural variations with long reads.

Lecompte, Lolita; Peterlongo, Pierre; Lavenier, Dominique; Lemaitre, Claire.

Bioinformatics ; 36(17): 4568-4575, 2020 11 01.

Article in English | MEDLINE | ID: mdl-32437523

ABSTRACT

MOTIVATION: Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnoses, it is important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas several SV genotypers have been developed for short read data, there is a lack of dedicated tools to assess whether known SVs are present or not in a new long read sequenced sample, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies. RESULTS: We present a novel method to genotype known SVs from long read sequencing data. The method is based on the generation of a set of representative allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. Alignments are then analyzed and filtered to keep only informative ones, to quantify and estimate the presence of each SV allele and the allele frequencies. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi performs better than other existing long read genotyping tools and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short read SV genotyping approaches. AVAILABILITY AND IMPLEMENTATION: https://github.com/llecompte/SVJedi.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Human Genome, Software, Genomic Structural Variation, Genotype, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis

12.

Contrasting genomic and phenotypic outcomes of hybridization between pairs of mimetic butterfly taxa across a suture zone.

Gauthier, Jérémy; de Silva, Donna Lisa; Gompert, Zachariah; Whibley, Annabel; Houssin, Céline; Le Poul, Yann; McClure, Melanie; Lemaitre, Claire; Legeai, Fabrice; Mallet, James; Elias, Marianne.

Mol Ecol ; 29(7): 1328-1343, 2020 04.

Article in English | MEDLINE | ID: mdl-32145112

ABSTRACT

Hybrid zones, whereby divergent lineages come into contact and eventually hybridize, can provide insights on the mechanisms involved in population differentiation and reproductive isolation, and ultimately speciation. Suture zones offer the opportunity to compare these processes across multiple species. In this paper we use reduced-complexity genomic data to compare the genetic and phenotypic structure and hybridization patterns of two mimetic butterfly species, Ithomia salapia and Oleria onega (Nymphalidae: Ithomiini), each consisting of a pair of lineages differentiated for their wing colour pattern and that come into contact in the Andean foothills of Peru. Despite similarities in their life history, we highlight major differences, both at the genomic and phenotypic level, between the two species. These differences include the presence of hybrids, variations in wing phenotype, and genomic patterns of introgression and differentiation. In I. salapia, the two lineages appear to hybridize only rarely, whereas in O. onega the hybrids are not only more common, but also genetically and phenotypically more variable. We also detected loci statistically associated with wing colour pattern variation, but in both species these loci were not over-represented among the candidate barrier loci, suggesting that traits other than wing colour pattern may be important for reproductive isolation. Our results contrast with the genomic patterns observed between hybridizing lineages in the mimetic Heliconius butterflies, and call for a broader investigation into the genomics of speciation in Ithomiini - the largest radiation of mimetic butterflies.

Subjects

Butterflies/genetics, Population Genetics, Genetic Hybridization, Animals, Butterflies/classification, Genetic Speciation, Insect Genome, Genotype, Peru, Phenotype, Single Nucleotide Polymorphism, Reproductive Isolation, Animal Wings/anatomy & histology

13.

MinYS: mine your symbiont by targeted genome assembly in symbiotic communities.

Guyomar, Cervin; Delage, Wesley; Legeai, Fabrice; Mougel, Christophe; Simon, Jean-Christophe; Lemaitre, Claire.

NAR Genom Bioinform ; 2(3): lqaa047, 2020 Sep.

Article in English | MEDLINE | ID: mdl-33575599

ABSTRACT

Most metazoans are associated with symbionts. Characterizing the effect of a particular symbiont often requires getting access to its genome, which is usually done by sequencing the whole community. We present MinYS, a targeted assembly approach to assemble a particular genome of interest from such metagenomic data. First, taking advantage of a reference genome, a subset of the reads is assembled into a set of backbone contigs. Then, this draft assembly is completed using the whole metagenomic readset in a de novo manner. The resulting assembly is output as a genome graph, enabling different strains with potential structural variants coexisting in the sample to be distinguished. MinYS was applied to 50 pea aphid resequencing samples, with variable diversity in symbiont communities, in order to recover the genome sequence of its obligatory bacterial symbiont, Buchnera aphidicola. It was able to return high-quality assemblies (one contig assembly in 90% of the samples), even when using increasingly distant reference genomes, and to retrieve large structural variations in the samples. Because of its targeted essence, it outperformed standard metagenomic assemblers in terms of both time and assembly quality.

14.

SimkaMin: fast and resource frugal de novo comparative metagenomics.

Benoit, Gaëtan; Mariadassou, Mahendra; Robin, Stéphane; Schbath, Sophie; Peterlongo, Pierre; Lemaitre, Claire.

Bioinformatics ; 36(4): 1275-1276, 2020 02 15.

Article in English | MEDLINE | ID: mdl-31504187

ABSTRACT

MOTIVATION: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. Latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, those methods, while extremely efficient, are still limited by computational needs for practical usage outside of large computing facilities. RESULTS: We present SimkaMin, a quick comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making SimkaMin a tool perfectly tailored for very large-scale metagenomic projects. AVAILABILITY AND IMPLEMENTATION: https://github.com/GATB/simka. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
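Illustrative note: the two dissimilarities named in this abstract can be computed from k-mer counts as sketched below, together with a simple hash-based subsampling step that keeps the same fraction of k-mers in every sample. This is a generic illustration and not the SimkaMin implementation; the function names, the `modulus` parameter and the choice of hash function are assumptions.

```python
import hashlib
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def subsample(counts, modulus):
    """Keep k-mers whose stable hash is divisible by `modulus` (about 1/modulus).

    The same k-mer is kept or dropped in every sample, so subsampled
    profiles stay comparable with each other.
    """
    kept = Counter()
    for kmer, n in counts.items():
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        if int.from_bytes(digest, "big") % modulus == 0:
            kept[kmer] = n
    return kept


def jaccard_dissimilarity(a, b):
    """1 - |A and B| / |A or B| on k-mer presence/absence."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1 - len(set(a) & set(b)) / len(union)


def bray_curtis_dissimilarity(a, b):
    """1 - 2 * sum(min counts) / (total counts of a + total counts of b)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in set(a) & set(b))
    return 1 - 2 * shared / total


if __name__ == "__main__":
    sample_a = kmer_counts(["ACGTAC"], k=3)
    sample_b = kmer_counts(["ACGTTT"], k=3)
    print(jaccard_dissimilarity(sample_a, sample_b))
    print(bray_curtis_dissimilarity(sample_a, sample_b))
```

Applying `subsample` with the same `modulus` to both samples before calling the two dissimilarity functions trades some precision for a smaller memory footprint, which is the general idea behind subsampling-based estimators.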

Subjects

Metagenomics, Software, Algorithms, Genomics, Metagenome, DNA Sequence Analysis

15.

Multi-scale characterization of symbiont diversity in the pea aphid complex through metagenomic approaches.

Guyomar, Cervin; Legeai, Fabrice; Jousselin, Emmanuelle; Mougel, Christophe; Lemaitre, Claire; Simon, Jean-Christophe.

Microbiome ; 6(1): 181, 2018 10 10.

Article in English | MEDLINE | ID: mdl-30305166

ABSTRACT

BACKGROUND: Most metazoans are involved in durable relationships with microbes which can take several forms, from mutualism to parasitism. The advances of NGS technologies and bioinformatics tools have opened opportunities to shed light on the diversity of microbial communities and to give some insights into the functions they perform in a broad array of hosts. The pea aphid is a model system for the study of insect-bacteria symbiosis. It is organized in a complex of biotypes, each adapted to specific host plants. It harbors both an obligatory symbiont supplying key nutrients and several facultative symbionts bringing additional functions to the host, such as protection against biotic and abiotic stresses. However, little is known about how the symbiont genomic diversity is structured at different scales: across host biotypes, among individuals of the same biotype, or within individual aphids, which limits our understanding of how these multi-partner symbioses evolve and interact. RESULTS: We present a framework well adapted to the study of genomic diversity and evolutionary dynamics of the pea aphid holobiont from metagenomic read sets, based on mapping to reference genomes and whole genome variant calling. Our results revealed that the pea aphid microbiota is dominated by a few heritable bacterial symbionts reported in earlier works, with no discovery of new microbial associates. However, we detected a large and heterogeneous genotypic diversity associated with the different symbionts of the pea aphid. Partitioning analysis showed that this fine resolution diversity is distributed across the three considered scales. Phylogenetic analyses highlighted frequent horizontal transfers of facultative symbionts between host lineages, indicative of flexible associations between the pea aphid and its microbiota. However, the evolutionary dynamics of symbiotic associations strongly varied depending on the symbiont, reflecting different histories and possible constraints. In addition, at the intra-host scale, we showed that different symbiont strains may coexist inside the same aphid host. CONCLUSIONS: We present a methodological framework for the detailed analysis of NGS data from microbial communities of moderate complexity and give major insights into the extent of diversity in pea aphid-symbiont associations and the range of evolutionary trajectories they could take.

Subjects

Aphids/microbiology , Buchnera/isolation & purification , Microbiota/genetics , Rickettsia/isolation & purification , Symbiosis/physiology , Animals , Buchnera/classification , Buchnera/genetics , Bacterial Genome/genetics , Metagenome/genetics , Metagenomics , Phylogeny , 16S Ribosomal RNA/genetics , Rickettsia/classification , Rickettsia/genetics

16.

Identifying genomic hotspots of differentiation and candidate genes involved in the adaptive divergence of pea aphid host races.

Nouhaud, Pierre; Gautier, Mathieu; Gouin, Anaïs; Jaquiéry, Julie; Peccoud, Jean; Legeai, Fabrice; Mieuzet, Lucie; Smadja, Carole M; Lemaitre, Claire; Vitalis, Renaud; Simon, Jean-Christophe.

Mol Ecol ; 2018 Jul 16.

Article in English | MEDLINE | ID: mdl-30010213

ABSTRACT

Identifying the genomic bases of adaptation to novel environments is a long-term objective in evolutionary biology. Because genetic differentiation is expected to increase between locally adapted populations at the genes targeted by selection, scanning the genome for elevated levels of differentiation is a first step towards deciphering the genomic architecture underlying adaptive divergence. The pea aphid Acyrthosiphon pisum is a model of choice to address this question, as it forms a large complex of plant-specialized races and cryptic species, resulting from recent adaptive radiation. Here, we characterized genomewide polymorphisms in three pea aphid races specialized on alfalfa, clover and pea crops, respectively, which we sequenced in pools (poolseq). Using a model-based approach that explicitly accounts for selection, we identified 392 genomic hotspots of differentiation spanning 47.3 Mb and 2,484 genes (respectively, 9.12% of the genome size and 8.10% of its genes). Most of these highly differentiated regions were located on the autosomes, and overall differentiation was weaker on the X chromosome. Within these hotspots, high levels of absolute divergence between races suggest that these regions experienced less gene flow than the rest of the genome, most likely by contributing to reproductive isolation. Moreover, population-specific analyses showed evidence of selection in every host race, depending on the hotspot considered. These hotspots were significantly enriched for candidate gene categories that control host-plant selection and use. These genes encode 48 salivary proteins, 14 gustatory receptors, 10 odorant receptors, five P450 cytochromes and one chemosensory protein, which represent promising candidates for the genetic basis of host-plant specialization and ecological isolation in the pea aphid complex. Altogether, our findings open new research directions towards functional studies, for validating the role of these genes on adaptive phenotypes.

17.

Disentangling the Causes for Faster-X Evolution in Aphids.

Jaquiéry, Julie; Peccoud, Jean; Ouisse, Tiphaine; Legeai, Fabrice; Prunier-Leterme, Nathalie; Gouin, Anais; Nouhaud, Pierre; Brisson, Jennifer A; Bickel, Ryan; Purandare, Swapna; Poulain, Julie; Battail, Christophe; Lemaitre, Claire; Mieuzet, Lucie; Le Trionnaire, Gael; Simon, Jean-Christophe; Rispe, Claude.

Genome Biol Evol ; 10(2): 507-520, 2018 02 01.

Article in English | MEDLINE | ID: mdl-29360959

ABSTRACT

The faster evolution of X chromosomes has been documented in several species, and results from the increased efficiency of selection on recessive alleles in hemizygous males and/or from increased drift due to the smaller effective population size of X chromosomes. Aphids are excellent models for evaluating the importance of selection in faster-X evolution because their peculiar life cycle and unusual inheritance of sex chromosomes should generally lead to equivalent effective population sizes for X and autosomes. Because we lack a high-density genetic map for the pea aphid, whose complete genome has been sequenced, we first assigned its entire genome to the X or autosomes based on ratios of sequencing depth in males (X0) to females (XX). Then, we computed nonsynonymous to synonymous substitutions ratios (dN/dS) for the pea aphid gene set and found faster evolution of X-linked genes. Our analyses of substitution rates, together with polymorphism and expression data, showed that relaxed selection is likely to be the greatest contributor to faster-X because a large fraction of X-linked genes are expressed at low rates and thus escape selection. Yet, a minor role for positive selection is also suggested by the difference between substitution rates for X and autosomes for male-biased genes (but not for asexual female-biased genes) and by lower Tajima's D for X-linked compared with autosomal genes with highly male-biased expression patterns. This study highlights the relevance of organisms displaying alternative chromosomal inheritance to the understanding of forces shaping genome evolution.
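The depth-ratio assignment described in this abstract can be illustrated with a minimal sketch. It relies on the fact that X0 males carry one copy of the X and XX females carry two, so the normalized male-to-female depth ratio is expected near 0.5 for X-linked scaffolds and near 1 for autosomal ones. The function name, the normalization factors, and the 0.75 cutoff are assumptions made for illustration; the abstract does not state the thresholds or normalization actually used.

```python
def assign_scaffold(male_depth, female_depth,
                    male_norm=1.0, female_norm=1.0, cutoff=0.75):
    """Classify a scaffold as 'X' or 'A' (autosome) from sequencing depth.

    male_norm and female_norm are hypothetical library-size factors
    (for example, the median depth of each library). cutoff is an assumed
    midpoint between the expected ratios of 0.5 (X) and 1.0 (autosomes).
    """
    if female_depth <= 0 or male_norm <= 0 or female_norm <= 0:
        return "unassigned"  # no usable coverage to compute a ratio
    ratio = (male_depth / male_norm) / (female_depth / female_norm)
    return "X" if ratio < cutoff else "A"


# Hypothetical per-scaffold mean depths (male, female).
scaffolds = {"scaf_1": (15.0, 30.0), "scaf_2": (29.0, 30.0)}
calls = {name: assign_scaffold(m, f) for name, (m, f) in scaffolds.items()}
```

In practice a real analysis would likely leave scaffolds with intermediate ratios or very low coverage unassigned rather than forcing a binary call.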

Subjects

Aphids/genetics , Insect Chromosomes , Molecular Evolution , X Chromosome/genetics , Animals , Aphids/physiology , Biological Evolution , Female , Gene Expression Profiling , X-Linked Genes , Genetic Drift , Insect Genome , Male , Genetic Polymorphism , Reproduction , Asexual Reproduction , Sex Chromosomes/genetics

18.

Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software.

Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvociute, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J; Chia, Burton K H; Denis, Bertrand; Froula, Jeff L; Wang, Zhong; Egan, Robert; Don Kang, Dongwan; Cook, Jeffrey J; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael D; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh.

Nat Methods ; 14(11): 1063-1071, 2017 Nov.

Article in English | MEDLINE | ID: mdl-28967888

ABSTRACT

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

Subjects

Metagenomics , Software , Algorithms , Benchmarking , DNA Sequence Analysis

19.

Colib'read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads.

Le Bras, Yvan; Collin, Olivier; Monjeaud, Cyril; Lacroix, Vincent; Rivals, Éric; Lemaitre, Claire; Miele, Vincent; Sacomoto, Gustavo; Marchet, Camille; Cazaux, Bastien; Zine El Aabidine, Amal; Salmela, Leena; Alves-Carvalho, Susete; Andrieux, Alexan; Uricaru, Raluca; Peterlongo, Pierre.

Gigascience ; 5: 9, 2016.

Article in English | MEDLINE | ID: mdl-26870323

ABSTRACT

BACKGROUND: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools. FINDINGS: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and a Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tools dissemination, we developed Galaxy tools and tool shed repositories. CONCLUSIONS: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.

Subjects

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval/methods , Software , Base Sequence , Cluster Analysis , Genome/genetics , Genomics/methods , Molecular Sequence Data , Reproducibility of Results

20.

Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.

Benoit, Gaëtan; Lemaitre, Claire; Lavenier, Dominique; Drezen, Erwan; Dayris, Thibault; Uricaru, Raluca; Rizk, Guillaume.

BMC Bioinformatics ; 16: 288, 2015 Sep 14.

Article in English | MEDLINE | ID: mdl-26370285

ABSTRACT

BACKGROUND: Data volumes generated by next-generation sequencing (NGS) technologies are now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. RESULTS: We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph, by memorizing an anchoring k-mer and a list of bifurcations. The same probabilistic de Bruijn Graph is used to perform a lossy transformation of the quality scores, which makes it possible to obtain higher compression rates without losing pertinent information for downstream analyses. CONCLUSIONS: LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is open-source software, distributed under the GNU Affero GPL license, available for download at http://gatb.inria.fr/software/leon/.
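The idea of encoding a read as an anchoring k-mer plus a list of bifurcations can be illustrated with a toy sketch. This is not LEON's implementation: a plain Python set stands in for the Bloom filter, quality scores are ignored, reads are assumed to be at least k bases long, and all function names are hypothetical. A base is stored only where the current k-mer has more than one successor in the graph; everywhere else the decoder can follow the single possible path.

```python
def build_kmers(reads, k):
    """Collect every k-mer of every read (a set stands in for the Bloom filter)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(kmer, kmers):
    """Bases that extend kmer by one position to another k-mer in the graph."""
    return [b for b in "ACGT" if kmer[1:] + b in kmers]


def encode(read, kmers, k):
    """Return (anchor k-mer, bases chosen at bifurcations, read length)."""
    anchor, cur, bifurcations = read[:k], read[:k], []
    for base in read[k:]:
        if len(successors(cur, kmers)) > 1:
            bifurcations.append(base)  # ambiguous step: record the choice
        cur = cur[1:] + base
    return anchor, bifurcations, len(read)


def decode(anchor, bifurcations, length, kmers):
    """Rebuild the read by walking the graph from the anchor."""
    out, cur, choices = anchor, anchor, iter(bifurcations)
    while len(out) < length:
        succ = successors(cur, kmers)
        base = succ[0] if len(succ) == 1 else next(choices)
        out += base
        cur = cur[1:] + base
    return out


reads = ["ACGTACGGA", "ACGTTCGGA"]
k = 4
graph = build_kmers(reads, k)
encoded = [encode(r, graph, k) for r in reads]
decoded = [decode(a, b, n, graph) for a, b, n in encoded]
```

Because the graph here is built from the same reads that are encoded, every step of a read has at least one successor, so decoding is lossless. With a real Bloom filter, false positives would only add spurious successors, which cost extra stored bases but would not break the round trip in this scheme.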

Subjects

Algorithms , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Computer Graphics , Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Software , Animals , Computational Biology/methods , Computer Simulation , Metagenomics , Probability
