1.
BMC Ecol Evol ; 23(1): 46, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37658324

ABSTRACT

BACKGROUND: Plankton seascape genomics studies have revealed different trends from large-scale weak differentiation to microscale structures. Previous studies have underlined the influence of the environment and seascape on species differentiation and adaptation. However, these studies have generally focused on a few single species, sparse molecular markers, or local scales. Here, we investigated the genomic differentiation of plankton at the macro-scale in a holistic approach using Tara Oceans metagenomic data together with a reference-free computational method. RESULTS: We reconstructed the FST-based genomic differentiation of 113 marine planktonic taxa occurring in the North and South Atlantic Oceans, Southern Ocean, and Mediterranean Sea. These taxa belong to various taxonomic clades spanning Metazoa, Chromista, Chlorophyta, Bacteria, and viruses. Globally, population genetic connectivity was significantly higher within oceanic basins and lower in bacteria and unicellular eukaryotes than in zooplankton. Using mixed linear models, we tested six abiotic factors influencing connectivity, including Lagrangian travel time, as proxies of oceanic current effects. We found that oceanic currents were the main population genetic connectivity drivers, together with temperature and salinity. Finally, we classified the 113 taxa into parameter-driven groups and showed that plankton taxa belonging to the same taxonomic rank such as phylum, class or order presented genomic differentiation driven by different environmental factors. CONCLUSION: Our results validate the isolation-by-current hypothesis for a non-negligible proportion of taxa and highlight the role of other physicochemical parameters in large-scale plankton genetic connectivity. The reference-free approach used in this study offers a new systematic framework to analyse the population genomics of non-model and undocumented marine organisms from a large-scale and holistic point of view.


Subject(s)
Acclimatization , Plankton , Animals , Plankton/genetics , Zooplankton/genetics , Genomics , Atlantic Ocean , Eukaryota
2.
PLoS Biol ; 20(11): e3001893, 2022 11.
Article in English | MEDLINE | ID: mdl-36441816

ABSTRACT

Diatoms form a diverse and abundant group of photosynthetic protists that are essential players in marine ecosystems. However, the microevolutionary structure of their populations remains poorly understood, particularly in polar regions. Exploring how closely related diatoms adapt to different environments is essential given their short generation times, which may allow rapid adaptations, and their prevalence in marine regions dramatically impacted by climate change, such as the Arctic and Southern Oceans. Here, we address genetic diversity patterns in Chaetoceros, the most abundant diatom genus and one of the most diverse, using 11 metagenome-assembled genomes (MAGs) reconstructed from Tara Oceans metagenomes. Genome-resolved metagenomics on these MAGs confirmed a prevalent distribution of Chaetoceros in the Arctic Ocean with lower dispersal in the Pacific and Southern Oceans as well as in the Mediterranean Sea. Single-nucleotide variants identified within the different MAG populations allowed us to draw a landscape of Chaetoceros genetic diversity and revealed an elevated genetic structure in some Arctic Ocean populations. Gene flow patterns of closely related Chaetoceros populations seemed to correlate with distinct abiotic factors rather than with geographic distance. We found clear positive selection of genes involved in nutrient availability responses, in particular for iron (e.g., ISIP2a, flavodoxin), silicate, and phosphate (e.g., polyamine synthase), that were further supported by analysis of Chaetoceros transcriptomes. Altogether, these results highlight the importance of environmental selection in shaping diatom diversity patterns and provide new insights into their metapopulation genomics through the integration of metagenomic and environmental data.


Subject(s)
Diatoms , Diatoms/genetics , Ecosystem , Genomics , Metagenomics
3.
Environ Microbiol ; 24(12): 6086-6099, 2022 12.
Article in English | MEDLINE | ID: mdl-36053818

ABSTRACT

For more than a decade, high-throughput sequencing has transformed the study of marine planktonic communities and has highlighted the extent of protist diversity in these ecosystems. Nevertheless, little is known about their genomic diversity at the species scale or about their major speciation mechanisms. An increasing amount of data obtained from global-scale sampling campaigns is becoming publicly available, and we postulate that metagenomic data could contribute to deciphering the processes shaping protist genomic differentiation in the marine realm. As a proof of concept, we developed a findable, accessible, interoperable and reusable (FAIR) pipeline and focused on the Mediterranean Sea to study three a priori abundant protist species: Bathycoccus prasinos, Pelagomonas calceolata and Phaeocystis cordata. We compared the genomic differentiation of each species in light of geographic, environmental and oceanographic distances. We highlighted that isolation-by-environment shapes the genomic differentiation of B. prasinos, whereas P. cordata is impacted by geographic distance (i.e. isolation-by-distance). At present, the use of metagenomics to accurately estimate the genomic differentiation of protists remains challenging since coverages are lower than in traditional population surveys. However, our approach sheds light on ecological and evolutionary processes occurring within natural marine populations and paves the way for future protist population metagenomic studies.


Subject(s)
Phytoplankton , Stramenopiles , Mediterranean Sea , Phytoplankton/genetics , Ecosystem , Genomics
4.
Biology (Basel) ; 10(7), 2021 Jul 13.
Article in English | MEDLINE | ID: mdl-34356512

ABSTRACT

Copepods are among the most numerous animals, and they play an essential role in the marine trophic web and biogeochemical cycles. The genus Oithona is described as having the highest density of copepods. The Oithona male paradox describes the activity states of males, which are obliged to alternate between immobile and mobile phases for ambush feeding and mate searching, respectively, while the female is less mobile and feeds less. To characterize the molecular basis of this sexual dimorphism, we combined immunofluorescence, genomics, transcriptomics, and protein-protein interaction approaches and revealed the presence of a male-specific nervous ganglion. Transcriptomic analysis showed male-specific enrichment for nervous system development-related transcripts. Twenty-seven Lin12-Notch Repeat domain-containing protein coding genes (LDPGs) of the 75 LDPGs identified in the genome were specifically expressed in males. Furthermore, some LDPGs coded for proteins with predicted proteolytic activity, and proteases-associated transcripts showed a male-specific enrichment. Using yeast double-hybrid assays, we constructed a protein-protein interaction network involving two LDPs with proteases, extracellular matrix proteins, and neurogenesis-related proteins. We also hypothesized possible roles of the LDPGs in the development of the lateral ganglia through helping in extracellular matrix lysis, neurites growth guidance, and synapses genesis.

5.
Nat Commun ; 12(1): 1173, 2021 02 19.
Article in English | MEDLINE | ID: mdl-33608509

ABSTRACT

Antimicrobial resistance is a major global health threat and its development is promoted by antibiotic misuse. While disk diffusion antibiotic susceptibility testing (AST, also called antibiogram) is broadly used to test for antibiotic resistance in bacterial infections, it faces strong criticism because of inter-operator variability and the complexity of interpretative reading. Automatic reading systems address these issues, but are not always adapted or available to resource-limited settings. We present an artificial intelligence (AI)-based, offline smartphone application for antibiogram analysis. The application captures images with the phone's camera, and the user is guided throughout the analysis on the same device by a user-friendly graphical interface. An embedded expert system validates the coherence of the antibiogram data and provides interpreted results. The fully automatic measurement procedure of our application's reading system achieves an overall agreement of 90% on susceptibility categorization against a hospital-standard automatic system and 98% against manual measurement (gold standard), with reduced inter-operator variability. The application's performance showed that the automatic reading of antibiotic resistance testing is entirely feasible on a smartphone. Moreover our application is suited for resource-limited settings, and therefore has the potential to significantly increase patients' access to AST worldwide.


Subject(s)
Artificial Intelligence , Drug Resistance, Microbial , Microbial Sensitivity Tests/methods , Mobile Applications , Smartphone , Anti-Bacterial Agents/pharmacology , Bacterial Infections , Drug Resistance, Microbial/drug effects , Humans , Image Processing, Computer-Assisted , Machine Learning , Software
6.
Open Res Eur ; 1: 94, 2021.
Article in English | MEDLINE | ID: mdl-37645128

ABSTRACT

Background: The yellow mealworm beetle, Tenebrio molitor, is a promising alternative protein source for animal and human nutrition and its farming involves relatively low environmental costs. For these reasons, its industrial scale production started this century. However, to optimize and breed sustainable new T. molitor lines, access to its genome remains essential. Methods: By combining Oxford Nanopore and Illumina Hi-C data, we constructed a high-quality chromosome-scale assembly of T. molitor. Then, we combined RNA-seq data and available Coleoptera proteomes for gene prediction with GMOVE. Results: We produced a high-quality genome with an N50 of 21.9 Mb and a completeness of 99.5%, and predicted 21,435 genes with a median size of 1,780 bp. Gene orthology between T. molitor and Tribolium castaneum showed a highly conserved synteny between the two Coleoptera, and a paralog search revealed an expansion of histones in the T. molitor genome. Conclusions: The present genome will greatly help fundamental and applied research such as genetic breeding and will contribute to the sustainable production of the yellow mealworm.

7.
PLoS One ; 15(12): e0244637, 2020.
Article in English | MEDLINE | ID: mdl-33378381

ABSTRACT

The availability of large metagenomic data offers great opportunities for the population genomic analysis of uncultured organisms, which represent a large part of the unexplored biosphere and play a key ecological role. However, the majority of these organisms lack a reference genome or transcriptome, which constitutes a technical obstacle for classical population genomic analyses. We introduce the metavariant species (MVS) model, in which a species is represented only by intra-species nucleotide polymorphism. We designed a method combining reference-free variant calling, multiple density-based clustering and maximum-weighted independent set algorithms to cluster intra-species variants into MVSs directly from multisample metagenomic raw reads without a reference genome or read assembly. The frequencies of the MVS variants are then used to compute population genomic statistics such as FST, in order to estimate genomic differentiation between populations and to identify loci under natural selection. The MVS construction was tested on simulated and real metagenomic data. MVSs showed the required quality for robust population genomics and allowed an accurate estimation of genomic differentiation (ΔFST < 0.0001 and <0.03 on simulated and real data respectively). Loci predicted under natural selection on real data were all detected by MVSs. MVSs represent a new paradigm that may simplify and enhance holistic approaches for population genomics and the evolution of microorganisms.
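
The abstract above states that variant frequencies are used to compute statistics such as FST, but it does not say which estimator is used. As a minimal sketch only, the block below assumes Wright's heterozygosity-based definition, FST = (HT - HS) / HT, for a single biallelic variant and two equally weighted populations. The function name `fst_biallelic` is hypothetical and is not part of the MVS software.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic locus in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumption: this simple estimator is for illustration; the paper's
    actual estimator is not given in the abstract.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Locus is monomorphic across both populations: no differentiation.
        return 0.0
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total
```

Under this definition, identical frequencies give 0 and fixed alternative alleles give 1, so a difference between two FST estimates (the ΔFST quoted in the abstract) is simply the absolute difference of two such values.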


Subject(s)
Computational Biology/methods , Genetic Variation , Metagenomics/methods , Cluster Analysis , Genetics, Population , Models, Genetic , Selection, Genetic , Software
8.
Ecol Evol ; 10(16): 8894-8905, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32884665

ABSTRACT

Acclimation allowed by variation in gene or allele expression in natural populations is increasingly understood as a decisive mechanism, as much as adaptation, for species evolution. However, for small eukaryotic organisms, such as zooplankton species, classical methods face numerous challenges. Here, we propose the concept of allelic differential expression at the population-scale (psADE) to investigate the variation in allele expression in natural populations. We developed a novel approach to detect psADE based on metagenomic and metatranscriptomic data from environmental samples. This approach was applied to the widespread marine copepod, Oithona similis, by combining samples collected during the Tara Oceans expedition (2009-2013) and de novo transcriptome assemblies. Among a total of 25,768 single nucleotide variants (SNVs) of O. similis, 572 (2.2%) were affected by psADE in at least one population (FDR < 0.05). The distribution of SNVs under psADE in different populations is significantly shaped by population genomic differentiation (Pearson r = 0.87, p = 5.6 × 10⁻³⁰), supporting a partial genetic control of psADE. Moreover, a significant proportion of SNVs (0.6%) were under both selection and psADE (p < .05), supporting the hypothesis that natural selection and psADE tend to impact common loci. Population-scale allelic differential expression offers new insights into the gene regulation control in populations and its link with natural selection.

9.
Nat Genet ; 51(9): 1411-1422, 2019 09.
Article in English | MEDLINE | ID: mdl-31477930

ABSTRACT

We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel's original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.


Subject(s)
Chromosomes, Plant/genetics , Evolution, Molecular , Fabaceae/genetics , Genome, Plant , Pisum sativum/genetics , Plant Proteins/genetics , Quantitative Trait Loci , Chromosome Mapping , Fabaceae/classification , Gene Expression Regulation, Plant , Genetic Variation , Genomics , Phenotype , Phylogeny , Reference Standards , Repetitive Sequences, Nucleic Acid , Seed Storage Proteins/genetics , Whole Genome Sequencing
10.
Nat Commun ; 10(1): 3421, 2019 07 31.
Article in English | MEDLINE | ID: mdl-31366887

ABSTRACT

Transposable elements (TEs) are mobile parasitic sequences that have been repeatedly coopted during evolution to generate new functions and rewire gene regulatory networks. Yet, the contribution of active TEs to the creation of heritable mutations remains unknown. Using TE accumulation lines in Arabidopsis thaliana we show that once initiated, transposition produces an exponential spread of TE copies, which rapidly leads to high mutation rates. Most insertions occur near or within genes and targets differ between TE families. Furthermore, we uncover an essential role of the histone variant H2A.Z in the preferential integration of Ty1/copia retrotransposons within environmentally responsive genes and away from essential genes. We also show that epigenetic silencing of new Ty1/copia copies can affect their impact on major fitness-related traits, including flowering time. Our findings demonstrate that TEs are potent episodic (epi)mutagens that, thanks to marked chromatin tropisms, limit the mutation load and increase the potential for rapid adaptation.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , DNA Transposable Elements/genetics , Histones/genetics , Retroelements/genetics , Adaptation, Physiological/genetics , Genome, Plant/genetics
11.
Front Plant Sci ; 10: 323, 2019.
Article in English | MEDLINE | ID: mdl-30930928

ABSTRACT

Whole genome profiling (WGP) is a sequence-based physical mapping technology and uses sequence tags generated by next generation sequencing for construction of bacterial artificial chromosome (BAC) contigs of complex genomes. The physical map provides a framework for assembly of genome sequence and information for localization of genes that are difficult to find through positional cloning. To address the challenges of accurate assembly of the pea genome (∼4.2 GB of which approximately 85% is repetitive sequences), we have adopted the WGP technology for assembly of a pea BAC library. Multi-dimensional pooling of 295,680 BAC clones and sequencing the ends of restriction fragments of pooled DNA generated 1,814 million high quality reads, of which 825 million were deconvolutable to 1.11 million unique WGP sequence tags. These WGP tags were used to assemble 220,013 BACs into contigs. Assembly of the BAC clones using the modified Fingerprinted Contigs (FPC) program has resulted in 13,040 contigs, consisting of 213,719 BACs, and 6,294 singleton BACs. The average contig size is 0.33 Mbp and the N50 contig size is 0.62 Mbp. WGP™ technology has proved to provide a robust physical map of the pea genome, which would have been difficult to assemble using traditional restriction digestion based methods. This sequence-based physical map will be useful to assemble the genome sequence of pea. Additionally, the 1.1 million WGP tags will support efficient assignment of sequence scaffolds to the BAC clones, and thus an efficient sequencing of BAC pools with targeted genome regions of interest.
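
The abstract above reports both an average contig size and an N50 contig size, which measure different things. As an illustrative sketch (the function name `n50` and the toy lengths are assumptions, not data from the study), the N50 is the length L such that contigs of length at least L together cover at least half of the total assembled length:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (made-up lengths): mean is 4.0, N50 is 5.
toy_contigs = [2, 3, 4, 5, 6]
toy_mean = sum(toy_contigs) / len(toy_contigs)
toy_n50 = n50(toy_contigs)
```

Because the N50 is weighted toward long contigs, it is usually larger than the mean when contig sizes are skewed, which is consistent with the reported 0.62 Mbp N50 against a 0.33 Mbp average.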

12.
Mol Ecol Resour ; 19(2): 526-535, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30575285

ABSTRACT

Comparison of the molecular diversity in all plankton populations present in geographically distant water columns may allow for a holistic view of the connectivity, isolation and adaptation of organisms in the marine environment. In this context, a large-scale detection and analysis of genomic variants directly in metagenomic data appeared as a powerful strategy for the identification of genetic structures and genes under natural selection in plankton. Here, we used discosnp++, a reference-free variant caller, to produce genetic variants from large-scale metagenomic data and assessed its accuracy on the copepod Oithona nana in terms of variant calling, allele frequency estimation and population genomic statistics by comparing it to the state-of-the-art method. discosnp++ produces variants leading to similar conclusions regarding the genetic structure and identification of loci under natural selection. discosnp++ was then applied to 120 metagenomic samples from four size fractions, including prokaryotes, protists and zooplankton sampled from 39 Tara Oceans sampling stations located in the Atlantic Ocean and the Mediterranean Sea to produce a new set of marine genomic markers containing more than 19 million variants. This new genomic resource can be used by the community to relocate these markers on their plankton genomes or transcriptomes of interest. This resource will be updated with new marine expeditions and the increase of metagenomic data (availability: http://bioinformatique.rennes.inria.fr/taravariants/).


Subject(s)
Aquatic Organisms/classification , Genetic Markers , Genetics, Population/methods , Genotyping Techniques/methods , Metagenomics/methods , Plankton/genetics , Animals , Aquatic Organisms/genetics , Atlantic Ocean , Mediterranean Sea
13.
Nat Plants ; 4(7): 440-452, 2018 07.
Article in English | MEDLINE | ID: mdl-29915331

ABSTRACT

Oaks are an important part of our natural and cultural heritage. Not only are they ubiquitous in our most common landscapes but they have also supplied human societies with invaluable services, including food and shelter, since prehistoric times. With 450 species spread throughout Asia, Europe and America, oaks constitute a critical global renewable resource. The longevity of oaks (several hundred years) probably underlies their emblematic cultural and historical importance. Such long-lived sessile organisms must persist in the face of a wide range of abiotic and biotic threats over their lifespans. We investigated the genomic features associated with such a long lifespan by sequencing, assembling and annotating the oak genome. We then used the growing number of whole-genome sequences for plants (including tree and herbaceous species) to investigate the parallel evolution of genomic characteristics potentially underpinning tree longevity. A further consequence of the long lifespan of trees is their accumulation of somatic mutations during mitotic divisions of stem cells present in the shoot apical meristems. Empirical and modelling approaches have shown that intra-organismal genetic heterogeneity can be selected for and provides direct fitness benefits in the arms race with short-lived pests and pathogens through a patchwork of intra-organismal phenotypes. However, there is no clear proof that large-statured trees consist of a genetic mosaic of clonally distinct cell lineages within and between branches. Through this case study of oak, we demonstrate the accumulation and transmission of somatic mutations and the expansion of disease-resistance gene families in trees.


Subject(s)
Genome, Plant/genetics , Quercus/genetics , Biological Evolution , DNA, Plant/genetics , Genetic Variation/genetics , Longevity/genetics , Mutation , Phylogeny , Sequence Analysis, DNA
14.
PeerJ ; 6: e4685, 2018.
Article in English | MEDLINE | ID: mdl-29780666

ABSTRACT

Among copepods, which are the most abundant animals on Earth, the genus Oithona is described as one of the most numerous and plays a major role in the marine food chain and biogeochemical cycles, particularly through the excretion of chitin-coated fecal pellets. Although the morphology of several Oithona species is well known, knowledge of their internal anatomy and chitin distribution is still limited. To address this gap, Oithona nana and O. similis individuals were stained with Wheat Germ Agglutinin-Fluorescein IsoThioCyanate (WGA-FITC) and DiAmidino-2-PhenylIndole (DAPI) for fluorescence microscopy observations. The image analyses allowed a new description of the organization and chitin content of the digestive and reproductive systems of Oithona males and females. Chitin microfibrils were found all along the digestive system from the stomach to the hindgut, with a higher concentration at the peritrophic membrane of the anterior midgut. Several midgut shrinkages were observed and proposed to be involved in fecal pellet shaping and motion. Amorphous chitin structures were also found to be a major component of the ducts and seminal vesicles and receptacles. The rapid staining protocol we proposed allowed a new insight into the Oithona internal anatomy and highlighted the role of chitin in digestion and reproduction. This method could be applied to a wide range of copepods in order to perform comparative anatomy analyses.

15.
Nat Genet ; 50(6): 772-777, 2018 06.
Article in English | MEDLINE | ID: mdl-29713014

ABSTRACT

Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the rose whole-genome sequencing and assembly and resequencing of major genotypes that contributed to rose domestication. We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis 'Old Blush'. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of 'La France', one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reconstructing regulatory and secondary metabolism pathways allowed us to propose a model of interconnected regulation of scent and flower color. This genome provides a foundation for understanding the mechanisms governing rose traits and should accelerate improvement in roses, Rosaceae and ornamentals.


Subject(s)
Genome, Plant , Rosa/genetics , Domestication , Flowers/genetics , Gene Expression Regulation, Plant , Genes, Plant , Genetic Variation , Genotype , Plant Proteins/genetics , Sequence Analysis, DNA/methods , Whole Genome Sequencing/methods
16.
Nat Commun ; 9(1): 373, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29371626

ABSTRACT

While our knowledge about the roles of microbes and viruses in the ocean has increased tremendously due to recent advances in genomics and metagenomics, research on marine microbial eukaryotes and zooplankton has benefited much less from these new technologies because of their larger genomes, their enormous diversity, and largely unexplored physiologies. Here, we use a metatranscriptomics approach to capture expressed genes in open ocean Tara Oceans stations across four organismal size fractions. The individual sequence reads cluster into 116 million unigenes representing the largest reference collection of eukaryotic transcripts from any single biome. The catalog is used to unveil functions expressed by eukaryotic marine plankton, and to assess their functional biogeography. Almost half of the sequences have no similarity with known proteins, and a great number belong to new gene families with a restricted distribution in the ocean. Overall, the resource provides the foundations for exploring the roles of marine eukaryotes in ocean ecology and biogeochemistry.


Subject(s)
Aquatic Organisms , Eukaryota/genetics , Eukaryotic Cells/metabolism , Metagenome , Phylogeny , Zooplankton/genetics , Amino Acid Sequence , Animals , Atlases as Topic , Bacteria/classification , Bacteria/genetics , Biodiversity , Ecosystem , Eukaryota/classification , Eukaryotic Cells/cytology , Metagenomics/methods , Oceans and Seas , Phytoplankton/classification , Phytoplankton/genetics , Seawater , Viruses/classification , Viruses/genetics , Zooplankton/classification
17.
Mol Ecol ; 26(17): 4467-4482, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28636804

ABSTRACT

In the epipelagic ocean, the genus Oithona is considered as one of the most abundant and widespread copepods and plays an important role in the trophic food web. Despite its ecological importance, little is known about Oithona and cyclopoid copepods genomics. Therefore, we sequenced, assembled and annotated the genome of Oithona nana. The comparative genomic analysis integrating available copepod genomes highlighted the expansions of genes related to stress response, cell differentiation and development, including genes coding Lin12-Notch-repeat (LNR) domain proteins. The Oithona biogeography based on 28S sequences and metagenomic reads from the Tara Oceans expedition showed the presence of O. nana mostly in the Mediterranean Sea (MS) and confirmed the amphitropical distribution of Oithona similis. The population genomics analyses of O. nana in the Northern MS, integrating the Tara Oceans metagenomic data and the O. nana genome, led to the identification of genetic structure between populations from the MS basins. Furthermore, 20 loci were found to be under positive selection including four missense and eight synonymous variants, harbouring soft or hard selective sweep patterns. One of the missense variants was localized in the LNR domain of the coding region of a male-specific gene. The variation in the B-allele frequency with respect to the MS circulation pattern showed the presence of genomic clines between O. nana and another undefined Oithona species possibly imported through Atlantic waters. This study provides new approaches and results in zooplankton population genomics through the integration of metagenomic and oceanographic data.


Subject(s)
Copepoda/genetics , Genetics, Population , Selection, Genetic , Animals , Gene Frequency , Male , Mediterranean Sea , Zooplankton
18.
BMC Bioinformatics ; 17: 115, 2016 Mar 03.
Article in English | MEDLINE | ID: mdl-26936254

ABSTRACT

BACKGROUND: Scaffolding is an essential step in the genome assembly process. Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies. Here, we describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring. RESULTS: We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a Whole Genome Profiling™ (WGP) map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the contiguity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase contiguity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency. CONCLUSIONS: MaGuS is a powerful reference-free evaluator of assembly quality and a WGP map-guided scaffolder that is freely available at https://github.com/institut-de-genomique/MaGuS. Its use can be extended to other high-throughput sequencing data (e.g., long-read data) and also to other map data (e.g., genetic maps) to improve the quality and the contiguity of large and complex genome assemblies.


Subject(s)
Arabidopsis/genetics , Chromosomes, Plant/genetics , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Physical Chromosome Mapping , Sequence Analysis, DNA/methods , Chromosomes, Artificial, Bacterial , Contig Mapping , Repetitive Sequences, Nucleic Acid , Sequence Alignment
19.
BMC Genomics ; 16: 327, 2015 Apr 20.
Article in English | MEDLINE | ID: mdl-25927464

ABSTRACT

BACKGROUND: Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments. RESULTS: The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging as existing assemblers were not implemented to deal with long reads exhibiting an error rate close to 30%. Here, we presented a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads. CONCLUSIONS: We described the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.


Subject(s)
Acinetobacter/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA, Bacterial/analysis , Genome, Bacterial , High-Throughput Nucleotide Sequencing/instrumentation , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/instrumentation
20.
BMC Bioinformatics ; 15: 377, 2014 Nov 19.
Article in English | MEDLINE | ID: mdl-25408240

ABSTRACT

BACKGROUND: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. RESULTS: We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker. CONCLUSIONS: We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
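
The abstract above compares tools by positive predictive value (PPV) and sensitivity. As a short sketch of those two standard definitions (the function names and the counts are hypothetical and are not taken from the TE-Tracker evaluation), PPV is the fraction of reported events that are real, and sensitivity is the fraction of real events that are reported:

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP): share of reported insertions that are real."""
    called = true_pos + false_pos
    if called == 0:
        raise ValueError("PPV is undefined when nothing was called")
    return true_pos / called


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity = TP / (TP + FN): share of real insertions that were found."""
    real = true_pos + false_neg
    if real == 0:
        raise ValueError("sensitivity is undefined when there are no real events")
    return true_pos / real


# Made-up counts for illustration only: 45 correct calls, 5 spurious, 15 missed.
example_ppv = positive_predictive_value(true_pos=45, false_pos=5)
example_sensitivity = sensitivity(true_pos=45, false_neg=15)
```

A tool with few false positives scores a high PPV even if it misses many events, which is why the abstract reports both measures side by side.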


Subject(s)
Arabidopsis/genetics , DNA Transposable Elements/genetics , Genes, Plant/genetics , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Software , DNA Methylation , Humans