Results 1 - 20 of 23
1.
Biol Rev Camb Philos Soc ; 99(2): 546-561, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38049930

ABSTRACT

Genetic data show that many nominal species are composed of more than one biological species, and thus contain cryptic species in the broad sense (including overlooked species). When ignored, cryptic species generate confusion which, beyond biodiversity or vulnerability underestimation, blurs our understanding of ecological and evolutionary processes and may impact the soundness of decisions in conservation or medicine. However, very few hypotheses have been tested about factors that predispose a taxon to contain cryptic or overlooked species. To fill this gap, we surveyed the literature on free-living marine metazoans and built two data sets, one of 187,603 nominal species and another of 83 classes or phyla, to test several hypotheses, correcting for sequence data availability, taxon size and phylogenetic relatedness. We found a strong effect of scientific history: the probability of a taxon containing cryptic species was highest for the earliest described species and varied among time periods potentially consistently with an influence of prevailing scientific theories. The probability of cryptic species being present was also increased for species with large distribution ranges. They were more frequent in the north polar and south polar zones, contradicting previous predictions of more cryptic species in the tropics, and supporting the hypothesis that many cryptic species diverged recently. The number of cryptic species varied among classes, with an excess in hydrozoans and polychaetes, and a deficit in actinopterygians, for example, but precise class ranking was relatively sensitive to the statistical model used. 
For all models, biological traits, rather than phylum, appeared responsible for the variation among classes: there were fewer cryptic species than expected in classes with hard skeletons (perhaps because they provide good characters for taxonomy) and image-forming vision (in which selection against heterospecific mating may enhance morphological divergence), and more in classes with internal fertilisation. We estimate that among marine free-living metazoans, several thousand additional cryptic species complexes could be identified as more sequence data become available. The factors identified as important for marine animal cryptic species are likely important for other biomes and taxa and should aid many areas in biology that rely on accurate species identification.


Subject(s)
Biodiversity , Ecosystem , Animals , Phylogeny , Biological Evolution , Models, Statistical
2.
Front Plant Sci ; 14: 1277916, 2023.
Article in English | MEDLINE | ID: mdl-38023870

ABSTRACT

The adaptability of plant populations to a changing environment depends on their genetic diversity, which in turn is influenced by the degree of sexual reproduction and gene flow from distant areas. Aquatic macrophytes can reproduce both sexually and asexually, and their reproductive fragments are spread in various ways (e.g. by water). Although these plants are obviously exposed to hydrological changes, the degree of vulnerability may depend on the types of their reproduction and distribution, as well as the hydrological differences of habitats. The aim of this study was to investigate the genetic diversity of the cosmopolitan macrophyte Ceratophyllum demersum in hydrologically different aquatic habitats, i.e. rivers and backwaters separated from the main river bed to a different extent. For this purpose, the first microsatellite primer set was developed for this species. Using 10 developed primer pairs, a high level of genetic variation was explored in C. demersum populations. Overall, more than 80% of the loci were found to be polymorphic, and a total of 46 different multilocus genotypes and 18 private alleles were detected in the 63 individuals examined. The results demonstrated that microsatellite polymorphism in this species depends on habitat hydrology. The greatest genetic variability was revealed in populations of rivers, where flowing water provides constant longitudinal connections with distant habitats. The populations of the hydrologically isolated backwaters showed the lowest microsatellite polymorphism, while plants from an oxbow occasionally flooded by the main river had medium genetic diversity. The results highlight that in contrast to species that spread independently of water flow or among hydrologically isolated water bodies, macrophytes with exclusive or dominant hydrochory may be most severely affected by habitat fragmentation, for example due to climate change.

3.
Comput Struct Biotechnol J ; 21: 1151-1156, 2023.
Article in English | MEDLINE | ID: mdl-36789260

ABSTRACT

To obtain accurate estimates for biodiversity and ecological studies, metabarcoding studies should be carefully designed to minimize both false positive (FP) and false negative (FN) occurrences. Internal controls (mock samples and negative controls), replicates, and overlapping markers make it possible to control metabarcoding errors, but current metabarcoding software packages do not explicitly integrate these additional experimental data to optimize filtering. We have developed the metabarcoding analysis software VTAM, which explicitly uses these elements of the experimental design to find optimal parameter settings that minimize FP and FN occurrences. VTAM showed similar sensitivity but higher precision than two other pipelines using three datasets and two different markers (COI, 16S). The stringent filtering procedure implemented in VTAM aims to produce robust metabarcoding data to obtain accurate ecological estimates and represents an important step towards a non-arbitrary and standardized validation of metabarcoding data for conducting ecological studies. VTAM is implemented in Python and available from: https://github.com/aitgon/vtam. The VTAM benchmark code is available from: https://github.com/aitgon/vtam_benchmark.
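
As a rough illustration of how a mock sample of known composition can guide the choice of a filtering parameter, the sketch below scans candidate read-count cutoffs and keeps the one that yields the fewest false positives plus false negatives. This is not VTAM's actual algorithm or API: the function names, the single read-count cutoff, and the toy read counts are all assumptions made for the example.

```python
def count_errors(mock_reads, expected, cutoff):
    """Return (false_positives, false_negatives) for one read-count cutoff.

    mock_reads: dict mapping variant name -> read count in the mock sample
    expected:   set of variants known to be in the mock sample
    """
    kept = {variant for variant, n in mock_reads.items() if n >= cutoff}
    false_positives = len(kept - expected)   # kept but not expected
    false_negatives = len(expected - kept)   # expected but filtered out
    return false_positives, false_negatives


def best_cutoff(mock_reads, expected, candidates):
    """Pick the cutoff with the fewest FP + FN; ties go to the lowest cutoff."""
    return min(
        candidates,
        key=lambda c: (sum(count_errors(mock_reads, expected, c)), c),
    )


# Hypothetical mock sample: three expected species plus two artefacts.
mock_reads = {"sp_A": 950, "sp_B": 400, "sp_C": 35, "contam_1": 12, "chimera_1": 3}
expected = {"sp_A", "sp_B", "sp_C"}

cutoff = best_cutoff(mock_reads, expected, candidates=[1, 5, 10, 20, 50, 100])
print(cutoff, count_errors(mock_reads, expected, cutoff))
```

In this toy case a cutoff that is too low keeps the contaminant, while one that is too high discards the rarest expected species; the mock sample is what makes the trade-off measurable rather than arbitrary.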

4.
Mol Ecol Resour ; 23(4): 933-945, 2023 May.
Article in English | MEDLINE | ID: mdl-36656075

ABSTRACT

Reference databases with wide taxonomic coverage are greatly needed in many fields of biology, most particularly for the taxonomic assignment of metabarcoding sequences. Therefore, it is fundamental to be able to access and pool data from different primary databases. The COInr database is a freely available, easy-to-access database of COI reference sequences extracted from the BOLD and NCBI nucleotide databases. It is a comprehensive database: not limited to a taxon, a gene region or a taxonomic rank; therefore, it is a good starting point for creating custom databases. Sequences are dereplicated between databases and within taxa. Each taxon has a unique taxonomic identifier (taxID), fundamental to avoid ambiguous associations of homonyms and synonyms in the source database. TaxIDs form a coherent hierarchical system fully compatible with the NCBI taxIDs, allowing their full or ranked lineages to be created. The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users' needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier. This is a semi-automated pipeline using command lines in a Linux environment. The COInr database can be downloaded from https://doi.org/10.5281/zenodo.6555985 and mkcoinr and its full documentation are available at https://github.com/meglecz/mkCOInr.
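
The idea of dereplicating sequences within taxa can be sketched in a few lines. The example below keeps one record per (taxID, sequence) pair, so an identical sequence is retained again only when it belongs to a different taxon. It is a simplified illustration, not the mkcoinr implementation (which is written in Perl): the record layout, the function name, the taxIDs, and the restriction to exact matches are assumptions.

```python
def dereplicate(records):
    """Keep the first record seen for each (taxID, sequence) pair.

    records: iterable of (taxid, sequence, source) tuples.
    Only exact, case-insensitive sequence matches are treated as duplicates.
    """
    seen = set()
    unique = []
    for taxid, seq, source in records:
        key = (taxid, seq.upper())
        if key not in seen:
            seen.add(key)
            unique.append((taxid, seq, source))
    return unique


# Hypothetical records; taxIDs 101 and 202 are placeholders, not real taxIDs.
records = [
    (101, "ACGTAC", "BOLD"),
    (101, "acgtac", "NCBI"),  # same taxon, same sequence: dropped
    (101, "ACGTAA", "NCBI"),  # same taxon, different sequence: kept
    (202, "ACGTAC", "BOLD"),  # different taxon, same sequence: kept
]

for record in dereplicate(records):
    print(record)
```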

5.
PeerJ ; 11: e14616, 2023.
Article in English | MEDLINE | ID: mdl-36643652

ABSTRACT

Background: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments. Methods: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency. Results: The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. 
However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Adding novel local barcodes to the COI database proved to be very profitable: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6-1% of the Amplicon Sequence Variants (ASVs).


Subject(s)
Aquatic Organisms , DNA Barcoding, Taxonomic , Databases, Factual , Mediterranean Sea , Aquatic Organisms/genetics
6.
Biol Futur ; 74(4): 369-375, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38300415

ABSTRACT

Metabarcoding is now a widely used method for biodiversity studies. Taxonomic assignment of environmental sequences is one of the key steps of metabarcoding. Assignments based on the lowest common ancestor (LCA) method generally rely on fixed arbitrary thresholds, and this is generally not well adapted for assignment of taxonomically diverse groups with variable coverage in reference databases. mkLTG is an LCA-based method that uses a series of percentage of identity thresholds, starting from stringent parameters and relaxing them if necessary. All parameters can be set separately for each percentage of identity threshold, which makes this tool adaptable for different databases, genetic markers and diverse taxonomic groups. The optimization step was included using the COI marker and a comprehensive, non-redundant database. The mkLTG tool is a command-line application with few dependencies that runs on all operating systems; it is therefore easy to include in complex pipelines. All scripts are freely available, including the benchmarking, at https://github.com/meglecz/mkLTG.
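
The cascade of identity thresholds described above can be sketched as follows. Starting from the most stringent step, the hits that reach the identity threshold are selected; if there are enough of them, the query is assigned to their lowest common ancestor, otherwise the next, more relaxed step is tried. This is an illustrative sketch only, not the mkLTG code or its parameter set: the function names, the (identity, minimum hits) step format, and the toy hits are assumptions.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-tip lineages."""
    lca = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:  # lineages diverge at this rank
            break
        lca.append(ranks[0])
    return lca


def assign(hits, steps):
    """Try each (min_identity, min_hits) step in order, most stringent first.

    hits:  list of (percent_identity, lineage) tuples for one query sequence
    steps: list of (min_identity, min_hits) tuples, sorted by decreasing identity
    Returns (identity threshold used, assigned lineage), or (None, []) if no
    step has enough hits.
    """
    for min_identity, min_hits in steps:
        selected = [lineage for identity, lineage in hits if identity >= min_identity]
        if len(selected) >= min_hits:
            return min_identity, lowest_common_ancestor(selected)
    return None, []


# Hypothetical parameter series: fewer hits are required when identity is high.
steps = [(100, 1), (97, 1), (90, 2), (80, 3)]

# Hypothetical hits for one query sequence.
hits = [
    (93.0, ["Metazoa", "Arthropoda", "Insecta", "Ephemeroptera", "Baetidae"]),
    (91.5, ["Metazoa", "Arthropoda", "Insecta", "Ephemeroptera", "Heptageniidae"]),
    (84.0, ["Metazoa", "Arthropoda", "Malacostraca", "Amphipoda", "Gammaridae"]),
]

print(assign(hits, steps))
```

Because each step carries its own requirements, a query with a near-perfect match can be resolved to a low rank from a single hit, while a query with only distant matches needs agreement among several hits and ends up at a higher rank.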


Subject(s)
Biodiversity
7.
Mol Ecol ; 31(22): 5889-5908, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36125278

ABSTRACT

Dietary studies are critical for understanding foraging strategies and have important applications in conservation and habitat management. We applied a robust metabarcoding protocol to characterize the diet of the critically endangered freshwater fish Zingel asper (the Rhone streber). We conducted modelling and simulation analyses to identify and characterize some of the drivers of individual trophic trait variation in this species. We found that population density and ontogeny had minor effects on the trophic niche of Z. asper. Instead, our results suggest that the majority of trophic niche variation was driven by seasonal variation in ecological opportunity. The total trophic niche width of Z. asper seasonally expanded to include a broader range of prey. Furthermore, null model simulations revealed that the increase of between-individual variation in autumn indicates that Z. asper become more opportunistic relative to summer and spring, rather than being associated with a seasonal specialization of individuals. Overall, our results suggest an adaptive variation of individual trophic traits in Z. asper: the species mainly consumes a few ephemeropteran taxa (Baetis fuscatus and Ecdyonurus) but seems to be capable of adapting its foraging strategy to maintain its body condition. This study illustrates how metabarcoding data obtained from faeces can be validated and combined with individual-based modelling and simulation approaches to explore inter- and intrapopulational individual trophic traits variation and to test hypotheses in the conventional analytic framework of trophic ecology.


Subject(s)
DNA Barcoding, Taxonomic , Fishes , Animals , Seasons , Ecosystem , Phenotype
8.
Appl Plant Sci ; 8(2): e11321, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32110501

ABSTRACT

PREMISE: Ferula sadleriana (Apiaceae) is a polycarpic, perennial herb with a very limited range and small populations. It is listed as "endangered" on the IUCN Red List of Threatened Species. Microsatellite markers can contribute to conservation efforts by allowing the study of the genetic structure of its shrinking populations. METHODS AND RESULTS: We used a microsatellite-enriched library combined with an Illumina sequencing approach to develop simple sequence repeat markers in our target species. Out of 44 tested primer pairs, 22 provided specific products, and 13 showed heterologous amplification in the target species. Cross-species amplification was achieved at 20 and 19 loci in two congeneric species, F. soongarica and F. tatarica, respectively. CONCLUSIONS: The primers described here are the first tools that enable the population genetic characterization of F. sadleriana. Our results suggest a wider applicability in the genus Ferula.

9.
Appl Plant Sci ; 7(5): e01245, 2019 May.
Article in English | MEDLINE | ID: mdl-31139511

ABSTRACT

PREMISE: Gladiolus palustris (Iridaceae) is an endangered European perennial tetraploid herb with special conservation interest in the European Union. Microsatellite markers can serve as effective tools for the conservation genetics of this species. METHODS AND RESULTS: We utilized a 454 pyrosequencing approach to identify simple sequence repeat (SSR) regions in a microsatellite-enriched library. Of all SSR regions, 46 were screened for specific PCR amplification, and 15 were found to be applicable in the target species. We found 1.62-3.08 alleles per population (effective alleles: 1.58-2.08) that indicated moderate to high genetic diversity values (0.28-0.44) in three pilot populations. Cross-species amplification was less effective in G. imbricatus and G. tenuis. CONCLUSIONS: The primers reported here can be used for the population genetic characterization of G. palustris. They will help us to better understand the conservation genetics of this highly endangered species.

10.
Ecol Evol ; 9(8): 4603-4620, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31031930

ABSTRACT

In diet metabarcoding analyses, insufficient taxonomic coverage of PCR primer sets generates false negatives that may dramatically distort biodiversity estimates. In this paper, we investigated the taxonomic coverage and complementarity of three cytochrome c oxidase subunit I gene (COI) primer sets based on in silico analyses and we conducted an in vivo evaluation using fecal and spider web samples from different invertivores, environments, and geographic locations. Our results underline the lack of predictability of both the coverage and complementarity of individual primer sets: (a) sharp discrepancies were observed between in silico and in vivo analyses (to the detriment of in silico analyses); (b) both coverage and complementarity depend greatly on the predator and on the taxonomic level at which prey are considered; (c) primer sets' complementarity is the greatest at fine taxonomic levels (molecular operational taxonomic units [MOTUs] and variants). We then formalized the "one-locus-several-primer-sets" (OLSP) strategy, that is, the use of several primer sets that target the same locus (here the first part of the COI gene) and the same group of taxa (here invertebrates). The proximal aim of the OLSP strategy is to minimize false negatives by increasing total coverage through multiple primer sets. We illustrate that the OLSP strategy is especially relevant from this perspective since distinct variants within the same MOTUs were not equally detected across all primer sets. Furthermore, the OLSP strategy produces largely overlapping and comparable sequences, which cannot be achieved when targeting different loci. This facilitates the use of haplotypic diversity information contained within metabarcoding datasets, for example, for phylogeography and finer analyses of prey-predator interactions.
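
The gain in coverage from combining primer sets can be illustrated with simple set operations. In the sketch below, each primer set detects only part of a known list of variants, and the combined coverage is computed from the union of the detections. The primer set names, variant names, and detections are invented for the example and are not data from the study.

```python
def coverage(detected, reference):
    """Fraction of the reference variants that were detected."""
    return len(detected & reference) / len(reference)


def combined_detections(detections):
    """Union of the variants detected by all primer sets."""
    return set().union(*detections.values())


# Hypothetical variants known to be present in a sample.
all_variants = {"var_a", "var_b", "var_c", "var_d", "var_e", "var_f"}

# Hypothetical detections by three primer sets targeting the same locus.
detections = {
    "primer_set_1": {"var_a", "var_b", "var_c"},
    "primer_set_2": {"var_b", "var_d"},
    "primer_set_3": {"var_c", "var_e"},
}

for name, detected in detections.items():
    print(name, round(coverage(detected, all_variants), 2))

combined = combined_detections(detections)
print("combined", round(coverage(combined, all_variants), 2))
print("still missed:", sorted(all_variants - combined))
```

Because all primer sets target the same locus, the detected variants are directly comparable sequences, so taking their union is meaningful; with different loci the same variant could not be matched across sets in this way.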

11.
Mol Ecol Resour ; 17(6): e146-e159, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28776936

ABSTRACT

The main objective of this work was to develop and validate a robust and reliable "from-benchtop-to-desktop" metabarcoding workflow to investigate the diet of invertebrate-eaters. We applied our workflow to faecal DNA samples of an invertebrate-eating fish species. A fragment of the cytochrome c oxidase I (COI) gene was amplified by combining two minibarcoding primer sets to maximize the taxonomic coverage. Amplicons were sequenced by an Illumina MiSeq platform. We developed a filtering approach based on a series of nonarbitrary thresholds established from control samples and from molecular replicates to address the elimination of cross-contamination, PCR/sequencing errors and mistagging artefacts. This resulted in a conservative and informative metabarcoding data set. We developed a taxonomic assignment procedure that combines different approaches and that allowed the identification of ~75% of invertebrate COI variants to the species level. Moreover, based on the diversity of the variants, we introduced a semiquantitative statistic in our diet study, the minimum number of individuals, which is based on the number of distinct variants in each sample. The metabarcoding approach described in this article may guide future diet studies that aim to produce robust data sets associated with a fine and accurate identification of prey items.


Subject(s)
Animal Feed/analysis , Computational Biology/methods , DNA Barcoding, Taxonomic/methods , Fishes/physiology , High-Throughput Nucleotide Sequencing/methods , Invertebrates/classification , Metagenomics/methods , Animals , Feeding Behavior , Invertebrates/genetics , Software , Workflow
12.
Mol Ecol Resour ; 14(6): 1302-13, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24785154

ABSTRACT

Microsatellite marker development has been greatly simplified by the use of high-throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by the existing pipelines depends either on arbitrary criteria or on older studies of PCR success. Based on wet laboratory experiments, we have identified the following factors that are most likely to influence genotyping success rate: alignment score between the primers and the amplicon; the distance between primers and microsatellites; the length of the PCR product; target region complexity and the number of reads underlying the sequence. The QDD pipeline has been modified to include these most pertinent factors in the output to help the selection of markers. Furthermore, new features are also included in the present version: (i) not only raw sequencing reads are accepted as input, but also contigs, allowing the analysis of assembled high-coverage data; (ii) input data can be both in fasta and fastq format to facilitate the use of Illumina and IonTorrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the nucleotide (nt) database of NCBI; (v) QDD3 is now also available embedded into a virtual machine, making installation easier and operating system independent. It can be used both as a command-line version and integrated into a Galaxy server, providing a user-friendly interface, as well as the possibility to utilize a large variety of NGS tools.


Subject(s)
Genotyping Techniques/methods , Microsatellite Repeats , Software , Animals , Cyprinidae/classification , Cyprinidae/genetics , DNA Primers/genetics , Molecular Sequence Data , Sequence Analysis, DNA
13.
Mol Ecol Resour ; 14(3): 554-68, 2014 May.
Article in English | MEDLINE | ID: mdl-24165148

ABSTRACT

The development and screening of microsatellite markers have been accelerated by next-generation sequencing (NGS) technology and in particular GS-FLX pyro-sequencing (454). More recent platforms such as the PGM semiconductor sequencer (Ion Torrent) offer potential benefits such as dramatic reductions in cost, but to date have not been well utilized. Here, we critically compare the advantages and disadvantages of microsatellite development using PGM semiconductor sequencing and GS-FLX pyro-sequencing for two gymnosperm (a conifer and a cycad) and one angiosperm species. We show that these NGS platforms differ in the quantity of returned sequence data, unique microsatellite data and primer design opportunities, mostly consistent with the differences in read length. The strength of the PGM lies in the large amount of data generated at a comparatively lower cost and time. The strength of GS-FLX lies in the return of longer average length sequences and therefore greater flexibility in producing markers with variable product length, due to longer flanking regions, which is ideal for capillary multiplexing. These differences need to be considered when choosing an NGS method for microsatellite discovery. However, the ongoing improvement in read lengths of the NGS platforms will reduce the disadvantage of the current short read lengths, particularly for the PGM platform, allowing greater flexibility in primer design coupled with the power of a larger number of sequences.


Subject(s)
Cycadopsida/genetics , High-Throughput Nucleotide Sequencing/instrumentation , Magnoliopsida/genetics , Microsatellite Repeats , DNA Primers/genetics , Genetic Variation , Polymorphism, Genetic
14.
PLoS One ; 7(7): e40861, 2012.
Article in English | MEDLINE | ID: mdl-22815847

ABSTRACT

Microsatellites are ubiquitous in Eukaryotic genomes. A more complete understanding of their origin and spread can be gained from a comparison of their distribution within a phylogenetic context. Although information for model species is accumulating rapidly, it is insufficient due to a lack of species depth, thus intragroup variation is necessarily ignored. As such, apparent differences between groups may be overinflated and generalizations cannot be inferred until an analysis of the variation that exists within groups has been conducted. In this study, we examined microsatellite coverage and motif patterns from 454 shotgun sequences of 154 Eukaryote species from eight distantly related phyla (Cnidaria, Arthropoda, Onychophora, Bryozoa, Mollusca, Echinodermata, Chordata and Streptophyta) to test if a consistent phylogenetic pattern emerges from the microsatellite composition of these species. It is clear from our results that data from model species provide incomplete information regarding the existing microsatellite variability within the Eukaryotes. A very strong heterogeneity of microsatellite composition was found within most phyla, classes and even orders. Autocorrelation analyses indicated that while microsatellite contents of species within clades more recent than 200 Mya tend to be similar, the autocorrelation breaks down and becomes negative or non-significant with increasing divergence time. Therefore, the age of the taxon seems to be a primary factor in degrading the phylogenetic pattern present among related groups. The most recent classes or orders of Chordates still retain the pattern of their common ancestor. However, within older groups, such as classes of Arthropods, the phylogenetic pattern has been scrambled by the long independent evolution of the lineages.


Subject(s)
Eukaryota/genetics , Microsatellite Repeats/genetics , Models, Biological , Phylogeny , Sequence Analysis, DNA/methods , Animals , Base Sequence , Nucleotide Motifs/genetics , Plants/genetics , Repetitive Sequences, Nucleic Acid/genetics , Species Specificity
15.
BMC Res Notes ; 5: 259, 2012 May 28.
Article in English | MEDLINE | ID: mdl-22640415

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) provides a valuable method to quickly obtain sequence information from non-model organisms at a genomic scale. In principle, if sequencing is not targeted for a genomic region or sequence type (e.g. coding region, microsatellites) NGS reads can be used as a genome snapshot and provide information on the different types of sequences in the genome. However, no study has ascertained if a typical 454 dataset of low coverage (1/4-1/8 of a PicoTiter plate leading to generally less than 0.1x of coverage) represents all parts of genomes equally. FINDINGS: Partial genome shotgun sequencing of total DNA (without enrichment) on a 454 NGS platform was used to obtain reads of Apis mellifera (454 reads hereafter). These 454 reads were compared to the assembled chromosomes of this species in three different aspects: (i) dimer and trimer compositions, (ii) the distribution of mapped 454 sequences along the chromosomes and (iii) the numbers of different classes of microsatellites. Highly significant chi-square tests for all three types of analyses indicated that the 454 data is not a perfect random sample of the genome. Only the number of 454 reads mapped to each of the 16 chromosomes and the number of microsatellites pooled by motif (repeat unit) length were not significantly different from the expected values. However, a very strong correlation (correlation coefficients greater than 0.97) was observed between most of the 454 variables (the number of different dimers and trimers, the number of 454 reads mapped to each chromosome fragment of one Mb, the number of 454 reads mapped to each chromosome, the number of microsatellites of each class) and their corresponding genomic variables. CONCLUSIONS: The results of chi-square tests suggest that 454 shotgun reads cannot be regarded as a perfect representation of the genome, especially if the comparison is done on a finer scale (e.g. chromosome fragments instead of whole chromosomes). 
However, the high correlation between 454 and genome variables tested indicates that a high proportion of the variability of 454 variables is explained by their genomic counterparts. Therefore, we conclude that using 454 data to obtain information on the genome is biologically meaningful.


Subject(s)
Bees/genetics , Chromosome Mapping , Chromosomes, Insect , Genome, Insect , Sequence Analysis, DNA/methods , Animals , Base Sequence , Chi-Square Distribution , Microsatellite Repeats , Nucleotide Motifs , Reproducibility of Results
16.
Mol Ecol Resour ; 11(4): 638-44, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21676194

ABSTRACT

Microsatellites (or SSRs: simple sequence repeats) are among the most frequently used DNA markers in many areas of research. The use of microsatellite markers is limited by the difficulties involved in their de novo isolation from species for which no genomic resources are available. We describe here a high-throughput method for isolating microsatellite markers based on coupling multiplex microsatellite enrichment and next-generation sequencing on 454 GS-FLX Titanium platforms. The procedure was calibrated on a model species (Apis mellifera) and validated on 13 other species from various taxonomic groups (animals, plants and fungi), including taxa for which severe difficulties were previously encountered using traditional methods. We obtained from 11,497 to 34,483 sequences depending on the species and the number of detected microsatellite loci ranged from 199 to 5791. We thus demonstrated that this procedure can be readily and successfully applied to a large variety of taxonomic groups, at much lower cost than would have been possible with traditional protocols. This method is expected to speed up the acquisition of high-quality genetic markers for nonmodel organisms.


Subject(s)
Bees/genetics , DNA/chemistry , DNA/genetics , Gene Library , Microsatellite Repeats , Molecular Typing/methods , Animals , High-Throughput Nucleotide Sequencing/methods
17.
BMC Genomics ; 12: 245, 2011 May 19.
Article in English | MEDLINE | ID: mdl-21592414

ABSTRACT

BACKGROUND: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007, for the older GS20 system based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments. RESULTS: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables. CONCLUSIONS: The pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.


Subject(s)
Sequence Analysis, DNA/methods , Titanium , Humans , Nucleotides/genetics , Quality Control , Research Design , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/standards
18.
Bioinformatics ; 27(2): 277-8, 2011 Jan 15.
Article in English | MEDLINE | ID: mdl-21084284

ABSTRACT

SUMMARY: Characterizing genetic diversity through genotyping short amplicons is central to evolutionary biology. Next-generation sequencing (NGS) technologies changed the scale at which this type of data is acquired. SESAME is a web application package that assists genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. It automatically assigns reads to loci and individuals, corrects reads if standard samples are available and provides an intuitive graphical user interface (GUI) for allele validation based on the sequences and associated decision-making tools. The aim of SESAME is to help allele identification among a large number of sequences. AVAILABILITY: SESAME and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ or http://tinyurl.com/ngs-sesame.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Alleles , Genotype , Internet
19.
BMC Genomics ; 11: 560, 2010 Oct 12.
Article in English | MEDLINE | ID: mdl-20939885

ABSTRACT

BACKGROUND: Microsatellites are markers of choice in population genetics and genomics, as they provide useful insight into patterns and processes as diverse as genome evolutionary dynamics and demographic processes. The acquisition of microsatellites through multiplex-enriched libraries and 454 GS-FLX Titanium pyrosequencing is a promising new tool for the isolation of new markers in unknown genomes. This approach can also be used to evaluate the extent to which microsatellite-enriched libraries are representative of the genome from which they were isolated. In this study, we deciphered potential discrepancies in microsatellite content recovery for two reference genomes (Apis mellifera and Danio rerio), selected on the basis of their extreme heterogeneity in terms of the proportions and distributions of microsatellites on chromosomes. RESULTS: The A. mellifera genome, in particular, was found to be highly heterogeneous, due to extremely high rates of recombination, with hotspots, but the only bias consistently introduced into pyrosequenced multiplex-enriched libraries concerned sequence length, with the overrepresentation of sequences 160 to 320 bp in length. Other deviations from expected proportions or distributions of motifs on chromosomes were observed, but the significance and intensity of these deviations were mostly limited. Furthermore, no consistent adverse competition between multiplexed probes was observed during the motif enrichment phase. CONCLUSIONS: This approach therefore appears to be a promising strategy for improving the development of microsatellites, as it introduces no major bias in terms of the proportions and distribution of microsatellites.


Subject(s)
Bees/genetics , Genome/genetics , Microsatellite Repeats/genetics , Sequence Analysis, DNA/methods , Temperature , Titanium/chemistry , Zebrafish/genetics , Animals , Base Sequence , Bias , Chromosomes/genetics , DNA Probes/metabolism , Gene Library , Genetic Loci/genetics , Models, Genetic
20.
BMC Res Notes ; 3: 135, 2010 May 17.
Article in English | MEDLINE | ID: mdl-20478030

ABSTRACT

BACKGROUND: Cyprinids are the most species-rich and widespread group among the European freshwater Teleostei and are known to hybridize quite commonly. Nevertheless, only a limited number of markers are available to date for comparative studies of differentiation, evolution and hybridization dynamics. FINDINGS: Five multiplex PCR sets were optimized in order to assay 41 cyprinid-specific polymorphic microsatellite loci (including 10 novel loci isolated from Chondrostoma nasus nasus, Chondrostoma toxostoma toxostoma and Leuciscus leuciscus) for 503 individuals (440 purebred specimens and 63 hybrids) from 15 European cyprinid species. The level of genetic diversity was assessed in Alburnus alburnus, Alburnoides bipunctatus, C. genei, C. n. nasus, C. soetta, C. t. toxostoma, L. idus, L. leuciscus, Pachychilon pictum, Rutilus rutilus, Squalius cephalus and Telestes souffia. The applicability of the markers was also tested on Abramis brama, Blicca bjoerkna and Scardinius erythrophthalmus specimens. Overall, between 24 and 37 of these markers proved polymorphic in the investigated species and 23 markers amplified in all 15 European cyprinid species. CONCLUSIONS: The developed set of markers performed well in discriminating European cyprinid species. Furthermore, it allowed hybrid individuals to be detected and characterized. These microsatellites will therefore be useful for comparative evolutionary and population genetics studies of European cyprinids, which is of particular interest for conservation, and they constitute a tool of choice for hybridization studies.
