1.
Appl Plant Sci ; 11(6): e11535, 2023.
Article in English | MEDLINE | ID: mdl-38106539

ABSTRACT

Premise: Universal target enrichment probe kits are used to circumvent the individual identification of loci suitable for phylogenetic studies in a given taxon. Under certain circumstances, however, target capture can be inefficient and costly, and lower numbers of marker loci may be sufficient. We therefore propose a computational pipeline that enables the easy identification of a subset of promising candidate loci for a taxon of interest. Methods and Results: Target sequences used for probe design are filtered based on an assembled reference genome, resulting in presumably intron-containing single-copy loci as present in the reference taxon. The applicability of the proposed approach is demonstrated based on two probe kits (universal and family-specific) in combination with several publicly available reference genomes. Conclusions: Guided by commercial probe kits, LoCoLotive enables fast and cost-efficient marker mining. Its accuracy mainly depends on the quality of the reference genome and its relatedness to the taxa under study.
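The filtering idea in this abstract can be sketched in Python. This is not LoCoLotive's actual procedure, only a minimal illustration under stated assumptions: each alignment of a target sequence to the reference genome is given as a hypothetical tuple `(target_id, contig, t_start, t_end, g_start, g_end)` with 1-based inclusive coordinates on the forward strand; a target counts as single-copy if all its blocks lie on one contig and do not overlap in target coordinates; and it counts as intron-containing if consecutive blocks are separated by a gap on the genome. The function name and the `min_intron` parameter are invented for this example.

```python
from collections import defaultdict


def candidate_loci(hits, min_intron=1):
    """Return target IDs that look single-copy and intron-containing.

    hits: iterable of (target_id, contig, t_start, t_end, g_start, g_end),
    a hypothetical format with 1-based inclusive, forward-strand coordinates.
    """
    by_target = defaultdict(list)
    for target, contig, t_start, t_end, g_start, g_end in hits:
        by_target[target].append((contig, t_start, t_end, g_start, g_end))

    kept = []
    for target, blocks in by_target.items():
        # Assumption: hits on several contigs suggest more than one copy.
        if len({block[0] for block in blocks}) != 1:
            continue
        blocks.sort(key=lambda block: block[1])  # order by target start
        pairs = list(zip(blocks, blocks[1:]))
        # Assumption: the same target region matching twice suggests paralogy.
        if any(prev[2] >= cur[1] for prev, cur in pairs):
            continue
        # A genomic gap between consecutive blocks is treated as an intron.
        if any(cur[3] - prev[4] - 1 >= min_intron for prev, cur in pairs):
            kept.append(target)
    return kept


example_hits = [
    ("t1", "chr1", 1, 100, 1000, 1099),    # two exons split by a gap: kept
    ("t1", "chr1", 101, 200, 1500, 1599),
    ("t2", "chr1", 1, 100, 5000, 5099),    # hits on two contigs: dropped
    ("t2", "chr2", 1, 100, 700, 799),
    ("t3", "chr3", 1, 200, 100, 299),      # one block, no intron: dropped
    ("t4", "chr1", 1, 100, 100, 199),      # overlapping target ranges: dropped
    ("t4", "chr1", 50, 150, 9000, 9100),
]
print(candidate_loci(example_hits))
```

As the abstract notes, any such filter is only as reliable as the reference assembly and its relatedness to the taxa under study.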

2.
Ecol Evol ; 13(7): e10190, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37475726

ABSTRACT

In modern plant systematics, target enrichment enables simultaneous analysis of hundreds of genes. However, when dealing with reticulate or polyploidization histories, a few markers may suffice, but these often need to be single-copy, a condition that is not necessarily met with commercial capture kits. Also, large genome sizes can render target capture ineffective, so that amplicon sequencing would be preferable; however, knowledge about suitable loci is often missing. Here, we present a comprehensive workflow for the identification of putative single-copy nuclear markers in a genus of interest, by mining a small dataset from target capture using a few representative taxa. The proposed pipeline assesses sequence variability contained in the data from targeted loci and assigns reads to their respective genes via a combined BLAST/clustering procedure. Cluster consensus sequences are then examined based on four pre-defined criteria presumed to indicate the absence of paralogy. This is done by calculating four specialized indices; loci are ranked according to their performance in these indices, and top-scoring loci are considered putatively single- or low-copy. The approach can be applied to any probe set. As it relies on long reads, the present contribution also provides template workflows for processing Nanopore-based target capture data. Obtained markers are further tested and then entered into amplicon sequencing. For the detection of possibly remaining paralogy in these data, which might occur in groups with rampant paralogy, we also employ the long-read assembly tool Canu. In diploid representatives of the young Compositae genus Leucanthemum, characterized by high levels of polyploidy, our approach resulted in successful amplification of 13 loci. Modifications to remove traces of paralogy were made in seven of these. A species tree from the markers correctly reproduced the main relationships in the genus, albeit at low resolution. The presented workflow has the potential to valuably support phylogenetic research, for example in polyploid plant groups.
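The ranking step described above can be illustrated with a short Python sketch. The abstract does not define the four indices or say how they are combined, so everything here is an assumption: the index values are placeholders where higher means "less likely paralogous", and loci are ordered by the sum of their per-index ranks. The function name `rank_loci` and the locus names are hypothetical.

```python
def rank_loci(index_scores, top_n=2):
    """Order loci by summed per-index rank and return the best top_n.

    index_scores: dict mapping locus name -> tuple of index values.
    Assumption: a higher value is better for every index.
    """
    loci = list(index_scores)
    n_indices = len(next(iter(index_scores.values())))
    rank_sums = {locus: 0 for locus in loci}
    for i in range(n_indices):
        # Rank 1 goes to the locus with the best value for index i.
        ordered = sorted(loci, key=lambda name: index_scores[name][i], reverse=True)
        for rank, locus in enumerate(ordered, start=1):
            rank_sums[locus] += rank
    # Lower rank sum means better overall performance.
    return sorted(loci, key=lambda name: rank_sums[name])[:top_n]


# Placeholder values for four unnamed indices per locus.
scores = {
    "locus_A": (0.9, 0.8, 0.9, 0.9),
    "locus_B": (0.2, 0.3, 0.1, 0.4),
    "locus_C": (0.6, 0.9, 0.8, 0.5),
}
print(rank_loci(scores, top_n=2))
```

Summing ranks rather than raw values keeps indices on different scales from dominating one another; other aggregation schemes would be equally plausible readings of the abstract.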

3.
Mol Ecol Resour ; 23(3): 705-711, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36349867

ABSTRACT

When a data set is repeatedly clustered using unsupervised techniques, the resulting clusterings, even if highly similar, may list their clusters in different orders. This so-called 'label-switching' phenomenon obscures meaningful differences between clusterings, complicating their comparison and summary. The problem often arises in the context of population structure analysis based on multilocus genotype data. In this field, a variety of popular tools apply model-based clustering, assigning individuals to a prespecified number of ancestral populations. Since such methods often involve stochastic components, it is common practice to perform multiple replicate analyses based on the same input data and parameter settings. Available postprocessing tools can mitigate label switching but leave room for improvement, particularly for large input data sets. In this work, I present Crimp, a lightweight command-line tool, which offers a relatively fast and scalable heuristic to align clusters across replicate clusterings consisting of the same number of clusters. For small problem sizes, an exact algorithm can be used as an alternative. Additional features include row-specific weights, input and output files similar to those of CLUMPP (Jakobsson & Rosenberg, 2007) and the evaluation of a given solution in terms of CLUMPP as well as its own objective functions. Benchmark analyses show that Crimp, especially when applied to larger data sets, tends to outperform alternative tools considering runtime requirements and various quality measures. While primarily targeting population structure analysis, Crimp can be used as a generic tool to correct multiple clusterings for label switching. This facilitates their comparison and allows an averaged clustering to be generated. Crimp's computational efficiency makes it applicable even to relatively large data sets while offering competitive solution quality.
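To make the label-switching problem concrete, here is a minimal Python sketch. It is not Crimp's heuristic or its exact algorithm: it aligns each replicate to the first one by brute force over all column permutations, which is feasible only for small numbers of clusters. The similarity measure (sum of products of membership coefficients) and the function names are assumptions chosen for illustration.

```python
from itertools import permutations


def align_to_reference(reference, replicate):
    """Reorder the columns of `replicate` to best match `reference`.

    Both arguments are membership matrices (lists of rows, one row per
    individual, one column per cluster). Returns (permutation, aligned).
    """
    n_clusters = len(reference[0])
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n_clusters)):
        # Assumed similarity: sum of products of matched coefficients.
        score = sum(
            ref_row[k] * rep_row[perm[k]]
            for ref_row, rep_row in zip(reference, replicate)
            for k in range(n_clusters)
        )
        if score > best_score:
            best_perm, best_score = perm, score
    aligned = [[row[j] for j in best_perm] for row in replicate]
    return best_perm, aligned


def average_clusterings(matrices):
    """Align all replicates to the first one and average them."""
    reference = matrices[0]
    aligned = [reference] + [align_to_reference(reference, m)[1] for m in matrices[1:]]
    n_rows, n_cols = len(reference), len(reference[0])
    return [
        [sum(m[i][k] for m in aligned) / len(aligned) for k in range(n_cols)]
        for i in range(n_rows)
    ]


# Two replicates that agree except for swapped cluster labels.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run_b = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
perm, aligned_b = align_to_reference(run_a, run_b)
print(perm, aligned_b)
print(average_clusterings([run_a, run_b]))
```

Averaging `run_a` and `run_b` without alignment would yield nearly uniform memberships; after alignment the averaged clustering keeps the shared structure. The factorial cost of the permutation search is the reason heuristics matter for larger numbers of clusters.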


Subject(s)
Algorithms, Humans, Genotype, Cluster Analysis
4.
BMC Bioinformatics ; 21(1): 441, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33028201

ABSTRACT

BACKGROUND: Inferring phylogenetic relationships of polyploid species and their diploid ancestors (leading to reticulate phylogenies in the case of an allopolyploid origin) based on multi-locus sequence data is complicated by the unknown assignment of alleles found in polyploids to diploid subgenomes. A parsimony-based approach to this problem has been proposed by Oberprieler et al. (Methods Ecol Evol 8:835-849, 2017); however, its implementation is of limited practical value. In addition to previously identified shortcomings, it has been found that in some cases, the obtained results barely satisfy the applied criterion. To be of better use to other researchers, a reimplementation with methodological refinement appears to be indispensable. RESULTS: We present the AllCoPol package, which provides a heuristic method for assigning alleles from polyploids to diploid subgenomes based on the Minimizing Deep Coalescences (MDC) criterion in multi-locus sequence datasets. An additional consensus approach further allows the confidence of phylogenetic reconstructions to be assessed. Simulations of tetra- and hexaploids show that under simplifying assumptions such as completely disomic inheritance, the topological errors of reconstructed phylogenies are similar to those of MDC species trees based on the true allele partition. CONCLUSIONS: AllCoPol is a Python package for phylogenetic reconstructions of polyploids offering enhanced functionality as well as improved usability. The included methods are supplied as command line tools without the need for prior programming knowledge.


Subject(s)
User-Computer Interface, Alleles, Databases, Genetic, Leucanthemum/classification, Leucanthemum/genetics, Multilocus Sequence Typing, Phylogeny, Polyploidy
5.
Mol Phylogenet Evol ; 144: 106702, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31812569

ABSTRACT

Delineating species boundaries in a group of recently diverged lineages is challenging due to minor morphological differences, low genetic differentiation and the occurrence of gene flow among taxa. Here, we employ traditional Sanger sequencing and restriction-site associated DNA (RAD) sequencing to investigate species delimitation in the close-knit Moroccan daisy group around Rhodanthemum arundanum B.H.Wilcox & al. that diverged recently during the Quaternary. After evaluation of genotyping errors and parameter optimisation in the course of de-novo assembly of RADseq reads in Ipyrad, we assess hybridisation patterns in the study group based on different data assemblies and methods (Neighbor-Net networks, FastStructure and ABBA-BABA tests). RADseq data and Sanger sequences are subsequently used for delimitation of species, using both multi-species coalescent methods (Stacey and Snapp) and a novel approach based on consensus k-means clustering. In addition to unveiling two novel subspecies in the R. arundanum-group, our study provides insights into the performance of different species delimitation methods in the presence of hybridisation and varying quantities of data.


Subject(s)
Asteraceae/classification, Asteraceae/genetics, Genetic Speciation, Hybridization, Genetic/physiology, Cluster Analysis, Gene Flow, Genotyping Techniques, Nucleic Acid Hybridization, Phylogeny, Sequence Analysis, DNA/methods, Species Specificity