Search | BVS Regional Portal

snpAIMeR: R package for evaluating ancestry informative marker contributions in non-model population diagnostics.

Vertacnik, Kim L; Vernygora, Oksana V; Dupuis, Julian R.

Bioinformatics ; 40(6), 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38885407

ABSTRACT

MOTIVATION: Single nucleotide polymorphism (SNP) markers are increasingly popular for population genomics and for inferring the ancestry of individuals of unknown origin. Because large SNP datasets are impractical for rapid and routine analysis, diagnostics rely on panels of highly informative markers. Strategies exist for selecting these markers; however, resources for efficiently evaluating their performance are limited for non-model systems. RESULTS: snpAIMeR is a user-friendly R package that evaluates the efficacy of genomic markers for the cluster assignment of unknown individuals. It is intended to help minimize panel size and genotyping effort by determining the informativeness of candidate diagnostic markers. Given genotype data from individuals of known origin, it uses leave-one-out cross-validation to determine population assignment rates for individual markers and marker combinations. AVAILABILITY AND IMPLEMENTATION: snpAIMeR is available on CRAN (https://CRAN.R-project.org/package=snpAIMeR).
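
The leave-one-out idea in the abstract above can be illustrated with a minimal Python sketch. This is not the snpAIMeR implementation or its API: the nearest-centroid classifier is a stand-in chosen for brevity, the function name `loo_assignment_rate` is hypothetical, and the genotype matrix (0/1/2 allele counts) is made-up toy data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy data (assumption): rows are individuals, columns are markers,
# values are counts of the alternate allele (0, 1 or 2).
genotypes = [
    [0, 1, 0],
    [0, 2, 1],
    [0, 0, 0],
    [2, 1, 2],
    [2, 2, 1],
    [2, 0, 2],
]
labels = ["A", "A", "A", "B", "B", "B"]  # known population of origin


def loo_assignment_rate(genotypes, labels, marker_idx):
    """Fraction of individuals assigned to their true population when each
    one is held out in turn and only the markers in marker_idx are used."""
    correct = 0
    for i in range(len(genotypes)):
        # Per-population genotype sums computed without individual i.
        sums = defaultdict(lambda: [0.0] * len(marker_idx))
        counts = Counter()
        for j, (geno, pop) in enumerate(zip(genotypes, labels)):
            if j == i:
                continue
            counts[pop] += 1
            for k, m in enumerate(marker_idx):
                sums[pop][k] += geno[m]
        held_out = [genotypes[i][m] for m in marker_idx]

        def sq_dist(pop):
            # Squared distance from the held-out genotype to the population mean.
            return sum((s / counts[pop] - h) ** 2
                       for s, h in zip(sums[pop], held_out))

        # Ties are broken alphabetically, an arbitrary choice for this sketch.
        predicted = min(sorted(counts), key=sq_dist)
        correct += predicted == labels[i]
    return correct / len(genotypes)


# Score every single marker and every pair of markers.
for size in (1, 2):
    for combo in combinations(range(3), size):
        print(combo, loo_assignment_rate(genotypes, labels, list(combo)))
```

In this toy data, marker 0 separates the two populations perfectly while marker 1 carries no population signal, so comparing their rates shows how such a score could be used to drop uninformative markers from a panel.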

Subjects

Polymorphism, Single Nucleotide , Software , Humans , Genetic Markers , Genetics, Population/methods , Genomics/methods , Genotype

Toward transparent taxonomy: an interactive web-tool for evaluating competing taxonomic arrangements.

Vernygora, Oksana V; Sperling, Felix A H; Dupuis, Julian R.

Cladistics ; 40(2): 181-191, 2024 04.

Article in English | MEDLINE | ID: mdl-37824277

ABSTRACT

Informative and consistent taxonomy above the species level is essential to communication about evolution, biodiversity and conservation, and yet the practice of taxonomy is considered opaque and subjective by non-taxonomist scientists and the public alike. While various proposals have tried to make the basis for the ranking and inclusiveness of taxa more transparent and objective, widespread adoption of these ideas has lagged. Here, we present TaxonomR, an interactive online decision-support tool to evaluate alternative taxonomic classifications. This tool implements an approach that quantifies the criteria commonly used in taxonomic treatments and allows the user to interactively manipulate weightings for different criteria to compare scores for taxonomic groupings under those weights. We use the butterfly taxon Argynnis to demonstrate how different weightings applied to common taxonomic criteria result in fundamentally different genus-level classifications that are predominantly used in different continents and geographic regions. These differences are objectively compared and quantified using TaxonomR to evaluate the kinds of criteria that have been emphasized in earlier classifications, and the nature of the support for current alternative taxonomic arrangements. The main role of TaxonomR is to make taxonomic decisions transparent via an explicit prioritization scheme. TaxonomR is not a prescriptive application. Rather, it aims to be a tool for facilitating our understanding of alternative taxonomic classifications that can, in turn, potentially support global harmony in biodiversity assessments through evidence-based discussion and community-wide resolution of historically entrenched taxonomic tensions.
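
The weighting approach described in the abstract above can be sketched as a weighted average of per-criterion scores. This is only an illustration of the general idea, not TaxonomR's actual scoring scheme: the criterion names, the two candidate arrangements, and all numbers below are hypothetical.

```python
# Hypothetical per-criterion scores (0 to 1) for two candidate classifications.
arrangements = {
    "one broad genus": {"monophyly": 1.0, "diagnosability": 0.4, "stability": 0.8},
    "three narrow genera": {"monophyly": 1.0, "diagnosability": 0.9, "stability": 0.3},
}


def weighted_score(scores, weights):
    """Weighted mean of the criterion scores under the given weights."""
    total = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total


def best_arrangement(weights):
    """Name of the arrangement with the highest weighted score."""
    return max(arrangements, key=lambda name: weighted_score(arrangements[name], weights))


# Two users who prioritise different criteria (assumed weights).
favour_stability = {"monophyly": 1, "diagnosability": 1, "stability": 3}
favour_diagnosability = {"monophyly": 1, "diagnosability": 3, "stability": 1}

for label, weights in [("stability-weighted", favour_stability),
                       ("diagnosability-weighted", favour_diagnosability)]:
    print(label, "->", best_arrangement(weights))
```

The same underlying scores yield a different preferred classification under each weighting, which is the kind of explicit, inspectable trade-off the abstract describes.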

Subjects

Biodiversity , Phylogeny

Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data.

Simões, Tiago R; Vernygora, Oksana V; de Medeiros, Bruno A S; Wright, April M.

Syst Biol ; 72(3): 662-680, 2023 06 17.

Article in English | MEDLINE | ID: mdl-36773019

ABSTRACT

Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become "inapplicable" across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. 
However, Bayesian inference outperforms all parsimony-based solutions to handle character dependency due to fundamental differences in their optimization procedures, a simple alternative that has long been overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space. [Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.]

Subjects

Algorithms , Phylogeny , Bayes Theorem , Phenotype

HiMAP2: Identifying phylogenetically informative genetic markers from diverse genomic resources.

Vernygora, Oksana V; Congrains, Carlos; Geib, Scott M; Dupuis, Julian R.

Mol Ecol Resour ; 23(5): 1155-1167, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36728891

ABSTRACT

Multiplexed amplicon sequencing offers a cost-effective and rapid solution for phylogenomic studies that include a large number of individuals. Selecting informative genetic markers is a critical initial step in designing such multiplexed amplicon panels, but screening various genomic data and selecting markers that are informative for the question at hand can be laborious. Here, we present a flexible and user-friendly tool, HiMAP2, for identifying, visualizing and filtering phylogenetically informative loci from diverse genomic and transcriptomic resources. This bioinformatics pipeline includes orthology prediction, exon extraction and filtering of aligned exon sequences according to user-defined specifications. Additionally, HiMAP2 facilitates exploration of the final filtered exons by incorporating phylogenetic inference of individual exon trees with raxml-ng as well as the estimation of a species tree using astral. Finally, results of the marker selection can be visualized and refined with an interactive Bokeh application that can be used to generate publication-quality figures. Source code and user instructions for HiMAP2 are available at https://github.com/popphylotools/HiMAP_v2.

Subjects

Genome , Genomics , Humans , Phylogeny , Genetic Markers , Software

Gauging ages of tiger swallowtail butterflies using alternate SNP analyses.

Vernygora, Oksana V; Campbell, Erin O; Grishin, Nick V; Sperling, Felix A H; Dupuis, Julian R.

Mol Phylogenet Evol ; 171: 107465, 2022 06.

Article in English | MEDLINE | ID: mdl-35351633

ABSTRACT

Divergence times underpin diverse evolutionary hypotheses, but conflicting age estimates across studies diminish the validity of such hypotheses. These conflicts have continued to grow as large genomics datasets become commonplace and analytical approaches proliferate. To provide more stable temporal intervals, age estimations should be interpreted in the context of both the type of data and analysis being used. Here, we use multispecies coalescent (MSC), concatenation-based, and categorical data transformation approaches on genome-wide SNP data to infer divergence ages within the Papilio glaucus group of tiger swallowtail butterflies in North America. While the SNP data supported previously recognized relationships within the group (P. multicaudata, ((P. eurymedon, P. rutulus), (P. appalachiensis, P. canadensis, P. glaucus))), estimated ages of divergence between the major lineages varied substantially among analyses. MSC produced wide credibility intervals particularly for deeper nodes, reflecting uncertainty in the coalescence times as a possible result of conflicting signal across gene trees. Concatenation, in contrast, gave narrower and more well-defined posterior distributions for the node ages; however, the higher precision of these time estimates is a likely artefact due to more simplistic underlying assumptions of this approach that do not account for conflict among gene trees. Transformed categorical data analysis gave the least precise and the most variable results, with its simple substitution model coupled with a relaxed clock tending to produce spurious results from large genome-wide datasets. While median node ages differed considerably between analyses (~2 Mya between MSC and concatenation-based results), their corresponding credibility intervals nonetheless highlight common temporal patterns for deeper divergences in the group as well as finer-scale phylogeography. 
Age distributions across analyses support an origin of the group during the warm period of the early to mid-Pliocene. Late Pliocene climate aridification and cooling drove divergence between eastern and western groups that further diversified during the period of repeated Pleistocene glaciations. Our results provide a structured comparative assessment of divergence time estimates and evolutionary relationships in a well-studied group of butterflies, and support better understanding of analytical biases in divergence time estimation.

Subjects

Butterflies , Animals , Biological Evolution , Butterflies/genetics , Genome , Phylogeny , Phylogeography

Evaluating the Performance of Probabilistic Algorithms for Phylogenetic Analysis of Big Morphological Datasets: A Simulation Study.

Vernygora, Oksana V; Simões, Tiago R; Campbell, Erin O.

Syst Biol ; 69(6): 1088-1105, 2020 11 01.

Article in English | MEDLINE | ID: mdl-32191335

ABSTRACT

Reconstructing the tree of life is an essential task in evolutionary biology. It demands accurate phylogenetic inference for both extant and extinct organisms, the latter being almost entirely dependent on morphological data. While parsimony methods have traditionally dominated the field of morphological phylogenetics, a rapidly growing number of studies are now employing probabilistic methods (maximum likelihood and Bayesian inference). The present-day toolkit of probabilistic methods offers varied software with distinct algorithms and assumptions for reaching global optimality. However, benchmark performance assessments of different software packages for the analyses of morphological data, particularly in the era of big data, are still lacking. Here, we test the performance of four major probabilistic software under variable taxonomic sampling and missing data conditions: the Bayesian inference-based programs MrBayes and RevBayes, and the maximum likelihood-based IQ-TREE and RAxML. We evaluated software performance by calculating the distance between inferred and true trees using a variety of metrics, including Robinson-Foulds (RF), Matching Splits (MS), and Kuhner-Felsenstein (KF) distances. Our results show that increased taxonomic sampling improves accuracy, precision, and resolution of reconstructed topologies across all tested probabilistic software applications and all levels of missing data. Under the RF metric, Bayesian inference applications were the most consistent, accurate, and robust to variation in taxonomic sampling in all tested conditions, especially at high levels of missing data, with little difference in performance between the two tested programs. The MS metric favored more resolved topologies that were generally produced by IQ-TREE. Adding more taxa dramatically reduced performance disparities between programs. 
Importantly, our results suggest that the RF metric penalizes incorrectly resolved nodes (false positives) more severely than the MS metric, which instead tends to penalize polytomies. If false positives are to be avoided in systematics, Bayesian inference should be preferred over maximum likelihood for the analysis of morphological data.
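
The behaviour of the RF metric noted in the abstract above can be seen in a small Python sketch. It computes the standard unnormalised Robinson-Foulds distance (the number of non-trivial bipartitions found in one tree but not the other) for trees written as nested tuples. The helper names and the five-taxon example trees are assumptions made for illustration and are not taken from the study.

```python
def leaves(tree):
    """Set of taxon names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # each split is stored as the side without this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if anchor in clade else clade
        # Skip trivial splits (empty, a single taxon, or all but one taxon).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: size of the symmetric difference of splits."""
    return len(splits(tree_a) ^ splits(tree_b))


true_tree = (("A", "B"), "C", ("D", "E"))
polytomy = (("A", "B"), "C", "D", "E")        # the D+E grouping left unresolved
misresolved = (("A", "B"), "D", ("C", "E"))   # resolved, but with the wrong grouping

print(rf_distance(true_tree, polytomy))     # one true split is missing
print(rf_distance(true_tree, misresolved))  # one split missing plus one false split
```

Under this metric the unresolved tree is one step from the true tree, whereas the incorrectly resolved tree is two steps away, because a wrong grouping counts both as a missing true split and as an extra false one.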

Subjects

Algorithms , Classification/methods , Computer Simulation , Phylogeny , Models, Biological

Delimitation of Alosa species (Teleostei: Clupeiformes) from the Sea of Azov: integrating morphological and molecular approaches.

Vernygora, Oksana V; Davis, Corey S; Murray, Alison M; Sperling, Felix A H.

J Fish Biol ; 93(6): 1216-1228, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30367487

ABSTRACT

Shads of the genus Alosa are essential to commercial fisheries across North America and Europe, but in some areas their species boundaries remain controversial. Traditional morphology-based taxonomy of Alosa spp. has relied heavily on the number of gill rakers and body proportions, but these can be highly variable. We use mitochondrial (mt)DNA (coI and cytb) and genome-wide single nucleotide polymorphisms (SNP) along with morphological characters to assess differentiation among endemic Ponto-Caspian shads in the Sea of Azov. Morphological species assignments based on gill-raker number were not congruent with genetic lineages shown by mtDNA and SNPs. Iterative analysis revealed that genetic lineages were associated with sampling location and several other morphometric traits (caudal peduncle depth, pre-anal length and head length). Phylogenetic analysis of the genus placed Ponto-Caspian Alosa spp. in the same evolutionary lineage as endangered Alosa spp. endemic to Greece, highlighting the importance of these findings to conservation management. We conclude that gill-raker number is not reliable for delimiting species of Alosa. This taxonomic uncertainty should be addressed by examining type material to provide a robust integrative classification for these commercially important fishes.

Subjects

Fishes/genetics , Animals , Biological Evolution , Body Size , Cytochromes b/chemistry , Cytochromes b/genetics , DNA, Mitochondrial/chemistry , Electron Transport Complex IV/chemistry , Electron Transport Complex IV/genetics , Fishes/anatomy & histology , Fishes/classification , Gills , Phylogeography , Polymorphism, Single Nucleotide , Species Specificity
