Search | VHL Regional Portal

1.

A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx).

Vaidya, Gaurav; Cellinese, Nico; Lapp, Hilmar.

PeerJ ; 10: e12618, 2022.

Article in English | MEDLINE | ID: mdl-35186448

ABSTRACT

To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.

Subject(s)

Semantics , Software , Phylogeny , Biology , Records

2.

COVID-KOP: integrating emerging COVID-19 data with the ROBOKOP database.

Korn, Daniel; Bobrowski, Tesia; Li, Michael; Kebede, Yaphet; Wang, Patrick; Owen, Phillips; Vaidya, Gaurav; Muratov, Eugene; Chirkova, Rada; Bizon, Chris; Tropsha, Alexander.

Bioinformatics ; 37(4): 586-587, 2021 05 01.

Article in English | MEDLINE | ID: mdl-33175089

ABSTRACT

SUMMARY: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action. AVAILABILITY AND IMPLEMENTATION: COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.

Subject(s)

COVID-19 , Databases, Factual , Humans , Knowledge Bases , Pandemics , SARS-CoV-2

3.

Pitch-rotational manipulation of single cells and particles using single-beam thermo-optical tweezers.

Kumar, Sumeet; Gunaseelan, M; Vaippully, Rahul; Kumar, Amrendra; Ajith, Mithun; Vaidya, Gaurav; Dutta, Soumya; Roy, Basudev.

Biomed Opt Express ; 11(7): 3555-3566, 2020 Jul 01.

Article in English | MEDLINE | ID: mdl-33014551

ABSTRACT

3D pitch rotation of microparticles and cells assumes importance in a wide variety of applications in biology, physics, chemistry and medicine. Applications such as cell imaging and injection benefit from pitch-rotational manipulation. Generation of such motion in single beam optical tweezers has remained elusive due to the complexities of generating high enough ellipticity perpendicular to the direction of propagation. Further, trapping a perfectly spherical object at two locations and subsequent pitch rotation hasn't yet been demonstrated to be possible. Here, we use hexagonal-shaped upconverting particles and single cells trapped close to a gold-coated glass cover slip in a sample chamber to generate complete 360 degree and continuous pitch motion even with a single optical tweezer beam. The tweezers beam passing through the gold surface is partially absorbed and generates a hot-spot to produce circulatory convective flows in the vicinity which rotates the objects. The rotation rate can be controlled by the intensity of the laser light. Thus such a simple configuration can turn the particle in the pitch sense. The circulatory flows in this technique have a diameter of about 5 µm which is smaller than those reported using acousto-fluidic techniques.

4.

COVID-KOP: Integrating Emerging COVID-19 Data with the ROBOKOP Database.

Korn, Daniel; Bobrowski, Tesia; Li, Michael; Kebede, Yaphet; Wang, Patrick; Owen, Phillips; Vaidya, Gaurav; Muratov, Eugene; Chirkova, Rada; Bizon, Chris; Tropsha, Alexander.

ChemRxiv ; 2020 Jun 18.

Article in English | MEDLINE | ID: mdl-32601612

ABSTRACT

In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing ROBOKOP biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to test new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19. COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.

5.

Developing a vocabulary and ontology for modeling insect natural history data: example data, use cases, and competency questions.

Stucky, Brian J; Balhoff, James P; Barve, Narayani; Barve, Vijay; Brenskelle, Laura; Brush, Matthew H; Dahlem, Gregory A; Gilbert, James D J; Kawahara, Akito Y; Keller, Oliver; Lucky, Andrea; Mayhew, Peter J; Plotkin, David; Seltmann, Katja C; Talamas, Elijah; Vaidya, Gaurav; Walls, Ramona; Yoder, Matt; Zhang, Guanyang; Guralnick, Rob.

Biodivers Data J ; 7: e33303, 2019.

Article in English | MEDLINE | ID: mdl-30918448

ABSTRACT

Insects are possibly the most taxonomically and ecologically diverse class of multicellular organisms on Earth. Consequently, they provide nearly unlimited opportunities to develop and test ecological and evolutionary hypotheses. Currently, however, large-scale studies of insect ecology, behavior, and trait evolution are impeded by the difficulty in obtaining and analyzing data derived from natural history observations of insects. These data are typically highly heterogeneous and widely scattered among many sources, which makes developing robust information systems to aggregate and disseminate them a significant challenge. As a step towards this goal, we report initial results of a new effort to develop a standardized vocabulary and ontology for insect natural history data. In particular, we describe a new database of representative insect natural history data derived from multiple sources (but focused on data from specimens in biological collections), an analysis of the abstract conceptual areas required for a comprehensive ontology of insect natural history data, and a database of use cases and competency questions to guide the development of data systems for insect natural history data. We also discuss data modeling and technology-related challenges that must be overcome to implement robust integration of insect natural history data.

6.

The tempo and mode of the taxonomic correction process: How taxonomists have corrected and recorrected North American bird species over the last 127 years.

Vaidya, Gaurav; Lepage, Denis; Guralnick, Robert.

PLoS One ; 13(4): e0195736, 2018.

Article in English | MEDLINE | ID: mdl-29672539

ABSTRACT

While studies of taxonomy usually focus on species description, there is also a taxonomic correction process that retests and updates existing species circumscriptions on the basis of new evidence. These corrections may themselves be subsequently retested and recorrected. We studied this correction process by using the Check-List of North and Middle American Birds, a well-known taxonomic checklist that spans 130 years. We identified 142 lumps and 95 splits across sixty-three versions of the Check-List and found that while lumping rates have markedly decreased since the 1970s, splitting rates are accelerating. We found that 74% of North American bird species recognized today have never been corrected (i.e., lumped or split) over the period of the checklist, while 16% have been corrected exactly once and 10% have been corrected twice or more. Since North American bird species are known to have been extensively lumped in the first half of the 20th century with the advent of the biological species concept, we determined whether most splits seen today were the result of those lumps being recorrected. We found that 5% of lumps and 23% of splits fully reverted previous corrections, while a further 3% of lumps and 13% of splits are partial reversions. These results show a taxonomic correction process with moderate levels of recorrection, particularly of previous lumps. However, 81% of corrections do not revert any previous corrections, suggesting that the majority result in novel circumscriptions not previously recognized by the Check-List. We could find no order or family with a significantly higher rate of correction than any other, but twenty-two genera as currently recognized by the AOU do have significantly higher rates than others. Given the currently accelerating rate of splitting, prediction of the end-point of the taxonomic recorrection process is difficult, and many entirely new taxonomic concepts are still being, and likely will continue to be, proposed and further tested.

Subject(s)

Biodiversity , Birds/classification , Animals , North America

7.

Avibase - a database system for managing and organizing taxonomic concepts.

Lepage, Denis; Vaidya, Gaurav; Guralnick, Robert.

Zookeys ; (420): 117-35, 2014.

Article in English | MEDLINE | ID: mdl-25061375

ABSTRACT

Scientific names of biological entities offer an imperfect resolution of the concepts that they are intended to represent. Often they are labels applied to entities ranging from entire populations to individual specimens representing those populations, even though such names only unambiguously identify the type specimen to which they were originally attached. Thus the real-life referents of names are constantly changing as biological circumscriptions are redefined and thereby alter the sets of individuals bearing those names. This problem is compounded by other characteristics of names that make them ambiguous identifiers of biological concepts, including emendations, homonymy and synonymy. Taxonomic concepts have been proposed as a way to address issues related to scientific names, but they have yet to receive broad recognition or implementation. Some efforts have been made towards building systems that address these issues by cataloguing and organizing taxonomic concepts, but most are still in conceptual or proof-of-concept stage. We present the on-line database Avibase as one possible approach to organizing taxonomic concepts. Avibase has been successfully used to describe and organize 844,000 species-level and 705,000 subspecies-level taxonomic concepts across every major bird taxonomic checklist of the last 125 years. The use of taxonomic concepts in place of scientific names, coupled with efficient resolution services, is a major step toward addressing some of the main deficiencies in the current practices of scientific name dissemination and use.

8.

Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

Stoltzfus, Arlin; Lapp, Hilmar; Matasci, Naim; Deus, Helena; Sidlauskas, Brian; Zmasek, Christian M; Vaidya, Gaurav; Pontelli, Enrico; Cranston, Karen; Vos, Rutger; Webb, Campbell O; Harmon, Luke J; Pirrung, Megan; O'Meara, Brian; Pennell, Matthew W; Mirarab, Siavash; Rosenberg, Michael S; Balhoff, James P; Bik, Holly M; Heath, Tracy A; Midford, Peter E; Brown, Joseph W; McTavish, Emily Jane; Sukumaran, Jeet; Westneat, Mark; Alfaro, Michael E; Steele, Aaron; Jordan, Greg.

BMC Bioinformatics ; 14: 158, 2013 May 13.

Article in English | MEDLINE | ID: mdl-23668630

ABSTRACT

BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.

Subject(s)

Phylogeny , Software , Internet

9.

From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks.

Thomer, Andrea; Vaidya, Gaurav; Guralnick, Robert; Bloom, David; Russell, Laura.

Zookeys ; (209): 235-53, 2012.

Article in English | MEDLINE | ID: mdl-22859891

ABSTRACT

Part diary, part scientific record, biological field notebooks often contain details necessary to understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these recordsets were vetted, to provide valid taxon names, via a process we call "taxonomic referencing." The result is identification and mobilization of 1,068 observations from three of Henderson's thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn. "Compose your notes as if you were writing a letter to someone a century in the future." Perrine and Patton (2011).

10.

Is the COI barcoding gene involved in speciation through intergenomic conflict?

Kwong, Shiyang; Srivathsan, Amrita; Vaidya, Gaurav; Meier, Rudolf.

Mol Phylogenet Evol ; 62(3): 1009-12, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22182989

ABSTRACT

We here test the proposition that changes in the barcoding region of COI are commonly involved in speciation through intergenomic conflict. We demonstrate that this is unlikely given that even with incomplete taxon sampling, 78-90% of closely-related animal species have identical COI amino acid sequences. In addition, in those cases where amino acid substitutions between closely related species are observed, the inter- and intra-specific substitution patterns are very similar and/or lack consistent differences in the number, position and type of amino acid change. Overall, we conclude that there is little evidence for a widespread involvement of the barcoding gene in speciation.

Subject(s)

DNA Barcoding, Taxonomic , Electron Transport Complex IV/genetics , Amino Acid Substitution , DNA, Mitochondrial , Genetic Variation , Species Specificity

11.

SequenceMatrix: concatenation software for the fast assembly of multi-gene datasets with character set and codon information.

Vaidya, Gaurav; Lohman, David J; Meier, Rudolf.

Cladistics ; 27(2): 171-180, 2011 Apr.

Article in English | MEDLINE | ID: mdl-34875773

ABSTRACT

We present SequenceMatrix, software that is designed to facilitate the assembly and analysis of multi-gene datasets. Genes are concatenated by dragging and dropping FASTA, NEXUS, or TNT files with aligned sequences into the program window. A multi-gene dataset is concatenated and displayed in a spreadsheet; each sequence is represented by a cell that provides information on sequence length, number of indels, the number of ambiguous bases ("Ns"), and the availability of codon information. Alternatively, GenBank numbers for the sequences can be displayed and exported. Matrices with hundreds of genes and taxa can be concatenated within minutes and exported in TNT, NEXUS, or PHYLIP formats, preserving both character set and codon information for TNT and NEXUS files. SequenceMatrix also creates taxon sets listing taxa with a minimum number of characters or gene fragments, which helps assess preliminary datasets. Entire taxa, whole gene fragments, or individual sequences for a particular gene and species can be excluded from export. Data matrices can be re-split into their component genes and the gene fragments can be exported as individual gene files. SequenceMatrix also includes two tools that help to identify sequences that may have been compromised through laboratory contamination or data management error. One tool lists identical or near-identical sequences within genes, while the other compares the pairwise distance pattern of one gene against the pattern for all remaining genes combined. SequenceMatrix is Java-based and compatible with the Microsoft Windows, Apple MacOS X and Linux operating systems. The software is freely available from http://code.google.com/p/sequencematrix/. © The Willi Hennig Society 2010.
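The concatenation step described in this abstract can be illustrated with a minimal Python sketch. This is not SequenceMatrix's own code (the software is Java-based, and its internals are not given here). The function name `concatenate`, the nested-dictionary input layout, and the use of `?` to pad taxa missing from a gene are all assumptions made for illustration.

```python
def concatenate(genes):
    """Concatenate aligned per-gene sequences into one matrix.

    `genes` maps a gene name to a dict of taxon name -> aligned sequence
    (hypothetical input layout). Returns the concatenated matrix and the
    1-based, inclusive character-set range occupied by each gene.
    """
    # Every taxon seen in any gene gets a row in the final matrix.
    taxa = sorted({taxon for seqs in genes.values() for taxon in seqs})
    matrix = {taxon: "" for taxon in taxa}
    charsets = {}
    start = 1
    for name, seqs in genes.items():
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {name!r} are not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with '?'.
            matrix[taxon] += seqs.get(taxon, "?" * length)
        charsets[name] = (start, start + length - 1)
        start += length
    return matrix, charsets


genes = {
    "COI": {"A": "ACGT", "B": "ACGA"},
    "16S": {"A": "GG-", "C": "GGT"},
}
matrix, charsets = concatenate(genes)
```

In this toy input, taxon B has no 16S sequence and taxon C has no COI sequence, so both rows are padded to the full matrix width, while `charsets` records which columns belong to which gene.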

12.

DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success.

Meier, Rudolf; Shiyang, Kwong; Vaidya, Gaurav; Ng, Peter K L.

Syst Biol ; 55(5): 715-28, 2006 Oct.

Article in English | MEDLINE | ID: mdl-17060194

ABSTRACT

DNA barcoding and DNA taxonomy have recently been proposed as solutions to the crisis of taxonomy and received significant attention from scientific journals, grant agencies, natural history museums, and mainstream media. Here, we test two key claims of molecular taxonomy using 1333 mitochondrial COI sequences for 449 species of Diptera. We investigate whether sequences can be used for species identification ("DNA barcoding") and find a relatively low success rate (< 70%) based on tree-based and newly proposed species identification criteria. Misidentifications are due to wide overlap between intra- and interspecific genetic variability, which causes 6.5% of all query sequences to have allospecific or a mixture of allo- and conspecific (3.6%) best-matching barcodes. Even when two COI sequences are identical, there is a 6% chance that they belong to different species. We also find that 21% of all species lack unique barcodes when consensus sequences of all conspecific sequences are used. Lastly, we test whether DNA sequences yield an unambiguous species-level taxonomy when sequence profiles are assembled based on pairwise distance thresholds. We find many sequence triplets for which two of the three pairwise distances remain below the threshold, whereas the third exceeds it; i.e., it is impossible to consistently delimit species based on pairwise distances. Furthermore, for species profiles based on a 3% threshold, only 47% of all profiles are consistent with currently accepted species limits, 20% contain more than one species, and 33% only some sequences from one species; i.e., adopting such a DNA taxonomy would require the redescription of a large proportion of the known species, thus worsening the taxonomic impediment. We conclude with an outlook on the prospects of obtaining complete barcode databases and the future use of DNA sequences in a modern integrative taxonomy.

Subject(s)

Diptera/classification , Genetic Variation , Animals , Base Sequence , Classification/methods , Consensus Sequence , DNA, Mitochondrial/chemistry , Diptera/genetics , Electron Transport Complex IV/chemistry , Electron Transport Complex IV/genetics , Phylogeny , Sequence Analysis, DNA , Species Specificity
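The triplet problem described in the abstract above (two of three pairwise distances fall below the threshold while the third exceeds it) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' method: the uncorrected p-distance, the helper names `p_distance` and `inconsistent_triplets`, and the choice to count a distance equal to the threshold as "below" are all assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inconsistent_triplets(seqs, threshold):
    """Return triplets where exactly two pairwise distances are within the threshold.

    Such triplets cannot be grouped consistently: A clusters with B and B with C,
    yet A and C are too distant to belong to the same cluster.
    """
    found = []
    for trio in combinations(sorted(seqs), 3):
        distances = [p_distance(seqs[x], seqs[y]) for x, y in combinations(trio, 2)]
        # Assumption: a distance equal to the threshold counts as "below".
        if sum(d <= threshold for d in distances) == 2:
            found.append(trio)
    return found


# Toy sequences (hypothetical data, far shorter than a real COI barcode).
seqs = {
    "a": "AAAAAAAAAA",
    "b": "AAAAAAAATT",
    "c": "AAAAAATTTT",
}
triplets = inconsistent_triplets(seqs, threshold=0.3)
```

With these toy sequences, a and b differ at 2 of 10 sites, b and c at 2 of 10, and a and c at 4 of 10, so a 0.3 threshold links a to b and b to c but not a to c. The threshold here is scaled up for the short toy alignment and is not the 3% value discussed in the abstract.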
