Search | VHL Regional Portal

The Glycan Structure Dictionary: a dictionary describing commonly used glycan structure terms.

Vora, Jeet; Navelkar, Rahi; Vijay-Shanker, K; Edwards, Nathan; Martinez, Karina; Ding, Xiying; Wang, Tianyi; Su, Peng; Ross, Karen; Lisacek, Frederique; Hayes, Catherine; Kahsay, Robel; Ranzinger, Rene; Tiemeyer, Michael; Mazumder, Raja.

Glycobiology ; 33(5): 354-357, 2023 06 03.

Article in English | MEDLINE | ID: mdl-36799723

ABSTRACT

Recent technological advances in glycobiology have resulted in a large influx of data and the publication of many papers describing discoveries in glycoscience. However, the terms used in describing glycan structural features are not standardized, making it difficult to harmonize data across biomolecular databases, hampering the harvesting of information across studies and hindering text mining and curation efforts. To address this shortcoming, the Glycan Structure Dictionary has been developed as a reference dictionary to provide a standardized list of widely used glycan terms that can help in the curation and mapping of glycan structures described in publications. Currently, the dictionary has 190 glycan structure terms with 297 synonyms linked to 3,332 publications. For a term to be included in the dictionary, it must be present in at least 2 peer-reviewed publications. Synonyms, annotations, and cross-references to GlyTouCan, GlycoMotif, and other relevant databases and resources are also provided when available. The purpose of this effort is to facilitate biocuration, assist in the development of text mining tools, improve the harmonization of search and browse capabilities in glycoinformatics resources, and help to map glycan structures to function and disease. It is also expected that authors will use these terms to describe glycan structures in their manuscripts over time. A mechanism is also provided for researchers to submit terms for potential incorporation. The dictionary is available at https://wiki.glygen.org/Glycan_structure_dictionary.

Subject(s)

Data Mining , Polysaccharides , Data Mining/methods , Databases, Factual , Polysaccharides/chemistry , Glycomics/methods

Enhancing the interoperability of glycan data flow between ChEBI, PubChem and GlyGen.

Navelkar, Rahi; Owen, Gareth; Mutherkrishnan, Venkatesh; Thiessen, Paul; Cheng, Tiejun; Bolton, Evan; Edwards, Nathan; Tiemeyer, Michael; Campbell, Matthew P; Martin, Maria; Vora, Jeet; Kahsay, Robel; Mazumder, Raja.

Glycobiology ; 31(11): 1510-1519, 2021 12 18.

Article in English | MEDLINE | ID: mdl-34314492

ABSTRACT

Glycans play a vital role in health, disease, bioenergy, biomaterials and bio-therapeutics. As a result, there is keen interest to identify and increase glycan data in bioinformatics databases like ChEBI and PubChem, and connecting them to resources at the EMBL-EBI and NCBI to facilitate access to important annotations at a global level. GlyTouCan is a comprehensive archival database that contains glycans obtained primarily through batch upload from glycan repositories, glycoprotein databases and individual laboratories. In many instances, the glycan structures deposited in GlyTouCan may not be fully defined or have supporting experimental evidence and citations. Databases like ChEBI and PubChem were designed to accommodate complete atomistic structures with well-defined chemical linkages. As a result, they cannot easily accommodate the structural ambiguity inherent in glycan databases. Consequently, there is a need to improve the organization of glycan data coherently to enhance connectivity across the major NCBI, EMBL-EBI and glycoscience databases. This paper outlines a workflow developed in collaboration between GlyGen, ChEBI and PubChem to improve the visibility and connectivity of glycan data across these resources. GlyGen hosts a subset of glycans (~29,000) from the GlyTouCan database and has submitted valuable glycan annotations to the PubChem database and integrated over 10,500 (including ambiguously defined) glycans into the ChEBI database. The integrated glycans were prioritized based on links to PubChem and connectivity to glycoprotein data. The pipeline provides a blueprint for how glycan data can be harmonized between different resources. The current PubChem, ChEBI and GlyTouCan mappings can be downloaded from GlyGen (https://data.glygen.org).

Subject(s)

Databases, Chemical , Glycoproteins/chemistry , Polysaccharides/chemistry , Software , Carbohydrate Conformation , Glycomics

Bioinformatics tools developed to support BioCompute Objects.

Patel, Janisha A; Dean, Dennis A; King, Charles Hadley; Xiao, Nan; Koc, Soner; Minina, Ekaterina; Golikov, Anton; Brooks, Phillip; Kahsay, Robel; Navelkar, Rahi; Ray, Manisha; Roberson, Dave; Armstrong, Chris; Mazumder, Raja; Keeney, Jonathon.

Database (Oxford) ; 2021, 2021 03 30.

Article in English | MEDLINE | ID: mdl-33784373

ABSTRACT

Developments in high-throughput sequencing (HTS) result in an exponential increase in the amount of data generated by sequencing experiments, an increase in the complexity of bioinformatics analysis reporting and an increase in the types of data generated. These increases in volume, diversity and complexity of the data generated and their analysis expose the necessity of a structured and standardized reporting template. BioCompute Objects (BCOs) provide the requisite support for communication of HTS data analysis that includes support for workflow, as well as data, curation, accessibility and reproducibility of communication. BCOs standardize how researchers report provenance and the established verification and validation protocols used in workflows while also being robust enough to convey content integration or curation in knowledge bases. BCOs that encapsulate tools, platforms, datasets and workflows are FAIR (findable, accessible, interoperable and reusable) compliant. Providing operational workflow and data information facilitates interoperability between platforms and incorporation of future datasets within an HTS analysis for use within industrial, academic and regulatory settings. Cloud-based platforms, including High-performance Integrated Virtual Environment (HIVE), Cancer Genomics Cloud (CGC) and Galaxy, support BCO generation for users. Given the 100K+ userbase between these platforms, BioCompute can be leveraged for workflow documentation. In this paper, we report the availability of platform-dependent and platform-independent BCO tools: HIVE BCO App, CGC BCO App, Galaxy BCO API Extension and BCO Portal. Community engagement was utilized to evaluate tool efficacy. We demonstrate that these tools further advance BCO creation from text editing approaches used in earlier releases of the standard. Moreover, we demonstrate that integrating BCO generation within existing analysis platforms greatly streamlines BCO creation while capturing granular workflow details. We also demonstrate that the BCO tools described in the paper provide an approach to solve the long-standing challenge of standardizing workflow descriptions that are both human and machine readable while accommodating manual and automated curation with evidence tagging. Database URL: https://www.biocomputeobject.org/resources.

Subject(s)

Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Software , Workflow

GlyGen data model and processing workflow.

Kahsay, Robel; Vora, Jeet; Navelkar, Rahi; Mousavi, Reza; Fochtman, Brian C; Holmes, Xavier; Pattabiraman, Nagarajan; Ranzinger, Rene; Mahadik, Rupali; Williamson, Tatiana; Kulkarni, Sujeet; Agarwal, Gaurav; Martin, Maria; Vasudev, Preethi; Garcia, Leyla; Edwards, Nathan; Zhang, Wenjin; Natale, Darren A; Ross, Karen; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P; York, William S; Mazumder, Raja.

Bioinformatics ; 36(12): 3941-3943, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32324859

ABSTRACT

SUMMARY: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources. AVAILABILITY AND IMPLEMENTATION: GlyGen web portal is freely available to access at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under license GNU General Public License version 3 (GNU GPLv3) and is available on GitHub https://github.com/glygener. The datasets are made available under Creative Commons Attribution 4.0 International (CC BY 4.0) license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Knowledge Bases , Software , Glycomics , Information Storage and Retrieval , Workflow

GlyGen: Computational and Informatics Resources for Glycoscience.

York, William S; Mazumder, Raja; Ranzinger, Rene; Edwards, Nathan; Kahsay, Robel; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P; Cummings, Richard D; Feizi, Ten; Martin, Maria; Natale, Darren A; Packer, Nicolle H; Woods, Robert J; Agarwal, Gaurav; Arpinar, Sena; Bhat, Sanath; Blake, Judith; Castro, Leyla Jael Garcia; Fochtman, Brian; Gildersleeve, Jeffrey; Goldman, Radoslav; Holmes, Xavier; Jain, Vinamra; Kulkarni, Sujeet; Mahadik, Rupali; Mehta, Akul; Mousavi, Reza; Nakarakommula, Sandeep; Navelkar, Rahi; Pattabiraman, Nagarajan; Pierce, Michael J; Ross, Karen; Vasudev, Preethi; Vora, Jeet; Williamson, Tatiana; Zhang, Wenjin.

Glycobiology ; 30(2): 72-73, 2020 01 28.

Article in English | MEDLINE | ID: mdl-31616925

Subject(s)

Computational Biology , Polysaccharides , Software , Polysaccharides/chemistry , Polysaccharides/genetics , Polysaccharides/metabolism
