Search | VHL Regional Portal

1.

Modeling community standards for metadata as templates makes data FAIR.

Musen, Mark A; O'Connor, Martin J; Schultes, Erik; Martínez-Romero, Marcos; Hardi, Josef; Graybeal, John.

Sci Data ; 9(1): 696, 2022 11 12.

Article in English | MEDLINE | ID: mdl-36371407

ABSTRACT

It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets-both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.

2.

A Simple Standard for Sharing Ontological Mappings (SSSOM).

Matentzoglu, Nicolas; Balhoff, James P; Bello, Susan M; Bizon, Chris; Brush, Matthew; Callahan, Tiffany J; Chute, Christopher G; Duncan, William D; Evelo, Chris T; Gabriel, Davera; Graybeal, John; Gray, Alasdair; Gyori, Benjamin M; Haendel, Melissa; Harmse, Henriette; Harris, Nomi L; Harrow, Ian; Hegde, Harshad B; Hoyt, Amelia L; Hoyt, Charles T; Jiao, Dazhi; Jiménez-Ruiz, Ernesto; Jupp, Simon; Kim, Hyeongsik; Koehler, Sebastian; Liener, Thomas; Long, Qinqin; Malone, James; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra; Munoz-Torres, Monica C; Osumi-Sutherland, David; Overton, James A; Peters, Bjoern; Putman, Tim; Queralt-Rosinach, Núria; Shefchek, Kent; Solbrig, Harold; Thessen, Anne; Tudorache, Tania; Vasilevsky, Nicole; Wagner, Alex H; Mungall, Christopher J.

Database (Oxford) ; 20222022 05 25.

Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec.

Subject(s)

Metadata , Semantic Web , Data Management , Databases, Factual , Workflow

3.

Design of a FAIR digital data health infrastructure in Africa for COVID-19 reporting and research.

van Reisen, Mirjam; Oladipo, Francisca; Stokmans, Mia; Mpezamihgo, Mouhamed; Folorunso, Sakinat; Schultes, Erik; Basajja, Mariam; Aktau, Aliya; Amare, Samson Yohannes; Taye, Getu Tadele; Purnama Jati, Putu Hadi; Chindoza, Kudakwashe; Wirtz, Morgane; Ghardallou, Meriem; van Stam, Gertjan; Ayele, Wondimu; Nalugala, Reginald; Abdullahi, Ibrahim; Osigwe, Obinna; Graybeal, John; Medhanyie, Araya Abrha; Kawu, Abdullahi Abubakar; Liu, Fenghong; Wolstencroft, Katy; Flikkenschild, Erik; Lin, Yi; Stocker, Joëlle; Musen, Mark A.

Adv Genet (Hoboken) ; 2(2): e10050, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34514430

ABSTRACT

The limited volume of COVID-19 data from Africa raises concerns for global genome research, which requires a diversity of genotypes for accurate disease prediction, including on the provenance of the new SARS-CoV-2 mutations. The Virus Outbreak Data Network (VODAN)-Africa studied the possibility of increasing the production of clinical data, finding concerns about data ownership, and the limited use of health data for quality treatment at point of care. To address this, VODAN Africa developed an architecture to record clinical health data and research data collected on the incidence of COVID-19, producing these as human- and machine-readable data objects in a distributed architecture of locally governed, linked, human- and machine-readable data. This architecture supports analytics at the point of care and-through data visiting, across facilities-for generic analytics. An algorithm was run across FAIR Data Points to visit the distributed data and produce aggregate findings. The FAIR data architecture is deployed in Uganda, Ethiopia, Liberia, Nigeria, Kenya, Somalia, Tanzania, Zimbabwe, and Tunisia.

4.

Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.

Martínez-Romero, Marcos; O'Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Hardi, Josef; Graybeal, John; Musen, Mark A.

Database (Oxford) ; 20192019 01 01.

Article in English | MEDLINE | ID: mdl-31210270

ABSTRACT

Metadata-the machine-readable descriptions of the data-are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.

Subject(s)

Data Mining/methods , Data Mining/standards , Databases, Factual/standards , Metadata , Computational Biology/standards

5.

Unleashing the value of Common Data Elements through the CEDAR Workbench.

O'Connor, Martin J; Warzel, Denise B; Martínez-Romero, Marcos; Hardi, Josef; Willrett, Debra; Egyedi, Attila L; Eftekhari, Aras; Graybeal, John; Musen, Mark A.

AMIA Annu Symp Proc ; 2019: 681-690, 2019.

Article in English | MEDLINE | ID: mdl-32308863

ABSTRACT

Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs)-precise specifications of questions and the set of allowable answers to each question-are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot be easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third-parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.

Subject(s)

Biomedical Research , Common Data Elements , Data Collection/standards , Data Management/methods , Common Data Elements/standards , Data Management/standards , Humans , Internet , Metadata , National Institutes of Health (U.S.) , Registries , United States , User-Computer Interface

6.

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Bukhari, Syed Ahmad Chan; O'Connor, Martin J; Martínez-Romero, Marcos; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Rubelt, Florian; Cheung, Kei-Hoi; Kleinstein, Steven H.

Front Immunol ; 9: 1877, 2018.

Article in English | MEDLINE | ID: mdl-30166985

ABSTRACT

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.

Subject(s)

Computational Biology/methods , Databases, Nucleic Acid , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , Software , Computational Biology/organization & administration , Data Mining , Gene Ontology , Humans , Metadata , Reproducibility of Results , User-Computer Interface , Workflow

7.

CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata.

Bukhari, Syed Ahmad Chan; Martínez-Romero, Marcos; O' Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Cheung, Kei-Hoi; Kleinstein, Steven H.

BMC Bioinformatics ; 19(1): 268, 2018 07 16.

Article in English | MEDLINE | ID: mdl-30012108

ABSTRACT

BACKGROUND: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. RESULTS: This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CONCLUSION: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.

Subject(s)

Biological Ontologies , Internet , Metadata , Software , Algorithms , Humans

8.

NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.

Martínez-Romero, Marcos; Jonquet, Clement; O'Connor, Martin J; Graybeal, John; Pazos, Alejandro; Musen, Mark A.

J Biomed Semantics ; 8(1): 21, 2017 Jun 07.

Article in English | MEDLINE | ID: mdl-28592275

ABSTRACT

BACKGROUND: Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. METHODS: We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. RESULTS: Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. CONCLUSIONS: Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender , and via a Web service API).

Subject(s)

Biological Ontologies , National Institutes of Health (U.S.) , Semantics , United States

9.

Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.

Martínez-Romero, Marcos; O'Connor, Martin J; Shankar, Ravi D; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L; Gevaert, Olivier; Graybeal, John; Musen, Mark A.

AMIA Annu Symp Proc ; 2017: 1272-1281, 2017.

Article in English | MEDLINE | ID: mdl-29854196

ABSTRACT

In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

Subject(s)

Metadata , Biological Ontologies , Biomedical Research , Data Accuracy , Data Analysis , Metadata/standards , Methods

10.

The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments.

Gonçalves, Rafael S; O'Connor, Martin J; Martínez-Romero, Marcos; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A.

Semant Web ISWC ; 10588: 103-110, 2017 Oct.

Article in English | MEDLINE | ID: mdl-32219223

ABSTRACT

The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed-the CEDAR Workbench-is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.

11.

Human ApoE Îµ4 alters circadian rhythm activity, IL-1ß, and GFAP in CRND8 mice.

Graybeal, John J; Bozzelli, P Lorenzo; Graybeal, Lacey L; Groeber, Caitlin M; McKnight, Patrick E; Cox, Daniel N; Flinn, Jane M.

J Alzheimers Dis ; 43(3): 823-34, 2015.

Article in English | MEDLINE | ID: mdl-25159669

ABSTRACT

Disruptions to daily living, inflammation, and astrogliosis are characteristics of Alzheimer's disease. Thus, circadian rhythms, nest construction, IL-1ß and TNF-α, and glial fibrillary acidic protein (GFAP) were examined in a mouse model developed to model late-onset Alzheimer's disease-the most common form of the disease. Mice carrying both the mutated human AßPP transgene found in the CRND8 mouse and the human apolipoprotein E Îµ4 allele (CRND8/E4) were compared with CRND8 mice and wildtype (WT) mice. Circadian rhythms were evaluated by wheel-running behavior. Activity of daily living was measured by nest construction. This study then examined mRNA levels of the inflammatory cytokines IL-1ß and TNF-α as well as protein levels of GFAP. Behavioral outcomes were then correlated with cytokines and GFAP. Compared to WT controls, both CRND8 and CRND8/E4 mice showed significantly more frequent, but shorter, bouts of activity. In the three groups, the CRND8/E4 mice had intermediate disruptions in circadian rhythms. Both CRND8/E4 mice and CRND8 mice showed significant impairments in nesting behavior compared to WTs. While CRND8 mice expressed significantly increased IL-1ß and GFAP expression compared to WT controls, CRND8/E4 mice expressed intermediate IL-1ß and GFAP levels. Significant correlations between IL-1ß, GFAP, and behavior were observed. These data are congruent with other studies showing that human ApoE Îµ4 is protective early in life in transgenic mice modeling Alzheimer's disease.

Subject(s)

Apolipoprotein E4/genetics , Brain/metabolism , Circadian Rhythm/physiology , Glial Fibrillary Acidic Protein/metabolism , Interleukin-1beta/metabolism , Amyloid beta-Protein Precursor/genetics , Animals , Humans , Interleukin-1beta/genetics , Mice , Mice, Transgenic , Nesting Behavior/physiology , RNA, Messenger/genetics , RNA, Messenger/metabolism , Running/physiology , Tumor Necrosis Factor-alpha/genetics , Tumor Necrosis Factor-alpha/metabolism

12.

The effect of motion on pulse oximetry and its clinical significance.

Petterson, Michael T; Begnoche, Valerie L; Graybeal, John M.

Anesth Analg ; 105(6 Suppl): S78-S84, 2007 Dec.

Article in English | MEDLINE | ID: mdl-18048903

ABSTRACT

Pulse oximetry is an important diagnostic and patient monitoring tool. However, motion can induce considerable error into pulse oximetry accuracy, resulting in loss of data, inaccurate readings, and false alarms. We will discuss how motion artifact affects pulse oximetry accuracy, the clinical consequences of motion artifact, and the methods used by various technologies to minimize the impact of the motion noise.

Subject(s)

Artifacts , Movement , Oximetry , Oxygen/blood , Signal Processing, Computer-Assisted , Equipment Design , Humans , Models, Cardiovascular , Motion , Oximetry/instrumentation , Reproducibility of Results

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL