Search | VHL Regional Portal

Local data commons: the sleeping beauty in the community of data commons.

Jeong, Jong Cheol; Hands, Isaac; Kolesar, Jill M; Rao, Mahadev; Davis, Bront; Dobyns, York; Hurt-Mueller, Joseph; Levens, Justin; Gregory, Jenny; Williams, John; Witt, Lisa; Kim, Eun Mi; Burton, Carlee; Elbiheary, Amir A; Chang, Mingguang; Durbin, Eric B.

BMC Bioinformatics; 23(Suppl 12): 386, 2022 Sep 23.

Article in English | MEDLINE | ID: mdl-36151511

ABSTRACT

BACKGROUND: Public data commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. Local data commons (LDC), located within an institution or organization, have by contrast been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs are able to collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; as consumers of PDC, they also face data harmonization problems stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons focus exclusively on PDC and provide very little information on LDC.

RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models, defined by their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed: cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of the commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples.

CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to routine data storage and transmission tasks because they lack information on how to design, develop, and improve their services using limited resources. We hope that this work will be a first small step in raising awareness among LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience.

Subject(s)

Big Data, Information Dissemination, Developing Countries, Humans

Web-based interactive mapping from data dictionaries to ontologies, with an application to cancer registry.

Tao, Shiqiang; Zeng, Ningzhou; Hands, Isaac; Hurt-Mueller, Joseph; Durbin, Eric B; Cui, Licong; Zhang, Guo-Qiang.

BMC Med Inform Decis Mak; 20(Suppl 10): 271, 2020 Dec 15.

Article in English | MEDLINE | ID: mdl-33319710

ABSTRACT

BACKGROUND: The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Like all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by the North American Association of Central Cancer Registries (NAACCR) for standardized data entry. The NAACCR data dictionary is not an ontological system. Mapping between the NAACCR data dictionary and the National Cancer Institute (NCI) Thesaurus (NCIt) will facilitate the enrichment, dissemination, and utilization of cancer registry data. We introduce a web-based system, called Interactive Mapping Interface (IMI), for creating mappings from data dictionaries to ontologies, in particular from NAACCR to NCIt.

METHOD: IMI has been designed as a general approach with three components: (1) ontology library; (2) mapping interface; and (3) recommendation engine. The ontology library provides a list of ontologies as targets for building mappings. The mapping interface consists of six modules: project management, mapping dashboard, access control, logs and comments, hierarchical visualization, and result review and export. The built-in recommendation engine automatically identifies a list of candidate concepts to facilitate the mapping process.

RESULTS: We report the architecture design and interface features of IMI. To validate our approach, we implemented an IMI prototype and pilot-tested its features by mapping a sample set of NAACCR data elements to NCIt concepts. Of 301 NAACCR data elements, 47 have been mapped to NCIt concepts. Five branches of the hierarchical tree have been identified from these mapped concepts for visual inspection.

CONCLUSIONS: IMI provides an interactive, web-based interface for building mappings from data dictionaries to ontologies. Although the scope of our pilot testing is limited, our results demonstrate the feasibility of using IMI for semantic enrichment of cancer registry data by mapping NAACCR data elements to NCIt concepts.
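The abstract does not say how the IMI recommendation engine ranks candidate concepts. As a rough illustration only, the sketch below ranks ontology labels by plain string similarity to a data element name, using Python's standard `difflib`. The concept identifiers (`DEMO:1` and so on), the labels, the `recommend` function, and the `top_k` and `min_score` parameters are all hypothetical; they are not taken from IMI, NAACCR, or NCIt.

```python
from difflib import SequenceMatcher

# Hypothetical target concepts. The identifiers are placeholders,
# not real NCIt codes, and the labels are illustrative only.
CONCEPTS = {
    "DEMO:1": "Primary Site",
    "DEMO:2": "Date of Diagnosis",
    "DEMO:3": "Tumor Grade",
    "DEMO:4": "Laterality",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recommend(element_name, concepts, top_k=3, min_score=0.3):
    """Return up to top_k (concept_id, label, score) candidates, best first.

    Assumption: candidates are ranked by label similarity alone. A real
    engine might also use synonyms, definitions, or the ontology hierarchy.
    """
    scored = [
        (similarity(element_name, label), concept_id, label)
        for concept_id, label in concepts.items()
    ]
    # Drop weak matches, then sort by descending score (ties broken by id).
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(cid, label, round(score, 3)) for score, cid, label in scored[:top_k]]


if __name__ == "__main__":
    for candidate in recommend("Date of Diagnosis", CONCEPTS):
        print(candidate)
```

A curator would still review each suggestion in the mapping interface. The score threshold here only trims the list of candidates and is not a substitute for that manual confirmation.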

Subject(s)

Biological Ontologies, Neoplasms, Canada/epidemiology, Humans, Internet, Neoplasms/diagnosis, Neoplasms/epidemiology, Registries, Vocabulary, Controlled
