Data harmonization for COVID-19 and cancer research registries
Clinical Cancer Research ; 26(18 SUPPL), 2020.
Article in English | EMBASE | ID: covidwho-992046
ABSTRACT

Introduction:

The need to rapidly collect, integrate, and share data on COVID-19 patients with cancer at scale has given rise to multiple internal and cross-institutional research registries. These registries support use cases that require data at different levels of granularity and are built using mixed standards. Ensuring semantic interoperability and quality of this data is critical for generating reliable and reproducible evidence. At MSK, we created a framework that enabled the rapid development of semantically compatible COVID and cancer registries and data exchange.

Background:

Handling and harmonizing real-world data for COVID and cancer research presented typical challenges: maintaining complex patient cohorts; reconciling different levels of temporal and semantic granularity; supporting crosswalks between different representations without information loss; and sharing the data internally and with research consortia. Solving these challenges for COVID and cancer studies necessitated advanced infrastructure and harmonization solutions.

Methods:

We used MSK Extract, our research platform, to create an integrated COVID and cancer data research framework. It included a library of reusable standardized REDCap used in multiple REDCap instances supporting individual research studies; a PostgreSQL database containing patient cohorts and data from Electronic Health Records (EHR) standardized to OMOP; and ETL pipelines. Our approach to the REDCap design and data management allowed for combined sets of detailed, atomic, and aggregate-level data through a combination of abstraction, curation, and extraction of data from different sources. We developed a reconciliation methodology between initial curation, available raw data, and the subsequent abstraction. We enforced consistent temporal constraints on data extraction and curation. We used the OMOP vocabulary for semantic harmonization, mapping metadata from internal and external registries to OMOP concepts. We linked procedure and medication codes to high-level treatment groups, leveraging classifications available in the OMOP vocabulary.
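
As an illustration of linking codes to high-level treatment groups, the following is a minimal sketch, not the framework's actual implementation. It assumes rows shaped like the OMOP `concept_ancestor` table (`ancestor_concept_id`, `descendant_concept_id`); the concept IDs, group labels, and function names are hypothetical placeholders invented for the example.

```python
from collections import defaultdict

# Toy rows shaped like OMOP's concept_ancestor table:
# (ancestor_concept_id, descendant_concept_id).
# All IDs below are invented placeholders, not real OMOP concept IDs.
CONCEPT_ANCESTOR = [
    (9001, 101),
    (9001, 102),
    (9002, 201),
]

# Hypothetical high-level treatment groups, keyed by ancestor concept ID.
TREATMENT_GROUPS = {9001: "chemotherapy", 9002: "immunotherapy"}


def build_rollup(ancestor_rows, groups):
    """Map each descendant concept to the set of treatment groups above it."""
    rollup = defaultdict(set)
    for ancestor_id, descendant_id in ancestor_rows:
        if ancestor_id in groups:
            rollup[descendant_id].add(groups[ancestor_id])
    return rollup


def classify(records, rollup):
    """Attach treatment-group labels to (patient_id, concept_id) records."""
    out = []
    for patient_id, concept_id in records:
        # Concepts with no known ancestor group are flagged, not dropped.
        labels = sorted(rollup.get(concept_id, {"unmapped"}))
        out.append((patient_id, concept_id, labels))
    return out


ROLLUP = build_rollup(CONCEPT_ANCESTOR, TREATMENT_GROUPS)
RESULT = classify([("p1", 101), ("p2", 201), ("p3", 999)], ROLLUP)
```

Flagging unmapped concepts rather than discarding them keeps the crosswalk free of silent information loss, which is one of the challenges named in the Background.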

Results:

Our approach to the REDCap design supported various analytical use cases and enabled data sharing between different investigators and registries. Reuse of previously abstracted data, complemented with data extracted from the EHR, allowed investigators and their teams to quickly review, validate, and update the prior curation. Explicit temporal constraints supported alignment between different registries. Using the OMOP standards and high-level treatment classifications supported data conversion between various registries and integration of the data collected via REDCap and sourced from the EHR.
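
One way to picture an explicit temporal constraint is a fixed window around an index date, applied identically to every registry. The sketch below is illustrative only: the abstract does not state the constraints used, so the window lengths, the choice of index date, and the function names are all assumptions.

```python
from datetime import date, timedelta


def within_window(event_date, index_date, days_before, days_after):
    """True if event_date falls inside the inclusive window around index_date."""
    start = index_date - timedelta(days=days_before)
    end = index_date + timedelta(days=days_after)
    return start <= event_date <= end


def filter_events(events, index_date, days_before=30, days_after=90):
    """Keep only events inside the window; the default lengths are assumed."""
    return [
        e for e in events
        if within_window(e["date"], index_date, days_before, days_after)
    ]


# Hypothetical index date (for example, a COVID-19 diagnosis date).
INDEX = date(2020, 4, 1)
EVENTS = [
    {"name": "lab", "date": date(2020, 3, 15)},
    {"name": "old_visit", "date": date(2019, 12, 1)},
    {"name": "followup", "date": date(2020, 6, 30)},
    {"name": "late", "date": date(2020, 7, 1)},
]
KEPT = [e["name"] for e in filter_events(EVENTS, INDEX)]
```

Applying the same window definition to both extracted and curated data is what would let records from different registries be compared over the same period.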

Conclusion:

Using real-world data for observational COVID and cancer research presented us with opportunities to improve and mature our evolving research infrastructure and better support internal and distributed research, and highlighted the need for uniform data standards in the cancer domain.

Full text: Available Collection: Databases of international organizations Database: EMBASE Language: English Journal: Clinical Cancer Research Year: 2020 Document Type: Article
