Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset.
Bradwell, Katie R; Wooldridge, Jacob T; Amor, Benjamin; Bennett, Tellen D; Anand, Adit; Bremer, Carolyn; Yoo, Yun Jae; Qian, Zhenglong; Johnson, Steven G; Pfaff, Emily R; Girvin, Andrew T; Manna, Amin; Niehaus, Emily A; Hong, Stephanie S; Zhang, Xiaohan Tanner; Zhu, Richard L; Bissell, Mark; Qureshi, Nabeel; Saltz, Joel; Haendel, Melissa A; Chute, Christopher G; Lehmann, Harold P; Moffitt, Richard A.
  • Bradwell KR; Palantir Technologies, Denver, Colorado, USA.
  • Wooldridge JT; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Amor B; Palantir Technologies, Denver, Colorado, USA.
  • Bennett TD; Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA.
  • Anand A; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Bremer C; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Yoo YJ; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Qian Z; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Johnson SG; Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.
  • Pfaff ER; Department of Medicine, North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
  • Girvin AT; Palantir Technologies, Denver, Colorado, USA.
  • Manna A; Palantir Technologies, Denver, Colorado, USA.
  • Niehaus EA; Palantir Technologies, Denver, Colorado, USA.
  • Hong SS; School of Medicine, Section of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
  • Zhang XT; Department of Medicine, Johns Hopkins, Baltimore, Maryland, USA.
  • Zhu RL; Department of Medicine, Johns Hopkins, Baltimore, Maryland, USA.
  • Bissell M; Palantir Technologies, Denver, Colorado, USA.
  • Qureshi N; Palantir Technologies, Denver, Colorado, USA.
  • Saltz J; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
  • Haendel MA; Center for Health AI, University of Colorado, Aurora, Colorado, USA.
  • Chute CG; Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA.
  • Lehmann HP; Department of Medicine, Johns Hopkins, Baltimore, Maryland, USA.
  • Moffitt RA; Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
J Am Med Inform Assoc ; 29(7): 1172-1182, 2022 06 14.
Article in English | MEDLINE | ID: covidwho-1795238
ABSTRACT

OBJECTIVE:

The goals of this study were to harmonize data from electronic health records (EHRs) into common units, and impute units that were missing.

MATERIALS AND METHODS:

We used the National COVID Cohort Collaborative (N3C) table of laboratory measurement data, which contains over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research, and developed a unit-harmonization pipeline comprising (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside of accepted value ranges for the variable. Where a data partner supplied no units for any result of a given lab test, we compared that partner's values with the pooled values of all data partners, using the Kolmogorov-Smirnov test, in order to infer the missing unit.
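
Steps (4) and (5) of the pipeline above can be illustrated with a minimal sketch. It is not the N3C implementation: the variable (body weight), the canonical unit, the conversion table, the accepted range, and the function name `harmonize` are all hypothetical choices made only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical configuration for one variable (body weight).
# The canonical unit and the accepted range are illustrative assumptions,
# not values taken from the study.
CANONICAL_UNIT = "kg"
CONVERSIONS: Dict[str, Callable[[float], float]] = {
    "kg": lambda v: v,               # already canonical
    "g": lambda v: v / 1000.0,       # grams to kilograms
    "lb": lambda v: v * 0.45359237,  # pounds to kilograms (exact definition)
}
ACCEPTED_RANGE: Tuple[float, float] = (0.1, 500.0)


def harmonize(value: float, unit: str) -> Optional[float]:
    """Convert a value to the canonical unit, or return None if it must be dropped."""
    convert = CONVERSIONS.get(unit)
    if convert is None:
        return None  # no reviewed conversion formula exists for this unit
    converted = convert(value)
    low, high = ACCEPTED_RANGE
    if not (low <= converted <= high):
        return None  # step (5): discard values outside the accepted range
    return converted


records: List[Tuple[float, str]] = [
    (70.0, "kg"),
    (154.0, "lb"),
    (70000.0, "g"),
    (7000.0, "kg"),    # implausible, removed by the range check
    (70.0, "stone"),   # unit without a conversion formula
]
harmonized = [harmonize(value, unit) for value, unit in records]
```

In this sketch, a record that cannot be converted or that fails the range check becomes `None` rather than raising an error, so the caller can count how many values were harmonized and how many were discarded.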

RESULTS:

Of the concepts without missing values, we harmonized 88.1% of the values, and imputed units for 78.2% of records where units were absent (41% of contributors' records lacked units).

DISCUSSION:

The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unique properties of centralized data are harnessed to enable unit inference.
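
One way such unit inference could work is sketched below, under stated assumptions. The candidate units, the sample values, and the rule of picking the candidate with the smallest Kolmogorov-Smirnov statistic are illustrative; the abstract says only that values were compared with pooled values using the Kolmogorov-Smirnov test. The names `ks_statistic`, `infer_unit`, and `TO_KG` are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical candidate units for body weight, each mapped to kilograms.
TO_KG: Dict[str, Callable[[float], float]] = {
    "kg": lambda v: v,
    "g": lambda v: v / 1000.0,
    "lb": lambda v: v * 0.45359237,
}


def infer_unit(
    unitless: Sequence[float], pooled_kg: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate unit whose converted values best match the pooled data.

    Assumption: "best match" means the smallest KS statistic.
    """
    scores = {
        unit: ks_statistic([to_kg(v) for v in unitless], pooled_kg)
        for unit, to_kg in TO_KG.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Made-up pooled reference values (kg) and one partner's values with no unit.
pooled_kg = [55.0, 62.0, 68.0, 70.0, 75.0, 81.0, 90.0, 102.0]
unitless = [121.0, 137.0, 150.0, 165.0, 178.0, 198.0]
best_unit, scores = infer_unit(unitless, pooled_kg)
```

With these made-up numbers, treating the unitless values as kilograms or grams places them entirely outside the pooled distribution, while treating them as pounds overlaps it, so the sketch selects pounds. A production version would also need a threshold for declining to infer a unit when no candidate fits well.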

CONCLUSION:

The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.

Full text: Available Collection: International databases Database: MEDLINE Main subject: Electronic Health Records / COVID-19 Type of study: Cohort study / Observational study / Prognostic study Limits: Humans Language: English Journal: J Am Med Inform Assoc Journal subject: Medical Informatics Year: 2022 Document Type: Article Affiliation country: Jamia
