medRxiv ; 2020 Oct 04.
Article in English | MEDLINE | ID: covidwho-835253


IMPORTANCE: COVID-19 racial disparities have gained significant attention, yet little is known about how age distributions obscure racial-ethnic disparities in COVID-19 case fatality ratios (CFR). OBJECTIVE: We filled this gap by assessing the availability and quality of relevant data across states and, in states with available data, investigating how racial-ethnic disparities in CFR changed after age adjustment. DESIGN/SETTING/PARTICIPANTS/EXPOSURE: We conducted a landscape analysis as of July 1, 2020, and developed a grading system to assess COVID-19 case and death data by age and race in the 50 states and DC. In states where age- and race-specific data were available, we applied direct age standardization to compare CFR across race-ethnicities. We developed an online dashboard to automatically and continuously update our results. MAIN OUTCOME AND MEASURE: Our main outcome was CFR (deaths per 100 confirmed cases). We examined CFR by age and race-ethnicity. RESULTS: We found substantial variation across states in how case and death data were disaggregated and reported. Only three states (California, Illinois, and Ohio) had sufficient age and race-ethnicity disaggregation to allow the investigation of racial-ethnic disparities in CFR while controlling for age. In total, we analyzed 391,991 confirmed cases and 17,612 confirmed deaths. Crude CFRs varied widely; in Ohio, for example, they ranged from 7.35% among the Non-Hispanic (NH) White population to 1.39% among the Hispanic population. After age standardization, racial-ethnic differences in CFR narrowed; in Ohio, for example, CFRs ranged from 5.28% among the NH White population to 3.79% among the NH Asian population, a roughly 1.4-fold difference. In addition, the ranking of race-ethnicity-specific CFRs changed after age standardization. The NH White population had the highest crude CFRs, whereas the NH Black and NH Asian populations had the highest and second-highest age-adjusted CFRs, respectively, in two of the three states. The Hispanic population's age-adjusted CFRs were substantially higher than its crude CFRs. Sensitivity analysis did not change these results qualitatively. CONCLUSIONS AND RELEVANCE: The availability and quality of age- and race-ethnicity-specific COVID-19 case and death data varied greatly across states. Age distributions in confirmed cases obscured racial-ethnic disparities in COVID-19 CFR. Age standardization narrows racial-ethnic disparities and changes their ranking. Public COVID-19 data availability, quality, and harmonization need improvement to address racial disparities in this pandemic.
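
The direct age standardization mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two groups, the two age bands, the counts, and the choice of the pooled case distribution as the standard population are hypothetical and are not taken from the study, whose actual standard population and age bands are not stated in the abstract. The sketch only shows how weighting age-specific CFRs by a common age distribution can reverse a crude ranking.

```python
from typing import Dict

AgeCounts = Dict[str, int]


def crude_cfr(cases: AgeCounts, deaths: AgeCounts) -> float:
    """Crude CFR: total deaths per 100 confirmed cases."""
    return 100 * sum(deaths.values()) / sum(cases.values())


def age_standardized_cfr(
    cases: AgeCounts, deaths: AgeCounts, standard_cases: AgeCounts
) -> float:
    """Directly age-standardized CFR (deaths per 100 cases).

    Each age-specific CFR is weighted by that age band's share of the
    standard case population.
    """
    total_standard = sum(standard_cases.values())
    weighted = 0.0
    for age, standard_n in standard_cases.items():
        if cases[age] == 0:
            raise ValueError(f"no cases in age band {age!r}")
        weight = standard_n / total_standard
        weighted += weight * (deaths[age] / cases[age])
    return 100 * weighted


# Hypothetical counts, NOT study data: group A's cases skew older,
# group B's cases skew younger.
cases_a = {"0-49": 100, "50+": 400}
deaths_a = {"0-49": 1, "50+": 40}
cases_b = {"0-49": 400, "50+": 100}
deaths_b = {"0-49": 4, "50+": 12}

# Assumed standard population: pooled cases of both groups by age band.
standard = {age: cases_a[age] + cases_b[age] for age in cases_a}

for name, cases, deaths in [("A", cases_a, deaths_a), ("B", cases_b, deaths_b)]:
    crude = crude_cfr(cases, deaths)
    adjusted = age_standardized_cfr(cases, deaths, standard)
    print(f"group {name}: crude {crude:.2f}%, age-standardized {adjusted:.2f}%")
```

In this toy example group A has the higher crude CFR only because its cases are concentrated in the older band; once both groups are weighted by the same age distribution, group B has the higher CFR. This mirrors the kind of ranking change the abstract reports, without reproducing its numbers.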

J Am Med Inform Assoc ; 28(3): 427-443, 2021 03 01.
Article in English | MEDLINE | ID: covidwho-719257


OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

COVID-19; Data Science/organization & administration; Information Dissemination; Intersectoral Collaboration; Computer Security; Data Analysis; Ethics Committees, Research; Government Regulation; Humans; National Institutes of Health (U.S.); United States