Your browser doesn't support javascript.
Show: 20 | 50 | 100
Results 1 - 11 de 11
Filter
1.
JMIR Med Inform ; 10(9): e39235, 2022 09 06.
Article in English | MEDLINE | ID: covidwho-2022413

ABSTRACT

BACKGROUND: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations. OBJECTIVE: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database. METHODS: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as "Declined" were grouped with "Refused," and "Multiple Race" was grouped with "Two or more races" and "Multiracial." RESULTS: "No matching concept" was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category. CONCLUSIONS: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy.

2.
Journal of Medical Internet Research Vol 23(10), 2021, ArtID e30697 ; 23(10), 2021.
Article in English | APA PsycInfo | ID: covidwho-1918640

ABSTRACT

Background: Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic. Objective: We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes. Methods: We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19-positive cohort;(2) training and testing predictive models for assessing the risk of admission among these patients;and (3) determining geospatial and temporal COVID-19-related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data. Results: For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts. Conclusions: This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights. (PsycInfo Database Record (c) 2022 APA, all rights reserved)

3.
J Am Med Inform Assoc ; 29(8): 1350-1365, 2022 07 12.
Article in English | MEDLINE | ID: covidwho-1769308

ABSTRACT

OBJECTIVE: This study sought to evaluate whether synthetic data derived from a national coronavirus disease 2019 (COVID-19) dataset could be used for geospatial and temporal epidemic analyses. MATERIALS AND METHODS: Using an original dataset (n = 1 854 968 severe acute respiratory syndrome coronavirus 2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip code-level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the data was statistically and qualitatively evaluated. RESULTS: In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean = 2.9 ± 2.4; max = 16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested (top 1%; n = 171) and for all unsuppressed zip codes (n = 5819), respectively. In small sample sizes, synthetic data utility was notably decreased. DISCUSSION: Analyses on the population-level and of densely tested zip codes (which contained most of the data) were similar between original and synthetically derived datasets. Analyses of sparsely tested populations were less similar and had more data suppression. CONCLUSION: In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression-an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.


Subject(s)
COVID-19 , SARS-CoV-2 , Cohort Studies , Humans , United States/epidemiology
4.
Learn Health Syst ; 6(2): e10309, 2022 Apr.
Article in English | MEDLINE | ID: covidwho-1763262

ABSTRACT

The growing availability of multi-scale biomedical data sources that can be used to enable research and improve healthcare delivery has brought about what can be described as a healthcare "data age." This new era is defined by the explosive growth in bio-molecular, clinical, and population-level data that can be readily accessed by researchers, clinicians, and decision-makers, and utilized for systems-level approaches to hypothesis generation and testing as well as operational decision-making. However, taking full advantage of these unprecedented opportunities presents an opportunity to revisit the alignment between traditionally academic biomedical informatics (BMI) and operational healthcare information technology (HIT) personnel and activities in academic health systems. While the history of the academic field of BMI includes active engagement in the delivery of operational HIT platforms, in many contemporary settings these efforts have grown distinct. Recent experiences during the COVID-19 pandemic have demonstrated greater coordination of BMI and HIT activities that have allowed organizations to respond to pandemic-related changes more effectively, with demonstrable and positive impact as a result. In this position paper, we discuss the challenges and opportunities associated with driving alignment between BMI and HIT, as viewed from the perspective of a learning healthcare system. In doing so, we hope to illustrate the benefits of coordination between BMI and HIT in terms of the quality, safety, and outcomes of care provided to patients and populations, demonstrating that these two groups can be "better together."

5.
Clin Epidemiol ; 14: 369-384, 2022.
Article in English | MEDLINE | ID: covidwho-1760056

ABSTRACT

Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women diagnosed than men but more men hospitalized than women, most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than diagnosed. Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

6.
JAMA Netw Open ; 4(10): e2124946, 2021 10 01.
Article in English | MEDLINE | ID: covidwho-1460117

ABSTRACT

Importance: Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. Objectives: To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants: This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, this challenge engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures: Machine learning algorithms used patient data and output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results: In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. Analysis on top models submitting to the challenge showed consistently better model performance on the female group than the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance: In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into the potential bias and limitations in the models.


Subject(s)
Algorithms , Benchmarking , COVID-19/diagnosis , Clinical Decision Rules , Crowdsourcing , Hospitalization/statistics & numerical data , Machine Learning , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , COVID-19/epidemiology , COVID-19/therapy , COVID-19 Testing , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Models, Statistical , Prognosis , ROC Curve , Severity of Illness Index , Washington/epidemiology , Young Adult
7.
J Clin Transl Sci ; 5(1): e110, 2021 Mar 16.
Article in English | MEDLINE | ID: covidwho-1269358

ABSTRACT

The recipients of NIH's Clinical and Translational Science Awards (CTSA) have worked for over a decade to build informatics infrastructure in support of clinical and translational research. This infrastructure has proved invaluable for supporting responses to the current COVID-19 pandemic through direct patient care, clinical decision support, training researchers and practitioners, as well as public health surveillance and clinical research to levels that could not have been accomplished without the years of ground-laying work by the CTSAs. In this paper, we provide a perspective on our COVID-19 work and present relevant results of a survey of CTSA sites to broaden our understanding of the key features of their informatics programs, the informatics-related challenges they have experienced under COVID-19, and some of the innovations and solutions they developed in response to the pandemic. Responses demonstrated increased reliance by healthcare providers and researchers on access to electronic health record (EHR) data, both for local needs and for sharing with other institutions and national consortia. The initial work of the CTSAs on data capture, standards, interchange, and sharing policies all contributed to solutions, best illustrated by the creation, in record time, of a national clinical data repository in the National COVID-19 Cohort Collaborative (N3C). The survey data support seven recommendations for areas of informatics and public health investment and further study to support clinical and translational research in the post-COVID-19 era.

8.
J Am Med Inform Assoc ; 28(2): 393-401, 2021 02 15.
Article in English | MEDLINE | ID: covidwho-1054313

ABSTRACT

Our goal is to summarize the collective experience of 15 organizations in dealing with uncoordinated efforts that result in unnecessary delays in understanding, predicting, preparing for, containing, and mitigating the COVID-19 pandemic in the US. Response efforts involve the collection and analysis of data corresponding to healthcare organizations, public health departments, socioeconomic indicators, as well as additional signals collected directly from individuals and communities. We focused on electronic health record (EHR) data, since EHRs can be leveraged and scaled to improve clinical care, research, and to inform public health decision-making. We outline the current challenges in the data ecosystem and the technology infrastructure that are relevant to COVID-19, as witnessed in our 15 institutions. The infrastructure includes registries and clinical data networks to support population-level analyses. We propose a specific set of strategic next steps to increase interoperability, overall organization, and efficiencies.


Subject(s)
COVID-19 , Electronic Health Records , Information Dissemination , Information Systems/organization & administration , Public Health Practice , Academic Medical Centers , Humans , Registries , United States
9.
J Am Med Inform Assoc ; 27(11): 1721-1726, 2020 11 01.
Article in English | MEDLINE | ID: covidwho-1024117

ABSTRACT

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve the problem with our finest models and tools. The SCOR Consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a change and accelerate research in future pandemics with broad and diverse samples on an international scale.


Subject(s)
Biomedical Research , Computer Security , Coronavirus Infections , Information Dissemination , Pandemics , Pneumonia, Viral , Privacy , COVID-19 , Humans , Information Dissemination/ethics , Internationality , Machine Learning
10.
medRxiv ; 2020 Oct 27.
Article in English | MEDLINE | ID: covidwho-915971

ABSTRACT

Early identification of symptoms and comorbidities most predictive of COVID-19 is critical to identify infection, guide policies to effectively contain the pandemic, and improve health systems' response. Here, we characterised socio-demographics and comorbidity in 3,316,107persons tested and 219,072 persons tested positive for SARS-CoV-2 since January 2020, and their key health outcomes in the month following the first positive test. Routine care data from primary care electronic health records (EHR) from Spain, hospital EHR from the United States (US), and claims data from South Korea and the US were used. The majority of study participants were women aged 18-65 years old. Positive/tested ratio varied greatly geographically (2.2:100 to 31.2:100) and over time (from 50:100 in February-April to 6.8:100 in May-June). Fever, cough and dyspnoea were the most common symptoms at presentation. Between 4%-38% required admission and 1-10.5% died within a month from their first positive test. Observed disparity in testing practices led to variable baseline characteristics and outcomes, both nationally (US) and internationally. Our findings highlight the importance of large scale characterization of COVID-19 international cohorts to inform planning and resource allocation including testing as countries face a second wave.

11.
J Am Med Inform Assoc ; 28(3): 427-443, 2021 03 01.
Article in English | MEDLINE | ID: covidwho-719257

ABSTRACT

OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.


Subject(s)
COVID-19 , Data Science/organization & administration , Information Dissemination , Intersectoral Collaboration , Computer Security , Data Analysis , Ethics Committees, Research , Government Regulation , Humans , National Institutes of Health (U.S.) , United States
SELECTION OF CITATIONS
SEARCH DETAIL