J Am Med Inform Assoc ; 29(4): 609-618, 2022 03 15.
Article in English | MEDLINE | ID: covidwho-1443051


OBJECTIVE: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations.
MATERIALS AND METHODS: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements.
RESULTS: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback.
DISCUSSION: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate.
CONCLUSION: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.

COVID-19 , Cohort Studies , Data Accuracy , Health Insurance Portability and Accountability Act , Humans , United States
Microbiol Spectr ; 9(1): e0032721, 2021 09 03.
Article in English | MEDLINE | ID: covidwho-1361971


In the absence of genome sequencing, two positive molecular tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) separated by negative tests, prolonged time, and symptom resolution remain the best surrogate measure of possible reinfection. Using a large electronic health record database, we characterized clinical and testing data for 23 patients with repeatedly positive SARS-CoV-2 PCR test results ≥60 days apart, separated by ≥2 consecutive negative test results. The prevalence of chronic medical conditions, symptoms, and severe outcomes related to coronavirus disease 2019 (COVID-19) illness was ascertained. The median age of patients was 64.5 years, 40% were Black, and 39% were female. A total of 83% smoked within the prior year, 61% were overweight/obese, 83% had immunocompromising conditions, and 96% had ≥2 comorbidities. The median interval between the two positive tests was 77 days. Among the 19 patients with 60 to 89 days between positive tests, 17 (89%) exhibited symptoms or clinical manifestations consistent with COVID-19 at the time of the second positive test and 14 (74%) were hospitalized at the second positive test. Of the four patients with ≥90 days between two positive tests (patient 2 [PT2], PT8, PT14, and PT19), two had mild or no symptoms at the second positive test and one, an immunocompromised patient, had a brief hospitalization at the first diagnosis, followed by intensive care unit (ICU) admission at the second diagnosis 3 months later. Our study demonstrated a high prevalence of compromised immune systems, comorbidities, obesity, and smoking among patients with repeatedly positive SARS-CoV-2 tests. Despite limitations, including a lack of semiquantitative estimates of viral load, these data may help prioritize suspected cases of reinfection for investigation and continued surveillance.
IMPORTANCE The comprehensive characterization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing and clinical data for patients with repeatedly positive SARS-CoV-2 tests can help prioritize suspected cases of reinfection for investigation in the absence of genome sequencing data and for continued surveillance of the potential long-term health consequences of SARS-CoV-2 infection.
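The inclusion rule stated in this abstract (two positive PCR results ≥60 days apart, separated by ≥2 consecutive negative results) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `(date, result)` tuple layout, and the reading of "separated by" as "the tests between the two positives contain a run of at least two consecutive negatives" are all assumptions made for illustration.

```python
from datetime import date


def longest_negative_run(results):
    """Length of the longest run of consecutive 'negative' results."""
    best = run = 0
    for result in results:
        run = run + 1 if result == "negative" else 0
        best = max(best, run)
    return best


def find_repeat_positive_pairs(tests, min_gap_days=60, min_negatives=2):
    """Return (first_positive_date, second_positive_date) pairs for one patient.

    `tests` is a hypothetical list of (date, result) tuples, where result is
    'positive' or 'negative'. A pair qualifies when the two positives are at
    least `min_gap_days` apart and the tests between them contain at least
    `min_negatives` consecutive negatives (an assumed interpretation).
    """
    ordered = sorted(tests, key=lambda t: t[0])
    pairs = []
    for i, (first_date, first_result) in enumerate(ordered):
        if first_result != "positive":
            continue
        for j in range(i + 1, len(ordered)):
            second_date, second_result = ordered[j]
            if second_result != "positive":
                continue
            between = [result for _, result in ordered[i + 1:j]]
            far_enough = (second_date - first_date).days >= min_gap_days
            if far_enough and longest_negative_run(between) >= min_negatives:
                pairs.append((first_date, second_date))
    return pairs


# Invented example record, not data from the study.
example = [
    (date(2020, 3, 1), "positive"),
    (date(2020, 3, 20), "negative"),
    (date(2020, 3, 25), "negative"),
    (date(2020, 5, 17), "positive"),
]
print(find_repeat_positive_pairs(example))
```

In this invented record the two positives are 77 days apart with two consecutive negatives in between, so the pair is returned. A record with only one intervening negative, or with positives fewer than 60 days apart, yields an empty list.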

COVID-19 Testing , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , SARS-CoV-2/isolation & purification , Adult , Aged , Comorbidity , Databases, Factual , Female , Health Surveys , Humans , Immune System , Male , Middle Aged , Obesity , Polymerase Chain Reaction , Risk Factors , Smoking , Viral Load
J Am Med Inform Assoc ; 27(11): 1721-1726, 2020 11 01.
Article in English | MEDLINE | ID: covidwho-1024117


Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve the problem with our finest models and tools. The SCOR Consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a difference and accelerate research in future pandemics with broad and diverse samples on an international scale.

Biomedical Research , Computer Security , Coronavirus Infections , Information Dissemination , Pandemics , Pneumonia, Viral , Privacy , COVID-19 , Humans , Information Dissemination/ethics , Internationality , Machine Learning