Natural Language Processing for Improved Characterization of COVID-19 Symptoms: An Observational Study of 350,000 Patients in a Large Integrated Healthcare System.
JMIR Public Health Surveill; 2022 Nov 29.
Article in English | MEDLINE | ID: covidwho-2141444
ABSTRACT
BACKGROUND:
Natural language processing (NLP) of unstructured text from electronic medical records (EMR) can improve characterization of COVID-19 signs and symptoms, but large-scale studies demonstrating the real-world application and validation of NLP for this purpose are limited.
OBJECTIVE:
To assess the contribution of NLP when identifying COVID-19 signs and symptoms from EMR.
METHODS:
This study was conducted in Kaiser Permanente Southern California, a large integrated healthcare system, using data from all patients with positive SARS-CoV-2 laboratory tests from March 2020 to May 2021. An NLP algorithm was developed to extract free text from EMR on 12 established signs and symptoms of COVID-19: fever, cough, headache, fatigue, dyspnea, chills, sore throat, myalgia, anosmia, diarrhea, vomiting/nausea, and abdominal pain. The proportion of patients reporting each symptom and the corresponding onset dates were described before and after supplementing structured EMR data with NLP-extracted signs and symptoms. A random sample of 100 chart-reviewed and adjudicated SARS-CoV-2 positive cases was used to validate the algorithm's performance.
RESULTS:
A total of 359,938 patients (mean age 40.4 years; 53% female) with confirmed SARS-CoV-2 infection were identified over the study period. The most common signs and symptoms identified through NLP-supplemented analyses were cough (61%), fever (52%), myalgia (43%), and headache (40%). The NLP algorithm identified an additional 55,568 (15%) symptomatic cases that were previously defined as asymptomatic using structured data alone. The proportion of additional cases identified through NLP-supplemented analysis varied by symptom, from 29% of all records for cough to 61% of all records for nausea or vomiting. Among 295,305 symptomatic patients, the median time from symptom onset to testing was 3 days using structured data alone, whereas the NLP algorithm identified signs or symptoms approximately one day earlier. When validated against chart-reviewed cases, the NLP algorithm successfully identified most signs and symptoms with consistently high sensitivity (ranging from 87% to 100%) and specificity (94% to 100%).
CONCLUSIONS:
These findings demonstrate that NLP can identify and characterize a broad set of COVID-19 signs and symptoms from unstructured data within the EMR, with enhanced detail and timeliness compared with structured data alone.
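As an illustration of the validation step described in the abstract, the sketch below shows one way per-symptom sensitivity and specificity could be computed by comparing NLP output against chart review. It is not the authors' code: the function names, the data layout (one 0/1 flag per patient), and the toy values are assumptions made for this example only and are not study data.

```python
from typing import Sequence, Tuple


def confusion_counts(nlp_flags: Sequence[int],
                     chart_flags: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating chart review as the reference standard."""
    if len(nlp_flags) != len(chart_flags):
        raise ValueError("both inputs must cover the same patients")
    tp = fp = tn = fn = 0
    for nlp, chart in zip(nlp_flags, chart_flags):
        if chart and nlp:
            tp += 1          # symptom present in chart, found by NLP
        elif chart and not nlp:
            fn += 1          # symptom present in chart, missed by NLP
        elif not chart and nlp:
            fp += 1          # symptom absent in chart, flagged by NLP
        else:
            tn += 1          # symptom absent in chart, not flagged
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Share of chart-confirmed symptom cases that the NLP output also flagged."""
    if tp + fn == 0:
        raise ValueError("no chart-confirmed positives in the sample")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of chart-confirmed non-cases that the NLP output left unflagged."""
    if tn + fp == 0:
        raise ValueError("no chart-confirmed negatives in the sample")
    return tn / (tn + fp)


# Hypothetical toy data for a single symptom in ten patients (not from the study).
chart_review = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
nlp_output = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(nlp_output, chart_review)
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.2f}")
```

In a validation like the one reported, this calculation would be repeated for each of the 12 signs and symptoms over the 100 chart-reviewed cases.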
Full text:
Available
Collection:
International databases
Database:
MEDLINE
Type of study:
Experimental Studies
/
Observational study
/
Prognostic study
/
Randomized controlled trials
Language:
English
Year:
2022
Document Type:
Article