Natural Language Processing for Improved Characterization of COVID-19 Symptoms: An Observational Study of 350,000 Patients in a Large Integrated Healthcare System.

Malden, Deborah Ellen; Tartof, Sara Y; Ackerson, Bradley Kent; Hong, Vennis; Skarbinski, Jacek; Yau, Vincent; Qian, Lei; Fischer, Heidi; Shaw, Sally; Caparosa, Susan; Xie, Fagen

Malden DE; Epidemic Intelligence Service, Centers for Disease Control & Prevention, 1200 Clifton Rd, Atlanta, US.
Tartof SY; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.
Ackerson BK; Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, US.
Hong V; Southern California Permanente Medical Group, Harbor City, US.
Skarbinski J; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.
Yau V; The Permanente Medical Group and Division of Research, Kaiser Permanente Northern California, Oakland, US.
Qian L; Genentech, a Member of the Roche Group, San Francisco, US.
Fischer H; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.
Shaw S; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.
Caparosa S; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.
Xie F; Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, US.

JMIR Public Health Surveill ; 2022 Nov 29.

Article in English | MEDLINE | ID: covidwho-2141444

ABSTRACT

BACKGROUND:

Natural language processing (NLP) of unstructured text from Electronic Medical Records (EMR) can improve characterization of COVID-19 signs and symptoms, but large-scale studies demonstrating the real-world application and validation of NLP for this purpose are limited.

OBJECTIVE:

To assess the contribution of NLP when identifying COVID-19 signs and symptoms from EMR.

METHODS:

This study was conducted at Kaiser Permanente Southern California, a large integrated healthcare system, using data from all patients with positive SARS-CoV-2 laboratory tests from March 2020 to May 2021. An NLP algorithm was developed to extract mentions of 12 established signs and symptoms of COVID-19 from free text in the EMR, including fever, cough, headache, fatigue, dyspnea, chills, sore throat, myalgia, anosmia, diarrhea, vomiting/nausea, and abdominal pain. The proportion of patients reporting each symptom and the corresponding onset dates were described before and after supplementing structured EMR data with NLP-extracted signs and symptoms. A random sample of 100 chart-reviewed and adjudicated SARS-CoV-2-positive cases was used to validate the algorithm's performance.
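The abstract does not describe how the NLP algorithm works internally. Purely as an illustration of the general idea of supplementing structured symptom data with free-text mentions, a minimal rule-based sketch might look like the following. The keyword patterns, the negation rule, and the function names `extract_symptoms` and `supplement` are all assumptions made for this example and are not taken from the study.

```python
import re

# Hypothetical keyword patterns for a few of the 12 symptoms.
# The study's actual lexicon and rules are not given in the abstract.
SYMPTOM_PATTERNS = {
    "fever": r"\b(fever|febrile)\b",
    "cough": r"\bcough(ing)?\b",
    "dyspnea": r"\b(dyspnea|shortness of breath)\b",
    "anosmia": r"\b(anosmia|loss of smell)\b",
}

# Very simple negation rule (an assumption): a negation cue anywhere
# earlier in the same sentence suppresses the symptom mention.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.;]*$", re.IGNORECASE)


def extract_symptoms(note):
    """Return the set of non-negated symptoms mentioned in a clinical note."""
    found = set()
    for sentence in re.split(r"[.;\n]", note):
        for symptom, pattern in SYMPTOM_PATTERNS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match and not NEGATION.search(sentence[: match.start()]):
                found.add(symptom)
    return found


def supplement(structured_symptoms, note):
    """Combine structured symptom codes with symptoms found in free text."""
    return set(structured_symptoms) | extract_symptoms(note)


# Illustrative note text, not real patient data.
note = "Patient reports fever and cough. Denies shortness of breath."
print(sorted(supplement({"headache"}, note)))
```

A production clinical NLP pipeline would need far more than this (misspellings, abbreviations, family history, hypothetical statements, and date extraction for symptom onset), which is why validation against chart review matters.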

RESULTS:

A total of 359,938 patients (mean age 40.4 years; 53% female) with confirmed SARS-CoV-2 infection were identified over the study period. The most common signs and symptoms identified through NLP-supplemented analyses were cough (61%), fever (52%), myalgia (43%), and headache (40%). The NLP algorithm identified an additional 55,568 (15%) symptomatic cases that were previously defined as asymptomatic using structured data alone. The proportion of additional cases identified in NLP-supplemented analysis varied across the selected symptoms, from 29% of all records for cough to 61% of all records for nausea or vomiting. Among 295,305 symptomatic patients, the median time from symptom onset to testing was 3 days using structured data alone, whereas the NLP algorithm identified signs or symptoms approximately one day earlier. When validated against chart-reviewed cases, the NLP algorithm successfully identified most signs and symptoms with consistently high sensitivity (ranging from 87% to 100%) and specificity (94% to 100%).
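For readers unfamiliar with the validation metrics reported above, the sketch below shows how sensitivity and specificity can be computed for one symptom when NLP output is compared with chart review as the reference standard. The function name `sensitivity_specificity` and the example flags are hypothetical; they are not the study's code or data.

```python
def sensitivity_specificity(nlp_flags, chart_flags):
    """Compare NLP output with chart review (the reference standard).

    Both arguments are equal-length sequences of booleans, one per patient,
    indicating whether the symptom was recorded as present.
    Returns (sensitivity, specificity); a metric is None when undefined.
    """
    pairs = list(zip(nlp_flags, chart_flags))
    tp = sum(1 for n, c in pairs if n and c)          # true positives
    fn = sum(1 for n, c in pairs if not n and c)      # false negatives
    tn = sum(1 for n, c in pairs if not n and not c)  # true negatives
    fp = sum(1 for n, c in pairs if n and not c)      # false positives

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Made-up flags for five patients, for illustration only.
nlp = [True, True, False, False, True]
chart = [True, True, True, False, False]
sens, spec = sensitivity_specificity(nlp, chart)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of chart-confirmed symptomatic patients that the algorithm also flags; specificity is the share of chart-confirmed symptom-free patients that the algorithm correctly leaves unflagged.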

CONCLUSIONS:

These findings demonstrate that NLP can identify and characterize a broad set of COVID-19 signs and symptoms from unstructured data within the EMR, with enhanced detail and timeliness compared with structured data alone.

Full text: Available
Collection: International databases
Database: MEDLINE
Type of study: Experimental Studies / Observational study / Prognostic study / Randomized controlled trials
Language: English
Year: 2022
Document Type: Article
Affiliation country: 41529
