Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing.
Fernandes, Marta; Sun, Haoqi; Jain, Aayushee; Alabsi, Haitham S; Brenner, Laura N; Ye, Elissa; Ge, Wendong; Collens, Sarah I; Leone, Michael J; Das, Sudeshna; Robbins, Gregory K; Mukerji, Shibani S; Westover, M Brandon.
  • Fernandes M; Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.
  • Sun H; Clinical Data Animation Center, Boston, MA, United States.
  • Jain A; Harvard Medical School, Boston, MA, United States.
  • Alabsi HS; Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.
  • Brenner LN; Clinical Data Animation Center, Boston, MA, United States.
  • Ye E; Harvard Medical School, Boston, MA, United States.
  • Ge W; Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.
  • Collens SI; Clinical Data Animation Center, Boston, MA, United States.
  • Leone MJ; Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.
  • Das S; Harvard Medical School, Boston, MA, United States.
  • Robbins GK; Harvard Medical School, Boston, MA, United States.
  • Mukerji SS; Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States.
  • Westover MB; Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, United States.
JMIR Med Inform ; 9(2): e25457, 2021 Feb 10.
Article in English | MEDLINE | ID: covidwho-1032549
ABSTRACT

BACKGROUND:

Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19.

OBJECTIVE:

Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes.

METHODS:

Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women's Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set.
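The feature-construction step described above can be illustrated with a minimal sketch. It covers only the n-gram bag-of-words and the 10% document-frequency cutoff, not the LASSO-regularized multiclass logistic regression. The whitespace tokenizer and the function names (`ngrams`, `build_vocabulary`, `bag_of_words`) are assumptions made for illustration; the abstract does not specify the authors' preprocessing or code.

```python
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all unigrams, bigrams, and trigrams of a token list as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(docs, min_doc_frac=0.10, n_max=3):
    """Keep n-grams that occur in at least min_doc_frac of the documents."""
    doc_freq = Counter()
    for text in docs:
        # Use a set so each document counts an n-gram at most once.
        # Lowercasing and whitespace splitting are simplifying assumptions.
        doc_freq.update(set(ngrams(text.lower().split(), n_max)))
    threshold = min_doc_frac * len(docs)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def bag_of_words(text, vocabulary, n_max=3):
    """Count how often each vocabulary n-gram appears in one document."""
    counts = Counter(ngrams(text.lower().split(), n_max))
    return [counts[g] for g in vocabulary]
```

In such a setup the vocabulary would be built from the training set only and then reused unchanged to vectorize the hold-out test set, so that no information from the test notes influences feature selection.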

RESULTS:

The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 features from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The features contributing most to the model's classification for each outcome were the following: "appointments specialty," "home health," and "home care" (home); "intubate" and "ARDS" (inpatient rehabilitation); "service" (SNIF); "brief assessment" and "covid" (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition.
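For readers unfamiliar with the micro-average metric reported above, the following sketch shows the standard definition: each patient contributes one (indicator, score) pair per class, all pairs are pooled, and a single AUROC is computed on the pool. The function names and the pairwise-comparison implementation are assumptions for illustration; the abstract does not describe how the authors computed the metric or its confidence interval.

```python
def binary_auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Quadratic in the number of samples; fine for a sketch, slow for large data.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_average_auroc(y_true, y_prob, classes):
    """Pool one-vs-rest (indicator, score) pairs over all classes, then compute one AUROC."""
    labels, scores = [], []
    for true_class, probs in zip(y_true, y_prob):
        for k, cls in enumerate(classes):
            labels.append(1 if true_class == cls else 0)
            scores.append(probs[k])
    return binary_auroc(labels, scores)
```

Because the pooled set is dominated by the larger classes, a micro-average can remain high even when a small class is classified poorly, which is one reason to read it alongside the reported average precision.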

CONCLUSIONS:

A supervised learning-based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients' discharge disposition that is possible with EHR data.

Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Cohort study / Experimental Studies / Observational study / Prognostic study | Language: English | Journal: JMIR Med Inform | Year: 2021 | Document Type: Article | Affiliation country: 25457