Comparison of the Quality of Discharge Letters Written by Large Language Models and Junior Clinicians: Single-Blinded Study.

Tung, Joshua Yi Min; Gill, Sunil Ravinder; Sng, Gerald Gui Ren; Lim, Daniel Yan Zheng; Ke, Yuhe; Tan, Ting Fang; Jin, Liyuan; Elangovan, Kabilan; Ong, Jasmine Chiat Ling; Abdullah, Hairil Rizal; Ting, Daniel Shu Wei; Chong, Tsung Wen

Affiliation

Tung JYM; Department of Urology, Singapore General Hospital, Singapore, Singapore.
Gill SR; Data Science and Artificial Intelligence Laboratory, Singapore General Hospital, Singapore, Singapore.
Sng GGR; Department of Urology, Singapore General Hospital, Singapore, Singapore.
Lim DYZ; Data Science and Artificial Intelligence Laboratory, Singapore General Hospital, Singapore, Singapore.
Ke Y; Department of Endocrinology, Singapore General Hospital, Singapore, Singapore.
Tan TF; Data Science and Artificial Intelligence Laboratory, Singapore General Hospital, Singapore, Singapore.
Jin L; Department of Gastroenterology and Hepatology, Singapore General Hospital, Singapore, Singapore.
Elangovan K; Data Science and Artificial Intelligence Laboratory, Singapore General Hospital, Singapore, Singapore.
Ong JCL; Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore.
Abdullah HR; Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore.
Ting DSW; Duke-NUS Medical School, Singapore, Singapore.
Chong TW; Artificial Intelligence Office, Singapore Health Services, Singapore, Singapore.

J Med Internet Res; 26: e57721, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39047282

ABSTRACT

BACKGROUND:

Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, are underprioritized relative to direct clinical care, and are often delegated to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT can summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, saving time and making quality more consistent.

OBJECTIVE:

The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians.

METHODS:

Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient's care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool.

RESULTS:

GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively.
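As a reading aid, the reported means and SDs can be turned into standardized mean differences. The sketch below is not part of the study: the abstract reports no effect sizes and does not name the statistical test used, so the function `cohens_d` and the equal-group-size pooling it applies are assumptions made purely for illustration.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference between two groups.

    Assumption: both groups have the same number of ratings, so the
    pooled SD is the root mean square of the two reported SDs.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (GPT-4 mean, GPT-4 SD, clinician mean, clinician SD), copied from the results above.
reported = {
    "information provision": (4.32, 0.95, 3.70, 1.27),
    "clarity": (4.16, 0.95, 3.68, 1.24),
    "collegiality": (4.36, 1.00, 3.84, 1.22),
    "conciseness": (3.60, 1.12, 3.64, 1.27),
    "follow-up recommendations": (4.16, 1.03, 3.72, 1.13),
    "overall satisfaction": (3.96, 1.14, 3.62, 1.34),
}

for domain, stats in reported.items():
    print(f"{domain}: d = {cohens_d(*stats):.2f}")
```

Under that assumption, information provision comes out at roughly d = 0.55, while conciseness is close to zero and slightly favors the clinicians. These figures are descriptive only and do not replace the P values reported by the authors.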

CONCLUSIONS:

Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation.

Subject(s)

Patient Discharge; Humans; Patient Discharge/standards; Electronic Health Records/standards; Single-Blind Method; Language

Keywords

AI; ChatGPT; LLM; artificial intelligence; consultation note; continuity of care; discharge summaries; fictional electronic record; junior clinician; large language model; letter writing; primary care; referral letter; simulated environment; single-blinded; urology

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Patient Discharge Limit: Humans Language: En Journal: J Med Internet Res Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: Singapore
