Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer.

Kaiser, Kristen N; Hughes, Alexa J; Yang, Anthony D; Turk, Anita A; Mohanty, Sanjay; Gonzalez, Andrew A; Patzer, Rachel E; Bilimoria, Karl Y; Ellis, Ryan J

Affiliations

Kaiser KN; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.
Hughes AJ; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.
Yang AD; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.
Turk AA; Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Mohanty S; Department of Medicine, Division of Hematology & Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Gonzalez AA; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.
Patzer RE; Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Bilimoria KY; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.
Ellis RJ; Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA.

J Surg Oncol; 2024 Aug 19.

Article in English | MEDLINE | ID: mdl-39155667

ABSTRACT

BACKGROUND:

Large Language Models (LLMs; e.g., ChatGPT) may be used to assist clinicians and could form the basis of future clinical decision support (CDS) tools for colon cancer. The objectives of this study were to (1) evaluate the accuracy of responses from two LLM-powered interfaces in identifying guideline-based care in simulated clinical scenarios and (2) characterize response variation between and within LLMs.

METHODS:

Clinical scenarios with "next steps in management" queries were developed based on National Comprehensive Cancer Network guidelines. Prompts were entered into OpenAI ChatGPT and Microsoft Copilot in independent sessions, yielding four responses per scenario. Responses were compared to clinician-developed responses and assessed for accuracy, consistency, and verbosity.

RESULTS:

Across 108 responses to 27 prompts, both platforms yielded completely correct responses 36% of the time (n = 39). For ChatGPT, 39% of responses (n = 21) were missing information and 24% (n = 14) contained inaccurate or misleading information. Copilot performed similarly, with 37% (n = 20) missing information and 28% (n = 15) containing inaccurate or misleading information (p = 0.96). Clinician responses were significantly shorter (34 ± 15.5 words) than those of both ChatGPT (251 ± 86 words) and Copilot (271 ± 67 words; both p < 0.01).

CONCLUSIONS:

Publicly available LLM applications often provide verbose responses with vague or inaccurate information regarding colon cancer management. Significant optimization is required before use in formal CDS.

Keywords

ChatGPT; NCCN Guidelines; clinical decision support; colorectal; large language models; oncology

Database: MEDLINE | Language: English | Journal: J Surg Oncol / J. surg. oncol / Journal of surgical oncology | Year of publication: 2024 | Document type: Article | Country of affiliation: United States | Country of publication: United States