Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer.
J Surg Oncol; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39155667
ABSTRACT
BACKGROUND:
Large Language Models (LLMs; e.g., ChatGPT) may be used to assist clinicians and form the basis of future clinical decision support (CDS) for colon cancer. The objectives of this study were to (1) evaluate the response accuracy of two LLM-powered interfaces in identifying guideline-based care in simulated clinical scenarios and (2) define response variation between and within LLMs.
METHODS:
Clinical scenarios with "next steps in management" queries were developed based on National Comprehensive Cancer Network guidelines. Prompts were entered into OpenAI ChatGPT and Microsoft Copilot in independent sessions, yielding four responses per scenario. Responses were compared to clinician-developed responses and assessed for accuracy, consistency, and verbosity.
RESULTS:
Across 108 responses to 27 prompts, the two platforms together yielded completely correct responses in 36% of cases (n = 39). For ChatGPT, 39% (n = 21) were missing information and 24% (n = 14) contained inaccurate/misleading information. Copilot performed similarly, with 37% (n = 20) having missing information and 28% (n = 15) containing inaccurate/misleading information (p = 0.96). Clinician responses were significantly shorter (34 ± 15.5 words) than both ChatGPT (251 ± 86 words) and Copilot (271 ± 67 words; both p < 0.01).
CONCLUSIONS:
Publicly available LLM applications often provide verbose responses with vague or inaccurate information regarding colon cancer management. Significant optimization is required before use in formal CDS.
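The response counts in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the counts come from the abstract, while the even split of responses between the two platforms and the helper name `pct` are assumptions made for this example, not details stated by the authors.

```python
# Counts taken from the abstract.
TOTAL_RESPONSES = 108   # all responses across both platforms
PROMPTS = 27            # clinical scenarios
PLATFORMS = 2           # ChatGPT and Copilot
CORRECT = 39            # completely correct responses

def pct(count, total):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)

# Responses per scenario, as stated in the methods.
responses_per_prompt = TOTAL_RESPONSES // PROMPTS

# Assumption: responses are split evenly between the two platforms.
responses_per_platform = TOTAL_RESPONSES // PLATFORMS
sessions_per_platform = responses_per_prompt // PLATFORMS

overall_correct_pct = pct(CORRECT, TOTAL_RESPONSES)

print(responses_per_prompt, responses_per_platform, sessions_per_platform)
print(overall_correct_pct)
```

Under the even-split assumption, each platform contributes the denominator used for its own percentages, while the 36% figure is computed over all responses combined.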
Collections: 01-internacional
Database: MEDLINE
Language: English
Journal: J Surg Oncol / J. surg. oncol / Journal of surgical oncology
Year of publication: 2024
Document type: Article
Country of affiliation: United States
Country of publication: United States