J Burn Care Res; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38833383

ABSTRACT

Artificial intelligence and Large Language Models (LLMs) have recently gained attention as promising tools in various healthcare domains, offering potential benefits in clinical decision-making, medical education and research. The Advanced Burn Life Support (ABLS) program is a didactic initiative endorsed by the American Burn Association, aiming to provide knowledge on the immediate care of the severely burned patient. The aim of the study was to compare the performance of three LLMs (ChatGPT-3.5, ChatGPT-4 and Google Bard) on the ABLS exam. The ABLS exam consists of 50 questions, each with 5 multiple-choice answers. The passing threshold is 80% of correct answers. The three LLMs were queried with the 50 questions included in the latest version of the ABLS exam on July 18th, 2023. ChatGPT-3.5 scored 86% (43 out of 50), ChatGPT-4 scored 90% (45 out of 50), and Bard scored 70% (35 out of 50). No significant difference was measured between ChatGPT-3.5 and ChatGPT-4 (p=0.538) or between ChatGPT-3.5 and Bard (p=0.054), despite the borderline p-value. ChatGPT-4 performed significantly better than Bard (p=0.012). Of the 50 questions, 78% (n=39) were direct questions, while 22% (n=11) were presented as clinical scenarios. No difference in the rate of wrong answers was found based on the type of question for any of the three LLMs. ChatGPT-3.5 and ChatGPT-4 demonstrated high accuracy on the ABLS exam and outperformed Google Bard. However, the potential multiple applications of LLMs in emergency burn and trauma care necessitate appropriate surveillance, and LLMs should most likely serve as a tool to complement human cognition.
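The pairwise comparisons of the reported scores can be sketched with a two-sided Fisher's exact test on the 2x2 tables of correct/incorrect counts. This is a hypothetical reconstruction: the abstract does not name the statistical test used, so the p-values below are illustrative and need not match the published ones exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    n = a + b + c + d          # total answers across both models
    k_correct = a + c          # total correct answers (column margin)
    n_row = a + b              # answers attributed to model 1 (row margin)
    denom = comb(n, n_row)

    def pmf(x):
        # Probability of x correct answers in model 1's row, margins fixed
        return comb(k_correct, x) * comb(n - k_correct, n_row - x) / denom

    p_obs = pmf(a)
    lo = max(0, n_row - (n - k_correct))
    hi = min(n_row, k_correct)
    # Small tolerance guards against floating-point ties
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))

# Correct/incorrect counts out of 50, as reported in the abstract
chatgpt4 = (45, 5)    # 90%
chatgpt35 = (43, 7)   # 86%
bard = (35, 15)       # 70%

print(fisher_exact_two_sided(*chatgpt4, *bard))       # significant (< 0.05)
print(fisher_exact_two_sided(*chatgpt35, *chatgpt4))  # not significant
print(fisher_exact_two_sided(*chatgpt35, *bard))      # borderline
```

Note that Fisher's exact test treats the two answer sets as independent samples; since all models answered the same 50 questions, a paired test such as McNemar's would also be defensible, but the per-question agreement data needed for it is not reported in the abstract.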
