Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.
Gupta, Rishi; Hamid, Abdullgabbar M; Jhaveri, Miral; Patel, Niki; Suthar, Pokhraj P.
Affiliations
  • Gupta R; Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.
  • Hamid AM; Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.
  • Jhaveri M; Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.
  • Patel N; Department of Osteopathic Medicine, Kentucky College of Osteopathic Medicine, Pikeville, USA.
  • Suthar PP; Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.
Cureus; 16(8): e67766, 2024 Aug.
Article in En | MEDLINE | ID: mdl-39323714
ABSTRACT
AIMS AND OBJECTIVE:

Advances in artificial intelligence (AI), particularly in large language models (LLMs) such as ChatGPT (versions 3.5 and 4.0) and Google Gemini, are transforming healthcare. This study evaluates the performance of these AI models in solving diagnostic quizzes from "Neuroradiology: A Core Review" to assess their potential as diagnostic tools in radiology.

MATERIALS AND METHODS:

We assessed the accuracy of ChatGPT 3.5, ChatGPT 4.0, and Google Gemini on 262 multiple-choice questions covering the brain, head and neck, spine, and non-interpretive skills. Each AI tool provided answers and explanations, which were compared with the textbook answers. The analysis followed the STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines, and accuracy was calculated for each AI tool and for each subgroup.
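
As a purely illustrative aside (not part of the published record), the per-tool, per-subgroup accuracy calculation described above can be sketched in a few lines of Python; the question entries and answer letters below are hypothetical placeholders, not the study's data.

    from collections import defaultdict

    # Each record: (subgroup, textbook answer, each model's answer).
    # All entries are hypothetical placeholders; the study scored 262
    # multiple-choice questions from "Neuroradiology: A Core Review".
    questions = [
        ("brain", "B", {"ChatGPT 3.5": "B", "ChatGPT 4.0": "B", "Google Gemini": "C"}),
        ("head and neck", "A", {"ChatGPT 3.5": "D", "ChatGPT 4.0": "A", "Google Gemini": "A"}),
        ("spine", "C", {"ChatGPT 3.5": "C", "ChatGPT 4.0": "C", "Google Gemini": "C"}),
        # ... remaining questions ...
    ]

    tools = ["ChatGPT 3.5", "ChatGPT 4.0", "Google Gemini"]
    correct = defaultdict(int)  # (tool, subgroup) -> questions answered correctly
    total = defaultdict(int)    # subgroup -> number of questions

    for subgroup, key, answers in questions:
        total[subgroup] += 1
        for tool, answer in answers.items():
            if answer == key:
                correct[(tool, subgroup)] += 1

    # Report accuracy per tool and subgroup as a percentage.
    for tool in tools:
        for subgroup in total:
            accuracy = 100.0 * correct[(tool, subgroup)] / total[subgroup]
            print(f"{tool:14s} {subgroup:14s} {accuracy:.2f}%")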

RESULTS:

ChatGPT 4.0 achieved the highest overall accuracy at 64.89%, outperforming ChatGPT 3.5 (62.60%) and Google Gemini (55.73%). ChatGPT 4.0 excelled in the brain and head and neck subgroups, while Google Gemini performed best in head and neck but lagged in the other areas. ChatGPT 3.5 showed consistent performance across all subgroups.

CONCLUSION:

This study found that advanced AI models, including ChatGPT 4.0 and Google Gemini, vary in diagnostic accuracy, with ChatGPT 4.0 leading at 64.89% overall. While these tools show promise for improving diagnostics and medical education, their effectiveness differs by subspecialty area, and Google Gemini in particular performs unevenly across categories. The findings underscore the need for ongoing improvement and broader evaluation to address ethical concerns and to optimize AI use in patient care.
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Cureus Publication year: 2024 Document type: Article Country of affiliation: United States Country of publication: United States
