Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Is, Enes Efe; Menekseoglu, Ahmet Kivanc.
Affiliation
  • Is EE; Department of Physical Medicine and Rehabilitation, Sisli Hamidiye Etfal Training and Research Hospital, University of Health Sciences, Seyrantepe Campus, Cumhuriyet ve Demokrasi Avenue, Istanbul, Turkey. enefeis@gmail.com.
  • Menekseoglu AK; Department of Physical Medicine and Rehabilitation, Kanuni Sultan Süleyman Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.
Clin Rheumatol; 2024 Sep 28.
Article in En | MEDLINE | ID: mdl-39340572
ABSTRACT

OBJECTIVES:

This study evaluates the performance of two AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice.

METHOD:

A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 questions that contained visual data. Both AI models categorized each question by difficulty (easy, medium, or hard) and then answered it. In addition, the reliability of the answers was assessed by asking each question a second time. The accuracy, reliability, and difficulty categorization of the models' responses were analyzed.
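The study's evaluation pipeline is not published with the abstract; the sketch below is a minimal illustration of how first-pass accuracy and repeat-question consistency could be computed from two answer passes. The Item schema, field names, and toy data are all hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Item:
    """One board-style question with the model's two answer passes (hypothetical schema)."""
    correct: str      # answer key, e.g. "B"
    first_pass: str   # model's answer on the first run
    second_pass: str  # model's answer on the repeat run
    difficulty: str   # model-assigned: "easy", "medium", or "hard"

def accuracy(items, pass_attr="first_pass"):
    """Fraction of questions answered correctly on the given pass."""
    return sum(getattr(i, pass_attr) == i.correct for i in items) / len(items)

def repeat_agreement(items):
    """Fraction of questions where the two passes gave the same answer
    (a simple test-retest consistency measure)."""
    return sum(i.first_pass == i.second_pass for i in items) / len(items)

# Toy usage with made-up data:
items = [
    Item("B", "B", "B", "easy"),
    Item("C", "A", "C", "hard"),
    Item("D", "D", "D", "medium"),
]
print(f"first-pass accuracy:  {accuracy(items):.1%}")
print(f"second-pass accuracy: {accuracy(items, 'second_pass'):.1%}")
print(f"repeat agreement:     {repeat_agreement(items):.1%}")
```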

RESULTS:

ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models categorized the majority of questions as medium difficulty. ChatGPT-4o showed higher accuracy across several rheumatology subfields, notably Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).
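The abstract does not name the statistical test behind these p values. A common way to compare two accuracy proportions is a chi-square test on the 2x2 correct/incorrect contingency table; the sketch below reconstructs approximate counts from the reported percentages, assuming all 420 questions were scored (the Method leaves the exact post-exclusion count ambiguous).

```python
from scipy.stats import chi2_contingency

n = 420  # scored questions per model; assumed, see Method for the exclusion caveat
gpt_correct = round(0.869 * n)  # ChatGPT-4o: 86.9% -> ~365 correct
gem_correct = round(0.602 * n)  # Google Gemini: 60.2% -> ~253 correct

table = [
    [gpt_correct, n - gpt_correct],  # ChatGPT-4o: correct, incorrect
    [gem_correct, n - gem_correct],  # Gemini:     correct, incorrect
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")  # p << 0.001, consistent with the abstract
```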

CONCLUSIONS:

ChatGPT-4o significantly outperformed Google Gemini on rheumatology board-level questions, demonstrating its strength in tasks that require complex, specialized knowledge of rheumatological diseases. The performance of both AI models decreased as question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended.

Key Points

• ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%.
• For both AI models, the correct-answer rate decreased as question difficulty increased.
• The study demonstrates the potential for AI models to be used in clinical practice as tools to assist clinicians and improve healthcare efficiency.
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Clin Rheumatol / Clin. rheumatol / Clinical rheumatology Publication year: 2024 Document type: Article Country of affiliation: Turkey Country of publication: Germany
