Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis.

Tong, Linjian; Zhang, Chaoyang; Liu, Rui; Yang, Jia; Sun, Zhiming

Tong, Linjian; Zhang, Chaoyang; Liu, Rui; Yang, Jia; Sun, Zhiming.

Afiliação

Tong L; Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China.
Zhang C; Department of Orthopedics, Tianjin Medical University Baodi Hospital, Tianjin, 301800, China.
Liu R; Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China.
Yang J; Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China.
Sun Z; Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China. szhm0618@163.com.

J Orthop Surg Res ; 19(1): 574, 2024 Sep 18.

Article em En | MEDLINE | ID: mdl-39289734

ABSTRACT

ABSTRACT

BACKGROUNDS The use of large language models (LLMs) in medicine can help physicians improve the quality and effectiveness of health care by increasing the efficiency of medical information management, patient care, medical research, and clinical decision-making.

METHODS:

We collected 34 frequently asked questions about glucocorticoid-induced osteoporosis (GIOP), covering topics related to the disease's clinical manifestations, pathogenesis, diagnosis, treatment, prevention, and risk factors. We also generated 25 questions based on the 2022 American College of Rheumatology Guideline for the Prevention and Treatment of Glucocorticoid-Induced Osteoporosis (2022 ACR-GIOP Guideline). Each question was posed to the LLM (ChatGPT-3.5, ChatGPT-4, and Google Gemini), and three senior orthopedic surgeons independently rated the responses generated by the LLMs. Three senior orthopedic surgeons independently rated the answers based on responses ranging between 1 and 4 points. A total score (TS) > 9 indicated 'good' responses, 6 ≤ TS ≤ 9 indicated 'moderate' responses, and TS < 6 indicated 'poor' responses.

RESULTS:

In response to the general questions related to GIOP and the 2022 ACR-GIOP Guidelines, Google Gemini provided more concise answers than the other LLMs. In terms of pathogenesis, ChatGPT-4 had significantly higher total scores (TSs) than ChatGPT-3.5. The TSs for answering questions related to the 2022 ACR-GIOP Guideline by ChatGPT-4 were significantly higher than those for Google Gemini. ChatGPT-3.5 and ChatGPT-4 had significantly higher self-corrected TSs than pre-corrected TSs, while Google Gemini self-corrected for responses that were not significantly different than before.

CONCLUSIONS:

Our study showed that Google Gemini provides more concise and intuitive responses than ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed significantly better than ChatGPT3.5 and Google Gemini in terms of answering general questions about GIOP and the 2022 ACR-GIOP Guidelines. ChatGPT3.5 and ChatGPT-4 self-corrected better than Google Gemini.

Assuntos

Glucocorticoides; Osteoporose; Humanos; Osteoporose/induzido quimicamente; Glucocorticoides/efeitos adversos; Inquéritos e Questionários

Palavras-chave

AI; ChatGPT; Glucocorticoid-Induced osteoporosis; Google Gemini; Large language models

Texto completo

Adicionar na Minha BVS

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Osteoporose / Glucocorticoides Limite: Humans Idioma: En Revista: J Orthop Surg Res / J. orthop. surg. res / Journal of orthopaedic surgery and research Ano de publicação: 2024 Tipo de documento: Article País de afiliação: China País de publicação: Reino Unido

Texto completo

Adicionar na Minha BVS

Imprimir

XML

PubMed Links

Buscar no Google