Evaluation of the reliability and readability of chatbot responses as a patient information resource for the most common PET/CT examinations.

Aydinbelge-Dizdar, N; Dizdar, K

Affiliations

Aydinbelge-Dizdar N; Department of Nuclear Medicine, Ankara Etlik City Hospital, Ankara, Turkiye. Electronic address: fnuraydinbelge@gmail.com.
Dizdar K; Department of Software Engineering, ASELSAN Inc., Ankara, Turkiye. Electronic address: kydz93@yahoo.com.

Rev Esp Med Nucl Imagen Mol (Engl Ed) ; : 500065, 2024 Sep 28.

Article in English | MEDLINE | ID: mdl-39349172

ABSTRACT

PURPOSE:

This study aimed to evaluate the reliability and readability of responses generated by two popular AI-chatbots, 'ChatGPT-4.0' and 'Google Gemini', to potential patient questions about PET/CT scans.

MATERIALS AND METHODS:

Thirty potential questions for each of [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT, were posed separately to ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability using the modified DISCERN (mDISCERN) scale and for readability using the Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of the mDISCERN scores provided by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) was assessed.
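The abstract does not state which software was used to compute the readability scores. As a minimal sketch, the three indices named above can be computed from simple text counts using their standard published formulas. The function names are illustrative, and the sketch assumes that the word, sentence, syllable, and complex-word counts (words of three or more syllables) have already been obtained by some other means.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Grade Level (FKRGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index (GFI): approximate years of schooling needed.

    complex_words is the number of words with three or more syllables.
    """
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


if __name__ == "__main__":
    # Hypothetical counts for one chatbot response (not data from the study).
    w, s, syl, cx = 100, 5, 150, 10
    print(flesch_reading_ease(w, s, syl))
    print(flesch_kincaid_grade(w, s, syl))
    print(gunning_fog(w, s, cx))
```

A higher FRE indicates easier text, whereas higher FKRGL and GFI values indicate that more years of education are needed to understand it.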

RESULTS:

The median [min-max] mDISCERN scores assigned by the physician for responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [2-4], 3 [3-4], 3 [3-4] for ChatGPT-4 and 4 [2-5], 4 [2-5], 3.5 [3-5] for Gemini, respectively. The mDISCERN scores assigned by ChatGPT-4 for responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3-5], 3 [3-4], 3 [2-3] for ChatGPT-4, and 4 [3-5], 4 [3-5], 4 [3-5] for Gemini, respectively. The mDISCERN scores assigned by Gemini for responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3 [2-4], 2 [2-4], 3 [2-4] for ChatGPT-4, and 3 [2-5], 3 [1-5], 3 [2-5] for Gemini, respectively. The inter-rater reliability correlation coefficients of mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CI = 0.32-0.812), 0.707 (95% CI = 0.458-0.853), and 0.738 (95% CI = 0.519-0.866), respectively (p < 0.001). The corresponding coefficients for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CI = 0.677-0.910), 0.881 (95% CI = 0.78-0.94), and 0.847 (95% CI = 0.719-0.922), respectively (p < 0.001). According to these coefficients, the mDISCERN scores assigned by ChatGPT-4, Gemini, and the physician showed moderate to good agreement for the chatbots' responses about all PET/CT scans (p < 0.001). All readability scores (FKRGL, GFI, and FRE) differed significantly between ChatGPT-4 and Gemini responses about PET/CT scans (p < 0.001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses.

CONCLUSION:

There was an acceptable level of agreement between raters for the mDISCERN score, indicating consensus on the overall reliability of the responses. However, the information provided by AI chatbots is not easy for the general public to read.

Keywords

Artificial intelligence; Cancer; ChatGPT-4; Google Gemini; Patient information; PET/CT

Full text: 1 Collections: 01-international Database: MEDLINE Language: English Journal: Rev Esp Med Nucl Imagen Mol (Engl Ed) / Rev. esp. med. nucl. imagen mol. (Internet, Engl. ed.) / Revista espanola de medicina nuclear e imagen molecular (Internet. English ed.) Publication year: 2024 Document type: Article Country of publication: Spain
