Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.
Künzle, Paul; Paris, Sebastian.
Affiliation
  • Künzle P; Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany. paul.kuenzle@charite.de.
  • Paris S; Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany.
Clin Oral Investig ; 28(11): 575, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39373739
ABSTRACT

OBJECTIVES:

The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions.

MATERIALS AND METHODS:

A total of 151 questions from an RDE question pool were prepared for prompting LLMAs from OpenAI (ChatGPT-3.5, -4.0 and -4.0o) and Google (Gemini 1.0). Multiple-choice questions were sorted into four question subcategories and entered into the LLMAs, and the answers were recorded for analysis. Chi-square tests and p-value calculations were performed using Python 3.9.16.

RESULTS:

The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively), with significant differences between all LLMAs except between the two GPT-4.0 models. Performance was highest in the subcategories direct restorations and caries, followed by indirect restorations and endodontics.
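As a rough illustration of the kind of pairwise chi-square comparison described above, the following sketch recomputes the tests from the figures in this abstract. It is not the authors' analysis code. The correct-answer counts are an assumption: they are approximated by multiplying the rounded percentages by 151, so they may differ from the study's actual counts. The function name `chi_square_2x2` is hypothetical, and the test shown (Pearson chi-square without continuity correction, treating the two models' answers as independent samples) is only one plausible reading of the methods.

```python
import math
from itertools import combinations

N_QUESTIONS = 151  # number of questions stated in the abstract

# Rounded total accuracies as reported in the abstract.
reported_accuracy = {
    "ChatGPT-4.0o": 0.72,
    "ChatGPT-4.0": 0.62,
    "Gemini 1.0": 0.44,
    "ChatGPT-3.5": 0.25,
}

# ASSUMPTION: counts are reconstructed from rounded percentages and are
# therefore approximate; they are not taken from the study's data.
correct_counts = {
    model: round(acc * N_QUESTIONS) for model, acc in reported_accuracy.items()
}


def chi_square_2x2(correct_a, correct_b, n):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) on a 2x2 table of correct/incorrect answers for two
    models that each answered n questions."""
    observed = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    total = 2 * n
    col_totals = [correct_a + correct_b, total - correct_a - correct_b]
    chi2 = 0.0
    for row in observed:
        for j, obs in enumerate(row):
            expected = n * col_totals[j] / total  # row total is n for both rows
            chi2 += (obs - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    for a, b in combinations(correct_counts, 2):
        chi2, p = chi_square_2x2(correct_counts[a], correct_counts[b], N_QUESTIONS)
        print(f"{a} vs {b}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these approximated counts, the only pair whose p-value stays above 0.05 is ChatGPT-4.0o versus ChatGPT-4.0, which agrees with the reported pattern. Because every model answered the same questions, a paired test such as McNemar's would also be a reasonable choice, but it needs per-question results that the abstract does not provide.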

CONCLUSIONS:

Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success ratio that could be used with caution to support the dental academic curriculum.

CLINICAL RELEVANCE:

While LLMAs could support clinicians in answering dental field-related questions, this capacity depends strongly on the model employed. The best-performing model, ChatGPT-4.0o, achieved acceptable accuracy rates in some of the subject subcategories analyzed.

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Artificial Intelligence / Endodontics | Limit: Humans | Language: En | Journal: Clin Oral Investig / Clin. oral investig / Clinical oral investigations | Journal subject: DENTISTRY | Year of publication: 2024 | Document type: Article | Country of affiliation: Germany | Country of publication: Germany