Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation
Gobira, Mauro; Nakayama, Luis Filipe; Moreira, Rodrigo; Andrade, Eric; Regatieri, Caio Vinicius Saito; Belfort Jr., Rubens.
  • Gobira, Mauro; Instituto Paulista de Estudos e Pesquisas em Oftalmologia. Vision Institute. São Paulo. BR
  • Nakayama, Luis Filipe; Instituto Paulista de Estudos e Pesquisas em Oftalmologia. Vision Institute. São Paulo. BR
  • Moreira, Rodrigo; Instituto Paulista de Estudos e Pesquisas em Oftalmologia. Vision Institute. São Paulo. BR
  • Andrade, Eric; Universidade Federal de São Paulo. Department of Ophthalmology. São Paulo. BR
  • Regatieri, Caio Vinicius Saito; Universidade Federal de São Paulo. Department of Ophthalmology. São Paulo. BR
  • Belfort Jr., Rubens; Universidade Federal de São Paulo. Department of Ophthalmology. São Paulo. BR
Rev. Assoc. Med. Bras. (1992, Impr.); 69(10): e20230848, 2023. graf
Article in English | LILACS-Express | LILACS | ID: biblio-1514686
ABSTRACT

OBJECTIVE:

The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the questions of the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to assess its use as a tool for providing feedback on the quality of the examination.

METHODS:

Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the official test solutions, they classified each large language model answer as adequate, inadequate, or indeterminate. Disagreements were adjudicated until a consensus decision on ChatGPT's accuracy was reached. Performance across medical themes and between nullified and non-nullified questions was compared using chi-square tests.
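For illustration, the theme-wise chi-square comparison described above could be set up as in the sketch below. This is a minimal sketch only: the per-theme counts are hypothetical placeholders (the abstract does not publish a per-theme breakdown; only the overall totals of 71 correct and 10 incorrect answers are reported), and scipy's chi2_contingency is assumed as the test implementation rather than confirmed by the source.

```python
# Minimal sketch of the chi-square comparison across medical themes.
# The per-theme (correct, incorrect) counts below are HYPOTHETICAL;
# only the overall totals (71 correct, 10 incorrect) are reported.
from scipy.stats import chi2_contingency

theme_counts = [
    # correct, incorrect  (placeholder themes and splits)
    [15, 2],  # e.g., internal medicine
    [14, 2],  # e.g., surgery
    [14, 2],  # e.g., pediatrics
    [14, 2],  # e.g., obstetrics and gynecology
    [14, 2],  # e.g., public health
]

chi2, p, dof, expected = chi2_contingency(theme_counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```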

RESULTS:

On the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers across medical themes (p=0.4886). The model's accuracy was lower on nullified questions (71.4%), with no statistically significant difference between the nullified and non-nullified groups (p=0.241).
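As a rough check, the nullified versus non-nullified comparison could be set up as follows. The 2×2 counts are inferred assumptions, not published data: 71.4% accuracy is consistent with, for example, 5 of 7 nullified questions answered correctly, so the recomputed p-value need not match the reported p=0.241.

```python
# Hedged sketch: chi-square comparison of ChatGPT-4.0 accuracy on
# nullified vs. non-nullified questions. The counts are ASSUMPTIONS
# consistent with the abstract's percentages, not published data.
from scipy.stats import chi2_contingency

table = [
    [66, 8],  # non-nullified questions: correct, incorrect (assumed split)
    [5, 2],   # nullified questions: correct, incorrect (5/7 ≈ 71.4%)
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
# Note: with expected counts this small, Fisher's exact test would be
# a more appropriate choice than the chi-square test.
```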

CONCLUSION:

ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and on public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decision to nullify the annulled questions.


Full text: Available
Index: LILACS (Americas)
Country/Region as subject: South America / Brazil
Language: English
Journal: Rev. Assoc. Med. Bras. (1992, Impr.)
Journal subject: Health Education / Knowledge Management for Health Research / Medicine
Year: 2023
Type: Article
Affiliation country: Brazil
Institution/Affiliation country: Instituto Paulista de Estudos e Pesquisas em Oftalmologia/BR / Universidade Federal de São Paulo/BR
