Comparative study of different large language models and medical professionals of different levels responding to ophthalmology questions / 国际眼科杂志(Guoji Yanke Zazhi)

Hui HUANG; Jinyu HU; Xiaoyu WANG; Shuyuan YE; Shinan WU; Cheng CHEN; Liangqi HE; Yanmei ZENG; Hong WEI; Yi SHAO.

International Eye Science ; (12): 458-462, 2024.

Article in Chinese | WPRIM | ID: wpr-1011401

ABSTRACT

AIM: To evaluate the performance of three large language models (LLMs), GPT-3.5, GPT-4, and PaLM2, in answering ophthalmology questions, and to compare their performance with that of medical professionals at three levels: medical undergraduates, master's students in medicine, and attending physicians.

METHODS: A total of 100 ophthalmology multiple-choice questions, covering basic ophthalmic knowledge, clinical knowledge, ophthalmic examination and diagnostic methods, and treatment of ocular disease, were administered to the three LLMs and to medical professionals at three levels (9 undergraduates, 6 postgraduates and 3 attending physicians). LLM performance was evaluated in terms of mean score, response consistency, and response confidence, and was compared with that of the human participants.

RESULTS: Each LLM surpassed the average performance of the undergraduate medical students (GPT-4: 56, GPT-3.5: 42, PaLM2: 47, undergraduate students: 40). The performance of GPT-3.5 and PaLM2 was slightly lower than that of the master's students (51), while GPT-4 performed comparably to the attending physicians (62). Furthermore, GPT-4 showed significantly higher response consistency and self-confidence than GPT-3.5 and PaLM2.

CONCLUSION: LLMs, as represented by GPT-4, perform well in the field of ophthalmology and can serve as clinical decision-making and teaching aids for clinicians and medical education.
