Search | VHL Regional Portal

Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries.

Pushpanathan, Krithi; Lim, Zhi Wei; Yew, Samantha Min Er; Chen, David Ziyou; Lin, Hazel Anne Hui'En; Goh, Jocelyn Hui Lin; Wong, Wendy Meihua; Wang, Xiaofei; Tan, Marcus Chun Jin; Koh, Victor Teck Chang; Tham, Yih-Chung.

iScience ; 26(11): 108163, 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-37915603

ABSTRACT

In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were 'good'-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.

Axial length elongation profiles from 3 to 6 years in an Asian paediatric population: the Growing Up in Singapore Towards Healthy Outcomes birth cohort study (GUSTO).

Chen, David Ziyou; Wong, Charlene; Lam, Janice Sing Harn; Sun, Chen-Hsin; Lai, Yien; Koh, Victor Teck Chang; Chong, Yap-Seng; Saw, Seang-Mei; Tham, Yih-Chung; Ngo, Cheryl.

Br J Ophthalmol ; 2023 Sep 19.

Article in English | MEDLINE | ID: mdl-37726156

ABSTRACT

AIMS: To determine axial length (AL) elongation profiles in children aged 3-6 years in an Asian population. METHODS: Eligible subjects were recruited from the Growing Up in Singapore Towards Healthy Outcomes birth cohort. AL measurement was performed using IOLMaster (Carl Zeiss Meditec, Jena, Germany) at 3 and 6 years. Anthropometric measurements at birth, cycloplegic refraction at 3 and 6 years, questionnaires on the children's behavioural habits at 2 years and parental spherical equivalent refraction were performed. Multivariable linear regression model with generalised estimating equation was performed to determine factors associated with AL elongation. RESULTS: 273 eyes of 194 children were included. The mean AL increased from 21.72±0.59 mm at 3 years to 22.52±0.66 mm at 6 years (p<0.001). Myopic eyes at 6 years had greater AL elongation (1.02±0.34 mm) compared with emmetropic eyes (0.85±0.25 mm, p=0.008) and hyperopic eyes (0.74±0.16 mm, p<0.001). The 95th percentile limit of AL elongation was 1.59 mm in myopes, 1.34 mm in emmetropes and 1.00 mm in hyperopes. Greater birth weight (per 100 g, β=0.010, p=0.02) was significantly associated with greater AL elongation from 3 to 6 years, while parental and other behavioural factors assessed at 2 years were not (all p≥0.08). CONCLUSION: In this preschool cohort, AL elongates by an average of 0.80 mm from 3 to 6 years, with myopes demonstrating the greatest elongation. The differences in 95th percentile limits for AL elongation between myopes, emmetropes and hyperopes can be valuable information in identifying myopia development in preschool children.

Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard.

Lim, Zhi Wei; Pushpanathan, Krithi; Yew, Samantha Min Er; Lai, Yien; Sun, Chen-Hsin; Lam, Janice Sing Harn; Chen, David Ziyou; Goh, Jocelyn Hui Lin; Tan, Marcus Chun Jin; Sheng, Bin; Cheng, Ching-Yu; Koh, Victor Teck Chang; Tham, Yih-Chung.

EBioMedicine ; 95: 104770, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37625267

ABSTRACT

BACKGROUND: Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has yet to be thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS: We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS: ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% in ChatGPT-3.5 and 54.8% in Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-Chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-Chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-Chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed best in this domain, receiving 70% 'good' ratings, compared to 40% in ChatGPT-3.5 and 45% in Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION: Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING: Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
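
The majority consensus step described in the METHODS above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `majority_rating` and `percent_good` and the sample grades are hypothetical, and returning `None` when all three graders disagree is an assumption, since the abstract does not say how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_rating(grades: Sequence[str]) -> Optional[str]:
    """Return the rating chosen by more than half of the graders.

    Returns None when there is no strict majority (for example, three
    graders giving three different ratings). How the study handled such
    cases is not stated in the abstract, so None is only a placeholder.
    """
    if not grades:
        return None
    rating, count = Counter(grades).most_common(1)[0]
    return rating if count > len(grades) / 2 else None


def percent_good(final_ratings: Sequence[Optional[str]]) -> float:
    """Percentage of final ratings equal to 'good', rounded to one decimal."""
    good = sum(1 for r in final_ratings if r == "good")
    return round(100 * good / len(final_ratings), 1)


# Hypothetical grades from three graders for two responses.
print(majority_rating(["good", "good", "borderline"]))  # good
print(majority_rating(["poor", "borderline", "good"]))  # None

# With 31 questions, 25 'good' final ratings corresponds to 80.6%.
print(percent_good(["good"] * 25 + ["borderline"] * 6))  # 80.6
```

With 31 questions per chatbot, the reported 80.6%, 61.3% and 54.8% are consistent with 25, 19 and 17 'good' final ratings respectively.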

Subject(s)

Benchmarking , Myopia , Humans , Child , Search Engine , Consensus , Language , Myopia/diagnosis , Myopia/epidemiology , Myopia/therapy

Areas and factors associated with patients' dissatisfaction with glaucoma care.

Foo, Valencia Hui Xian; Tan, Sarah En Mei; Chen, David Ziyou; Perera, Shamira A; Sabayanagam, Charumathi; Fenwick, Eva Katie; Wong, Tina T; Lamoureux, Ecosse L.

Clin Ophthalmol ; 11: 1849-1857, 2017.

Article in English | MEDLINE | ID: mdl-29075097

ABSTRACT

OBJECTIVE: The aim of this study was to evaluate patients' dissatisfaction with overall and specific aspects of a tertiary glaucoma service and to determine their independent factors, including intraocular pressure (IOP) and visual acuity (VA). METHODS: Patients, aged ≥21 years, from a specialist glaucoma service in a tertiary eye hospital in Singapore for at least 6 months, were recruited for this cross-sectional study between March and June 2014. All consenting patients completed a 7-area glaucoma-specific satisfaction questionnaire and one item related to satisfaction with overall glaucoma care. We determined the top three areas of dissatisfaction and overall dissatisfaction with the glaucoma service. We also explored the independent factors associated with overall and specific areas of patients' dissatisfaction with their glaucoma care, including VA and IOP by using logistic regression models. RESULTS: Of the 518 patients recruited, 438 (84.6%) patients completed the study. Patients' dissatisfaction with the overall glaucoma service was 7.5%. The three areas of glaucoma service with the highest dissatisfaction rates were as follows: 1) explanation of test results (24.8%); 2) explanation of glaucoma complications (23.7%); and 3) advice on managing glaucoma (23.5%). Patients who were dissatisfied with the overall service had a worse mean VA compared with satisfied patients (logarithm of the minimum angle of resolution =0.41±0.43 vs 0.27±0.49, p=0.005), whereas mean IOP remained well-controlled in both the groups (13.55±2.46 mmHg vs 14.82±2.86 mmHg, p=0.014). In adjusted models, factors associated with overall dissatisfaction with glaucoma care included a pre-university education and above (odds ratio [OR] =8.06, 95% CI =1.57-41.27) and lower IOP (OR =0.83, 95% CI =0.71-0.98). CONCLUSION: Although less than one tenth of glaucoma patients were dissatisfied with the overall glaucoma service, one in four patients were dissatisfied with three specific aspects of care. Ironically, a lower IOP, along with a higher education level, was associated with overall dissatisfaction. Improving patients' understanding of glaucoma test results, glaucoma complications, and disease management may increase patient satisfaction levels.
