Results 1 - 15 of 15

1.
Cureus ; 16(8): e68307, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39350844

ABSTRACT

Introduction This study assesses the readability of AI-generated brochures for common emergency medical conditions such as heart attack, anaphylaxis, and syncope, comparing the patient information guides generated by ChatGPT and Google Gemini. Methodology Brochures for each condition were created by both AI tools. Readability was assessed using the Flesch-Kincaid Calculator, evaluating word count, sentence count, and ease of understanding. Reliability was measured using the Modified DISCERN Score. The similarity between AI outputs was determined using Quillbot. Statistical analysis was performed with R (v4.3.2). Results ChatGPT and Gemini produced brochures with no statistically significant differences in word count (p = 0.2119), sentence count (p = 0.1276), readability (p = 0.3796), or reliability (p = 0.7407). However, ChatGPT provided more detailed content, with 32.4% more words (582.80 vs. 440.20) and 51.6% more sentences (67.00 vs. 44.20). In addition, Gemini's brochures were slightly easier to read, with a higher ease score (50.62 vs. 41.88). Reliability varied by topic, with ChatGPT scoring higher for heart attack (4 vs. 3) and choking (3 vs. 2) and Google Gemini scoring higher for anaphylaxis (4 vs. 3) and drowning (4 vs. 3), highlighting the need for topic-specific evaluation. Conclusions AI-generated brochures from ChatGPT and Gemini are comparable sources of patient information on emergency medical conditions, with no statistically significant differences in readability or reliability between the two tools.
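The readability metrics used here follow the standard Flesch formulas. As a rough illustration of how such scores are derived (not the study's actual pipeline, which used the Flesch-Kincaid Calculator and R), the Python sketch below computes Flesch Reading Ease and Flesch-Kincaid Grade Level with a naive syllable heuristic; the sample brochure text is invented.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

if __name__ == "__main__":
    sample = ("Call emergency services immediately if you suspect a heart attack. "
              "Chew one regular aspirin while you wait, unless you are allergic to it.")
    fre, fkgl = flesch_scores(sample)
    print(f"Reading Ease: {fre:.1f}, Grade Level: {fkgl:.1f}")
```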

2.
Int J Retina Vitreous ; 10(1): 61, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223678

ABSTRACT

BACKGROUND: Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment. METHODS: Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty level (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality. Analysis included the Flesch Reading Ease Score and word and sentence counts. RESULTS: Both artificial intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on the most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades at the easiest (p = 0.03) and hardest levels (p = 0.0002), although its grades decreased as question difficulty increased. CONCLUSIONS: ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education levels for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.

3.
Cureus ; 16(8): e67766, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39323714

ABSTRACT

AIMS AND OBJECTIVE: Advances in artificial intelligence (AI), particularly in large language models (LLMs) like ChatGPT (versions 3.5 and 4.0) and Google Gemini, are transforming healthcare. This study explores the performance of these AI models in solving diagnostic quizzes from "Neuroradiology: A Core Review" to evaluate their potential as diagnostic tools in radiology. MATERIALS AND METHODS: We assessed the accuracy of ChatGPT 3.5, ChatGPT 4.0, and Google Gemini using 262 multiple-choice questions covering brain, head and neck, spine, and non-interpretive skills. Each AI tool provided answers and explanations, which were compared to textbook answers. The analysis followed the STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines, and accuracy was calculated for each AI tool and subgroup. RESULTS: ChatGPT 4.0 achieved the highest overall accuracy at 64.89%, outperforming ChatGPT 3.5 (62.60%) and Google Gemini (55.73%). ChatGPT 4.0 excelled in brain, head, and neck diagnostics, while Google Gemini performed best in head and neck but lagged in other areas. ChatGPT 3.5 showed consistent performance across all subgroups. CONCLUSION: This study found that advanced AI models, including ChatGPT 4.0 and Google Gemini, vary in diagnostic accuracy, with ChatGPT 4.0 leading at 64.89% overall. While these tools are promising in improving diagnostics and medical education, their effectiveness varies by area, and Google Gemini performs unevenly across different categories. The study underscores the need for ongoing improvements and broader evaluation to address ethical concerns and optimize AI use in patient care.

4.
Clin Rheumatol ; 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39340572

ABSTRACT

OBJECTIVES: This study evaluates the performance of AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice. METHOD: A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 visual data questions. Both artificial intelligence models categorized the questions according to difficulty (easy, medium, hard) and answered them. In addition, the reliability of the answers was assessed by asking the questions a second time. The accuracy, reliability, and difficulty categorization of the AI models' responses to the questions were analyzed. RESULTS: ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001). CONCLUSIONS: ChatGPT-4o significantly outperformed Google Gemini on rheumatology board-level questions, demonstrating its strength in situations requiring complex and specialized knowledge of rheumatological diseases. The performance of both AI models decreased as question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended.
Key Points
• ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%.
• For both AI models, the correct answer rate decreased as the question difficulty increased.
• The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.
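The abstract reports only p-values for the accuracy gap; a chi-square test on a 2×2 contingency table is one standard way to compare two correct/incorrect proportions of this kind. The sketch below reconstructs illustrative counts from the reported percentages (420 questions, 86.9% vs. 60.2%); it is an assumption about the analysis, not the authors' actual code.

```python
from scipy.stats import chi2_contingency

n_questions = 420
correct_gpt4o = round(0.869 * n_questions)    # ~365 correct answers
correct_gemini = round(0.602 * n_questions)   # ~253 correct answers

# Rows: models; columns: correct vs. incorrect answers
table = [
    [correct_gpt4o, n_questions - correct_gpt4o],
    [correct_gemini, n_questions - correct_gemini],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, p = {p:.2e}")
```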

5.
Article in English | MEDLINE | ID: mdl-39349172

ABSTRACT

PURPOSE: This study aimed to evaluate the reliability and readability of responses generated by two popular AI chatbots, ChatGPT-4.0 and Google Gemini, to potential patient questions about PET/CT scans. MATERIALS AND METHODS: Thirty potential questions each for [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT, were asked separately of ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability and readability using the modified DISCERN (mDISCERN) scale, Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of the mDISCERN scores provided by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) was assessed. RESULTS: The median [min-max] mDISCERN scores assigned by the physician for responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [2-4], 3 [3-4], and 3 [3-4] for ChatGPT-4, and 4 [2-5], 4 [2-5], and 3.5 [3-5] for Gemini, respectively. The mDISCERN scores assessed using ChatGPT-4 for answers about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3-5], 3 [3-4], and 3 [2-3] for ChatGPT-4, and 4 [3-5], 4 [3-5], and 4 [3-5] for Gemini, respectively. The mDISCERN scores assessed using Gemini for responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3 [2-4], 2 [2-4], and 3 [2-4] for ChatGPT-4, and 3 [2-5], 3 [1-5], and 3 [2-5] for Gemini, respectively. The inter-rater reliability correlation coefficients of the mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CI = 0.32-0.812), 0.707 (95% CI = 0.458-0.853), and 0.738 (95% CI = 0.519-0.866), respectively (p < 0.001). The coefficients for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CI = 0.677-0.910), 0.881 (95% CI = 0.78-0.94), and 0.847 (95% CI = 0.719-0.922), respectively (p < 0.001). The mDISCERN scores assessed by ChatGPT-4, Gemini, and the physician showed moderate to good statistical agreement for the chatbots' responses about all PET/CT scans according to the inter-rater reliability correlation coefficient (p < 0.001). There was a statistically significant difference in all readability scores (FKRGL, GFI, and FRE) between ChatGPT-4 and Gemini responses about PET/CT scans (p < 0.001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses. CONCLUSION: There was an acceptable level of agreement between raters on the mDISCERN score, indicating agreement on the overall reliability of the responses. However, the information provided by AI chatbots cannot be easily read by the general public.
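The abstract does not state which inter-rater coefficient was used; the intraclass correlation coefficient (ICC) is a common choice for ordinal mDISCERN ratings from several raters, and the sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) on invented toy scores purely as an illustration.

```python
import numpy as np

def icc2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for a matrix of shape (n_items, k_raters)."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = (np.sum((ratings - grand) ** 2)
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Toy data: mDISCERN scores (1-5) given by three raters to six responses
scores = np.array([
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [3, 2, 3],
], dtype=float)
print(f"ICC(2,1) = {icc2_1(scores):.3f}")
```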

6.
J Orthop Surg Res ; 19(1): 574, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39289734

ABSTRACT

BACKGROUND: The use of large language models (LLMs) in medicine can help physicians improve the quality and effectiveness of health care by increasing the efficiency of medical information management, patient care, medical research, and clinical decision-making. METHODS: We collected 34 frequently asked questions about glucocorticoid-induced osteoporosis (GIOP), covering the disease's clinical manifestations, pathogenesis, diagnosis, treatment, prevention, and risk factors. We also generated 25 questions based on the 2022 American College of Rheumatology Guideline for the Prevention and Treatment of Glucocorticoid-Induced Osteoporosis (2022 ACR-GIOP Guideline). Each question was posed to the LLMs (ChatGPT-3.5, ChatGPT-4, and Google Gemini), and three senior orthopedic surgeons independently rated each response on a scale of 1 to 4 points. A total score (TS) > 9 indicated 'good' responses, 6 ≤ TS ≤ 9 indicated 'moderate' responses, and TS < 6 indicated 'poor' responses. RESULTS: In response to the general questions related to GIOP and the 2022 ACR-GIOP Guideline, Google Gemini provided more concise answers than the other LLMs. In terms of pathogenesis, ChatGPT-4 had significantly higher total scores (TSs) than ChatGPT-3.5. The TSs for answering questions related to the 2022 ACR-GIOP Guideline were significantly higher for ChatGPT-4 than for Google Gemini. ChatGPT-3.5 and ChatGPT-4 had significantly higher self-corrected TSs than pre-corrected TSs, whereas Google Gemini's self-corrected responses did not differ significantly from its original ones. CONCLUSIONS: Our study showed that Google Gemini provides more concise and intuitive responses than ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed significantly better than ChatGPT-3.5 and Google Gemini in answering general questions about GIOP and the 2022 ACR-GIOP Guideline. ChatGPT-3.5 and ChatGPT-4 self-corrected better than Google Gemini.


Subjects
Glucocorticoids, Osteoporosis, Humans, Osteoporosis/chemically induced, Glucocorticoids/adverse effects, Surveys and Questionnaires
7.
Cureus ; 16(7): e63865, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39099896

ABSTRACT

BACKGROUND: Artificial intelligence (AI) is a burgeoning new field that has increased in popularity over the past couple of years, coinciding with the public release of large language model (LLM)-driven chatbots. These chatbots, such as ChatGPT, can be engaged directly in conversation, allowing users to ask them questions or issue other commands. Since LLMs are trained on large amounts of text data, they can also answer questions reliably and factually, an ability that has allowed them to serve as a source for medical inquiries. This study seeks to assess the readability of patient education materials on cardiac catheterization across four of the most common chatbots: ChatGPT, Microsoft Copilot, Google Gemini, and Meta AI. METHODOLOGY: A set of 10 questions regarding cardiac catheterization was developed using website-based patient education materials on the topic. We then asked these questions in consecutive order to four of the most common chatbots: ChatGPT, Microsoft Copilot, Google Gemini, and Meta AI. The Flesch Reading Ease Score (FRES) was used to assess the readability score. Readability grade levels were assessed using six tools: Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index (ARI), and FORCAST Grade Level. RESULTS: The mean FRES across all four chatbots was 40.2, while overall mean grade levels for the four chatbots were 11.2, 13.7, 13.7, 13.3, 11.2, and 11.6 across the FKGL, GFI, CLI, SMOG, ARI, and FORCAST indices, respectively. Mean reading grade levels across the six tools were 14.8 for ChatGPT, 12.3 for Microsoft Copilot, 13.1 for Google Gemini, and 9.6 for Meta AI. Further, FRES values for the four chatbots were 31, 35.8, 36.4, and 57.7, respectively. CONCLUSIONS: This study shows that AI chatbots are capable of providing answers to medical questions regarding cardiac catheterization. However, the responses across the four chatbots had overall mean reading grade levels at the 11th-13th-grade level, depending on the tool used. This means that the materials were at the high school and even college reading level, which far exceeds the recommended sixth-grade level for patient education materials. Further, there is significant variability in the readability levels provided by different chatbots as, across all six grade-level assessments, Meta AI had the lowest scores and ChatGPT generally had the highest.
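All six grade-level tools named above are standard text-statistics formulas that can be reproduced programmatically. As a hedged sketch (the study does not say which software it used), the third-party Python library textstat exposes most of them directly; FORCAST is omitted here, and the sample response text is invented.

```python
# pip install textstat
import textstat

response = ("Cardiac catheterization is a procedure in which a thin, flexible tube "
            "is guided through a blood vessel to the heart. Doctors use it to "
            "diagnose or treat certain heart conditions.")

metrics = {
    "FRES": textstat.flesch_reading_ease(response),
    "FKGL": textstat.flesch_kincaid_grade(response),
    "GFI": textstat.gunning_fog(response),
    "CLI": textstat.coleman_liau_index(response),
    "SMOG": textstat.smog_index(response),   # most reliable on texts of 30+ sentences
    "ARI": textstat.automated_readability_index(response),
}
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```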

8.
Radiol Med ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138732

ABSTRACT

Applications of large language models (LLMs) in the healthcare field have shown promising results in processing and summarizing multidisciplinary information. This study evaluated the ability of three publicly available LLMs (GPT-3.5, GPT-4, and Google Gemini, then called Bard) to answer 60 multiple-choice questions (29 sourced from public databases, 31 newly formulated by experienced breast radiologists) about different aspects of breast cancer care: treatment and prognosis, diagnostic and interventional techniques, imaging interpretation, and pathology. Overall, the rate of correct answers differed significantly among the LLMs (p = 0.010): the best performance was achieved by GPT-4 (95%, 57/60), followed by GPT-3.5 (90%, 54/60) and Google Gemini (80%, 48/60). Across all LLMs, no significant differences were observed in the rates of correct replies to questions sourced from public databases versus newly formulated ones (p ≥ 0.593). These results highlight the potential benefits of LLMs in breast cancer care, although these models will need to be further refined through in-context training.

9.
Cureus ; 16(6): e62471, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39015855

ABSTRACT

PURPOSE: To evaluate the efficiency of three artificial intelligence (AI) chatbots (ChatGPT-3.5 (OpenAI, San Francisco, California, United States), Bing Copilot (Microsoft Corporation, Redmond, Washington, United States), Google Gemini (Google LLC, Mountain View, California, United States)) in assisting the ophthalmologist in the diagnostic approach and management of challenging ophthalmic cases and compare their performance with that of a practicing human ophthalmic specialist. The secondary aim was to assess the short- and medium-term consistency of ChatGPT's responses. METHODS: Eleven ophthalmic case scenarios of variable complexity were presented to the AI chatbots and to an ophthalmic specialist in a stepwise fashion. Advice regarding the initial differential diagnosis, the final diagnosis, further investigation, and management was asked for. One month later, the same process was repeated twice on the same day for ChatGPT only. RESULTS: The individual diagnostic performance of all three AI chatbots was inferior to that of the ophthalmic specialist; however, they provided useful complementary input in the diagnostic algorithm. This was especially true for ChatGPT and Bing Copilot. ChatGPT exhibited reasonable short- and medium-term consistency, with the mean Jaccard similarity coefficient of responses varying between 0.58 and 0.76. CONCLUSION: AI chatbots may act as useful assisting tools in the diagnosis and management of challenging ophthalmic cases; however, their responses should be scrutinized for potential inaccuracies, and by no means can they replace consultation with an ophthalmic specialist.
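The consistency measure mentioned above, the Jaccard similarity coefficient, is the ratio of shared to total unique items between two responses. A minimal sketch on lowercased word sets (the paper does not specify its tokenization, so this is an assumption), with invented example answers:

```python
import re

def jaccard_similarity(text_a: str, text_b: str) -> float:
    """|A intersection B| / |A union B| over lowercased word sets."""
    a = set(re.findall(r"[a-z']+", text_a.lower()))
    b = set(re.findall(r"[a-z']+", text_b.lower()))
    return len(a & b) / len(a | b) if (a or b) else 1.0

# The same case posed to the chatbot at two different times
first = "The most probable diagnosis is optic neuritis; order an MRI of the brain and orbits."
second = "Optic neuritis is most likely; an MRI of the brain and orbits is recommended."
print(f"Jaccard similarity: {jaccard_similarity(first, second):.2f}")
```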

10.
J Periodontal Res ; 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39030766

ABSTRACT

INTRODUCTION: The emerging rise of novel computer technologies and automated data analytics has the potential to change the course of dental education. In line with our long-term goal of harnessing the power of AI to augment didactic teaching, the objective of this study was to quantify the accuracy of responses provided by ChatGPT (GPT-4 and GPT-3.5) and Google Gemini, the three primary large language models (LLMs), to the annual in-service examination questions posed by the American Academy of Periodontology (AAP), and to compare it with that of human graduate students (control group). METHODS: Under a comparative cross-sectional study design, a corpus of 1312 questions from the annual in-service examinations of the AAP administered between 2020 and 2023 was presented to the LLMs. Their responses were analyzed using chi-square tests, and their performance was juxtaposed with the scores of periodontal residents from the corresponding years, as the human control group. Additionally, two sub-analyses were performed: one on the performance of the LLMs on each section of the exam, and one on their performance in answering the most difficult questions. RESULTS: ChatGPT-4 (total average: 79.57%) outperformed all human control groups as well as GPT-3.5 and Google Gemini in all exam years (p < .001). This chatbot showed an accuracy range between 78.80% and 80.98% across the exam years. Gemini consistently recorded superior performance, with scores of 70.65% (p = .01), 73.29% (p = .02), 75.73% (p < .01), and 72.18% (p = .0008) for the exams from 2020 to 2023, compared to ChatGPT-3.5, which achieved 62.5%, 68.24%, 69.83%, and 59.27%, respectively. Google Gemini (72.86%) surpassed the average scores achieved by first-year (63.48% ± 31.67) and second-year residents (66.25% ± 31.61) when all exam years were combined, but it could not surpass that of third-year residents (69.06% ± 30.45). CONCLUSIONS: Within the confines of this analysis, ChatGPT-4 exhibited a robust capability in answering AAP in-service exam questions in terms of accuracy and reliability, while Gemini and ChatGPT-3.5 showed weaker performance. These findings underscore the potential of deploying LLMs as an educational tool in the periodontics and oral implantology domains. However, the current limitations of these models, such as the inability to effectively process image-based inquiries, the propensity for generating inconsistent responses to the same prompts, and high (80% for GPT-4) but not absolute accuracy rates, should be considered. An objective comparison of their capability versus their capacity is required to further develop this field of study.

11.
Cureus ; 16(4): e58232, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38745784

ABSTRACT

OBJECTIVE: We aim to compare the capabilities of ChatGPT 3.5, Microsoft Bing, and Google Gemini in handling neuro-ophthalmological case scenarios. METHODS: Ten randomly chosen neuro-ophthalmological cases from a publicly accessible database were used to test the accuracy and suitability of all three models; each case description was followed by the query: "What is the most probable diagnosis?" RESULTS: In terms of diagnostic accuracy, all three chatbots (ChatGPT 3.5, Microsoft Bing, and Google Gemini) gave the correct diagnosis in four (40%) of the 10 cases, whereas in terms of suitability, ChatGPT 3.5, Microsoft Bing, and Google Gemini gave suitable responses in six (60%), five (50%), and five (50%) of the 10 case scenarios, respectively. CONCLUSION: ChatGPT 3.5 performed better than the other two models in handling neuro-ophthalmological case scenarios. These results highlight the potential benefits of developing artificial intelligence (AI) models for improving medical education and ocular diagnostics.

12.
Cureus ; 16(4): e57795, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38721180

ABSTRACT

Artificial Intelligence (AI) in healthcare marks a new era of innovation and efficiency, characterized by the emergence of sophisticated language models such as ChatGPT (OpenAI, San Francisco, CA, USA), Gemini Advanced (Google LLC, Mountain View, CA, USA), and Co-pilot (Microsoft Corp, Redmond, WA, USA). This review explores the transformative impact of these AI technologies on various facets of healthcare, from enhancing patient care and treatment protocols to revolutionizing medical research and tackling intricate health science challenges. ChatGPT, with its advanced natural language processing capabilities, leads the way in providing personalized mental health support and improving chronic condition management. Gemini Advanced extends the boundary of AI in healthcare through data analytics, facilitating early disease detection and supporting medical decision-making. Co-pilot, by integrating seamlessly with healthcare systems, optimizes clinical workflows and encourages a culture of innovation among healthcare professionals. Additionally, the review highlights the significant contributions of AI in accelerating medical research, particularly in genomics and drug discovery, thus paving the path for personalized medicine and more effective treatments. The pivotal role of AI in epidemiology, especially in managing infectious diseases such as COVID-19, is also emphasized, demonstrating its value in enhancing public health strategies. However, the integration of AI technologies in healthcare comes with challenges. Concerns about data privacy, security, and the need for comprehensive cybersecurity measures are discussed, along with the importance of regulatory compliance and transparent consent management to uphold ethical standards and patient autonomy. The review points out the necessity for seamless integration, interoperability, and the maintenance of AI systems' reliability and accuracy to fully leverage AI's potential in advancing healthcare.

13.
Cureus ; 16(5): e59898, 2024 May.
Article in English | MEDLINE | ID: mdl-38721479

ABSTRACT

Background Google Gemini (Google, Mountain View, CA) represents the latest advances in the realm of artificial intelligence (AI) and has garnered attention due to capabilities similar to those of the increasingly popular ChatGPT (OpenAI, San Francisco, CA). Accurate dissemination of information on common conditions such as hypertension is critical for patient comprehension and management. Despite the ubiquity of AI, comparisons between ChatGPT and Gemini remain unexplored. Methods ChatGPT and Gemini were asked 52 questions derived from the American College of Cardiology's (ACC) frequently asked questions on hypertension, following a specified prompt. Prompts included: no prompting (Form 1), patient-friendly prompting (Form 2), physician-level prompting (Form 3), and prompting for statistics/references (Form 4). Responses were scored as incorrect, partially correct, or correct. Flesch-Kincaid (FK) grade level and word count were recorded. Results Across all forms, scoring frequencies were as follows: 23 (5.5%) incorrect, 162 (38.9%) partially correct, and 231 (55.5%) correct. ChatGPT showed higher rates of partially correct answers than Gemini (p = 0.0346). Physician-level prompts resulted in a higher word count across both platforms (p < 0.001). ChatGPT showed a higher FK grade level (p = 0.033) with physician-level prompting. Gemini exhibited a significantly higher mean word count (p < 0.001); however, ChatGPT had a higher FK grade level across all forms (p > 0.001). Conclusion To our knowledge, this study is the first to compare cardiology-related responses from ChatGPT and Gemini, two of the most popular AI chatbots. The grade level for most responses was collegiate, which is above the National Institutes of Health (NIH) recommended reading level but on par with most online medical information. Both chatbots responded with a high degree of accuracy, and inaccuracies were rare. Therefore, it is reasonable for cardiologists to suggest either chatbot as a source of supplementary education.

14.
Graefes Arch Clin Exp Ophthalmol ; 262(9): 2945-2959, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38573349

ABSTRACT

PURPOSE: The aim of this study was to define the capability of ChatGPT-4 and Google Gemini to analyze detailed glaucoma case descriptions and suggest an accurate surgical plan. METHODS: Retrospective analysis of 60 medical records of surgical glaucoma cases, divided into "ordinary" (n = 40) and "challenging" (n = 20) scenarios. Case descriptions were entered into the ChatGPT and Bard (now Gemini) interfaces with the question "What kind of surgery would you perform?" and repeated three times to analyze the consistency of the answers. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. Moreover, we graded the quality of the responses with scores from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS) and compared the results. RESULTS: ChatGPT's surgical choice was consistent with that of the glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) for Gemini (p = 0.0001). Gemini was not able to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In "challenging" cases, ChatGPT agreed with the specialists in 9/20 choices (45%), outperforming Google Gemini (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 and 2.1 ± 1.5 for ChatGPT and Gemini, respectively (p = 0.002). This difference was even more marked when focusing only on "challenging" cases (1.5 ± 1.4 vs. 3.0 ± 1.5, p = 0.001). CONCLUSION: ChatGPT-4 showed good analytical performance for glaucoma surgical cases, both ordinary and challenging. On the other hand, Google Gemini showed strong limitations in this setting, presenting high rates of imprecise or missing answers.


Subjects
Glaucoma, Humans, Retrospective Studies, Glaucoma/surgery, Glaucoma/physiopathology, Female, Male, Trabeculectomy/methods, Intraocular Pressure/physiology, Aged, Middle Aged
15.
Yale J Biol Med ; 97(1): 17-27, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38559461

ABSTRACT

Enhanced health literacy in children has been empirically linked to better long-term health outcomes; however, few interventions have been shown to improve health literacy. In this context, we investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measurement was the reading grade level (RGL) of the output, as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, output for basic prompts such as "Explain" and "What is (are)" was at, or exceeded, the tenth-grade RGL. When prompts specified explaining conditions at the first- through twelfth-grade levels, we found that the LLMs had varying abilities to tailor responses to the requested grade level. ChatGPT-3.5 provided responses ranging from the seventh-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college-senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs face challenges in crafting outputs below a sixth-grade RGL. However, their capability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information regarding their health conditions, spanning from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.


Subjects
Health Literacy, Adolescent, Child, Humans, Cross-Sectional Studies, Comprehension, Reading, Language