Pesquisa | Portal Regional da BVS

1.

Gemini cationic surfactant of 1, 3-bis (dodecyl dimethyl ammonium chloride) propane as a novel excellent inhibitor for the corrosion of cold rolled steel in HCl solution.

Wei, Gaofei; Deng, Shuduan; Shao, Dandan; Xu, Dake; Lei, Ran; Li, Xianghong.

J Colloid Interface Sci ; 677(Pt A): 324-345, 2025 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-39096702

RESUMO

Gemini surfactants have become the research focus of novel excellent inhibitors because of their special structure (two amphiphilic moieties covalently connected at head group by a spacer) and excellent surface properties. It is proved by theoretical calculations that 1, 3-bis (dodecyl dimethyl ammonium chloride) propane (BDDACP) molecules can perform electron transfer with Fe (110). And it has a small fraction free volume, thus greatly reducing the diffusion and migration degree of corrosive particles. The potentiodynamic polarization curve showed that coefficients of cathodic and anodic reaction less than 1 and polarization resistance increased to 1602.9 Ω cm-2 after added BDDACP, confirming that BDDACP significantly inhibited the corrosion reaction by occupying the active site. The electrochemical impedance spectrum of imperfect semi-circle shows that the system resistance increases and double layer capacitance after added BDDACP. Weight loss tests also confirmed that BDDACP forms protective film by occupying the active sites on steel surface, and the maximum inhibition efficiency is 92 %. Comparison of the microscopic morphology showed that steel surface roughness was significantly reduced after added BDDACP. The results of time-of-flight secondary ion mass spectrometry show that steel surface contains some elements from BDDACP, which confirms the adsorption of BDDACP on steel surface.

2.

An Observational Study to Evaluate Readability and Reliability of AI-Generated Brochures for Emergency Medical Conditions.

S, Adithya; Aggarwal, Shreyas; Sridhar, Janani; Vs, Kavya; John, Victoria P; Singh, Chaihthanya.

Cureus ; 16(8): e68307, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-39350844

RESUMO

Introduction The study assesses the readability of AI-generated brochures for common emergency medical conditions like heart attack, anaphylaxis, and syncope. Thus, the study aims to compare the AI-generated responses for patient information guides of common emergency medical conditions using ChatGPT and Google Gemini. Methodology Brochures for each condition were created by both AI tools. Readability was assessed using the Flesch-Kincaid Calculator, evaluating word count, sentence count and ease of understanding. Reliability was measured using the Modified DISCERN Score. The similarity between AI outputs was determined using Quillbot. Statistical analysis was performed with R (v4.3.2). Results ChatGPT and Gemini produced brochures with no statistically significant differences in word count (p= 0.2119), sentence count (p=0.1276), readability (p=0.3796), or reliability (p=0.7407). However, ChatGPT provided more detailed content with 32.4% more words (582.80 vs. 440.20) and 51.6% more sentences (67.00 vs. 44.20). In addition, Gemini's brochures were slightly easier to read with a higher ease score (50.62 vs. 41.88). Reliability varied by topic with ChatGPT scoring higher for Heart Attack (4 vs. 3) and Choking (3 vs. 2), while Google Gemini scored higher for Anaphylaxis (4 vs. 3) and Drowning (4 vs. 3), highlighting the need for topic-specific evaluation. Conclusions Although AI-generated brochures from ChatGPT and Gemini are comparable in readability and reliability for patient information on emergency medical conditions, this study highlights that there is no statistically significant difference in the responses generated by the two AI tools.

3.

Vaccination hesitancy: agreement between WHO and ChatGPT-4.0 or Gemini Advanced.

Fiore, Matteo; Bianconi, Alessandro; Acuti Martellucci, Cecilia; Rosso, Annalisa; Zauli, Enrico; Flacco, Maria Elena; Manzoli, Lamberto.

Ann Ig ; 2024 Oct 07.

Artigo em Inglês | MEDLINE | ID: mdl-39373234

RESUMO

Background: An increasing number of individuals use online Artificial Intelligence (AI) - based chatbots to retrieve information on health-related topics. This study aims to evaluate the accuracy in answering vaccine-related answers of the currently most commonly used, advanced chatbots - ChatGPT-4.0 and Google Gemini Advanced. Methods: We compared the answers provided by the World Health Organization (WHO) to 38 open questions on vaccination myths and misconception, with the answers created by ChatGPT-4.0 and Gemini Advanced. Responses were considered as "appropriate", if the information provided was coherent and not in contrast to current WHO recommendations or to drug regulatory indications. Results and Conclusions: The rate of agreement between WHO answers and Chat-GPT-4.0 or Gemini Advanced was very high, as both provided 36 (94.7%) appropriate responses. The few discrepancies between WHO and AI-chatbots answers could not be considered "harmful", and both chatbots often invited the user to check reliable sources, such as CDC or the WHO websites, or to contact a local healthcare professional. In their current versions, both AI-chatbots may already be powerful instrument to support the traditional communication tools in primary prevention, with the potential to improve health literacy, medication adherence, and vaccine hesitancy and concerns. Given the rapid evolution of AI-based systems, further studies are strongly needed to monitor their accuracy and reliability over time.

4.

Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.

Künzle, Paul; Paris, Sebastian.

Clin Oral Investig ; 28(11): 575, 2024 Oct 07.

Artigo em Inglês | MEDLINE | ID: mdl-39373739

RESUMO

OBJECTIVES: The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions. MATERIALS AND METHODS: 151 questions from a RDE question pool were prepared for prompting using LLMAs from OpenAI (ChatGPT-3.5,-4.0 and -4.0o) and Google (Gemini 1.0). Multiple-choice questions were sorted into four question subcategories, entered into LLMAs and answers recorded for analysis. P-value and chi-square statistical analyses were performed using Python 3.9.16. RESULTS: The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively) with significant differences between all LLMAs except GPT-4.0 models. The performance on subcategories direct restorations and caries was the highest, followed by indirect restorations and endodontics. CONCLUSIONS: Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success ratio that could be used with caution to support the dental academic curriculum. CLINICAL RELEVANCE: While LLMAs could support clinicians to answer dental field-related questions, this capacity depends strongly on the employed model. The most performant model ChatGPT-4.0o achieved acceptable accuracy rates in some subject sub-categories analyzed.

Assuntos

Inteligência Artificial , Endodontia , Humanos , Endodontia/educação , Educação em Odontologia/métodos , Avaliação Educacional/métodos , Estudantes de Odontologia , Dentística Operatória/educação , Competência Clínica , Inquéritos e Questionários

5.

Accuracy of Large Language Models for Infective Endocarditis Prophylaxis in Dental Procedures.

Rewthamrongsris, Paak; Burapacheep, Jirayu; Trachoo, Vorapat; Porntaveetus, Thantrira.

Int Dent J ; 2024 Oct 11.

Artigo em Inglês | MEDLINE | ID: mdl-39395898

RESUMO

PURPOSE: Infective endocarditis (IE) is a serious, life-threatening condition requiring antibiotic prophylaxis for high-risk individuals undergoing invasive dental procedures. As LLMs are rapidly adopted by dental professionals for their efficiency and accessibility, assessing their accuracy in answering critical questions about antibiotic prophylaxis for IE prevention is crucial. METHODS: Twenty-eight true/false questions based on the 2021 American Heart Association (AHA) guidelines for IE were posed to 7 popular LLMs. Each model underwent five independent runs per question using two prompt strategies: a pre-prompt as an experienced dentist and without a pre-prompt. Inter-model comparisons utilised the Kruskal-Wallis test, followed by post-hoc pairwise comparisons using Prism 10 software. RESULTS: Significant differences in accuracy were observed among the LLMs. All LLMs had a narrower confidence interval with a pre-prompt, and most, except Claude 3 Opus, showed improved performance. GPT-4o had the highest accuracy (80% with a pre-prompt, 78.57% without), followed by Gemini 1.5 Pro (78.57% and 77.86%) and Claude 3 Opus (75.71% and 77.14%). Gemini 1.5 Flash had the lowest accuracy (68.57% and 63.57%). Without a pre-prompt, Gemini 1.5 Flash's accuracy was significantly lower than Claude 3 Opus, Gemini 1.5 Pro, and GPT-4o. With a pre-prompt, Gemini 1.5 Flash and Claude 3.5 were significantly less accurate than Gemini 1.5 Pro and GPT-4o. None of the LLMs met the commonly used benchmark scores. All models provided both correct and incorrect answers randomly, except Claude 3.5 Sonnet with a pre-prompt, which consistently gave incorrect answers to eight questions across five runs. CONCLUSION: LLMs like GPT-4o show promise for retrieving AHA-IE guideline information, achieving up to 80% accuracy. However, complex medical questions may still pose a challenge. Pre-prompts offer a potential solution, and domain-specific training is essential for optimizing LLM performance in healthcare, especially with the emergence of models with increased token limits.

6.

Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.

Bahir, Daniel; Zur, Omri; Attal, Leah; Nujeidat, Zaki; Knaanie, Ariela; Pikkel, Joseph; Mimouni, Michael; Plopsky, Gilad.

Graefes Arch Clin Exp Ophthalmol ; 2024 Sep 15.

Artigo em Inglês | MEDLINE | ID: mdl-39277830

RESUMO

INTRODUCTION: The rapid advancement of artificial intelligence (AI), particularly in large language models like ChatGPT and Google's Gemini AI, marks a transformative era in technological innovation. This study explores the potential of AI in ophthalmology, focusing on the capabilities of ChatGPT and Gemini AI. While these models hold promise for medical education and clinical support, their integration requires comprehensive evaluation. This research aims to bridge a gap in the literature by comparing Gemini AI and ChatGPT, assessing their performance against ophthalmology residents using a dataset derived from ophthalmology board exams. METHODS: A dataset comprising 600 questions across 12 subspecialties was curated from Israeli ophthalmology residency exams, encompassing text and image-based formats. Four AI models - ChatGPT-3.5, ChatGPT-4, Gemini, and Gemini Advanced - underwent testing on this dataset. The study includes a comparative analysis with Israeli ophthalmology residents, employing specific metrics for performance assessment. RESULTS: Gemini Advanced demonstrated superior performance with a 66% accuracy rate. Notably, ChatGPT-4 exhibited improvement at 62%, Gemini at 58%, and ChatGPT-3.5 served as the reference at 46%. Comparative analysis with residents offered insights into AI models' performance relative to human-level medical knowledge. Further analysis delved into yearly performance trends, topic-specific variations, and the impact of images on chatbot accuracy. CONCLUSION: The study unveils nuanced AI model capabilities in ophthalmology, emphasizing domain-specific variations. The superior performance of Gemini Advanced superior performance indicates significant advancements, while ChatGPT-4's improvement is noteworthy. Both Gemini and ChatGPT-3.5 demonstrated commendable performance. The comparative analysis underscores AI's evolving role as a supplementary tool in medical education. This research contributes vital insights into AI effectiveness in ophthalmology, highlighting areas for refinement. As AI models evolve, targeted improvements can enhance adaptability across subspecialties, making them valuable tools for medical professionals and enriching patient care. KEY MESSAGES: What is known AI breakthroughs, like ChatGPT and Google's Gemini AI, are reshaping healthcare. In ophthalmology, AI integration has overhauled clinical workflows, particularly in analyzing images for diseases like diabetic retinopathy and glaucoma. What is new This study presents a pioneering comparison between Gemini AI and ChatGPT, evaluating their performance against ophthalmology residents using a meticulously curated dataset derived from real-world ophthalmology board exams. Notably, Gemini Advanced demonstrates superior performance, showcasing substantial advancements, while the evolution of ChatGPT-4 also merits attention. Both models exhibit commendable capabilities. These findings offer crucial insights into the efficacy of AI in ophthalmology, shedding light on areas ripe for further enhancement and optimization.

7.

Synthesis and Kinetics of CO₂-Responsive Gemini Surfactants.

Li, Yao; Tang, Xinyu; Yang, Pujiang; Zhang, Yuhui; Liu, Jinhe.

Molecules ; 29(17)2024 Sep 03.

Artigo em Inglês | MEDLINE | ID: mdl-39275014

RESUMO

Surfactants are hailed as "industrial monosodium glutamate", and are widely used as emulsifiers, demulsifiers, water treatment agents, etc., in the petroleum industry. However, due to the unidirectivity of conventional surfactants, the difficulty in demulsifying petroleum emulsions generated after emulsification with such surfactants increases sharply. Therefore, it is of great significance and application value to design and develop a novel switchable surfactant for oil exploitation. In this study, a CO2-switchable Gemini surfactant of N,N'-dimethyl-N,N'-didodecyl butylene diamine (DMDBA) was synthesized from 1, 4-dibromobutane, dodecylamine, formic acid, and formaldehyde. Then, the synthesized surfactant was structurally characterized by infrared (IR) spectroscopy, hydrogen nuclear magnetic resonance (1H NMR) spectroscopy, and electrospray ionization mass spectrometry (ESI-MS); the changes in conductivity and Zeta potential of DMDBA before and after CO2/N2 injection were also studied. The results show that DMDBA had a good CO2 response and cycle reversibility. The critical micelle concentration (CMC) of cationic surfactant obtained from DMDBA by injecting CO2 was 1.45 × 10-4 mol/L, the surface tension at CMC was 33.4 mN·m-1, and the contact angle with paraffin was less than 90°, indicating that it had a good surface activity and wettability. In addition, the kinetic law of the process of producing surfactant by injecting CO2 was studied, and it was found that the process was a second-order reaction. The influence of temperature and gas velocity on the reaction dynamics was explored. The calculated values from the equation were in good agreement with the measured values, with a correlation coefficient greater than 0.9950. The activation energy measured during the formation of surfactant was Ea = 91.16 kJ/mol.

8.

Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.

Strzalkowski, Piotr; Strzalkowska, Alicja; Chhablani, Jay; Pfau, Kristina; Errera, Marie-Hélène; Roth, Mathias; Schaub, Friederike; Bechrakis, Nikolaos E; Hoerauf, Hans; Reiter, Constantin; Schuster, Alexander K; Geerling, Gerd; Guthoff, Rainer.

Int J Retina Vitreous ; 10(1): 61, 2024 Sep 02.

Artigo em Inglês | MEDLINE | ID: mdl-39223678

RESUMO

BACKGROUND: Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment. METHODS: Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty levels (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality grading. Analysis included Flesch Readability Ease Score, word and sentence counts. RESULTS: Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased. CONCLUSIONS: ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.

9.

Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.

Gupta, Rishi; Hamid, Abdullgabbar M; Jhaveri, Miral; Patel, Niki; Suthar, Pokhraj P.

Cureus ; 16(8): e67766, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-39323714

RESUMO

AIMS AND OBJECTIVE: Advances in artificial intelligence (AI), particularly in large language models (LLMs) like ChatGPT (versions 3.5 and 4.0) and Google Gemini, are transforming healthcare. This study explores the performance of these AI models in solving diagnostic quizzes from "Neuroradiology: A Core Review" to evaluate their potential as diagnostic tools in radiology. MATERIALS AND METHODS: We assessed the accuracy of ChatGPT 3.5, ChatGPT 4.0, and Google Gemini using 262 multiple-choice questions covering brain, head and neck, spine, and non-interpretive skills. Each AI tool provided answers and explanations, which were compared to textbook answers. The analysis followed the STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines, and accuracy was calculated for each AI tool and subgroup. RESULTS: ChatGPT 4.0 achieved the highest overall accuracy at 64.89%, outperforming ChatGPT 3.5 (62.60%) and Google Gemini (55.73%). ChatGPT 4.0 excelled in brain, head, and neck diagnostics, while Google Gemini performed best in head and neck but lagged in other areas. ChatGPT 3.5 showed consistent performance across all subgroups. CONCLUSION: This study found that advanced AI models, including ChatGPT 4.0 and Google Gemini, vary in diagnostic accuracy, with ChatGPT 4.0 leading at 64.89% overall. While these tools are promising in improving diagnostics and medical education, their effectiveness varies by area, and Google Gemini performs unevenly across different categories. The study underscores the need for ongoing improvements and broader evaluation to address ethical concerns and optimize AI use in patient care.

10.

Unlocking the potential of advanced large language models in medication review and reconciliation: A proof-of-concept investigation.

Sridharan, Kannan; Sivaramakrishnan, Gowri.

Explor Res Clin Soc Pharm ; 15: 100492, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39257533

RESUMO

Background: Medication review and reconciliation is essential for optimizing drug therapy and minimizing medication errors. Large language models (LLMs) have been recently shown to possess a lot of potential applications in healthcare field due to their abilities of deductive, abductive, and logical reasoning. The present study assessed the abilities of LLMs in medication review and medication reconciliation processes. Methods: Four LLMs were prompted with appropriate queries related to dosing regimen errors, drug-drug interactions, therapeutic drug monitoring, and genomics-based decision-making process. The veracity of the LLM outputs were verified from validated sources using pre-validated criteria (accuracy, relevancy, risk management, hallucination mitigation, and citations and guidelines). The impacts of the erroneous responses on the patients' safety were categorized either as major or minor. Results: In the assessment of four LLMs regarding dosing regimen errors, drug-drug interactions, and suggestions for dosing regimen adjustments based on therapeutic drug monitoring and genomics-based individualization of drug therapy, responses were generally consistent across prompts with no clear pattern in response quality among the LLMs. For identification of dosage regimen errors, ChatGPT performed well overall, except for the query related to simvastatin. In terms of potential drug-drug interactions, all LLMs recognized interactions with warfarin but missed the interaction between metoprolol and verapamil. Regarding dosage modifications based on therapeutic drug monitoring, Claude-Instant provided appropriate suggestions for two scenarios and nearly appropriate suggestions for the other two. Similarly, for genomics-based decision-making, Claude-Instant offered satisfactory responses for four scenarios, followed by Gemini for three. Notably, Gemini stood out by providing references to guidelines or citations even without prompting, demonstrating a commitment to accuracy and reliability in its responses. Minor impacts were noted in identifying appropriate dosing regimens and therapeutic drug monitoring, while major impacts were found in identifying drug interactions and making pharmacogenomic-based therapeutic decisions. Conclusion: Advanced LLMs hold significant promise in revolutionizing the medication review and reconciliation process in healthcare. Diverse impacts on patient safety were observed. Integrating and validating LLMs within electronic health records and prescription systems is essential to harness their full potential and enhance patient safety and care quality.

11.

Gemini-Assisted Deep Learning Classification Model for Automated Diagnosis of High-Resolution Esophageal Manometry Images.

Popa, Stefan Lucian; Surdea-Blaga, Teodora; Dumitrascu, Dan Lucian; Pop, Andrei Vasile; Ismaiel, Abdulrahman; David, Liliana; Brata, Vlad Dumitru; Turtoi, Daria Claudia; Chiarioni, Giuseppe; Savarino, Edoardo Vincenzo; Zsigmond, Imre; Czako, Zoltan; Leucuta, Daniel Corneliu.

Medicina (Kaunas) ; 60(9)2024 Sep 13.

Artigo em Inglês | MEDLINE | ID: mdl-39336534

RESUMO

Background/Objectives: To develop a deep learning model for esophageal motility disorder diagnosis using high-resolution manometry images with the aid of Gemini. Methods: Gemini assisted in developing this model by aiding in code writing, preprocessing, model optimization, and troubleshooting. Results: The model demonstrated an overall precision of 0.89 on the testing set, with an accuracy of 0.88, a recall of 0.88, and an F1-score of 0.885. It presented better results for multiple categories, particularly in the panesophageal pressurization category, with precision = 0.99 and recall = 0.99, yielding a balanced F1-score of 0.99. Conclusions: This study demonstrates the potential of artificial intelligence, particularly Gemini, in aiding the creation of robust deep learning models for medical image analysis, solving not just simple binary classification problems but more complex, multi-class image classification tasks.

Assuntos

Aprendizado Profundo , Transtornos da Motilidade Esofágica , Manometria , Humanos , Manometria/métodos , Transtornos da Motilidade Esofágica/diagnóstico , Transtornos da Motilidade Esofágica/classificação , Transtornos da Motilidade Esofágica/fisiopatologia , Processamento de Imagem Assistida por Computador/métodos , Esôfago/diagnóstico por imagem , Esôfago/fisiopatologia , Esôfago/fisiologia

12.

Enhancing the Interpretability of Malaria and Typhoid Diagnosis with Explainable AI and Large Language Models.

Attai, Kingsley; Ekpenyong, Moses; Amannah, Constance; Asuquo, Daniel; Ajuga, Peterben; Obot, Okure; Johnson, Ekemini; John, Anietie; Maduka, Omosivie; Akwaowo, Christie; Uzoka, Faith-Michael.

Trop Med Infect Dis ; 9(9)2024 Sep 16.

Artigo em Inglês | MEDLINE | ID: mdl-39330905

RESUMO

Malaria and Typhoid fever are prevalent diseases in tropical regions, and both are exacerbated by unclear protocols, drug resistance, and environmental factors. Prompt and accurate diagnosis is crucial to improve accessibility and reduce mortality rates. Traditional diagnosis methods cannot effectively capture the complexities of these diseases due to the presence of similar symptoms. Although machine learning (ML) models offer accurate predictions, they operate as "black boxes" with non-interpretable decision-making processes, making it challenging for healthcare providers to comprehend how the conclusions are reached. This study employs explainable AI (XAI) models such as Local Interpretable Model-agnostic Explanations (LIME), and Large Language Models (LLMs) like GPT to clarify diagnostic results for healthcare workers, building trust and transparency in medical diagnostics by describing which symptoms had the greatest impact on the model's decisions and providing clear, understandable explanations. The models were implemented on Google Colab and Visual Studio Code because of their rich libraries and extensions. Results showed that the Random Forest model outperformed the other tested models; in addition, important features were identified with the LIME plots while ChatGPT 3.5 had a comparative advantage over other LLMs. The study integrates RF, LIME, and GPT in building a mobile app to enhance the interpretability and transparency in malaria and typhoid diagnosis system. Despite its promising results, the system's performance is constrained by the quality of the dataset. Additionally, while LIME and GPT improve transparency, they may introduce complexities in real-time deployment due to computational demands and the need for internet service to maintain relevance and accuracy. The findings suggest that AI-driven diagnostic systems can significantly enhance healthcare delivery in environments with limited resources, and future works can explore the applicability of this framework to other medical conditions and datasets.

13.

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.

Is, Enes Efe; Menekseoglu, Ahmet Kivanc.

Clin Rheumatol ; 2024 Sep 28.

Artigo em Inglês | MEDLINE | ID: mdl-39340572

RESUMO

OBJECTIVES: This study evaluates the performance of AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice. METHOD: A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 visual data questions. Both artificial intelligence models categorized the questions according to difficulty (easy, medium, hard) and answered them. In addition, the reliability of the answers was assessed by asking the questions a second time. The accuracy, reliability, and difficulty categorization of the AI models' response to the questions were analyzed. RESULTS: ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001). CONCLUSIONS: ChatGPT-4o significantly outperformed Google Gemini in rheumatology board-level questions. This demonstrates the success of ChatGPT-4o in situations requiring complex and specialized knowledge related to rheumatological diseases. The performance of both AI models decreased as the question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended. Key Points â¢ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%. â¢For both AI models, the correct answer rate decreased as the question difficulty increased. â¢The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.

14.

Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más communes.

Aydinbelge-Dizdar, N; Dizdar, K.

Rev Esp Med Nucl Imagen Mol (Engl Ed) ; : 500065, 2024 Sep 28.

Artigo em Inglês | MEDLINE | ID: mdl-39349172

RESUMO

PURPOSE: This study aimed to evaluate the reliability and readability of responses generated by two popular AI-chatbots, 'ChatGPT-4.0' and 'Google Gemini', to potential patient questions about PET/CT scans. MATERIALS AND METHODS: Thirty potential questions for each of [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT were asked separately to ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability and readability using the modified DISCERN (mDISCERN) scale, Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of mDISCERN scores provided by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) for the responses was assessed. RESULTS: The median [min-max] mDISCERN scores reviewed by the physician for responses about FDG, PSMA and DOTA PET/CT scans were 3.5 [2-4], 3 [3-4], 3 [3-4] for ChatPT-4 and 4 [2-5], 4 [2-5], 3.5 [3-5] for Gemini, respectively. The mDISCERN scores assessed using ChatGPT-4 for answers about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3-5], 3 [3-4], 3 [2-3] for ChatGPT-4, and 4 [3-5], 4 [3-5], 4 [3-5] for Gemini, respectively. The mDISCERN scores evaluated using Gemini for responses FDG, PSMA, and DOTA-SSTR PET/CTs were 3 [2-4], 2 [2-4], 3 [2-4] for ChatGPT-4, and 3 [2-5], 3 [1-5], 3 [2-5] for Gemini, respectively. The inter-rater reliability correlation coefficient of mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CIâ¯=â¯0,32-0,812), 0.707 (95% CIâ¯=â¯0.458-0.853) and 0.738 (95% CIâ¯=â¯0.519-0.866), respectively (pâ¯<â¯0.001). The correlation coefficient of mDISCERN scores for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CIâ¯=â¯0.677-0.910), 0.881 (95% CIâ¯=â¯0.78-0.94) and 0.847 (95% CIâ¯=â¯0.719-0.922), respectively (pâ¯<â¯0.001). The mDISCERN scores assessed by ChatGPT-4, Gemini, and the physician showed that the chatbots' responses about all PET/CT scans had moderate to good statistical agreement according to the inter-rater reliability correlation coefficient (pâ¯<â¯0,001). There was a statistically significant difference in all readability scores (FKRGL, GFI, and FRE) of ChatGPT-4 and Gemini responses about PET/CT scans (pâ¯<â¯0,001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses. CONCLUSION: There was an acceptable level of agreement between raters for the mDISCERN score, indicating agreement with the overall reliability of the responses. However, the information provided by AI-chatbots cannot be easily read by the public.

15.

LLM-based automatic short answer grading in undergraduate medical education.

Grévisse, Christian.

BMC Med Educ ; 24(1): 1060, 2024 Sep 27.

Artigo em Inglês | MEDLINE | ID: mdl-39334087

RESUMO

BACKGROUND: Multiple choice questions are heavily used in medical education assessments, but rely on recognition instead of knowledge recall. However, grading open questions is a time-intensive task for teachers. Automatic short answer grading (ASAG) has tried to fill this gap, and with the recent advent of Large Language Models (LLM), this branch has seen a new momentum. METHODS: We graded 2288 student answers from 12 undergraduate medical education courses in 3 languages using GPT-4 and Gemini 1.0 Pro. RESULTS: GPT-4 proposed significantly lower grades than the human evaluator, but reached low rates of false positives. The grades of Gemini 1.0 Pro were not significantly different from the teachers'. Both LLMs reached a moderate agreement with human grades, and a high precision for GPT-4 among answers considered fully correct. A consistent grading behavior could be determined for high-quality keys. A weak correlation was found wrt. the length or language of student answers. There is a risk of bias if the LLM knows the human grade a priori. CONCLUSIONS: LLM-based ASAG applied to medical education still requires human oversight, but time can be spared on the edge cases, allowing teachers to focus on the middle ones. For Bachelor-level medical education questions, the training knowledge of LLMs seems to be sufficient, fine-tuning is thus not necessary.

Assuntos

Educação de Graduação em Medicina , Avaliação Educacional , Educação de Graduação em Medicina/métodos , Humanos , Avaliação Educacional/métodos , Idioma , Estudantes de Medicina

16.

Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis.

Tong, Linjian; Zhang, Chaoyang; Liu, Rui; Yang, Jia; Sun, Zhiming.

J Orthop Surg Res ; 19(1): 574, 2024 Sep 18.

Artigo em Inglês | MEDLINE | ID: mdl-39289734

RESUMO

BACKGROUNDS: The use of large language models (LLMs) in medicine can help physicians improve the quality and effectiveness of health care by increasing the efficiency of medical information management, patient care, medical research, and clinical decision-making. METHODS: We collected 34 frequently asked questions about glucocorticoid-induced osteoporosis (GIOP), covering topics related to the disease's clinical manifestations, pathogenesis, diagnosis, treatment, prevention, and risk factors. We also generated 25 questions based on the 2022 American College of Rheumatology Guideline for the Prevention and Treatment of Glucocorticoid-Induced Osteoporosis (2022 ACR-GIOP Guideline). Each question was posed to the LLM (ChatGPT-3.5, ChatGPT-4, and Google Gemini), and three senior orthopedic surgeons independently rated the responses generated by the LLMs. Three senior orthopedic surgeons independently rated the answers based on responses ranging between 1 and 4 points. A total score (TS) > 9 indicated 'good' responses, 6 ≤ TS ≤ 9 indicated 'moderate' responses, and TS < 6 indicated 'poor' responses. RESULTS: In response to the general questions related to GIOP and the 2022 ACR-GIOP Guidelines, Google Gemini provided more concise answers than the other LLMs. In terms of pathogenesis, ChatGPT-4 had significantly higher total scores (TSs) than ChatGPT-3.5. The TSs for answering questions related to the 2022 ACR-GIOP Guideline by ChatGPT-4 were significantly higher than those for Google Gemini. ChatGPT-3.5 and ChatGPT-4 had significantly higher self-corrected TSs than pre-corrected TSs, while Google Gemini self-corrected for responses that were not significantly different than before. CONCLUSIONS: Our study showed that Google Gemini provides more concise and intuitive responses than ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed significantly better than ChatGPT3.5 and Google Gemini in terms of answering general questions about GIOP and the 2022 ACR-GIOP Guidelines. ChatGPT3.5 and ChatGPT-4 self-corrected better than Google Gemini.

Assuntos

Glucocorticoides , Osteoporose , Humanos , Osteoporose/induzido quimicamente , Glucocorticoides/efeitos adversos , Inquéritos e Questionários

17.

Synergistic effect of curcumin and tamoxifen loaded in pH-responsive gemini surfactant nanoparticles on breast cancer cells.

Ashin, Zeinab Fotouhi; Sadeghi-Mohammadi, Sanam; Vaezi, Zahra; Najafi, Farhood; AdibAmini, Shaghayegh; Sadeghizadeh, Majid; Naderi-Manesh, Hossein.

BMC Complement Med Ther ; 24(1): 337, 2024 Sep 20.

Artigo em Inglês | MEDLINE | ID: mdl-39304876

RESUMO

BACKGROUND: Drug combination therapy is preferred over monotherapy in clinical research to improve therapeutic effects. Developing a new nanodelivery system for cancer drugs can reduce side effects and provide several advantages, including matched pharmacokinetics and potential synergistic activity. This study aimed to examine and determine the efficiency of the gemini surfactants (GSs) as a pH-sensitive polymeric carrier and cell-penetrating agent in cancer cells to achieve dual drug delivery and synergistic effects of curcumin (Cur) combined with tamoxifen citrate (TMX) in the treatment of MCF-7 and MDA-MB-231 human BC cell lines. METHODS: The synthesized NPs were self-assembled using a modified nanoprecipitation method. The functional groups and crystalline form of the nanoformulation were examined by Fourier-transform infrared spectroscopy (FTIR), X-ray diffraction (XRD), differential scanning calorimetry (DSC), and dynamic light scattering (DLS) used to assess zeta potential and particle size, and the morphological analysis determined by transmission electron microscopy (TEM). The anticancer effect was evaluated through an in vitro cytotoxicity MTT assay, flow cytometry analysis, and apoptosis analysis performed for mechanism investigation. RESULTS: The tailored NPs were developed with a size of 252.3 ± 24.6 nm and zeta potential of 18.2 ± 4.4 mV capable of crossing the membrane of cancer cells. The drug loading and release efficacy assessment showed that the loading of TMX and Cur were 93.84% ± 1.95% and 90.18% ± 0.56%, respectively. In addition, the drug release was more controlled and slower than the free state. Polymeric nanocarriers improved controlled drug release 72.19 ± 2.72% of Tmx and 55.50 ± 2.86% of Cur were released from the Tmx-Cur-Gs NPs after 72 h at pH = 5.5. This confirms the positive effect of polymeric nanocarriers on the controlled drug release mechanism. moreover, the toxicity test showed that combination-drug delivery was much more greater than single-drug delivery in MCF-7 and MDA-MB-231 cell lines. Cellular imaging showed excellent internalization of TMX-Cur-GS NPs in both MCF-7 and MDA-MB-231 cells and synergistic anticancer effects, with combination indices of 0.561 and 0.353, respectively. CONCLUSION: The combined drug delivery system had a greater toxic effect on cell lines than single-drug delivery. The synergistic effect of TMX and Cur with decreasing inhibitory concentrations could be a more promising system for BC-targeted therapy using GS NPs.

Assuntos

Neoplasias da Mama , Curcumina , Nanopartículas , Tensoativos , Tamoxifeno , Humanos , Curcumina/farmacologia , Curcumina/química , Tamoxifeno/farmacologia , Tamoxifeno/química , Nanopartículas/química , Neoplasias da Mama/tratamento farmacológico , Tensoativos/química , Tensoativos/farmacologia , Concentração de Íons de Hidrogênio , Feminino , Sinergismo Farmacológico , Células MCF-7 , Linhagem Celular Tumoral , Antineoplásicos/farmacologia , Antineoplásicos/química , Portadores de Fármacos/química

18.

ChatGPT-3.5 Versus Google Bard: Which Large Language Model Responds Best to Commonly Asked Pregnancy Questions?

Khromchenko, Keren; Shaikh, Sameeha; Singh, Meghana; Vurture, Gregory; Rana, Rima A; Baum, Jonathan D.

Cureus ; 16(7): e65543, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-39188430

RESUMO

Large language models (LLM) have been widely used to provide information in many fields, including obstetrics and gynecology. Which model performs best in providing answers to commonly asked pregnancy questions is unknown. A qualitative analysis of Chat Generative Pre-Training Transformer Version 3.5 (ChatGPT-3.5) (OpenAI, Inc., San Francisco, California, United States) and Bard, recently renamed Google Gemini (Google LLC, Mountain View, California, United States), was performed in August of 2023. Each LLM was queried on 12 commonly asked pregnancy questions and asked for their references. Review and grading of the responses and references for both LLMs were performed by the co-authors individually and then as a group to formulate a consensus. Query responses were graded as "acceptable" or "not acceptable" based on correctness and completeness in comparison to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as "verified," "broken," "irrelevant," "non-existent," and "no references." Grades of "acceptable" were given to 58% of ChatGPT-3.5 responses (seven out of 12) and 83% of Bard responses (10 out of 12). In regard to references, ChatGPT-3.5 had reference issues in 100% of its references, and Bard had discrepancies in 8% of its references (one out of 12). When comparing ChatGPT-3.5 responses between May 2023 and August 2023, a change in "acceptable" responses was noted: 50% versus 58%, respectively. Bard answered more questions correctly than ChatGPT-3.5 when queried on a small sample of commonly asked pregnancy questions. ChatGPT-3.5 performed poorly in terms of reference verification. The overall performance of ChatGPT-3.5 remained stable over time, with approximately one-half of responses being "acceptable" in both May and August of 2023. Both LLMs need further evaluation and vetting before being accepted as accurate and reliable sources of information for pregnant women.

19.

The effectivity and applicability of a novel sugar-based anionic and nonionic Gemini surfactant synthetized for the perchloroethylene-contaminated groundwater remediation.

Yao, Yu; Fu, Yufeng; Zhang, Chengwu; Zhang, Hui; Qin, Chuanyu.

J Hazard Mater ; 478: 135458, 2024 Oct 05.

Artigo em Inglês | MEDLINE | ID: mdl-39173379

RESUMO

Surfactant-enhanced aquifer remediation (SEAR) has effectively removed dense nonaqueous phase liquids (DNAPLs) from the contaminated aquifers. However, restricted by structural defects, typical monomeric surfactants undergo precipitation, high adsorption loss, and poor solubilization in aquifers, resulting in low remediation efficiency. In this study, a novel sugar-based anionic and non-ionic Gemini surfactant (SANG) was designed and synthesized for SEAR. Glucose was introduced into SANG as a non-ionic group to overcome the interference of low temperature and ions in groundwater. Sodium sulfonate was introduced as an anionic group to overcome aquifer adsorption loss. Two long-straight carbon chains were introduced as hydrophobic groups to provide high surface activity and solubilizing capacity. Even with low temperature or high salt content, its solution did not precipitate in aquifer conditions. The adsorption loss was as low as 0.54 and 0.90 mg/g in medium and fine sand, respectively. Compared with typical surfactants used for SEAR, SANG had the highest solubilization and desorption abilities for perchloroethylene (PCE) without emulsification, a crucial negative that Tween80 and other non-ionic surfactants exhibit. After flushing the contaminated aquifer using SANG, > 99 % of PCE was removed. Thus, with low potential environmental risk, SANG is effectively applicable in subsurface remediation, making it a better surfactant choice for SEAR.

20.

Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam.

Chen, Chih-Hsiung; Hsieh, Kuang-Yu; Huang, Kuo-En; Lai, Hsien-Yun.

Cureus ; 16(8): e67641, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-39185287

RESUMO

Introduction The latest generation of large language models (LLMs) features multimodal capabilities, allowing them to interpret graphics, images, and videos, which are crucial in medical fields. This study investigates the vision capabilities of the next-generation Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini. Methods To establish a comparative baseline, we used GPT-3.5, a model limited to text processing, and evaluated the performance of both GPT-4 and Gemini on questions from the Taiwan Specialist Board Exams in Pulmonary and Critical Care Medicine. Our dataset included 1,100 questions from 2012 to 2023, with 100 questions per year. Of these, 1,059 were in pure text and 41 were text with images, with the majority in a non-English language and only six in pure English. Results For each annual exam consisting of 100 questions from 2013 to 2023, GPT-4 achieved scores of 66, 69, 51, 64, 72, 64, 66, 64, 63, 68, and 67, respectively. Gemini scored 45, 48, 45, 45, 46, 59, 54, 41, 53, 45, and 45, while GPT-3.5 scored 39, 33, 35, 36, 32, 33, 43, 28, 32, 33, and 36. Conclusions These results demonstrate that the newer LLMs with vision capabilities significantly outperform the text-only model. When a passing score of 60 was set, GPT-4 passed most exams and approached human performance.

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA