ABSTRACT
The emergence of artificial intelligence (AI) has revolutionized many fields, including natural language processing, and marks a potential paradigm shift in the way we evaluate knowledge. One significant innovation in this area is ChatGPT, a large language model based on the GPT-3.5 architecture created by OpenAI, with one of its main aims being to aid in general text writing, including scientific texts. Here, we highlight the challenges and opportunities related to using generative AI and discuss both the benefits of its use, such as streamlining the writing process and reducing the time spent on mundane tasks, and the potential drawbacks, including concerns regarding the accuracy and reliability of the information generated and its ethical use. In both education and the writing of scientific texts, clear rules, objectives, and institutional principles must be established for the use of AI. We also consider the positive and negative effects of AI technologies on interpersonal interactions and behavior, and, as sleep scientists, their potential impacts on sleep. Striking a balance between the benefits and potential drawbacks of integrating AI into society demands ongoing research by experts, wide dissemination of scientific results, and continued public discourse on the subject.
ABSTRACT
OBJECTIVE: To investigate the performance of ChatGPT in the differential diagnosis of oral and maxillofacial diseases. METHODS: Thirty-seven oral and maxillofacial lesion findings were presented to ChatGPT-3.5 and -4, 18 dental surgeons trained in oral medicine/pathology (OMP), 23 general dental surgeons (DDS), and 16 dental students (DS) for differential diagnosis. Additionally, a group of 15 general dentists was asked to describe 11 cases to both ChatGPT versions. The primary and alternative diagnoses from ChatGPT-3.5, ChatGPT-4, and the human groups were rated by two independent investigators using a 4-point Likert scale. The consistency of ChatGPT-3.5 and -4 was evaluated with regenerated inputs. RESULTS: Moderate consistency of outputs was observed for ChatGPT-3.5 and -4 in providing primary (κ = 0.532 and κ = 0.533, respectively) and alternative (κ = 0.337 and κ = 0.367, respectively) hypotheses. The mean rate of correct diagnoses was 64.86% for ChatGPT-3.5, 80.18% for ChatGPT-4, 86.64% for OMP, 24.32% for DDS, and 16.67% for DS. The mean correct primary hypothesis rates were 45.95% for ChatGPT-3.5, 61.80% for ChatGPT-4, 82.28% for OMP, 22.72% for DDS, and 15.77% for DS. The mean correct diagnosis rate for ChatGPT-3.5 was 64.86% with standard descriptions, compared to 45.95% with participants' descriptions; for ChatGPT-4, it was 80.18% with standard descriptions and 61.80% with participants' descriptions. CONCLUSION: ChatGPT-4 demonstrates accuracy comparable to that of specialists in providing differential diagnoses for oral and maxillofacial diseases. The consistency of ChatGPT in providing diagnostic hypotheses for oral disease cases is moderate, which represents a weakness for clinical application. The quality of case documentation and descriptions significantly affects ChatGPT's performance. CLINICAL RELEVANCE: General dentists, dental students, and specialists in oral medicine and pathology may benefit from ChatGPT-4 as an auxiliary method for defining differential diagnoses of oral and maxillofacial lesions, but its accuracy depends on precise case descriptions.
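The abstract reports κ values for the consistency of regenerated ChatGPT outputs. Purely as an illustration (not the authors' code, and with hypothetical diagnosis labels), agreement between two regenerated runs could be quantified with Cohen's kappa along these lines:

```python
# Minimal sketch: Cohen's kappa between two regenerated ChatGPT runs on the
# same cases. The diagnosis labels below are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

run_1 = ["lichen planus", "leukoplakia", "mucocele", "fibroma", "mucocele"]
run_2 = ["lichen planus", "candidiasis", "mucocele", "fibroma", "mucocele"]

kappa = cohen_kappa_score(run_1, run_2)
print(f"kappa = {kappa:.3f}")
```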
Subject(s)
Mouth Diseases , Humans , Diagnosis, Differential , Mouth Diseases/diagnosis , Male , Female , Clinical Competence
ABSTRACT
OBJECTIVE: The rapid development of Artificial Intelligence (AI) has raised questions about its potential uses in different sectors of everyday life. Specifically in medicine, the question arose whether chatbots could be used as tools for clinical decision-making or patients' and physicians' education. To answer this question in the context of fertility, we conducted a test to determine whether current AI platforms can provide evidence-based responses regarding methods that can improve the outcomes of embryo transfers. METHODS: We asked nine popular chatbots to write a 300-word scientific essay outlining scientific methods that improve embryo transfer outcomes. We then gathered the responses and extracted the methods suggested by each chatbot. RESULTS: Out of a total of 43 recommendations, which could be grouped into 19 similar categories, only 3/19 (15.8%) were evidence-based practices, those being "ultrasound-guided embryo transfer" in 7/9 (77.8%) chatbots, "single embryo transfer" in 4/9 (44.4%), and "use of a soft catheter" in 2/9 (22.2%), whereas some controversial responses like "preimplantation genetic testing" appeared frequently (6/9 chatbots; 66.7%), along with other debatable recommendations like "endometrial receptivity assay", "assisted hatching", and "time-lapse incubator". CONCLUSIONS: Our results suggest that AI is not yet in a position to give evidence-based recommendations in the field of fertility, particularly concerning embryo transfer, since the vast majority of responses consisted of scientifically unsupported recommendations. As such, both patients and physicians should be wary of guiding care based on chatbot recommendations in infertility. Chatbot results might improve with time, especially if trained on validated medical databases; however, this will have to be scientifically checked.
ABSTRACT
BACKGROUND: ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still limited research in this area. OBJECTIVE: The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of perception of ethics as predictive factors for participants' attitudes toward the use of ChatGPT. METHODS: A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. Descriptive analysis, chi-square tests, and ANOVA were used to assess statistical significance across categories, and several ordinal logistic regression models were fit to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. RESULTS: Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median knowledge score was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical. Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for clinical and educational medical information access, and (4) makes the work easier. In total, 70% (7/10) of people used it for homework. As perceived knowledge of ChatGPT increased, participants tended to hold a more favorable attitude toward it. Higher ethical consideration perception ratings increased the likelihood of considering ChatGPT a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). CONCLUSIONS: Over 40% of health care students in the Americas (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and a desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs.
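The abstract states that the ordinal logistic regression models were fit in Stata. Purely as an illustration, an analogous model could be fit in Python with statsmodels; the file name and column names below are assumptions, not the authors' variables:

```python
# Illustrative sketch of an ordinal logistic regression for attitude scores;
# the study itself used Stata. File and column names are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("chatgpt_survey.csv")  # hypothetical survey export

predictors = ["use_frequency", "knowledge", "risk_perception", "ethics_score"]
model = OrderedModel(df["attitude"], df[predictors], distr="logit")
result = model.fit(method="bfgs", disp=False)

print(result.summary())
print(np.exp(result.params[: len(predictors)]))  # odds ratios for the predictors
```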
Subject(s)
Health Knowledge, Attitudes, Practice , Humans , Cross-Sectional Studies , Female , Male , Adult , Surveys and Questionnaires , Students, Health Occupations/psychology , Students, Health Occupations/statistics & numerical data , Attitude of Health Personnel , Young Adult , Students, Medical/psychology , Students, Medical/statistics & numerical data
ABSTRACT
Background: Among emerging AI technologies, the Chat Generative Pre-Trained Transformer (ChatGPT) stands out as a notable language model developed through artificial intelligence research. Its proven versatility across various domains, from language translation to healthcare data processing, underscores its promise for medical documentation, diagnostics, research, and education. This comprehensive review aimed to investigate the utility of ChatGPT in urology education and practice and to highlight its potential limitations. Methods: The authors conducted a comprehensive literature review of the use of ChatGPT and its applications in urology education, research, and practice. Through a systematic review of the literature, using a search strategy across databases such as PubMed and Embase, we analyzed the advantages and limitations of using ChatGPT in urology and evaluated its potential impact. Results: A total of 78 records were eligible for inclusion. The benefits of ChatGPT were frequently cited across various contexts. Educational and academic benefits were mentioned in 21 records (87.5%), in which ChatGPT showed the ability to assist urologists by offering precise information and responding to inquiries derived from patient data analysis, thereby supporting decision making; in 18 records (75%), advantages comprised personalized medicine, predictive capabilities for disease risks and outcomes, streamlined clinical workflows, and improved diagnostics. Nevertheless, apprehensions were expressed regarding potential misinformation, underscoring the necessity for human supervision to guarantee patient safety and address ethical concerns. Conclusion: The potential applications of ChatGPT hold the capacity to bring about transformative changes in urology education, research, and practice. AI technology can serve as a useful tool to augment human intelligence; however, it is essential to use it in a responsible and ethical manner.
Subject(s)
Artificial Intelligence , Urology , Humans , Urology/education , Delivery of Health Care
ABSTRACT
Large language models (LLMs), like ChatGPT, are transforming the landscape of medical education. They offer a vast range of applications, such as tutoring (personalized learning), patient simulation, generation of examination questions, and streamlined access to information. The rapid advancement of medical knowledge and the need for personalized learning underscore the relevance and timeliness of exploring innovative strategies for integrating artificial intelligence (AI) into medical education. In this paper, we propose coupling evidence-based learning strategies, such as active recall and memory cues, with AI to optimize learning. These strategies include the generation of tests, mnemonics, and visual cues.
Subject(s)
Artificial Intelligence , Education, Medical , Humans , Education, Medical/methods , Learning , Evidence-Based Medicine/education , Evidence-Based Medicine/methods
ABSTRACT
Introduction: This research investigated the capabilities of ChatGPT-4 compared with those of medical students in answering MCQs, using the revised Bloom's taxonomy as a benchmark. Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing. Results: The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4's performance, but revised Bloom's taxonomy levels did not. A detailed check of the association between program levels and Bloom's taxonomy levels for questions answered correctly by ChatGPT-4 showed a highly significant association (p<0.001), reflecting a concentration of "remember-level" questions in preclinical courses and "evaluate-level" questions in clinical courses. Discussion: The study highlights ChatGPT-4's proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies with course content. Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI's impact on medical education and student performance across educational levels and courses.
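The association check reported above is, in essence, a chi-square test on a contingency table of program level by Bloom's level for ChatGPT-4's correct answers. A minimal sketch with made-up counts (not the study's data) might look like this:

```python
# Minimal sketch: chi-square test of independence between program level and
# revised Bloom's taxonomy level among questions ChatGPT-4 answered correctly.
# The counts are invented for illustration.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: preclinical, clinical; columns: remember, understand/apply, evaluate.
correct_counts = np.array([
    [60, 35, 5],
    [15, 50, 40],
])

chi2, p_value, dof, _ = chi2_contingency(correct_counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4g}")
```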
ABSTRACT
Background: The deployment of OpenAI's ChatGPT-3.5 and its subsequent versions, ChatGPT-4 and ChatGPT-4 With Vision (4V; also known as "GPT-4 Turbo With Vision"), has notably influenced the medical field. Having demonstrated remarkable performance in medical examinations globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's medical licensing examinations (a critical step for medical practitioners in Chile), is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural contexts. Objective: This study aims to evaluate the performance of ChatGPT versions 3.5, 4, and 4V in the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile. Methods: Three official practice drills (540 questions) from the University of Chile, mirroring the EUNACOM's structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. Each version was given 3 attempts at each drill. Responses to questions during each attempt were systematically categorized and analyzed to assess accuracy rates. Results: All versions of ChatGPT passed the EUNACOM drills. Specifically, versions 4 and 4V outperformed version 3.5, achieving average accuracy rates of 79.32% and 78.83%, respectively, compared to 57.53% for version 3.5 (P<.001). Version 4V, however, did not outperform version 4 (P=.73), despite its additional visual capabilities. We also evaluated ChatGPT's performance in different medical areas of the EUNACOM and found that versions 4 and 4V consistently outperformed version 3.5. Across the different medical areas, version 3.5 displayed the highest accuracy in psychiatry (69.84%), while versions 4 and 4V achieved the highest accuracy in surgery (90.00% and 86.11%, respectively). Versions 3.5 and 4 had the lowest performance in internal medicine (52.74% and 75.62%, respectively), while version 4V had the lowest performance in public health (74.07%). Conclusions: This study reveals ChatGPT's ability to pass the EUNACOM, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advancements in artificial intelligence (AI) have not led to significant improvements in performance on image-based questions. The variations in proficiency across medical fields suggest the need for more nuanced AI training. Additionally, the study underscores the importance of exploring innovative approaches to using AI to augment human cognition and enhance the learning process. Such advancements have the potential to significantly influence medical education, fostering not only knowledge acquisition but also the development of critical thinking and problem-solving skills among health care professionals.
Subject(s)
Educational Measurement , Licensure, Medical , Female , Humans , Male , Chile , Clinical Competence/standards , Educational Measurement/methods , Educational Measurement/standards
ABSTRACT
INTRODUCTION: Artificial intelligence (AI) shows immense potential in medicine, and the Chat Generative Pretrained Transformer (ChatGPT) has been used for different purposes in the field. However, it may not match the complexity and nuance of certain medical scenarios. This study evaluates the accuracy of ChatGPT 3.5 and 4 in providing recommendations regarding the management of postprostatectomy urinary incontinence (PPUI), considering the Incontinence After Prostate Treatment: AUA/SUFU Guideline as the best-practice benchmark. MATERIALS AND METHODS: A set of questions based on the AUA/SUFU Guideline was prepared. Queries included 10 conceptual questions and 10 case-based questions. All questions were open-ended and entered into ChatGPT with a recommendation to limit the answer to 200 words, for greater objectivity. Responses were graded as correct (1 point), partially correct (0.5 points), or incorrect (0 points). The performance of ChatGPT versions 3.5 and 4 was analyzed overall and separately for the conceptual and the case-based questions. RESULTS: ChatGPT 3.5 scored 11.5 out of 20 points (57.5% accuracy), while ChatGPT 4 scored 18 (90.0%; p = 0.031). In the conceptual questions, ChatGPT 3.5 provided accurate answers to six questions, along with one partially correct response and three incorrect answers, for a final score of 6.5. In contrast, ChatGPT 4 provided correct answers to eight questions and partially correct answers to two questions, scoring 9.0. In the case-based questions, ChatGPT 3.5 scored 5.0, while ChatGPT 4 scored 9.0. The domains where ChatGPT performed worst were evaluation, treatment options, surgical complications, and special situations. CONCLUSION: ChatGPT 4 demonstrated superior performance compared with ChatGPT 3.5 in providing recommendations for the management of PPUI, using the AUA/SUFU Guideline as a benchmark. Continuous monitoring is essential for evaluating the development and precision of AI-generated medical information.
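As a simple illustration of the scoring scheme described above (1, 0.5, or 0 points per question, summed over 20 questions), the per-question grades below are invented; only the total mirrors the reported ChatGPT 3.5 result:

```python
# Minimal sketch of the grading arithmetic: 20 open questions, each scored
# 1 (correct), 0.5 (partially correct), or 0 (incorrect). Individual grades
# here are made up; only the total echoes the reported 11.5/20 (57.5%).
grades = [1, 1, 0.5, 0, 1, 0, 1, 1, 0, 1,      # 10 conceptual questions
          0.5, 1, 0, 1, 0, 1, 0.5, 0, 1, 0]    # 10 case-based questions

total = sum(grades)
print(f"score = {total}/20 ({100 * total / len(grades):.1f}% accuracy)")
```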
Subject(s)
Artificial Intelligence , Urinary Incontinence , Male , Humans , Social Behavior , Pelvis , Prostatectomy , Repressor Proteins
ABSTRACT
Introduction: Over the past few months, ChatGPT has raised a lot of interest given its ability to perform complex tasks through natural language and conversation. However, its use in clinical decision-making is limited, and its application in the field of anesthesiology is unknown. Objective: To assess ChatGPT's basic and clinical reasoning and its learning ability in a performance test on general and specific anesthesia topics. Methods: A three-phase assessment was conducted. Basic knowledge of anesthesia was assessed in the first phase, followed by a review of difficult airway management and, finally, measurement of decision-making ability in ten clinical cases. The second and third phases were conducted before and after feeding ChatGPT the 2022 guidelines of the American Society of Anesthesiologists on difficult airway management. Results: On average, ChatGPT succeeded 65% of the time in the first phase and 48% of the time in the second phase. Agreement in the clinical cases was 20%, with 90% relevance and a 10% error rate. After learning, ChatGPT improved in the second phase and was correct 59% of the time, with agreement in the clinical cases also increasing to 40%. Conclusions: ChatGPT showed acceptable accuracy in the basic knowledge test, high relevance in the management of specific difficult airway clinical cases, and the ability to improve after learning.
ABSTRACT
The rapid advancement of Artificial Intelligence (AI) has taken the world by "surprise" due to the lack of regulation over this technological innovation, which, while promising application opportunities in different fields of knowledge, including education, simultaneously generates concern, rejection, and even fear. In the field of Health Sciences Education, clinical simulation has transformed educational practice; however, its formal integration is still heterogeneous, and we now face a new technological revolution in which AI has the potential to transform the way we conceive of its application.
ABSTRACT
PURPOSE: To evaluate the influence of ChatGPT on academic tasks performed by undergraduate dental students. METHOD: Fifty-five participants completed scientific writing assignments. First, ChatGPT was utilized; subsequently, a conventional method involving the search of scientific articles was employed. Each task was preceded by a 30-min training session. The assignments were reviewed by professors, and an anonymous questionnaire on the usefulness of ChatGPT was administered to the students. Data were analyzed with the Mann-Whitney U test. RESULTS: Final scores and scores for the criteria of utilization of evidence, evaluation of arguments, and generation of alternatives were higher with the traditional method than with ChatGPT (p = 0.019, 0.042, 0.017, and <0.001, respectively). No differences were found between the two methods for the remaining criteria (p > 0.05). A total of 64.29% of the students found ChatGPT useful, 33.33% found it very useful, and 3.38% not very useful. Regarding its application in further academic activities, 54.76% considered it useful, 40.48% found it very useful, and 4.76% not very useful. A total of 61.90% of the participants indicated that ChatGPT contributed to over 25% of their productivity, while 11.9% perceived that it contributed less than 15%. Concerning the relevance of having known about ChatGPT for academic tasks, 50% found it opportune, 45.24% found it very opportune, 2.38% were unsure, and the same percentage thought it inopportune. All students provided positive feedback. CONCLUSION: Dental students highly valued the experience of using ChatGPT for academic tasks. Nonetheless, the traditional method of searching for scientific articles yielded higher scores.
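The between-method comparison relies on the Mann-Whitney U test named in the abstract. A minimal, illustrative sketch (with invented scores, not the study's data) is:

```python
# Minimal sketch: Mann-Whitney U test comparing assignment scores obtained
# with ChatGPT versus the conventional literature-search method.
# The score lists are hypothetical.
from scipy.stats import mannwhitneyu

chatgpt_scores = [14, 15, 13, 16, 12, 15, 14, 13]
traditional_scores = [16, 17, 15, 18, 16, 17, 15, 17]

u_stat, p_value = mannwhitneyu(chatgpt_scores, traditional_scores,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```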
Subject(s)
Artificial Intelligence , Education, Dental , Students, Dental , Education, Dental/methods , Humans , Students, Dental/psychology , Writing , Surveys and Questionnaires , Male , Female
ABSTRACT
Introduction: This article focuses on the experience of a privately managed university institute in the city of Buenos Aires in addressing artificial intelligence (AI) in education. The aim is to share lines of action and outcomes to encourage reflection and critical engagement with this technology within the educational community. Development: We present an experience report on the design of four lines of action to address the use of generative AI applications (GenAI) in higher education in the health sciences: drafting a state-of-the-art report; probing knowledge within the educational community; training sessions for key institutional actors; and production of guide materials. Results: There is growing interest in GenAI within the educational community. We recorded positive experiences with GenAI applications, which were found intuitive and useful for research and teaching. However, challenges remain, such as gaps in knowledge about how to use these tools effectively. Training has been crucial in addressing these challenges and has been conducted for members of the Education Department team, authorities, and teachers. Conclusion: GenAI is fundamentally permeating higher education in the field of health sciences. University institutions are responsible for promoting the development of digital competencies and standards of responsible use. As GenAI continues to evolve, it is essential to address new challenges and regulations while encouraging reflection and ongoing training within the educational community. Interdisciplinary work and collaboration among the various areas of institutional management are critical for addressing these technological changes in education.
Subject(s)
Humans , Universities/ethics , Computer Literacy , Artificial Intelligence/ethics , Health Sciences/education , Artificial Intelligence/trends , Education/methods , Faculty/education
ABSTRACT
This prospective exploratory study, conducted from January 2023 through May 2023, evaluated the ability of ChatGPT to answer questions from Brazilian radiology board examinations, exploring how different prompt strategies can influence performance using GPT-3.5 and GPT-4. Three multiple-choice board examinations that did not include image-based questions were evaluated: (a) radiology and diagnostic imaging, (b) mammography, and (c) neuroradiology. Five different styles of zero-shot prompting were tested: (a) raw question, (b) brief instruction, (c) long instruction, (d) chain-of-thought, and (e) question-specific automatic prompt generation (QAPG). The QAPG and brief instruction prompt strategies performed best for all examinations (P < .05), obtaining passing scores (≥60%) on the radiology and diagnostic imaging examination with both versions of ChatGPT. The QAPG style achieved a score of 60% on the mammography examination using GPT-3.5 and 76% using GPT-4. GPT-4 achieved a score of up to 65% on the neuroradiology examination. The long instruction style consistently underperformed, implying that excessive detail might harm performance. GPT-4's scores were less sensitive to changes in prompt style. The QAPG prompt style produced a high proportion of "A" answers, although the difference was not statistically significant, suggesting possible bias. GPT-4 passed all three radiology board examinations, and GPT-3.5 passed two of the three examinations when using an optimal prompt style. Keywords: ChatGPT, Artificial Intelligence, Board Examinations, Radiology and Diagnostic Imaging, Mammography, Neuroradiology © RSNA, 2023 See also the commentary by Trivedi and Gichoya in this issue.
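To make the five prompt styles concrete, the sketch below assembles zero-shot prompts of the kinds named in the study; the exact templates and the `ask_model` helper are assumptions for illustration, not the authors' prompts or code:

```python
# Illustrative sketch of zero-shot prompt styles similar to those tested.
# Templates are assumptions; `ask_model` stands in for any chat-completion call.

def build_prompt(question: str, style: str) -> str:
    templates = {
        "raw": "{q}",
        "brief": "Answer this multiple-choice question with a single letter.\n{q}",
        "long": ("You are a board-certified radiologist taking a multiple-choice "
                 "examination. Read the question carefully, weigh each option, "
                 "and reply with the single best answer letter only.\n{q}"),
        "chain_of_thought": "{q}\nLet's think step by step before giving the final letter.",
    }
    return templates[style].format(q=question)


def qapg_answer(question: str, ask_model) -> str:
    # QAPG: first ask the model to write a prompt tailored to this question,
    # then use that generated prompt to obtain the answer.
    generated_prompt = ask_model(
        f"Write an effective prompt for answering this exam question:\n{question}"
    )
    return ask_model(generated_prompt)
```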
Subject(s)
Artificial Intelligence , Radiology , Brazil , Prospective Studies , Radiography , Mammography
ABSTRACT
Introduction: Artificial intelligence has shown exponential growth in medicine, and the ChatGPT language model has been highlighted as a possible source of patient information. This study evaluates the reliability and readability of ChatGPT-generated patient information on chronic diseases in Spanish. Methods: Questions frequently asked by patients on the internet about diabetes mellitus, heart failure, rheumatoid arthritis (RA), chronic kidney disease (CKD), and systemic lupus erythematosus (SLE) were submitted to ChatGPT. Reliability was assessed by rating responses as (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, or (4) completely incorrect, and categorizing them as "good" (1 and 2) or "bad" (3 and 4). Readability was evaluated with the adapted Flesch and Szigriszt formulas. Results: Overall, 71.67% of the answers were rated "good," with none qualified as "completely incorrect." Better reliability was observed in questions on diabetes and RA versus heart failure (p = 0.02). In readability, responses were "moderately difficult" (54.73, interquartile range (IQR) 51.59-58.58), with better results for CKD (median 56.1, IQR 53.5-59.1) and RA (56.4, IQR 53.7-60.7) than for heart failure (median 50.6, IQR 46.3-53.8). Conclusion: Our study suggests that ChatGPT can be a reliable source of information in Spanish for patients with chronic diseases, although reliability varies across conditions; however, the readability of its answers needs to improve before it can be recommended as a useful tool for patients.
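For context on the readability metrics mentioned above, the sketch below implements the Spanish adaptations commonly used for this purpose, the Fernández Huerta (adapted Flesch) and Szigriszt-Pazos formulas; the syllable, word, and sentence counts must be supplied by the caller, and the example numbers are invented:

```python
# Illustrative sketch of the Spanish readability formulas referred to in the
# abstract (Fernandez Huerta as the adapted Flesch, and Szigriszt-Pazos).
# Counting syllables/words/sentences is left to the caller.

def readability(n_syllables: int, n_words: int, n_sentences: int):
    syll_per_100w = 100.0 * n_syllables / n_words
    sent_per_100w = 100.0 * n_sentences / n_words
    fernandez_huerta = 206.84 - 0.60 * syll_per_100w - 1.02 * sent_per_100w
    szigriszt = 206.835 - 62.3 * (n_syllables / n_words) - (n_words / n_sentences)
    return fernandez_huerta, szigriszt

# Example: a hypothetical 120-word answer with 250 syllables and 6 sentences.
fh, sz = readability(250, 120, 6)
print(f"Fernandez Huerta = {fh:.1f}, Szigriszt = {sz:.1f}")
```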
ABSTRACT
BACKGROUND: Review articles play a critical role in informing medical decisions and identifying avenues for future research. With the introduction of artificial intelligence (AI), there has been growing interest in the potential of this technology to transform the synthesis of medical literature. OpenAI's Generative Pre-trained Transformer (GPT-4) (OpenAI Inc, San Francisco, CA) tool provides access to advanced AI that can quickly produce medical literature following only simple prompts. The accuracy of the generated articles requires review, especially in subspecialty fields like Allergy/Immunology. OBJECTIVE: To critically appraise AI-synthesized allergy-focused minireviews. METHODS: We tasked the GPT-4 chatbot with generating two 1,000-word reviews on the topics of hereditary angioedema and eosinophilic esophagitis. Authors critically appraised these articles using the Joanna Briggs Institute (JBI) tool for text and opinion and additionally evaluated domains of interest such as language, reference quality, and accuracy of the content. RESULTS: The language of the AI-generated minireviews was carefully articulated and logically focused on the topic of interest; however, reviewers indicated that the content lacked depth, did not appear to be the result of an analytical process, missed critical information, and contained inaccurate information. Despite being instructed to use scientific references, the AI chatbot relied mainly on freely available resources and fabricated references. CONCLUSIONS: AI holds the potential to change the landscape of synthesizing medical literature; however, the inaccurate and fabricated information observed calls for rigorous evaluation and validation of AI tools in generating medical literature, especially on subjects associated with limited resources.
Subject(s)
Angioedemas, Hereditary , Eosinophilic Esophagitis , Humans , Artificial Intelligence , Software , Language
ABSTRACT
PURPOSE: This study explores the potential of the Chat-Generative Pre-Trained Transformer (Chat-GPT), a Large Language Model (LLM), in assisting healthcare professionals in the diagnosis of obstructive sleep apnea (OSA). It aims to assess the agreement between Chat-GPT's responses and those of expert otolaryngologists, shedding light on the role of AI-generated content in medical decision-making. METHODS: A prospective, cross-sectional study was conducted, involving 350 otolaryngologists from 25 countries who responded to a specialized OSA survey. Chat-GPT was tasked with providing answers to the same survey questions. Responses were assessed by both super-experts and statistically analyzed for agreement. RESULTS: The study revealed that Chat-GPT and expert responses shared a common answer in over 75% of cases for individual questions. However, the overall consensus was achieved in only four questions. Super-expert assessments showed a moderate agreement level, with Chat-GPT scoring slightly lower than experts. Statistically, Chat-GPT's responses differed significantly from experts' opinions (p = 0.0009). Sub-analysis revealed areas of improvement for Chat-GPT, particularly in questions where super-experts rated its responses lower than expert consensus. CONCLUSIONS: Chat-GPT demonstrates potential as a valuable resource for OSA diagnosis, especially where access to specialists is limited. The study emphasizes the importance of AI-human collaboration, with Chat-GPT serving as a complementary tool rather than a replacement for medical professionals. This research contributes to the discourse in otolaryngology and encourages further exploration of AI-driven healthcare applications. While Chat-GPT exhibits a commendable level of consensus with expert responses, ongoing refinements in AI-based healthcare tools hold significant promise for the future of medicine, addressing the underdiagnosis and undertreatment of OSA and improving patient outcomes.
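One simple way to operationalize the "shared a common answer" comparison described above is to check, per question, whether Chat-GPT's response matches the experts' most frequent (modal) response; the sketch below uses invented data and is not the study's analysis:

```python
# Illustrative sketch: per-question agreement between Chat-GPT and the modal
# expert response. Data structures and values are hypothetical.
from collections import Counter

expert_answers = {            # question id -> responses from surveyed experts
    "q1": ["A", "A", "B", "A", "C"],
    "q2": ["B", "B", "C", "B", "B"],
    "q3": ["D", "C", "C", "D", "C"],
}
chatgpt_answers = {"q1": "A", "q2": "C", "q3": "C"}

matches = 0
for qid, answers in expert_answers.items():
    modal_answer, _ = Counter(answers).most_common(1)[0]
    matches += chatgpt_answers[qid] == modal_answer

print(f"Chat-GPT matched the expert mode on {matches}/{len(expert_answers)} questions")
```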
Subject(s)
Clinical Decision-Making , Sleep Apnea, Obstructive , Humans , Cross-Sectional Studies , Prospective Studies , Alanine Transaminase , Sleep Apnea, Obstructive/diagnosis , Sleep Apnea, Obstructive/therapy
ABSTRACT
This statement revises our earlier "WAME Recommendations on ChatGPT and Chatbots in Relation to Scholarly Publications" (January 20, 2023). The revision reflects the proliferation of chatbots and their expanding use in scholarly publishing over the last few months, as well as emerging concerns regarding lack of authenticity of content when using chatbots. These recommendations are intended to inform editors and help them develop policies for the use of chatbots in papers published in their journals. They aim to help authors and reviewers understand how best to attribute the use of chatbots in their work and to address the need for all journal editors to have access to manuscript screening tools. In this rapidly evolving field, we will continue to modify these recommendations as the software and its applications develop.
Subject(s)
Artificial Intelligence , Publishing , Humans
ABSTRACT
ChatGPT is a virtual assistant with artificial intelligence (AI) that uses natural language to communicate, i.e., it holds conversations like those one would have with another human being. It can be applied at all educational levels, including medical education, where it can impact medical training, research, the writing of scientific articles, clinical care, and personalized medicine. It can modify interactions between physicians and patients and thus improve standards of healthcare quality and safety, for example, by suggesting preventive measures for a patient that, for multiple reasons, are sometimes not considered by the physician. ChatGPT's potential uses in medical education, as a tool to support the writing of scientific articles, and as a care assistant for patients and physicians enabling a more personalized medical approach are some of the applications discussed in this article. Ethical aspects, originality, inappropriate or incorrect content, incorrect citations, cybersecurity, hallucinations, and plagiarism are some examples of situations to be considered when using AI-based tools in medicine.