1.
Cureus ; 15(5): e39238, 2023 May.
Article in English | MEDLINE | ID: mdl-37337480

ABSTRACT

Background: The availability of large language models such as Chat Generative Pre-trained Transformer (ChatGPT, OpenAI) has enabled individuals from diverse backgrounds to access medical information. However, concerns exist about the accuracy of ChatGPT responses and the references used to generate medical content.

Methods: This observational study investigated the authenticity and accuracy of references in medical articles generated by ChatGPT. ChatGPT-3.5 generated 30 short medical papers, each with at least three references, based on standardized prompts encompassing various topics and therapeutic areas. Reference authenticity and accuracy were verified by searching Medline, Google Scholar, and the Directory of Open Access Journals. The authenticity and accuracy of individual ChatGPT-generated reference elements were also determined.

Results: Overall, 115 references were generated by ChatGPT, with a mean of 3.8±1.1 per paper. Among these references, 47% were fabricated, 46% were authentic but inaccurate, and only 7% were authentic and accurate. The likelihood of fabricated references differed significantly based on prompt variations, yet the frequency of authentic and accurate references remained low in all cases. Among the seven components evaluated for each reference, an incorrect PMID number was most common, listed in 93% of papers. Incorrect volume (64%), page numbers (64%), and year of publication (60%) were the next most frequent errors. The mean number of inaccurate components was 4.3±2.8 out of seven per reference.

Conclusions: The findings of this study emphasize the need for caution when seeking medical information from ChatGPT, since most of the references provided were found to be fabricated or inaccurate. Individuals are advised to verify medical information from reliable sources and avoid relying solely on artificial intelligence-generated content.
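The component-level verification described in the Methods can be sketched as a field-by-field comparison of a generated reference against the authoritative indexed record. The seven component names below are assumptions based on standard citation elements (the abstract confirms PMID, volume, page numbers, and publication year among them), and the example records are hypothetical; a real check would retrieve the ground-truth record from Medline.

```python
# Sketch: count inaccurate components in a ChatGPT-generated reference
# versus the indexed record. The seven component names are an assumption
# drawn from standard citation elements; the study names PMID, volume,
# pages, and year among the components it scored.
COMPONENTS = ["authors", "title", "journal", "year", "volume", "pages", "pmid"]

def count_inaccurate(generated: dict, indexed: dict) -> int:
    """Return how many of the seven components differ (0-7)."""
    return sum(
        1 for c in COMPONENTS
        if str(generated.get(c, "")).strip().lower()
        != str(indexed.get(c, "")).strip().lower()
    )

# Hypothetical example: a fabricated PMID and a wrong volume number.
gen = {"authors": "Smith J", "title": "A Trial", "journal": "Cureus",
       "year": "2022", "volume": "13", "pages": "e100", "pmid": "99999999"}
idx = {"authors": "Smith J", "title": "A Trial", "journal": "Cureus",
       "year": "2022", "volume": "14", "pages": "e100", "pmid": "35555555"}
print(count_inaccurate(gen, idx))  # → 2
```

Averaging this count over all verified references would reproduce the study's "mean inaccurate components per reference" metric.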

2.
Cureus ; 15(5): e39224, 2023 May.
Article in English | MEDLINE | ID: mdl-37337487

ABSTRACT

The rapid advancements in artificial intelligence (AI) technology in recent years have led to its integration into biomedical publishing. However, the extent to which AI has contributed to developing biomedical literature is unclear. This study aimed to identify trends in AI-generated content within peer-reviewed biomedical literature. We first tested the sensitivity and specificity of commercially available AI-detection software (Originality.AI, Collingwood, Ontario, Canada). Next, we conducted a MEDLINE (Medical Literature Analysis and Retrieval System Online) search to identify randomized controlled trials with available abstracts indexed between January 2020 and March 2023. We randomly selected 30 abstracts per quarter during this period and pasted the abstracts into the AI-detection software to determine the probability of AI-generated content.

The software yielded 100% sensitivity, 95% specificity, and excellent overall discriminatory ability, with an area under the receiver operating characteristic curve of 97.6%. Among the 390 MEDLINE-indexed abstracts included in the analysis, the prevalence of abstracts with a high probability (≥ 90%) of AI-generated text increased during the study period from 21.7% to 36.7% (p=0.01) based on a chi-square test for trend. The increasing prevalence of AI-generated text during the study period was also observed in various sensitivity analyses using AI probability thresholds ranging from 50% to 99% (all p≤0.01).

The results of this study suggest that the prevalence of AI-assisted publishing in peer-reviewed journals has been increasing in recent years, even before the widespread adoption of ChatGPT (OpenAI, San Francisco, California, United States) and similar tools. The extent to which the natural writing characteristics of the authors, utilization of common AI-powered applications, and introduction of AI elements during the post-acceptance publication phase influence AI detection scores warrants further study.
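The chi-square test for trend cited above (commonly the Cochran-Armitage test) can be sketched in plain Python. The quarterly counts below are illustrative assumptions, not the study's data: the statistic measures how strongly the proportion of "high AI-probability" abstracts rises across ordered time scores, relative to what a constant prevalence would predict.

```python
import math

def cochran_armitage(scores, n, r):
    """Cochran-Armitage chi-square test for trend (sketch).
    scores: ordinal group scores (e.g. quarter index)
    n: abstracts sampled per group; r: high AI-probability counts
    Returns (z, two_sided_p) via the normal approximation."""
    N = sum(n)
    pbar = sum(r) / N  # pooled prevalence under the no-trend null
    t_stat = sum(t * (ri - ni * pbar) for t, ni, ri in zip(scores, n, r))
    var = pbar * (1 - pbar) * (
        sum(t * t * ni for t, ni in zip(scores, n))
        - sum(t * ni for t, ni in zip(scores, n)) ** 2 / N
    )
    z = t_stat / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical data: 30 abstracts per quarter with rising counts.
z, p = cochran_armitage([0, 1, 2, 3], [30, 30, 30, 30], [6, 8, 10, 12])
print(f"z={z:.2f}, p={p:.3f}")
```

Squaring z gives the chi-square statistic on one degree of freedom; the sensitivity analyses in the abstract correspond to recomputing r at different probability thresholds.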
