Results 1 - 9 of 9
1.
J Clin Med ; 13(11)2024 May 22.
Article in English | MEDLINE | ID: mdl-38892752

ABSTRACT

Background: Large language models (LLMs) represent a recent advancement in artificial intelligence with medical applications across various healthcare domains. The objective of this review is to highlight how LLMs can be utilized by clinicians and surgeons in their everyday practice. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Six databases were searched to identify relevant articles. Eligibility criteria emphasized articles focused primarily on clinical and surgical applications of LLMs. Results: The literature search yielded 333 results, with 34 meeting eligibility criteria. All articles were from 2023. There were 14 original research articles, 4 letters, 1 interview, and 15 review articles. These articles covered a wide variety of medical specialties, including various surgical subspecialties. Conclusions: LLMs have the potential to enhance healthcare delivery. In clinical settings, LLMs can assist in diagnosis, treatment guidance, patient triage, physician knowledge augmentation, and administrative tasks. In surgical settings, LLMs can assist surgeons with documentation, surgical planning, and intraoperative guidance. However, addressing their limitations and concerns, particularly those related to accuracy and biases, is crucial. LLMs should be viewed as tools to complement, not replace, the expertise of healthcare professionals.

2.
Medicina (Kaunas) ; 60(6)2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38929573

ABSTRACT

Background and Objectives: Large language models (LLMs) are emerging as valuable tools in plastic surgery, potentially reducing surgeons' cognitive loads and improving patients' outcomes. This study aimed to assess and compare the current state of the two most common and readily available LLMs, OpenAI's ChatGPT-4 and Google's Gemini Pro (1.0 Pro), in providing intraoperative decision support in plastic and reconstructive surgery procedures. Materials and Methods: We presented each LLM with 32 independent intraoperative scenarios spanning 5 procedures. We utilized a 5-point and a 3-point Likert scale for medical accuracy and relevance, respectively. We determined the readability of the responses using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) score. Additionally, we measured the models' response time. We compared the performance using the Mann-Whitney U test and Student's t-test. Results: ChatGPT-4 significantly outperformed Gemini in providing accurate (3.59 ± 0.84 vs. 3.13 ± 0.83, p-value = 0.022) and relevant (2.28 ± 0.77 vs. 1.88 ± 0.83, p-value = 0.032) responses. Alternatively, Gemini provided more concise and readable responses, with an average FKGL (12.80 ± 1.56) significantly lower than ChatGPT-4's (15.00 ± 1.89) (p < 0.0001). However, there was no difference in the FRE scores (p = 0.174). Moreover, Gemini's average response time was significantly faster (8.15 ± 1.42 s) than ChatGPT-4's (13.70 ± 2.87 s) (p < 0.0001). Conclusions: Although ChatGPT-4 provided more accurate and relevant responses, both models demonstrated potential as intraoperative tools. Nevertheless, their performance inconsistency across the different procedures underscores the need for further training and optimization to ensure their reliability as intraoperative decision-support tools.
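For readers who want to reproduce this style of evaluation, a minimal Python sketch follows, using the textstat package for FKGL/FRE scoring and SciPy's Mann-Whitney U test. The response strings are invented placeholders, not the study's intraoperative scenarios.

```python
# Minimal sketch of the readability scoring and significance testing
# described above, using textstat and SciPy. Responses are invented
# placeholders, not study data.
import textstat
from scipy.stats import mannwhitneyu

chatgpt_responses = [
    "Achieve hemostasis with bipolar cautery before proceeding.",
    "Elevate the flap in the subfascial plane to protect the perforators.",
    "Confirm perfusion with an intraoperative Doppler before closure.",
]
gemini_responses = [
    "Stop the bleeding, then keep dissecting.",
    "Lift the flap carefully and check blood flow.",
    "Use a Doppler to verify perfusion, then close.",
]

# Flesch-Kincaid Grade Level (higher = harder to read)
chatgpt_fkgl = [textstat.flesch_kincaid_grade(r) for r in chatgpt_responses]
gemini_fkgl = [textstat.flesch_kincaid_grade(r) for r in gemini_responses]

# Flesch Reading Ease (higher = easier to read)
chatgpt_fre = [textstat.flesch_reading_ease(r) for r in chatgpt_responses]
gemini_fre = [textstat.flesch_reading_ease(r) for r in gemini_responses]

# Two-sided Mann-Whitney U test on the per-response grade levels
stat, p = mannwhitneyu(chatgpt_fkgl, gemini_fkgl, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
```

In real use, the lists would hold all 32 responses per model, and the same test would be repeated for the FRE scores and response times.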


Subjects
Plastic Surgery , Humans , Plastic Surgery/methods , Language , Plastic Surgery Procedures/methods , Clinical Decision Support Systems
3.
J Pers Med ; 14(6)2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38929832

ABSTRACT

In the U.S., diagnostic errors are common across various healthcare settings due to factors like complex procedures and multiple healthcare providers, often exacerbated by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery by evaluating their effectiveness both with and without physical examination data. Thirty medical vignettes covering emergency conditions such as fractures and nerve injuries were used to assess the diagnostic and management responses of the models. These responses were evaluated by medical professionals against established clinical guidelines, using statistical analyses including the Wilcoxon rank-sum test. Results showed that ChatGPT-4 consistently outperformed Gemini in both diagnosis and management, irrespective of the presence of physical examination data, though no significant differences were noted within each model's performance across different data scenarios. In conclusion, while ChatGPT-4 demonstrates superior accuracy and management capabilities, adding physical examination data enriched response detail without yielding a significant advantage over traditional medical resources. This underscores the utility of AI in supporting clinical decision-making, particularly in scenarios with limited data, suggesting its role as a complement to, rather than a replacement for, comprehensive clinical evaluation and expertise.
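The model-versus-model comparison here rests on the Wilcoxon rank-sum test, which SciPy provides directly. A minimal sketch follows; the evaluator scores are invented for illustration (the study used 30 vignettes).

```python
# Sketch of a Wilcoxon rank-sum comparison of per-vignette evaluator
# scores, as named above. Scores are invented for illustration only.
from scipy.stats import ranksums

chatgpt_diagnosis_scores = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
gemini_diagnosis_scores = [3, 4, 3, 4, 2, 3, 4, 3, 3, 4]

stat, p = ranksums(chatgpt_diagnosis_scores, gemini_diagnosis_scores)
print(f"rank-sum statistic = {stat:.2f}, p = {p:.4f}")
```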

4.
Healthcare (Basel) ; 12(11)2024 May 24.
Article in English | MEDLINE | ID: mdl-38891158

ABSTRACT

Since their release, the medical community has been actively exploring the capabilities of large language models (LLMs), which show promise in providing accurate medical knowledge. One potential application is as a patient resource. This study analyzes and compares the ability of the currently available LLMs, ChatGPT-3.5, GPT-4, and Gemini, to provide postoperative care recommendations to plastic surgery patients. We presented each model with 32 questions addressing common patient concerns after surgical cosmetic procedures and evaluated the medical accuracy, readability, understandability, and actionability of the models' responses. The three LLMs provided equally accurate information, with GPT-3.5 averaging the highest Likert scale (LS) score (4.18 ± 0.93; p = 0.849), while Gemini provided significantly more readable (p = 0.001) and understandable responses (p = 0.014; p = 0.001). There was no difference in the actionability of the models' responses (p = 0.830). Although LLMs have shown their potential as adjunctive tools in postoperative patient care, further refinement and research are imperative to enable their evolution into comprehensive standalone resources.
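The abstract does not name the statistic behind its three-way comparison; one common choice for comparing Likert ratings across three independent groups is the Kruskal-Wallis test, sketched below with invented scores purely as an assumption about how such a p-value could be obtained.

```python
# Hypothetical three-way comparison of Likert accuracy ratings using a
# Kruskal-Wallis test (the abstract does not state which test was used).
from scipy.stats import kruskal

gpt35_scores = [5, 4, 4, 5, 4, 3, 5, 4]   # invented, not study data
gpt4_scores = [4, 4, 5, 4, 3, 4, 4, 5]
gemini_scores = [4, 3, 5, 4, 4, 4, 3, 4]

stat, p = kruskal(gpt35_scores, gpt4_scores, gemini_scores)
print(f"H = {stat:.2f}, p = {p:.3f}")
```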

5.
Eur J Investig Health Psychol Educ ; 14(5): 1413-1424, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38785591

ABSTRACT

In postoperative care, patient education and follow-up are pivotal for enhancing the quality of care and satisfaction. Artificial intelligence virtual assistants (AIVA) and large language models (LLMs) like Google BARD and ChatGPT-4 offer avenues for addressing patient queries using natural language processing (NLP) techniques. However, the accuracy and appropriateness of the information vary across these platforms, necessitating a comparative study to evaluate their efficacy in this domain. We conducted a study comparing AIVA (using Google Dialogflow) with ChatGPT-4 and Google BARD, assessing the accuracy, knowledge gap, and response appropriateness. AIVA demonstrated superior performance, with significantly higher accuracy (mean: 0.9) and lower knowledge gap (mean: 0.1) compared to BARD and ChatGPT-4. Additionally, AIVA's responses received higher Likert scores for appropriateness. Our findings suggest that specialized AI tools like AIVA are more effective in delivering precise and contextually relevant information for postoperative care compared to general-purpose LLMs. While ChatGPT-4 shows promise, its performance varies, particularly in verbal interactions. This underscores the importance of tailored AI solutions in healthcare, where accuracy and clarity are paramount. Our study highlights the necessity for further research and the development of customized AI solutions to address specific medical contexts and improve patient outcomes.
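The abstract reports accuracy and knowledge gap as proportions without defining them precisely; one plausible operationalization, with invented labels, is sketched below.

```python
# One plausible reading of the reported metrics (the abstract does not
# define them): accuracy = share of patient queries answered correctly,
# knowledge gap = share the assistant could not answer at all.
# The labels below are invented for illustration.
labels = ["correct", "correct", "correct", "correct", "correct",
          "correct", "correct", "correct", "correct", "unanswered"]

accuracy = labels.count("correct") / len(labels)
knowledge_gap = labels.count("unanswered") / len(labels)
print(f"accuracy = {accuracy:.1f}, knowledge gap = {knowledge_gap:.1f}")
```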

6.
Bioengineering (Basel) ; 11(5)2024 May 12.
Article in English | MEDLINE | ID: mdl-38790350

ABSTRACT

This study aims to explore how artificial intelligence can help ease the burden on caregivers, filling a gap in current research and healthcare practices due to the growing challenge of an aging population and increased reliance on informal caregivers. We conducted a search with Google Scholar, PubMed, Scopus, IEEE Xplore, and Web of Science, focusing on AI and caregiving. Our inclusion criteria were studies where AI supports informal caregivers, excluding those solely for data collection. Adhering to PRISMA 2020 guidelines, we eliminated duplicates and screened for relevance. From 947 initially identified articles, 10 met our criteria, focusing on AI's role in aiding informal caregivers. These studies, conducted between 2012 and 2023, were globally distributed, with 80% employing machine learning. Validation methods varied, with Hold-Out being the most frequent. Metrics across studies revealed accuracies ranging from 71.60% to 99.33%. Specific methods, like SCUT in conjunction with NNs and LibSVM, showcased accuracy between 93.42% and 95.36% as well as F-measures spanning 93.30% to 95.41%. AUC values indicated model performance variability, ranging from 0.50 to 0.85 in select models. Our review highlights AI's role in aiding informal caregivers, showing promising results despite different approaches. AI tools provide smart, adaptive support, improving caregivers' effectiveness and well-being.
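As a concrete illustration of the hold-out validation and metrics the review tallies (accuracy, F-measure, AUC), here is a short scikit-learn sketch on synthetic data. SVC is used because it wraps LibSVM, one of the cited methods, but this is not any reviewed study's pipeline.

```python
# Hold-out validation with accuracy, F-measure, and AUC, as tallied in
# the review above. Synthetic data only; SVC wraps LibSVM, one of the
# cited methods, but no reviewed study's data or model is reproduced.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # simple hold-out split

clf = SVC(probability=True, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print(f"accuracy  = {accuracy_score(y_test, y_pred):.2%}")
print(f"F-measure = {f1_score(y_test, y_pred):.2%}")
print(f"AUC       = {roc_auc_score(y_test, y_prob):.2f}")
```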

7.
J Clin Med ; 13(10)2024 May 11.
Article in English | MEDLINE | ID: mdl-38792374

ABSTRACT

Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each given 68 fictionalized clinical vignettes of hand injuries twice. The models were asked to use a specific classification system and recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini, correctly classifying 70.6% of hand injuries, demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value < 0.001). For management, ChatGPT demonstrated higher sensitivity in recommending surgical intervention compared to Gemini (98.0% vs. 88.8%), but lower specificity (68.4% vs. 94.7%). When compared to ChatGPT, Gemini demonstrated greater response replicability. Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
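The sensitivity and specificity figures above follow from standard confusion-matrix arithmetic, sketched below with "surgery recommended" as the positive class. The counts are hypothetical, chosen only so the rates match ChatGPT's reported 98.0% and 68.4% across the 68 vignettes.

```python
# Confusion-matrix arithmetic behind sensitivity and specificity,
# treating "surgery recommended" as the positive class. Counts are
# hypothetical, chosen to reproduce ChatGPT's reported rates.
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=48, fn=1, tn=13, fp=6)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")  # 98.0%, 68.4%
```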

8.
Healthcare (Basel) ; 12(8)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38667587

ABSTRACT

INTRODUCTION: As large language models receive greater attention in medical research, the investigation of ethical considerations is warranted. This review aims to explore surgery literature to identify ethical concerns surrounding these artificial intelligence models and evaluate how autonomy, beneficence, nonmaleficence, and justice are represented within these ethical discussions, providing insights to guide further research and practice. METHODS: A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Five electronic databases were searched in October 2023. Eligible studies included surgery-related articles that focused on large language models and contained adequate ethical discussion. Study details, including specialty and ethical concerns, were collected. RESULTS: The literature search yielded 1179 articles, with 53 meeting the inclusion criteria. Plastic surgery, orthopedic surgery, and neurosurgery were the most represented surgical specialties. Autonomy was the most explicitly cited ethical principle. The most frequently discussed ethical concern was accuracy (n = 45, 84.9%), followed by bias, patient confidentiality, and responsibility. CONCLUSION: The ethical implications of using large language models in surgery are complex and evolving. The integration of these models into surgery necessitates continuous ethical discourse to ensure responsible and ethical use, balancing technological advancement with human dignity and safety.

9.
Eur J Investig Health Psychol Educ ; 14(3): 685-698, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38534906

ABSTRACT

Primary Care Physicians (PCPs) are the first point of contact in healthcare. Because PCPs face the challenge of managing diverse patient populations while maintaining up-to-date medical knowledge and updated health records, this study explores the current outcomes and effectiveness of implementing Artificial Intelligence-based Clinical Decision Support Systems (AI-CDSSs) in Primary Healthcare (PHC). Following the PRISMA-ScR guidelines, we systematically searched five databases (PubMed, Scopus, CINAHL, IEEE, and Google Scholar) and manually searched related articles. Only CDSSs powered by AI targeted to physicians and tested in real clinical PHC settings were included. From a total of 421 articles, 6 met our criteria. We found AI-CDSSs from the US, Netherlands, Spain, and China whose primary tasks included diagnosis support, management and treatment recommendations, and complication prediction. Secondary objectives included lessening physician work burden and reducing healthcare costs. While promising, the outcomes were hindered by physicians' perceptions and cultural settings. This study underscores the potential of AI-CDSSs in improving clinical management, patient satisfaction, and safety while reducing physician workload. However, further work is needed to explore the broad spectrum of applications of new AI-CDSSs in real-world PHC settings and to measure their clinical outcomes.
