Search | Portal Regional da BVS (test)

Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

Masanneck, Lars; Schmidt, Linea; Seifert, Antonia; Kölsche, Tristan; Huntemann, Niklas; Jansen, Robin; Mehsin, Mohammed; Bernhard, Michael; Meuth, Sven G; Böhm, Lennert; Pawlitzki, Marc.

J Med Internet Res; 26: e53297, 2024 Jun 14.

Article in English | MEDLINE | ID: mdl-38875696

ABSTRACT

BACKGROUND: Large language models (LLMs) have demonstrated impressive performance in various medical domains, prompting an exploration of their potential utility within the high-demand setting of emergency department (ED) triage. This study evaluated the triage proficiency of different LLMs and ChatGPT, an LLM-based chatbot, compared to professionally trained ED staff and untrained personnel. We further explored whether LLM responses could guide untrained staff in effective triage.

OBJECTIVE: This study aimed to assess the efficacy of LLMs and the associated product ChatGPT in ED triage compared to personnel of varying training status and to investigate whether the models' responses can enhance the triage proficiency of untrained personnel.

METHODS: A total of 124 anonymized case vignettes were triaged by untrained doctors; different versions of currently available LLMs; ChatGPT; and professionally trained raters, who subsequently agreed on a consensus set according to the Manchester Triage System (MTS). The prototypical vignettes were adapted from cases at a tertiary ED in Germany. The main outcome was the level of agreement between raters' MTS level assignments, measured via quadratic-weighted Cohen κ. The extent of over- and undertriage was also determined. Notably, instances of ChatGPT were prompted using zero-shot approaches without extensive background information on the MTS. The tested LLMs included raw GPT-4, Llama 3 70B, Gemini 1.5, and Mixtral 8x7b.

RESULTS: GPT-4-based ChatGPT and untrained doctors showed substantial agreement with the consensus triage of professional raters (mean κ=0.67, SD 0.037 and mean κ=0.68, SD 0.056, respectively), significantly exceeding the performance of GPT-3.5-based ChatGPT (mean κ=0.54, SD 0.024; P<.001). When untrained doctors used this LLM for second-opinion triage, there was a slight but statistically nonsignificant performance increase (mean κ=0.70, SD 0.047; P=.97). Other tested LLMs performed similarly to or worse than GPT-4-based ChatGPT or showed odd triaging behavior with the parameters used. LLMs and ChatGPT models tended toward overtriage, whereas untrained doctors undertriaged.

CONCLUSIONS: While LLMs and the LLM-based product ChatGPT do not yet match professionally trained raters, their best models' triage proficiency equals that of untrained ED doctors. In their current form, LLMs and ChatGPT thus did not demonstrate gold-standard performance in ED triage and, in the setting of this study, failed to significantly improve untrained doctors' triage when used as decision support. Notable performance enhancements in newer LLM versions over older ones hint at future improvements with further technological development and specific training.
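The agreement measure named in the abstract, quadratic-weighted Cohen κ, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the assumption that MTS levels are coded 1 (most urgent) to 5 (least urgent), and the toy ratings in the usage note are all illustrative assumptions.

```python
from collections import Counter


def quadratic_weighted_kappa(rater, consensus, n_levels=5):
    """Quadratic-weighted Cohen kappa for ordinal levels coded 1..n_levels.

    Assumption: triage levels are integers, 1 = most urgent.
    """
    if len(rater) != len(consensus) or not rater:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater)
    joint = Counter(zip(rater, consensus))   # observed pair counts
    marg_r = Counter(rater)                  # marginal counts, rater
    marg_c = Counter(consensus)              # marginal counts, consensus
    max_sq = (n_levels - 1) ** 2
    observed = 0.0
    expected = 0.0
    for i in range(1, n_levels + 1):
        for j in range(1, n_levels + 1):
            weight = (i - j) ** 2 / max_sq   # quadratic disagreement weight
            observed += weight * joint[(i, j)] / n
            expected += weight * marg_r[i] * marg_c[j] / (n * n)
    if expected == 0:
        # Both raters used one identical level throughout; treat as agreement.
        return 1.0
    return 1.0 - observed / expected


def over_and_undertriage(rater, consensus):
    """Count cases rated more urgent (over) or less urgent (under) than consensus."""
    over = sum(1 for r, c in zip(rater, consensus) if r < c)
    under = sum(1 for r, c in zip(rater, consensus) if r > c)
    return over, under
```

As a usage illustration with invented ratings, `quadratic_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])` gives 1.0 for perfect agreement, and `over_and_undertriage([1, 2, 3], [2, 2, 2])` returns `(1, 1)`. The quadratic weights penalize a two-level miss four times as heavily as a one-level miss, which suits an ordinal scale such as triage urgency.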

Subjects

Emergency Medicine; Triage; Triage/methods; Triage/standards; Humans; Emergency Medicine/standards; Physicians/statistics & numerical data; Emergency Service, Hospital/standards; Language; Germany; Female

The Use of Nitrosative Stress Molecules as Potential Diagnostic Biomarkers in Multiple Sclerosis.

Räuber, Saskia; Förster, Moritz; Schüller, Julia; Willison, Alice; Golombeck, Kristin S; Schroeter, Christina B; Oeztuerk, Menekse; Jansen, Robin; Huntemann, Niklas; Nelke, Christopher; Korsen, Melanie; Fischer, Katinka; Kerkhoff, Ruth; Leven, Yana; Kirschner, Patricia; Kölsche, Tristan; Nikolov, Petyo; Mehsin, Mohammed; Marae, Gelenar; Kokott, Alma; Pul, Duygu; Schulten, Julius; Vogel, Niklas; Ingwersen, Jens; Ruck, Tobias; Pawlitzki, Marc; Meuth, Sven G; Melzer, Nico; Kremer, David.

Int J Mol Sci; 25(2), 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38255863

ABSTRACT

Multiple sclerosis (MS) is an autoimmune disease of the central nervous system (CNS) whose etiology remains unclear. In recent years, the search for biomarkers facilitating its diagnosis, prognosis, therapy response, and other parameters has gained increasing attention. In a previous meta-analysis comprising 22 studies, we found that MS is associated with higher nitrite/nitrate (NOx) levels in the cerebrospinal fluid (CSF) compared to patients with non-inflammatory other neurological diseases (NIOND). However, many of the included studies did not distinguish between the clinical subtypes of MS or included pretreated patients, and inclusion criteria varied across studies. As a follow-up to our meta-analysis, we therefore aimed to analyze the serum and CSF NOx levels in clinically well-defined cohorts of treatment-naïve MS patients compared to patients with somatic symptom disorder. To this end, we analyzed the serum and CSF levels of NOx in 117 patients (71 with relapsing-remitting (RR) MS, 16 with primary progressive (PP) MS, and 30 with somatic symptom disorder). We found that RRMS and PPMS patients had higher serum NOx levels than somatic symptom disorder patients. This difference remained significant in the subgroup of MRZ-negative RRMS patients. In conclusion, the measurement of NOx in the serum might be a valuable tool in supporting MS diagnosis.

Subjects

Autoimmune Diseases; Medically Unexplained Symptoms; Multiple Sclerosis, Relapsing-Remitting; Multiple Sclerosis; Humans; Multiple Sclerosis/diagnosis; Nitrosative Stress; Central Nervous System
