Results 1 - 7 of 7
1.
JAMA Netw Open ; 7(5): e248895, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38713466

ABSTRACT

Importance: The introduction of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4; OpenAI), has generated significant interest in health care, yet studies evaluating their performance in a clinical setting are lacking. Determination of clinical acuity, a measure of a patient's illness severity and level of required medical attention, is one of the foundational elements of medical reasoning in emergency medicine. Objective: To determine whether an LLM can accurately assess clinical acuity in the emergency department (ED). Design, Setting, and Participants: This cross-sectional study identified all adult ED visits from January 1, 2012, to January 17, 2023, at the University of California, San Francisco, with a documented Emergency Severity Index (ESI) acuity level (immediate, emergent, urgent, less urgent, or nonurgent) and with a corresponding ED physician note. A sample of 10 000 pairs of ED visits with nonequivalent ESI scores, balanced for each of the 10 possible pairs of 5 ESI scores, was selected at random. Exposure: The potential of the LLM to classify acuity levels of patients in the ED based on the ESI across 10 000 patient pairs. Using deidentified clinical text, the LLM was queried to identify the patient with a higher-acuity presentation within each pair based on the patients' clinical history. An earlier LLM was queried to allow comparison with this model. Main Outcomes and Measures: Accuracy score was calculated to evaluate the performance of both LLMs across the 10 000-pair sample. A 500-pair subsample was manually classified by a physician reviewer to compare performance between the LLMs and human classification. Results: From a total of 251 401 adult ED visits, a balanced sample of 10 000 patient pairs was created wherein each pair comprised patients with disparate ESI acuity scores. 
Across this sample, the LLM correctly inferred the patient with higher acuity for 8940 of 10 000 pairs (accuracy, 0.89 [95% CI, 0.89-0.90]). Performance of the comparator LLM (accuracy, 0.84 [95% CI, 0.83-0.84]) was below that of its successor. Among the 500-pair subsample that was also manually classified, LLM performance (accuracy, 0.88 [95% CI, 0.86-0.91]) was comparable with that of the physician reviewer (accuracy, 0.86 [95% CI, 0.83-0.89]). Conclusions and Relevance: In this cross-sectional study of 10 000 pairs of ED visits, the LLM accurately identified the patient with higher acuity when given pairs of presenting histories extracted from patients' first ED documentation. These findings suggest that the integration of an LLM into ED workflows could enhance triage processes while maintaining triage quality and warrants further investigation.
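The headline accuracy and confidence interval above can be reproduced from the reported counts. A minimal sketch, using a normal-approximation (Wald) interval on the 8940-of-10 000 figure; this is an illustration of the arithmetic, not the authors' statistical code:

```python
import math

def pairwise_accuracy_ci(correct: int, total: int, z: float = 1.96):
    """Accuracy of pairwise comparisons with a normal-approximation 95% CI."""
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return p, (p - z * se, p + z * se)

# Figures from the abstract: 8940 of 10 000 pairs correctly ranked.
acc, (lo, hi) = pairwise_accuracy_ci(8940, 10_000)
print(f"accuracy {acc:.2f} (95% CI, {lo:.2f}-{hi:.2f})")  # accuracy 0.89 (95% CI, 0.89-0.90)
```

At n = 10 000 the interval is narrow enough that it rounds to the same two decimals as the point estimate, which is why the abstract reports 0.89 [0.89-0.90].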


Subject(s)
Emergency Service, Hospital , Patient Acuity , Humans , Emergency Service, Hospital/statistics & numerical data , Cross-Sectional Studies , Adult , Male , Female , Middle Aged , Severity of Illness Index , San Francisco
2.
medRxiv ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38633805

ABSTRACT

Importance: Large language models (LLMs) possess a range of capabilities which may be applied to the clinical domain, including text summarization. As ambient artificial intelligence scribes and other LLM-based tools begin to be deployed within healthcare settings, rigorous evaluations of the accuracy of these technologies are urgently needed. Objective: To investigate the performance of GPT-4 and GPT-3.5-turbo in generating Emergency Department (ED) discharge summaries and evaluate the prevalence and type of errors across each section of the discharge summary. Design: Cross-sectional study. Setting: University of California, San Francisco ED. Participants: We identified all adult ED visits from 2012 to 2023 with an ED clinician note. We randomly selected a sample of 100 ED visits for GPT-summarization. Exposure: We investigated the potential of two state-of-the-art LLMs, GPT-4 and GPT-3.5-turbo, to summarize the full ED clinician note into a discharge summary. Main Outcomes and Measures: GPT-3.5-turbo and GPT-4-generated discharge summaries were evaluated by two independent Emergency Medicine physician reviewers across three evaluation criteria: 1) inaccuracy of GPT-summarized information; 2) hallucination of information; 3) omission of relevant clinical information. On identifying each error, reviewers were additionally asked to provide a brief explanation of their reasoning, which was manually classified into subgroups of errors. Results: From 202,059 eligible ED visits, we randomly sampled 100 for GPT-generated summarization and then expert-driven evaluation. In total, 33% of summaries generated by GPT-4 and 10% of those generated by GPT-3.5-turbo were entirely error-free across all evaluated domains. Summaries generated by GPT-4 were mostly accurate, with inaccuracies found in only 10% of cases; however, 42% of the summaries exhibited hallucinations and 47% omitted clinically relevant information.
Inaccuracies and hallucinations were most commonly found in the Plan sections of GPT-generated summaries, while clinical omissions were concentrated in text describing patients' Physical Examination findings or History of Presenting Complaint. Conclusions and Relevance: In this cross-sectional study of 100 ED encounters, we found that LLMs could generate accurate discharge summaries, but were liable to hallucination and omission of clinically relevant information. A comprehensive understanding of the location and type of errors found in GPT-generated clinical text is important to facilitate clinician review of such content and prevent patient harm.
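The "entirely error-free" outcome above can be computed as the fraction of summaries with no error flagged on any of the three criteria. A minimal sketch with an assumed per-summary flag structure (hypothetical data layout, not the study's review instrument):

```python
from typing import Dict, List

# The three evaluation criteria named in the abstract.
CRITERIA = ("inaccuracy", "hallucination", "omission")

def error_free_rate(reviews: List[Dict[str, bool]]) -> float:
    """Fraction of summaries with no error flagged on any criterion."""
    clean = sum(1 for r in reviews if not any(r[c] for c in CRITERIA))
    return clean / len(reviews)

# Toy sample of four reviewed summaries (illustrative, not study data).
sample = [
    {"inaccuracy": False, "hallucination": False, "omission": False},
    {"inaccuracy": False, "hallucination": True,  "omission": False},
    {"inaccuracy": True,  "hallucination": False, "omission": True},
    {"inaccuracy": False, "hallucination": False, "omission": False},
]
print(error_free_rate(sample))  # 2 of 4 summaries are error-free -> 0.5
```

Note that a summary counts as error-free only if all three flags are clear, which is why the error-free percentage (33% for GPT-4) is much lower than any single per-criterion accuracy.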

4.
Lancet Digit Health ; 6(3): e222-e229, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38395542

ABSTRACT

Digital therapeutics (DTx) are a somewhat novel class of US Food and Drug Administration-regulated software that help patients prevent, manage, or treat disease. Here, we use natural language processing to characterise registered DTx clinical trials and provide insights into the clinical development landscape for these novel therapeutics. We identified 449 DTx clinical trials, initiated or expected to be initiated between 2010 and 2030, from ClinicalTrials.gov using 27 search terms, and available data were analysed, including trial durations, locations, MeSH categories, enrolment, and sponsor types. Topic modelling of eligibility criteria, done with BERTopic, showed that DTx trials frequently exclude patients on the basis of age, comorbidities, pregnancy, language barriers, and digital determinants of health, including smartphone or data plan access. Our comprehensive overview of the DTx development landscape highlights challenges in designing inclusive DTx clinical trials and presents opportunities for clinicians and researchers to address these challenges. Finally, we provide an interactive dashboard for readers to conduct their own analyses.
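The exclusion themes the topic model surfaced (age, comorbidities, pregnancy, language barriers, digital determinants of health) lend themselves to a simple tally across eligibility-criteria text. A stdlib keyword sketch of that kind of grouping; the keyword map and example criteria are invented for illustration, and this is not the authors' BERTopic pipeline:

```python
from collections import Counter

# Hypothetical keyword map approximating the exclusion themes reported
# in the abstract; a real analysis would use topic modelling (BERTopic).
THEMES = {
    "age": ("aged", "years"),
    "comorbidity": ("comorbid", "disorder"),
    "pregnancy": ("pregnan",),
    "language": ("english", "language"),
    "digital access": ("smartphone", "internet", "data plan"),
}

def tag_criteria(criteria):
    """Count how many eligibility-criteria strings mention each theme."""
    counts = Counter()
    for text in criteria:
        low = text.lower()
        for theme, keys in THEMES.items():
            if any(k in low for k in keys):
                counts[theme] += 1
    return counts

# Toy eligibility criteria (illustrative only).
docs = [
    "Exclusion: aged under 18 years",
    "Exclusion: unable to read English",
    "Exclusion: no smartphone or data plan",
    "Exclusion: currently pregnant",
]
print(tag_criteria(docs))
```

Keyword matching is brittle compared with embedding-based topic modelling, but it makes the unit of analysis concrete: each trial's free-text criteria are mapped to recurring exclusion themes.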


Subject(s)
Natural Language Processing , Smartphone , Humans , Software
5.
J Am Med Inform Assoc ; 30(7): 1323-1332, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37187158

ABSTRACT

OBJECTIVES: As real-world electronic health record (EHR) data continue to grow exponentially, novel methodologies involving artificial intelligence (AI) are increasingly being applied to enable efficient data-driven learning and, ultimately, to advance healthcare. Our objective is to provide readers with an understanding of evolving computational methods and help in deciding on methods to pursue. TARGET AUDIENCE: The sheer diversity of existing methods presents a challenge for health scientists who are beginning to apply computational methods to their research. This tutorial is therefore aimed at scientists working with EHR data who are early entrants into the field of applying AI methodologies. SCOPE: This manuscript describes the diverse and growing AI research approaches in healthcare data science and categorizes them into 2 distinct paradigms, bottom-up and top-down, to provide health scientists venturing into artificial intelligence research with an understanding of the evolving computational methods, viewed through the lens of real-world healthcare data.


Subject(s)
Artificial Intelligence , Physicians , Humans , Data Science , Big Data , Delivery of Health Care
7.
Int Immunol ; 32(12): 771-783, 2020 11 23.
Article in English | MEDLINE | ID: mdl-32808986

ABSTRACT

Diet is an environmental factor in autoimmune disorders, where the immune system erroneously destroys one's own tissues. Yet, interactions between diet and autoimmunity remain largely unexplored, particularly the impact of immunogenetics, one's human leukocyte antigen (HLA) allele make-up, in this interplay. Here, we interrogated animals and plants for the presence of epitopes implicated in human autoimmune diseases. We mapped autoimmune epitope distribution across organisms and determined their tissue expression pattern. Interestingly, diet-derived epitopes implicated in a disease were more likely to bind to HLA alleles associated with that disease than to protective alleles, with visible differences between organisms with similar autoimmune epitope content. We then analyzed an individual's HLA haplotype, generating a personalized heatmap of potential dietary autoimmune triggers. Our work uncovered differences in autoimmunogenic potential across food sources and revealed differential binding of diet-derived epitopes to autoimmune disease-associated HLA alleles, shedding light on the impact of diet on autoimmunity.
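The personalized heatmap described above reduces to scoring each allele in an individual's HLA haplotype against a table of diet-derived epitopes. A minimal sketch; the epitope names, alleles, and binding calls below are invented placeholders, not the study's binding data:

```python
# Hypothetical (epitope, HLA allele) -> predicted-binder table.
BINDING = {
    ("EPITOPE_A", "HLA-DRB1*04:01"): True,
    ("EPITOPE_A", "HLA-DRB1*15:01"): False,
    ("EPITOPE_B", "HLA-DRB1*04:01"): True,
    ("EPITOPE_B", "HLA-DRB1*15:01"): True,
}

def haplotype_hits(haplotype, epitopes):
    """Per-allele count of diet-derived epitopes predicted to bind."""
    return {
        allele: sum(BINDING.get((epi, allele), False) for epi in epitopes)
        for allele in haplotype
    }

# Score a hypothetical individual's two DRB1 alleles.
person = ["HLA-DRB1*04:01", "HLA-DRB1*15:01"]
print(haplotype_hits(person, ["EPITOPE_A", "EPITOPE_B"]))
```

Each row of such a per-allele count matrix, taken over many food sources, is one row of the personalized trigger heatmap the abstract describes.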


Subject(s)
Autoimmune Diseases/immunology , Autoimmunity/immunology , Diet , Major Histocompatibility Complex/immunology , Alleles , Epitopes/immunology , Humans , Major Histocompatibility Complex/genetics