Search | VHL Regional Portal

Reply: Refining retrieval and chunking strategies for enhanced clinical reliability of large language models in liver disease.

Ge, Jin; Sun, Steve; Owens, Joseph; Galvez, Victor; Gologorskaya, Oksana; Lai, Jennifer C; Pletcher, Mark J; Lai, Ki.

Hepatology ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38935858

Development of a liver disease-specific large language model chat interface using retrieval-augmented generation.

Ge, Jin; Sun, Steve; Owens, Joseph; Galvez, Victor; Gologorskaya, Oksana; Lai, Jennifer C; Pletcher, Mark J; Lai, Ki.

Hepatology ; 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38451962

ABSTRACT

BACKGROUND AND AIMS: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. APPROACH AND RESULTS: We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information-complaint text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2. LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. RESULTS: We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2. LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. CONCLUSIONS: In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.

Measuring Implicit Bias in ICU Notes Using Word-Embedding Neural Network Models.

Cobert, Julien; Mills, Hunter; Lee, Albert; Gologorskaya, Oksana; Espejo, Edie; Jeon, Sun Young; Boscardin, W John; Heintz, Timothy A; Kennedy, Christopher J; Ashana, Deepshikha C; Chapman, Allyson Cook; Raghunathan, Karthik; Smith, Alex K; Lee, Sei J.

Chest ; 165(6): 1481-1490, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38199323

ABSTRACT

BACKGROUND: Language in nonmedical data sets is known to transmit human-like biases when used in natural language processing (NLP) algorithms that can reinforce disparities. It is unclear if NLP algorithms of medical notes could lead to similar transmissions of biases. RESEARCH QUESTION: Can we identify implicit bias in clinical notes, and are biases stable across time and geography? STUDY DESIGN AND METHODS: To determine whether different racial and ethnic descriptors are similar contextually to stigmatizing language in ICU notes and whether these relationships are stable across time and geography, we identified notes on critically ill adults admitted to the University of California, San Francisco (UCSF), from 2012 through 2022 and to Beth Israel Deaconess Hospital (BIDMC) from 2001 through 2012. Because word meaning is derived largely from context, we trained unsupervised word-embedding algorithms to measure the similarity (cosine similarity) quantitatively of the context between a racial or ethnic descriptor (eg, African-American) and a stigmatizing target word (eg, nonco-operative) or group of words (violence, passivity, noncompliance, nonadherence). RESULTS: In UCSF notes, Black descriptors were less likely to be similar contextually to violent words compared with White descriptors. Contrastingly, in BIDMC notes, Black descriptors were more likely to be similar contextually to violent words compared with White descriptors. The UCSF data set also showed that Black descriptors were more similar contextually to passivity and noncompliance words compared with Latinx descriptors. INTERPRETATION: Implicit bias is identifiable in ICU notes. Racial and ethnic group descriptors carry different contextual relationships to stigmatizing words, depending on when and where notes were written. Because NLP models seem able to transmit implicit bias from training data, use of NLP algorithms in clinical prediction could reinforce disparities. Active debiasing strategies may be necessary to achieve algorithmic fairness when using language models in clinical research.

Subject(s)

Intensive Care Units , Natural Language Processing , Neural Networks, Computer , Humans , Algorithms , Critical Illness/psychology , Bias , Electronic Health Records , Male , Female

Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation.

Ge, Jin; Sun, Steve; Owens, Joseph; Galvez, Victor; Gologorskaya, Oksana; Lai, Jennifer C; Pletcher, Mark J; Lai, Ki.

medRxiv ; 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37986764

ABSTRACT

Background: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. Methods: We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information (PHI)-complaint text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by comparing its responses versus those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance. Results: LiVersa answered all 10 questions correctly when forced to provide a "yes" or "no" answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions. Discussions: In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology - there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.

Negativity and Positivity in the ICU: Exploratory Development of Automated Sentiment Capture in the Electronic Health Record.

Kennedy, Chris J; Chiu, Catherine; Chapman, Allyson Cook; Gologorskaya, Oksana; Farhan, Hassan; Han, Mary; Hodgson, MacGregor; Lazzareschi, Daniel; Ashana, Deepshikha; Lee, Sei; Smith, Alexander K; Espejo, Edie; Boscardin, John; Pirracchio, Romain; Cobert, Julien.

Crit Care Explor ; 5(10): e0960, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37753238

ABSTRACT

OBJECTIVES: To develop proof-of-concept algorithms using alternative approaches to capture provider sentiment in ICU notes. DESIGN: Retrospective observational cohort study. SETTING: The Multiparameter Intelligent Monitoring of Intensive Care III (MIMIC-III) and the University of California, San Francisco (UCSF) deidentified notes databases. PATIENTS: Adult (≥18 yr old) patients admitted to the ICU. MEASUREMENTS AND MAIN RESULTS: We developed two sentiment models: 1) a keywords-based approach using a consensus-based clinical sentiment lexicon comprised of 72 positive and 103 negative phrases, including negations and 2) a Decoding-enhanced Bidirectional Encoder Representations from Transformers with disentangled attention-v3-based deep learning model (keywords-independent) trained on clinical sentiment labels. We applied the models to 198,944 notes across 52,997 ICU admissions in the MIMIC-III database. Analyses were replicated on an external sample of patients admitted to a UCSF ICU from 2018 to 2019. We also labeled sentiment in 1,493 note fragments and compared the predictive accuracy of our tools to three popular sentiment classifiers. Clinical sentiment terms were found in 99% of patient visits across 88% of notes. Our two sentiment tools were substantially more predictive (Spearman correlations of 0.62-0.84, p values < 0.00001) of labeled sentiment compared with general language algorithms (0.28-0.46). CONCLUSION: Our exploratory healthcare-specific sentiment models can more accurately detect positivity and negativity in clinical notes compared with general sentiment tools not designed for clinical usage.

Machine-Learning Algorithm to Improve Cohort Identification in Interstitial Lung Disease.

Farrand, Erica; Gologorskaya, Oksana; Mills, Hunter; Radhakrishnan, Lakshmi; Collard, Harold R; Butte, Atul J.

Am J Respir Crit Care Med ; 207(10): 1398-1401, 2023 05 15.

Article in English | MEDLINE | ID: mdl-36943196

Subject(s)

Lung Diseases, Interstitial , Humans , Lung Diseases, Interstitial/diagnosis , Machine Learning , Algorithms

Electronic Medical Record Search Engine (EMERSE): An Information Retrieval Tool for Supporting Cancer Research.

Hanauer, David A; Barnholtz-Sloan, Jill S; Beno, Mark F; Del Fiol, Guilherme; Durbin, Eric B; Gologorskaya, Oksana; Harris, Daniel; Harnett, Brett; Kawamoto, Kensaku; May, Benjamin; Meeks, Eric; Pfaff, Emily; Weiss, Janie; Zheng, Kai.

JCO Clin Cancer Inform ; 4: 454-463, 2020 05.

Article in English | MEDLINE | ID: mdl-32412846

ABSTRACT

PURPOSE: The Electronic Medical Record Search Engine (EMERSE) is a software tool built to aid research spanning cohort discovery, population health, and data abstraction for clinical trials. EMERSE is now live at three academic medical centers, with additional sites currently working on implementation. In this report, we describe how EMERSE has been used to support cancer research based on a variety of metrics. METHODS: We identified peer-reviewed publications that used EMERSE through online searches as well as through direct e-mails to users based on audit logs. These logs were also used to summarize use at each of the three sites. Search terms for two of the sites were characterized using the natural language processing tool MetaMap to determine to which semantic types the terms could be mapped. RESULTS: We identified a total of 326 peer-reviewed publications that used EMERSE through August 2019, although this is likely an underestimation of the true total based on the use log analysis. Oncology-related research comprised nearly one third (n = 105; 32.2%) of all research output. The use logs showed that EMERSE had been used by multiple people at each site (nearly 3,500 across all three) who had collectively logged into the system > 100,000 times. Many user-entered search queries could not be mapped to a semantic type, but the most common semantic type for terms that did match was "disease or syndrome," followed by "pharmacologic substance." CONCLUSION: EMERSE has been shown to be a valuable tool for supporting cancer research. It has been successfully deployed at other sites, despite some implementation challenges unique to each deployment environment.

Subject(s)

Neoplasms , Search Engine , Electronic Health Records , Humans , Information Storage and Retrieval , Natural Language Processing , Neoplasms/therapy , Software

Crowdsourcing the CTSA innovation mission.

Kahlon, Maninder; Yuan, Leslie; Gologorskaya, Oksana; Johnston, S Claiborne.

Clin Transl Sci ; 7(2): 89-92, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24655812

Subject(s)

Crowdsourcing , Inventions , Translational Research, Biomedical , Research Design

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL