Results 1 - 20 of 149
1.
JAMIA Open ; 7(2): ooae039, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38779571

ABSTRACT

Objectives: Numerous studies have identified information overload as a key issue for electronic health records (EHRs). This study describes the amount of text data across all notes available to emergency physicians in the EHR, trended over time since EHR implementation. Materials and Methods: We conducted a retrospective analysis of EHR data from a large healthcare system, examining the number of notes and the corresponding total word and token counts across all notes available to physicians during patient encounters in the emergency department (ED). We assessed the change in these metrics over the 17-year period between 2006 and 2023. Results: The study cohort included 730 968 ED visits made by 293 559 unique patients and a total note count of 132 574 964. The median note count for all encounters in 2006 was 5 (IQR 1-16), accounting for 1735 (IQR 447-5521) words. By the last full year of the study period, 2022, the median number of notes had grown to 359 (IQR 84-943), representing 359 (IQR 84-943) words. Note and word counts were higher for admitted patients. Discussion: The volume of notes available for review by providers has increased by over 30-fold in the 17 years since the implementation of the EHR at a large health system. The task of reviewing these notes has become commensurately more difficult. These data point to the critical need for new strategies and tools for filtering, synthesizing, and summarizing information to achieve the promise of the medical record.
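The median-with-IQR summaries used throughout this study can be reproduced with a small helper; the per-encounter note counts below are invented for illustration, not the study's data.

```python
from statistics import median

def median_iqr(values):
    """Median and IQR (25th-75th percentile, linear interpolation between ranks)."""
    s = sorted(values)
    n = len(s)
    def pct(p):
        k = (n - 1) * p          # fractional rank of the p-th percentile
        f = int(k)
        c = min(f + 1, n - 1)
        return s[f] + (s[c] - s[f]) * (k - f)
    return median(s), (pct(0.25), pct(0.75))

# hypothetical per-encounter note counts
med, (q1, q3) = median_iqr([1, 3, 5, 5, 9, 16, 20, 350, 400, 900])
```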

2.
J Addict Med ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38776423

ABSTRACT

OBJECTIVE: A trial comparing extended-release naltrexone and sublingual buprenorphine-naloxone demonstrated higher relapse rates in individuals randomized to extended-release naltrexone. The effectiveness of treatment might vary based on patient characteristics. We hypothesized that causal machine learning would identify individualized treatment effects for each medication. METHODS: This is a secondary analysis of a multicenter randomized trial that compared the effectiveness of extended-release naltrexone versus buprenorphine-naloxone for preventing relapse of opioid misuse. Three machine learning models were derived using all trial participants, with 50% randomly selected for training (n = 285) and the remaining 50% for validation. Individualized treatment effect was measured by the Qini value and c-for-benefit, with the absence of relapse denoting treatment success. Patients were grouped into quartiles by predicted individualized treatment effect to examine differences in characteristics and the observed treatment effects. RESULTS: The best-performing model had a Qini value of 4.45 (95% confidence interval, 1.02-7.83) and a c-for-benefit of 0.63 (95% confidence interval, 0.53-0.68). The quartile most likely to benefit from buprenorphine-naloxone had a 35% absolute benefit from this treatment, and at study entry, they had a high median opioid withdrawal score (P < 0.001), used cocaine on more days over the prior 30 days than the other quartiles (P < 0.001), and had the highest proportions of alcohol and cocaine use disorder (P ≤ 0.02). Quartile 4 individuals were predicted to be most likely to benefit from extended-release naltrexone, with the greatest proportion having heroin drug preference (P = 0.02) and all experiencing homelessness (P < 0.001). CONCLUSIONS: Causal machine learning identified differing individualized treatment effects between medications based on characteristics associated with preventing relapse.
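The Qini value reported here summarizes an uplift curve: patients are ranked by predicted individualized treatment effect, and the cumulative incremental successes of treated over (rescaled) control patients are tracked down the ranking. A minimal sketch of building such a curve, with made-up scores and outcomes rather than trial data:

```python
def qini_curve(uplift_scores, treated, outcome):
    """Cumulative incremental gain: sort by predicted uplift, then track
    treated successes minus control successes rescaled to the treated count."""
    rows = sorted(zip(uplift_scores, treated, outcome), key=lambda r: -r[0])
    nt = nc = yt = yc = 0
    curve = [0.0]
    for _, t, y in rows:
        if t:
            nt += 1
            yt += y
        else:
            nc += 1
            yc += y
        curve.append(yt - yc * nt / nc if nc else float(yt))
    return curve

# hypothetical: 2 treated and 2 control patients
curve = qini_curve([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0], [1, 0, 0, 1])
```

The Qini value is then typically the area between this curve and the diagonal of random targeting; this sketch stops at the curve itself.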

3.
medRxiv ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38585973

ABSTRACT

Natural Language Processing (NLP) is the study of the automated processing of text data. Applying NLP in the clinical domain is important because of the rich unstructured information embedded in clinical documents, which often remains inaccessible in structured data. Empowered by recent advances in language models (LMs), there is growing interest in their application within the clinical domain. When applying NLP methods to a given domain, benchmark datasets are crucial: they not only guide the selection of best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of LMs capable of handling longer contexts, benchmark datasets targeting long clinical document classification tasks have been absent. To address this issue, we propose the LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of MIMIC-IV and statewide death data. Our notes have a median word count of 1687 and an interquartile range of 1308 to 2169. We evaluated this benchmark dataset using baseline models, from bag-of-words and CNN to Hierarchical Transformer and an open-source instruction-tuned large language model. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. We expect the LCD benchmark to become a resource for the development of advanced supervised models, prompting methods, or the foundation models themselves, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.
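The simplest baseline mentioned, bag-of-words, reduces each discharge note to token counts over a fixed vocabulary before classification. A minimal sketch; the vocabulary and note text are invented:

```python
import re
from collections import Counter

def bag_of_words(note, vocab):
    """Count-vector a note over a fixed vocabulary (lowercased word tokens)."""
    counts = Counter(re.findall(r"[a-z]+", note.lower()))
    return [counts[w] for w in vocab]

vocab = ["sepsis", "discharge", "stable"]  # invented toy vocabulary
vec = bag_of_words("Patient stable at discharge. No sepsis.", vocab)
```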

4.
medRxiv ; 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38562730

ABSTRACT

In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook the complexities of generative tasks. This work examined the current state of automated evaluation metrics for NLG in healthcare. To have a robust and well-validated baseline against which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Using generative output from ChatGPT-3.5-turbo, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, a Unified Medical Language System (UMLS)-based metric, showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency of quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly by refining the SapBERT score for improved assessments.

6.
medRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562803

ABSTRACT

Rationale: Early detection of clinical deterioration using early warning scores may improve outcomes. However, most implemented scores were developed using logistic regression, only underwent retrospective internal validation, and were not tested in important patient subgroups. Objectives: To develop a gradient boosted machine model (eCARTv5) for identifying clinical deterioration and then validate it externally, test it prospectively, and evaluate it across patient subgroups. Methods: All adult patients hospitalized on the wards in seven hospitals from 2008-2022 were used to develop eCARTv5, with demographics, vital signs, clinician documentation, and laboratory values utilized to predict intensive care unit transfer or death in the next 24 hours. The model was externally validated retrospectively in 21 hospitals from 2009-2023 and prospectively in 10 hospitals from February to May 2023. eCARTv5 was compared to the Modified Early Warning Score (MEWS) and the National Early Warning Score (NEWS) using the area under the receiver operating characteristic curve (AUROC). Measurements and Main Results: The development cohort included 901,491 admissions, the retrospective validation cohort included 1,769,461 admissions, and the prospective validation cohort included 46,330 admissions. In retrospective validation, eCART had the highest AUROC (0.835; 95% CI 0.834-0.835), followed by NEWS (0.766; 95% CI 0.766-0.767) and MEWS (0.704; 95% CI 0.703-0.704). eCART's performance remained high (AUROC ≥0.80) across a range of patient demographics and clinical conditions, and during prospective validation. Conclusions: We developed eCARTv5, which accurately identifies early clinical deterioration in hospitalized ward patients. Our model performed better than NEWS and MEWS retrospectively, prospectively, and across a range of subgroups.
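The AUROC comparisons above reduce to the Mann-Whitney rank formulation: the probability that a randomly chosen deteriorating patient is scored above a randomly chosen non-deteriorating one. A self-contained sketch with toy labels and scores (not study data):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney formulation: fraction of positive/negative
    pairs where the positive outranks the negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy deterioration labels and risk scores
auc = auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])
```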

7.
J Am Med Inform Assoc ; 31(6): 1322-1330, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38679906

ABSTRACT

OBJECTIVES: To compare and externally validate popular deep learning model architectures and data transformation methods for variable-length time series data in 3 clinical tasks (clinical deterioration, severe acute kidney injury [AKI], and suspected infection). MATERIALS AND METHODS: This multicenter retrospective study included admissions at 2 medical centers that spanned 2007-2022. Distinct datasets were created for each clinical task, with 1 site used for training and the other for testing. Three feature engineering methods (normalization, standardization, and piece-wise linear encoding with decision trees [PLE-DTs]) and 3 architectures (long short-term memory/gated recurrent unit [LSTM/GRU], temporal convolutional network, and time-distributed wrapper with convolutional neural network [TDW-CNN]) were compared in each clinical task. Model discrimination was evaluated using the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC). RESULTS: The study comprised 373 825 admissions for training and 256 128 admissions for testing. LSTM/GRU models tied with TDW-CNN models, with both obtaining the highest mean AUPRC in 2 tasks, and LSTM/GRU had the highest mean AUROC across all tasks (deterioration: 0.81, AKI: 0.92, infection: 0.87). PLE-DT with LSTM/GRU achieved the highest AUPRC in all tasks. DISCUSSION: When externally validated in 3 clinical tasks, the LSTM/GRU model architecture with PLE-DT transformed data demonstrated the highest AUPRC in all tasks. Multiple models achieved similar performance when evaluated using AUROC. CONCLUSION: The LSTM architecture performs as well or better than some newer architectures, and PLE-DT may enhance the AUPRC in variable-length time series data for predicting clinical outcomes during external validation.
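Piece-wise linear encoding (PLE) turns a numeric feature into one value per bin: 1.0 for bins fully below the value, 0.0 for bins above it, and a fractional fill for the bin containing it. In the paper the bin edges come from decision-tree splits (PLE-DT); the edges below are invented for illustration:

```python
def ple(value, edges):
    """Piece-wise linear encoding of one numeric value over consecutive bins
    defined by sorted edges: [e0, e1, e2, ...] -> bins [e0,e1), [e1,e2), ..."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if value >= hi:
            out.append(1.0)               # bin fully covered
        elif value <= lo:
            out.append(0.0)               # bin not reached
        else:
            out.append((value - lo) / (hi - lo))  # partial fill
    return out

# invented bin edges for, say, a vital-sign feature
encoded = ple(5, [0, 4, 8, 12])
```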


Subject(s)
Deep Learning , Humans , Retrospective Studies , Acute Kidney Injury , Neural Networks, Computer , ROC Curve , Male , Datasets as Topic , Female , Middle Aged
8.
J Clin Med ; 13(5)2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38592057

ABSTRACT

(1) Background: SeptiCyte RAPID is a molecular test for discriminating sepsis from non-infectious systemic inflammation, and for estimating sepsis probabilities. The objective of this study was the clinical validation of SeptiCyte RAPID, based on testing retrospectively banked and prospectively collected patient samples. (2) Methods: The cartridge-based SeptiCyte RAPID test accepts a PAXgene blood RNA sample and provides sample-to-answer processing in ~1 h. The test output (SeptiScore, range 0-15) falls into four interpretation bands, with higher scores indicating higher probabilities of sepsis. Retrospective (N = 356) and prospective (N = 63) samples were tested from adult patients in ICU who either had the systemic inflammatory response syndrome (SIRS), or were suspected of having/diagnosed with sepsis. Patients were clinically evaluated by a panel of three expert physicians blinded to the SeptiCyte test results. Results were interpreted under either the Sepsis-2 or Sepsis-3 framework. (3) Results: Under the Sepsis-2 framework, SeptiCyte RAPID performance for the combined retrospective and prospective cohorts had Areas Under the ROC Curve (AUCs) ranging from 0.82 to 0.85, a negative predictive value of 0.91 (sensitivity 0.94) for SeptiScore Band 1 (score range 0.1-5.0; lowest risk of sepsis), and a positive predictive value of 0.81 (specificity 0.90) for SeptiScore Band 4 (score range 7.4-15; highest risk of sepsis). Performance estimates for the prospective cohort ranged from AUC 0.86-0.95. For physician-adjudicated sepsis cases that were blood culture (+) or blood, urine culture (+)(+), 43/48 (90%) of SeptiCyte scores fell in Bands 3 or 4. In multivariable analysis with up to 14 additional clinical variables, SeptiScore was the most important variable for sepsis diagnosis. 
A comparable performance was obtained for the majority of patients reanalyzed under the Sepsis-3 definition, although a subgroup of 16 patients was identified that was called septic under Sepsis-2 but not under Sepsis-3. (4) Conclusions: This study validates SeptiCyte RAPID for estimating sepsis probability, under both the Sepsis-2 and Sepsis-3 frameworks, for hospitalized patients on their first day of ICU admission.
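The score-to-band interpretation can be sketched as a simple lookup. Only the Band 1 and Band 4 boundaries are given in this abstract, so the Band 2/3 split is deliberately left unresolved here rather than guessed:

```python
def septiscore_band(score):
    """Map a SeptiScore (0-15) to its interpretation band, using only the
    boundaries reported in the abstract."""
    if score <= 5.0:
        return "Band 1"   # lowest risk of sepsis (NPV 0.91 reported)
    if score >= 7.4:
        return "Band 4"   # highest risk of sepsis (PPV 0.81 reported)
    return "Band 2-3"     # intermediate; the exact 2/3 cut is not given here
```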

9.
J Am Med Inform Assoc ; 31(6): 1291-1302, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38587875

ABSTRACT

OBJECTIVE: The timely stratification of trauma injury severity can enhance the quality of trauma care, but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. MATERIALS AND METHODS: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. RESULTS: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores near or exceeding 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. DISCUSSION: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. CONCLUSIONS: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.


Subject(s)
Electronic Health Records , Machine Learning , Wounds and Injuries , Humans , Wounds and Injuries/classification , Injury Severity Score , Registries , Trauma Severity Indices , Natural Language Processing
10.
Crit Care Explor ; 6(3): e1066, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38505174

ABSTRACT

OBJECTIVES: Alcohol withdrawal syndrome (AWS) may progress to require high-intensity care. Approaches to identify hospitalized patients with AWS who received higher level of care have not been previously examined. This study aimed to examine the utility of Clinical Institute Withdrawal Assessment Alcohol Revised (CIWA-Ar) for alcohol scale scores and medication doses for alcohol withdrawal management in identifying patients who received high-intensity care. DESIGN: A multicenter observational cohort study of hospitalized adults with alcohol withdrawal. SETTING: University of Chicago Medical Center and University of Wisconsin Hospital. PATIENTS: Inpatient encounters between November 2008 and February 2022 with a CIWA-Ar score greater than 0 and benzodiazepine or barbiturate administered within the first 24 hours. The primary composite outcome was patients who progressed to high-intensity care (intermediate care or ICU). INTERVENTIONS: None. MAIN RESULTS: Among the 8742 patients included in the study, 37.5% (n = 3280) progressed to high-intensity care. The odds ratio for the composite outcome increased above 1.0 when the CIWA-Ar score was 24. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at this threshold were 0.12 (95% CI, 0.11-0.13), 0.95 (95% CI, 0.94-0.95), 0.58 (95% CI, 0.54-0.61), and 0.64 (95% CI, 0.63-0.65), respectively. The OR increased above 1.0 at a 24-hour lorazepam milligram equivalent dose cutoff of 15 mg. The sensitivity, specificity, PPV, and NPV at this threshold were 0.16 (95% CI, 0.14-0.17), 0.96 (95% CI, 0.95-0.96), 0.68 (95% CI, 0.65-0.72), and 0.65 (95% CI, 0.64-0.66), respectively. CONCLUSIONS: Neither CIWA-Ar scores nor medication dose cutoff points were effective measures for identifying patients with alcohol withdrawal who received high-intensity care. Research studies for examining outcomes in patients who deteriorate with AWS will require better methods for cohort identification.
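The sensitivity, specificity, PPV, and NPV reported at each cutoff follow directly from a confusion matrix built with a "score at or above cutoff" positive call. A sketch with invented CIWA-Ar scores and outcomes (not the study cohort):

```python
def threshold_metrics(scores, outcomes, cutoff):
    """Confusion-matrix metrics for a 'score >= cutoff' positive call."""
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and not y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# invented CIWA-Ar scores and high-intensity-care outcomes, cutoff of 24
m = threshold_metrics([30, 10, 25, 5, 20], [1, 0, 0, 0, 1], 24)
```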

11.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370788

ABSTRACT

OBJECTIVE: Timely intervention for clinically deteriorating ward patients requires that care teams accurately diagnose and treat their underlying medical conditions. However, the most common diagnoses leading to deterioration and the relevant therapies provided are poorly characterized. Therefore, we aimed to determine the diagnoses responsible for clinical deterioration, the relevant diagnostic tests ordered, and the treatments administered among high-risk ward patients using manual chart review. DESIGN: Multicenter retrospective observational study. SETTING: Inpatient medical-surgical wards at four health systems from 2006 to 2020. PATIENTS: Randomly selected patients (1,000 from each health system) with clinical deterioration, defined by reaching the 95th percentile of a validated early warning score, electronic Cardiac Arrest Risk Triage (eCART), were included. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Each patient's clinical deterioration was confirmed by a trained reviewer or marked as a false alarm if no deterioration occurred. For true deterioration events, the condition causing deterioration, relevant diagnostic tests ordered, and treatments provided were collected. Of the 4,000 included patients, 2,484 (62%) had clinical deterioration confirmed by chart review. Sepsis was the most common cause of deterioration (41%; n=1,021), followed by arrhythmia (19%; n=473), while liver failure had the highest in-hospital mortality (41%). The most common diagnostic tests ordered were complete blood counts (47% of events), followed by chest x-rays (42%) and cultures (40%), while the most common medication orders were antimicrobials (46%), followed by fluid boluses (34%) and antiarrhythmics (19%). CONCLUSIONS: We found that sepsis was the most common cause of deterioration, while liver failure had the highest mortality.
Complete blood counts and chest x-rays were the most common diagnostic tests ordered, and antimicrobials and fluid boluses were the most common medication interventions. These results provide important insights for clinical decision-making at the bedside, training of rapid response teams, and the development of institutional treatment pathways for clinical deterioration. KEY POINTS: Question: What are the most common diagnoses, diagnostic test orders, and treatments for ward patients experiencing clinical deterioration? Findings: In manual chart review of 2,484 encounters with deterioration across four health systems, we found that sepsis was the most common cause of clinical deterioration, followed by arrhythmias, while liver failure had the highest mortality. Complete blood counts and chest x-rays were the most common diagnostic test orders, while antimicrobials and fluid boluses were the most common treatments. Meaning: Our results provide new insights into clinical deterioration events, which can inform institutional treatment pathways, rapid response team training, and patient care.

12.
Addiction ; 119(4): 766-771, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38011858

ABSTRACT

BACKGROUND AND AIMS: Accurate case discovery is critical for disease surveillance, resource allocation and research. International Classification of Disease (ICD) diagnosis codes are commonly used for this purpose. We aimed to determine the sensitivity, specificity and positive predictive value (PPV) of ICD-10 codes for opioid misuse case discovery in the emergency department (ED) setting. DESIGN AND SETTING: Retrospective cohort study of ED encounters from January 2018 to December 2020 at an urban academic hospital in the United States. A sample of ED encounters enriched for opioid misuse was developed by oversampling ED encounters with positive urine opiate screens or pre-existing opioid-related diagnosis codes in addition to other opioid misuse risk factors. CASES: A total of 1200 randomly selected encounters were annotated by research staff for the presence of opioid misuse within health record documentation using a 5-point scale for likelihood of opioid misuse and dichotomized into cohorts of opioid misuse and no opioid misuse. MEASUREMENTS: Using manual annotation as ground truth, the sensitivity and specificity of ICD-10 codes entered during the encounter were determined with PPV adjusted for oversampled data. Metrics were also determined by disposition subgroup: discharged home or admitted. FINDINGS: There were 541 encounters annotated as opioid misuse and 617 with no opioid misuse. The majority were males (54.4%), average age was 47 years and 68.5% were discharged directly from the ED. The sensitivity of ICD-10 codes was 0.56 (95% confidence interval [CI], 0.51-0.60), specificity 0.99 (95% CI, 0.97-0.99) and adjusted PPV 0.78 (95% CI, 0.65-0.92). The sensitivity was higher for patients discharged from the ED (0.65; 95% CI, 0.60-0.69) than those admitted (0.31; 95% CI, 0.24-0.39). 
CONCLUSIONS: International Classification of Disease-10 codes appear to have low sensitivity but high specificity and positive predictive value in detecting opioid misuse among emergency department patients in the United States.
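Because this cohort was deliberately enriched for opioid misuse, the raw PPV would be inflated; one standard way to adjust it is to recompute PPV via Bayes' rule from sensitivity, specificity, and the true population prevalence. A sketch of that adjustment (one possible method, not necessarily the study's exact procedure; the 5% prevalence is invented):

```python
def adjusted_ppv(sensitivity, specificity, prevalence):
    """PPV recomputed via Bayes' rule at the true population prevalence,
    correcting for an enriched/oversampled validation set."""
    tp = sensitivity * prevalence              # expected true positives per case
    fp = (1 - specificity) * (1 - prevalence)  # expected false positives
    return tp / (tp + fp)

# reported sensitivity/specificity with an invented 5% population prevalence
ppv = adjusted_ppv(0.56, 0.99, 0.05)
```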


Subject(s)
International Classification of Diseases , Opioid-Related Disorders , Male , Humans , United States/epidemiology , Middle Aged , Female , Retrospective Studies , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Predictive Value of Tests , Emergency Service, Hospital
13.
Ann Surg Oncol ; 31(1): 488-498, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37782415

ABSTRACT

BACKGROUND: While lower socioeconomic status has been shown to correlate with worse outcomes in cancer care, data correlating neighborhood-level metrics with outcomes are scarce. We aim to explore the association between neighborhood disadvantage and both short- and long-term postoperative outcomes in patients undergoing pancreatectomy for pancreatic ductal adenocarcinoma (PDAC). PATIENTS AND METHODS: We retrospectively analyzed 243 patients who underwent resection for PDAC at a single institution between 1 January 2010 and 15 September 2021. To measure neighborhood disadvantage, the cohort was divided into tertiles by Area Deprivation Index (ADI). Short-term outcomes of interest were minor complications, major complications, unplanned readmission within 30 days, prolonged hospitalization, and delayed gastric emptying (DGE). The long-term outcome of interest was overall survival. Logistic regression was used to test short-term outcomes; Cox proportional hazards models and Kaplan-Meier method were used for long-term outcomes. RESULTS: The median ADI of the cohort was 49 (IQR 32-64.5). On adjusted analysis, the high-ADI group demonstrated greater odds of suffering a major complication (odds ratio [OR], 2.78; 95% confidence interval [CI], 1.26-6.40; p = 0.01) and of an unplanned readmission (OR, 3.09; 95% CI, 1.16-9.28; p = 0.03) compared with the low-ADI group. There were no significant differences between groups in the odds of minor complications, prolonged hospitalization, or DGE (all p > 0.05). High ADI did not confer an increased hazard of death (p = 0.63). CONCLUSIONS: We found that worse neighborhood disadvantage is associated with a higher risk of major complication and unplanned readmission after pancreatectomy for PDAC.
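Splitting a cohort into ADI tertiles is a rank-based partition; a sketch with invented ADI values (cutpoint and tie-handling conventions vary between studies):

```python
def adi_tertiles(adi_values):
    """Label each patient low/middle/high neighborhood disadvantage by
    ADI tertile cutpoints (one common convention)."""
    s = sorted(adi_values)
    n = len(s)
    t1, t2 = s[n // 3], s[(2 * n) // 3]
    return ["low" if v <= t1 else "high" if v > t2 else "middle"
            for v in adi_values]

groups = adi_tertiles([1, 2, 3, 4, 5, 6, 7, 8, 9])  # invented ADI values
```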


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Pancreatectomy/adverse effects , Pancreatectomy/methods , Retrospective Studies , Pancreatic Neoplasms/pathology , Carcinoma, Pancreatic Ductal/pathology , Neighborhood Characteristics
14.
JAMIA Open ; 6(4): ooad109, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38144168

ABSTRACT

Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to the Loyola University Medical Center (2009-2017) were used for model development and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes was converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease: Improving Global Outcomes (KDIGO) stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting machines (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data and multimodal models combining structured data with CUI features. Model comparison was performed using the area under the receiver operating characteristic curve (AUROC), with DeLong's test for statistical differences. Results: The study cohort included 138 389 adult patient admissions (mean [SD] age 58 [16] years; 11 506 [8%] African-American; and 70 826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination performance (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodality approach with structured data and TF-IDF weighting of CUIs increased model performance over structured data-only models.
Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI.
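The TF-IDF parameterization of CUIs weights each concept by its within-note frequency discounted by how many notes it appears in. A minimal textbook sketch over notes represented as lists of CUI codes (the codes and notes below are illustrative, not study data):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors over documents represented as lists of CUI codes.
    Plain log(N/df) IDF: a concept present in every note gets weight 0."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {cui: (tf[cui] / len(doc)) * math.log(n / df[cui]) for cui in tf}
        )
    return weighted

# two toy "notes" as CUI lists (example codes)
vecs = tfidf([["C0022660", "C0042029"], ["C0042029"]])
```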

15.
JAMIA Open ; 6(4): ooad092, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37942470

ABSTRACT

Objectives: Substance misuse is a complex and heterogeneous set of conditions associated with high mortality and regional/demographic variations. Existing data systems are siloed and have been ineffective in curtailing the substance misuse epidemic. Therefore, we aimed to build a novel informatics platform, the Substance Misuse Data Commons (SMDC), by integrating multiple data modalities to provide a unified record of information crucial to improving outcomes in substance misuse patients. Materials and Methods: The SMDC was created by linking electronic health record (EHR) data from adult cases of substance (alcohol, opioid, nonopioid drug) misuse at the University of Wisconsin hospitals to socioeconomic and state agency data. To ensure private and secure data exchange, Privacy-Preserving Record Linkage (PPRL) and Honest Broker services were utilized. The overlap in mortality reporting among the EHR, state Vital Statistics, and a commercial national data source was assessed. Results: The SMDC included data from 36 522 patients experiencing 62 594 healthcare encounters. Over half of patients were linked to the statewide ambulance database and prescription drug monitoring program. Chronic diseases accounted for most underlying causes of death, while drug-related overdoses constituted 8%. Our analysis of mortality revealed a 49.1% overlap across the 3 data sources. Nonoverlapping deaths were associated with poor socioeconomic indicators. Discussion: Through PPRL, the SMDC enabled the longitudinal integration of multimodal data. Combining death data from local, state, and national sources enhanced mortality tracking and exposed disparities. Conclusion: The SMDC provides a comprehensive resource for clinical providers and policymakers to inform interventions targeting substance misuse-related hospitalizations, overdoses, and death.
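The core idea of privacy-preserving record linkage (PPRL) is that each site derives the same opaque match key from normalized identifiers and exchanges only those keys, never raw identifiers. A highly simplified salted-hash sketch (real PPRL systems such as the one described here typically use Bloom-filter encodings and an honest broker; the salt and field choices below are assumptions):

```python
import hashlib

def match_key(name, dob, salt="shared-secret-salt"):  # illustrative salt
    """Normalize identifiers identically at both sites, then hash with a
    shared salt so only digests are compared across institutions."""
    normalized = f"{name.strip().lower()}|{dob}"
    return hashlib.sha256((salt + normalized).encode()).hexdigest()
```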

16.
Proc Conf Assoc Comput Linguist Meet ; 2023: 125-130, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37786810

ABSTRACT

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs out-perform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.
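One way such a "simple ensemble" of a supervised model and a zero-shot LLM could be wired (illustrative only; the paper does not spell out its exact rule here) is confidence-gated fallback: keep the supervised label when the model is confident, otherwise defer to the LLM.

```python
def ensemble_label(supervised_label, supervised_conf, llm_label, cutoff=0.9):
    """Confidence-gated ensemble for section classification: trust the
    supervised (e.g., BERT-based) prediction when confident, else fall
    back to the zero-shot LLM label. The cutoff is an assumption."""
    return supervised_label if supervised_conf >= cutoff else llm_label
```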

17.
J Am Med Inform Assoc ; 31(1): 89-97, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37725927

ABSTRACT

OBJECTIVE: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance over a varying number of annotated samples. Finally, we quantified the impact of continued pretraining as an equivalent number of in-domain annotated samples. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performed in-domain, section classification remains considerably difficult when performed cross-domain, even with highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.


Subject(s)
Health Facilities , Information Storage and Retrieval , Natural Language Processing , Neural Networks, Computer , Sample Size
18.
Proc Conf Assoc Comput Linguist Meet ; 2023: 461-467, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37583489

ABSTRACT

The BioNLP Workshop 2023 initiated the launch of a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses would augment the healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants was to develop models that generate a list of diagnoses and problems using input from the daily care notes collected during the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, the techniques and results of the evaluation of the different approaches tried by the participating teams are summarized.

19.
Adv Exp Med Biol ; 1426: 395-412, 2023.
Article in English | MEDLINE | ID: mdl-37464130

ABSTRACT

Severe asthma is a spectrum disorder with numerous subsets, many of which are defined by clinical history and a general predisposition for T2 inflammation. Most of the approved therapies for severe asthma have required clinical trial designs with population enrichment for exacerbation frequency and/or elevation of blood eosinophils. Moving beyond this framework will require trial designs that increase efficiency for studying nondominant subsets and continue to improve upon biomarker signatures. In addition to reviewing the current literature on biomarker-informed trials for severe asthma, this chapter will also review the advantages of master protocols and adaptive design methods for establishing the efficacy of new interventions in prospectively defined subsets of patients. The incorporation of methods that allow for data collection outside of traditional study visits at academic centers, called remote decentralized trial design, is a growing trend that may increase diversity in study participation and allow for enhanced resiliency during the COVID-19 pandemic. Finally, reaching the goals of precision medicine in asthma will require increased emphasis on effectiveness studies. Recent advances in real-world data utilization from electronic health records are also discussed with a view toward pragmatic trial designs that could also incorporate the evaluation of biomarker signatures.


Subject(s)
Asthma , COVID-19 , Precision Medicine , Humans , Asthma/diagnosis , Asthma/therapy , Biomarkers , Clinical Trials as Topic , COVID-19/therapy , Pandemics
20.
Proc Conf Assoc Comput Linguist Meet ; 2023(ClinicalNLP): 78-85, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37492270

ABSTRACT

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework, comprised of six tasks representing key components in clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models as well as multi-task versus single task training with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically-trained language model outperforms its general domain counterpart by a large margin, establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
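The ROUGE-L score used to report the state of the art here is an F-measure over the longest common subsequence (LCS) of candidate and reference token sequences. A compact sketch with whitespace tokenization (toy strings, not DR.BENCH data):

```python
def rouge_l_f(candidate, reference):
    """ROUGE-L F1: LCS-based precision/recall over word tokens."""
    c, r = candidate.split(), reference.split()
    # dynamic-programming LCS table
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c, 1):
        for j, rt in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ct == rt else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```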
