
Question-answering system extracts information on injection drug use from clinical notes.

Mahbub, Maria; Goethert, Ian; Danciu, Ioana; Knight, Kathryn; Srinivasan, Sudarshan; Tamang, Suzanne; Rozenberg-Ben-Dror, Karine; Solares, Hugo; Martins, Susana; Trafton, Jodie; Begoli, Edmon; Peterson, Gregory D.

Commun Med (Lond) ; 4(1): 61, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570620

ABSTRACT

BACKGROUND: Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because no structured data, such as International Classification of Diseases (ICD) codes, are available for it, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools.

METHODS: To address this gap in clinical information, we design a question-answering (QA) framework to extract information on IDU from clinical notes for use in clinical operations. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We use 2323 clinical notes of 1145 patients curated from the US Department of Veterans Affairs (VA) Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information from temporally out-of-distribution data.

RESULTS: Here, we show that for a strict match between gold-standard and predicted answers, the QA model achieves a 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains a 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data.

CONCLUSIONS: Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.

There are many health risks associated with injection drug use (IDU). Identifying people who inject drugs early can reduce the likelihood of these issues arising. However, extracting information about any possible IDU from a person's electronic health records can be difficult because the information is often in text-based general clinical notes rather than provided in a particular section of the record or as numerical data. Manually extracting information from these notes is time-consuming and inefficient. We used a computational method to train computer software to be able to extract IDU details. Potentially, this approach could be used by healthcare providers to more efficiently and accurately identify people who inject drugs, and therefore provide better advice and medical care.
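The abstract reports separate scores for a strict and a relaxed match between gold-standard and predicted answers but does not define either criterion. The sketch below shows one common way such scores are computed for extractive QA, assuming (this is an assumption, not the authors' stated method) that a strict match means identical normalized answers and a relaxed match means token overlap. The function names and the example answers are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def strict_match(gold: str, pred: str) -> float:
    """Return 1.0 only when the normalized answers are identical (assumed definition)."""
    return float(normalize(gold) == normalize(pred))


def relaxed_match(gold: str, pred: str) -> tuple[float, float, float]:
    """Return token-overlap (precision, recall, F1) for one answer pair (assumed definition)."""
    gold_tokens, pred_tokens = normalize(gold), normalize(pred)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical answer pair: the prediction is a correct but shorter span.
gold_answer = "heroin injection last week"
predicted_answer = "heroin injection"
print(strict_match(gold_answer, predicted_answer))
print(relaxed_match(gold_answer, predicted_answer))
```

Under these assumed definitions, a prediction that captures only part of the gold-standard span scores zero on the strict criterion but earns partial credit on the relaxed one, which is one plausible reason for a gap like the one between the two reported F1 scores.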

BioADAPT-MRC: adversarial learning-based domain adaptation improves biomedical machine reading comprehension task.

Mahbub, Maria; Srinivasan, Sudarshan; Begoli, Edmon; Peterson, Gregory D.

Bioinformatics ; 38(18): 4369-4379, 2022 Sep 15.

Article in English | MEDLINE | ID: mdl-35876792

ABSTRACT

MOTIVATION: Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is the requirement for domain knowledge, which leads to a scarcity of labeled data and the need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, there is a discrepancy in marginal distributions between the general-purpose and biomedical domains due to differences in topics. Therefore, directly transferring learned representations from a model trained on a general-purpose domain to the biomedical domain can hurt the model's performance.

RESULTS: We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets. BioADAPT-MRC relaxes the need for generating pseudo labels for training a well-performing biomedical-MRC model. We extensively evaluate the performance of BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets: BioASQ-7b, BioASQ-8b and BioASQ-9b. Our results suggest that without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets.

AVAILABILITY AND IMPLEMENTATION: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Comprehension; Neural Networks, Computer; Humans; Benchmarking
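The BioADAPT-MRC abstract attributes the transfer problem to a discrepancy in marginal distributions between general-purpose and biomedical text. The sketch below is not the BioADAPT-MRC method (which is adversarial and operates on learned representations); it is only a rough diagnostic, assuming that the gap can be illustrated at the surface level by comparing unigram token distributions with the Jensen-Shannon divergence. The function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(documents: list[str]) -> dict[str, float]:
    """Estimate a unigram probability distribution from whitespace tokens."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    mixture = {tok: 0.5 * (p.get(tok, 0.0) + q.get(tok, 0.0)) for tok in vocab}

    def kl_to_mixture(dist: dict[str, float]) -> float:
        return sum(prob * math.log2(prob / mixture[tok])
                   for tok, prob in dist.items() if prob > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy, made-up sentences standing in for the source and target domains.
general_domain = ["the team won the game", "the film opened last week"]
biomedical_domain = ["the gene encodes a kinase", "the protein binds the receptor"]

divergence = js_divergence(unigram_distribution(general_domain),
                           unigram_distribution(biomedical_domain))
print(f"JS divergence between domains: {divergence:.3f} bits")
```

A value near 0 would suggest similar vocabularies and a value near 1 almost no overlap. A surface measure like this says nothing about how a trained model represents the two domains, which is the level at which the paper's adaptation operates.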

Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients.

Mahbub, Maria; Srinivasan, Sudarshan; Danciu, Ioana; Peluso, Alina; Begoli, Edmon; Tamang, Suzanne; Peterson, Gregory D.

PLoS One ; 17(1): e0262182, 2022.

Article in English | MEDLINE | ID: mdl-34990485

ABSTRACT

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and efficient utilization of resources. Accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short, mid, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC-III database, natural language processing (NLP), and machine learning (ML) models. Based on the statistical description of the patients' length of stay, we define the short term as the 48-hour and 4-day periods, the mid term as the 7-day and 10-day periods, and the long term as the 15-day and 30-day periods after admission. We found that by using only clinical notes from the first 24 hours of admission, our framework can achieve a high area under the receiver operating characteristic curve (AU-ROC) score for short, mid and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day period mortality prediction, respectively. We also provide a comparative study among three types of feature extraction techniques from NLP: frequency-based technique, fixed embedding-based technique, and dynamic embedding-based technique. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.

Subject(s)

Hospital Mortality; Machine Learning; Area Under Curve; Databases, Factual; Electronic Health Records; Humans; Intensive Care Units; Length of Stay; Logistic Models; ROC Curve
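The abstract above reports its results as AU-ROC scores. As a reminder of what that metric measures, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive case (here, a patient who died within the horizon) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name and the example labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AU-ROC needs at least one positive and one negative label")
    # Each (positive, negative) pair contributes 1 if ranked correctly,
    # 0.5 on a tie, and 0 otherwise.
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = died within the prediction horizon, 0 = survived.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(example_labels, example_scores))
```

The pairwise form is quadratic in the number of patients, so it suits small illustrations; a rank-based computation gives the same value more efficiently on large cohorts.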
