Question-answering system extracts information on injection drug use from clinical notes.

Mahbub, Maria; Goethert, Ian; Danciu, Ioana; Knight, Kathryn; Srinivasan, Sudarshan; Tamang, Suzanne; Rozenberg-Ben-Dror, Karine; Solares, Hugo; Martins, Susana; Trafton, Jodie; Begoli, Edmon; Peterson, Gregory D.

Commun Med (Lond) ; 4(1): 61, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570620

ABSTRACT

BACKGROUND: Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because no structured data, such as International Classification of Diseases (ICD) codes, are available for it, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, no validated tools exist for this task. METHODS: To address this gap in clinical information, we design a question-answering (QA) framework to extract information on IDU from clinical notes for use in clinical operations. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We use 2323 clinical notes of 1145 patients curated from the US Department of Veterans Affairs (VA) Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information from temporally out-of-distribution data. RESULTS: Here, we show that for a strict match between gold-standard and predicted answers, the QA model achieves a 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains a 78.03% F1 score, along with 85.38% precision and 79.02% recall. Moreover, the QA model performs consistently on temporally out-of-distribution data. CONCLUSIONS: Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to improve the accurate and efficient detection of people who inject drugs and the extraction of relevant information, and ultimately to facilitate informed patient care.
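The abstract reports F1 under a "strict" and a "relaxed" match but does not define them here. A minimal sketch, assuming the common SQuAD-style convention: strict = exact string match after light normalization, relaxed = token-overlap F1 that gives partial credit. The function names and the example answer are hypothetical, and the paper's own matching rules may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def strict_match(pred: str, gold: str) -> float:
    """1.0 only if the normalized prediction equals the normalized gold answer."""
    return float(normalize(pred) == normalize(gold))


def relaxed_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 (assumed meaning of 'relaxed'): partially correct spans get partial credit."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as agreement; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold answer and model prediction for an IDU question.
gold = "injects heroin daily"
pred = "heroin daily"
print(strict_match(pred, gold))  # 0.0
print(relaxed_f1(pred, gold))
```

Here the prediction misses one gold token, so it scores zero under the strict rule but about 0.8 under the relaxed one, which illustrates why the two F1 figures in the abstract can differ so much.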

There are many health risks associated with injection drug use (IDU). Identifying people who inject drugs early can reduce the likelihood of these harms. However, extracting information about possible IDU from a person's electronic health records can be difficult, because it is usually recorded in free-text clinical notes rather than in a dedicated section of the record or as numerical data. Manually extracting information from these notes is time-consuming and inefficient. We trained computer software to extract IDU details from these notes. Healthcare providers could potentially use this approach to identify people who inject drugs more efficiently and accurately, and therefore provide better advice and medical care.

BioADAPT-MRC: adversarial learning-based domain adaptation improves biomedical machine reading comprehension task.

Mahbub, Maria; Srinivasan, Sudarshan; Begoli, Edmon; Peterson, Gregory D.

Bioinformatics ; 38(18): 4369-4379, 2022 09 15.

Article in English | MEDLINE | ID: mdl-35876792

ABSTRACT

MOTIVATION: Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is that annotation requires domain knowledge, which leads to a scarcity of labeled data and the need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, the marginal distributions of the general-purpose and biomedical domains differ because the two cover different topics. Therefore, directly transferring learned representations from a model trained on a general-purpose domain to the biomedical domain can hurt the model's performance. RESULTS: We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method that addresses the discrepancies in the marginal distributions between the general and biomedical domain datasets. BioADAPT-MRC relaxes the need to generate pseudo labels for training a well-performing biomedical-MRC model. We extensively evaluate BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets: BioASQ-7b, BioASQ-8b and BioASQ-9b. Our results suggest that, without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets. AVAILABILITY AND IMPLEMENTATION: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
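To make "adversarial learning-based domain adaptation" concrete, here is a toy sketch of one generic mechanism, the gradient reversal layer from domain-adversarial neural networks (DANN). This abstract does not say whether BioADAPT-MRC uses gradient reversal specifically, so treat this as an assumed, swapped-in illustration rather than the authors' method. The 1-D feature extractor f = w*x, the logistic domain discriminator, and all names and numbers are hypothetical, and the MRC task loss that the feature extractor would also minimize is omitted.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w: float, v: float, b: float, xs: List[float], domains: List[int]) -> float:
    """Mean binary cross-entropy of a discriminator predicting the domain
    (0 = general-purpose source, 1 = biomedical target) from the feature f = w * x."""
    total = 0.0
    for x, d in zip(xs, domains):
        p = sigmoid(v * (w * x) + b)
        total -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return total / len(xs)


def domain_grads(w: float, v: float, b: float, xs: List[float], domains: List[int]) -> Tuple[float, float, float]:
    """Gradients of domain_loss w.r.t. w (feature extractor) and v, b (discriminator)."""
    gw = gv = gb = 0.0
    n = len(xs)
    for x, d in zip(xs, domains):
        f = w * x
        err = sigmoid(v * f + b) - d  # d(loss)/d(logit) for sigmoid + cross-entropy
        gv += err * f / n
        gb += err / n
        gw += err * v * x / n
    return gw, gv, gb


def adversarial_step(w, v, b, xs, domains, lr=0.1, lam=1.0):
    """One simultaneous update: the discriminator descends the domain loss, while the
    gradient reaching the feature extractor is multiplied by -lam (gradient reversal),
    so the extractor ascends it and makes source and target features harder to tell apart."""
    gw, gv, gb = domain_grads(w, v, b, xs, domains)
    new_v = v - lr * gv
    new_b = b - lr * gb
    new_w = w - lr * (-lam * gw)  # reversed gradient
    return new_w, new_v, new_b


# Hypothetical toy data: two "source" and two "target" inputs.
xs = [1.0, 2.0, -1.0, -2.0]
domains = [0, 0, 1, 1]
w, v, b = 0.5, 1.0, 0.0
new_w, new_v, new_b = adversarial_step(w, v, b, xs, domains)
print(domain_loss(w, v, b, xs, domains), domain_loss(new_w, new_v, new_b, xs, domains))
```

The design point is that a single backward pass serves two opposing objectives: the discriminator gets better at separating domains, while the shared features are pushed toward a representation in which the domains look alike.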

Subject(s)

Comprehension , Neural Networks, Computer , Humans , Benchmarking

Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients.

Mahbub, Maria; Srinivasan, Sudarshan; Danciu, Ioana; Peluso, Alina; Begoli, Edmon; Tamang, Suzanne; Peterson, Gregory D.

PLoS One ; 17(1): e0262182, 2022.

Article in English | MEDLINE | ID: mdl-34990485

ABSTRACT

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and using resources efficiently. The accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short-, mid-, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC III database, natural language processing (NLP), and machine learning (ML) models. Based on descriptive statistics of the patients' length of stay, we define short-term as the 48-hour and 4-day periods, mid-term as the 7-day and 10-day periods, and long-term as the 15-day and 30-day periods after admission. We found that, using only clinical notes from the first 24 hours after admission, our framework achieves high area under the receiver operating characteristic curve (AU-ROC) scores for short-, mid-, and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day mortality prediction, respectively. We also compare three types of NLP feature extraction techniques: frequency-based, fixed embedding-based, and dynamic embedding-based. Lastly, we interpret the NLP-based predictive models using feature-importance scores.
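The six prediction horizons and the AU-ROC metric can be sketched as follows. This assumes, hypothetically, that a patient is labeled positive for a horizon if death occurs within that many hours of admission; the paper's exact cohort and labeling rules may differ. AU-ROC is computed through its rank interpretation: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as half. The patients and scores below are invented.

```python
from typing import Optional, Sequence

# Horizons from the abstract, expressed in hours after admission.
HORIZONS_HOURS = {"48h": 48, "4d": 96, "7d": 168, "10d": 240, "15d": 360, "30d": 720}


def mortality_label(hours_to_death: Optional[float], horizon_hours: float) -> int:
    """1 if the patient died within horizon_hours of admission, else 0 (assumed labeling rule)."""
    return int(hours_to_death is not None and hours_to_death <= horizon_hours)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AU-ROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical patients: hours from admission to death (None = survived past 30 days),
# and a model's predicted risk computed from notes in the first 24 hours.
hours_to_death = [30.0, None, 120.0, 500.0, None]
scores = [0.9, 0.5, 0.6, 0.4, 0.3]
for name, hours in HORIZONS_HOURS.items():
    labels = [mortality_label(h, hours) for h in hours_to_death]
    print(name, labels, auroc(scores, labels))
```

The pairwise loop is O(P·N) and is meant for clarity; a rank-sum (Mann-Whitney) formulation gives the same value in O(n log n) for larger test sets.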

Subject(s)

Hospital Mortality , Machine Learning , Area Under Curve , Databases, Factual , Electronic Health Records , Humans , Intensive Care Units , Length of Stay , Logistic Models , ROC Curve

Dynamics of domain coverage of the protein sequence universe.

Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B.

BMC Genomics ; 13: 634, 2012 Nov 16.

Article in English | MEDLINE | ID: mdl-23157439

ABSTRACT

BACKGROUND: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". RESULTS: Here we suggest that the true size of "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. CONCLUSIONS: Recent improvements in computational domain modeling result in a slow decrease in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.
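The contrast between counting unassigned sequences and counting unassigned residues, and the idea of subtracting regions unlikely to contain a domain, can be sketched with simple interval arithmetic. The abstract does not specify which regions are treated as "not likely to contain any domain", so the `excluded` intervals, the 1-based inclusive coordinate convention, and the example proteins are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Residue ranges are 1-based and inclusive (a hypothetical convention for this sketch).
Interval = Tuple[int, int]


def positions(intervals: Iterable[Interval], length: int) -> Set[int]:
    """Residue positions covered by the intervals, clipped to the sequence length."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(max(start, 1), min(end, length) + 1))
    return covered


def dark_matter_summary(
    proteins: List[Tuple[int, List[Interval], List[Interval]]]
) -> Dict[str, float]:
    """Each protein is (length, domain hits, regions assumed unlikely to hold a domain).

    Reports the sequence-level count (proteins with no domain hit) next to the
    residue-level count (candidate residues outside every domain hit), where
    candidate residues exclude the 'unlikely' regions."""
    dark_sequences = 0
    dark_residues = 0
    candidate_residues = 0
    for length, domains, excluded in proteins:
        everything = set(range(1, length + 1))
        in_domain = positions(domains, length)
        # A residue inside a domain hit is never subtracted, even if it also lies in an excluded region.
        removed = positions(excluded, length) - in_domain
        candidates = everything - removed
        dark_residues += len(candidates - in_domain)
        candidate_residues += len(candidates)
        if not in_domain:
            dark_sequences += 1
    fraction = dark_residues / candidate_residues if candidate_residues else 0.0
    return {
        "dark_sequences": dark_sequences,            # absolute, sequence level
        "dark_residues": dark_residues,              # absolute, residue level
        "dark_residue_fraction": fraction,           # relative, residue level
    }


# Hypothetical proteins: one with a single domain hit, one with none.
proteins = [
    (100, [(1, 60)], [(61, 70)]),
    (50, [], [(1, 10)]),
]
print(dark_matter_summary(proteins))
```

In this toy input only one of two sequences is "dark" at the sequence level, yet 30 unassigned residues sit inside the protein that does have a domain hit, which is the kind of hidden dark matter a residue-level view exposes.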

Subject(s)

Computational Biology/methods , Databases, Protein , Proteins/chemistry , Humans , Protein Structure, Tertiary , Proteins/metabolism

The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior.

McCollum, James M; Peterson, Gregory D; Cox, Chris D; Simpson, Michael L; Samatova, Nagiza F.

Comput Biol Chem ; 30(1): 39-49, 2006 Feb.

Article in English | MEDLINE | ID: mdl-16321569

ABSTRACT

A key to advancing the understanding of molecular biology in the post-genomic age is the development of accurate predictive models for genetic regulation, protein interaction, metabolism, and other biochemical processes. To facilitate model development, simulation algorithms must provide an accurate representation of the system while performing the simulation in a reasonable amount of time. Gillespie's stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous models with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, we examine the performance of different versions of the SSA when applied to several biochemical models. Our analysis shows that transient changes in reaction execution frequencies, which are typical of biochemical models with gene induction and repression, can dramatically affect simulator performance. To account for these shifts, we propose a new algorithm, the sorting direct method, which maintains a loosely sorted order of the reactions as the simulation executes. Our measurements show that the sorting direct method compares favorably with other well-known exact stochastic simulation algorithms.
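A minimal sketch of the idea: Gillespie's direct method picks the next reaction by a linear search over cumulative propensities, so reactions that fire often should sit near the front of the search list. This sketch assumes the loose sorting is done by swapping each fired reaction with its immediate predecessor in the search order; the reaction format, function name, and toy model are hypothetical. For brevity it recomputes every propensity at each step, whereas practical simulators update only the propensities affected by the last reaction.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A reaction is (propensity as a function of the state, {species index: change in count}).
Reaction = Tuple[Callable[[Sequence[int]], float], Dict[int, int]]


def sorting_direct_method(
    x0: Sequence[int], reactions: List[Reaction], t_end: float, seed: int = 0
) -> Tuple[float, List[int], List[int], List[int]]:
    """Exact SSA (direct method) whose reaction search order is loosely sorted:
    after a reaction fires, it swaps places with its predecessor in the search list."""
    rng = random.Random(seed)
    x = list(x0)
    t = 0.0
    order = list(range(len(reactions)))  # current linear-search order
    fired = [0] * len(reactions)
    while True:
        props = [reactions[j][0](x) for j in order]
        a0 = sum(props)
        if a0 <= 0.0:
            break  # no reaction can fire
        tau = -math.log(1.0 - rng.random()) / a0  # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        # Linear search: first reaction whose cumulative propensity exceeds r * a0.
        target = rng.random() * a0
        pos = max(i for i, a in enumerate(props) if a > 0.0)  # fallback for round-off
        cumulative = 0.0
        for i, a in enumerate(props):
            cumulative += a
            if cumulative > target:
                pos = i
                break
        j = order[pos]
        for species, change in reactions[j][1].items():
            x[species] += change
        fired[j] += 1
        # Sorting step: bubble the fired reaction one slot toward the front.
        if pos > 0:
            order[pos - 1], order[pos] = order[pos], order[pos - 1]
    return t, x, order, fired


# Hypothetical toy model: three constant-rate production reactions, the last one fast.
reactions = [
    (lambda x: 1.0, {0: +1}),
    (lambda x: 1.0, {1: +1}),
    (lambda x: 100.0, {2: +1}),
]
t, x, order, fired = sorting_direct_method([0, 0, 0], reactions, t_end=10.0, seed=1)
print(order, fired)
```

Because the fast reaction starts at the back of the list, the first few firings carry it to the front, after which most searches stop at the first comparison. When the frequencies shift, for example after gene induction, the same swapping rule re-sorts the list without any explicit resorting pass.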

Subject(s)

Models, Chemical , Stochastic Processes , Systems Biology/methods , Algorithms , Aliivibrio fischeri/chemistry , Escherichia coli/chemistry

Analysis of noise in quorum sensing.

Cox, Chris D; Peterson, Gregory D; Allen, Michael S; Lancaster, Joseph M; McCollum, James M; Austin, Derek; Yan, Ling; Sayler, Gary S; Simpson, Michael L.

OMICS ; 7(3): 317-34, 2003.

Article in English | MEDLINE | ID: mdl-14583119

ABSTRACT

Noise may play a pivotal role in gene circuit functionality, as demonstrated for the genetic switch in bacteriophage lambda. Like the lambda switch, bacterial quorum sensing (QS) systems operate within a population and contain a bistable switching element, making it likely that noise plays a functional role in QS circuit operation. Therefore, a detailed analysis of the noise behavior of QS systems is needed. We have developed a set of tools generally applicable to the analysis of gene circuits, with an emphasis on investigations in the frequency domain (FD), which we apply here to the QS system in the marine bacterium Vibrio fischeri. We demonstrate that tightly coupling exact stochastic simulation with FD analysis provides insights into the structure/function relationships in the QS circuit. Furthermore, we argue that a noise analysis is incomplete without considering the power spectral densities (PSDs) of the important molecular output signals. As an example, we consider reversible reactions in the QS circuit and show, through analysis and exact stochastic simulation, that these reactions significantly and dynamically modify the noise spectra. In particular, we demonstrate a "whitening" effect, which occurs as the noise is processed through these reversible reactions.
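The abstract does not describe the authors' frequency-domain tools in detail. As a generic illustration of what a PSD of a molecular output signal means, here is a one-sided periodogram estimate. It assumes the signal (for example, a molecule count from an exact stochastic simulation) has already been resampled onto a uniform time grid, which event-driven SSA trajectories are not by default. The function name and the test signal are hypothetical; a "whitening" effect would appear as a flatter PSD across frequencies.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def periodogram(samples: Sequence[float], dt: float) -> Tuple[List[float], List[float]]:
    """One-sided periodogram estimate of the PSD of a uniformly sampled signal.

    The mean is removed first, so the estimate describes fluctuations (noise) about
    the average level. Returns (frequencies, psd) for bins 0 .. n // 2.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    freqs: List[float] = []
    psd: List[float] = []
    for k in range(n // 2 + 1):
        # Direct O(n^2) DFT for clarity; use an FFT for long trajectories.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * m / n) for m, c in enumerate(centered))
        power = abs(coeff) ** 2 * dt / n
        if 0 < k < n / 2:
            power *= 2.0  # fold the matching negative frequency into the one-sided estimate
        freqs.append(k / (n * dt))
        psd.append(power)
    return freqs, psd


# Hypothetical test signal: 64 samples of a 1.25 Hz sinusoid taken every 0.1 s.
dt = 0.1
signal = [math.sin(2 * math.pi * 1.25 * m * dt) for m in range(64)]
freqs, psd = periodogram(signal, dt)
peak = freqs[psd.index(max(psd))]
print(peak)
```

With this normalization, integrating the PSD over frequency (summing and multiplying by the bin width 1/(n·dt)) recovers the variance of the signal, so the spectrum can be read as a breakdown of the noise power by frequency.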

Subject(s)

Gene Expression Regulation, Bacterial/genetics , Models, Genetic , Calibration , Computer Simulation , Electronics/instrumentation , Feedback , Kinetics , Operon , Stochastic Processes , Transcription, Genetic , Vibrio/genetics
