Search | VHL Regional Portal

Word sense disambiguation of acronyms in clinical narratives.

Chopard, Daphné; Corcoran, Padraig; Spasic, Irena.

Front Digit Health ; 6: 1282043, 2024.

Article in English | MEDLINE | ID: mdl-38482049

ABSTRACT

Clinical narratives commonly use acronyms without explicitly defining their long forms. This makes it difficult to automatically interpret their sense as acronyms tend to be highly ambiguous. Supervised learning approaches to their disambiguation in the clinical domain are hindered by issues associated with patient privacy and manual annotation, which limit the size and diversity of training data. In this study, we demonstrate how scientific abstracts can be utilised to overcome these issues by creating a large automatically annotated dataset of artificially simulated global acronyms. A neural network trained on such a dataset achieved an F1-score of 95% on disambiguation of acronym mentions in scientific abstracts. This network was integrated with multi-word term recognition to extract a sense inventory of acronyms from a corpus of clinical narratives on the fly. Acronym sense extraction achieved an F1-score of 74% on a corpus of radiology reports. In clinical practice, the suggested approach can be used to facilitate development of institution-specific inventories.

Simulation and annotation of global acronyms.

Filimonov, Maxim; Chopard, Daphné; Spasic, Irena.

Bioinformatics ; 38(11): 3136-3138, 2022 May 26.

Article in English | MEDLINE | ID: mdl-35482480

ABSTRACT

MOTIVATION: Global acronyms are used in written text without their formal definitions. This makes it difficult to automatically interpret their sense as acronyms tend to be ambiguous. Supervised machine learning approaches to sense disambiguation require large training datasets. In clinical applications, large datasets are difficult to obtain due to patient privacy. Manual data annotation creates an additional bottleneck. RESULTS: We proposed an approach to automatically modifying scientific abstracts to (i) simulate global acronym usage and (ii) annotate their senses without the need for external sources or manual intervention. We implemented it as a web-based application, which can create large datasets that in turn can be used to train supervised approaches to word sense disambiguation of biomedical acronyms. AVAILABILITY AND IMPLEMENTATION: The datasets will be generated on demand based on a user query and will be downloadable from https://datainnovation.cardiff.ac.uk/acronyms/.
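The two steps named in the abstract, (i) simulating global acronym usage and (ii) annotating senses automatically, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `simulate_global_acronyms`, the regular expression, and the initial-letter matching heuristic are all assumptions made for illustration. The sketch finds locally defined acronyms of the form "long form (ACR)", records the long form as the sense, and then rewrites the text so that only the undefined acronym remains.

```python
import re

# Assumed pattern: one to six words followed by an upper-case acronym in parentheses.
DEFINITION = re.compile(r"\b((?:[A-Za-z-]+\s+){1,6})\(([A-Z]{2,6})\)")


def simulate_global_acronyms(text):
    """Return (modified_text, senses) for one abstract.

    senses maps each acronym to the long form found in its local definition.
    Hypothetical helper; the matching heuristic is deliberately simple.
    """
    senses = {}
    for match in DEFINITION.finditer(text):
        words = match.group(1).split()
        acronym = match.group(2)
        # Candidate long form: the last len(acronym) words before the parenthesis.
        candidate = words[-len(acronym):]
        initials_agree = len(candidate) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(candidate, acronym)
        )
        if initials_agree:
            senses[acronym] = " ".join(candidate)

    for acronym, long_form in senses.items():
        # Step 1: drop the explicit definition, keeping only the acronym.
        definition = re.escape(long_form) + r"\s*\(" + re.escape(acronym) + r"\)"
        text = re.sub(definition, acronym, text, flags=re.IGNORECASE)
        # Step 2: replace remaining long-form mentions so the acronym is used "globally".
        text = re.sub(r"\b" + re.escape(long_form) + r"\b", acronym, text, flags=re.IGNORECASE)
    return text, senses


example = (
    "We used word sense disambiguation (WSD) on notes. "
    "Word sense disambiguation is hard, and WSD models need data."
)
modified, senses = simulate_global_acronyms(example)
print(modified)
print(senses)
```

Because the sense is taken from the abstract's own definition, no external dictionary or manual labelling is needed, which is the property the abstract highlights. A real system would need a more robust definition matcher than initial-letter agreement.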

Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach.

Chopard, Daphne; Treder, Matthias S; Corcoran, Padraig; Ahmed, Nagheen; Johnson, Claire; Busse, Monica; Spasic, Irena.

JMIR Med Inform ; 9(12): e28632, 2021 Dec 24.

Article in English | MEDLINE | ID: mdl-34951601

ABSTRACT

BACKGROUND: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. OBJECTIVE: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. METHODS: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases-10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. RESULTS: The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. CONCLUSIONS: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
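To make the reported comparison concrete, the following Python sketch shows how an F1 score is computed from confusion counts and what the stated gaps imply. The human-like and baseline scores are not given in the abstract; the values below are derived purely by arithmetic from the reported 0.8080 and the two percentage-point differences, assuming all three scores are on the same F1 scale. The function name and the example counts are hypothetical.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 is the harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
MODEL_F1 = 0.8080
GAP_TO_HUMAN_POINTS = 10.15       # model is this many percentage points lower
GAIN_OVER_BASELINE_POINTS = 17.45  # model is this many percentage points higher

# Implied scores (derived here, not stated in the abstract).
implied_human_f1 = round(MODEL_F1 + GAP_TO_HUMAN_POINTS / 100, 4)
implied_baseline_f1 = round(MODEL_F1 - GAIN_OVER_BASELINE_POINTS / 100, 4)

print(implied_human_f1)
print(implied_baseline_f1)

# Hypothetical counts, only to show the formula in use.
print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

F1 ignores true negatives, which is why it is a common choice when the positive class (here, mentions that are adverse events) is much rarer than the negative class.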
