Leveraging Free-Text Information to Detect Duplicates in COVID-19 Vaccine Adverse Event Reports
Drug Safety: An International Journal of Medical Toxicology and Drug Experience ; 45(10):1203, 2022.
Article in English | ProQuest Central | ID: covidwho-2046903
ABSTRACT

Introduction:

Uppsala Monitoring Centre (UMC) manages VigiBase, the largest global database of reports of suspected adverse events (side effects) to medicines, on behalf of the World Health Organization (WHO). Following the emergency rollout of the vaccines against COVID-19, combined with a global focus on monitoring their safety, UMC saw a sharp increase in the volume of reports of suspected side effects of the vaccines. UMC sometimes receives multiple reports corresponding to the same suspected adverse event. Such duplicates can have undesirable effects on both statistical signal detection and manual review of cases. Duplicate detection for vaccine reports has historically been especially challenging, due to the homogeneity of patients. However, the extreme quantity of COVID-19 vaccine reports has highlighted the need for automated duplicate detection that performs well on them. Detecting duplicate reports is a non-trivial problem: since reports do not always contain the same level of detail, and data errors can lead to different values in corresponding fields for duplicate reports, reports cannot simply be compared field by field. Several methods have been proposed for detecting duplicates based on information provided in structured form (sex, age, date of onset, etc.) (1,2). In our study we additionally incorporate free-text information into a duplicate detection model.

Objective:

To leverage the free-text information in suspected adverse event reports to identify duplicate reports, that is, reports referring to the same adverse event.

Methods:

Our method ensembles state-of-the-art machine learning methods. Narratives are placed in a space where a smaller distance between two narratives conveys higher semantic similarity. This is done with vector embeddings from the SapBERT model, fine-tuned on a set of known duplicate reports (3). Two reports are then compared using the cosine similarity between the vector embeddings of their narratives. This similarity is combined with representations of the structured information used in other methods in a gradient boosted decision tree model, whose probability output is calibrated by a logistic regression model (4). These methods are evaluated on a set of curated datasets of COVID-19 vaccine reports comprising 1239 pairs of known duplicates. We use random pairs of COVID-19 vaccine reports as examples of non-duplicates.
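
As a minimal sketch of the comparison step only, the cosine similarity between two narrative embeddings could be computed as below. The vectors are toy four-dimensional values invented for illustration, not SapBERT outputs, and the function and variable names are hypothetical rather than taken from the study.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for narrative embeddings (hypothetical values).
narrative_a = [0.2, 0.7, 0.1, 0.4]
narrative_b = [0.4, 1.4, 0.2, 0.8]   # same direction as narrative_a
narrative_c = [0.9, -0.1, 0.5, 0.0]  # points in a different direction

sim_ab = cosine_similarity(narrative_a, narrative_b)
sim_ac = cosine_similarity(narrative_a, narrative_c)
```

Here `sim_ab` is close to 1 because the two vectors point in the same direction, while `sim_ac` is much lower. In the method described above, a score of this kind is one input feature alongside the structured fields, not a duplicate decision on its own.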

Results:

Our model successfully identifies 78.9% of known duplicate pairs. It achieves a false positive rate (the proportion of non-duplicates erroneously marked as duplicates) of 0.001%. The full results can be seen in Table 1.
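
To make the two reported metrics concrete, the sketch below computes recall and false positive rate from pair counts. Only the total of 1239 known duplicate pairs comes from the abstract; the detected count (978) and the non-duplicate counts are assumptions chosen so the arithmetic is consistent with the reported percentages, and are not figures reported by the authors.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of known duplicate pairs that the model flags as duplicates."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of non-duplicate pairs wrongly flagged as duplicates."""
    return false_positives / (false_positives + true_negatives)


known_duplicate_pairs = 1239   # from the abstract
detected_duplicates = 978      # ASSUMPTION: illustrative, not reported
missed_duplicates = known_duplicate_pairs - detected_duplicates

random_pairs = 1_000_000       # ASSUMPTION: illustrative sample size
wrongly_flagged = 10           # ASSUMPTION: illustrative count
correctly_rejected = random_pairs - wrongly_flagged

recall_pct = 100 * recall(detected_duplicates, missed_duplicates)
fpr_pct = 100 * false_positive_rate(wrongly_flagged, correctly_rejected)

print(f"recall: {recall_pct:.1f}%")
print(f"false positive rate: {fpr_pct:.3f}%")
```

A false positive rate of 0.001% corresponds to one wrongly flagged pair per 100,000 non-duplicate pairs. Because the number of possible report pairs in a large database is very large, even a rate this low can yield many flagged pairs in absolute terms.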

Conclusion:

Not Applicable.
Collection: Databases of international organizations Database: ProQuest Central Topics: Vaccines Language: English Journal: Drug Safety Year: 2022 Document Type: Article
