Search | VHL Regional Portal

An annotated dataset for event-based surveillance of antimicrobial resistance.

Arinik, Nejat; Van Bortel, Wim; Boudoua, Bahdja; Busani, Luca; Decoupes, Rémy; Interdonato, Roberto; Kafando, Rodrique; van Kleef, Esther; Roche, Mathieu; Alam Syed, Mehtab; Teisseire, Maguelonne.

Data Brief ; 46: 108870, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36687146

ABSTRACT

This paper presents an annotated dataset used in the MOOD Antimicrobial Resistance (AMR) hackathon, held in Montpellier in June 2022. The dataset consists of unstructured text from news items, scientific publications, and national or international reports, collected from four event-based surveillance (EBS) systems: ProMED, PADI-web, HealthMap and MedISys. The data were annotated by relevance for epidemic intelligence (EI) purposes with the help of AMR experts and an annotation guideline. The extracted data were intended to include relevant events on the emergence and spread of AMR, such as reports on AMR trends, the discovery of new drug-bug resistances, or new AMR genes in human, animal or environmental reservoirs. This dataset can be used to train or evaluate classification approaches that automatically identify written text on AMR events across the different reservoirs and sectors of One Health (i.e. human, animal, food and environmental sources such as soil and waste water) in unstructured data (e.g. news, tweets) and classify these events by relevance for EI purposes.
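As an illustration of the evaluation use mentioned in the abstract, the following minimal sketch scores a classifier's relevance predictions against annotated labels. The label names (`"relevant"`, `"irrelevant"`), the toy label lists, and the function name `precision_recall_f1` are assumptions made for this example; the abstract does not specify the dataset's actual label scheme or file format.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str],
    predicted: Sequence[str],
    positive: str = "relevant",  # hypothetical label name, not from the dataset
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy annotations and predictions (invented for illustration only).
gold_labels = ["relevant", "irrelevant", "relevant", "relevant", "irrelevant"]
pred_labels = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]

precision, recall, f1 = precision_recall_f1(gold_labels, pred_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting precision and recall separately, rather than accuracy alone, is a common choice when relevant items may be rare among the collected documents.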

ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis.

Kafando, Rodrique; Decoupes, Rémy; Valentin, Sarah; Sautot, Lucile; Teisseire, Maguelonne; Roche, Mathieu.

Health Inf Sci Syst ; 9(1): 29, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34276970

ABSTRACT

Here, we introduce ITEXT-BIO, an intelligent process for extracting biomedical domain terminology from textual documents and analysing it. The proposed methodology consists of two complementary approaches: free and driven term extraction. The first is based on term extraction with statistical measures, while the second uses morphosyntactic variation rules to extract term variants from the corpus. The combination of these two term extraction and analysis strategies is the keystone of ITEXT-BIO. The strategies can be applied either to a single corpus (intra) or across several corpora (inter). We assessed the two approaches, the corpus or corpora to be analysed, and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used: (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses of these terms with regard to the study domain.
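To make the idea of term extraction with a statistical measure concrete, here is a minimal sketch that ranks the words of each document by TF-IDF. TF-IDF is used only as a familiar stand-in: the abstract does not name the measures ITEXT-BIO uses, and the toy corpus, the whitespace tokenisation, and the function name `tfidf_scores` are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(corpus: List[str]) -> List[Dict[str, float]]:
    """Return one {term: TF-IDF score} mapping per document."""
    # Naive tokenisation: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in term_freq.items()
            }
        )
    return scores


# Toy corpus (invented for illustration only).
docs = [
    "antimicrobial resistance in poultry farms",
    "resistance genes in hospital wastewater",
    "poultry vaccination campaign in farms",
]

scores = tfidf_scores(docs)
for doc_scores in scores:
    # Show the three highest-scoring terms of each document.
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(top)
```

Terms that occur in every document (here, "in") score zero, while terms confined to one document score highest, which is the sense in which such a measure surfaces discriminant terms.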
