We are not ready yet: limitations of state-of-the-art disease named entity recognizers.
Kühnel, Lisa; Fluck, Juliane.
  • Kühnel L; ZB MED - Information Centre for Life Sciences, Gleueler Str. 60, Cologne, Germany. kuehnel@zbmed.de.
  • Fluck J; Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, Postfach 10 01 31, 33501, Bielefeld, Germany. kuehnel@zbmed.de.
J Biomed Semantics ; 13(1): 26, 2022 10 27.
Article in English | MEDLINE | ID: covidwho-2089233
ABSTRACT

BACKGROUND:

Biomedical natural language processing has been the subject of intense research. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models show excellent results - in some cases exceeding the inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on test data. This raises the question of how well trained models can predict on completely new data, i.e. how well they generalize.

RESULTS:

Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods - including transfer learning - and show that current state-of-the-art methods work well for a given training set and its corresponding test set but generalize significantly worse when applied to new data.
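
The comparison described above (performance on the matching test set versus performance on new data) can be illustrated with a minimal sketch. It is not the authors' evaluation code: exact-match entity-level F1 is assumed as the metric, and the function names (`entity_f1`, `generalization_gap`) and all annotations are hypothetical toy values chosen only to show the arithmetic.

```python
from typing import Set, Tuple

# An entity mention is (document id, start offset, end offset).
# Assumption: a prediction counts only if it matches a gold span exactly.
Span = Tuple[str, int, int]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1 under exact span matching."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def generalization_gap(in_corpus_f1: float, new_data_f1: float) -> float:
    """Drop in F1 when moving from the matching test set to new data."""
    return in_corpus_f1 - new_data_f1


# Toy, made-up annotations (hypothetical; not taken from any real corpus).
gold_test = {("d1", 0, 8), ("d1", 20, 28), ("d2", 5, 12)}
pred_test = {("d1", 0, 8), ("d1", 20, 28), ("d2", 5, 12)}

gold_new = {("p1", 3, 11), ("p1", 30, 45), ("p2", 0, 9), ("p2", 14, 22)}
pred_new = {("p1", 3, 11), ("p2", 0, 9), ("p2", 40, 47)}

f1_test = entity_f1(gold_test, pred_test)
f1_new = entity_f1(gold_new, pred_new)
gap = generalization_gap(f1_test, f1_new)
print(f"in-corpus F1={f1_test:.3f}, new-data F1={f1_new:.3f}, gap={gap:.3f}")
```

In this toy setting the model is perfect on the matching test set but finds only two of four gold entities in the new data and adds one spurious span, so the gap is large. A real study would compute the same quantities over full corpora and several models.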

CONCLUSIONS:

We argue that there is a need for larger annotated data sets for training and testing. Therefore, we foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.

Full text: Available Collection: International databases Database: MEDLINE Main subject: Data Mining / COVID-19 Type of study: Prognostic study / Reviews Limits: Humans Language: English Journal: J Biomed Semantics Year: 2022 Document Type: Article Affiliation country: S13326-022-00280-6
