We are not ready yet: limitations of state-of-the-art disease named entity recognizers.
J Biomed Semantics; 13(1): 26, 2022 10 27.
Article in English | MEDLINE | ID: covidwho-2089233
ABSTRACT
BACKGROUND:
There has been intense research in biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models achieve excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a drop in performance compared with the results on the test data. This raises the question of how well trained models can predict on completely new data, i.e., how well they generalize.
RESULTS:
Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well on a given training set and its corresponding test set but show a significant lack of generalization when applied to new data.
CONCLUSIONS:
We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.
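The abstract does not describe the evaluation code. As a minimal sketch, assuming the common BIO tagging scheme and exact-match, entity-level scoring (both are assumptions here, not details taken from the article), the generalization gap described above can be expressed as the difference between the F1 score on the held-out test split and the F1 score on new data. The tag sequences and the helper names `bio_to_spans` and `prf1` are hypothetical and serve only as illustration.

```python
from typing import List, Optional, Set, Tuple

# (start, end_exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans.

    An I- tag that does not continue an open entity of the same type
    starts a new entity (a common lenient convention).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span and type match)."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # predicted entities that match a gold entity exactly
        fp += len(p - g)  # predicted entities with no exact gold match
        fn += len(g - p)  # gold entities the model missed or got wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: the model is perfect on its own test split...
gold_test = [["O", "B-Disease", "I-Disease", "O"], ["B-Disease", "O"]]
pred_test = [["O", "B-Disease", "I-Disease", "O"], ["B-Disease", "O"]]
# ...but truncates a multi-token disease mention on new data.
gold_new = [["O", "B-Disease", "I-Disease", "O"], ["O", "B-Disease"]]
pred_new = [["O", "B-Disease", "O", "O"], ["O", "B-Disease"]]

_, _, f1_test = prf1(gold_test, pred_test)
_, _, f1_new = prf1(gold_new, pred_new)
gap = f1_test - f1_new
print(f"in-domain F1={f1_test:.2f}, new-data F1={f1_new:.2f}, gap={gap:.2f}")
# prints: in-domain F1=1.00, new-data F1=0.50, gap=0.50
```

Exact-match scoring is strict: a partially recognized mention counts as both a false positive and a false negative, which is one reason scores on new data can fall sharply when mention boundaries differ from those seen in training.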
Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Data Mining / COVID-19
Type of study: Prognostic study / Reviews
Limits: Humans
Language: English
Journal: J Biomed Semantics
Year: 2022
Document Type: Article
Affiliation country:
S13326-022-00280-6