Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model / 대한의료정보학회지
Healthcare Informatics Research
Pages 16-24, 2022.
Article
Language: English | WPRIM | ID: wpr-914496
Responsible library: WPRO
ABSTRACT
Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite for de-identification is identifying PHI entity names in clinical documents. This study aimed to compare the performance of three pre-training models that have recently attracted significant attention and to determine which model is most suitable for PHI recognition.

Methods: We compared the PHI recognition performance of deep learning models using the i2b2 2014 dataset. Three pre-training models were used to detect PHI: bidirectional encoder representations from transformers (BERT), robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL). After the dataset was tokenized, it was labeled using an inside-outside-beginning (IOB) tagging scheme and WordPiece-tokenized for input to these models. The PHI recognition performance of BERT, RoBERTa, and XLNet was then evaluated.

Results: Among the three models, XLNet achieved the best F1-score (96.29%). In addition, in the evaluation by PHI entity, RoBERTa and XLNet showed a 30% improvement in performance compared to BERT.

Conclusions: Among the pre-training models used in this study, XLNet exhibited superior performance because its word embeddings were well constructed using the two-stream self-attention method. In addition, RoBERTa and XLNet outperformed BERT, indicating that they were more effective in grasping the context.
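The preprocessing and scoring steps named in the Methods and Results (IOB tagging, subword tokenization, F1-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: `toy_subword_split` is a hypothetical stand-in for a real WordPiece tokenizer, the rule that continuation pieces of an entity receive an `I-` tag is one common alignment convention that the abstract does not specify, and the F1-score here is computed over exact entity matches, which may differ from the evaluation used in the study.

```python
from typing import List, Set, Tuple


def iob_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign IOB tags to word tokens.

    Each span is (start, end, label) in token indices, with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def toy_subword_split(word: str, max_len: int = 4) -> List[str]:
    """Hypothetical stand-in for WordPiece: fixed-size chunks, '##' on continuations."""
    pieces = [word[i:i + max_len] for i in range(0, len(word), max_len)] or [word]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def align_tags(tokens: List[str], tags: List[str]) -> Tuple[List[str], List[str]]:
    """Propagate word-level IOB tags to subword pieces (assumed convention)."""
    sub_tokens: List[str] = []
    sub_tags: List[str] = []
    for word, tag in zip(tokens, tags):
        pieces = toy_subword_split(word)
        # The first piece keeps the word's tag; later pieces of an entity get I-.
        continuation = tag if tag == "O" else "I-" + tag[2:]
        sub_tokens.extend(pieces)
        sub_tags.extend([tag] + [continuation] * (len(pieces) - 1))
    return sub_tokens, sub_tags


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) entities from an IOB sequence, end exclusive."""
    entities: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Entity-level F1: a prediction counts only if span and label both match."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    words = ["Patient", "John", "Smith", "visited", "Boston"]
    gold = iob_tags(words, [(1, 3, "NAME"), (4, 5, "LOCATION")])
    print(align_tags(words, gold))
    print(entity_f1(gold, ["O", "B-NAME", "I-NAME", "O", "O"]))
```

In practice the toy splitter would be replaced by the tokenizer that ships with each model, and the aligned tag sequence would serve as the training target for token classification.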
Study type: Prognostic studies