Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model / Healthcare Informatics Research (대한의료정보학회지)
Article in English | WPRIM | ID: wpr-914496
Responsible library: WPRO
ABSTRACT
Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite for de-identification is identifying PHI entity names in clinical documents. This study compared the performance of three pre-trained models that have recently attracted significant attention, to determine which is most suitable for PHI recognition.
Methods: We compared the PHI recognition performance of deep learning models on the i2b2 2014 dataset, using three pre-trained models: bidirectional encoder representations from transformers (BERT), the robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL). The dataset was tokenized, labeled with an inside-outside-beginning (IOB) tagging scheme, and WordPiece-tokenized before being fed into these models. PHI recognition performance was then evaluated for BERT, RoBERTa, and XLNet.
Results: Comparing the PHI recognition performance of the three models, XLNet achieved the highest F1-score, 96.29%. In addition, in the per-entity performance evaluation, RoBERTa and XLNet showed a 30% improvement over BERT.
Conclusions: Among the pre-trained models examined, XLNet performed best because its word embeddings were well constructed by the two-stream self-attention method. In addition, RoBERTa and XLNet outperformed BERT, indicating that they capture context more effectively.
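The preprocessing step described in the Methods (IOB tagging plus WordPiece tokenization for a token-classification model) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the Hugging Face transformers library, and the example sentence and tag names (DOCTOR, DATE) are hypothetical stand-ins for the i2b2 2014 PHI categories.

from transformers import AutoTokenizer

# Load a BERT tokenizer; RoBERTa and XLNet tokenizers expose the same API.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# One illustrative training example: pre-split words with word-level IOB tags.
# Tag names are hypothetical, not the paper's exact PHI label set.
words = ["Patient", "seen", "by", "Dr.", "Smith", "on", "03/14/2067", "."]
tags  = ["O", "O", "O", "O", "B-DOCTOR", "O", "B-DATE", "O"]

# WordPiece may split one word into several subtokens.
encoding = tokenizer(words, is_split_into_words=True)

# Align the word-level IOB tags to the subtoken sequence.
labels, prev_word_id = [], None
for word_id in encoding.word_ids():
    if word_id is None:                  # [CLS] / [SEP]: ignored during training
        labels.append("IGN")
    elif word_id == prev_word_id:        # continuation subtoken: B- becomes I-
        labels.append(tags[word_id].replace("B-", "I-"))
    else:                                # first subtoken keeps the word's tag
        labels.append(tags[word_id])
    prev_word_id = word_id

for token, label in zip(tokenizer.convert_ids_to_tokens(encoding["input_ids"]), labels):
    print(f"{token:12s} {label}")

After this alignment, the subtoken label sequence can be fed to any of the three fine-tuned models as token-classification targets, with the "IGN" positions excluded from the loss.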
Full text: 1 Index: WPRIM Study type: Prognostic_studies Language: En Journal: Healthcare Informatics Research Year: 2022 Document type: Article