A De-identification Method for Bilingual Clinical Texts of Various Note Types

Soo-Yong SHIN; Yu-Rang PARK; Yongdon SHIN; Hyo-Joung CHOI; Jihyun PARK; Yongman LYU; Moo-Song LEE; Chang-Min CHOI; Woo-Sung KIM; Jae-Ho LEE

Journal of Korean Medical Science ; : 7-15, 2015.

Article in English | WPRIM | ID: wpr-166138

ABSTRACT

De-identification of personal health information is essential for using clinical records without requiring written informed consent from patients. Previous de-identification methods used natural language processing technology to remove identifiers from clinical narrative text, but they focused only on narrative text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate the regular expression rules, we obtained a development dataset of 6,039 clinical notes of 20 types and a validation dataset of 5,000 notes of 33 types. Fifteen regular expression rules were constructed using the development dataset, and those rules achieved 99.87% precision and 96.25% recall on the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative texts. This method will thus help physicians perform retrospective research more easily.
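As a rough illustration of the approach the abstract describes, the following Python sketch applies a small set of regular expression rules to a mixed Korean and English note and scores the result with precision and recall. The four patterns, the labels, the sample note, and the function names `find_identifiers`, `deidentify`, and `precision_recall` are all hypothetical and chosen for this example; they are not the fifteen rules from the study, which the abstract does not list.

```python
import re

# Hypothetical illustrative rules; NOT the fifteen rules from the study.
RULES = [
    # Resident registration number style: 6 digits, hyphen, 7 digits.
    ("RRN", re.compile(r"\d{6}-\d{7}")),
    # Mobile phone number style such as 010-1234-5678.
    ("PHONE", re.compile(r"01[016789]-\d{3,4}-\d{4}")),
    # Date written as YYYY-MM-DD, YYYY.MM.DD or YYYY/MM/DD.
    ("DATE", re.compile(r"\d{4}[-./]\d{1,2}[-./]\d{1,2}")),
    # Name following an assumed Korean or English field label.
    ("NAME", re.compile(
        r"(?<=환자명: )[가-힣]{2,4}"
        r"|(?<=Patient: )[A-Z][a-z]+ [A-Z][a-z]+"
    )),
]


def find_identifiers(text):
    """Return the set of (label, matched string) pairs found by the rules."""
    found = set()
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            found.add((label, match.group()))
    return found


def deidentify(text):
    """Replace every rule match with a bracketed label, one rule at a time."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted identifiers against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample note mixing Korean and English.
note = "환자명: 홍길동 (800101-1234567) visited on 2014-03-05. Contact: 010-1234-5678."

# Made-up gold annotations; the guardian name is one the rules cannot catch.
gold = {
    ("NAME", "홍길동"),
    ("RRN", "800101-1234567"),
    ("DATE", "2014-03-05"),
    ("PHONE", "010-1234-5678"),
    ("NAME", "김철수"),
}

print(deidentify(note))
print(precision_recall(find_identifiers(note), gold))
```

In this toy setup every match is a true identifier, so precision is 1.0, while the unmatched gold name lowers recall to 0.8. The same pattern of high precision with somewhat lower recall is what the reported figures show, although the numbers here say nothing about the study's actual rules.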

Subjects

Humans; Algorithms; Data Anonymization; Electronic Health Records; Health Records, Personal; Multilingualism; Natural Language Processing; Research Design

De-identification; Anonymization; Clinical Text; Bilingual Text; Patient Privacy; Medical Informatics; Text Mining

Full text: Available
Index: WPRIM (Western Pacific)
Main subject: Research Design / Algorithms / Natural Language Processing / Multilingualism / Health Records, Personal / Electronic Health Records / Data Anonymization
Type of study: Diagnostic study / Prognostic study
Limits: Humans
Language: English
Journal: Journal of Korean Medical Science
Year of publication: 2015
Document type: Article
