Search | VHL Regional Portal

Preserving medical correctness, readability and consistency in de-identified health records.

Pantazos, Kostas; Lauesen, Soren; Lippert, Soren.

Health Informatics J ; 23(4): 291-303, 2017 12.

Article in English | MEDLINE | ID: mdl-27199298

ABSTRACT

A health record database contains structured data fields that identify the patient, such as patient ID, patient name, e-mail and phone number. These data are fairly easy to de-identify, that is, replace with other identifiers. However, these data also occur in fields with doctors' free-text notes written in an abbreviated style that cannot be analyzed grammatically. If we replace a word that looks like a name, but isn't, we degrade readability and medical correctness. If we fail to replace it when we should, we degrade confidentiality. We de-identified an existing Danish electronic health record database, ending up with 323,122 patient health records. We had to invent many methods for de-identifying potential identifiers in the free-text notes. The de-identified health records should be used with caution for statistical purposes because we removed health records that were so special that they couldn't be de-identified. Furthermore, we distorted geography by replacing zip codes with random zip codes.
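The abstract says zip codes were replaced with random zip codes, which deliberately distorts geography. The sketch below shows one way such a replacement could work. It is hypothetical: the names `build_zip_map` and `replace_zips`, the `zip` field, and the list of valid zip codes are illustrative assumptions. The abstract also does not say whether each original zip code was mapped consistently or re-drawn per record; this sketch assumes a consistent mapping.

```python
import random


def build_zip_map(original_zips, valid_zips, seed=None):
    """Map each distinct original zip code to a randomly chosen valid zip code.

    Assumption (not stated in the abstract): the same original zip code always
    receives the same surrogate, so records that shared a zip code still share
    one after de-identification, but the true location is lost.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    mapping = {}
    for z in sorted(set(original_zips)):  # sorted so a given seed is deterministic
        mapping[z] = rng.choice(valid_zips)
    return mapping


def replace_zips(records, mapping):
    """Return copies of the records with the hypothetical 'zip' field replaced."""
    return [{**rec, "zip": mapping.get(rec["zip"], rec["zip"])} for rec in records]


if __name__ == "__main__":
    records = [{"id": 1, "zip": "2100"}, {"id": 2, "zip": "8000"}, {"id": 3, "zip": "2100"}]
    valid = ["1000", "5000", "9000"]  # hypothetical pool of valid zip codes
    zip_map = build_zip_map([r["zip"] for r in records], valid, seed=42)
    print(replace_zips(records, zip_map))
```

Because the surrogate is drawn at random, any statistic that depends on location (for example regional prevalence) becomes unreliable. This matches the abstract's warning about using the data for statistical purposes.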

Subject(s)

Comprehension , Data Accuracy , Electronic Health Records/standards , Confidentiality , Humans , Denmark

De-identifying an EHR database - anonymity, correctness and readability of the medical record.

Pantazos, Kostas; Lauesen, Soren; Lippert, Soren.

Stud Health Technol Inform ; 169: 862-6, 2011.

Article in English | MEDLINE | ID: mdl-21893869

ABSTRACT

Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version with real medical records, but related to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the de-identified database has 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
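The three steps named in the abstract (collect identifiers, define a replacement for each, replace them in structured data and free text) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm. The field names `first_name`, `last_name` and `note`, the round-robin choice of surrogates, and the `protected_words` set are all hypothetical. In particular, `protected_words` is only a crude stand-in for the "simple language analysis and special rules" the abstract mentions.

```python
import re

# Hypothetical structured fields that hold patient names.
NAME_FIELDS = ("first_name", "last_name")

# Runs of letters, including the Danish letters æ, ø and å.
TOKEN = re.compile(r"[A-Za-zÆØÅæøå]+")


def collect_identifiers(records, external_names):
    """Step 1: gather identifiers from the database and an external name list."""
    identifiers = set(external_names)
    for rec in records:
        for field in NAME_FIELDS:
            if rec.get(field):
                identifiers.add(rec[field])
    return identifiers


def define_replacements(identifiers, surrogate_names):
    """Step 2: assign one surrogate to each identifier.

    Simplification: surrogates that are themselves real identifiers are
    dropped, and the rest are handed out round-robin in sorted order, so two
    identifiers may share a surrogate when the pool is small.
    """
    pool = [s for s in surrogate_names if s not in identifiers]
    if not pool:
        raise ValueError("no surrogate names left after removing real identifiers")
    return {ident: pool[i % len(pool)] for i, ident in enumerate(sorted(identifiers))}


def replace_in_text(text, replacements, protected_words):
    """Step 3 (free text): swap identifier tokens but keep protected words.

    protected_words is a lowercase set of name-like words (e.g. disease
    eponyms) that are left untouched to preserve medical correctness.
    """
    def swap(match):
        word = match.group(0)
        if word.lower() in protected_words:
            return word
        return replacements.get(word, word)

    return TOKEN.sub(swap, text)


def deidentify(records, external_names, surrogate_names, protected_words):
    identifiers = collect_identifiers(records, external_names)
    replacements = define_replacements(identifiers, surrogate_names)
    result = []
    for rec in records:
        new = dict(rec)  # copy so the input records stay unchanged
        # Step 3 (structured data): a value in a name field is always replaced.
        for field in NAME_FIELDS:
            if new.get(field):
                new[field] = replacements.get(new[field], new[field])
        new["note"] = replace_in_text(rec.get("note", ""), replacements, protected_words)
        result.append(new)
    return result


if __name__ == "__main__":
    records = [{"first_name": "Jens", "last_name": "Hansen",
                "note": "Jens Hansen seen today. Parkinson suspected."}]
    out = deidentify(records, ["Parkinson"], ["Nielsen", "Ole", "Madsen"], {"parkinson"})
    print(out[0]["note"])  # prints "Ole Nielsen seen today. Parkinson suspected."
```

The example shows the trade-off described in both abstracts. "Parkinson" is kept in the note because it is protected as a medical term, which preserves correctness. If a patient were actually named Parkinson, however, the name would stay visible in the free text, weakening confidentiality.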

Subject(s)

Medical Record Linkage/standards , Primary Health Care/organization & administration , Algorithms , Computer Security , Confidentiality , Denmark , Electronic Health Records , Humans , Medical Record Linkage/methods , Patient Identification Systems , Pattern Recognition, Automated , Privacy , Reproducibility of Results , Security Measures , Software
