Reliable generation of privacy-preserving synthetic electronic health record time series via diffusion models.
Tian, Muhang; Chen, Bernie; Guo, Allan; Jiang, Shiyi; Zhang, Anru R.
Affiliation
  • Tian M; Department of Computer Science, Duke University, Durham, NC 27708, United States.
  • Chen B; Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, United States.
  • Guo A; Department of Computer Science, Duke University, Durham, NC 27708, United States.
  • Jiang S; Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, United States.
  • Zhang AR; Department of Computer Science, Duke University, Durham, NC 27708, United States.
Article in En | MEDLINE | ID: mdl-39222376
ABSTRACT

OBJECTIVE:

Electronic health records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR deidentification methods are flawed and can lead to privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHRs. This study aims to overcome these challenges by efficiently generating realistic and privacy-preserving synthetic EHR time series.

MATERIALS AND METHODS:

We introduce a new method for generating diverse and realistic synthetic EHR time series data using denoising diffusion probabilistic models. We conducted experiments on 6 databases, including Medical Information Mart for Intensive Care III and IV, the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with 8 existing methods.
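For readers unfamiliar with denoising diffusion probabilistic models, the forward (noising) step that such models learn to invert can be sketched as follows. This is a minimal illustration of the standard Gaussian forward process only, not the authors' implementation: the linear noise schedule, the function names `linear_beta_schedule` and `forward_diffuse`, and the toy series shape are all assumptions made for the example.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed linear variance schedule; the abstract does not state the schedule used.
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    # Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form.
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention per step

# Toy "time series": 24 time steps, 3 continuous features (hypothetical shape).
x0 = rng.standard_normal((24, 3))
xt, noise = forward_diffuse(x0, t=999, alpha_bar=alpha_bar, rng=rng)
```

A denoising network would then be trained to predict `noise` from `xt` and `t`; at the final step almost none of the original signal remains, so sampling can start from pure Gaussian noise.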

RESULTS:

Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yield a lower discriminative accuracy than data from the baseline methods, indicating that the proposed method can generate data with less privacy risk.
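One common way to measure discriminative accuracy is to train a classifier to tell real samples from synthetic ones and report its held-out accuracy, where values near 0.5 mean the two are hard to distinguish. The sketch below illustrates that idea under stated assumptions: it uses a plain logistic regression on flattened series as a stand-in classifier, and the function name `discriminative_accuracy`, the 80/20 split, and the toy data are hypothetical rather than taken from the study.

```python
import numpy as np

def discriminative_accuracy(real, synthetic, rng, epochs=200, lr=0.1):
    # Flatten each (time, feature) series into one vector; label real=1, synthetic=0.
    X = np.concatenate([real.reshape(len(real), -1),
                        synthetic.reshape(len(synthetic), -1)])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])

    # Shuffle, then hold out 20% for evaluation (assumed split).
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    split = int(0.8 * len(X))
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

    # Standardize using training statistics only.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-8
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

    # Logistic regression fitted by batch gradient descent.
    w, b = np.zeros(X_tr.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))
        grad = p - y_tr
        w -= lr * (X_tr.T @ grad) / len(y_tr)
        b -= lr * grad.mean()

    pred = (X_te @ w + b) > 0
    return float(np.mean(pred == (y_te == 1)))

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 8, 2))
close = rng.standard_normal((200, 8, 2))        # same distribution as real
far = rng.standard_normal((200, 8, 2)) + 3.0    # clearly shifted distribution

acc_close = discriminative_accuracy(real, close, rng)
acc_far = discriminative_accuracy(real, far, rng)
```

In this toy setup the shifted data are easy to separate from the real data, whereas the matched data should leave the classifier close to chance.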

DISCUSSION:

The proposed model utilizes a mixed diffusion process to generate realistic synthetic EHR samples that protect patient privacy. This method could be useful in tackling data availability issues in healthcare by reducing barriers to EHR access and supporting research in machine learning for health.

CONCLUSION:

The proposed diffusion model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates downstream medical data analysis. Our numerical results show the superiority of the proposed method over the existing methods evaluated.

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Publication year: 2024 Document type: Article Affiliation country: United States Publication country: United Kingdom
