Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation.
Liu, Fang; Wang, Dong; Yan, Tian.
  • Liu F; Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA. fang.Liu.131@nd.edu.
  • Wang D; College of Cyberspace Security, Hangzhou Dianzi University, Wuhan, 430079, China.
  • Yan T; Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA.
BMC Med Res Methodol; 23(1): 120, 2023 05 19.
Article in English | MEDLINE | ID: covidwho-2324512
ABSTRACT

BACKGROUND:

A large and varied body of data has been collected during the COVID-19 pandemic, and its analysis has been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, these data will remain a rich source for studying the impacts of the pandemic on many aspects of society. However, naïve release and sharing of this information raises serious privacy concerns.

METHODS:

We use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate how granular and individual-level pandemic data can be published and shared in a privacy-preserving manner. We build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of the privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches on real-life data. All the approaches employed in the study are straightforward to apply.
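As an illustration of the general idea, the sketch below sanitizes a table of cell counts with the Laplace mechanism and then bounds the result to non-negative integers as a post-processing step. It is a minimal sketch, not the authors' procedure: the abstract does not name the mechanisms used, and the function names, the example counts, and the sensitivity of 1 (one individual added to or removed from the table) are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize_counts(counts, epsilon, rng=None, sensitivity=1.0):
    """Return differentially private versions of histogram cell counts.

    Assumption: adding or removing one individual changes exactly one
    cell by 1, so the L1 sensitivity is 1. Smaller epsilon means more
    noise and a stronger privacy guarantee.
    """
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Post-processing: round and bound at zero so the released values
    # look like counts. This costs no extra privacy but can add bias
    # when true counts are small relative to the noise scale.
    return [max(0, round(v)) for v in noisy]


# Hypothetical case counts for four cells of a surveillance table.
released = sanitize_counts([120, 45, 3, 0], epsilon=1.0, rng=random.Random(0))
```

The bounding step mirrors the kind of constraint a released count table has to satisfy, and it is the part most likely to distort small cells when epsilon is small.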

RESULTS:

The empirical studies in all three data cases suggest that privacy-preserving results based on the differentially privately sanitized data can be similar to the original results at a reasonably small privacy loss ([Formula see text]). Statistical inferences based on sanitized data using the multiple synthesis technique also appear valid, with nominal coverage of 95% confidence intervals when there is no noticeable bias in point estimation. When [Formula see text] and the sample size is not large enough, some privacy-preserving results are subject to bias, partially due to the bounding applied to sanitized data as a post-processing step to satisfy practical data constraints.
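To make the multiple synthesis idea concrete, the sketch below pools point estimates and variance estimates obtained from m sanitized datasets, using the combining rule commonly applied to partially synthetic data. This is an assumption for illustration: the abstract does not state which combining rule the authors use, and the function name and input values are hypothetical.

```python
from statistics import mean, variance


def combine_synthetic(estimates, variances):
    """Pool results from m sanitized (synthetic) datasets.

    Assumed rule (partial-synthesis style):
      pooled estimate  q_bar = mean of the m point estimates
      total variance   T     = u_bar + b / m
    where u_bar is the mean within-dataset variance and b is the
    between-dataset sample variance of the point estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two datasets and matching lengths")
    q_bar = mean(estimates)
    u_bar = mean(variances)
    b = variance(estimates)  # sample variance, divisor m - 1
    total_var = u_bar + b / m
    # Degrees of freedom for a t-based interval; infinite if the
    # estimates are identical across datasets.
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else float("inf")
    return q_bar, total_var, df


# Hypothetical estimates and variances from three sanitized datasets.
q_bar, total_var, df = combine_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A 95% interval would then be the pooled estimate plus or minus the appropriate t quantile (with `df` degrees of freedom) times the square root of `total_var`. When releasing m datasets, the overall privacy budget also has to be shared across them.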

CONCLUSIONS:

Our study provides statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance those guarantees against the statistical utility of the released information.

Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: COVID-19 | Type of study: Experimental Studies / Observational study | Limits: Humans | Language: English | Journal: BMC Med Res Methodol | Journal subject: Medicine | Year: 2023 | Document Type: Article | Affiliation country: S12874-023-01927-3
