1.
J Law Med Ethics ; 44(1): 156-60, 2016 03.
Article in English | MEDLINE | ID: mdl-27256131

ABSTRACT

Along with technical challenges, biobanking frequently raises important privacy and security issues that must be resolved as biobanks continue to grow in scale and scope. Consent mechanisms currently in use range from fine-grained to very broad, and in some cases participants are offered very few privacy protections. However, developments in information technology are bringing improvements. New programs and systems are being developed to allow researchers to conduct analyses without distributing the data itself offsite, either by allowing the investigator to communicate with a central computer, or by having each site participate in a meta-analysis that results in a shared statistic or final significance result. The implementation of security protocols in the research biobanking setting requires three key elements: authentication, authorization, and auditing. Authentication is the process of making sure individuals are who they claim to be, frequently through the use of a password, a key fob, or a physical (e.g., retinal or fingerprint) scan. Authorization involves ensuring that every individual who attempts an action has permission to do that action. Finally, auditing allows for actions to be logged so that inappropriate or unethical actions can later be traced back to their source.
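
The abstract does not prescribe an implementation, but the three elements compose naturally in code. Below is a minimal Python sketch of the authentication-authorization-auditing pattern; the user store, permission table, and function names are hypothetical illustrations, not part of any system described above.

```python
import hashlib
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="audit.log", level=logging.INFO)

# Hypothetical stores; a real deployment would use a directory service,
# salted password hashing (e.g., bcrypt), and a tamper-evident audit trail.
USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}
PERMISSIONS = {"alice": {"run_aggregate_query"}}  # no raw-data export right

def authenticate(user: str, password: str) -> bool:
    """Authentication: verify the caller is who they claim to be."""
    return USERS.get(user) == hashlib.sha256(password.encode()).hexdigest()

def authorize(user: str, action: str) -> bool:
    """Authorization: verify the caller has permission for this action."""
    return action in PERMISSIONS.get(user, set())

def audit(user: str, action: str, allowed: bool) -> None:
    """Auditing: log every attempt so it can later be traced to its source."""
    logging.info("%s user=%s action=%s allowed=%s",
                 datetime.now(timezone.utc).isoformat(), user, action, allowed)

def perform(user: str, password: str, action: str) -> bool:
    allowed = authenticate(user, password) and authorize(user, action)
    audit(user, action, allowed)
    return allowed

print(perform("alice", "s3cret", "run_aggregate_query"))  # True, logged
print(perform("alice", "s3cret", "export_raw_data"))      # False, logged
```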


Subject(s)
Biological Specimen Banks , Computer Security , Privacy , Humans , Information Technology/trends , Meta-Analysis as Topic
2.
J Am Med Inform Assoc ; 23(e1): e131-7, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26567325

ABSTRACT

BACKGROUND AND OBJECTIVE: There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization fares at multiple medical centers. METHODS: We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value of 5, to the International Classification of Diseases, Ninth Revision (ICD-9) codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number of patients included in the dataset, 2) the number of codes included, and 3) the regions where generalization and suppression were required. RESULTS: Our findings yield several notable results. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data by reducing the proportion of generalized regions from ∼15% to ∼0.5%. Second, in the largest anonymization, ∼70% of the codes that needed generalization were generalized across only two or three codes. CONCLUSIONS: Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding the privacy of the underlying individuals.
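
The core guarantee behind the method is that every released diagnosis-code profile must be indistinguishable from those of at least k - 1 other patients. A toy Python check of that property (the profile data and function name are illustrative assumptions, not the paper's algorithm, which also performs code generalization):

```python
from collections import Counter

def is_k_anonymous(profiles, k=5):
    """True if every distinct profile (a set of ICD-9 codes) is shared
    by at least k patients in the released dataset."""
    counts = Counter(frozenset(p) for p in profiles)
    return all(count >= k for count in counts.values())

# Hypothetical profiles: 244.9 is the ICD-9 code for hypothyroidism.
profiles = [{"244.9"}] * 5 + [{"244.9", "250.00"}]
print(is_k_anonymous(profiles, k=5))  # False: the last profile is unique
```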


Subject(s)
Data Anonymization , Electronic Health Records , Information Dissemination , Confidentiality , Humans , Hypothyroidism , International Classification of Diseases , Organizational Case Studies
3.
J Am Med Inform Assoc ; 22(5): 1029-41, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25911674

ABSTRACT

OBJECTIVE: The Health Insurance Portability and Accountability Act Privacy Rule enables healthcare organizations to share de-identified data via two routes. They can either 1) show the re-identification risk is small (e.g., via a formal model, such as k-anonymity) with respect to an anticipated recipient or 2) apply a rule-based policy (i.e., Safe Harbor) that enumerates attributes to be altered (e.g., dates to years). The latter is often invoked because it is interpretable, but it fails to tailor protections to the capabilities of the recipient. This paper shows that rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically trade off between these goals. METHODS: We extend an algorithm to efficiently compose an R-U frontier using a lattice of policy options. Risk is proportional to the number of patients to which a record corresponds, while utility is proportional to the similarity of the original and de-identified distributions. We allow our method to search 20,000 rule-based policies (out of 2^700) and compare the resulting frontier with k-anonymous solutions and Safe Harbor using the demographics of 10 U.S. states. RESULTS: The results demonstrate that the rule-based frontier 1) consists, on average, of 5000 policies, 2% of which enable better utility with less risk than Safe Harbor, and 2) covers a broader spectrum of utility and risk than k-anonymity frontiers. CONCLUSIONS: R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to the anticipated needs and trustworthiness of recipients.
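
As a rough sketch of how a single policy maps to an (R, U) point under the abstract's definitions (risk inversely proportional to group size, utility proportional to distributional similarity), consider the following Python toy; the records, policy, and similarity proxy are hypothetical simplifications:

```python
from collections import Counter

def risk(records):
    """Re-identification risk: 1 / size of the smallest group of records
    sharing the same generalized quasi-identifiers (worst-case record)."""
    return 1.0 / min(Counter(records).values())

def utility(original, generalized):
    """Crude similarity proxy: the fraction of attribute values a policy
    leaves unchanged (the paper compares full distributions)."""
    flat_o = [v for rec in original for v in rec]
    flat_g = [v for rec in generalized for v in rec]
    return sum(o == g for o, g in zip(flat_o, flat_g)) / len(flat_o)

original = [("1975", "37203", "F"), ("1975", "37203", "M"),
            ("1976", "37212", "F"), ("1976", "37212", "M")]
# One rule-based policy: birth year to decade, ZIP code to 3 digits.
policy_a = [("197*", "372**", "F"), ("197*", "372**", "M"),
            ("197*", "372**", "F"), ("197*", "372**", "M")]
print(risk(policy_a), utility(original, policy_a))  # 0.5 0.333...
```

Plotting such (R, U) points for many candidate policies and keeping the non-dominated ones yields the kind of frontier the paper searches for.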


Subject(s)
Algorithms , Computer Security , Confidentiality , Datasets as Topic , Demography , Health Insurance Portability and Accountability Act , Humans , United States
4.
PLoS One ; 10(3): e0120592, 2015.
Article in English | MEDLINE | ID: mdl-25807380

ABSTRACT

Given the wealth of insights that large collections of personal data can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but they are concerned because various demonstrations show such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not the likelihood that they will be realized. This paper introduces a game-theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging a natural assumption that a recipient only attempts re-identification if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amount from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings provide support that such a framework can enable pragmatic policy decisions about sharing de-identified data.
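
The central assumption, that a rational recipient attacks only when the expected payoff is positive, reduces to a one-line check. A minimal Python sketch of the resulting payoff logic (all numbers and names are hypothetical, not the study's calibrated values):

```python
def adversary_attacks(p_success: float, gain: float, cost: float) -> bool:
    """A rational recipient attempts re-identification only when the
    expected gain exceeds the cost of mounting the attack."""
    return p_success * gain > cost

def publisher_payoff(share_value: float, p_success: float,
                     gain: float, cost: float, penalty: float) -> float:
    """Publisher keeps the value of sharing, minus the expected penalty
    (e.g., a regulatory fine) if the recipient would rationally attack."""
    if adversary_attacks(p_success, gain, cost):
        return share_value - p_success * penalty
    return share_value

# Generalizing the data lowers p_success; below some threshold the attack
# is unprofitable and the publisher reaches the 'zero risk' regime.
print(publisher_payoff(100.0, 0.01, 500.0, 10.0, 1000.0))  # no attack: 100.0
print(publisher_payoff(100.0, 0.20, 500.0, 10.0, 1000.0))  # attack: -100.0
```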


Subject(s)
Models, Theoretical , Databases, Factual , Health Insurance Portability and Accountability Act , Humans , Information Dissemination , Privacy , Risk , Search Engine , United States
5.
Bioinformatics ; 30(23): 3334-41, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25147357

ABSTRACT

MOTIVATION: Sharing genomic data is crucial to support scientific investigations such as genome-wide association studies. However, recent investigations suggest that the privacy of the individual participants in these studies can be compromised, leading to serious concerns and consequences, such as overly restricted access to data. RESULTS: We introduce a novel cryptographic strategy to securely perform meta-analysis for genetic association studies in large consortia. Our methodology is useful for supporting joint studies among disparate data sites, where privacy or confidentiality is of concern. We validate our method using three multisite association studies. Our research shows that genetic associations can be analyzed efficiently and accurately across substudy sites, without leaking information on individual participants or site-level association summaries. AVAILABILITY AND IMPLEMENTATION: Our software for secure meta-analysis of genetic association studies, SecureMA, is publicly available at http://github.com/XieConnect/SecureMA. Our customized secure computation framework is also publicly available at http://github.com/XieConnect/CircuitService.
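
The abstract does not spell out the statistic being secured, but a standard target for consortium meta-analysis is the fixed-effect inverse-variance combination of per-site effect estimates. Shown in the clear below as a Python sketch (an assumption for illustration; SecureMA's contribution is evaluating such a combination under encryption so the site summaries are never revealed):

```python
import math

def inverse_variance_meta(betas, std_errs):
    """Fixed-effect meta-analysis: combine per-site effect sizes,
    weighting each by the inverse of its variance."""
    weights = [1.0 / se ** 2 for se in std_errs]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se  # pooled effect, its SE, and z-score

# Hypothetical per-site log-odds ratios and standard errors.
print(inverse_variance_meta([0.21, 0.18, 0.25], [0.05, 0.07, 0.06]))
```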


Subject(s)
Genetic Association Studies/methods , Genetic Privacy , Meta-Analysis as Topic , Genome-Wide Association Study/methods , Genomics , Humans , Hypothyroidism/genetics , Obesity/genetics , Software
6.
J Biomed Inform ; 52: 243-50, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25038554

ABSTRACT

OBJECTIVE: Electronic medical record (EMR) data are increasingly incorporated into genome-phenome association studies. Investigators hope to share data, but there are concerns it may be "re-identified" through the exploitation of various features, such as combinations of standardized clinical codes. Formal anonymization algorithms (e.g., k-anonymization) can prevent such violations, but prior studies suggest that the size of the population available for anonymization may influence the utility of the resulting data. We systematically investigate this issue using a large-scale biorepository and EMR system, through which we evaluate the ability of researchers to learn from anonymized data for genome-phenome association studies under various conditions. METHODS: We use a k-anonymization strategy to simulate a data protection process (on datasets containing clinical codes) for resources of similar size to those found at nine academic medical institutions within the United States. Following the protection process, we replicate an existing genome-phenome association study and compare the discoveries made with the protected data and the original data through the correlation (r²) of the p-values of association significance. RESULTS: Our investigation shows that anonymizing an entire dataset with respect to the population from which it is derived yields significantly more utility than anonymizing small study-specific datasets on their own. When evaluated using the correlation of genome-phenome association strengths on anonymized versus original data across all nine simulated sites, the largest-scale anonymizations (population ∼100,000) retained better utility than those at smaller sizes (population ∼6000-75,000). We observed a general trend of increasing r² for larger dataset sizes: r² = 0.9481 for small datasets, r² = 0.9493 for moderately sized datasets, and r² = 0.9934 for large datasets. CONCLUSIONS: This research implies that, regardless of the overall size of an institution's data, there may be significant benefits to anonymizing the entire EMR, even if the institution plans to release only data about a specific cohort of patients.
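
The utility metric here is the squared correlation between association significance on the original and anonymized data. A small Python sketch of that comparison (computed on -log10 p-values, a common transform, which is an assumption; the p-values below are invented for illustration):

```python
import math

def r_squared(p_original, p_anonymized):
    """Squared Pearson correlation of association significance between
    original and anonymized data."""
    x = [-math.log10(p) for p in p_original]
    y = [-math.log10(p) for p in p_anonymized]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

# Hypothetical p-values for five SNP-phenotype associations.
print(r_squared([1e-8, 3e-5, 0.02, 0.4, 0.7],
                [2e-8, 1e-4, 0.03, 0.5, 0.6]))  # near 1: utility retained
```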


Subject(s)
Biomedical Research/methods , Confidentiality , Databases, Genetic , Electronic Health Records , Genetic Association Studies/statistics & numerical data , Sample Size , Algorithms , Computer Simulation , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide
7.
Knowl Based Syst ; 67: 361-372, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25598581

ABSTRACT

Organizations share data about individuals to drive business and to comply with law and regulation. However, an adversary may expose confidential information by tracking an individual across disparate data publications using quasi-identifying attributes (e.g., age, geocode, and sex) associated with the records. Various studies have shown that well-established privacy protection models (e.g., k-anonymity and its extensions) fail to protect an individual's privacy against this "composition attack". This type of attack can be thwarted when organizations coordinate prior to data publication, but such a practice is not always feasible. In this paper, we introduce a probabilistic model called (d, α)-linkable, which mitigates the composition attack without coordination. The model ensures that d confidential values are associated with a quasi-identifying group with a likelihood of α. We realize this model through an efficient extension to k-anonymization and use extensive experiments to show that our strategy significantly reduces the likelihood of a successful composition attack and can preserve more utility than alternative privacy models, such as differential privacy.
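
The abstract states the (d, α) condition only at a high level. Under one simplified reading (an assumption for illustration, not the paper's formal definition), each quasi-identifying group must carry at least d distinct confidential values, none attributable with probability above α:

```python
from collections import Counter

def satisfies_d_alpha(sensitive_values, d: int, alpha: float) -> bool:
    """Toy check for one quasi-identifying group: require at least d
    distinct confidential values, with no single value exceeding
    relative frequency alpha."""
    counts = Counter(sensitive_values)
    if len(counts) < d:
        return False
    return max(counts.values()) / len(sensitive_values) <= alpha

print(satisfies_d_alpha(["flu", "flu", "HIV", "asthma"], d=3, alpha=0.5))  # True
print(satisfies_d_alpha(["flu", "flu", "flu", "HIV"], d=2, alpha=0.5))     # False
```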

8.
9.
PLoS One ; 8(2): e53875, 2013.
Article in English | MEDLINE | ID: mdl-23405076

ABSTRACT

Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that doing so will allow the corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions with respect to the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data.


Subject(s)
Genetic Association Studies/methods , Genetic Privacy/standards , Genome, Human , Genome-Wide Association Study/methods , Genomics/methods , Algorithms , Databases, Factual , Humans , Medical Records Systems, Computerized
10.
CODASPY ; 2013: 59-70, 2013.
Article in English | MEDLINE | ID: mdl-25520961

ABSTRACT

Modern information technologies enable organizations to capture large quantities of person-specific data while providing routine services. Many organizations hope, or are legally required, to share such data for secondary purposes (e.g., validation of research findings) in a de-identified manner. In previous work, it was shown that de-identification policy alternatives could be modeled on a lattice, which could be searched for policies that met a prespecified risk threshold (e.g., likelihood of re-identification). However, the search was limited in several ways. First, its definition of utility was syntactic, based on the level of the lattice, rather than semantic, based on the actual changes induced in the resulting data. Second, the threshold may not be known in advance. The goal of this work is to build the optimal set of policies that trade off between privacy risk (R) and utility (U), which we refer to as an R-U frontier. To model this problem, we introduce a semantic definition of utility, based on information theory, that is compatible with the lattice representation of policies. To solve the problem, we initially build a set of policies that defines a frontier. We then use a probability-guided heuristic to search the lattice for policies likely to update the frontier. To demonstrate the effectiveness of our approach, we perform an empirical analysis with the Adult dataset of the UCI Machine Learning Repository. We show that our approach can construct a frontier closer to optimal than competitive approaches by searching a smaller number of policies. In addition, we show that a frequently followed de-identification policy (i.e., the Safe Harbor standard of the HIPAA Privacy Rule) is suboptimal in comparison to the frontier discovered by our approach.
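
The abstract does not name its exact information-theoretic measure, so the following Python sketch uses one standard proxy (an assumption): the fraction of an attribute's Shannon entropy that survives a generalization policy.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def semantic_utility(original, generalized):
    """Utility of a policy as the share of original entropy retained after
    generalization (a hypothetical stand-in for the paper's measure)."""
    h = entropy(original)
    return entropy(generalized) / h if h else 1.0

ages    = [23, 27, 31, 35, 42, 58, 61, 64]               # raw lattice level
buckets = ["20-39"] * 4 + ["40-59"] * 2 + ["60-79"] * 2  # coarser level
print(semantic_utility(ages, buckets))  # 0.5: half the information survives
```

Unlike a syntactic level count, this score depends on the data itself, so two policies at the same lattice level can yield different utilities.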
