Results 1 - 13 of 13
1.
Nat Comput Sci ; 1(3): 175-176, 2021 Mar.
Article in English | MEDLINE | ID: mdl-38183194
2.
Comput Struct Biotechnol J ; 18: 913-921, 2020.
Article in English | MEDLINE | ID: mdl-32346464

ABSTRACT

While the majority of population-level genome sequencing initiatives claim to follow the principles of informed consent, the requirements for informed consent have not been well defined in this context. In fact, the implementation of informed consent differs greatly across these initiatives, spanning broad consent, blanket consent, and tiered consent, among others. This calls for an investigation into the requirements for consent to be "informed" in the context of population genomics. One strategy that claims to be fully informed and to engage participants continuously is called "dynamic consent". Dynamic consent is based on a personalised communication platform that aims to facilitate the consent process and is designed to support continuous two-way communication between researchers and participants. In this paper, we analyze the requirements of informed consent in the context of population genomics, review various current implementations of dynamic consent, and assess whether they fulfill the requirements of informed consent and, in turn, enable participants to make autonomous and informed choices about whether to participate in research projects.

3.
Comput Struct Biotechnol J ; 17: 463-474, 2019.
Article in English | MEDLINE | ID: mdl-31007872

ABSTRACT

Informed consent is the result of tumultuous events in both the clinical and research arenas over the last 100 years. Throughout this time, the notion of informed consent has shifted tremendously, owing both to advances in medicine and to the type of data being gathered. As a result, informed consent has become misaligned with the goals of medical research. It is increasingly vital to address this gap and to begin building new frameworks that bridge it. We therefore address three goals in this paper. First, we discuss the history of informed consent and unify the varying definitions of the term. Second, we evaluate the current research on the topic, classify it into themes, and address the problems therein. Lastly, we use these themes to provide guidance and insight for future research in the area.

4.
JMIR Med Inform ; 7(2): e12702, 2019 Apr 29.
Article in English | MEDLINE | ID: mdl-31033449

ABSTRACT

BACKGROUND: Biomedical research often requires large cohorts and necessitates the sharing of biomedical data with researchers around the world, which raises many privacy, ethical, and legal concerns. In the face of these concerns, privacy experts are exploring approaches for analyzing distributed data while protecting its privacy. Many of these approaches are based on secure multiparty computation (SMC). SMC is an attractive approach that allows multiple parties to jointly carry out calculations on their datasets without revealing their own raw data; however, it incurs heavy computation time and requires extensive communication between the parties involved. OBJECTIVE: This study aimed to develop usable and efficient SMC applications that meet the needs of potential end users and to raise general awareness of SMC as a tool that supports data sharing. METHODS: We introduced distributed statistical computing (DSC) into the design of secure multiparty protocols, which allows computations to be conducted independently at each party's site and then combined to form a single estimator for the collective dataset, thus limiting communication to the final step and reducing complexity. The effectiveness of our privacy-preserving model is demonstrated through a linear regression application. RESULTS: Our secure linear regression algorithm was tested for accuracy and performance using real and synthetic datasets. The results showed no loss of accuracy relative to nonsecure regression and very good performance (20 min for 100 million records). CONCLUSIONS: We used DSC to securely calculate a linear regression model over multiple datasets. Our experiments showed very good performance in terms of the number of records the method can handle. We plan to extend our method to other estimators, such as logistic regression.
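
The DSC idea of computing locally and combining once can be illustrated with a minimal sketch, assuming a simple one-predictor linear regression in which each site shares only five aggregate sums. The function names `local_stats` and `combine` are hypothetical, and the secure (SMC) aggregation of those sums that the abstract describes is deliberately not implemented: here the sums are added in the clear.

```python
"""Sketch: one regression estimate from per-site sufficient statistics.

Assumption: simple linear regression y = b0 + b1*x. Each site releases only
(n, sum x, sum y, sum x^2, sum xy); the SMC protection of these sums is omitted.
"""

from typing import List, Tuple

Stats = Tuple[float, float, float, float, float]


def local_stats(xs: List[float], ys: List[float]) -> Stats:
    """Computed independently at each site: (n, sum x, sum y, sum x^2, sum xy)."""
    return (
        float(len(xs)),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )


def combine(stats: List[Stats]) -> Tuple[float, float]:
    """Single communication step: add the site sums, solve the normal equations."""
    n, sx, sy, sxx, sxy = (sum(s[i] for s in stats) for i in range(5))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

Because the normal equations depend only on these sums, the combined estimate equals the pooled-data estimate (up to floating-point rounding), which is consistent with the "no loss of accuracy" result reported above.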

5.
Hum Genomics ; 12(1): 19, 2018 04 10.
Article in English | MEDLINE | ID: mdl-29636096

ABSTRACT

Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are a high volume of information and a high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data repositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. These challenges, along with best practices and current research efforts, are discussed across the steps of data collection, storage, sharing, analysis, and knowledge dissemination.


Subject(s)
Biomedical Research/trends , Databases, Genetic , Genomics , Humans
6.
J Biomed Inform ; 66: 231-240, 2017 02.
Article in English | MEDLINE | ID: mdl-28126604

ABSTRACT

The problem of biomedical data sharing is a form of gambling: on one hand it incurs the risk of privacy violations, and on the other it stands to profit from knowledge discovery. In general, the risk of granting data access to a user depends heavily on the data requested, the purpose of the access, the user requesting the data (user motives), and the security of the user's environment. While traditional manual biomedical data sharing processes (based on institutional review boards) are lengthy and demanding, automated ones (known as honest broker systems) disregard the particulars of individual requests and offer "one-size-fits-all" solutions to all data requestors. In this manuscript, we propose a conceptual risk-aware data sharing system that incorporates the risk derived from all contextual information surrounding a data request into the data disclosure decision module. The decision module, in turn, imposes mitigation measures to counter the calculated risk.
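
To make the idea of a decision module that maps contextual risk to mitigations concrete, here is a toy sketch. Everything in it is an assumption for illustration only: the four factor scores in [0, 1], the weights, the thresholds, and the mitigation names are hypothetical placeholders, not values or rules from the paper.

```python
"""Toy risk-aware disclosure decision (all scores, weights, and rules hypothetical)."""

from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    data_sensitivity: float  # 0 = aggregate counts, 1 = identifiable records
    purpose_risk: float      # 0 = approved research aim, 1 = unclear purpose
    requester_risk: float    # 0 = trusted, audited user, 1 = unknown user
    environment_risk: float  # 0 = secure enclave, 1 = unmanaged personal device


# Hypothetical weights; they sum to 1 so the combined score stays in [0, 1].
WEIGHTS = (0.4, 0.2, 0.2, 0.2)


def risk_score(r: Request) -> float:
    """Weighted combination of the contextual factors of one request."""
    factors = (r.data_sensitivity, r.purpose_risk, r.requester_risk, r.environment_risk)
    return sum(w * f for w, f in zip(WEIGHTS, factors))


def mitigations(score: float) -> List[str]:
    """Map the calculated risk to countermeasures rather than a plain yes/no."""
    if score < 0.3:
        return ["release"]
    if score < 0.6:
        return ["release", "generalize quasi-identifiers", "sign data use agreement"]
    if score < 0.8:
        return ["release in secure enclave only", "audit all queries"]
    return ["deny"]
```

The contrast with a "one-size-fits-all" honest broker is that two requests for the same data can receive different mitigations when their purpose, requester, or environment differ.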


Subject(s)
Computer Security , Information Dissemination , Ethics Committees, Research , Humans , Privacy , Risk
7.
BMC Med Inform Decis Mak ; 13: 114, 2013 Oct 05.
Article in English | MEDLINE | ID: mdl-24094134

ABSTRACT

BACKGROUND: Our objective was to develop a model for measuring re-identification risk that more closely mimics the behaviour of an adversary by accounting for repeated attempts at matching and verification of matches, and to apply it to evaluate the risk of re-identification for Canada's post-marketing adverse drug event database (ADE). Re-identification is only demonstrably plausible for deaths in ADE. A matching experiment between ADE records and virtual obituaries constructed from Statistics Canada vital statistics was simulated. A new re-identification risk measure is considered: it assumes that after gathering all the potential matches for a patient record (all records in the obituaries that are potential matches for an ADE record), an adversary tries to verify these potential matches. Two adversary scenarios were considered: (a) a mildly motivated adversary who will stop after one verification attempt, and (b) a highly motivated adversary who will attempt to verify all the potential matches and is limited only by practical or financial considerations. METHODS: The mean percentage of records in ADE that had a high probability of being re-identified was computed. RESULTS: Under scenario (a), the risk of re-identification from disclosing the province, age at death, gender, and exact date of the report is quite high, but removing the province brings the risk down significantly. When only the date of reporting is generalized to month and year and all other variables are included, the risk is always low. All ADE records have a high risk of re-identification under scenario (b), but the plausibility of that scenario is limited by financial and practical deterrents, even for highly motivated adversaries. CONCLUSIONS: It is possible to disclose Canada's adverse drug event database while ensuring that plausible re-identification risks are acceptably low. Our new re-identification risk model is suitable for such risk assessments.
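
The two adversary scenarios can be sketched under simplifying assumptions that are not taken from the paper: the true obituary always appears among a record's potential matches, candidates are verified in random order, and a record counts as "high risk" above a hypothetical threshold of 0.2. Verifying `attempts` of `m` candidates then succeeds with probability attempts/m.

```python
"""Sketch of match-and-verify re-identification risk (simplified assumptions)."""

from typing import Optional, Sequence


def success_probability(num_candidates: int, attempts: Optional[int]) -> float:
    """P(adversary confirms the true match) after verifying `attempts` candidates.

    attempts=1 models the mildly motivated adversary (scenario a);
    attempts=None models an adversary who verifies every candidate (scenario b).
    """
    if num_candidates == 0:
        return 0.0
    if attempts is None:
        return 1.0
    # Random order without replacement: the true match is among the first k with prob k/m.
    return min(attempts, num_candidates) / num_candidates


def pct_high_risk(candidate_counts: Sequence[int], attempts: Optional[int],
                  threshold: float = 0.2) -> float:
    """Percentage of records whose success probability exceeds the (hypothetical) threshold."""
    risky = sum(1 for m in candidate_counts
                if success_probability(m, attempts) > threshold)
    return 100.0 * risky / len(candidate_counts)
```

In this simplified model, removing or generalizing a variable such as the province enlarges each record's candidate set, which lowers the scenario (a) probability 1/m; under scenario (b), every record is high risk regardless, mirroring the pattern reported in the abstract.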


Subject(s)
Adverse Drug Reaction Reporting Systems/standards , Confidentiality , Canada , Humans , Risk Assessment
8.
BMC Med Inform Decis Mak ; 12: 66, 2012 Jul 09.
Article in English | MEDLINE | ID: mdl-22776564

ABSTRACT

BACKGROUND: De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately, then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, so it must be estimated. METHODS: We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were selected, either because they had been evaluated in the past and found to have good accuracy or because they were new and had not previously been evaluated comparatively: the Zayatz estimator, the slide negative binomial estimator, Pitman's estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs. RESULTS: No single estimator performed well across all of the conditions. We developed a decision rule that selects among the Pitman, slide negative binomial, and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule consistently had the best median relative error across multiple conditions and data sets. CONCLUSION: This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
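
As a rough illustration of what is being estimated and how accuracy is scored, the sketch below computes uniqueness (the fraction of records whose quasi-identifier combination occurs exactly once) and summarizes estimator error across simulation runs. The four estimators themselves are not implemented; the `estimates` argument is a hypothetical stand-in for their outputs, and the error is taken as signed relative error (an assumption).

```python
"""Sketch: uniqueness and the median relative error / IQR of an estimator."""

from collections import Counter
from statistics import median, quantiles
from typing import Hashable, List, Sequence, Tuple


def uniqueness(records: Sequence[Hashable]) -> float:
    """Fraction of records whose quasi-identifier tuple occurs exactly once."""
    counts = Counter(records)
    uniques = sum(1 for c in counts.values() if c == 1)
    return uniques / len(records)


def relative_error_summary(estimates: List[float], true_value: float) -> Tuple[float, float]:
    """Median signed relative error and its inter-quartile range over runs."""
    errors = [(e - true_value) / true_value for e in estimates]
    q1, _, q3 = quantiles(errors, n=4)  # quartile cut points (Python 3.8+)
    return median(errors), q3 - q1
```

In a simulation like the one described, `uniqueness` would be applied to the full population to obtain the true value, while each estimator sees only a sample drawn at the chosen sampling fraction.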


Subject(s)
Confidentiality/legislation & jurisprudence , Information Storage and Retrieval/legislation & jurisprudence , Medical Records Systems, Computerized/legislation & jurisprudence , Databases, Factual , Humans , Information Management/organization & administration , Medical Record Linkage , Medical Records Systems, Computerized/organization & administration
9.
PLoS One ; 7(7): e39915, 2012.
Article in English | MEDLINE | ID: mdl-22768321

ABSTRACT

INTRODUCTION: Monitoring the effectiveness of HPV vaccination in Canada may require linking multiple data registries. These registries are not always managed by the same organization, and privacy legislation or practices may restrict the record linkages that can actually be performed among them. The objective of this study was to develop a secure protocol for linking data from different registries that allows ongoing monitoring of HPV vaccine effectiveness. METHODS: A secure linking protocol using commutative hash functions and secure multi-party computation techniques was developed. This protocol allows exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements that ensure patient confidentiality and privacy. The statistics considered were the odds ratio and its confidence interval, the chi-square test, and the relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles. The computation time of this protocol was evaluated. RESULTS: The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. DISCUSSION: A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of assessing the impact of the HPV vaccine initiative. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs.
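
Exact matching with a commutative function can be sketched as follows, assuming the registries share a patient identifier and using modular exponentiation E_k(x) = H(x)^k mod P, for which (x^ka)^kb = (x^kb)^ka. This is a toy single-process simulation and not the paper's protocol: the modulus P = 2**127 - 1 is an unvetted illustrative choice, and the odds ratio (with a Woolf log-scale confidence interval) is computed in the clear, whereas the paper also protects the statistics step with SMC.

```python
"""Toy sketch: commutative-exponentiation record matching, then an odds ratio."""

import hashlib
import math
import secrets
from typing import List, Set, Tuple

P = 2**127 - 1  # Mersenne prime modulus (toy choice, not for real deployments)


def new_key() -> int:
    """Secret exponent coprime to P - 1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def h(identifier: str) -> int:
    """Hash an identifier into the integers mod P."""
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P


def enc(value: int, key: int) -> int:
    return pow(value, key, P)


def match(ids_a: List[str], ids_b: List[str]) -> Set[str]:
    """Identifiers of registry A that also occur in registry B (both keys simulated here)."""
    ka, kb = new_key(), new_key()
    # A -> B: A's blinded ids; B blinds them again and returns them in the same order.
    double_a = [enc(enc(h(i), ka), kb) for i in ids_a]
    # B -> A: B's blinded ids; A blinds them again.
    double_b = {enc(enc(h(j), kb), ka) for j in ids_b}
    # Commutativity makes equal identifiers collide after double blinding.
    return {i for i, d in zip(ids_a, double_a) if d in double_b}


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio ad/bc with a Woolf (log-scale) confidence interval.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)
```

Neither registry sees the other's raw identifiers, only blinded values; the matched set could then populate the 2x2 table (for example, vaccinated versus not, infected versus not) that the statistics are computed on.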


Subject(s)
Algorithms , Papillomaviridae , Papillomavirus Infections/epidemiology , Population Surveillance/methods , Registries , Female , Humans , Male , Papillomavirus Infections/prevention & control , Papillomavirus Vaccines/therapeutic use
10.
BMC Med Inform Decis Mak ; 11: 53, 2011 Aug 23.
Article in English | MEDLINE | ID: mdl-21861894

ABSTRACT

BACKGROUND: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of these data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF and to de-identify a national DAD PUMF consisting of 10% of records. METHODS: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. RESULTS: Two different PUMF files were produced, one with geographic information and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm results in less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to have the highest rates of suppression.
CONCLUSIONS: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
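
A naive single-pass local-suppression baseline illustrates what a 0.05 threshold means in practice. It is not the paper's new algorithm, which minimizes suppression while maximizing precision. The sketch assumes a record's re-identification probability is 1 / (number of records sharing its quasi-identifier values), computed on the file itself, and it suppresses one chosen field (for example, a diagnosis code) in risky records. Records that would still exceed the threshold after this single pass would need further generalization or suppression, which the sketch does not attempt.

```python
"""Naive single-pass local suppression against a 0.05 re-identification threshold."""

from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def reid_probabilities(rows: List[Record], qis: List[str]) -> List[float]:
    """1 / equivalence-class size on the chosen quasi-identifiers, per record."""
    keys = [tuple(r[q] for q in qis) for r in rows]
    sizes = Counter(keys)
    return [1.0 / sizes[k] for k in keys]


def suppress(rows: List[Record], qis: List[str], field: str,
             threshold: float = 0.05) -> float:
    """Replace `field` with "*" in records above the threshold (modifies rows in place).

    Returns the fraction of records whose `field` was suppressed.
    """
    probs = reid_probabilities(rows, qis)
    suppressed = 0
    for r, p in zip(rows, probs):
        if p > threshold and r[field] != "*":
            r[field] = "*"
            suppressed += 1
    return suppressed / len(rows)
```

Under this simplified risk measure, a threshold of 0.05 means that every equivalence class must contain at least 20 records, which is why small regions, long stays, and rarely admitted age groups attract the most suppression.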


Subject(s)
Algorithms , Databases, Factual , Patient Discharge/statistics & numerical data , Canada , Humans , Information Storage and Retrieval , Length of Stay
11.
J Am Med Inform Assoc ; 16(5): 670-82, 2009.
Article in English | MEDLINE | ID: mdl-19567795

ABSTRACT

BACKGROUND: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified. OBJECTIVE: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and is suitable for health datasets. DESIGN: The authors empirically compared OLA (Optimal Lattice Anonymization) with three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. MEASUREMENT: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's speed was also evaluated. RESULTS: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. CONCLUSIONS: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
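
The per-node check that a lattice search such as OLA repeats can be sketched as follows, assuming the data have already been generalized to one lattice node. Equivalence classes smaller than k are suppressed if they fit within the suppression limit, and the discernability metric follows its common definition: each retained record costs the size of its class, and each suppressed record costs the dataset size. OLA's search over the lattice itself is not shown, and the function name is hypothetical.

```python
"""Sketch: k-anonymity check with a suppression limit, scored by discernability."""

from collections import Counter
from typing import Hashable, Optional, Sequence


def discernability_if_k_anonymous(qi_tuples: Sequence[Hashable], k: int,
                                  max_suppression: float) -> Optional[int]:
    """Discernability metric for this generalization, or None if more than
    `max_suppression` (a fraction of records) would have to be suppressed."""
    n = len(qi_tuples)
    classes = Counter(qi_tuples)
    small = sum(size for size in classes.values() if size < k)
    if small > max_suppression * n:
        return None  # this lattice node does not achieve k-anonymity
    retained = sum(size * size for size in classes.values() if size >= k)
    return retained + small * n
```

A lower score means less information loss; a globally optimal algorithm returns the k-anonymous lattice node that minimizes the chosen metric.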


Subject(s)
Algorithms , Confidentiality , Medical Records Systems, Computerized , Adolescent , Adult , Female , Humans , Information Storage and Retrieval , Male
12.
Can J Hosp Pharm ; 62(4): 307-19, 2009 Jul.
Article in English | MEDLINE | ID: mdl-22478909

ABSTRACT

BACKGROUND: Pharmacies often provide prescription records to private research firms on the assumption that these records are de-identified (i.e., identifying information has been removed). However, concerns have been expressed that patients could be re-identified from such records. Recently, a large private research firm requested prescription records from the Children's Hospital of Eastern Ontario (CHEO) as part of a larger effort to develop a database of hospital prescription records across Canada. OBJECTIVE: To evaluate the ability to re-identify patients from CHEO's prescription records and to determine ways to de-identify the data appropriately if the risk was too high. METHODS: The risk of re-identification was assessed for 18 months' worth of prescription data. De-identification algorithms were developed to reduce the risk to an acceptable level while maintaining the quality of the data. RESULTS: The probability of patients being re-identified from the original variables and data set requested by the private research firm was deemed quite high. A new de-identified record layout was developed, which had an acceptable level of re-identification risk. The new approach involved replacing the admission and discharge dates with the quarter and year of admission and the length of stay in days, reporting the patient's age in weeks, and including only the first character of the patient's postal code. Additional requirements were included in the data-sharing agreement with the private research firm (e.g., audit requirements and a protocol for notification of a breach of privacy). CONCLUSIONS: Without a formal analysis of the risk of re-identification, assurances of data anonymity may not be accurate. A formal risk analysis at one hospital produced a clinically relevant data set that also protects patient privacy and allows the hospital pharmacy to explicitly manage the risks of breach of patient privacy.
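
The de-identified record layout described above maps cleanly onto a small transformation. This sketch assumes hypothetical field names, computes "age in weeks" as completed weeks between birth and admission, and uses calendar quarters; none of these details are specified in the abstract.

```python
"""Sketch of the de-identified layout (field names and age reference date assumed)."""

from dataclasses import dataclass
from datetime import date


@dataclass
class RawRecord:
    birth_date: date
    admission_date: date
    discharge_date: date
    postal_code: str


@dataclass
class DeidRecord:
    admission_quarter: str   # quarter and year of admission, e.g. "2008-Q3"
    length_of_stay_days: int
    age_weeks: int           # completed weeks at admission (assumption)
    postal_prefix: str       # first character of the postal code only


def deidentify(r: RawRecord) -> DeidRecord:
    quarter = (r.admission_date.month - 1) // 3 + 1
    return DeidRecord(
        admission_quarter=f"{r.admission_date.year}-Q{quarter}",
        length_of_stay_days=(r.discharge_date - r.admission_date).days,
        age_weeks=(r.admission_date - r.birth_date).days // 7,
        postal_prefix=r.postal_code.strip()[:1].upper(),
    )
```

Keeping the length of stay preserves a clinically useful quantity even though both exact dates are removed.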

13.
J Am Med Inform Assoc ; 15(5): 627-37, 2008.
Article in English | MEDLINE | ID: mdl-18579830

ABSTRACT

OBJECTIVE: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate such concerns, it is possible to anonymize the data before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets. DESIGN: Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets. MEASUREMENT: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric. RESULTS: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, and this over-anonymization is most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared with baseline k-anonymity. CONCLUSION: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
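
One way to see why sample-based k-anonymity can over-anonymize is to compare two common risk models; this is an assumption about the scenarios, since the abstract does not name them. If the adversary knows the target is in the released sample, a record's risk is 1/f, where f is its equivalence-class size in the sample. If the adversary instead matches against a population registry, the risk is 1/F, where F is the class size in the population and F >= f. Baseline k-anonymity enforces f >= k even when F is much larger. The hypothesis testing approach from the paper is not shown.

```python
"""Sketch: sample-based versus population-based maximum re-identification risk.

Assumes every sample record also appears in the population list.
"""

from collections import Counter
from typing import Hashable, Sequence


def max_risk_sample(sample: Sequence[Hashable]) -> float:
    """Worst-case 1/f when the adversary knows the target is in the sample."""
    return 1.0 / min(Counter(sample).values())


def max_risk_population(sample: Sequence[Hashable],
                        population: Sequence[Hashable]) -> float:
    """Worst-case 1/F when the adversary matches released records to a population registry."""
    pop_sizes = Counter(population)
    return max(1.0 / pop_sizes[q] for q in set(sample))
```

With a small sampling fraction, a record can be unique in the sample (risk 1.0 under the first model) yet share its values with many people in the population (a far lower risk under the second), so enforcing f >= k generalizes more than the second scenario requires.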


Subject(s)
Computer Security , Confidentiality , Information Storage and Retrieval , Medical Records Systems, Computerized , Risk Management/methods , Algorithms , Computer Simulation , Humans , Ontario , Risk