1.
Proc IEEE Symp Secur Priv ; 2023: 1908-1925, 2023 May.
Article in English | MEDLINE | ID: mdl-38665901

ABSTRACT

Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
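
The claim that results do not depend on how the data are split among parties can be illustrated in cleartext. The sketch below is not the SF-PCA protocol and uses no encryption: it only assumes that each party shares its row count, column sums, and Gram matrix, and that the leading component is found by plain power iteration. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (row count, column sums, Gram matrix X^T X) computed locally by one party
Summary = Tuple[int, List[float], List[List[float]]]

def local_summary(rows: Sequence[Sequence[float]]) -> Summary:
    """Per-party statistics; the raw rows never leave the party."""
    d = len(rows[0])
    sums = [sum(r[j] for r in rows) for j in range(d)]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    return len(rows), sums, gram

def pooled_covariance(summaries: Sequence[Summary]) -> List[List[float]]:
    """Sample covariance of the union of all parties' rows."""
    d = len(summaries[0][1])
    n = sum(s[0] for s in summaries)
    sums = [sum(s[1][j] for s in summaries) for j in range(d)]
    gram = [[sum(s[2][i][j] for s in summaries) for j in range(d)] for i in range(d)]
    return [[(gram[i][j] - sums[i] * sums[j] / n) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov: List[List[float]], iterations: int = 200) -> List[float]:
    """Leading eigenvector by power iteration (the all-equal start is an assumption)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # start vector lies in the null space; give up
            break
        v = [x / norm for x in w]
    return v
```

Because the pooled sums are the same however the rows are partitioned, the covariance matrix (and hence the components) matches the centralized computation up to floating-point rounding.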

2.
Proc Priv Enhanc Technol ; 2022(3): 732-753, 2022.
Article in English | MEDLINE | ID: mdl-36212774

ABSTRACT

Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals' privacy in the researcher's dataset. In GWAS, the goal of the researcher is to identify highly associated point mutations (variants) with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another GWAS (conducted using publicly available datasets) to distinguish between correct statistics and incorrect ones. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.
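
The abstract does not say which mechanism produces the noisy dataset. As one standard way to obtain local differential privacy on binary values, the sketch below uses randomized response and then debiases the observed frequency; the function names and the choice of mechanism are assumptions, not the authors' method.

```python
import math
import random
from typing import Sequence

def randomize_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def debiased_frequency(reports: Sequence[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from the noisy reports.

    E[observed] = (1 - p) + f * (2p - 1), so f = (observed - (1 - p)) / (2p - 1).
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)
```

A verifier holding only the randomized bits can still recover aggregate frequencies approximately, which is what makes a noisy partial dataset usable for checking published statistics.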

3.
Patterns (N Y) ; 3(5): 100487, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35607628

ABSTRACT

Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, unfeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem by using a privacy-preserving federated learning-based approach, PriCell, for complex models such as convolutional neural networks. PriCell relies on multiparty homomorphic encryption and enables the collaborative training of encrypted neural networks with multiple healthcare institutions. We preserve the confidentiality of each institution's input data, of any intermediate values, and of the trained model parameters. We efficiently replicate the training of a published state-of-the-art convolutional neural network architecture in a decentralized and privacy-preserving manner. Our solution achieves an accuracy comparable with the one obtained with the centralized non-secure solution. PriCell guarantees patient privacy and ensures data utility for efficient multi-center studies involving complex healthcare data.

5.
Nat Commun ; 12(1): 5910, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34635645

ABSTRACT

Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations.


Subject(s)
Precision Medicine , Privacy , Algorithms , Computer Security , Delivery of Health Care , Genome-Wide Association Study , Humans , Kaplan-Meier Estimate , Survival Analysis
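
A Kaplan-Meier curve depends on the data only through per-time counts of events and censorings, which is why it suits federated computation. The cleartext sketch below assumes each site shares only those counts; in a system such as FAMHE they would be aggregated under encryption, which is not shown here. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[float, int]  # (follow-up time, 1 = event observed, 0 = censored)

def local_counts(records: Sequence[Record]) -> Tuple[Counter, Counter]:
    """Per-site summary: number of events and of censorings at each time."""
    events, censored = Counter(), Counter()
    for time, observed in records:
        (events if observed else censored)[time] += 1
    return events, censored

def kaplan_meier(events: Counter, censored: Counter) -> List[Tuple[float, float]]:
    """Survival curve S(t) = prod over event times (1 - d_i / n_i)."""
    at_risk = sum(events.values()) + sum(censored.values())
    survival, curve = 1.0, []
    for time in sorted(set(events) | set(censored)):
        d = events[time]
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((time, survival))
        # everyone with an event or censoring at this time leaves the risk set
        at_risk -= d + censored[time]
    return curve

def federated_kaplan_meier(sites: Sequence[Sequence[Record]]) -> List[Tuple[float, float]]:
    """Sum the per-site counts, then estimate the curve once on the totals."""
    total_events, total_censored = Counter(), Counter()
    for records in sites:
        site_events, site_censored = local_counts(records)
        total_events.update(site_events)      # Counter.update adds counts
        total_censored.update(site_censored)
    return kaplan_meier(total_events, total_censored)
```

Since the pooled counts equal those of the concatenated dataset, the federated curve coincides with the centralized one.
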
6.
Cell Syst ; 12(11): 1108-1120.e4, 2021 11 17.
Article in English | MEDLINE | ID: mdl-34464590

ABSTRACT

Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data remain secure in transit, at rest, and during analysis, and can be decrypted only by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those for the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.


Subject(s)
Outsourced Services , Computer Security , Genome-Wide Association Study , Genotype , Privacy
7.
J Med Internet Res ; 23(2): e25120, 2021 02 25.
Article in English | MEDLINE | ID: mdl-33629963

ABSTRACT

Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administration induced by these contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between 2 novel advanced privacy-enhancing technologies: homomorphic encryption and secure multiparty computation (defined together as multiparty homomorphic encryption). These privacy-enhancing technologies provide a mathematical guarantee of privacy, with multiparty homomorphic encryption providing a performance advantage over separately using homomorphic encryption or secure multiparty computation. We argue multiparty homomorphic encryption fulfills legal requirements for medical data sharing under the European Union's General Data Protection Regulation, which has set a global benchmark for data protection. Specifically, the data processed and shared using multiparty homomorphic encryption can be considered anonymized data. We explain how multiparty homomorphic encryption can reduce the reliance upon customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research while offering additional incentives for health care and research institutes to employ common data interoperability standards.


Subject(s)
Computer Security/ethics , Information Dissemination/ethics , Privacy/legislation & jurisprudence , Technology/methods , Humans
8.
Nat Comput Sci ; 1(3): 192-198, 2021 Mar.
Article in English | MEDLINE | ID: mdl-38183193

ABSTRACT

The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a citizen-centric approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. We show that real-world adoption of our system alleviates widespread privacy concerns and encourages data access sharing with researchers.

9.
J Law Biosci ; 7(1): lsaa010, 2020.
Article in English | MEDLINE | ID: mdl-32733683

ABSTRACT

Personalised medicine can improve both public and individual health by providing targeted preventative and therapeutic healthcare. However, patient health data must be shared between institutions and across jurisdictions for the benefits of personalised medicine to be realised. Whilst data protection, privacy, and research ethics laws protect patient confidentiality and safety, they may also impede multisite research, particularly across jurisdictions. Accordingly, we compare the concept of data accessibility in data protection and research ethics laws across seven jurisdictions. These jurisdictions include Switzerland, Italy, Spain, the United Kingdom (which have implemented the General Data Protection Regulation), the United States, Canada, and Australia. Our paper identifies the requirements for consent, the standards for anonymisation or pseudonymisation, and adequacy of protection between jurisdictions as barriers for sharing. We also identify differences between the European Union and other jurisdictions as a significant barrier for data accessibility in cross-jurisdictional multisite research. Our paper concludes by considering solutions to overcome these legislative differences. These solutions include data transfer agreements and organisational collaborations designed to 'front load' the process of ethics approval, so that subsequent research protocols are standardised. We also allude to technical solutions, such as distributed computing, secure multiparty computation and homomorphic encryption.

10.
Stud Health Technol Inform ; 270: 238-241, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570382

ABSTRACT

One major obstacle to developing precision medicine to its full potential is the privacy concerns related to genomic-data sharing. Even though the academic community has proposed many solutions to protect genomic privacy, these so far have not been adopted in practice, mainly due to their impact on the data utility. We introduce GenoShare, a framework that enables individual citizens to understand and quantify the risks of revealing genome-related privacy-sensitive attributes (e.g., health status, kinship, physical traits) from sharing their genomic data with (potentially untrusted) third parties. GenoShare enables informed decision-making about sharing exact genomic data, by jointly simulating genome-based inference attacks and quantifying the risk stemming from a potential data disclosure.


Subject(s)
Databases, Genetic/ethics , Genetic Privacy , Genomics/ethics , Information Dissemination/ethics , Informed Consent , Confidentiality , Disclosure , Genome , Humans , Medical Record Linkage
11.
Stud Health Technol Inform ; 270: 317-321, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570398

ABSTRACT

Medical studies are usually time consuming, cumbersome and extremely costly to perform, and for exploratory research, their results are also difficult to predict a priori. This is particularly the case for rare diseases, for which finding enough patients is difficult and usually requires research on an international scale. In this case, the process can be even more difficult due to the heterogeneity of data-protection regulations, making the data sharing process particularly hard. In this short paper, we propose MedCo2 (pronounced MedCo square), a distributed system that streamlines the process of a medical study by bridging and enabling both data discovery and data analysis among multiple databases, while protecting data confidentiality and patients' privacy. MedCo2 relies on interactive protocols, homomorphic encryption and differential privacy. It enables the privacy-preserving computations of multiple statistics such as cosine similarity and variance, and the training of machine learning models, on patients that are obliviously selected according to specific criteria among multiple databases.


Subject(s)
Privacy , Cohort Studies , Computer Security , Confidentiality , Humans , Machine Learning
12.
Stud Health Technol Inform ; 270: 1161-1162, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570563

ABSTRACT

MedCo is the first operational system that makes sensitive medical-data available for research in a simple, privacy-conscious and secure way. It enables a consortium of clinical sites to collectively protect their data and to securely share them with investigators, without single points of failure. In this short paper, we report on our ongoing effort for the operational deployment of MedCo within the context of the Swiss Personalized Health Network (SPHN) for the Swiss Molecular Tumor Board.


Subject(s)
Neoplasms , Privacy , Computer Security , Confidentiality , Electronic Health Records , Humans , Power, Psychological , Switzerland
13.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1328-1341, 2019.
Article in English | MEDLINE | ID: mdl-30010584

ABSTRACT

The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo's practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to that of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee.


Subject(s)
Computer Security , Electronic Health Records , Genomics , Medical Informatics/methods , Algorithms , Confidentiality , Genome, Human , Hospitals , Humans , Internet , Mutation , Neoplasms/genetics , Proto-Oncogene Proteins B-raf/genetics , Software
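
The abstract mentions obfuscation to achieve differential privacy without naming a mechanism. A common choice for count queries is the Laplace mechanism, sketched below under that assumption; `noisy_count` is a hypothetical name and is not taken from MedCo.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a patient count with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller values of `epsilon` give stronger privacy and noisier counts; the noise has mean zero, so averages over many independent releases stay close to the true value.
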
14.
IEEE/ACM Trans Comput Biol Bioinform ; 15(5): 1413-1426, 2018.
Article in English | MEDLINE | ID: mdl-30004884

ABSTRACT

Re-use of patients' health records can provide tremendous benefits for clinical research. Yet, when researchers need to access sensitive/identifying data, such as genomic data, in order to compile cohorts of well-characterized patients for specific studies, privacy and security concerns represent major obstacles that make such a procedure extremely difficult if not impossible. In this paper, we address the challenge of designing and deploying in a real operational setting an efficient privacy-preserving explorer for genetic cohorts. Our solution is built on top of the i2b2 (Informatics for Integrating Biology and the Bedside) framework and leverages cutting-edge privacy-enhancing technologies such as homomorphic encryption and differential privacy. Solutions involving homomorphic encryption are often believed to be costly and immature for use in operational environments. Here, we show that, for specific applications, homomorphic encryption is actually a very efficient enabler. Indeed, our solution outperforms prior work by enabling a researcher to securely compute simple statistics on more than 3,000 encrypted genetic variants simultaneously for a cohort of 5,000 individuals in less than 5 seconds with commodity hardware. To the best of our knowledge, our privacy-preserving solution is the first to also be successfully deployed and tested in an operational setting (Lausanne University Hospital).


Subject(s)
Computer Security/standards , Electronic Health Records , Genetic Privacy/standards , Genomics , Medical Informatics Computing , Humans
15.
AMIA Jt Summits Transl Sci Proc ; 2017: 176-185, 2018.
Article in English | MEDLINE | ID: mdl-29888067

ABSTRACT

The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model.
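
To make "computation on the data while the data remains encrypted" concrete, the sketch below implements a toy Paillier cryptosystem, a textbook additively homomorphic scheme. It is an assumption for illustration only: the abstract does not name the scheme used, and the primes here are far too small to be secure. It requires Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random
from typing import Tuple

def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Toy Paillier keys from two primes: public n, private (lambda, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)        # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n: int, message: int, rng: random.Random) -> int:
    """c = (n + 1)^m * r^n mod n^2, with r random and coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, private: Tuple[int, int], ciphertext: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)
```

An untrusted server could combine encrypted per-site counts with `add_encrypted` and return a single ciphertext; only the key holder learns the total, provided the total stays below `n`.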

16.
J Biomed Inform ; 79: 1-6, 2018 03.
Article in English | MEDLINE | ID: mdl-29331453

ABSTRACT

PURPOSE: Protecting patient privacy is a major obstacle for the implementation of genomic-based medicine. Emerging privacy-enhancing technologies can become key enablers for managing sensitive genetic data. We studied physicians' attitude toward this kind of technology in order to derive insights that might foster its future adoption for clinical care. METHODS: We conducted a questionnaire-based survey among 55 physicians of the Swiss HIV Cohort Study who tested the first implementation of a privacy-preserving model for delivering genomic test results. We evaluated their feedback on three different aspects of our model: clinical utility, ability to address privacy concerns and system usability. RESULTS: 38/55 (69%) physicians participated in the study. Two thirds of them acknowledged genetic privacy as a key aspect that needs to be protected to help build patient trust and deploy new-generation medical information systems. All of them successfully used the tool for evaluating their patients' pharmacogenomics risk and 90% were happy with the user experience and the efficiency of the tool. Only 8% of physicians were unsatisfied with the level of information and wanted to have access to the patient's actual DNA sequence. CONCLUSION: This survey, although limited in size, represents the first evaluation of privacy-preserving models for genomic-based medicine. It has allowed us to derive unique insights that will improve the design of these new systems in the future. In particular, we have observed that a clinical information system that uses homomorphic encryption to provide clinicians with risk information based on sensitive genetic test results can offer information that clinicians feel sufficient for their needs and appropriately respectful of patients' privacy. The ability of this kind of system to ensure strong security and privacy guarantees and to provide some analytics on encrypted data has been assessed as a key enabler for the management of sensitive medical information in the near future. Providing clinically relevant information to physicians while protecting patients' privacy in order to comply with regulations is crucial for the widespread use of these new technologies.


Subject(s)
Computer Security , Confidentiality , HIV Infections/genetics , HIV Infections/therapy , Adult , Aged , Cohort Studies , Electronic Health Records , Female , Genetic Variation , Genomics , Genotype , Humans , Male , Medical Informatics , Middle Aged , Physicians , Software , Surveys and Questionnaires , Switzerland , User-Computer Interface
17.
BMC Med Genomics ; 10(Suppl 2): 46, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28786363

ABSTRACT

BACKGROUND: Cloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet, outsourcing the storage and processing of sensitive information, such as genomic data, comes with important concerns related to privacy and security. This calls for new sophisticated techniques that ensure data protection from untrusted cloud providers and that still enable researchers to obtain useful information. METHODS: We present a novel privacy-preserving algorithm for fully outsourcing the storage of large genomic data files to a public cloud and enabling researchers to efficiently search for variants of interest. In order to protect data and query confidentiality from possible leakage, our solution exploits optimal encoding for genomic variants and combines it with homomorphic encryption and private information retrieval. Our proposed algorithm is implemented in C++ and was evaluated on real data as part of the 2016 iDash Genome Privacy-Protection Challenge. RESULTS: Results show that our solution outperforms the state-of-the-art solutions and enables researchers to search over millions of encrypted variants in a few seconds. CONCLUSIONS: As opposed to prior beliefs that sophisticated privacy-enhancing technologies (PETs) are impractical for real operational settings, our solution demonstrates that, in the case of genomic data, PETs are very efficient enablers.


Subject(s)
Computer Security , Genomics , Information Storage and Retrieval/methods , Outsourced Services/methods , Cloud Computing , Models, Theoretical
18.
Bioinformatics ; 33(15): 2273-2280, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28379351

ABSTRACT

MOTIVATION: Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. RESULTS: In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate the efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost considering the 10-month time span usually observed for the completion of the QC procedure that includes timing of logistics. AVAILABILITY AND IMPLEMENTATION: SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. CONTACT: jean-pierre.hubaux@epfl.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Confidentiality , Genome-Wide Association Study/methods , Meta-Analysis as Topic , Quality Control , Genome-Wide Association Study/standards , Humans
19.
J Am Med Inform Assoc ; 24(4): 799-805, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28339683

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertains and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual's whole genome sequence), the individual's membership in a beacon can be inferred through repeated queries for variants present in the individual's genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.


Subject(s)
Data Anonymization , Genetic Privacy , Information Dissemination , Genomics , Humans
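
The abstract gives only the outline of its mitigation strategies. The sketch below is a simplified reading of two of them, under stated assumptions: an allele is reported present only if at least `min_carriers` individuals carry it, and every "yes" answer charges a flat cost of one against a per-user budget for each carrier. The class name, the variant identifiers, and the flat cost are all hypothetical and are not the paper's exact rules.

```python
from typing import Dict, Set, Tuple

class BudgetedBeacon:
    """Allele-presence service that hides rare alleles and rations queries."""

    def __init__(self, genomes: Dict[str, Set[str]],
                 min_carriers: int = 2, budget: int = 3) -> None:
        self.genomes = genomes            # individual id -> variants carried
        self.min_carriers = min_carriers  # below this, always answer "no"
        self.budget = budget              # "yes" answers allowed per (user, individual)
        self.spent: Dict[Tuple[str, str], int] = {}

    def query(self, user: str, variant: str) -> bool:
        # Only carriers who still have budget left for this user may contribute.
        carriers = [ind for ind, variants in self.genomes.items()
                    if variant in variants
                    and self.spent.get((user, ind), 0) < self.budget]
        if len(carriers) < self.min_carriers:
            return False                  # rare allele or exhausted budgets
        for ind in carriers:
            key = (user, ind)
            self.spent[key] = self.spent.get(key, 0) + 1
        return True
```

Budgets are tracked per user, so one user exhausting an individual's budget does not change the answers given to others.
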
20.
Genome Res ; 26(12): 1687-1696, 2016 12.
Article in English | MEDLINE | ID: mdl-27789525

ABSTRACT

In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data.


Subject(s)
Data Compression/methods , Genomics/methods , Information Storage and Retrieval/methods , Algorithms , Computational Biology/methods , Computer Security/standards , Genetic Privacy , Genome, Human , Humans