Results 1 - 20 of 23
1.
BMC Med Inform Decis Mak ; 22(1): 269, 2022 10 16.
Article in English | MEDLINE | ID: mdl-36244993

ABSTRACT

OBJECTIVES: This paper developed federated solutions based on two approximation algorithms to achieve federated generalized linear mixed effect models (GLMM). It also proposed a solution to numerical errors and singularity issues, and showed that the two proposed methods perform well in revealing the significance of parameters in distributed datasets, compared with a centralized GLMM algorithm from the R package 'lme4' as the baseline model. METHODS: The log-likelihood function of the GLMM is approximated by two numerical methods (Laplace approximation and Gauss-Hermite approximation, abbreviated as LA and GH), which supports a federated decomposition of the GLMM that brings computation to the data. To tackle the numerical errors and singularity issues caused by the federated setting, a lossless log-sum-exp estimation trick and an adaptive regularization strategy were used. RESULTS: Our proposed method can fit GLMMs to hierarchical data with multiple non-independent levels of observations in a federated setting. The experimental results demonstrate comparable (LA) and superior (GH) performance on simulated and real-world data. CONCLUSION: We modified and compared federated GLMMs with different approximations, which can support researchers in analyzing versatile biomedical data to accommodate mixed effects and address non-independence due to hierarchical structures (e.g., institutes, regions, countries).


Subjects
Algorithms , Research Design , Computer Simulation , Humans , Likelihood Functions , Linear Models
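The first abstract combines a quadrature approximation of the GLMM likelihood with a log-sum-exp trick. Below is a minimal sketch of that combination, not the authors' implementation. It makes several assumptions:

- a logistic model with one random intercept per cluster;
- a fixed 3-point Gauss-Hermite rule;
- hypothetical function names `cluster_loglik` and `site_loglik`.

Each site evaluates only its own clusters and shares one scalar. This additivity is what makes the likelihood decomposable across sites.

```python
import math

# Hypothetical 3-point Gauss-Hermite rule for integrals of the form
# ∫ exp(-z^2) f(z) dz; the nodes are the roots of H_3(z) = 8z^3 - 12z.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)


def logsumexp(values):
    """Compute log(sum(exp(v))) without overflow by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_bernoulli(y, eta):
    """Stable log-likelihood of a 0/1 outcome y under a logit-link linear predictor eta."""
    # log(sigmoid(eta)) = -log(1 + e^-eta); log(1 - sigmoid(eta)) = -log(1 + e^eta)
    return -logsumexp([0.0, -eta]) if y == 1 else -logsumexp([0.0, eta])


def cluster_loglik(ys, xs, beta, sigma):
    """GH approximation of log ∫ prod_j p(y_j | b) N(b; 0, sigma^2) db for one cluster."""
    terms = []
    for z, w in zip(GH_NODES, GH_WEIGHTS):
        # Change of variables b = sqrt(2)*sigma*z turns the normal density into exp(-z^2)/sqrt(pi).
        b = math.sqrt(2.0) * sigma * z
        ll = sum(
            log_bernoulli(y, sum(bk * xk for bk, xk in zip(beta, x)) + b)
            for y, x in zip(ys, xs)
        )
        # Stay in log space: log(w / sqrt(pi)) + ll, combined with log-sum-exp below.
        terms.append(math.log(w) - 0.5 * math.log(math.pi) + ll)
    return logsumexp(terms)


def site_loglik(clusters, beta, sigma):
    """Each site sums the log-likelihoods of its own clusters; only this scalar leaves the site."""
    return sum(cluster_loglik(ys, xs, beta, sigma) for ys, xs in clusters)
```

The global log-likelihood is the sum of `site_loglik` over all sites. A coordinator's optimizer can therefore evaluate it without seeing raw rows. With `sigma = 0` the quadrature collapses to the ordinary logistic log-likelihood, which makes a quick sanity check.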
2.
Front Oncol ; 12: 879607, 2022.
Article in English | MEDLINE | ID: mdl-35814415

ABSTRACT

Proper analysis of high-dimensional human genomic data is necessary to increase human knowledge about fundamental biological questions such as disease associations and drug sensitivity. However, such data contain sensitive private information about individuals and can be used to uniquely identify an individual (i.e., a privacy violation). Therefore, raw genomic datasets cannot be publicly released or shared with researchers. The recent success of deep learning (DL) on diverse problems has shown its suitability for analyzing large volumes of high-dimensional genomic data. Still, DL-based models leak information about their training samples. To overcome this challenge, differential privacy mechanisms can be incorporated into the DL analysis framework, as differential privacy protects individuals' privacy. We proposed a differential-privacy-based DL framework to solve two biological problems: breast cancer status (BCS) and cancer type (CT) classification, and drug sensitivity prediction. To predict BCS and CT from genomic data, we built a differentially private (DP) deep autoencoder (dpAE) using private gene expression datasets; the dpAE performs low-dimensional data representation learning. We used dpAE features to build multiple DP binary classifiers to predict BCS and CT for any individual. To predict drug sensitivity, we used the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. We extracted dpAE features from GDSC to build our DP drug sensitivity prediction model for 265 drugs. Evaluation shows that our proposed DP framework achieves better performance in predicting BCS, CT, and drug sensitivity than previously published DP work.
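The second abstract does not state which mechanism makes the autoencoder differentially private. A widely used option for deep models is DP-SGD: clip each example's gradient, then add Gaussian noise. The sketch below shows one such update step under that assumption; `clip_l2` and `dp_sgd_step` are hypothetical names. It omits privacy accounting, which is needed to translate `noise_multiplier` into an (ε, δ) guarantee.

```python
import math
import random


def clip_l2(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    clipped = [clip_l2(g, clip_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise std scales with the clipping bound, i.e. with one example's maximum influence.
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * g / batch for p, g in zip(params, noisy)]
```

Setting `noise_multiplier=0` reduces the step to clipped mini-batch SGD, which is convenient for checking the clipping logic.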

3.
J Biomed Inform ; 132: 104113, 2022 08.
Article in English | MEDLINE | ID: mdl-35690350

ABSTRACT

The success of Machine Learning (ML) methods has largely been attributed to the quality and quantity of the available data, which can be spread across multiple owners. Federated Learning (FL) over distributed datasets often provides a reliable way to obtain valuable insight. Genomic data, however, are sensitive and require additional safety mechanisms before any sharing or ML operations. We propose a generalized gene expression data sharing method using a differentially private mechanism. Because of the large number of genes, the data dimension is also reduced to accommodate smaller privacy budgets, and an exponential mechanism is used to create a private histogram from numeric expression data. The output histogram can be used in any federated machine learning setting with multiple data owners. The proposed solution was submitted to the genomic data security and privacy competition iDASH 2020, where it ranked third among 55 teams. We extended the proposed solution and experimented with two different machine learning algorithms and different settings. The experimental results show that training a model takes around 8 s while achieving 0.89 AUC with a privacy budget of only 5. The paper outlines a method to share gene expression data for Federated Learning using a privacy-preserving mechanism. The different experimental settings and recent competition results show the efficacy of the method, which can be further extended to other genomic datasets and machine learning algorithms.


Subjects
Machine Learning , Privacy , Algorithms , Genomics , Information Dissemination
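The third abstract builds a private histogram with the exponential mechanism. A generic version of that mechanism is sketched below. The utility function (negative distance from a value to a bin centre), the bin layout, the sensitivity bound and the function names are illustrative assumptions, not the submitted iDASH solution. Each value is released as a bin index chosen with probability proportional to exp(ε·u / (2Δ)). Budget accounting across many genes is omitted.

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Pick a candidate with probability proportional to exp(epsilon * u / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max to avoid overflow
    return rng.choices(candidates, weights=weights, k=1)[0]


def private_bin(value, centres, epsilon, rng):
    """Map one expression value to a bin index; utility is minus the distance to the bin centre."""
    sensitivity = max(centres) - min(centres)  # assumed bound on how much the utility can change
    return exponential_mechanism(
        list(range(len(centres))),
        lambda i: -abs(value - centres[i]),
        epsilon,
        sensitivity,
        rng,
    )


def private_histogram(values, centres, epsilon, rng):
    """Count the privately chosen bins for a list of values."""
    counts = [0] * len(centres)
    for v in values:
        counts[private_bin(v, centres, epsilon, rng)] += 1
    return counts
```

Small ε spreads the output across bins. Very large ε makes the mechanism pick the nearest bin almost surely.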
4.
BMC Genom Data ; 23(1): 45, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35715724

ABSTRACT

BACKGROUND: Technological advances and the digitization of healthcare data have provided the scientific community with a large quantity of genomic data. Such datasets have facilitated a deeper understanding of several diseases and of our health in general. However, these genome datasets require a large storage volume and present technical challenges in retrieving meaningful information. Furthermore, the privacy aspects of genomic data limit access and often hinder timely scientific discovery. METHODS: In this paper, we use the Generalized Suffix Tree (GST), whose construction and applications have been fairly well studied in related areas. The main contribution of this article is a privacy-preserving string query execution framework using GSTs and an additional tree-based hashing mechanism. We first introduce an efficient parallel GST construction that scales to large genomic datasets. The secure indexing scheme allows the genomic data in a GST to be outsourced to an untrusted cloud server under encryption. Additionally, the proposed methods can perform several string search operations (e.g., exact and set-maximal matches) securely and efficiently using the outlined framework. RESULTS: Experimental results on different datasets and parameters in a real cloud environment show the scalability of these methods, which also outperform the state-of-the-art method based on the Burrows-Wheeler Transform (BWT). The proposed method takes only around 36.7 s to execute a set-maximal match, whereas the BWT-based method takes around 160.85 s, a roughly 4× speedup.


Subjects
Cloud Computing , Outsourced Services , Computer Security , Genomics , Privacy
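The fourth abstract indexes sequences with a generalized suffix tree and hashes the tree before outsourcing it. The sketch below conveys the idea with a simpler structure: an uncompressed generalized suffix trie. Unlike a real GST it uses quadratic space. Its node labels are replaced by HMAC-SHA256 tags, so the server can answer exact-match queries from a keyed tag alone. The function names and the use of HMAC are assumptions. The published framework also encrypts the data and supports set-maximal matches. This toy index still reveals which sequence IDs match a query.

```python
import hashlib
import hmac


def build_generalized_suffix_trie(seqs):
    """Uncompressed generalized suffix trie; each node records which sequences contain its path label."""
    root = {"children": {}, "ids": set()}
    for sid, s in enumerate(seqs):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(ch, {"children": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def _tag(key, text):
    """Keyed hash of a substring, so the server never stores the raw substring."""
    return hmac.new(key, text.encode(), hashlib.sha256).hexdigest()


def hashed_index(root, key):
    """Flatten the trie into {HMAC(path label): sequence ids} for outsourcing."""
    index = {}
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        for ch, child in node["children"].items():
            path = label + ch
            index[_tag(key, path)] = frozenset(child["ids"])
            stack.append((child, path))
    return index


def exact_match(index, key, pattern):
    """Client sends only the tag of its pattern; the server looks it up."""
    return set(index.get(_tag(key, pattern), ()))
```

Because every substring appears in the trie as a path label, exact matching reduces to a single tag lookup on the server.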
5.
J Biomed Inform ; 127: 104008, 2022 03.
Article in English | MEDLINE | ID: mdl-35167978

ABSTRACT

The Generalized Linear Mixed Model is one of the most pervasive classes of statistical models and is widely used in the medical domain. Training such models in a collaborative setting often entails privacy risks. Standard privacy-preserving mechanisms such as differential privacy can be used to mitigate the privacy risk while training the model. However, experimental evidence suggests that adding differential privacy to model training can cause significant utility loss, making the model impractical for real-world use. Generalized linear mixed models, which lose their usability under differential privacy, therefore require a different approach to privacy-preserving model training. In this work, we propose a value-blind training method for generalized linear mixed models in a collaborative setting. In our proposed training method, the central server optimizes the parameters of a generalized linear mixed model without ever getting access to the raw training data or to intermediate computation values. Intermediate computation values shared by the collaborating parties with the central server are encrypted using homomorphic encryption. Experiments on multiple datasets suggest that the model trained by our proposed method achieves a very low error rate while preserving privacy. To the best of our knowledge, this is the first work that performs a systematic privacy analysis of generalized linear mixed model training in a collaborative setting.


Subjects
Interdisciplinary Practices , Privacy , Computer Security , Linear Models , Research Design
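The fifth abstract encrypts the intermediate values that parties send to the server with homomorphic encryption, but it does not name the scheme. The sketch below uses textbook Paillier encryption, which is additively homomorphic, to show how a server can sum encrypted contributions without seeing them. The tiny primes make it insecure. The fixed-point encoding, the key-holder role and all function names are assumptions for illustration only.

```python
import math
import random

SCALE = 10**4  # assumed fixed-point scale for encoding real-valued contributions


def _is_prime(k):
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    i = 3
    while i * i <= k:
        if k % i == 0:
            return False
        i += 2
    return True


def _next_prime(k):
    while not _is_prime(k):
        k += 1
    return k


def keygen(start=10**6):
    """Textbook Paillier keys with g = n + 1; primes this small are for demonstration only."""
    p = _next_prime(start)
    q = _next_prime(p + 1)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 gives L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """c = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n (mod n^2)."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, secret, c):
    lam, mu = secret
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def encode(value, n):
    """Fixed-point encode; negative numbers wrap around modulo n."""
    return round(value * SCALE) % n


def decode(m, n):
    return (m - n if m > n // 2 else m) / SCALE


def aggregate(n, ciphertexts):
    """Server side: multiplying ciphertexts adds the underlying plaintexts."""
    acc = 1  # 1 is a valid encryption of 0 here
    for c in ciphertexts:
        acc = acc * c % (n * n)
    return acc
```

In this sketch only the holder of `(lam, mu)` can read the aggregate. The server handles nothing but ciphertexts.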
6.
Sci Rep ; 10(1): 18600, 2020 10 29.
Article in English | MEDLINE | ID: mdl-33122735

ABSTRACT

According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR data is unstructured text, and unlocking hidden details from these data is critical for improving current medical practice and research. However, these textual data contain sensitive information, which could compromise our privacy. Therefore, medical textual data cannot be released publicly without privacy-protective measures. De-identification is the process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and stacked Recurrent Neural Networks. To the best of our knowledge, we are the first to employ these techniques for this task. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanisms, irrespective of the dataset. Additionally, our proposed method is significantly faster than existing techniques. Finally, we introduce three utility metrics to judge the quality of the de-identified data.
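The sixth abstract combines self-attention with a stacked RNN for de-identification. The core of self-attention is scaled dot-product attention, sketched below in plain Python. It assumes the query, key and value vectors have already been produced; a real model learns a separate projection matrix for each. The RNN layers and the token-labelling head are omitted entirely.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def self_attention(Q, K, V):
    """Scaled dot-product attention: each output row is a softmax-weighted mix of V's rows."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return outputs, weights
```

Each token's output mixes information from every other token in the sentence. This lets a downstream tagger use long-range context when deciding whether a token is sensitive.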

7.
BMC Med Inform Decis Mak ; 19(1): 183, 2019 09 07.
Article in English | MEDLINE | ID: mdl-31493797

ABSTRACT

BACKGROUND: Medical data sharing is a major challenge in biomedicine that often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Much effort has been dedicated to de-identifying clinical notes, but it is still very challenging to automatically and accurately locate and scrub all sensitive elements from notes. An alternative approach is to remove sentences that might contain sensitive terms related to personal information. METHODS: A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams, improving privacy protection without significantly decreasing utility. Our work extends this method to clinical notes from distributed sources, with security and privacy considerations. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can be used to guide sentence filtering. RESULTS: Since the computational cost of our proposed framework depends mostly on the cardinality of the intersection of the sets and on the number of data owners, we evaluated the framework in terms of these two factors. Experimental results demonstrate that our proposed method is scalable in various experimental settings. We also evaluated the framework in terms of data utility; this evaluation shows that the proposed method retains enough information for data analysis. CONCLUSION: This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.


Subjects
Artifacts , Computer Security , Information Dissemination , Electronic Health Records , Humans
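The seventh abstract builds on a frequency-based filter that drops sentences containing rare bigrams. The plaintext sketch below shows only that filtering rule, pooled over several hypothetical data owners. In the paper, the pooled counts and the threshold comparison are computed with private set intersection and secure thresholding instead of being revealed. Whitespace tokenisation and the function names are assumptions.

```python
from collections import Counter


def bigrams(sentence):
    """Lower-cased, whitespace-tokenised bigrams of one sentence."""
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))


def filter_rare_bigram_sentences(sites, threshold):
    """Drop every sentence containing a bigram seen fewer than `threshold` times across all sites."""
    counts = Counter()
    for notes in sites:
        for sentence in notes:
            counts.update(bigrams(sentence))
    return [
        [s for s in notes if all(counts[bg] >= threshold for bg in bigrams(s))]
        for notes in sites
    ]
```

Names and other identifiers tend to form rare bigrams, so sentences mentioning them are the ones removed. Common clinical phrasing survives.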
8.
Brief Bioinform ; 20(3): 887-895, 2019 05 21.
Article in English | MEDLINE | ID: mdl-29121240

ABSTRACT

Genomic data hold salient information about the characteristics of a living organism. Over the past decade, major developments have given us more accurate and less expensive methods to obtain human genome sequences. However, with the advancement of genomic research, there is growing privacy concern regarding the collection, storage and analysis of such sensitive human data. Recent results show that, given some background information, an adversary can re-identify an individual from a specific genomic data set. This can reveal that individual's current association with, or future susceptibility to, some diseases (and sometimes the kinship between individuals), resulting in a privacy violation. Despite these risks, genomic data remain highly important for analyzing our well-being and that of future generations. Thus, in this article, we discuss the different privacy and security problems surrounding human genomic data. In addition, we explore some of the cardinal cryptographic concepts that can bring efficacy to secure and private genomic data computation. This article relates these two research areas, cryptography and genomics, and the gaps between them.


Subjects
Genetic Privacy , Genome, Human , Humans , Surveys and Questionnaires
9.
Article in English | MEDLINE | ID: mdl-29993695

ABSTRACT

Recent studies demonstrate that effective healthcare can benefit from using human genomic information. Consequently, many institutions perform statistical analyses of genomic data, mostly based on genome-wide association studies (GWAS). GWAS analyze genome sequence variations to identify genetic risk factors for diseases. These studies often require pooling data from different sources to uncover statistical patterns and relationships between genetic variants and diseases. Here, the primary challenge is to fulfill one major objective: accessing multiple genomic data repositories for collaborative research in a privacy-preserving manner. Because of privacy concerns regarding genomic data, different countries enforce multi-jurisdictional laws and policies on cross-border genomic data sharing. In this article, we present SAFETY, a hybrid framework that can securely perform GWAS on federated genomic datasets. It uses homomorphic encryption and the recently introduced secure hardware component Intel Software Guard Extensions (SGX) to ensure high efficiency and privacy at the same time. Different experimental settings show the efficacy and applicability of such a hybrid framework for securely conducting GWAS. To the best of our knowledge, this hybrid use of homomorphic encryption along with Intel SGX has not been proposed before. SAFETY is up to 4.82 times faster than the best existing secure computation technique.


Subjects
Computer Security , Databases, Genetic , Genome-Wide Association Study/methods , Genomics/methods , Software , Computer Security/legislation & jurisprudence , Computer Security/standards , Genome, Human/genetics , Humans , Time Factors
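The ninth abstract runs GWAS over federated data. One common GWAS statistic is the allelic chi-squared test for a single SNP. It needs only four pooled allele counts, so such counts can be summed under encryption and evaluated afterwards. The sketch below is a plaintext illustration of that statistic with hypothetical function names. It does not reproduce SAFETY's HE/SGX pipeline, and it makes no claim about which GWAS tests SAFETY implements.

```python
def pool_counts(site_counts):
    """Sum (case_alt, case_ref, control_alt, control_ref) allele counts over sites."""
    return tuple(sum(column) for column in zip(*site_counts))


def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared statistic (1 d.f.) for the 2x2 case/control by allele table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2
```

For example, pooling two sites' counts `(10, 40, 20, 30)` and `(20, 30, 30, 20)` gives `(30, 70, 50, 50)`. The resulting statistic is 25/3 ≈ 8.33.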
10.
Article in English | MEDLINE | ID: mdl-29994005

ABSTRACT

Machine learning applications are intensively used in various scientific fields, and increasingly in the biomedical and healthcare sector. Applying predictive modeling to biomedical data introduces privacy and security concerns that require additional protection to prevent accidental disclosure or leakage of sensitive patient information. Significant advances in secure computing methods have emerged in recent years; however, many of them require substantial computational and/or communication overheads, which might hinder their adoption in biomedical applications. In this work, we propose SecureLR, a novel framework that allows researchers to leverage both the computational and storage capacity of public cloud servers to conduct learning and predictions on biomedical data without compromising data security or efficiency. Our model builds upon homomorphic encryption methodologies with hardware-based security reinforcement through Software Guard Extensions (SGX). Our implementation demonstrates a practical hybrid cryptographic solution to important concerns in conducting machine learning with public clouds.


Subjects
Cloud Computing , Computer Security , Logistic Models , Software , Algorithms , Electronic Health Records , Machine Learning , Medical Informatics
11.
IEEE J Biomed Health Inform ; 23(6): 2611-2618, 2019 11.
Article in English | MEDLINE | ID: mdl-30442622

ABSTRACT

Both individuals and enterprises produce genomic data rapidly and continuously. There is a need to outsource such data to the cloud for better flexibility. Outsourcing also helps data owners by eliminating the problem of local storage management. To protect data privacy and security, data owners must encrypt the sensitive data before outsourcing. Since genomic data are enormous in volume, executing researchers' queries securely and efficiently is a challenging task. In this paper, we introduce an indexing algorithm based on the prefix tree to support similar-patient queries. The proposed method guarantees data privacy, query privacy, and output privacy. Privacy is guaranteed through encryption and garbled circuits under the semi-honest adversary model. The overall computation is scalable and fast enough for real-life biomedical applications. Moreover, experimental results show that our method performs better than existing state-of-the-art techniques in this domain.


Subjects
Computer Security , Databases, Genetic , Information Dissemination/methods , Algorithms , Cloud Computing , Genomics , Humans
12.
Comput Methods Programs Biomed ; 165: 129-137, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30337067

ABSTRACT

BACKGROUND AND OBJECTIVE: Cloud computing plays a vital role in big data science with its scalable and cost-efficient architecture. Large-scale genome data storage and computation would benefit from these latest cloud computing infrastructures, saving costs and speeding up discoveries. However, because of privacy and security concerns, data owners are often disinclined to put sensitive data in a public cloud environment without enforcing some protective measures. An ideal solution is to develop a secure genome database that supports encrypted data deposition and query. METHODS: Nevertheless, it is challenging to make such a system fast and scalable enough to handle real-world demands while also providing data security. In this paper, we propose a novel, secure mechanism to support secure count queries on an open-source graph database (Neo4j). We evaluate its performance on a real-world dataset of around 735,317 Single Nucleotide Polymorphisms (SNPs). In particular, we propose a new tree indexing method that offers constant time complexity (proportional to the tree depth), addressing the bottleneck of existing approaches. RESULTS: The proposed method significantly improves query execution runtime compared with existing techniques. It takes less than one minute to execute an arbitrary count query on a 212 GB dataset, while the best-known algorithm takes around 7 min. CONCLUSIONS: The outlined framework and experimental results show the applicability of graph databases for securely storing large-scale genome data in an untrusted environment. Furthermore, the cryptosystem and security assumptions outlined are well suited to such use cases and can be generalized in future work.


Subjects
Computer Security , Databases, Genetic/statistics & numerical data , Genome, Human , Information Storage and Retrieval , Big Data , Cloud Computing , Humans , Polymorphism, Single Nucleotide , Search Engine
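The twelfth abstract (and, in a similar spirit, the thirteenth) answers count queries through a tree index. The plaintext sketch below shows the general idea with a hypothetical counting trie over SNP records. Every node stores how many records share its prefix. A query that fixes the first k positions therefore costs O(k) steps, regardless of the number of records. Wildcard positions (`None`) force a branch over all children, so their cost is not constant. The encryption layer and the Neo4j storage used in the paper are not modelled.

```python
def build_count_trie(records):
    """Insert each SNP record (a sequence of allele symbols) and count records per prefix."""
    root = {"count": 0, "children": {}}
    for record in records:
        node = root
        node["count"] += 1
        for allele in record:
            node = node["children"].setdefault(allele, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def count_query(node, pattern):
    """Count records matching `pattern`; None at a position matches any allele."""
    if not pattern:
        return node["count"]
    head, rest = pattern[0], pattern[1:]
    if head is None:
        return sum(count_query(child, rest) for child in node["children"].values())
    child = node["children"].get(head)
    return count_query(child, rest) if child else 0
```

Storing the count at every node is what avoids scanning the records at query time. The trade-off is one extra integer per node.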
13.
J Biomed Inform ; 81: 41-52, 2018 05.
Article in English | MEDLINE | ID: mdl-29550393

ABSTRACT

Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity, as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also lead to more effective prevention strategies [1]. To reduce sampling error and increase the statistical accuracy of this type of research project, data from different sources need to be brought together, since a single organization does not necessarily possess the required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third-party cloud service provider equipped with enormous storage and computation resources. However, outsourcing data to a third party carries a potential risk of privacy violation for the participants whose genomic sequences or clinical profiles are used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. There are two main contributions. First, the proposed method can handle biomedical data containing both genotype and phenotype. Second, our proposed index tree scheme significantly reduces the computational overhead of executing secure count query operations. In our proposed method, the confidentiality of shared data is ensured through encryption, while the entire computation process remains efficient and scalable for cutting-edge biomedical applications.
We evaluated the efficiency of our proposed method on a database of Single-Nucleotide Polymorphism (SNP) sequences. Experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records, where each record contains 500 SNPs, is approximately 5 s. Executing the query on the same database when it also includes phenotypes requires 69.7 s.


Subjects
Cloud Computing , Computer Security , Genome, Human , Genomics/methods , Medical Informatics/methods , Algorithms , Confidentiality , False Positive Reactions , Genotype , Health Insurance Portability and Accountability Act , Humans , Information Dissemination , Medical Informatics/instrumentation , Outsourced Services , Phenotype , Polymorphism, Single Nucleotide , Privacy , Programming Languages , Registries , United States
14.
JMIR Med Inform ; 6(1): e14, 2018 Mar 05.
Article in English | MEDLINE | ID: mdl-29506966

ABSTRACT

BACKGROUND: Machine learning is an effective data-driven tool that is widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. OBJECTIVE: Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. METHODS: Existing secure computation schemes are not suitable for processing the large-scale data used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework that can securely perform regression analysis, a fundamental machine learning algorithm. The framework uses somewhat homomorphic encryption and the newly introduced secure hardware component Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. RESULTS: Experimental results demonstrate that our proposed method provides a better trade-off between security and efficiency than methods based solely on secure hardware. In addition, there is no approximation error: the computed model parameters are identical to the plaintext results. CONCLUSIONS: To the best of our knowledge, this kind of secure computation model, using a hybrid cryptographic framework that leverages both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated before. Our proposed framework ensures data security and computational efficiency at the same time.

15.
BMC Med Genomics ; 10(Suppl 2): 41, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28786362

ABSTRACT

BACKGROUND: Edit distance is a well-established metric that quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is used to measure human genomic sequence similarity, as it captures the requirements of such comparisons and can lead to better diagnosis of diseases. However, in addition to the computational complexity caused by long genomic sequences, the privacy of these sequences is highly important. As genomic sequences are unique and can identify an individual, they cannot be shared in plaintext. METHODS: In this paper, we propose two different approximation methods to securely compute the edit distance between genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both their advantages and limitations. RESULTS: Experimental results show that our first approximation method is fast and achieves accuracy similar to existing techniques. However, for longer genomic sequences, neither the existing techniques nor our first proposed method achieves good accuracy. Our second approximation method, on the other hand, achieves higher accuracy on such datasets, although it is slower than the first method. CONCLUSION: The proposed algorithms are generally accurate and time-efficient, and they can be applied individually or jointly, as they have complementary properties (runtime vs. accuracy) on different types of datasets.


Subjects
Computer Security , Genomics , Sequence Alignment/methods , Algorithms
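The fifteenth abstract uses the banded alignment algorithm as one ingredient of its secure edit-distance approximations. Below is a plaintext sketch of banded edit distance. Only cells with |i - j| ≤ k are filled, which cuts the work from O(nm) to O(nk). The result is exact whenever the true distance is at most k, and an upper bound otherwise. The secure parts (shingling, private set intersection, garbled circuits) are not shown, and the function name is hypothetical.

```python
def banded_edit_distance(a, b, k):
    """Levenshtein distance restricted to the diagonal band |i - j| <= k.

    Returns None when the length difference alone exceeds k (no path fits in the band).
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    INF = float("inf")
    # Row 0: distance from the empty prefix of `a`, only inside the band.
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match / substitution
        prev = cur
    return prev[m]
```

Choosing k trades accuracy for speed. A narrow band suits similar sequences of nearly equal length, while divergent sequences need a wider band to stay exact.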
16.
BMC Med Genomics ; 10(Suppl 2): 43, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28786364

ABSTRACT

BACKGROUND: Given the enormous need for a federated ecosystem for holding global genomic and clinical data, the Global Alliance for Genomics and Health (GA4GH) has created an international web service called the beacon service. It allows researchers to find out beforehand whether a specific dataset can be used for their research. This simple web service is quite useful, as it answers queries such as whether a certain position of a target chromosome has a specific nucleotide. However, the increasing integration of individuals' genomic data into clinical practice and research has raised serious privacy concerns. Although the answers to such queries in the Beacon network are only yes or no, they have serious privacy implications, as demonstrated in a recent work by Shringarpure and Bustamante. In their attack model, the authors demonstrated that the presence of an individual in any dataset can be determined with a limited number of queries. METHODS: We propose two lightweight algorithms (based on randomized response) that retain the efficacy of a genomic beacon service while preserving the privacy of its participants. We also elaborate on the strengths and weaknesses of the attack by explaining some of its statistical and mathematical models using a real-world genomic database. We extend the original experimental simulations to different adversarial assumptions and parameters. RESULTS: We experimentally evaluated the solutions on the original attack model with different parameters to better understand the privacy and utility trade-offs provided by these two methods. The statistical analysis also elaborates on different aspects of the prior attack, which leads to better risk management for the participants in a beacon service. CONCLUSIONS: The differentially private and lightweight solutions discussed here make it much more difficult for the attack to succeed while maintaining the fundamental motivation of the beacon database network.


Subjects
Algorithms , Computer Security , Genomics , Time Factors
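The sixteenth abstract protects beacon answers with algorithms based on randomized response. The textbook form of randomized response for a single yes/no answer is sketched below. The true answer is returned with probability e^ε / (1 + e^ε) and flipped otherwise, which satisfies ε-differential privacy for that answer. The two algorithms in the paper may differ from this basic form, and the function names are hypothetical.

```python
import math
import random


def truth_probability(epsilon):
    """P(report the true answer) = e^eps / (1 + e^eps), written to avoid overflow for large eps."""
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(true_answer, epsilon, rng):
    """Return the true yes/no answer with probability e^eps/(1+e^eps), else its negation."""
    return true_answer if rng.random() < truth_probability(epsilon) else not true_answer


def estimate_yes_rate(observed_yes_rate, epsilon):
    """Unbiased estimate of the true 'yes' rate from many randomized answers (requires eps > 0).

    Inverts observed = p * true + (1 - p) * (1 - true), where p = truth_probability(eps).
    """
    p = truth_probability(epsilon)
    return (observed_yes_rate - (1.0 - p)) / (2.0 * p - 1.0)
```

A single noisy answer tells an attacker little, while aggregate rates remain recoverable through `estimate_yes_rate`. This is the privacy/utility trade-off the abstract evaluates.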
17.
BMC Med Genomics ; 10(Suppl 2): 48, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28786365

ABSTRACT

BACKGROUND: Advances in DNA sequencing technologies have prompted a wide range of genomic applications that improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for using cloud computing to handle sensitive genomic data. METHODS: We present one of the first implementations of a securely outsourced genetic testing framework based on Software Guard Extensions (SGX). It leverages multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. RESULTS: We compared the performance of the proposed PRESAGE framework with a state-of-the-art homomorphic encryption scheme, as well as with a plaintext implementation. The experimental results demonstrated a significant performance improvement over the homomorphic encryption methods and a small computational overhead compared with the plaintext implementation. CONCLUSIONS: The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.


Subjects
Computer Security , Genetic Testing , Sequence Analysis, DNA , Software , Cloud Computing , Outsourced Services
18.
AMIA Annu Symp Proc ; 2017: 1744-1753, 2017.
Article in English | MEDLINE | ID: mdl-29854245

ABSTRACT

As genomic data are usually large-scale and highly sensitive, it is essential to enable analysis that is both efficient and secure, so that the data owner can securely delegate both computation and storage to an untrusted public cloud. Counting queries on genotypes are a basic function for many downstream applications in biomedical research (e.g., computing allele frequencies or chi-squared statistics). Previous solutions show promise for secure counting of outsourced data, but efficiency is still a major limitation for real-world applications. In this paper, we propose a novel hybrid solution that combines a rigorous theoretical model (homomorphic encryption) and the latest hardware-based infrastructure (i.e., Software Guard Extensions). It speeds up the computation while preserving the privacy of both data owners and data users. Our results, obtained on real data from the Personal Genome Project, demonstrate its efficiency.


Subjects
Cloud Computing , Computer Security , Datasets as Topic , Genetic Privacy , Genomics , Databases, Genetic , Genome, Human , Humans , Models, Theoretical , Software
19.
IEEE J Biomed Health Inform ; 21(5): 1466-1472, 2017 09.
Article in English | MEDLINE | ID: mdl-27834660

ABSTRACT

Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic applications. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, genome sequencing is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and are thus not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider, which may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection for genomic databases. The privacy of individuals is guaranteed by permuting the database and adding fake genomic records to it. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count query and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20,000 records take around 100 and 150 s, respectively.


Subjects
Computer Security , Genomics/standards , Medical Informatics/standards , Outsourced Services/standards , Privacy , Cloud Computing , Databases, Genetic , Humans
20.
BMC Med Inform Decis Mak ; 14 Suppl 1: S2, 2014.
Article in English | MEDLINE | ID: mdl-25521306

ABSTRACT

Advanced sequencing techniques make large genome data available at unprecedented speed and reduced cost. Genome data sharing has the potential to facilitate significant medical breakthroughs. However, privacy concerns have impeded efficient genome data sharing. In this paper, we present a novel approach for disseminating genomic data while satisfying differential privacy. The proposed algorithm splits raw genome sequences into blocks, subdivides the blocks in a top-down fashion, and finally adds noise to counts to preserve privacy. The experimental results suggest that the proposed algorithm can retain data utility to a certain degree, as measured by high sensitivity.


Subjects
Algorithms , Genetic Privacy/standards , Genome-Wide Association Study/standards , Information Dissemination/methods , Humans
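The twentieth abstract splits sequences into blocks, subdivides them top-down and adds noise to counts. The sketch below is a simplified, hypothetical version of that pipeline:

- it halves the column range until every block is at most `leaf_len` wide;
- it builds a histogram over every possible pattern in each block;
- it adds Laplace noise to each count.

It assumes add/remove-one-record neighbours, so each record changes exactly one count per block and the budget is split evenly across blocks. The paper's actual partitioning criteria and budget allocation are not reproduced.

```python
import itertools
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_blocks(start, end, leaf_len):
    """Top-down halving of the column range [start, end) until blocks are at most leaf_len wide."""
    if end - start <= leaf_len:
        return [(start, end)]
    mid = (start + end) // 2
    return split_blocks(start, mid, leaf_len) + split_blocks(mid, end, leaf_len)


def noisy_block_histograms(seqs, leaf_len, epsilon, alphabet="ACGT", rng=None):
    """Release a Laplace-noised histogram of every possible pattern in each leaf block."""
    rng = rng or random.Random()
    blocks = split_blocks(0, len(seqs[0]), leaf_len)
    # Each record contributes one count to every block, so the L1 sensitivity is len(blocks).
    scale = len(blocks) / epsilon
    released = {}
    for s, e in blocks:
        # Enumerate the full pattern domain so that absent patterns also receive noise.
        counts = {"".join(p): 0 for p in itertools.product(alphabet, repeat=e - s)}
        for seq in seqs:
            counts[seq[s:e]] += 1  # assumes every symbol is in `alphabet`
        released[(s, e)] = {pattern: c + laplace(scale, rng) for pattern, c in counts.items()}
    return released
```

Enumerating the full pattern domain keeps the set of released keys independent of the data. The cost is that the domain grows as 4^(block width), which is one reason to keep leaf blocks short.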