Results 1 - 12 of 12
1.
Bioresour Technol ; : 130970, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38876285

ABSTRACT

The effects and mitigation mechanisms of biochar added at different composting stages on N2O emission were investigated. Four treatments were set up: CK, control; BB10%, +10 % biochar at the beginning of composting; BB5%&T5%, +5 % biochar at the beginning and +5 % biochar after the thermophilic stage of composting; BT10%, +10 % biochar after the thermophilic stage of composting. Results showed that treatments BB10%, BB5%&T5%, and BT10% reduced total N2O emissions by 55 %, 37 %, and 36 %, respectively. N2O emission was closely related to most physicochemical properties, whereas among functional genes and enzymes it was related only to the amoA gene and hydroxylamine oxidoreductase. Different biochar addition strategies changed the contributions of physicochemical properties, functional genes and enzymes to N2O emission. Organic matter and C/N contributed 23.7 % and 27.6 % of variations in functional gene abundances (P < 0.05), respectively. pH and C/N (P < 0.05) contributed 37.3 % and 17.3 % of variations in functional enzyme activities. These findings provide valuable insights into mitigating N2O emissions during composting.

2.
IEEE Trans Knowl Data Eng ; 30(3): 573-584, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-30034201

ABSTRACT

Privacy concerns in data sharing, especially for health data, are receiving increasing attention. Some patients now agree to open their information for research use, which raises a new question: how to use the public information effectively to better understand the private dataset without breaching privacy. In this paper, we specialize this question to selecting an optimal subset of the public dataset for M-estimators in the framework of differential privacy (DP) in [1]. From the perspective of non-interactive learning, we first construct the weighted private density estimation from the hybrid datasets under DP. Along the same lines as [2], we analyze the accuracy of the DP M-estimators based on the hybrid datasets. Our main contributions are: (i) we find that the bias-variance tradeoff in the performance of our M-estimators can be characterized by the sample size of the released dataset; (ii) based on this finding, we develop an algorithm to select the optimal subset of the public dataset to release under DP. Our simulation studies and application to real datasets confirm our findings and provide a guideline for real applications.

4.
Bioinformatics ; 33(23): 3716-3725, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29036461

ABSTRACT

MOTIVATION: Inappropriate disclosure of human genomes may put the privacy of study subjects and of their family members at risk. Existing privacy-preserving mechanisms for Genome-Wide Association Studies (GWAS) mainly focus on protecting individual information in case-control studies. Protecting privacy in family-based studies is more difficult. The transmission disequilibrium test (TDT) is a powerful family-based association test employed in many rare disease studies. It gathers information about families (most frequently involving parents, affected children and their siblings). It is important to develop privacy-preserving approaches to disclose TDT statistics with a guarantee that the risk of family 're-identification' stays below a pre-specified risk threshold. 'Re-identification' in this context means that an attacker can infer the presence of a family in a study. METHODS: In the context of protecting family-level privacy, we developed and evaluated a suite of differentially private (DP) mechanisms for TDT. They include Laplace mechanisms based on the TDT test statistic, P-values and projected P-values, and exponential mechanisms based on the TDT test statistic and the shortest Hamming distance (SHD) score. RESULTS: Using simulation studies with a small cohort and a large one, we showed that the exponential mechanism based on the SHD score preserves the highest utility and privacy among all proposed DP methods. We provide a guideline on applying our DP TDT to a real dataset in analyzing Kawasaki disease with 187 families and 906 SNPs. There are some limitations, including: (1) our implementation is slow for real-time results generation and (2) handling missing data is still challenging. AVAILABILITY AND IMPLEMENTATION: The software dpTDT is available at https://github.com/mwgrassgreen/dpTDT. CONTACT: mengw1@stanford.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Privacy , Genome-Wide Association Study , Child , Family , Humans , Linkage Disequilibrium , Parents , Single Nucleotide Polymorphism , Software
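
The Laplace mechanism on the TDT test statistic mentioned in the abstract above can be illustrated with a minimal Python sketch. It uses the standard McNemar-type TDT statistic, (b - c)^2 / (b + c), where b and c count transmissions of each allele from heterozygous parents. The function names and the `sensitivity` argument are assumptions made for illustration: the abstract does not state the sensitivity bound the authors use, so the caller has to supply one, and this is not the dpTDT implementation.

```python
import random


def tdt_statistic(b, c):
    """Standard TDT statistic from transmission counts b and c."""
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_tdt(b, c, epsilon, sensitivity, rng=None):
    """Release the TDT statistic with Laplace noise of scale sensitivity / epsilon.

    `sensitivity` is a placeholder (assumption): it must be a valid bound on how
    much the statistic can change when one family is added or removed.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    return tdt_statistic(b, c) + laplace_noise(sensitivity / epsilon, rng)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of utility, which is the tradeoff the abstract evaluates across mechanisms.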
5.
Adv Knowl Discov Data Min (2017) ; 10234: 615-627, 2017 May.
Article in English | MEDLINE | ID: mdl-28932827

ABSTRACT

Differential privacy has recently emerged in private statistical aggregate analysis as one of the strongest privacy guarantees. A limitation of the model is that it provides the same privacy protection for all individuals in the database. However, data owners commonly have different privacy preferences for their data. Consequently, a global differential privacy parameter may provide excessive privacy protection for some users while providing insufficient protection for others. In this paper, we propose two partitioning-based mechanisms, privacy-aware and utility-based partitioning, to handle personalized differential privacy parameters for each individual in a dataset while maximizing the utility of the differentially private computation. Privacy-aware partitioning minimizes the privacy budget waste, while utility-based partitioning maximizes the utility for a given aggregate analysis. We also develop a t-round partitioning to take full advantage of remaining privacy budgets. Extensive experiments using real datasets show the effectiveness of our partitioning mechanisms.

6.
J Am Med Inform Assoc ; 24(4): 799-805, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28339683

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertain and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual's whole genome sequence), the individual's membership in a beacon can be inferred through repeated queries for variants present in the individual's genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.


Subjects
Data Anonymization , Genetic Privacy , Information Dissemination , Genomics , Humans
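
The third strategy in the abstract above, budgeting accesses per user for each individual genome, can be sketched roughly as follows. This is a deliberately simplified illustration, not the authors' algorithm: the class name `BudgetedBeacon`, the flat cost of one unit per contributing "yes" answer, and the variant tuple layout are all assumptions.

```python
from collections import defaultdict


class BudgetedBeacon:
    """Toy beacon that stops using a genome once a user has drawn on it too often."""

    def __init__(self, genomes, budget):
        # genomes: dict mapping individual id -> set of (chrom, pos, allele) tuples
        self.genomes = genomes
        self.budget = budget  # assumed: same budget for every (user, individual) pair
        self.spent = defaultdict(int)  # (user, individual) -> accesses charged so far

    def query(self, user, variant):
        """Answer yes/no, considering only individuals with budget left for this user."""
        carriers = [
            person
            for person, variants in self.genomes.items()
            if variant in variants and self.spent[(user, person)] < self.budget
        ]
        # Charge every individual whose genome contributed to a "yes" answer.
        for person in carriers:
            self.spent[(user, person)] += 1
        return bool(carriers)
```

Once an individual's budget for a given user is exhausted, that genome no longer influences answers to that user, which limits how many informative responses a repeated-query attack can collect about one person.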
7.
J Am Med Inform Assoc ; 22(6): 1212-9, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26159465

ABSTRACT

OBJECTIVE: The Cox proportional hazards model is a widely used method for analyzing survival data. Achieving sufficient statistical power in a survival analysis usually requires a large amount of data. Data sharing across institutions could be a potential workaround for providing this added power. METHODS AND MATERIALS: The authors develop a web service for distributed Cox model learning (WebDISCO), which focuses on the proof-of-concept and algorithm development for federated survival analysis. The sensitive patient-level data can be processed locally and only the less-sensitive intermediate statistics are exchanged to build a global Cox model. Mathematical derivation shows that the proposed distributed algorithm is identical to the centralized Cox model. RESULTS: The authors evaluated the proposed framework at the University of California, San Diego (UCSD), Emory, and Duke. The experimental results show that both distributed and centralized models result in near-identical model coefficients with differences in the range [Formula: see text] to [Formula: see text]. The results confirm the mathematical derivation and show that the implementation of the distributed model can achieve the same results as the centralized implementation. LIMITATION: The proposed method serves as a proof of concept, in which a publicly available dataset was used to evaluate the performance. The authors do not intend to suggest that this method can resolve policy and engineering issues related to the federated use of institutional data, but the results should serve as evidence of the technical feasibility of the proposed approach. CONCLUSIONS: WebDISCO (Web-based Distributed Cox Regression Model; https://webdisco.ucsd-dbmi.org:8443/cox/) provides a proof-of-concept web service that implements a distributed algorithm to conduct distributed survival analysis without sharing patient-level data.


Subjects
Algorithms , Proportional Hazards Models , Survival Analysis , Computer Communication Networks , Datasets as Topic , Clinical Decision Support Systems , Humans , Information Dissemination/methods , Internet
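
The idea in the WebDISCO abstract above, that sites exchange only intermediate statistics and still recover the centralized Cox fit, can be illustrated with a small Python sketch. It is not the WebDISCO code: it assumes a single covariate, Breslow handling of ties, plain Newton-Raphson updates, and that sites are willing to share their distinct event times. All function names and the toy data are hypothetical.

```python
import math


def local_stats(site, beta, event_times):
    """Per-site risk-set summaries; site is a list of (time, event, x) records."""
    stats = []
    for t in event_times:
        risk = [x for (time, _, x) in site if time >= t]
        s0 = sum(math.exp(beta * x) for x in risk)
        s1 = sum(x * math.exp(beta * x) for x in risk)
        s2 = sum(x * x * math.exp(beta * x) for x in risk)
        d = sum(1 for (time, ev, _) in site if ev and time == t)
        sx = sum(x for (time, ev, x) in site if ev and time == t)
        stats.append((s0, s1, s2, d, sx))
    return stats


def fit_cox(sites, iters=25):
    """Newton-Raphson on the partial likelihood using only summed site statistics."""
    event_times = sorted({time for site in sites for (time, ev, _) in site if ev})
    beta = 0.0
    for _ in range(iters):
        per_site = [local_stats(site, beta, event_times) for site in sites]
        grad = hess = 0.0
        for k in range(len(event_times)):
            # The coordinator only ever sees these sums, never patient rows.
            s0, s1, s2, d, sx = [sum(st[k][i] for st in per_site) for i in range(5)]
            grad += sx - d * s1 / s0
            hess -= d * (s2 / s0 - (s1 / s0) ** 2)
        if hess == 0:
            break
        beta -= grad / hess
    return beta


# Toy records: (follow-up time, event indicator, covariate value).
site_a = [(2, 1, 1.0), (3, 1, 0.0), (5, 0, 0.0), (9, 1, 0.0)]
site_b = [(4, 1, 1.0), (6, 1, 0.0), (7, 1, 1.0), (10, 0, 1.0)]
distributed = fit_cox([site_a, site_b])
pooled = fit_cox([site_a + site_b])
print(distributed, pooled)
```

Because the gradient and Hessian of the partial likelihood are sums over risk sets, adding the per-site sums gives the same update as pooling the records, up to floating-point rounding.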
8.
BMC Med Inform Decis Mak ; 14 Suppl 1: S3, 2014.
Article in English | MEDLINE | ID: mdl-25521367

ABSTRACT

In response to the growing interest in genome-wide association study (GWAS) data privacy, the Integrating Data for Analysis, Anonymization and SHaring (iDASH) center organized the iDASH Healthcare Privacy Protection Challenge, with the aim of investigating the effectiveness of applying privacy-preserving methodologies to human genetic data. This paper is based on a submission to the iDASH Healthcare Privacy Protection Challenge. We apply privacy-preserving methods adapted from Uhler et al. 2013 and Yu et al. 2014 to the challenge's data and analyze the data utility after the data are perturbed by the privacy-preserving methods. Major contributions of this paper include a new interpretation of the χ2 statistic in a GWAS setting and new results about the Hamming distance score, a key component of one of the privacy-preserving methods.


Subjects
Genetic Privacy/standards , Genome-Wide Association Study/standards , Information Dissemination/methods , Humans
9.
BMC Med Genomics ; 7 Suppl 1: S14, 2014.
Article in English | MEDLINE | ID: mdl-25079786

ABSTRACT

BACKGROUND: Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. METHODOLOGY: In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. EXPERIMENTS AND RESULTS: We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. CONCLUSION: Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.


Subjects
Data Mining/methods , Medical Informatics/methods , Privacy , Access to Information , Algorithms , Breast Neoplasms/epidemiology , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Humans , Logistic Models , Patient Discharge/statistics & numerical data
10.
Trans Data Priv ; 6(1): 19-34, 2013 Apr.
Article in English | MEDLINE | ID: mdl-24409205

ABSTRACT

A reasonable compromise of privacy and utility exists at an "appropriate" resolution of the data. We proposed novel mechanisms to achieve privacy preserving data publishing (PPDP) satisfying ε-differential privacy with improved utility through component analysis. The mechanisms studied in this article are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The differential PCA-based PPDP serves as a general-purpose data dissemination tool that guarantees better utility (i.e., smaller error) compared to Laplacian and Exponential mechanisms using the same "privacy budget". Our second mechanism, the differential LDA-based PPDP, favors data dissemination for classification purposes. Both mechanisms were compared with state-of-the-art methods to show performance differences.

11.
AMIA Annu Symp Proc ; 2013: 1429-37, 2013.
Article in English | MEDLINE | ID: mdl-24551418

ABSTRACT

Pain is a common but significant problem that is considered a high priority area of care. Although there are many pain assessment scales that can be applied to patients who can communicate, either verbally or non-verbally, pain assessment for minimally responsive patients is limited. In this preliminary work, we developed a novel approach for assessing pain in such patients using a principal component analysis (PCA)-based local detector. Our algorithm produces a single index to indicate the increase in pain level based on unsynchronized, sparse and noisy time series data collected from electronic flowsheets. Among 8032 patient cases collected, 53 cases that satisfied the data requirements for PCA were used in this experiment. Our preliminary results suggest that the approach is promising, yielding an average AUC of 0.76 for the 53 cases.


Subjects
Algorithms , Biological Models , Pain Measurement/methods , Unconsciousness , Area Under Curve , Communication Barriers , Humans , ROC Curve
12.
Mach Learn ; 93(1): 163-183, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24482559

ABSTRACT

This paper analyzes a novel method for publishing data while still protecting privacy. The method is based on computing weights that make an existing dataset, for which there are no confidentiality issues, analogous to the dataset that must be kept private. The existing dataset may be genuine but public already, or it may be synthetic. The weights are importance sampling weights, but to protect privacy, they are regularized and have noise added. The weights allow statistical queries to be answered approximately while provably guaranteeing differential privacy. We derive an expression for the asymptotic variance of the approximate answers. Experiments show that the new mechanism performs well even when the privacy budget is small, and when the public and private datasets are drawn from different populations.
