ABSTRACT
When agencies release public-use data, they must be cognizant of the potential risk of disclosure associated with making their data publicly available. This issue is particularly pertinent in disease mapping, where small counts pose both inferential challenges and potential disclosure risks. While the small area estimation, disease mapping, and statistical disclosure limitation literatures are individually robust, there have been few intersections between them. Here, we formally propose the use of spatiotemporal data analysis methods to generate synthetic data for public use. Specifically, we analyze ten years of county-level heart disease death counts for multiple age-groups using a Bayesian model that accounts for dependence spatially, temporally, and between age-groups; generating synthetic data from the resulting posterior predictive distribution will preserve these dependencies. After demonstrating the synthetic data's privacy-preserving features, we illustrate their utility by comparing estimates of urban/rural disparities from the synthetic data to those from data with small counts suppressed.
Subjects
Confidentiality/standards, Spatio-Temporal Analysis, Medical Topography, Bayes Theorem, Disclosure, Geographic Mapping, Humans, Statistical Models, Risk, Medical Topography/ethics, Medical Topography/methods, Medical Topography/statistics & numerical data
ABSTRACT
The promoter region of the UDP glucuronosyltransferase 1 gene (UGT1A1) contains a run of thymine-adenine (TA) repeats, usually six, (TA)6. In addition to its association with Gilbert's syndrome, homozygosity for the extended sequence, (TA)7/(TA)7, has been found to be an important risk factor for hyperbilirubinemia and gallstones in patients with hemoglobin E-beta-thalassemia and other intermediate forms of beta-thalassemia. To assess the importance of this polymorphism in these common disorders, a wide-scale population study of the relative frequency of the size alleles of the UGT1A1 promoter has been carried out. Homozygosity for the (TA)7 allele occurs in 10-25% of the populations of Africa and the Indian subcontinent, with a variable frequency in Europe. It occurs at a much lower frequency in Southeast Asia, Melanesia, and the Pacific Islands, ranging from 0 to 5%. African populations show a much greater diversity of length alleles than other populations. These findings identify the populations with a high frequency of hemoglobin E-beta-thalassemia and related disorders that are at increased risk for hyperbilirubinemia and gallbladder disease, and they provide evolutionary insights into how these polymorphisms arose and came to be so unequally distributed among human populations.