1.
Proc Natl Acad Sci U S A ; 105(46): 17608-13, 2008 Nov 18.
Article in English | MEDLINE | ID: mdl-19015533

ABSTRACT

Datasets describing the health status of individuals are important for medical research but must be used cautiously to protect patient privacy. For patient data containing geographical identifiers, the conventional solution is to aggregate the data by large areas. This method often preserves privacy but suffers from substantial information loss, which degrades the quality of subsequent disease mapping or cluster detection studies. Other heuristic methods for de-identifying spatial patient information do not quantify the risk to individual privacy. We develop an optimal method based on linear programming to add noise to individual locations that preserves the distribution of a disease. The method ensures a small, quantitative risk of individual re-identification. Because the amount of noise added is minimal for the desired degree of privacy protection, the de-identified set is ideal for spatial epidemiological studies. We apply the method to patients in New York County, New York, showing that privacy is guaranteed while moving patients 25-150 times less than aggregation by zip code.


Subjects
Confidentiality; Disease; Case-Control Studies; Cluster Analysis; Geography; Humans; Linear Models; New York; Population Density
2.
Int J Health Geogr ; 7: 45, 2008 Aug 12.
Article in English | MEDLINE | ID: mdl-18700031

ABSTRACT

BACKGROUND: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach that enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. RESULTS: We produce multiple anonymized data sets from a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the corresponding reduction in privacy). With ten anonymized copies of an original data set, we find a substantial decrease in average distance, from 0.7 km to 0.2 km, between the estimated, re-identified address and the original address. With fifty anonymized copies, we find a decrease in average distance from 0.7 km to 0.1 km. CONCLUSION: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem that include infrastructure to support the safe disclosure of anonymized medical data, preventing inference or re-identification of original address data, and the use of a Markov-process-based algorithm to mitigate this risk.


Subjects
Algorithms; Epidemiologic Methods; Geographic Information Systems; Population Surveillance; Privacy; Humans; Markov Chains; Normal Distribution
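The averaging attack in record 2 can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes the blurring is isotropic Gaussian noise applied independently to each axis, and the names blur, averaging_attack, mean_error_km and SIGMA_KM are hypothetical. SIGMA_KM is chosen so that a single blurred copy is displaced about 0.7 km on average (the mean of a Rayleigh distribution is sigma * sqrt(pi/2)). Under this idealized model, averaging k copies shrinks the per-axis noise by sqrt(k), so the expected error is 0.7 / sqrt(k) km, about 0.22 km for k = 10 and 0.10 km for k = 50.

```python
import math
import random

def blur(point, sigma_km, rng):
    """Shift a point (x, y), in km, by independent Gaussian noise on each axis."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))

def averaging_attack(point, copies, sigma_km, rng):
    """Estimate the original point by averaging `copies` independently blurred releases."""
    xs, ys = zip(*(blur(point, sigma_km, rng) for _ in range(copies)))
    return (sum(xs) / copies, sum(ys) / copies)

def mean_error_km(points, copies, sigma_km, seed=0):
    """Average distance between each true point and the attacker's estimate."""
    rng = random.Random(seed)
    total = 0.0
    for p in points:
        total += math.dist(p, averaging_attack(p, copies, sigma_km, rng))
    return total / len(points)

# Hypothetical noise level: one copy lands about 0.7 km from the true address on average.
SIGMA_KM = 0.7 / math.sqrt(math.pi / 2)

if __name__ == "__main__":
    rng = random.Random(42)
    addresses = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(2000)]
    for k in (1, 10, 50):
        print(k, round(mean_error_km(addresses, k, SIGMA_KM), 2))
```

The point of the sketch is that independent, zero-mean noise cancels out when the same record is released more than once, which is why the abstract turns to controlled disclosure and a different (Markov-process) perturbation instead.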
3.
BMC Med Inform Decis Mak ; 7: 15, 2007 Jun 13.
Article in English | MEDLINE | ID: mdl-17567912

ABSTRACT

BACKGROUND: For real-time surveillance, detection of abnormal disease patterns is based on the difference between observed patterns and those predicted by models of historical data. The usefulness of outbreak detection strategies depends on their specificity; the false alarm rate affects the interpretation of alarms. RESULTS: We evaluate the specificity of five traditional models: autoregressive, Serfling, trimmed seasonal, wavelet-based, and generalized linear. We apply each to 12 years of emergency department visits for respiratory infection syndromes at a pediatric hospital, finding that the specificity of the five models was almost always a non-constant function of the day of the week, month, and year of the study (p < 0.05). We develop an outbreak detection method, called the expectation-variance model, based on generalized additive modeling, which achieves constant specificity by accounting not only for the expected number of visits but also for the variance of the number of visits. The expectation-variance model achieves constant specificity on all three time scales, as well as earlier detection and improved sensitivity compared with traditional methods in most circumstances. CONCLUSION: Modeling the variance of visit patterns enables real-time detection with known, constant specificity at all times. With constant specificity, public health practitioners can better interpret alarms and better evaluate the cost-effectiveness of surveillance systems.


Subjects
Computer Systems; Disease Outbreaks; Emergency Service, Hospital/statistics & numerical data; Hospitals, Pediatric/statistics & numerical data; Models, Statistical; Sentinel Surveillance; Adolescent; Algorithms; Child; Electronic Data Processing; Humans; Seasons; Sensitivity and Specificity; Time
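Why modeling the variance in record 3 yields constant specificity can be shown with a normal approximation. This is a hedged sketch, not the authors' generalized additive model: it assumes a model has already produced an expected count and a variance for each day, and the per-day numbers and the names ev_threshold, true_specificity and days are hypothetical. A mean-only rule that assumes Poisson variance (variance equal to the mean) gives a specificity that drifts between days (about 0.93 vs 0.79 with these made-up numbers), whereas a threshold set from both the expectation and the variance gives 0.95 on every day.

```python
from statistics import NormalDist

def ev_threshold(expected, variance, specificity=0.95):
    """Alarm threshold from a predicted mean and variance (normal approximation)."""
    return NormalDist(expected, variance ** 0.5).inv_cdf(specificity)

def true_specificity(threshold, expected, variance):
    """Probability that a non-outbreak day stays at or below the threshold."""
    return NormalDist(expected, variance ** 0.5).cdf(threshold)

# Hypothetical per-day predictions: (expected visits, variance of visits).
days = {"summer weekday": (20.0, 25.0), "winter Monday": (60.0, 240.0)}

if __name__ == "__main__":
    for name, (mu, var) in days.items():
        # Mean-only rule: assumes variance == mean, ignoring extra variability.
        poisson_rule = mu + 1.645 * mu ** 0.5
        print(name,
              round(true_specificity(poisson_rule, mu, var), 3),
              round(true_specificity(ev_threshold(mu, var), mu, var), 3))
```

With a known, fixed false alarm rate per day, the expected number of false alarms over any period is simply the number of days times (1 - specificity), which is what makes alarms comparable across weekdays, months and years.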
4.
Proc Natl Acad Sci U S A ; 104(22): 9404-9, 2007 May 29.
Article in English | MEDLINE | ID: mdl-17519338

ABSTRACT

Existing disease cluster detection methods either cannot detect clusters of all shapes and sizes or identify highly irregular sets that overestimate the true extent of the cluster. We introduce a graph-theoretical method for detecting arbitrarily shaped clusters, based on the Euclidean minimum spanning tree of cartogram-transformed case locations, which overcomes these shortcomings. The method is illustrated using several clusters, including historical data sets from West Nile virus and inhalational anthrax outbreaks. Sensitivity and accuracy comparisons with the prevailing cluster detection method show that the method performs similarly on approximately circular historical clusters and greatly improves detection for noncircular clusters.


Subjects
Anthrax/epidemiology; West Nile Fever/epidemiology; Anthrax/microbiology; Anthrax/pathology; Boston/epidemiology; Cluster Analysis; New York/epidemiology; Russia/epidemiology; Sensitivity and Specificity; Time Factors; West Nile Fever/pathology; West Nile Fever/virology
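The core data structure in record 4, a Euclidean minimum spanning tree over case locations, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the cartogram transform (which adjusts locations for population density) is omitted, and the cluster-extraction step here is a generic single-linkage heuristic that cuts tree edges longer than a hypothetical max_len, not the paper's cluster statistic. The function names are hypothetical. Because clusters follow tree edges rather than a fixed circle, an L-shaped group of cases stays together.

```python
import math

def euclidean_mst(points):
    """Prim's algorithm on the complete graph of points; O(n^2), fine for small n."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cut_long_edges(n, edges, max_len):
    """Keep MST edges no longer than max_len; return connected components (union-find)."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, w in edges:
        if w <= max_len:
            root[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    # An L-shaped (noncircular) group of cases plus a distant pair.
    cases = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (10, 10), (10, 11)]
    print(cut_long_edges(len(cases), euclidean_mst(cases), max_len=1.5))
```

Prim's algorithm is used here only for brevity; for large point sets an EMST is usually built from a Delaunay triangulation rather than the complete graph.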
5.
J Mol Biol ; 340(1): 179-90, 2004 Jun 25.
Article in English | MEDLINE | ID: mdl-15184029

ABSTRACT

Networks are proving to be central to the study of gene function, protein-protein interaction, and biochemical pathway data. Visualization of networks is important for their study, but visualization tools are often inadequate for working with very large biological networks. Here, we present an algorithm, called large graph layout (LGL), which can be used to dynamically visualize large networks on the order of hundreds of thousands of vertices and millions of edges. LGL applies a force-directed iterative layout guided by a minimal spanning tree of the network in order to generate coordinates for the vertices in two or three dimensions, which are subsequently visualized and interactively navigated with companion programs. We demonstrate the use of LGL in visualizing an extensive protein map summarizing the results of approximately 21 billion sequence comparisons between 145579 proteins from 50 genomes. Proteins are positioned in the map according to sequence homology and gene fusions, with the map ultimately serving as a theoretical framework that integrates inferences about gene function derived from sequence homology, remote homology, gene fusions, and higher-order fusions. We confirm that protein neighbors in the resulting map are functionally related, and that distinct map regions correspond to distinct cellular systems, enabling a computational strategy for discovering proteins' functions on the basis of the proteins' map positions. Using the map produced by LGL, we infer general functions for 23 uncharacterized protein families.


Subjects
Algorithms; Protein Interaction Mapping; Proteins/physiology; Amino Acid Sequence; Databases, Protein; Humans; Protein Binding; Proteins/chemistry; Sequence Homology, Amino Acid
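The force-directed iteration that record 5 builds on can be sketched in a few dozen lines. This is a generic Fruchterman-Reingold-style layout, not LGL itself: it omits the minimal-spanning-tree guidance the abstract describes, runs in O(n^2) per iteration, and is nowhere near the scale of hundreds of thousands of vertices. The function name, the ideal edge length k, the starting temperature 0.1 and the linear cooling schedule are all assumptions chosen for illustration. Connected vertices attract with force d^2 / k, every pair repels with force k^2 / d, and each move is capped by a temperature that cools toward zero.

```python
import math
import random

def _unit(pos, i, j):
    """Unit vector from vertex j to vertex i, plus their distance."""
    dx = pos[i][0] - pos[j][0]
    dy = pos[i][1] - pos[j][1]
    d = max(math.hypot(dx, dy), 1e-9)
    return dx / d, dy / d, d

def force_directed_layout(n, edges, iterations=300, seed=0):
    """Fruchterman-Reingold-style 2D layout for vertices 0..n-1."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = 1.0 / math.sqrt(n)  # assumed ideal edge length for a unit-area drawing
    for step in range(iterations):
        temp = 0.1 * (1 - step / iterations)  # max move per step, cooled linearly
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):  # repulsion between every pair of vertices
            for j in range(i + 1, n):
                ux, uy, d = _unit(pos, i, j)
                f = k * k / d
                disp[i][0] += ux * f
                disp[i][1] += uy * f
                disp[j][0] -= ux * f
                disp[j][1] -= uy * f
        for u, v in edges:  # attraction along each edge
            ux, uy, d = _unit(pos, u, v)
            f = d * d / k
            disp[u][0] -= ux * f
            disp[u][1] -= uy * f
            disp[v][0] += ux * f
            disp[v][1] += uy * f
        for i in range(n):  # move each vertex, step length capped by temperature
            length = math.hypot(disp[i][0], disp[i][1])
            if length > 0:
                move = min(length, temp)
                pos[i][0] += disp[i][0] / length * move
                pos[i][1] += disp[i][1] / length * move
    return [tuple(p) for p in pos]

if __name__ == "__main__":
    print(force_directed_layout(4, [(0, 1), (1, 2), (2, 3)]))
```

Seeding the layout from a spanning tree, as the abstract describes for LGL, gives the iteration a good starting skeleton so that fewer cooling steps are needed on very large graphs.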