Results 1 - 20 of 24
1.
Sci Adv ; 10(28): eadj9260, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38985874

ABSTRACT

Despite machine learning models being widely used today, the relationship between a model and its training dataset is not well understood. We explore correlation inference attacks: whether and when a model leaks information about the correlations between the input variables of its training dataset. We first propose a model-less attack, where an adversary exploits the spherical parameterization of correlation matrices alone to make an informed guess. Second, we propose a model-based attack, where an adversary exploits black-box model access to infer the correlations using minimal and realistic assumptions. Third, we evaluate our attacks against logistic regression and multilayer perceptron models on three tabular datasets and show the models to leak correlations. We lastly show how extracted correlations can be used as building blocks for attribute inference attacks and enable weaker adversaries. Our results raise fundamental questions on what a model does and should remember from its training set.
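As a rough illustration of why the structure of a correlation matrix alone can narrow down an unknown correlation (the intuition behind a model-less guess), the sketch below applies the standard positive-semidefiniteness condition for a 3x3 correlation matrix. It is not the paper's spherical-parameterization procedure; the function name `feasible_correlation_range` and the example values are assumptions made for illustration.

```python
import math


def feasible_correlation_range(rho_x1_y: float, rho_x2_y: float) -> tuple[float, float]:
    """Interval of values rho(X1, X2) can take given rho(X1, Y) and rho(X2, Y).

    The bounds follow from requiring the 3x3 correlation matrix of
    (X1, X2, Y) to be positive semidefinite.
    """
    for rho in (rho_x1_y, rho_x2_y):
        if not -1.0 <= rho <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    centre = rho_x1_y * rho_x2_y
    half_width = math.sqrt((1 - rho_x1_y ** 2) * (1 - rho_x2_y ** 2))
    return centre - half_width, centre + half_width


# Hypothetical values: both inputs correlate strongly with the target variable.
low, high = feasible_correlation_range(0.9, 0.8)
print(f"rho(X1, X2) must lie in [{low:.3f}, {high:.3f}]")
```

The stronger the two known correlations, the narrower the interval, so an adversary who knows only how each input relates to the target can already do better than guessing uniformly in [-1, 1].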

2.
Sci Adv ; 10(29): eadn7053, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39018389

ABSTRACT

Information about us, our actions, and our preferences is created at scale through surveys or scientific studies or as a result of our interaction with digital devices such as smartphones and fitness trackers. The ability to safely share and analyze such data is key for scientific and societal progress. Anonymization is considered by scientists and policy-makers as one of the main ways to share data while minimizing privacy risks. In this review, we offer a pragmatic perspective on the modern literature on privacy attacks and anonymization techniques. We discuss traditional de-identification techniques and their strong limitations in the age of big data. We then turn our attention to modern approaches to share anonymous aggregate data, such as data query systems, synthetic data, and differential privacy. We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today.
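To make one of the modern approaches named above concrete, here is a minimal sketch of the Laplace mechanism, a textbook building block of differential privacy, applied to a counting query. It is a generic illustration and not taken from the review; the function name `laplace_count`, the count of 120, and the epsilon value are assumptions.

```python
import random


def laplace_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two i.i.d. exponentials with rate epsilon
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(0)  # seeded only so the illustration is repeatable
releases = [laplace_count(120, epsilon=0.5, rng=rng) for _ in range(5)]
print([round(r, 1) for r in releases])
```

Smaller epsilon values add more noise and give stronger protection, at the cost of accuracy; a production system would also need to track the privacy budget spent across queries, which this sketch does not do.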

4.
Patterns (N Y) ; 4(1): 100662, 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36699738

ABSTRACT

Despite proportionality being one of the tenets of data protection laws, we currently lack a robust analytical framework to evaluate the reach of modern data collections and the network effects at play. Here, we propose a graph-theoretic model and notions of node- and edge-observability to quantify the reach of networked data collections. We first prove closed-form expressions for our metrics and quantify the impact of the graph's structure on observability. Second, using our model, we quantify how (1) from 270,000 compromised accounts, Cambridge Analytica collected 68.0M Facebook profiles; (2) from surveilling 0.01% of the nodes in a mobile phone network, a law enforcement agency could observe 18.6% of all communications; and (3) an app installed on 1% of smartphones could monitor the location of half of the London population through close proximity tracing. Better quantifying the reach of data collection mechanisms is essential to evaluate their proportionality.
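The idea of observability can be sketched on a toy graph. The code below assumes a simplified reading in which a compromised node exposes itself and its direct neighbours, and an edge is observed when at least one endpoint is compromised; the paper's formal definitions may differ, and the function name `observed_fraction` and the example graph are hypothetical.

```python
def observed_fraction(
    adjacency: dict[str, set[str]], compromised: set[str]
) -> tuple[float, float]:
    """Return (fraction of nodes observed, fraction of edges observed).

    Assumes an undirected graph given as a symmetric adjacency mapping.
    """
    edges = {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}
    if not adjacency or not edges:
        raise ValueError("graph must contain at least one edge")
    # A compromised node reveals itself and every direct neighbour.
    observed_nodes = set(compromised)
    for node in compromised:
        observed_nodes |= adjacency[node]
    # An edge is observed when at least one endpoint is compromised.
    observed_edges = {edge for edge in edges if edge & compromised}
    return len(observed_nodes) / len(adjacency), len(observed_edges) / len(edges)


# Toy graph: a hub "a" with three neighbours, plus one edge two hops away.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
node_obs, edge_obs = observed_fraction(graph, compromised={"a"})
print(f"nodes observed: {node_obs:.0%}, edges observed: {edge_obs:.0%}")
```

Even in this small example, compromising one well-connected node out of five exposes most of the graph, which is the network effect the abstract quantifies at scale.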

5.
Sci Adv ; 8(33): eabl6464, 2022 Aug 19.
Article in English | MEDLINE | ID: mdl-35984877

ABSTRACT

Behavioral data, collected from our daily interactions with technology, have driven scientific advances. Yet, the collection and sharing of this data raise legitimate privacy concerns, as individuals can often be reidentified. Current identification attacks, however, require auxiliary information to roughly match the information available in the dataset, limiting their applicability. We here propose an entropy-based profiling model to learn time-persistent profiles. Using auxiliary information about a single target collected over a nonoverlapping time period, we show that individuals are correctly identified 79% of the time in a large location dataset of 0.5 million individuals and 65.2% for a grocery shopping dataset of 85,000 individuals. We further show that accuracy only slowly decreases over time and that the model is robust to state-of-the-art noise addition. Our results show that much more auxiliary information than previously believed can be used to identify individuals, challenging deidentification practices and what currently constitutes legally anonymous data.

7.
Nat Commun ; 13(1): 313, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35078995

ABSTRACT

Fine-grained records of people's interactions, both offline and online, are collected at large scale. These data contain sensitive information about whom we meet, talk to, and when. We demonstrate here how people's interaction behavior is stable over long periods of time and can be used to identify individuals in anonymous datasets. Our attack learns the profile of an individual using geometric deep learning and triplet loss optimization. In a mobile phone metadata dataset of more than 40k people, it correctly identifies 52% of individuals based on their 2-hop interaction graph. We further show that the profiles learned by our method are stable over time and that 24% of people are still identifiable after 20 weeks. Our results suggest that people with well-balanced interaction graphs are more identifiable. Applying our attack to Bluetooth close-proximity networks, we show that even 1-hop interaction graphs are enough to identify people more than 26% of the time. Our results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together making them personal data under the European Union's General Data Protection Regulation.
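For readers unfamiliar with triplet loss, the following sketch shows the generic objective: embeddings of the same person at two time periods are pulled together, and embeddings of different people are pushed apart by at least a margin. The vectors are hand-written stand-ins for the output of the geometric deep learning model, which is not reproduced here; the function names and the margin value are assumptions, and some formulations use squared distances instead.

```python
import math


def euclidean(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# anchor/positive: same person in two periods; negative: a different person.
anchor, positive = (0.0, 0.0), (0.0, 1.0)
far_negative, near_negative = (3.0, 4.0), (0.0, 1.5)

print(triplet_loss(anchor, positive, far_negative))   # negative already far away
print(triplet_loss(anchor, positive, near_negative))  # negative too close, loss > 0
```

At identification time, an attack of this kind would typically rank candidate profiles by embedding distance to the target's profile and pick the closest one.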

9.
Patterns (N Y) ; 2(3): 100204, 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33748793

ABSTRACT

Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.

11.
Nat Commun ; 10(1): 3069, 2019 Jul 23.
Article in English | MEDLINE | ID: mdl-31337762

ABSTRACT

While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
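The link between how rare a record is and how likely it is to be unique can be sketched with elementary probability: if a combination of attributes occurs with probability p, the chance that none of the other n - 1 people in the population share it is (1 - p)^(n - 1). The sketch below assumes, purely for simplicity, that attributes are independent so that p is a product of marginal frequencies; the copula-based method described above exists precisely to model the dependencies this shortcut ignores. The function name and the frequencies are hypothetical.

```python
import math


def uniqueness_likelihood(marginals: list[float], population: int) -> float:
    """Probability that a record is unique in a population of the given size.

    Simplifying assumption: attributes are independent, so the record's
    probability is the product of its marginal frequencies.
    """
    if population < 1:
        raise ValueError("population must be at least 1")
    p = math.prod(marginals)
    # Nobody among the other (population - 1) people shares the record.
    return (1.0 - p) ** (population - 1)


# Hypothetical frequencies for three attribute values of one person.
frequencies = [0.5, 0.01, 0.002]
for n in (1_000, 100_000, 10_000_000):
    print(n, round(uniqueness_likelihood(frequencies, n), 4))
```

Adding attributes multiplies p by further factors below one, which is why uniqueness likelihood rises so quickly with the number of known attributes.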


Subjects
Data Analysis, Data Anonymization, Personally Identifiable Information, Datasets as Topic, Likelihood Functions, Normal Distribution
12.
Psychol Sci ; 30(8): 1111-1122, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31268832

ABSTRACT

It is often assumed that there is a robust positive symmetrical relationship between happiness and social behavior: Social relationships are viewed as essential to happiness, and happiness is thought to foster social relationships. However, empirical support for this widely held view is surprisingly mixed, and this view does little to clarify which social partner a person will be motivated to interact with when happy. To address these issues, we monitored the happiness and social interactions of more than 30,000 people for a month. We found that patterns of social interaction followed the hedonic-flexibility principle, whereby people tend to engage in happiness-enhancing social relationships when they feel bad and sustain happiness-decreasing periods of solitude and less pleasant types of social relationships that might promise long-term payoff when they feel good. These findings demonstrate that links between happiness and social behavior are more complex than often assumed in the positive-emotion literature.


Subjects
Emotions/physiology, Happiness, Motivation/physiology, Adult, Algorithms, Female, France/epidemiology, Humans, Interpersonal Relations, Male, Mobile Applications/supply & distribution, Philosophy, Social Behavior
14.
J R Soc Interface ; 14(127), 2017 Feb.
Article in English | MEDLINE | ID: mdl-28148765

ABSTRACT

Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest r² = 0.78) and lowest error, but generally models employing mobile data only yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.


Subjects
Cell Phone, Theoretical Models, Poverty, Satellite Communications, Humans, Predictive Value of Tests
15.
Proc Natl Acad Sci U S A ; 113(35): 9769-73, 2016 Aug 30.
Article in English | MEDLINE | ID: mdl-27528666

ABSTRACT

Most theories of motivation have highlighted that human behavior is guided by the hedonic principle, according to which our choices of daily activities aim to minimize negative affect and maximize positive affect. However, it is not clear how to reconcile this idea with the fact that people routinely engage in unpleasant yet necessary activities. To address this issue, we monitored in real time the activities and moods of over 28,000 people across an average of 27 d using a multiplatform smartphone application. We found that people's choices of activities followed a hedonic flexibility principle. Specifically, people were more likely to engage in mood-increasing activities (e.g., play sports) when they felt bad, and to engage in useful but mood-decreasing activities (e.g., housework) when they felt good. These findings clarify how hedonic considerations shape human behavior. They may explain how humans overcome the allure of short-term gains in happiness to maximize long-term welfare.


Subjects
Affect/physiology, Choice Behavior/physiology, Motivation/physiology, Philosophy, Female, Happiness, Humans, Male, Psychological Models
16.
Science ; 351(6279): 1274, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26989244

ABSTRACT

Sánchez et al.'s textbook k-anonymization example does not prove, or even suggest, that location and other big-data data sets can be anonymized and of general use. The synthetic data set that they "successfully anonymize" bears no resemblance to modern high-dimensional data sets on which their methods fail. Moving forward, deidentification should not be considered a useful basis for policy.
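For context, k-anonymity requires that every combination of quasi-identifier values in a released table be shared by at least k records. The sketch below only checks that property on a toy table; it does not reproduce the method of Sánchez et al. or the data sets discussed above, and the function name, column names, and records are hypothetical.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the largest k for which the table is k-anonymous.

    This is the size of the smallest group of records that share the
    same values on all quasi-identifier columns.
    """
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Toy table with generalized ZIP codes and age bands.
table = [
    {"zip": "021**", "age": "20-29", "purchase": "books"},
    {"zip": "021**", "age": "20-29", "purchase": "coffee"},
    {"zip": "021**", "age": "30-39", "purchase": "fuel"},
    {"zip": "021**", "age": "30-39", "purchase": "books"},
]
print(k_anonymity(table, ["zip", "age"]))
```

With only two coarse columns the toy table is easy to make k-anonymous; every additional quasi-identifier column splits the groups further, which is the difficulty high-dimensional data sets pose.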


Subjects
Commerce, Data Collection, Information Dissemination, Privacy, Female, Humans, Male
18.
Science ; 347(6221): 536-9, 2015 Jan 30.
Article in English | MEDLINE | ID: mdl-25635097

ABSTRACT

Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
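A unicity estimate of the kind reported above can be sketched as follows: for each person, draw a few points from their trace and count how many traces in the data set contain all of them; the person is unique if theirs is the only match. This is a minimal sketch under assumed data structures (a trace is a set of `(shop, day)` tuples), not the authors' code, and the function name and toy data are hypothetical.

```python
import random


def unicity(traces: dict[str, set[tuple]], n_points: int, rng: random.Random) -> float:
    """Fraction of users uniquely identified by n_points drawn from their trace."""
    unique = 0
    eligible = 0
    for trace in traces.values():
        if len(trace) < n_points:
            continue  # not enough points to draw from this trace
        eligible += 1
        # Points an adversary is assumed to know about this user.
        known = set(rng.sample(sorted(trace), n_points))
        # Count every trace, including the user's own, containing all known points.
        matches = sum(1 for other in traces.values() if known <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Toy data: (shop, day) pairs per pseudonymous user.
toy_traces = {
    "u1": {("bakery", 1), ("gym", 2)},
    "u2": {("bakery", 1), ("cinema", 2)},
    "u3": {("bakery", 1), ("gym", 2)},
}
print(unicity(toy_traces, n_points=2, rng=random.Random(0)))
```

Coarsening the data (for example, replacing shops with neighbourhoods or days with weeks) can be explored by transforming the tuples before calling the function.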


Subjects
Commerce, Data Collection, Information Dissemination, Privacy, Female, Humans, Income, Male, Sex Characteristics
19.
PLoS One ; 9(7): e98790, 2014.
Article in English | MEDLINE | ID: mdl-25007320

ABSTRACT

The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework, implemented in two field studies, that allows individuals to collect, store, and give third parties fine-grained access to their metadata; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be directly shared individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.
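The question-answering idea can be sketched as a store that keeps raw metadata private and returns only a low-dimensional answer. The class below is a toy stand-in written for illustration; it is not the openPDS API, and the class name, method name, and question identifier are all hypothetical.

```python
from collections import Counter


class PersonalDataStore:
    """Toy personal data store; a hypothetical sketch, not the openPDS API."""

    def __init__(self, location_log: list[tuple[int, str]]):
        # (hour_of_day, place) pairs; kept private and never returned as-is.
        self._location_log = location_log

    def answer(self, question: str) -> str:
        """Compute an answer against the raw metadata and return only the answer."""
        if question == "most_visited_place":
            places = Counter(place for _, place in self._location_log)
            return places.most_common(1)[0][0]
        raise ValueError(f"unsupported question: {question}")


store = PersonalDataStore([(9, "office"), (13, "cafe"), (15, "office")])
# The service receives a single label instead of the full location history.
print(store.answer("most_visited_place"))
```

Restricting services to a fixed set of questions is what turns the problem into one of access control: the store's owner can audit which questions are allowed rather than trying to anonymize the underlying log.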


Subjects
Information Storage and Retrieval, Privacy, Software, Confidentiality, Data Collection, Humans
20.
Sci Rep ; 4: 5277, 2014 Jun 20.
Article in English | MEDLINE | ID: mdl-24946798

ABSTRACT

Complex problem solving in science, engineering, and business has become a highly collaborative endeavor. Teams of scientists or engineers collaborate on projects using their social networks to gather new ideas and feedback. Here we bridge the literature on team performance and information networks by studying teams' problem solving abilities as a function of both their within-team networks and their members' extended networks. We show that, while an assigned team's performance is strongly correlated with its networks of expressive and instrumental ties, only the strongest ties in both networks have an effect on performance. Both networks of strong ties explain more of the variance than other factors, such as measured or self-evaluated technical competencies, or the personalities of the team members. In fact, the inclusion of the network of strong ties renders these factors non-significant in the statistical analysis. Our results have consequences for the organization of teams of scientists, engineers, and other knowledge workers tackling today's most complex problems.


Subjects
Community Networks, Cooperative Behavior, Game Theory, Statistical Models, Problem Solving, Social Networking, Computer Simulation, United States