
Investigating the Visual Utility of Differentially Private Scatterplots.

Panavas, Liudas; Crnovrsanin, Tarik; Adams, Jane Lydia; Ullman, Jonathan; Sargavad, Ali; Tory, Melanie; Dunne, Cody.

IEEE Trans Vis Comput Graph ; PP, 2023 Jul 05.

Article in English | MEDLINE | ID: mdl-37405888

ABSTRACT

Increasingly, visualization practitioners are working with, using, and studying private and sensitive data. There can be many stakeholders interested in the resulting analyses, but widespread sharing of the data can cause harm to individuals, companies, and organizations. Practitioners are increasingly turning to differential privacy to enable public data sharing with a guaranteed amount of privacy. Differential privacy algorithms do this by aggregating data statistics with noise, and this now-private data can be released visually with differentially private scatterplots. While the private visual output is affected by the algorithm choice, privacy level, bin number, data distribution, and user task, there is little guidance on how to choose and balance the effect of these parameters. To address this gap, we had experts examine 1,200 differentially private scatterplots created with a variety of parameter choices and tested their ability to see aggregate patterns in the private output (i.e., the visual utility of the chart). We synthesized these results to provide easy-to-use guidance for visualization practitioners releasing private data through scatterplots. Our findings also provide a ground truth for visual utility, which we use to benchmark automated utility metrics from various fields. We demonstrate how multi-scale structural similarity (MS-SSIM), the metric most strongly correlated with our study's utility results, can be used to optimize parameter selection. A free copy of this paper along with all supplemental materials is available at https://osf.io/wej4s/.
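
As an illustration of "aggregating data statistics with noise", the following is a minimal sketch of one common approach: binning the points into a 2D histogram and adding Laplace noise to each bin count. This is not necessarily one of the algorithms evaluated in the paper. The function name `dp_histogram_2d`, its parameters, and the choice to clip negative counts are assumptions made for this example, and the bin edges are assumed to come from publicly known bounds rather than from the data.

```python
import numpy as np

def dp_histogram_2d(x, y, bins, epsilon, value_range, rng=None):
    """Hypothetical helper: binned 2D counts with Laplace noise.

    value_range must be fixed in advance (public bounds), not derived
    from the private data, or the privacy guarantee does not hold.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Aggregate the points into a bins x bins grid of counts.
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=value_range)

    # Adding or removing one point changes exactly one count by 1,
    # so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)

    # Clipping is post-processing and does not weaken the guarantee.
    noisy_counts = np.clip(counts + noise, 0.0, None)
    return noisy_counts, x_edges, y_edges
```

Smaller values of `epsilon` give stronger privacy but noisier bins, and more bins spread the same data over more noisy cells. These are two of the parameters the abstract lists as affecting the visual output.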

Quantifying Changes in the Language Used Around Mental Health on Twitter Over 10 Years: Observational Study.

Stupinski, Anne Marie; Alshaabi, Thayer; Arnold, Michael V; Adams, Jane Lydia; Minot, Joshua R; Price, Matthew; Dodds, Peter Sheridan; Danforth, Christopher M.

JMIR Ment Health ; 9(3): e33685, 2022 Mar 30.

Article in English | MEDLINE | ID: mdl-35353049

ABSTRACT

BACKGROUND: Mental health challenges are thought to affect approximately 10% of the global population each year, with many of those affected going untreated because of the stigma and limited access to services. As social media lowers the barrier for joining difficult conversations and finding supportive groups, Twitter is an open source of language data describing the changing experience of a stigmatized group.
OBJECTIVE: By measuring changes in the conversation around mental health on Twitter, we aim to quantify the hypothesized increase in discussions and awareness of the topic as well as the corresponding reduction in stigma around mental health.
METHODS: We explored trends in words and phrases related to mental health through a collection of 1-, 2-, and 3-grams parsed from a data stream of approximately 10% of all English tweets from 2010 to 2021. We examined temporal dynamics of mental health language and measured levels of positivity of the messages. Finally, we used the ratio of original tweets to retweets to quantify the fraction of appearances of mental health language that was due to social amplification.
RESULTS: We found that the popularity of the phrase "mental health" increased by nearly two orders of magnitude between 2012 and 2018. We observed that mentions of "mental health" spiked annually and reliably because of mental health awareness campaigns as well as unpredictably in response to mass shootings, celebrities dying by suicide, and popular fictional television stories portraying suicide. We found that the level of positivity of messages containing "mental health", while stable through the growth period, has declined recently. Finally, we observed that since 2015, mentions of "mental health" have become increasingly due to retweets, suggesting that the stigma associated with the discussion of mental health on Twitter has diminished with time.
CONCLUSIONS: These results provide useful texture regarding the growing conversation around mental health on Twitter and suggest that more awareness and acceptance has been brought to the topic compared with past years.
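
The n-gram counting and the original-versus-retweet split described in the methods can be sketched roughly as follows. This is an illustrative simplification, not the authors' pipeline: the whitespace tokenization, the `(text, is_retweet)` input format, and the helper names `ngrams`, `count_phrase`, and `amplified_fraction` are all assumptions, and the sample tweets are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_phrase(tweets, phrase):
    """Count occurrences of `phrase` in original tweets and in retweets.

    `tweets` is an iterable of (text, is_retweet) pairs (assumed format).
    """
    n = len(phrase.split())
    counts = Counter(original=0, retweet=0)
    for text, is_retweet in tweets:
        tokens = text.lower().split()  # naive tokenizer, for illustration only
        hits = ngrams(tokens, n).count(phrase)
        counts["retweet" if is_retweet else "original"] += hits
    return counts

def amplified_fraction(counts):
    """Fraction of phrase appearances that came from retweets."""
    total = counts["original"] + counts["retweet"]
    return counts["retweet"] / total if total else 0.0

# Invented toy data, not drawn from the study.
sample = [
    ("Mental health matters", False),
    ("RT mental health matters", True),
    ("mental health awareness and mental health support", True),
    ("nice weather", False),
]
phrase_counts = count_phrase(sample, "mental health")
print(phrase_counts["original"], phrase_counts["retweet"])  # 1 3
print(amplified_fraction(phrase_counts))                     # 0.75
```

A rising value of this fraction over time corresponds to the pattern the abstract reports for mentions since 2015.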

Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy.

Dodds, Peter Sheridan; Minot, Joshua R; Arnold, Michael V; Alshaabi, Thayer; Adams, Jane Lydia; Reagan, Andrew J; Danforth, Christopher M.

PLoS One ; 16(12): e0260592, 2021.

Article in English | MEDLINE | ID: mdl-34879105

ABSTRACT

Measuring the specific kind, temporal ordering, diversity, and turnover rate of stories surrounding any given subject is essential to developing a complete reckoning of that subject's historical impact. Here, we use Twitter as a distributed news and opinion aggregation source to identify and track the dynamics of the dominant day-scale stories around Donald Trump, the 45th President of the United States. Working with a data set comprising around 20 billion 1-grams, we first compare each day's 1-gram and 2-gram usage frequencies to those of a year before, to create day- and week-scale timelines for Trump stories for 2016-2021. We measure Trump's narrative control, the extent to which stories have been about Trump or put forward by Trump. We then quantify story turbulence and collective chronopathy: the rate at which a population's stories for a subject seem to change over time. We show that 2017 was the most turbulent overall year for Trump. In 2020, story generation slowed dramatically during the first two major waves of the COVID-19 pandemic, with rapid turnover returning first with the Black Lives Matter protests following George Floyd's murder and then later by events leading up to and following the 2020 US presidential election, including the storming of the US Capitol six days into 2021. Trump story turnover for 2 months during the COVID-19 pandemic was on par with that of 3 days in September 2017. Our methods may be applied to any well-discussed phenomenon, and have potential to enable the computational aspects of journalism, history, and biography.

Subject(s)

Politics, COVID-19/epidemiology, COVID-19/pathology, COVID-19/virology, Humans, SARS-CoV-2/isolation & purification, United States

How the world's collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter.

Alshaabi, Thayer; Arnold, Michael V; Minot, Joshua R; Adams, Jane Lydia; Dewhurst, David Rushing; Reagan, Andrew J; Muhamad, Roby; Danforth, Christopher M; Dodds, Peter Sheridan.

PLoS One ; 16(1): e0244476, 2021.

Article in English | MEDLINE | ID: mdl-33406101

ABSTRACT

In confronting the global spread of the coronavirus disease COVID-19 pandemic we must have coordinated medical, operational, and political responses. In all efforts, data is crucial. Fundamentally, and in the possible absence of a vaccine for 12 to 18 months, we need universal, well-documented testing for both the presence of the disease as well as confirmed recovery through serological tests for antibodies, and we need to track major socioeconomic indices. But we also need auxiliary data of all kinds, including data related to how populations are talking about the unfolding pandemic through news and stories. To help, in part, on the social media side, we curate a set of 2000 day-scale time series of 1- and 2-grams across 24 languages on Twitter that are most 'important' for April 2020 with respect to April 2019. We determine importance through our allotaxonometric instrument, rank-turbulence divergence. We make some basic observations about some of the time series, including a comparison to numbers of confirmed deaths due to COVID-19 over time. We broadly observe across all languages a peak for the language-specific word for 'virus' in January 2020 followed by a decline through February and then a surge through March and April. The world's collective attention dropped away while the virus spread out from China. We host the time series on GitLab, updating them on a daily basis while relevant. Our main intent is for other researchers to use these time series to enhance whatever analyses may be of use during the pandemic as well as for retrospective investigations.

Subject(s)

COVID-19/psychology, Pandemics/statistics & numerical data, Social Media/trends, Attention, COVID-19/etiology, Coronavirus Infections/etiology, Coronavirus Infections/psychology, Humans, Language, Retrospective Studies, SARS-CoV-2/pathogenicity
