ABSTRACT
Do some types of information spread faster, broader, or further than others? To understand how information diffusions differ, scholars compare structural properties of the paths taken by content as it spreads through a network, studying so-called cascades. Commonly studied cascade properties include the reach, depth, breadth, and speed of propagation. Drawing conclusions from statistical differences in these properties can be challenging, as many properties are dependent. In this work, we demonstrate that controlling for cascade size is essential when studying structural differences between collections of cascades. We first revisit two datasets from notable recent studies of online diffusion that reported content-specific differences in cascade topology: an exhaustive corpus of Twitter cascades for verified true- or false-news content by Vosoughi et al. [S. Vosoughi, D. Roy, S. Aral. Science 359, 1146-1151 (2018)] and a comparison of Twitter cascades of videos, pictures, news, and petitions by Goel et al. [S. Goel, A. Anderson, J. Hofman, D. J. Watts. Manage. Sci. 62, 180-196 (2016)]. Using methods that control for joint cascade statistics, we find that for false- and true-news cascades, the reported structural differences can almost entirely be explained by false-news cascades being larger. For videos, pictures, news, and petitions, structural differences persist when controlling for size. Studying classical models of diffusion, we then give conditions under which differences in structural properties under different models do or do not reduce to differences in size. Our findings are consistent with the mechanisms underlying true- and false-news diffusion being quite similar, differing primarily in the basic infectiousness of their spreading process.
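As a rough illustration of what controlling for cascade size can look like, the sketch below compares mean cascade depth between two collections only within cascade sizes that occur in both. This exact-size stratification is an assumption chosen for brevity and is not the procedure used in the study; the function name `size_stratified_means` and the toy `(size, depth)` data are hypothetical.

```python
from collections import defaultdict


def size_stratified_means(cascades_a, cascades_b):
    """Compare mean depth of two cascade collections within shared sizes.

    Each cascade is a (size, depth) tuple. Only sizes present in both
    collections contribute, weighted by the smaller of the two counts.
    """
    def by_size(cascades):
        groups = defaultdict(list)
        for size, depth in cascades:
            groups[size].append(depth)
        return groups

    groups_a, groups_b = by_size(cascades_a), by_size(cascades_b)
    total_weight = 0
    sum_a = sum_b = 0.0
    for size in groups_a.keys() & groups_b.keys():
        depths_a, depths_b = groups_a[size], groups_b[size]
        weight = min(len(depths_a), len(depths_b))
        sum_a += weight * sum(depths_a) / len(depths_a)
        sum_b += weight * sum(depths_b) / len(depths_b)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no overlapping cascade sizes")
    return sum_a / total_weight, sum_b / total_weight


# Made-up toy data, not taken from either dataset: (size, depth) pairs.
collection_a = [(2, 1), (3, 2), (10, 4), (10, 4)]  # more large cascades
collection_b = [(2, 1), (2, 1), (3, 2), (10, 4)]   # more small cascades

# Raw mean depths differ (2.75 vs 2.0) only because the sizes differ.
raw_a = sum(d for _, d in collection_a) / len(collection_a)
raw_b = sum(d for _, d in collection_b) / len(collection_b)

# Within matched sizes the two collections have the same mean depth.
matched_a, matched_b = size_stratified_means(collection_a, collection_b)
print(raw_a, raw_b, matched_a, matched_b)
```

In this toy example the raw gap in depth disappears once sizes are matched, which is the pattern the abstract reports for true- and false-news cascades.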
Subject(s)
Information Dissemination/methods, Communication, Diffusion, Humans, Social Media
ABSTRACT
The observation that individuals tend to be friends with people who are similar to themselves, commonly known as homophily, is a prominent feature of social networks. While homophily describes a bias in attribute preferences for similar others, it gives limited attention to variability. Here, we observe that attribute preferences can exhibit variation beyond what can be explained by homophily. We call this excess variation monophily to describe the presence of individuals with extreme preferences for a particular attribute possibly unrelated to their own attribute. We observe that monophily can induce a similarity among friends-of-friends without requiring any similarity among friends. To simulate homophily and monophily in synthetic networks, we propose an overdispersed extension of the classical stochastic block model. We use this model to demonstrate how homophily-based methods for predicting attributes on social networks based on friends (that is, 'the company you keep') are fundamentally different from monophily-based methods based on friends-of-friends (that is, 'the company you're kept in'). We place particular focus on predicting gender, where homophily can be weak or non-existent in practice. These findings offer an alternative perspective on network structure and prediction, complicating the already difficult task of protecting privacy on social networks.
Subject(s)
Friends/psychology, Social Behavior, Social Networking, Female, Humans, Individuality, Interpersonal Relations, Male, Models, Psychological, Politics, Sex Factors, Social Media, Terrorism
ABSTRACT
Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the "seed set expansion problem": given a subset of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of "landing probabilities" of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work, we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter that depends on the block model parameters. This connection provides a formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance, despite being simple linear classification rules, and are even competitive with belief propagation.
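The view of personalized PageRank and heat kernel ranking as weighted sums of landing probabilities can be sketched as follows. This is a minimal illustration under stated assumptions: walks are truncated after `num_steps` steps, every node has at least one neighbor, and `alpha` denotes the continuation probability of the walk (some references instead use the symbol for the teleport probability). All function names and the toy graph are hypothetical.

```python
import math


def landing_probabilities(adj, seeds, num_steps):
    """Return [r_0, ..., r_K]; r_k[v] is the probability that a random walk
    started uniformly on the distinct nodes in `seeds` is at v after k steps."""
    r = {v: 0.0 for v in adj}
    for s in seeds:
        r[s] = 1.0 / len(seeds)
    out = [r]
    for _ in range(num_steps):
        nxt = {v: 0.0 for v in adj}
        for u, mass in r.items():
            for v in adj[u]:
                nxt[v] += mass / len(adj[u])  # move to a uniform neighbor
        r = nxt
        out.append(r)
    return out


def weighted_score(landing, weights):
    """Score each node by a weighted sum of its landing probabilities."""
    return {v: sum(w * r[v] for w, r in zip(weights, landing))
            for v in landing[0]}


def pagerank_weights(alpha, num_steps):
    """Geometric weights (1 - alpha) * alpha**k, truncated at num_steps."""
    return [(1 - alpha) * alpha ** k for k in range(num_steps + 1)]


def heat_kernel_weights(t, num_steps):
    """Poisson weights exp(-t) * t**k / k!, truncated at num_steps."""
    return [math.exp(-t) * t ** k / math.factorial(k)
            for k in range(num_steps + 1)]


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
landing = landing_probabilities(adj, seeds=[0], num_steps=20)
ppr = weighted_score(landing, pagerank_weights(0.5, 20))
heat = weighted_score(landing, heat_kernel_weights(1.0, 20))
ranking = sorted(adj, key=ppr.get, reverse=True)
print(ranking)
```

The two methods differ only in the weight sequence applied to the same landing probabilities, which is what makes it natural to ask which weights are best for a given generative model.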
ABSTRACT
The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her "contact neighborhood"--the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate it due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process. Here we study this question by analyzing the growth of Facebook, a rare example of a social process with genuinely global adoption. We find that the probability of contagion is tightly controlled by the number of connected components in an individual's contact neighborhood, rather than by the actual size of the neighborhood. Surprisingly, once this "structural diversity" is controlled for, the size of the contact neighborhood is in fact generally a negative predictor of contagion. More broadly, our analysis shows how data at the size and resolution of the Facebook network make possible the identification of subtle structural signals that go undetected at smaller scales yet hold pivotal predictive roles for the outcomes of social processes.
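The distinction between the size of a contact neighborhood and its number of connected components can be made concrete with a small sketch. It assumes an undirected graph stored as an adjacency dictionary without self-loops and a set of already affected nodes; the function name `structural_diversity` and the toy graph are hypothetical, and the code is not the analysis pipeline used in the study.

```python
def structural_diversity(adj, node, affected):
    """Return (size, components) of the contact neighborhood of `node`.

    The contact neighborhood is the set of neighbors of `node` that are in
    `affected`. Components are counted in the subgraph induced on those
    neighbors, so `node` itself does not link them together.
    """
    contacts = {u for u in adj[node] if u in affected}
    unseen = set(contacts)
    components = 0
    while unseen:
        components += 1
        stack = [unseen.pop()]
        while stack:  # depth-first search restricted to the contacts
            u = stack.pop()
            for w in adj[u]:
                if w in unseen:
                    unseen.remove(w)
                    stack.append(w)
    return len(contacts), components


# Toy graph: "ego" has four friends; only a and b know each other.
adj = {
    "ego": ["a", "b", "c", "d"],
    "a": ["ego", "b"],
    "b": ["ego", "a"],
    "c": ["ego"],
    "d": ["ego"],
}
size, components = structural_diversity(adj, "ego", affected={"a", "b", "c"})
print(size, components)  # 3 contacts forming 2 components: {a, b} and {c}
```

Two individuals with the same number of affected contacts can therefore have different structural diversity, which is the quantity the abstract identifies as the stronger predictor.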