Results 1 - 20 of 44
1.
Science ; 384(6695): eadi5147, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38696582

ABSTRACT

Certain people occupy topological positions within social networks that enhance their effectiveness at inducing spillovers. We mapped face-to-face networks among 24,702 people in 176 isolated villages in Honduras and randomly assigned villages to targeting methods, varying the fraction of households receiving a 22-month health education package and the method by which households were chosen (randomly versus using the friendship-nomination algorithm). We assessed 117 diverse knowledge, attitude, and practice outcomes. Friendship-nomination targeting reduced the number of households needed to attain specified levels of village-wide uptake. Knowledge spread more readily than behavior, and spillovers extended to two degrees of separation. Outcomes that were intrinsically easier to adopt also manifested greater spillovers. Network targeting using friendship nomination effectively promotes population-wide improvements in welfare through social contagion.
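Friendship-nomination targeting is commonly described as follows: sample a person at random, then target a random friend whom that person names. Because of the friendship paradox, nominees tend to be better connected than randomly chosen people. The sketch below assumes that description; the function name, the `nominations` dictionary layout, and the stopping rule are hypothetical and are not taken from the study's protocol.

```python
import random

def friendship_nomination_targets(nominations, k, seed=None):
    """Choose up to k distinct targets with a friendship-nomination heuristic.

    nominations: hypothetical dict mapping each person to the list of
    friends that person named. Each draw picks a random person and then
    targets one of that person's nominees at random.
    """
    rng = random.Random(seed)
    people = list(nominations)
    # Only people who were nominated by someone can ever become targets,
    # so cap k to keep the sampling loop finite.
    nominated = {f for friends in nominations.values() for f in friends}
    k = min(k, len(nominated))
    targets = set()
    while len(targets) < k:
        person = rng.choice(people)
        friends = nominations[person]
        if friends:  # people who named nobody contribute no target
            targets.add(rng.choice(friends))
    return targets

# In a star-shaped network, the hub is named by most people, so it is
# far more likely to be targeted than under random selection.
star = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
print(friendship_nomination_targets(star, 1, seed=3))
```

Sampling nominees rather than people is what shifts targets toward central positions without mapping the whole network first.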

2.
Proc Natl Acad Sci U S A ; 119(44): e2208975119, 2022 11.
Article in English | MEDLINE | ID: mdl-36279463

ABSTRACT

Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, and from public policy to the technology industry. Here we consider situations where classical methods for estimating the total treatment effect on a target population are considerably biased due to confounding network effects, i.e., the fact that the treatment of an individual may impact its neighbors' outcomes, an issue referred to as network interference or as nonindividualized treatment response. A key challenge in these situations is that the network is often unknown and difficult or costly to measure. We assume a potential outcomes model with heterogeneous additive network effects, encompassing a broad class of network interference sources, including spillover, peer effects, and contagion. First, we characterize the limitations in estimating the total treatment effect without knowledge of the network that drives interference. By contrast, we subsequently develop a simple estimator and efficient randomized design that outputs an unbiased estimate with low variance in situations where one is given access to average historical baseline measurements prior to the experiment. Our solution does not require knowledge of the underlying network structure, and it comes with statistical guarantees for a broad class of models. Due to their ease of interpretation and implementation, and their theoretical guarantees, we believe our results will have significant impact on the design of randomized experiments.


Subject(s)
Randomized Controlled Trials as Topic , Causality
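The bias that network interference introduces into classical estimators can be seen in a toy simulation. This is not the authors' estimator or randomized design. It assumes a hypothetical ring network in which each unit's outcome gains an additive effect `gamma` for every treated neighbour, and it compares the plain difference in means under Bernoulli randomization with the total treatment effect (everyone treated versus no one treated).

```python
import random

def simulate_ring(n=10_000, tau=2.0, gamma=1.0, p=0.5, seed=1):
    """Toy additive network-effects model on a ring (hypothetical setup).

    Outcome of unit i: Y_i = b_i + tau * z_i + gamma * (z_{i-1} + z_{i+1}).
    Returns (difference-in-means estimate, true total treatment effect).
    """
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [1 if rng.random() < p else 0 for _ in range(n)]
    # z[i - 1] wraps to z[n - 1] when i == 0, closing the ring.
    y = [
        baseline[i] + tau * z[i] + gamma * (z[i - 1] + z[(i + 1) % n])
        for i in range(n)
    ]
    treated = [y[i] for i in range(n) if z[i]]
    control = [y[i] for i in range(n) if not z[i]]
    dim = sum(treated) / len(treated) - sum(control) / len(control)
    # Every ring node has two neighbours, so treating everyone adds 2 * gamma.
    tte = tau + 2 * gamma
    return dim, tte

dim, tte = simulate_ring()
print(f"difference in means ~ {dim:.2f}, total treatment effect = {tte}")
```

Because neighbours' assignments are independent of a unit's own assignment, the difference in means recovers only the direct effect `tau` (about 2 here) and misses the spillover share of the total effect of 4.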
4.
Proc Natl Acad Sci U S A ; 117(38): 23393-23400, 2020 09 22.
Article in English | MEDLINE | ID: mdl-32887799

ABSTRACT

Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speed up network data collection and improve network model validation. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 550 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity using network-based metalearning to construct a series of "stacked" models that combine predictors into a single algorithm. Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy. Applied to real-world networks, stacked models are superior, but their accuracy varies strongly by domain, suggesting that link prediction may be fundamentally easier in social networks than in biological or technological networks. These results indicate that the state of the art for link prediction comes from combining individual algorithms, which can achieve nearly optimal predictions. We close with a brief discussion of limitations and opportunities for further improvements.


Subject(s)
Machine Learning , Neural Networks, Computer , Humans , Machine Learning/standards , Models, Statistical , Predictive Value of Tests , Social Networking
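Stacking combines many individual link predictors by treating each predictor's score as a feature for a supervised model. The sketch below computes three classic topological predictors (common neighbours, Jaccard coefficient, Adamic-Adar index) for every unobserved pair. It is an illustration only: the study's 203 predictors and its metalearning step are not reproduced, and the function name and output layout are hypothetical.

```python
import math
from itertools import combinations

def link_scores(edges):
    """Score every unobserved node pair with three classic link predictors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already observed, nothing to predict
        common = nbrs[u] & nbrs[v]
        union = nbrs[u] | nbrs[v]
        scores[(u, v)] = {
            "common_neighbors": len(common),
            "jaccard": len(common) / len(union),
            # A shared neighbour w is adjacent to both u and v, so its
            # degree is at least 2 and log(degree) is positive. Adamic-Adar
            # down-weights shared neighbours that have many links.
            "adamic_adar": sum(1 / math.log(len(nbrs[w])) for w in common),
        }
    return scores

for pair, feats in link_scores([(1, 2), (1, 3), (2, 3), (3, 4)]).items():
    print(pair, feats)
```

In a stacked model, these per-pair feature vectors would be labelled by holding out some observed edges and then used to train a classifier; that training step is omitted here.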
5.
Proc Natl Acad Sci U S A ; 117(32): 19045-19053, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32723822

ABSTRACT

Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data is specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey's representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.

6.
Sociol Sci ; 6: 197-218, 2019.
Article in English | MEDLINE | ID: mdl-32704522

ABSTRACT

Negative (antagonistic) connections have been of longstanding theoretical importance for social structure. In a population of 24,696 adults interacting face to face within 176 isolated villages in western Honduras, we measured all connections that were present, amounting to 105,175 positive and 16,448 negative ties. Here, we show that negative and positive ties exhibit many of the same structural characteristics. We then develop a complete taxonomy of all 138 possible triads of two-type relationships. Consistent with balance theory, we find that antagonists of friends and friends of antagonists tend to be antagonists; but, in an important empirical refutation of balance theory, we find that antagonists of antagonists also tend to be antagonists, not friends. Finally, villages with comparable levels of animosity tend to be geographically proximate. Similar processes, involving social contact, give rise to both positive and negative social ties in rural villages, and negative ties play an important role in social structure.
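Under Heider's structural balance theory, a closed triad of signed ties is balanced when the product of its three signs is positive (+++ or +--). The all-negative triad, "the antagonist of my antagonist is my antagonist", counts as unbalanced under this strict rule; that is the pattern the study reports as common. The sketch below classifies triangles in an undirected signed graph. It ignores tie direction and therefore covers far fewer triad types than the 138 two-type triads enumerated in the study; the function name and input format are hypothetical.

```python
from itertools import combinations

def triad_census(signed_edges):
    """Count balanced vs. unbalanced closed triads in an undirected signed graph.

    signed_edges: dict mapping frozenset({u, v}) to +1 (positive tie)
    or -1 (negative tie). A triangle is balanced when the product of
    its three signs is positive.
    """
    nodes = sorted({n for edge in signed_edges for n in edge})
    counts = {"balanced": 0, "unbalanced": 0}
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(s in signed_edges for s in sides):  # only closed triads
            product = 1
            for s in sides:
                product *= signed_edges[s]
            counts["balanced" if product > 0 else "unbalanced"] += 1
    return counts

ties = {
    frozenset((1, 2)): -1, frozenset((2, 3)): -1, frozenset((1, 3)): -1,
    frozenset((4, 5)): +1, frozenset((5, 6)): -1, frozenset((4, 6)): -1,
}
print(triad_census(ties))  # one balanced (+--) and one unbalanced (---) triangle
```

The brute-force loop over all node triples is only suitable for small examples.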

8.
Nat Hum Behav ; 2(11): 822-829, 2018 11.
Article in English | MEDLINE | ID: mdl-31558815

ABSTRACT

From decentralized banking systems to digital community currencies, the way humans perceive and use money is changing [1-3], thus creating novel opportunities for solving important economic and social problems. Here, we study Sardex, a fast-growing community currency in Sardinia (involving 1,477 businesses arrayed in a network with 48,170 transactions) using network analysis to shed light on its operation. Based on our experience with its day-to-day operations, we propose performance metrics tailored to Sardex but also applicable to similar economic systems, introduce criteria for identifying prominent economic actors and investigate the interplay between network structure and economic robustness. Leveraging new methods for quantifying network 'cyclic density' and 'k-cycle centrality,' we show that geodesic transaction cycles, where money flows in a circle through the network, are prevalent and that certain nodes have a pivotal role in them. We analyse the transactions within cycles and find that the economic turnover of the involved firms is higher, and that excessive currency and debt accumulations are lower. We also measure a similar, but secondary, effect for nodes and edges that serve as intermediaries to many transactions. These metrics are strong indicators of the success of such mutual credit systems at individual and collective levels.


Subject(s)
Banking, Personal , Community Participation , Social Behavior , Social Perception , Banking, Personal/methods , Banking, Personal/organization & administration , Banking, Personal/trends , Commerce , Community Participation/economics , Community Participation/methods , Community Participation/trends , Humans , Italy , Models, Economic , Resource Allocation
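To make "money flows in a circle through the network" concrete, the sketch below counts, for each firm, the simple directed transaction cycles of length at most k that pass through it. This is a hypothetical, simplified cycle-participation count, not the paper's definition of 'k-cycle centrality' or 'cyclic density', and it does not restrict cycles to geodesic ones.

```python
def cycle_participation(edges, k):
    """Count simple directed cycles of length <= k through each node.

    edges: (payer, payee) pairs. Each cycle is enumerated exactly once,
    starting from its lowest-ranked node, then credited to all its members.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    nodes = sorted(succ)
    rank = {n: i for i, n in enumerate(nodes)}
    counts = {n: 0 for n in nodes}

    def extend(start, current, path):
        for nxt in succ[current]:
            if nxt == start:
                for member in path:  # closed a cycle of length len(path)
                    counts[member] += 1
            elif rank[nxt] > rank[start] and nxt not in path and len(path) < k:
                path.append(nxt)
                extend(start, nxt, path)
                path.pop()

    for s in nodes:
        extend(s, s, [s])
    return counts

flows = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "A")]
print(cycle_participation(flows, 3))  # A sits on both cycles: A-B-C and A-D
```

Requiring every non-start node to outrank the start node prevents the same cycle from being counted once per member.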
9.
Ann Appl Stat ; 12(3): 1361-1384, 2018 Sep.
Article in English | MEDLINE | ID: mdl-36506698

ABSTRACT

Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment-specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.

10.
Elife ; 5, 2016 09 13.
Article in English | MEDLINE | ID: mdl-27623011

ABSTRACT

Understanding chromatin function requires knowing the precise location of nucleosomes. MNase-seq methods have been widely applied to characterize nucleosome organization in vivo, but generally lack the accuracy to determine the precise nucleosome positions. Here we develop a computational approach leveraging digestion variability to determine nucleosome positions at a base-pair resolution from MNase-seq data. We generate a variability template as a simple error model for how MNase digestion affects the mapping of individual nucleosomes. Applied to both yeast and human cells, this analysis reveals that alternatively positioned nucleosomes are prevalent and create significant heterogeneity in a cell population. We show that the periodic occurrences of dinucleotide sequences relative to nucleosome dyads can be directly determined from genome-wide nucleosome positions from MNase-seq. Alternatively positioned nucleosomes near transcription start sites likely represent different states of promoter nucleosomes during transcription initiation. Our method can be applied to map nucleosome positions in diverse organisms at base-pair resolution.


Subject(s)
Chromatin , Computational Biology/methods , DNA/genetics , DNA/metabolism , Nucleosomes , Saccharomyces cerevisiae/genetics , Humans , Sequence Analysis, DNA
11.
Mol Cell ; 63(1): 60-71, 2016 07 07.
Article in English | MEDLINE | ID: mdl-27320198

ABSTRACT

Despite its eponymous association with the heat shock response, yeast heat shock factor 1 (Hsf1) is essential even at low temperatures. Here we show that engineered nuclear export of Hsf1 results in cytotoxicity associated with massive protein aggregation. Genome-wide analysis revealed that Hsf1 nuclear export immediately decreased basal transcription and mRNA expression of 18 genes, which predominantly encode chaperones. Strikingly, rescuing basal expression of Hsp70 and Hsp90 chaperones enabled robust cell growth in the complete absence of Hsf1. With the exception of chaperone gene induction, the vast majority of the heat shock response was Hsf1 independent. By comparative analysis of mammalian cell lines, we found that heat shock-induced, but not basal, expression of chaperones depends on the mammalian Hsf1 homolog (HSF1). Our work reveals that yeast chaperone gene expression is an essential housekeeping mechanism and provides a roadmap for defining the function of HSF1 as a driver of oncogenesis.


Subject(s)
DNA-Binding Proteins/metabolism , Heat-Shock Proteins/metabolism , Heat-Shock Response , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism , Transcription, Genetic , Animals , CRISPR-Cas Systems , Cell Line , DNA-Binding Proteins/genetics , Embryonic Stem Cells/metabolism , Fibroblasts/metabolism , Gene Expression Regulation, Fungal , Gene Regulatory Networks , HSP70 Heat-Shock Proteins/metabolism , HSP90 Heat-Shock Proteins/metabolism , Heat Shock Transcription Factors , Heat-Shock Proteins/genetics , Homeostasis , Mice, 129 Strain , Mice, Inbred CBA , Protein Aggregates , Protein Interaction Maps , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Time Factors , Transcription Factors/genetics , Transfection
12.
Mol Biol Cell ; 27(8): 1383-96, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-26941329

ABSTRACT

Cell growth rate is regulated in response to the abundance and molecular form of essential nutrients. In Saccharomyces cerevisiae (budding yeast), the molecular form of environmental nitrogen is a major determinant of cell growth rate, supporting growth rates that vary at least threefold. Transcriptional control of nitrogen use is mediated in large part by nitrogen catabolite repression (NCR), which represses specific transcripts in the presence of a preferred nitrogen source that supports a fast growth rate, such as glutamine; these transcripts are otherwise expressed in the presence of a nonpreferred nitrogen source that supports a slower growth rate, such as proline. Differential expression of the NCR regulon and additional nitrogen-responsive genes results in >500 transcripts that are differentially expressed in cells growing in the presence of different nitrogen sources in batch cultures. Here we find that in growth rate-controlled cultures using nitrogen-limited chemostats, gene expression programs are strikingly similar regardless of nitrogen source. NCR expression is derepressed in all nitrogen-limiting chemostat conditions regardless of nitrogen source, and in these conditions, only 34 transcripts exhibit nitrogen source-specific differential gene expression. Addition of either the preferred nitrogen source, glutamine, or the nonpreferred nitrogen source, proline, to cells growing in nitrogen-limited chemostats results in rapid, dose-dependent repression of the NCR regulon. Using a novel means of computational normalization to compare global gene expression programs in steady-state and dynamic conditions, we find evidence that the addition of nitrogen to nitrogen-limited cells results in the transient overproduction of transcripts required for protein translation. Simultaneously, we find that accelerated mRNA degradation underlies the rapid clearing of a subset of transcripts, which is most pronounced for the highly expressed NCR-regulated permease genes GAP1, MEP2, DAL5, PUT4, and DIP5. Our results reveal novel aspects of nitrogen-regulated gene expression and highlight the need for a quantitative approach to study how the cell coordinates protein translation and nitrogen assimilation to optimize cell growth in different environments.


Subject(s)
Gene Expression Regulation, Fungal , Gene-Environment Interaction , Nitrogen/metabolism , Saccharomyces cerevisiae/genetics , Ammonia/metabolism , RNA, Messenger/metabolism , Regulon , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/physiology , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcriptome
14.
Cell ; 162(6): 1286-98, 2015 Sep 10.
Article in English | MEDLINE | ID: mdl-26359986

ABSTRACT

Heat causes protein misfolding and aggregation and, in eukaryotic cells, triggers aggregation of proteins and RNA into stress granules. We have carried out extensive proteomic studies to quantify heat-triggered aggregation and subsequent disaggregation in budding yeast, identifying >170 endogenous proteins aggregating within minutes of heat shock in multiple subcellular compartments. We demonstrate that these aggregated proteins are not misfolded and destined for degradation. Stable-isotope labeling reveals that even severely aggregated endogenous proteins are disaggregated without degradation during recovery from shock, contrasting with the rapid degradation observed for many exogenous thermolabile proteins. Although aggregation likely inactivates many cellular proteins, in the case of a heterotrimeric aminoacyl-tRNA synthetase complex, the aggregated proteins remain active with unaltered fidelity. We propose that most heat-induced aggregation of mature proteins reflects the operation of an adaptive, autoregulatory process of functionally significant aggregate assembly and disassembly that aids cellular adaptation to thermal stress.


Subject(s)
Heat-Shock Response , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/physiology , Cycloheximide/pharmacology , Cytoplasmic Granules/metabolism , Protein Aggregates , Protein Biosynthesis/drug effects , Protein Synthesis Inhibitors/pharmacology , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
15.
Stat Comput ; 25(4): 781-795, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26139959

ABSTRACT

Estimation with large amounts of data can be facilitated by stochastic gradient methods, in which model parameters are updated sequentially using small batches of data at each step. Here, we review early work and modern results that illustrate the statistical properties of these methods, including convergence rates, stability, and asymptotic bias and variance. We then overview modern applications where these methods are useful, ranging from an online version of the EM algorithm to deep learning. In light of these results, we argue that stochastic gradient methods are poised to become benchmark principled estimation procedures for large data sets, especially those in the family of stable proximal methods, such as implicit stochastic gradient descent.
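Implicit stochastic gradient descent evaluates the gradient at the updated parameter instead of the current one. For least-squares linear regression, the implicit update theta_n = theta_{n-1} + g_n (y_n - x_n . theta_n) x_n can be solved in closed form: it equals the ordinary SGD step scaled by 1 / (1 + g_n ||x_n||^2), which keeps the step bounded even when the learning rate is too large. The sketch below assumes squared loss and a learning-rate schedule g_n = gamma0 * n^(-decay); the function name and default values are illustrative choices, not taken from the review.

```python
import random

def implicit_sgd(stream, dim, gamma0=1.0, decay=0.6):
    """Implicit SGD for least-squares linear regression.

    stream yields (x, y) pairs with x a list of length dim. Solving the
    implicit equation for squared loss shrinks the explicit step by
    1 / (1 + g_n * ||x_n||^2).
    """
    theta = [0.0] * dim
    for n, (x, y) in enumerate(stream, start=1):
        g = gamma0 * n ** (-decay)  # learning rate g_n = gamma0 * n^(-decay)
        residual = y - sum(xi * ti for xi, ti in zip(x, theta))
        step = g / (1.0 + g * sum(xi * xi for xi in x))
        theta = [ti + step * residual * xi for ti, xi in zip(theta, x)]
    return theta

rng = random.Random(0)
data = []
for _ in range(20_000):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    data.append((x, 2.0 * x[0] - 3.0 * x[1] + rng.gauss(0.0, 0.1)))
print(implicit_sgd(data, dim=2))  # close to [2.0, -3.0]
```

With decay in (0.5, 1], the schedule satisfies the usual Robbins-Monro conditions; the implicit scaling is what makes a poorly tuned gamma0 far less likely to cause divergence than in explicit SGD.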

16.
PLoS Genet ; 11(5): e1005206, 2015 May.
Article in English | MEDLINE | ID: mdl-25950722

ABSTRACT

Cells respond to their environment by modulating protein levels through mRNA transcription and post-transcriptional control. Modest observed correlations between global steady-state mRNA and protein measurements have been interpreted as evidence that mRNA levels determine roughly 40% of the variation in protein levels, indicating dominant post-transcriptional effects. However, the techniques underlying these conclusions, such as correlation and regression, yield biased results when data are noisy, missing systematically, and collinear. Because mRNA and protein measurements have all of these properties, we revisited this subject. Noise-robust analyses of 24 studies of budding yeast reveal that mRNA levels explain more than 85% of the variation in steady-state protein levels. Protein levels are not proportional to mRNA levels, but rise much more rapidly. Regulation of translation suffices to explain this nonlinear effect, revealing post-transcriptional amplification of, rather than competition with, transcriptional signals. These results substantially revise widely credited models of protein-level regulation, and introduce multiple noise-aware approaches essential for proper analysis of many biological phenomena.


Subject(s)
Gene Expression Regulation, Fungal , RNA Processing, Post-Transcriptional , RNA, Messenger/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Models, Genetic , RNA, Messenger/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Transcription, Genetic
17.
Proc Natl Acad Sci U S A ; 112(21): 6595-600, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25964337

ABSTRACT

Social networks affect many aspects of life, including the spread of diseases, the diffusion of information, workers' productivity, and consumer behavior. Little is known, however, about how these networks form and change. Estimating causal effects and mechanisms that drive social network formation and dynamics is challenging because of the complexity of engineering social relations in a controlled environment, endogeneity between network structure and individual characteristics, and the lack of time-resolved data about individuals' behavior. We leverage data from a sample of 1.5 million college students on Facebook, who wrote more than 630 million messages and 590 million posts over 4 years, to design a long-term natural experiment of friendship formation and social dynamics in the aftermath of a natural disaster. The analysis shows that affected individuals are more likely to strengthen interactions, while maintaining the same number of friends as unaffected individuals. Our findings suggest that the formation of social relationships may serve as a coping mechanism to deal with high-stress situations and build resilience in communities.


Subject(s)
Social Media , Social Networking , Cyclonic Storms , Disasters , Humans , Internet , Interpersonal Relations , Students , United States , Universities
18.
J Am Stat Assoc ; 110(509): 27-44, 2015 Mar 01.
Article in English | MEDLINE | ID: mdl-25954056

ABSTRACT

We consider the problem of quantifying the degree of coordination between transcription and translation, in yeast. Several studies have reported a surprising lack of coordination over the years, in organisms as different as yeast and human, using diverse technologies. However, a close look at this literature suggests that the lack of reported correlation may not reflect the biology of regulation. These reports do not control for between-study biases and structure in the measurement errors, ignore key aspects of how the data connect to the estimand, and systematically underestimate the correlation as a consequence. Here, we design a careful meta-analysis of 27 yeast data sets, supported by a multilevel model, full uncertainty quantification, a suite of sensitivity analyses and novel theory, to produce a more accurate estimate of the correlation between mRNA and protein levels, a proxy for coordination. From a statistical perspective, this problem motivates new theory on the impact of noise, model mis-specifications and non-ignorable missing data on estimates of the correlation between high dimensional responses. We find that the correlation between mRNA and protein levels is quite high under the studied conditions, in yeast, suggesting that post-transcriptional regulation plays a less prominent role than previously thought.

19.
J Biomol Screen ; 20(8): 985-97, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25918037

ABSTRACT

High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A.


Subject(s)
High-Throughput Screening Assays , Models, Biological , RNA Interference , RNA, Messenger/genetics , RNA, Small Interfering/genetics , Algorithms , Animals , Cell Line , Computer Simulation , DNA Damage , Gene Knockdown Techniques , Humans , Phenotype , Proto-Oncogene Proteins B-raf/genetics
20.
Proc Natl Acad Sci U S A ; 112(18): 5643-8, 2015 May 05.
Article in English | MEDLINE | ID: mdl-25902504

ABSTRACT

Public transportation systems are an essential component of major cities. The widespread use of smart cards for automated fare collection in these systems offers a unique opportunity to understand passenger behavior at a massive scale. In this study, we use network-wide data obtained from smart cards in the London transport system to predict future traffic volumes, and to estimate the effects of disruptions due to unplanned closures of stations or lines. Disruptions, or shocks, force passengers to make different decisions concerning which stations to enter or exit. We describe how these changes in passenger behavior lead to possible overcrowding and model how stations will be affected by given disruptions. This information can then be used to mitigate the effects of these shocks because transport authorities may prepare in advance alternative solutions such as additional buses near the most affected stations. We describe statistical methods that leverage the large amount of smart-card data collected under the natural state of the system, where no shocks take place, as variables that are indicative of behavior under disruptions. We find that features extracted from the natural regime data can be successfully exploited to describe different disruption regimes, and that our framework can be used as a general tool for any similar complex transportation system.


Subject(s)
Accidents, Traffic/statistics & numerical data , Cities , Motor Vehicles/statistics & numerical data , Transportation/statistics & numerical data , Accidents, Traffic/prevention & control , Accidents, Traffic/trends , Algorithms , City Planning/methods , City Planning/statistics & numerical data , City Planning/trends , Environment Design/statistics & numerical data , Environment Design/trends , Forecasting , Humans , London , Maps as Topic , Models, Theoretical , Transportation/methods