Results 1 - 20 of 34
1.
Phys Rev E ; 107(4): L042301, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37198821

ABSTRACT

Real-world networks are rarely static. Recently, there has been increasing interest in both network growth and network densification, in which the number of edges scales superlinearly with the number of nodes. Less studied but equally important, however, are scaling laws of higher-order cliques, which can drive clustering and network redundancy. In this paper, we study how cliques grow with network size, by analyzing several empirical networks from emails to Wikipedia interactions. Our results show superlinear scaling laws whose exponents increase with clique size, in contrast to predictions from a previous model. We then show that these results are in qualitative agreement with a model that we propose, the local preferential attachment model, where an incoming node links not only to a target node, but also to its higher-degree neighbors. Our results provide insights into how networks grow and where network redundancy occurs.
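
The attachment rule described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the seed triangle, the `extra_links` parameter, and the choice to link to the target's top-ranked neighbours by degree are all hypothetical. Because each arriving node adds a fixed number of links here, this toy version does not reproduce densification; it only shows how linking to a target's neighbours closes triangles.

```python
import random
from itertools import combinations


def local_preferential_attachment(n_nodes, extra_links=2, seed=0):
    """Grow a graph and return it as {node: set(neighbours)}.

    Assumption: each new node links to one degree-proportional target
    and to that target's `extra_links` highest-degree neighbours.
    """
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # seed graph: a triangle
    # Each node appears once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [0, 0, 1, 1, 2, 2]
    for new in range(3, n_nodes):
        target = rng.choice(stubs)
        # Rank the target's neighbours by degree (ties broken by id).
        ranked = sorted(adj[target], key=lambda v: (-len(adj[v]), v))
        chosen = [target] + ranked[:extra_links]
        adj[new] = set()
        for v in chosen:
            adj[new].add(v)
            adj[v].add(new)
            stubs.extend([new, v])
    return adj


def count_edges(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def count_triangles(adj):
    total = 0
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                total += 1
    return total // 3  # each triangle is seen once from each corner
```

Calling `count_edges` and `count_triangles` on graphs of increasing `n_nodes` gives the raw counts one would plot on log-log axes to estimate a scaling exponent.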

2.
Pac Symp Biocomput ; 28: 121-132, 2023.
Article in English | MEDLINE | ID: mdl-36540970

ABSTRACT

Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime 3 orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios. Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.


Subjects
Algorithms, Computational Biology, Humans, Genomics, Cluster Analysis
3.
Sci Data ; 9(1): 536, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36050329

ABSTRACT

The TILES-2019 data set consists of behavioral and physiological data gathered from 57 medical residents (i.e., trainees) working in an intensive care unit (ICU) in the United States. The data set allows for the exploration of longitudinal changes in well-being, teamwork, and job performance in a demanding environment, as residents worked in the ICU for three weeks. Residents wore a Fitbit, a Bluetooth-based proximity sensor, and an audio-feature recorder. They completed daily surveys and interviews at the beginning and end of their rotation. In addition, we collected data from environmental sensors (i.e., Internet-of-Things Bluetooth data hubs) and obtained hospital records (e.g., patient census) and residents' job evaluations. This data set may be of interest to researchers interested in workplace stress, group dynamics, social support, the physical and psychological effects of witnessing patient deaths, predicting survey data from sensors, and privacy-aware and privacy-preserving machine learning. Notably, a small subset of the data was collected during the first wave of the COVID-19 pandemic.


Subjects
Internship and Residency, Occupational Stress, COVID-19, Humans, Intensive Care Units, Pandemics
4.
Proc Natl Acad Sci U S A ; 119(40): e2206070119, 2022 Oct 04.
Article in English | MEDLINE | ID: mdl-36161888

ABSTRACT

Diversity in science is necessary to improve innovation and increase the capacity of the scientific workforce. Despite decades-long efforts to increase gender diversity, however, women remain a small minority in many fields, especially in senior positions. The dearth of elite women scientists, in turn, leaves fewer women to serve as mentors and role models for young women scientists. To shed light on gender disparities in science, we study prominent scholars who were elected to the National Academy of Sciences. We construct author citation networks that capture the structure of recognition among scholars' peers. We identify gender disparities in the patterns of peer citations and show that these differences are strong enough to accurately predict the scholar's gender. In contrast, we do not observe disparities due to prestige, with few significant differences in the structure of citations of scholars affiliated with high-ranked and low-ranked institutions. These results provide further evidence that a scholar's gender plays a role in the mechanisms of success in science.

5.
EPJ Data Sci ; 11(1): 49, 2022.
Article in English | MEDLINE | ID: mdl-36090462

ABSTRACT

Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies for different candidate change points. By modeling the accuracy change we can infer the true change point and fraction of data affected by the change (a proxy for detection confidence). We demonstrate how our framework can achieve low bias over a wide range of conditions and detect changes in high dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data.
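
The two-step idea in this abstract (label points as before or after a candidate change point, then check how well a classifier recovers those labels) can be sketched as below. This is a simplified illustration, not the authors' framework: the nearest-centroid classifier, the even/odd train/test split, the use of balanced accuracy, the synthetic series, and all function names are assumptions made for the example. Picking the candidate with the highest score is a crude stand-in for the accuracy-curve modeling step the abstract describes.

```python
def make_series(n=200, change_at=120, shift=3.0):
    """Synthetic 1-D series with a mean shift at `change_at`.

    The bounded pseudo-noise in [-1, 1] is deterministic so that
    the example is reproducible without a random seed.
    """
    return [(shift if i >= change_at else 0.0) + ((i * 37) % 11 - 5) / 5.0
            for i in range(n)]


def balanced_accuracy_at(data, t):
    """Score candidate change point t with a nearest-centroid classifier."""
    labels = [int(i >= t) for i in range(len(data))]  # 0 = before, 1 = after
    # Fit class centroids on even indices, evaluate on odd indices.
    sums, counts = [0.0, 0.0], [0, 0]
    for i in range(0, len(data), 2):
        sums[labels[i]] += data[i]
        counts[labels[i]] += 1
    centroids = [sums[k] / counts[k] for k in (0, 1)]
    hits, totals = [0, 0], [0, 0]
    for i in range(1, len(data), 2):
        pred = int(abs(data[i] - centroids[1]) < abs(data[i] - centroids[0]))
        totals[labels[i]] += 1
        hits[labels[i]] += int(pred == labels[i])
    # Balanced accuracy avoids rewarding candidates near the edges,
    # where one class is much larger than the other.
    return (hits[0] / totals[0] + hits[1] / totals[1]) / 2


def detect_change_point(data, candidates):
    scores = {t: balanced_accuracy_at(data, t) for t in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

When the candidate is far from the true change, some points carry the wrong label, so no classifier can separate the two groups perfectly and the score drops; that drop is the signal the method exploits.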

6.
Sci Rep ; 12(1): 15671, 2022 09 19.
Article in English | MEDLINE | ID: mdl-36123387

ABSTRACT

Online misinformation is believed to have contributed to vaccine hesitancy during the Covid-19 pandemic, highlighting concerns about social media's destabilizing role in public life. Previous research identified a link between political conservatism and sharing misinformation; however, it is not clear how partisanship affects how much misinformation people see online. As a result, we do not know whether partisanship drives exposure to misinformation or people selectively share misinformation despite being exposed to factual content. To address this question, we study Twitter discussions about the Covid-19 pandemic, classifying users along the political and factual spectrum based on the information sources they share. In addition, we quantify exposure through retweet interactions. We uncover partisan asymmetries in the exposure to misinformation: conservatives are more likely to see and share misinformation, and while users' connections expose them to ideologically congruent content, the interactions between political and factual dimensions create conditions for the highly polarized users (hardline conservatives and liberals) to amplify misinformation. Overall, however, misinformation receives less attention than factual content, and political moderates, the bulk of users in our sample, help filter out misinformation. Identifying the extent of polarization and how political ideology exacerbates misinformation can help public health experts and policy makers improve their messaging.


Subjects
COVID-19, Politics, Social Media, Communication, Humans, Pandemics, Public Health
7.
Philos Trans A Math Phys Eng Sci ; 380(2214): 20210122, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-34802275

ABSTRACT

The COVID-19 pandemic has posed unprecedented challenges to public health world-wide. To make decisions about mitigation strategies and to understand the disease dynamics, policy makers and epidemiologists must know how the disease is spreading in their communities. Here we analyse confirmed infections and deaths over multiple geographic scales to show that COVID-19's impact is highly unequal: many regions have nearly zero infections, while others are hot spots. We attribute the effect to a Reed-Hughes-like mechanism in which the disease arrives to regions at different times and grows exponentially at different rates. Faster growing regions correspond to hot spots that dominate spatially aggregated statistics, thereby skewing growth rates at larger spatial scales. Finally, we use these analyses to show that, across multiple spatial scales, the growth rate of COVID-19 has slowed down with each surge. These results demonstrate a trade-off when estimating growth rates: while spatial aggregation lowers noise, it can increase bias. Public policy and epidemic modelling should be aware of, and aim to address, this distortion. This article is part of the theme issue 'Data science approaches to infectious disease surveillance'.
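
The aggregation effect described in this abstract can be illustrated with a small worked example. The sketch below assumes two hypothetical regions with purely exponential growth; the arrival days, growth rates, and function names are invented for illustration and are not taken from the paper. It shows how the growth rate of the summed series drifts toward the rate of the fastest-growing region.

```python
import math


def cases(t, start, rate):
    """Cases in one region: zero before arrival, exponential afterwards."""
    return math.exp(rate * (t - start)) if t >= start else 0.0


def log_growth_rate(series, t):
    """Daily growth rate estimated as the log-ratio of consecutive days."""
    return math.log(series[t + 1] / series[t])


# Hypothetical regions: (arrival day, daily growth rate).
REGIONS = [(0, 0.05), (10, 0.20)]
DAYS = range(60)

regional = [[cases(t, start, rate) for t in DAYS] for start, rate in REGIONS]
total = [sum(day_values) for day_values in zip(*regional)]

early = log_growth_rate(total, 10)  # just after the second region is seeded
late = log_growth_rate(total, 40)   # once the fast region dominates the sum
```

The mean of the two regional rates is 0.125, yet `late` sits close to 0.20: the aggregate is less noisy than either region but is biased toward the hot spot.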


Subjects
COVID-19, Pandemics, Bias, Humans, SARS-CoV-2
8.
J Med Internet Res ; 23(6): e26692, 2021 06 14.
Article in English | MEDLINE | ID: mdl-34014831

ABSTRACT

BACKGROUND: The novel coronavirus pandemic continues to ravage communities across the United States. Opinion surveys identified the importance of political ideology in shaping perceptions of the pandemic and compliance with preventive measures. OBJECTIVE: The aim of this study was to measure political partisanship and antiscience attitudes in the discussions about the pandemic on social media, as well as their geographic and temporal distributions. METHODS: We analyzed a large set of tweets from Twitter related to the pandemic, collected between January and May 2020, and developed methods to classify the ideological alignment of users along the moderacy (hardline vs moderate), political (liberal vs conservative), and science (antiscience vs proscience) dimensions. RESULTS: We found a significant correlation in polarized views along the science and political dimensions. Moreover, politically moderate users were more aligned with proscience views, while hardline users were more aligned with antiscience views. Contrary to expectations, we did not find that polarization grew over time; instead, we saw increasing activity by moderate proscience users. We also show that antiscience conservatives in the United States tended to tweet from the southern and northwestern states, while antiscience moderates tended to tweet from the western states. The proportion of antiscience conservatives was found to correlate with COVID-19 cases. CONCLUSIONS: Our findings shed light on the multidimensional nature of polarization and the feasibility of tracking polarized opinions about the pandemic across time and space through social media data.


Subjects
COVID-19/therapy, Social Media/trends, Humans, Internet Use, Politics, SARS-CoV-2, Telemedicine
9.
J Med Internet Res ; 23(4): e25379, 2021 04 12.
Article in English | MEDLINE | ID: mdl-33735097

ABSTRACT

BACKGROUND: Gender imbalances in academia have been evident historically and persist today. For the past 60 years, we have witnessed the increase of participation of women in biomedical disciplines, showing that the gender gap is shrinking. However, preliminary evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic in terms of unequal distribution of childcare, elderly care, and other kinds of domestic and emotional labor. Sudden lockdowns and abrupt shifts in daily routines have had disproportionate consequences on their productivity, which is reflected by a sudden drop in research output in biomedical research, consequently affecting the number of female authors of scientific publications. OBJECTIVE: The objective of this study is to test the hypothesis that the COVID-19 pandemic has had a disproportionate adverse effect on the productivity of female researchers in the biomedical field in terms of authorship of scientific publications. METHODS: This is a retrospective observational bibliometric study. We investigated the proportion of male and female researchers who published scientific papers during the COVID-19 pandemic, using bibliometric data from biomedical preprint servers and selected Springer-Nature journals. We used the ordinary least squares regression model to estimate the expected proportions over time by correcting for temporal trends. We also used a set of statistical methods, such as the Kolmogorov-Smirnov test and regression discontinuity design, to test the validity of the results. RESULTS: A total of 78,950 papers from the bioRxiv and medRxiv repositories and from 62 selected Springer-Nature journals by 346,354 unique authors were analyzed. The acquired data set consisted of papers that were published between January 1, 2019, and August 2, 2020. 
The proportion of female first authors publishing in the biomedical field during the pandemic dropped by 9.1%, on average, across disciplines (expected arithmetic mean y_est=0.39; observed arithmetic mean y=0.35; standard error of the estimate, S_est=0.007; standard error of the observation, σ_x=0.004). The impact was particularly pronounced for papers related to COVID-19 research, where the proportion of female scientists in the first author position dropped by 28% (y_est=0.39; y=0.28; S_est=0.007; σ_x=0.007). When looking at the last authors, the proportion of women dropped by 7.9%, on average (y_est=0.25; y=0.23; S_est=0.005; σ_x=0.003), while the proportion of women writing about COVID-19 as the last author decreased by 18.8% (y_est=0.25; y=0.21; S_est=0.005; σ_x=0.007). Further, by geocoding authors' affiliations, we showed that the gender disparities became even more apparent when disaggregated by country, up to 35% in some cases. CONCLUSIONS: Our findings document a decrease in the number of publications by female authors in the biomedical field during the global pandemic. This effect was particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. This sudden increase in the gender gap was persistent across the 10 countries with the highest number of researchers. These results should be used to inform the scientific community of this worrying trend in COVID-19 research and the disproportionate effect that the pandemic has had on female academics.
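
The trend-correction step in the METHODS (fit an ordinary least squares line to earlier data, extrapolate an expected proportion, and compare it with the observed one) might look roughly like the sketch below. The monthly proportions and the observed value are hypothetical numbers chosen for illustration, not data from the study, and the function names are assumptions.

```python
def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def relative_drop(expected, observed):
    """Shortfall of the observed value, as a fraction of the expected one."""
    return (expected - observed) / expected


# Hypothetical pre-period: proportion of female first authors per month.
months = [0, 1, 2, 3, 4]
proportions = [0.30, 0.31, 0.32, 0.33, 0.34]

intercept, slope = ols_fit(months, proportions)
expected = intercept + slope * 5   # trend-corrected expectation for month 5
observed = 0.28                    # hypothetical observed proportion
drop = relative_drop(expected, observed)
```

Comparing against the extrapolated trend rather than the last pre-period value keeps a pre-existing upward or downward drift from being mistaken for an effect of the event.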


Subjects
Authorship, Bibliometrics, Biomedical Research/statistics & numerical data, COVID-19, Publishing/statistics & numerical data, Research Personnel/statistics & numerical data, Sex Distribution, COVID-19/epidemiology, Efficiency, Female, Humans, Male, Pandemics, Retrospective Studies, Sex Factors
10.
IEEE Trans Emerg Top Comput ; 9(1): 316-328, 2021.
Article in English | MEDLINE | ID: mdl-35548703

ABSTRACT

Data science is a field that has developed to enable efficient integration and analysis of increasingly large data sets in many domains. In particular, big data in genetics, neuroimaging, mobile health, and other subfields of biomedical science, promises new insights, but also poses challenges. To address these challenges, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, including a Training Coordinating Center (TCC) tasked with developing a resource for personalized data science training for biomedical researchers. The BD2K TCC web portal is powered by ERuDIte, the Educational Resource Discovery Index, which collects training resources for data science, including online courses, videos of tutorials and research talks, textbooks, and other web-based materials. While the availability of so many potential learning resources is exciting, they are highly heterogeneous in quality, difficulty, format, and topic, making the field intimidating to enter and difficult to navigate. Moreover, data science is rapidly evolving, so there is a constant influx of new materials and concepts. We leverage data science techniques to build ERuDIte itself, using data extraction, data integration, machine learning, information retrieval, and natural language processing to automatically collect, integrate, describe, and organize existing online resources for learning data science.

11.
Sci Rep ; 10(1): 20427, 2020 11 24.
Article in English | MEDLINE | ID: mdl-33235260

ABSTRACT

Applications from finance to epidemiology and cyber-security require accurate forecasts of dynamic phenomena, which are often only partially observed. We demonstrate that a system's predictability degrades as a function of temporal sampling, regardless of the adopted forecasting model. We quantify the loss of predictability due to sampling, and show that it cannot be recovered by using external signals. We validate the generality of our theoretical findings in real-world partially observed systems representing infectious disease outbreaks, online discussions, and software development projects. On a variety of prediction tasks (forecasting new infections, the popularity of topics in online discussions, or interest in cryptocurrency projects), predictability irrecoverably decays as a function of sampling, unveiling predictability limits in partially observed systems.

12.
Sci Data ; 7(1): 354, 2020 10 16.
Article in English | MEDLINE | ID: mdl-33067468

ABSTRACT

We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their natural day-to-day job settings. We collected behavioral and physiological data from n = 212 participants through Internet-of-Things Bluetooth data hubs, wearable sensors (including a wristband, a biometrics-tracking garment, a smartphone, and an audio-feature recorder), together with a battery of surveys to assess personality traits, behavioral states, job performance, and well-being over time. Besides the default use of the data set, we envision several novel research opportunities and potential applications, including multi-modal and multi-task behavioral modeling, authentication through biometrics, and privacy-aware and privacy-preserving machine learning.


Subjects
Behavior, Hospital Personnel, Health Status, Hospitals, Humans, Internet of Things, Personality, Wearable Electronic Devices
13.
Hum Behav Emerg Technol ; 2(3): 200-211, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32838229

ABSTRACT

Since the outbreak in China in late 2019, the novel coronavirus (COVID-19) has spread around the world and has come to dominate online conversations. By linking 2.3 million Twitter users to locations within the United States, we study in aggregate how political characteristics of the locations affect the evolution of online discussions about COVID-19. We show that COVID-19 chatter in the United States is largely shaped by political polarization. Partisanship correlates with sentiment toward government measures and the tendency to share health and prevention messaging. Cross-ideological interactions are modulated by user segregation and polarized network structure. We also observe a correlation between user engagement with topics related to public health and the varying impact of the disease outbreak in different U.S. states. These findings may help inform policies both online and offline. Decision-makers may calibrate their use of online platforms to measure the effectiveness of public health campaigns, and to monitor the reception of national and state-level policies, by tracking in real-time discussions in a highly polarized social media ecosystem.

14.
Proc Math Phys Eng Sci ; 476(2237): 20190772, 2020 May.
Article in English | MEDLINE | ID: mdl-32523411

ABSTRACT

Network topologies can be highly non-trivial, due to the complex underlying behaviours that form them. While past research has shown that some processes on networks may be characterized by local statistics describing nodes and their neighbours, such as degree assortativity, these quantities fail to capture important sources of variation in network structure. We define a property called transsortativity that describes correlations among a node's neighbours. Transsortativity can be systematically varied, independently of the network's degree distribution and assortativity. Moreover, it can significantly impact the spread of contagions as well as the perceptions of neighbours, known as the majority illusion. Our work improves our ability to create and analyse more realistic models of complex networks.

15.
JMIR Public Health Surveill ; 6(2): e19273, 2020 05 29.
Article in English | MEDLINE | ID: mdl-32427106

ABSTRACT

BACKGROUND: At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter. OBJECTIVE: In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository. METHODS: We started this ongoing data collection on January 28, 2020, leveraging Twitter's streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter's search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020. RESULTS: Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds and reacts to COVID-19-related events. CONCLUSIONS: It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic, and undoubtedly more.


Subjects
Communication, Coronavirus Infections/epidemiology, Datasets as Topic, Pandemics, Viral Pneumonia/epidemiology, Social Media, COVID-19, Humans
16.
Nat Commun ; 11(1): 707, 2020 02 05.
Article in English | MEDLINE | ID: mdl-32024843

ABSTRACT

Social networks shape perceptions by exposing people to the actions and opinions of their peers. However, the perceived popularity of a trait or an opinion may be very different from its actual popularity. We attribute this perception bias to the friendship paradox and identify conditions under which it appears. We validate the findings empirically using Twitter data. Within posts made by users in our sample, we identify topics that appear more often within users' social feeds than they do globally among all posts. We also present a polling algorithm that leverages the friendship paradox to obtain a statistically efficient estimate of a topic's global prevalence from biased individual perceptions. We characterize the polling estimate and validate it through synthetic polling experiments on Twitter data. Our paper elucidates the non-intuitive ways in which the structure of directed networks can distort perceptions and presents approaches to mitigate this bias.
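
The gap between perceived and actual popularity described in this abstract can be shown on a toy directed network. The sketch below is only an illustration of the perception bias, not the paper's polling algorithm; the six-user follow graph, the trait assignment, and the function names are hypothetical.

```python
def global_prevalence(users, has_trait):
    """Fraction of all users who have the trait."""
    return sum(1 for u in users if u in has_trait) / len(users)


def perceived_prevalence(follows, has_trait):
    """Average, over users, of the trait's share among accounts they follow.

    Users who follow nobody have no feed and are skipped.
    """
    shares = [
        sum(1 for v in followed if v in has_trait) / len(followed)
        for followed in follows.values()
        if followed
    ]
    return sum(shares) / len(shares)


# Hypothetical follow graph: user -> accounts that user follows.
# User 0 is a popular account followed by everyone else.
follows = {
    0: [1],
    1: [0, 2],
    2: [0],
    3: [0],
    4: [0],
    5: [0],
}
has_trait = {0}  # only the popular account has the trait

actual = global_prevalence(list(follows), has_trait)
perceived = perceived_prevalence(follows, has_trait)
```

Here one user in six has the trait, but because that user is widely followed, the trait fills most feeds and the average perceived share is 0.75.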


Subjects
Friends/psychology, Online Social Networking, Algorithms, Bias, Humans, Reproducibility of Results
17.
J Healthc Inform Res ; 4(3): 261-294, 2020 Sep.
Article in English | MEDLINE | ID: mdl-35415445

ABSTRACT

Affective states are associated with people's mental health status and have a profound impact on daily life; as a result, unobtrusively understanding and estimating affect has drawn public attention. The pervasiveness of wearable sensors makes it possible to build automatic systems for affect tracking. However, constructing such systems is a challenging task due to the complexity of human behaviors. In this work, we focus on the problem of estimating daily self-reported affects from sensor-generated data. We first analyze the intra- and inter-subject differences of self-reported affect labels. Second, we explore different machine learning models as well as label transformation techniques to overcome the individual differences in the estimation of self-reported responses. We conceptualize three experimental settings including long-term and short-term estimation scenarios. Our experimental results show that the mixed effects model and label transformation can yield better estimation of individual daily affect. This work lays the basis for future sensor-based individualized and real-time affective digital and/or clinical interventions.

18.
JMIR Mhealth Uhealth ; 7(12): e13305, 2019 12 10.
Article in English | MEDLINE | ID: mdl-31821155

ABSTRACT

Although traditional methods of data collection in naturalistic settings can shed light on constructs of interest to researchers, advances in sensor-based technology allow researchers to capture continuous physiological and behavioral data to provide a more comprehensive understanding of the constructs that are examined in a dynamic health care setting. This study gives examples for implementing technology-facilitated approaches and provides the following recommendations for conducting such longitudinal, sensor-based research, with both environmental and wearable sensors in a health care setting: pilot test sensors and software early and often; build trust with key stakeholders and with potential participants who may be wary of sensor-based data collection and concerned about privacy; generate excitement for novel technology during recruitment; monitor incoming sensor data to troubleshoot sensor issues; and consider the logistical constraints of sensor-based research. The study describes how these recommendations were successfully implemented by providing examples from a large-scale, longitudinal, sensor-based study of hospital employees at a large hospital in California. The knowledge gained from this study may be helpful to researchers interested in obtaining dynamic, longitudinal sensor data from both wearable and environmental sensors in a health care setting (e.g., a hospital) to obtain a more comprehensive understanding of constructs of interest in an ecologically valid, secure, and efficient way.


Subjects
Data Collection/instrumentation, Physiologic Monitoring/instrumentation, Technology/instrumentation, Wearable Electronic Devices/supply & distribution, Adult, Aged, California/epidemiology, Female, Humans, Implementation Science, Longitudinal Studies, Male, Middle Aged, Software, Wearable Electronic Devices/economics
19.
Sci Rep ; 8(1): 17599, 2018 Dec 04.
Article in English | MEDLINE | ID: mdl-30514864

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper.

20.
Phys Rev E ; 98(2-1): 022321, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30253536

ABSTRACT

Networks facilitate the spread of cascades, allowing a local perturbation to percolate via interactions between nodes and their neighbors. We investigate how network structure affects the dynamics of a spreading cascade. By accounting for the joint degree distribution of a network within a generating function framework, we can quantify how degree correlations affect both the onset of global cascades and the propensity of nodes of a specific degree class to trigger large cascades. However, not all degree correlations are equally important in a spreading process. We introduce a new measure of degree assortativity that accounts for correlations among nodes relevant to a spreading cascade. We show that the critical point defining the onset of global cascades has a monotone relationship to this new assortativity measure. In addition, we show that the choice of nodes to seed the largest cascades is strongly affected by degree correlations. Contrary to traditional wisdom, when degree assortativity is positive, low-degree nodes are more likely to generate the largest cascades. Our work suggests that it may be possible to tailor spreading processes by manipulating the higher-order structure of networks.
