ABSTRACT
BACKGROUND: Scholars have used data from in-person interviews, administrative systems, and surveys for sexual violence research. Using Twitter as a data source for examining the nature of sexual violence is a relatively new and underexplored area of study. OBJECTIVE: We aimed to perform a scoping review of the current literature on using Twitter data for researching sexual violence, elaborate on the validity of the methods, and discuss the implications and limitations of existing studies. METHODS: In April 2022, we performed a literature search in the following 6 databases: APA PsycInfo (Ovid), Scopus, PubMed, International Bibliography of Social Sciences (ProQuest), Criminal Justice Abstracts (EBSCO), and Communications Abstracts (EBSCO). The initial search identified 3759 articles, which were imported into Covidence. Seven independent reviewers screened these articles in 2 steps: (1) title and abstract screening, and (2) full-text screening. The inclusion criteria were as follows: (1) empirical research, (2) focus on sexual violence, (3) analysis of Twitter data (ie, tweets or Twitter metadata), and (4) text in English. Finally, we selected and coded the 121 articles that met the inclusion criteria. RESULTS: We coded and presented the 121 articles that used Twitter-based data for sexual violence research. Nearly three-quarters of the articles (89/121, 73.6%) were published in peer-reviewed journals after 2018. The reviewed articles collectively analyzed about 79.6 million tweets. The primary approaches to using Twitter as a data source were content text analysis (112/121, 92.6%) and sentiment analysis (31/121, 25.6%). Hashtags (103/121, 85.1%) were the most prominent metadata feature, followed by tweet time and date, retweets, replies, URLs, and geotags. More than two-fifths of the articles (51/121, 42.1%) used the Twitter application programming interface (API) to collect data.
Data analyses included qualitative thematic analysis, machine learning (eg, sentiment analysis, supervised machine learning, unsupervised machine learning, and social network analysis), and quantitative analysis. Only 10.7% (13/121) of the studies discussed ethical considerations. CONCLUSIONS: We described the current state of using Twitter data for sexual violence research, developed a new taxonomy describing Twitter as a data source, and evaluated the methodologies. Research recommendations include the following: development of methods for data collection and analysis, in-depth discussions about ethical norms, exploration of specific aspects of sexual violence on Twitter, examination of tweets in multiple languages, and decontextualization of Twitter data. This review demonstrates the potential of using Twitter data in sexual violence research.
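Hashtags were the most frequently analyzed metadata feature in the reviewed studies. As a minimal sketch, assuming tweet texts have already been collected (e.g., through the API) and using a simplified regular expression rather than Twitter's official hashtag rules, hashtag frequencies across a corpus can be tallied as follows. The function name `hashtag_counts` and the sample tweets are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Simplified pattern (an assumption): '#' followed by word characters.
# Twitter's real hashtag rules (punctuation, all-digit tags, etc.) are
# more involved, so this is only an approximation.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(tweets: Iterable[str]) -> Counter:
    """Count how many tweets contain each hashtag (case-insensitive)."""
    counts: Counter = Counter()
    for text in tweets:
        # Use a set so a tag repeated within one tweet is counted once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts


sample = [
    "Standing with survivors #MeToo",
    "#metoo #WhyIDidntReport",
    "No tags here",
]
print(hashtag_counts(sample).most_common())
```

Counting per tweet rather than per occurrence keeps a single tweet that repeats a tag from inflating its frequency; `most_common(n)` then gives the top-n hashtags.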
Subject(s)
Sex Offenses , Social Media , Humans , Communication , Machine Learning , Surveys and Questionnaires
ABSTRACT
Real-time data processing and distributed messaging are long-standing problems. As the volume of spatial data being produced grows and software solutions become increasingly complex, there is a need for platforms that address these demands. In this paper, we present a lightweight distributed streaming system for combating pandemics and give a case study on spatial analysis of a geotagged COVID-19 Twitter dataset. The system has three major components: translation of tweets that match user-defined bounding boxes, named entity recognition in tweets, and skyline queries. All of these components are built on Apache Pulsar. With the proposed system, end users can obtain COVID-19-related information for foreign regions, filter and search tweets by location, organization, person, and miscellaneous entities, and perform skyline-based queries. The system is evaluated on a set of characteristics and performance metrics. The study differs from other work in applying distributed computing and big data technologies to spatial data to combat COVID-19. We conclude that Pulsar is designed to handle large volumes of data with long-term on-disk persistence.
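Two of the components named above, bounding-box matching and skyline queries, can be illustrated independently of Pulsar. The sketch below is a minimal, single-process illustration and not the authors' implementation: it assumes a `(min_lon, min_lat, max_lon, max_lat)` box ordering and that "lower is better" for every skyline attribute, since the abstract does not say which attributes the system ranks on. `GeoTweet`, `in_bbox`, `dominates`, and `skyline` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple


class GeoTweet(NamedTuple):
    text: str
    lon: float
    lat: float


# Assumed ordering: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def in_bbox(tweet: GeoTweet, bbox: BBox) -> bool:
    """True if the tweet's point lies inside the box (boundaries inclusive).

    Boxes that cross the antimeridian are not handled in this sketch.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= tweet.lon <= max_lon and min_lat <= tweet.lat <= max_lat


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every attribute and strictly
    better on at least one (lower values are assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Example: attributes could be, e.g., (distance, cost); both minimized.
print(skyline([(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]))
```

The quadratic scan is the simplest correct skyline algorithm; streaming systems typically use incremental or block-nested-loop variants, but the dominance test is the same.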
ABSTRACT
The global economy has been hard hit by the COVID-19 pandemic. Many countries are experiencing a severe and destructive recession. A significant number of firms and businesses have gone bankrupt or been scaled down, and many individuals have lost their jobs. The main goal of this study is to provide policy- and decision-makers with additional, real-time information about labor market flows using Twitter data. We use these data to trace and nowcast the unemployment rate of South Africa during the COVID-19 pandemic. First, we create a dataset of unemployment-related tweets using a set of keywords. Principal Component Regression (PCR) is then applied to nowcast the unemployment rate from the gathered tweets and their sentiment scores. Numerical results indicate that, both before and during the COVID-19 pandemic, tweet volume is positively correlated and tweet sentiment is negatively correlated with the unemployment rate. Moreover, the unemployment rate nowcasted with PCR achieves strong evaluation results, with a low Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Symmetric MAPE (SMAPE) of 0.921, 0.018, and 0.018, respectively, and a high R² score of 0.929.
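Principal component regression standardizes the predictors, projects them onto their top-k principal components, and fits ordinary least squares on those component scores. A minimal NumPy sketch follows, together with the four evaluation metrics quoted above. The feature matrix (for example, per-period tweet volume and mean sentiment) and the choice of `k` are assumptions, since the abstract does not specify them; SMAPE also has several definitions in use, and the one below (mean of 2|y − ŷ| / (|y| + |ŷ|), as a fraction) is one common choice. Function names are hypothetical.

```python
import numpy as np


def pcr_fit(X: np.ndarray, y: np.ndarray, k: int):
    """Fit principal component regression using the top k components."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Z = (X - mu) / sd                      # standardized predictors
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:k].T                           # (n_features, k) loadings
    T = Z @ W                              # (n_samples, k) component scores
    A = np.column_stack([np.ones(len(T)), T])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mu, sd, W, coef


def pcr_predict(model, X: np.ndarray) -> np.ndarray:
    mu, sd, W, coef = model
    T = ((X - mu) / sd) @ W
    return coef[0] + T @ coef[1:]


# Evaluation metrics; MAPE and SMAPE are returned as fractions, not percent.
def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mape(y, yhat):
    return float(np.mean(np.abs((y - yhat) / y)))


def smape(y, yhat):
    return float(np.mean(2 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))))


def r2(y, yhat):
    return float(1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))
```

When `k` equals the number of features, PCR reproduces ordinary least squares; choosing a smaller `k` (often by cross-validation) discards low-variance directions, which helps when predictors such as tweet volume and sentiment are strongly correlated.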
Subject(s)
COVID-19 , Social Media , Humans , COVID-19/epidemiology , Pandemics , South Africa/epidemiology , Unemployment
ABSTRACT
Public sentiment towards global pandemics is important for public health assessment and disease control. This study develops a modularized deep learning framework to quantify public sentiment towards COVID-19 and then uses the predicted sentiment to model and forecast the daily growth rate of confirmed COVID-19 cases globally via a proposed G parameter. In the proposed framework, public sentiment is first modeled with a dimensional valence indicator rather than discrete schemas and classified into 4 primary emotional categories, (a) neutral, (b) negative, (c) positive, and (d) ambivalent, using multiple word embedding models and classifiers for text sentiment analysis and classification. The trained model is subsequently applied to millions of daily tweets pertaining to COVID-19, posted between 22 Jan 2020 and 10 May 2020. The results show that, compared with the predominantly neutral sentiment at the outbreak's inception, the global community gradually expressed both positive and negative sentiments towards COVID-19 over time. The predicted sentiment time series are then used to train a deep neural network (DNN) to model and forecast the G parameter; with the optimal model configuration, the lowest mean absolute percentage error (MAPE) during testing is around 17.0%.
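The abstract does not describe how the daily sentiment series is fed to the DNN. One common framing for this kind of forecasting, shown here only as an assumed sketch, is a sliding window: each training example pairs the previous `lag` days of sentiment features (for instance, the daily share of neutral, negative, positive, and ambivalent tweets) with the next day's target value. `make_windows` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    daily_features: Sequence[Sequence[float]],
    target: Sequence[float],
    lag: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the previous `lag` days of features with the next day's target."""
    if len(daily_features) != len(target):
        raise ValueError("features and target must cover the same days")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lag, len(target)):
        # Flatten days t-lag .. t-1 into one input vector for day t.
        X.append([v for day in daily_features[t - lag:t] for v in day])
        y.append(target[t])
    return X, y


# Hypothetical daily (negative share, positive share) and growth-rate values.
X, y = make_windows([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
                    [1.0, 2.0, 3.0, 4.0], lag=2)
print(X, y)
```

Only past days enter each input vector, so a model trained on these pairs forecasts rather than merely fits the same-day value.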
ABSTRACT
This article presents an opinion analysis of tweets about COVID-19 immunization in Brazil. An initial set of 143,615 tweets was collected, comprising 49,477 pro-vaccination, 44,643 anti-vaccination, and 49,495 neutral posts. Supervised classifiers (multinomial naïve Bayes, logistic regression, linear support vector machines, random forests, adaptive boosting, and multilayer perceptron) were tested; multinomial naïve Bayes, which had the best trade-off between overfitting and correctness, was selected to classify a second set of 221,884 unclassified tweets. A timeline of the classified tweets was then constructed, which helped identify dates with peaks in each polarity and search for events that may have caused those peaks, providing methodological support for combating sources of misinformation linked to the spread of anti-vaccination opinion.
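Multinomial naïve Bayes scores each class by its prior probability times the smoothed probability of each word in the tweet under that class, and predicts the highest-scoring class. The from-scratch sketch below shows the technique on three polarity labels; it assumes whitespace tokenization and Laplace (add-one) smoothing, and it is not the authors' pipeline, which the abstract does not detail. `TinyMultinomialNB` and the toy tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class TinyMultinomialNB:
    """Minimal multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "TinyMultinomialNB":
        self.classes: List[str] = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            # Work in log space to avoid underflow on long texts.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in doc.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[c][w] + self.alpha)
                        / (self.totals[c] + self.alpha * v)
                    )
            if score > best_score:
                best, best_score = c, score
        return best


docs = ["vaccine saves lives", "get the vaccine now", "vaccine is poison",
        "no vaccine mandate ever", "vaccination center opens today"]
labels = ["pro", "pro", "anti", "anti", "neutral"]
clf = TinyMultinomialNB().fit(docs, labels)
print(clf.predict("saves lives"), clf.predict("poison mandate"))
```

Its training cost is a single counting pass, which is one reason naïve Bayes scales easily to classifying hundreds of thousands of additional tweets.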
ABSTRACT
Emotion detection is a promising field of research from multiple perspectives, such as psychology, marketing, and network analysis. Many models have been proposed over the years for accurate and efficient mood detection. Identifying emotion, or mood, from text has progressed from simple frequency distribution analysis to far more sophisticated learning approaches. The aim of such text mining and analysis is twofold: first, to categorise existing text into broad classes of emotions, such as happy, sad, angry, and surprised; second, to accurately predict the moods of real-time streaming text. The novelty of this work lies in an extensive comparison of nine conventional learning methods on precision, recall, F1, and accuracy, together with a study of how a wide array of moods (25) varies over time. Conventional classifiers allow near real-time predictions, can work with considerably less training data, and offer flexibility in feature engineering, whereas deep learning methods embed feature engineering in the model. Since a single line of text can be associated with multiple emotions, this article compares the performance of classifiers in predicting multiple moods for streaming text with likelihood-based ranking. An Android application named Citizens' Sense was developed for text collection and analysis. The performance of the mood classifiers is further tested on Twitter data related to COVID-19. Based on precision, recall, F1, and accuracy, the Random Forest, Decision Tree, and Complement Naive Bayes classifiers perform marginally better than the other classifiers. The variance of mood over time and the predicted moods for text support this finding.
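Likelihood-based ranking turns a classifier's per-mood probability estimates into an ordered list of moods, so one line of text can carry several. The sketch below assumes those probabilities are already available from some trained classifier, and the `k` and `min_prob` cut-offs are illustrative parameters, not values from the study. It also computes per-label precision, recall, and F1, the metrics used in the comparison. Both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_moods(probs: Dict[str, float], k: int = 3, min_prob: float = 0.0) -> List[str]:
    """Rank moods by predicted likelihood and keep the top k above min_prob."""
    # Sort by descending probability; break ties alphabetically for determinism.
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [mood for mood, p in ranked[:k] if p >= min_prob]


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Per-label precision, recall and F1 for single-label predictions."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(top_k_moods({"happy": 0.5, "sad": 0.1, "angry": 0.3, "surprised": 0.3}, k=3))
```

Ranking rather than taking only the single most likely class lets the output reflect mixed emotions; the `min_prob` threshold drops moods the classifier considers unlikely even if they fall within the top k.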