This article is a Preprint
Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Preprints posted online allow authors to receive rapid feedback and the entire scientific community can appraise the work for themselves and respond appropriately. Those comments are posted alongside the preprints for anyone to read them and serve as a post publication assessment.
TClustVID: A Novel Machine Learning Classification Model to Investigate Topics and Sentiment inCOVID-19 Tweets
Preprint
in En
| PREPRINT-MEDRXIV
| ID: ppmedrxiv-20167973
Journal article
A scientific journal published article is available and is probably based on this preprint. It has been identified through a machine matching algorithm, human confirmation is still pending.
See journal article
A scientific journal published article is available and is probably based on this preprint. It has been identified through a machine matching algorithm, human confirmation is still pending.
See journal article
ABSTRACT
COVID-19, caused by the SARS-Cov2, varies greatly in its severity but represent serious respiratory symptoms with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals, and remains uncertainty over key aspects of its infectivity, no effective remedy yet exists and this disease causes severe economic effects globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially impact on public opinions in some cases and exacerbate widespread panic and misinformation spread during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topics extracting model (named TClustVID) that analyze COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifications were employed to Twitter datasets which enabled exploration of the performance of traditional and TclustVID classification methods. TClustVID showed higher performance compared to the traditional classifiers determined by clustering criteria. Finally, we extracted significant topic clusters from TClustVID, split them into positive, neutral and negative clusters and implemented latent dirichlet allocation for extraction of popular COVID-19 topics. This approach identified common prevailing public opinions and concerns related to COVID-19, as well as attitudes to infection prevention strategies held by people from different countries concerning the current pandemic situation.
cc_by_nc_nd
Full text:
1
Collection:
09-preprints
Database:
PREPRINT-MEDRXIV
Type of study:
Prognostic_studies
Language:
En
Year:
2020
Document type:
Preprint