TClustVID: A Novel Machine Learning Classification Model to Investigate Topics and Sentiment in COVID-19 Tweets

Md. Shahriare Satu; Md. Imran Khan; Mufti Mahmud; Shahadat Uddin; Matthew A Summers; Julian M. W. Quinn; Mohammad Ali Moni

This article is a Preprint

Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Posting preprints online allows authors to receive rapid feedback, and the entire scientific community can appraise the work and respond appropriately. Those comments are posted alongside the preprint for anyone to read and serve as a post-publication assessment.

Affiliation

Md. Shahriare Satu; Faculty Member, Department of Management Information Systems, Noakhali Science and Technology University
Md. Imran Khan; Gono Bishwabidyalay
Mufti Mahmud; Dept. of Computing & Technology, Nottingham Trent University
Shahadat Uddin; The University of Sydney
Matthew A Summers; Garvan Institute of Medical Research
Julian M. W. Quinn; Garvan Institute of Medical Research
Mohammad Ali Moni; University of New South Wales

Preprint in En | PREPRINT-MEDRXIV | ID: ppmedrxiv-20167973

Journal article
A published journal article is available and is probably based on this preprint. It was identified by a machine matching algorithm; human confirmation is still pending.

ABSTRACT

COVID-19, caused by SARS-CoV-2, varies greatly in severity but presents serious respiratory symptoms with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals; uncertainty remains over key aspects of its infectivity, no effective remedy yet exists, and the disease is causing severe economic effects globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms, including Facebook and Twitter. These public forums substantially influence public opinion in some cases and exacerbate widespread panic and the spread of misinformation during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topic extraction model (named TClustVID) that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifiers were applied to the Twitter datasets, which enabled exploration of the performance of traditional and TClustVID classification methods. TClustVID showed higher performance compared to the traditional classifiers determined by clustering criteria. Finally, we extracted significant topic clusters from TClustVID, split them into positive, neutral and negative clusters, and applied latent Dirichlet allocation to extract popular COVID-19 topics. This approach identified common prevailing public opinions and concerns related to COVID-19, as well as attitudes to infection prevention strategies held by people from different countries concerning the current pandemic situation.
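
The cleaning, tokenization and word-to-index steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the cleaning rules (removing URLs, user mentions and punctuation), the function names, the sample tweets and the convention of reserving index 0 for padding are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical cleaning rules; the abstract does not list the exact steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, user mentions and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return NON_ALNUM_RE.sub(" ", text)


def tokenize(text):
    """Split a cleaned tweet into whitespace-delimited tokens."""
    return clean_tweet(text).split()


def build_word_index(tweets):
    """Map each distinct token to an integer, in order of first appearance.

    Indices start at 1; index 0 is left free for padding (an assumption).
    """
    word_index = {}
    for tweet in tweets:
        for token in tokenize(tweet):
            if token not in word_index:
                word_index[token] = len(word_index) + 1
    return word_index


def encode(tweet, word_index):
    """Convert a tweet into a list of indices, skipping unseen tokens."""
    return [word_index[t] for t in tokenize(tweet) if t in word_index]


# Made-up example tweets, not taken from the IEEE Dataport datasets.
sample_tweets = [
    "Stay home, stay safe! #COVID19 https://example.org/info",
    "@user Wash your hands and stay safe",
]
word_index = build_word_index(sample_tweets)
encoded = [encode(t, word_index) for t in sample_tweets]
```

Integer sequences of this kind are a typical input to downstream classifiers; the clustering, classification and latent Dirichlet allocation stages of TClustVID are not reproduced here.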

License

cc_by_nc_nd

Full text: 1 Collection: 09-preprints Database: PREPRINT-MEDRXIV Type of study: Prognostic_studies Language: En Year: 2020 Document type: Preprint