Detecting COVID-19-Related Fake News Using Feature Extraction.

Khan, Suleman; Hakak, Saqib; Deepa, N; Prabadevi, B; Dev, Kapal; Trelova, Silvia

Khan S; Air University, Islamabad, Pakistan.
Hakak S; Canadian Institute for Cybersecurity, University of New Brunswick Fredericton, Fredericton, NB, Canada.
Deepa N; School of Information Technology and Engineering, VIT University, Vellore, India.
Prabadevi B; School of Information Technology and Engineering, VIT University, Vellore, India.
Dev K; Division for Institutional Planning, Evaluation and Monitoring (DIPEM), University of Johannesburg, Johannesburg, South Africa.
Trelova S; Department of Information Systems, Faculty of Management, Comenius University Bratislava, Bratislava, Slovakia.

Front Public Health; 9: 788074, 2021.

Article in English | MEDLINE | ID: covidwho-1643561

ABSTRACT

Since the emergence of COVID-19 in December 2019, numerous posts and news items about the pandemic have appeared in social media, traditional print, and electronic media. These sources carry information from both trusted and untrusted medical sources, and news spreads rapidly through them. Spreading deceptive information may cause anxiety, lead to unwanted exposure to medical remedies, serve as a trick for digital marketing, and in some cases have deadly consequences. A model for detecting fake news in the news pool is therefore essential. In this work, a dataset that fuses COVID-19-related news from several social media and news sources is used for classification. First, the dataset is preprocessed to remove unwanted text, and the raw text collected from the various sources is tokenized. Next, feature selection is performed to avoid the computational overhead of processing every feature in the dataset, and linguistic and sentiment features are extracted for further processing. Finally, several state-of-the-art machine learning algorithms are trained to classify the COVID-19-related dataset and are evaluated using various metrics. The results show that the random forest classifier outperforms the other classifiers, with an accuracy of 88.50%.
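The abstract does not list the exact preprocessing rules, features, or sentiment resource used, so the following is only a minimal sketch of what the preprocessing, tokenization, and linguistic/sentiment feature extraction steps might look like. Every name in it (`preprocess`, `tokenize`, `extract_features`, `accuracy`, and the tiny `POSITIVE`/`NEGATIVE` word lists) is a hypothetical stand-in chosen for illustration, not the authors' implementation.

```python
import re

# Hypothetical mini-lexicons for illustration only; the abstract does not
# say which sentiment resource the authors used.
POSITIVE = {"safe", "effective", "good", "recover"}
NEGATIVE = {"deadly", "hoax", "fear", "danger"}

URL_RE = re.compile(r"https?://\S+")
# Words and numbers, optionally joined by hyphens (keeps "covid-19" whole).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess(text):
    """Lowercase the text and drop URLs (one plausible kind of 'unwanted text')."""
    return URL_RE.sub(" ", text.lower())


def tokenize(text):
    """Split preprocessed text into word tokens."""
    return TOKEN_RE.findall(preprocess(text))


def extract_features(text):
    """Return a few simple linguistic and sentiment features for one post."""
    tokens = tokenize(text)
    n = len(tokens)
    return {
        "n_tokens": n,
        "avg_token_len": sum(len(t) for t in tokens) / n if n else 0.0,
        "n_exclaim": text.count("!"),
        "upper_ratio": sum(c.isupper() for c in text) / len(text) if text else 0.0,
        # Lexicon-based polarity: positive hits minus negative hits.
        "sentiment": sum(t in POSITIVE for t in tokens)
        - sum(t in NEGATIVE for t in tokens),
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Feature dictionaries of this kind could then be turned into vectors and passed to a classifier such as a random forest; the classifier itself is omitted here because the abstract gives no hyperparameters or training details.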

Subject(s)

COVID-19; Social Media; Disinformation; Humans; Pandemics; SARS-CoV-2

Keywords

COVID-19; fake news; feature extraction; machine learning; social media

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Social Media / COVID-19
Type of study: Experimental Studies / Randomized controlled trials
Limits: Humans
Language: English
Journal: Front Public Health
Year: 2021
Document Type: Article
Affiliation country: Fpubh.2021.788074
