They May Not Work! An evaluation of eleven sentiment analysis tools on seven social media datasets.
He, Lu; Yin, Tingjue; Zheng, Kai.
  • He L; Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, CA, United States.
  • Yin T; Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, CA, United States.
  • Zheng K; Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, CA, United States; Department of Emergency Medicine, School of Medicine, University of California, Irvine, Irvine, CA, United States. Electronic address: kai.zheng@uci.edu.
J Biomed Inform; 132: 104142, 2022 Aug.
Article in English | MEDLINE | ID: covidwho-1926610
ABSTRACT

OBJECTIVE:

Sentiment analysis is an important method for understanding the emotions and opinions expressed in social media exchanges. Little work has been done to evaluate the performance of existing sentiment analysis tools on social media datasets, particularly those related to health, healthcare, or public health. This study aims to address this gap.

MATERIAL AND METHODS:

We evaluated 11 commonly used sentiment analysis tools on five health-related social media datasets curated in previously published studies. These datasets include Human Papillomavirus Vaccine, Health Care Reform, COVID-19 Masking, Vitals.com Physician Reviews, and the Breast Cancer Forum from MedHelp.org. For comparison, we also analyzed two non-health datasets based on movie reviews and generic tweets. We conducted a qualitative error analysis on the social media posts that were incorrectly classified by all tools.
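The evaluation described above scores each tool's predicted labels against gold-standard annotations using weighted F1 (the metric reported in the results). As a minimal, pure-Python sketch of that metric only; the function name, labels, and toy data here are hypothetical and not taken from the study:

```python
def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by class support in y_true."""
    labels = sorted(set(y_true))
    total = len(y_true)
    score = 0.0
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        support = sum(1 for t in y_true if t == lab)
        score += f1 * support / total
    return score

# Toy example: gold labels vs. one tool's predictions on six posts
gold = ["pos", "pos", "neg", "neg", "neu", "neu"]
pred = ["pos", "neg", "neg", "neg", "pos", "neu"]
print(round(weighted_f1(gold, pred), 3))  # → 0.656
```

Weighting by class support matters here because sentiment datasets are often imbalanced (e.g., far more negative than neutral posts), so a macro average would over-reward performance on rare classes.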

RESULTS:

The existing sentiment analysis tools performed poorly, with an average weighted F1 score below 0.6. Inter-tool agreement was also low, with an average Fleiss' kappa of 0.066. The qualitative error analysis identified two major causes of misclassification: (1) correct sentiment attributed to the wrong subject(s), and (2) failure to properly interpret implicit or indirect sentiment expressions.

DISCUSSION AND CONCLUSION:

The performance of the existing sentiment analysis tools is insufficient to generate accurate sentiment classification results. The low inter-tool agreement suggests that a study's conclusions could be driven entirely by the idiosyncrasies of the tool selected rather than by the data. This is concerning, especially when the results may be used to inform important policy decisions such as mask or vaccination mandates.
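The inter-tool agreement statistic cited in the results, Fleiss' kappa, measures how much multiple raters (here, tools) agree beyond chance. A minimal, pure-Python sketch of the standard formula; the function name and toy data are hypothetical, not from the study:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count table.

    ratings[i][j] = number of raters assigning subject i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)       # number of subjects (posts)
    n = sum(ratings[0])    # raters (tools) per subject
    k = len(ratings[0])    # number of categories

    # Per-subject observed agreement P_i, then mean P_bar
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N

    # Chance agreement P_e from marginal category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)

# Toy example: 3 tools labeling 4 posts as (negative, neutral, positive)
table = [
    [3, 0, 0],   # all three tools agree: negative
    [0, 3, 0],   # all agree: neutral
    [1, 1, 1],   # complete disagreement
    [2, 0, 1],   # partial agreement
]
print(round(fleiss_kappa(table), 3))  # → 0.318
```

On the usual interpretation scale, kappa values near 0 indicate agreement no better than chance, which is why the study's 0.066 average signals that the eleven tools were labeling posts almost independently of one another.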

Full text: Available Collection: International databases Database: MEDLINE Main subject: Social Media / COVID-19 Type of study: Experimental Studies / Prognostic study / Qualitative research Topics: Vaccines Limits: Humans Language: English Journal: J Biomed Inform Journal subject: Medical Informatics Year: 2022 Document Type: Article
