No Time like the Present: Effects of Language Change on Automated Comment Moderation

Justen, L.; Muller, K.; Niemann, M.; Becker, J.

24th IEEE International Conference on Business Informatics, CBI 2022 ; 1:40-49, 2022.

Article in English | Scopus | ID: covidwho-2152432

ABSTRACT

The spread of online hate has become a significant problem for newspapers that host comment sections. As a result, there is growing interest in using machine learning (ML) and natural language processing (NLP) for (semi-)automated abusive language detection, to avoid the cost of manual comment moderation or having to shut down comment sections altogether. However, much of the past work on abusive language detection assumes that classifiers operate in a static language environment, even though both language and news are in constant flux. In this paper, we use a new dataset of German newspaper comments to show that classifiers trained with naive ML techniques, such as a random train-test split, will underperform on future data, and that a time-stratified evaluation split is more appropriate. We also show that a classifier's performance degrades rapidly when it is evaluated on data from a different period than its training data. Our findings suggest that anyone developing an abusive language detection system must consider the temporal dynamics of language, or risk deploying a model that will quickly become defunct. © 2022 IEEE.
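The difference between a random split and a time-stratified split can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`posted`, `text`, `abusive`), the toy data, and the 80/20 ratio are all assumptions made for illustration only.

```python
import random
from datetime import date


def random_split(comments, test_fraction=0.2, seed=0):
    """Naive baseline: shuffle, then cut, so train and test cover the same period."""
    shuffled = list(comments)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_stratified_split(comments, test_fraction=0.2):
    """Sort by posting date and hold out the most recent comments for testing."""
    ordered = sorted(comments, key=lambda c: c["posted"])
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy data: one comment per month, January to October 2020.
comments = [
    {"posted": date(2020, month, 1), "text": f"comment {month}", "abusive": month % 2 == 0}
    for month in range(1, 11)
]

train, test = time_stratified_split(comments)
# Every training comment predates every test comment, so the evaluation
# mimics deploying the classifier on data it has not yet seen.
newest_train = max(c["posted"] for c in train)
oldest_test = min(c["posted"] for c in test)
```

With a random split, the test set is drawn from the same months as the training set, so a score computed on it says little about how the model copes with later vocabulary or news topics. The time-ordered split exposes that gap.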

Keywords

abusive language detection; Auto-ML; concept drift; Covid-19; natural language processing; Automation; Classification (of information); Learning algorithms; Natural language processing systems; Statistical tests; Concept drifts; Language dete

Full text: Available
Collection: Databases of international organizations
Database: Scopus
Type of study: Experimental Studies
Language: English
Journal: 24th IEEE International Conference on Business Informatics, CBI 2022
Year: 2022
Document Type: Article
