Spanish Corpora of tweets about COVID-19 Vaccination for Automatic Stance Detection
Information Processing & Management ; : 103294, 2023.
Article in English | ScienceDirect | ID: covidwho-2210541
ABSTRACT
The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition that takes into account different variants of Spanish in social media posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short-text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a Fleiss' kappa of 0.725. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. A combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of trustworthiness and the validation index.
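The abstract reports annotator agreement as Fleiss' kappa (three annotators, three stance labels) and model quality as macro-averaged F1. As a minimal standard-library sketch of how these two standard metrics are computed, and not the authors' own code, the functions below assume each tweet is labelled by the same number of annotators. The function names and the toy labels are hypothetical illustrations, not data from the corpus.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for N items, each labelled by the same number n of raters.

    ratings[i] holds the n labels given to item i,
    e.g. ("favour", "favour", "neither") for three annotators.
    """
    if not ratings:
        raise ValueError("need at least one item")
    n = len(ratings[0])  # raters per item
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N = len(ratings)
    counts = [Counter(r) for r in ratings]  # n_ij: raters putting item i in category j
    categories = set().union(*counts)

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts
    ) / N
    # Chance agreement: sum of squared marginal category proportions.
    p_e = sum((sum(cnt[cat] for cnt in counts) / (N * n)) ** 2 for cat in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("need at least one prediction")
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: 4 tweets, 3 annotators each (not from the corpus).
    toy = [
        ("favour", "favour", "favour"),
        ("against", "against", "neither"),
        ("neither", "neither", "neither"),
        ("favour", "neither", "neither"),
    ]
    print(round(fleiss_kappa(toy), 4))  # → 0.4545
    print(round(macro_f1(["favour", "favour", "against", "neither"],
                         ["favour", "against", "against", "neither"]), 4))  # → 0.7778
```

Macro averaging weights each stance class equally, so the smaller "against" class counts as much as the larger "neither" class; this matters for a label distribution as uneven as 904 / 674 / 1,223.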

Full text: Available Collection: Databases of international organizations Database: ScienceDirect Topics: Vaccines Language: English Journal: Information Processing & Management Year: 2023 Document Type: Article
