
Twitter dataset on public sentiments towards biodiversity policy in Indonesia.

Uliniansyah, Mohammad Teduh; Budi, Indra; Nurfadhilah, Elvira; Afra, Dian Isnaeni Nurul; Santosa, Agung; Latief, Andi Djalal; Jarin, Asril; Jiwanggi, Meganingrum Arista; Hidayati, Nuraisa Novia; Fajri, Radhiyatul; Suryono, Ryan Randy; Pebiana, Siska; Shaleha, Siti; Ramdhani, Tosan Wiar; Sampurno, Tri.

Data Brief; 52: 109890, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38146299

ABSTRACT

In recent years, biodiversity has emerged as a prominent and pressing topic because of the urgent need to address biodiversity loss and the growing recognition of its connections to climate change and sustainable development. Increased public awareness and economic considerations have further underscored the importance of biodiversity conservation. To investigate the sentiment of the Indonesian public towards biodiversity, we collected data from Twitter using a predefined set of keywords. We amassed 500,000 Indonesian-language tweets posted between January 2020 and March 2023. These tweets cover a wide range of discussions on biodiversity, including subdomains such as food security, health, and environmental management. Before labeling, a team of 18 experts jointly developed a labeling guide, which served as the reference during annotation. Three annotators labeled each tweet with a sentiment class (positive, negative, or neutral), or with the label none if the tweet was unrelated. The final label was determined by majority voting. Tweets whose final label was none, and those whose sentiment class remained undecided, were considered invalid and excluded from subsequent processing. After a series of steps, including cleaning (removing duplicates, irrelevant tweets, and tweets not written in Indonesian) and preprocessing, we prepared a dataset of 13,435 tweets. We measured the inter-annotator agreement, trained several models using different algorithms with K-fold cross-validation, and evaluated them. The Fleiss' Kappa of the dataset, our measure of inter-annotator agreement, was 0.62187, and the best model, based on the pre-trained IndoBERT model, achieved an F1-score of 0.7959.
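
The labeling rule described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes that "undecided" means no label received at least two of the three votes, and the function names `majority_label` and `filter_valid` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

# Sentiment classes from the abstract; "none" marks an unrelated tweet.
VALID_SENTIMENTS = {"positive", "negative", "neutral"}


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the majority sentiment among annotator votes, or None if invalid.

    Assumption (hypothetical rule): a label needs a strict majority of votes
    (2 of 3 annotators); a three-way split is treated as "undecided".
    A majority vote for "none" is also treated as invalid.
    """
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):  # no strict majority -> undecided
        return None
    if label not in VALID_SENTIMENTS:  # majority said the tweet is unrelated
        return None
    return label


def filter_valid(
    annotations: Iterable[Tuple[str, Sequence[str]]]
) -> List[Tuple[str, str]]:
    """Keep (tweet, final_label) pairs whose majority vote is a sentiment class."""
    kept = []
    for tweet, votes in annotations:
        label = majority_label(votes)
        if label is not None:
            kept.append((tweet, label))
    return kept


# Toy example with three annotators per tweet (placeholder tweets).
sample = [
    ("tweet A", ["positive", "positive", "neutral"]),
    ("tweet B", ["positive", "negative", "neutral"]),
    ("tweet C", ["none", "none", "negative"]),
]
print(filter_valid(sample))  # [('tweet A', 'positive')]
```

Only tweet A survives: tweet B has no majority, and tweet C's majority label is none.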
The Fleiss' Kappa value indicates that the annotators shared a substantial understanding of, and agreement on, how to label a tweet, which supports the consistency and reliability of the dataset. The F1-score indicates that the dataset is reasonably well suited for reuse in further research on sentiment analysis of biodiversity. This dataset can benefit a range of research, including topic modeling, sentiment analysis, and public opinion analysis on Twitter, particularly concerning biodiversity-related policies.
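
For reference, Fleiss' Kappa for a fixed number of annotators per item can be computed as below. This is a generic sketch of the standard formula, not the authors' implementation; the function name and the toy count table are hypothetical, and the abstract does not state which categories (for example, whether none is included) entered the reported value. On the widely used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, consistent with the interpretation of 0.62187 above.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from an N x k table of rating counts.

    counts[i][j] = number of annotators who assigned item i to category j.
    Every row must sum to the same number of raters n (n = 3 in the abstract).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of annotators")
    n_categories = len(counts[0])

    # Per-item agreement P_i: share of agreeing rater pairs for item i.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from overall category proportions p_j.
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy table; columns: positive, negative, neutral, none.
table = [
    [3, 0, 0, 0],
    [0, 3, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 3, 0],
]
print(round(fleiss_kappa(table), 3))  # 0.745
```

Here the mean observed agreement is 5/6 and the chance agreement is 25/72, giving kappa = 35/47.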
