Improving Text Clustering Using a New Technique for Selecting Trustworthy Content in Social Networks
19th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2022
1602 CCIS:275-287, 2022.
Article
in English
| Scopus | ID: covidwho-1971509
ABSTRACT
Today’s information society has given rise to a large number of applications that generate and consume digital data. Many of these applications are built on social networks, so their information often takes the form of unstructured text. Text from social media also tends to contain a high level of noise and untrustworthy content, which makes systems capable of handling it efficiently highly relevant. Verifying the trustworthiness of social media content requires analysing and exploring social media data with text mining techniques. One of the most widespread of these techniques is text clustering, which automatically groups similar documents into categories. Because text clustering is very sensitive to noise, in this paper we propose a pre-processing pipeline based on word embeddings that selects trustworthy content and discards noise in a way that improves clustering results. To validate the proposed pipeline, we present a real use case on a Twitter dataset related to COVID-19. © 2022, Springer Nature Switzerland AG.
Clustering; Pre-processing; Social media mining; Cluster analysis; Clustering algorithms; COVID-19; Data mining; Pipeline processing systems; Social networking (online); Clusterings; Digital datas; Information society; Media content; Social media; Social media datum; Social media minings; Text Clustering; Unstructured texts; Pipelines
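The abstract does not describe how the pipeline scores trustworthiness, so the following is only a minimal sketch of one way an embedding-based filter could work before clustering, not the authors' method. It assumes a tiny hand-made vocabulary (`TOY_VECTORS`), a hypothetical helper `select_content`, a small set of reference documents taken as trustworthy, and an arbitrary similarity threshold; a real pipeline would presumably load pre-trained word embeddings instead.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
TOY_VECTORS = {
    "vaccine": [0.9, 0.1, 0.0],
    "covid": [0.8, 0.2, 0.0],
    "hospital": [0.7, 0.3, 0.1],
    "cases": [0.8, 0.1, 0.1],
    "win": [0.0, 0.1, 0.9],
    "free": [0.1, 0.0, 0.9],
    "prize": [0.0, 0.2, 0.8],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_content(docs, reference_docs, threshold=0.9):
    """Keep docs whose embedding lies close to the centroid of reference_docs.

    The threshold is an assumed value, not one taken from the paper.
    """
    ref_vecs = [v for v in (embed(d) for d in reference_docs) if v is not None]
    centroid = [sum(col) / len(ref_vecs) for col in zip(*ref_vecs)]
    kept = []
    for doc in docs:
        vec = embed(doc)
        # Documents with no known words are treated as noise and dropped.
        if vec is not None and cosine(vec, centroid) >= threshold:
            kept.append(doc)
    return kept


reference = ["covid vaccine", "hospital cases"]
tweets = ["covid cases hospital", "win free prize", "lol ok", "vaccine free"]
selected = select_content(tweets, reference)
print(selected)
```

The documents that survive the filter could then be passed to any text clustering algorithm; the idea is only that clustering runs on the retained subset rather than on the raw, noisy stream.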
Collection:
Databases of international organizations
Database:
Scopus
Language:
English
Journal:
19th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2022
Year:
2022
Document Type:
Article