Tracking the Evolution of Clusters in Social Media Streams
IEEE Transactions on Big Data
1-15, 2022.
Article
in English
Scopus | ID: covidwho-2052080
ABSTRACT
Tracking the evolution of clusters in social media streams is becoming increasingly important for many applications, such as the early detection and monitoring of natural disasters or pandemics. Unlike clustering a static dataset, streaming data clustering has no global view of the complete data: it sees only a local (or partial) view of a high-speed stream, which makes clustering a challenging task. In this paper, we propose TStream, a novel density-peak-based algorithm for tracking the evolution of clusters and outliers in social media streams through five evolutionary actions: cluster adjustment, emergence, disappearance, split, and merge. TStream is based on a temporal decay model and text stream summarisation. The decay model captures the decreasing importance of textual documents over time, and the stream summarisation represents them compactly in memory using cells (also known as micro-clusters). We also propose a novel, efficient index called the shared dependency tree (SD-Tree), based on the ideas of density peak and shared dependency. The SD-Tree maintains the dynamic dependency relationships in TStream and thereby improves its overall efficiency. We conduct extensive experiments on five real datasets. In terms of the cluster mapping measure (CMM), TStream outperforms the existing state-of-the-art solutions based on MStream, MStreamF, EDMStream, OSGM, and EStream by up to 17.8%, 18.6%, 6.9%, 16.4%, and 20.1%, respectively. It is also significantly more efficient than MStream, MStreamF, OSGM, and EStream in terms of response time and throughput.
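The abstract names three ingredients without detailing them: a temporal decay model, cell-based stream summarisation, and density-peak dependency relationships. The Python sketch below illustrates how these ideas commonly fit together in generic density-peak stream clustering; it is not TStream and does not implement the SD-Tree. Everything specific here is a hypothetical assumption for illustration: the exponential decay weight `2 ** (-LAMBDA * age)`, bag-of-words cells compared by cosine similarity, the helper names (`decay`, `Cell`, `insert`, `density_peak_clusters`), and the constants `LAMBDA`, `SIM_THRESHOLD`, and `DELTA_THRESHOLD`. Each cell depends on its nearest denser cell; a cell with no denser cell, or whose distance to it exceeds `DELTA_THRESHOLD`, is treated as a density peak that roots a new cluster.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical parameters chosen for illustration only.
LAMBDA = 0.01          # decay rate: weight = 2 ** (-LAMBDA * age)
SIM_THRESHOLD = 0.3    # min cosine similarity for a cell to absorb a document
DELTA_THRESHOLD = 0.8  # min dependent distance for a cell to become a cluster root


def decay(age):
    """Exponential fading weight of something `age` time units old (assumed model)."""
    return 2.0 ** (-LAMBDA * age)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cell:
    """Hypothetical micro-cluster: decayed term weights plus a decayed density."""
    terms: Counter = field(default_factory=Counter)
    density: float = 0.0
    last_update: float = 0.0

    def refresh(self, now):
        # Fade everything from last_update to now; repeated fades compose correctly.
        w = decay(now - self.last_update)
        self.density *= w
        for k in self.terms:
            self.terms[k] *= w
        self.last_update = now

    def absorb(self, doc, now):
        self.refresh(now)
        self.terms.update(doc)
        self.density += 1.0


def insert(cells, doc, now):
    """Add a document (term Counter) to the most similar cell, or open a new cell."""
    best, best_sim = None, 0.0
    for c in cells:
        s = cosine(c.terms, doc)  # scale-invariant, so stale decay does not matter
        if s > best_sim:
            best, best_sim = c, s
    if best is None or best_sim < SIM_THRESHOLD:
        best = Cell(last_update=now)
        cells.append(best)
    best.absorb(doc, now)


def density_peak_clusters(cells, now):
    """Label each cell index with the index of its cluster root."""
    for c in cells:
        c.refresh(now)
    order = sorted(range(len(cells)), key=lambda i: cells[i].density, reverse=True)
    labels = {}
    for rank, i in enumerate(order):
        # Dependency: the nearest cell among those ranked denser than cell i.
        nearest, delta = None, math.inf
        for j in order[:rank]:
            d = 1.0 - cosine(cells[i].terms, cells[j].terms)
            if d < delta:
                nearest, delta = j, d
        # Density peaks (no denser cell, or far from it) start new clusters;
        # every other cell inherits the label of the cell it depends on.
        labels[i] = i if nearest is None or delta > DELTA_THRESHOLD else labels[nearest]
    return labels


if __name__ == "__main__":
    stream = [(0, "flood warning river"), (1, "river flood rescue"),
              (2, "covid vaccine rollout"), (3, "vaccine covid cases")]
    cells = []
    for t, text in stream:
        insert(cells, Counter(text.split()), t)
    # prints: 2 {1: 1, 0: 0}  (two cells, each rooting its own cluster)
    print(len(cells), density_peak_clusters(cells, now=4))
```

This sketch recomputes every dependency from scratch in O(n²) time per call, which is only reasonable for a handful of cells. Per the abstract, the SD-Tree exists precisely to maintain these dependency relationships dynamically as the stream evolves, rather than rebuilding them.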
Australia; Big Data; Clustering algorithms; COVID-19; Density peak clustering; Heuristic algorithms; Indexes; shared dependency tree; short text streams; social media; Social networking (online); stream clustering; Cluster analysis; Disasters; Evolutionary algorithms; Media streaming; Trees (mathematics); Clusterings; Dependency trees; Heuristics algorithm; Index; Shared dependencies; Short text stream; Short texts; Text streams
Full text:
Available
Collection:
Databases of international organizations
Database:
Scopus
Language:
English
Journal:
IEEE Transactions on Big Data
Year:
2022
Document Type:
Article
Similar
MEDLINE
...
LILACS
LIS