Tracking the Evolution of Clusters in Social Media Streams
IEEE Transactions on Big Data, pp. 1-15, 2022.
Article in English | Scopus | ID: covidwho-2052080
ABSTRACT
Tracking the evolution of clusters in social media streams is becoming increasingly important for many applications, such as the early detection and monitoring of natural disasters or pandemics. Unlike clustering a static dataset, streaming data clustering has no global view of the complete data, and the local (or partial) view available in a high-speed stream makes clustering a challenging task. In this paper, we propose TStream, a novel density-peak-based algorithm for tracking the evolution of clusters and outliers in social media streams through the evolutionary actions of cluster adjustment, emergence, disappearance, split, and merge. TStream is based on a temporal decay model and text stream summarisation. The decay model captures the decreasing importance of textual documents over time. The stream summarisation represents the documents compactly in memory using cells (aka micro-clusters). We also propose a novel efficient index, the shared dependency tree (aka SD-Tree), based on the ideas of density peak and shared dependency. It maintains the dynamic dependency relationships in TStream and thereby improves the overall efficiency. We conduct extensive experiments on five real datasets. In terms of cluster mapping measure (CMM), TStream outperforms the existing state-of-the-art solutions based on MStream, MStreamF, EDMStream, OSGM, and EStream by up to 17.8%, 18.6%, 6.9%, 16.4%, and 20.1%, respectively. It is also significantly more efficient than MStream, MStreamF, OSGM, and EStream in terms of response time and throughput.
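The abstract names the building blocks (temporal decay, cell-based summarisation, density-peak dependencies) but not TStream's formulas. The following is a minimal sketch of those general ideas under stated assumptions, not TStream itself: it assumes an exponential decay weight a^(-lambda * dt), bag-of-words documents compared by cosine distance, and a density-peak rule in which each cell depends on its nearest denser cell and a cell far from every denser cell starts its own cluster. All parameter names and values (DECAY_BASE, DECAY_RATE, RADIUS, DELTA_MIN) and helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical parameters; the abstract does not give TStream's values.
DECAY_BASE = 2.0   # a in the weight a^(-lambda * dt)
DECAY_RATE = 0.1   # lambda: a weight halves every 10 time units
RADIUS = 0.3       # max cosine distance for a document to join an existing cell
DELTA_MIN = 0.6    # min dependency distance for a cell to count as a density peak


def decay(dt: float) -> float:
    """Fraction of weight that survives dt time units."""
    return DECAY_BASE ** (-DECAY_RATE * dt)


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


class Cell:
    """Micro-cluster: a decayed document count plus a decayed term vector."""

    def __init__(self, terms: Counter, t: float):
        self.terms = Counter(terms)
        self.density = 1.0
        self.last_update = t

    def fade_to(self, t: float) -> None:
        # Lazily apply the decay accumulated since the last update.
        f = decay(t - self.last_update)
        self.density *= f
        for w in self.terms:
            self.terms[w] *= f
        self.last_update = t

    def absorb(self, terms: Counter, t: float) -> None:
        self.fade_to(t)
        self.density += 1.0
        self.terms.update(terms)  # Counter.update adds counts


def insert(cells: list, text: str, t: float) -> None:
    """Add a document to its nearest cell if close enough, else open a new cell."""
    terms = Counter(text.lower().split())
    best = min(cells, key=lambda c: cosine_distance(c.terms, terms), default=None)
    if best is not None and cosine_distance(best.terms, terms) <= RADIUS:
        best.absorb(terms, t)
    else:
        cells.append(Cell(terms, t))


def density_peak_clusters(cells: list, t: float) -> dict:
    """Label each cell by following dependencies up to its density peak."""
    for c in cells:
        c.fade_to(t)
    # Denser cells first (ties broken by index), so every candidate
    # dependency of a cell appears earlier in this order.
    order = sorted(range(len(cells)), key=lambda i: (-cells[i].density, i))
    dep, delta = {}, {}
    for rank, i in enumerate(order):
        denser = order[:rank]
        if not denser:
            dep[i], delta[i] = None, math.inf
            continue
        j = min(denser, key=lambda k: cosine_distance(cells[i].terms, cells[k].terms))
        dep[i], delta[i] = j, cosine_distance(cells[i].terms, cells[j].terms)
    labels = {}
    for i in order:
        # A cell far from every denser cell is a peak and starts a cluster;
        # otherwise it inherits the label of the cell it depends on.
        labels[i] = i if delta[i] > DELTA_MIN else labels[dep[i]]
    return labels


cells = []
stream = [
    (0.0, "flood warning river rising"),
    (1.0, "river flood warning issued"),
    (2.0, "vaccine trial results announced"),
    (3.0, "vaccine trial phase three results"),
    (4.0, "flood river levels rising fast"),
]
for t, text in stream:
    insert(cells, text, t)
labels = density_peak_clusters(cells, t=5.0)
print(labels)  # {0: 0, 3: 0, 2: 2, 1: 2}
```

With these assumed parameters the five documents form four cells and two clusters (flood and vaccine). The older vaccine cell (index 1) depends on the newer one (index 2) because decay has pushed its density below the newer cell's. The sketch recomputes every dependency at query time, an O(n²) pass over the cells; maintaining these dependency relationships dynamically is the job the abstract assigns to SD-Tree. Outlier handling and tracking emergence, split, and merge across snapshots are omitted.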

Full text: Available | Collection: Databases of international organizations | Database: Scopus | Language: English | Journal: IEEE Transactions on Big Data | Year: 2022 | Document Type: Article
