An Analysis on the Weibo Topic Detection Based on K-means Algorithm
2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms, EEBDA 2022: 1328-1331, 2022.
Article in English | Scopus | ID: covidwho-1831757
ABSTRACT
Sina Weibo, a platform on which netizens express their opinions, generates a large amount of public opinion data and a constant stream of new topics. How to detect new and hot topics on Weibo is therefore a meaningful research problem. Document clustering is a widely studied problem in text categorization. K-means, one of the best-known unsupervised learning algorithms, partitions a given dataset into disjoint clusters in a simple way. However, the traditional K-means algorithm assigns initial centroids randomly, which does not guarantee that the most dissimilar documents are chosen as the cluster centroids. A modified K-means algorithm is proposed that uses the Jaccard distance measure to select the k most dissimilar documents as initial centroids and uses Word2vec as the Chinese text vectorization model. The experimental results demonstrate that the proposed K-means algorithm improves clustering performance and is able to detect new and hot topics in Weibo COVID-19 data. © 2022 IEEE.
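The abstract does not say how the k most dissimilar documents are selected, so the following is only a minimal sketch of one plausible reading: a greedy farthest-first heuristic over token sets, using Jaccard distance. The function names `jaccard_distance` and `pick_dissimilar_seeds`, the greedy strategy, and the toy English tokens are all assumptions for illustration and are not taken from the paper. The Word2vec vectorization and the K-means iterations that would follow seeding are not shown.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two token collections: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(a & b) / len(union)


def pick_dissimilar_seeds(docs, k):
    """Return indices of k mutually dissimilar documents.

    Hypothetical greedy farthest-first heuristic (an assumption, not the
    paper's stated procedure).
    """
    if not 1 <= k <= len(docs):
        raise ValueError("k must be between 1 and len(docs)")
    if k == 1:
        return [0]
    # Start from the most dissimilar pair of documents.
    first, second = max(
        combinations(range(len(docs)), 2),
        key=lambda p: jaccard_distance(docs[p[0]], docs[p[1]]),
    )
    seeds = [first, second]
    while len(seeds) < k:
        # Add the document whose nearest already-chosen seed is farthest away.
        candidates = [i for i in range(len(docs)) if i not in seeds]
        best = max(
            candidates,
            key=lambda i: min(jaccard_distance(docs[i], docs[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


# Toy tokenized posts (made-up data for illustration only).
docs = [
    ["covid", "vaccine", "dose"],
    ["covid", "vaccine", "booster"],
    ["football", "match", "goal"],
    ["stock", "market", "price"],
]
seeds = pick_dissimilar_seeds(docs, 3)
```

In this toy data the two vaccine posts share most of their tokens, so the heuristic picks only one of them as a seed. The documents at the chosen indices would then serve as starting centroids in place of random ones.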
Full text: Available Collection: Databases of international organizations Database: Scopus Language: English Journal: 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms, EEBDA 2022 Year: 2022 Document Type: Article

