An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling

Zubair, M.; Iqbal, M. A.; Shil, A.; Chowdhury, M. J. M.; Moni, M. A.; Sarker, I. H.

Annals of Data Science ; 2022.

Article in English | Scopus | ID: covidwho-1920411

ABSTRACT

K-means is one of the best-known unsupervised machine learning algorithms. It typically partitions the data into distinct, non-overlapping clusters and assigns each point to exactly one of them, placing the point in the nearest cluster by minimum squared distance. One of the main concerns in K-means is finding the optimal initial centroids of the clusters: determining the optimum position of the initial cluster centroids at the very first iteration is the most challenging task. This paper proposes an approach that finds the optimal initial centroids efficiently, reducing the number of iterations and the execution time. To analyze the effectiveness of the proposed method, we conducted experiments on different real-world datasets. We first analyzed COVID-19 and patient datasets to show the method's efficiency. A synthetic dataset of 10M instances with 8 dimensions is also used to estimate the performance of the proposed algorithm. Experimental results show that the proposed method outperforms the traditional k-means++ and random centroid initialization methods in both computation time and number of iterations. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
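
To illustrate why the choice of initial centroids affects the number of iterations, the following is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python. It is not the initialization method proposed in the paper, which the abstract does not describe in detail. The function names, the toy data, and the convergence rule (stop when assignments no longer change) are assumptions made for this example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Standard Lloyd's algorithm. Returns (centroids, labels, iterations).

    An iteration is one assignment pass; the loop stops as soon as a pass
    leaves every assignment unchanged (an assumed convergence rule).
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to the nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels, max_iter


# Hypothetical toy data: two tight, well-separated 2-D blobs.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)]
points = blob_a + blob_b

# One seed per blob versus both seeds inside the same blob.
_, labels_spread, iters_spread = kmeans(points, [points[0], points[50]])
_, labels_close, iters_close = kmeans(points, [points[0], points[1]])
print("iterations (one seed per blob):", iters_spread)
print("iterations (both seeds in one blob):", iters_close)
```

When each initial centroid already lies inside a different cluster, the first assignment pass is correct and the loop only needs one more pass to confirm it. Poorly placed seeds can never need fewer passes than that, and usually need more, which is the cost that initialization schemes such as k-means++ or the one proposed in the paper aim to reduce.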

Keywords

Data Science; K-means Clustering; Machine Learning; Percentile; Principal Component Analysis; Unsupervised Algorithm; Iterative methods; Learning algorithms; Data-driven model; K-mean algorithms; K-means clustering algorithms; K-means++ clustering; Machine-learning; Number of iterations; Principal-component analysis; Unsupervised algorithms; Unsupervised machine learning

Full text: Available | Collection: Databases of international organizations | Database: Scopus | Language: English | Journal: Annals of Data Science | Year: 2022 | Document Type: Article