An Efficient Cluster Centroid Initialization Method for K-Means Clustering
Cluster analysis is one of the fundamental data analysis methods, and K-Means is one of the best-known clustering algorithms. The result of K-Means clustering depends on the quality of the initial centroids, which are selected randomly. The original K-Means algorithm converges to a local optimum, not necessarily the global optimum. K-Means clustering performance can be enhanced if good initial cluster centers are found. To find them, a series of procedures is carried out: the data in a cell is partitioned using a cutting plane that divides the cell into two smaller cells. In this paper, a new method is proposed for finding better initial centroids and for estimating the number of clusters. It is based on a two-cluster model, which provides an efficient way of assigning data points to suitable clusters with reduced time complexity. According to the experimental results, the proposed technique estimates the number of clusters and computes initial cluster centers for K-Means clustering. The resulting clusterings are more accurate and take less computational time than those of the original K-Means clustering algorithm and the CCIA method.
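The abstract does not spell out how the cutting plane is chosen, so the following is only a minimal sketch of the general idea of splitting a cell into two smaller cells to obtain initial centroids. It is not the paper's procedure. It assumes that the plane is perpendicular to the axis with the largest variance and passes through the cell mean, that the cell with the largest sum of squared errors is split next, and that the number of clusters `k` is given rather than estimated. All function names are hypothetical.

```python
from statistics import fmean, pvariance

Point = tuple[float, ...]


def centroid(cell: list[Point]) -> Point:
    """Mean of the points in a cell, computed axis by axis."""
    return tuple(fmean(axis) for axis in zip(*cell))


def sse(cell: list[Point]) -> float:
    """Sum of squared distances from each point to the cell centroid."""
    c = centroid(cell)
    return sum((x - m) ** 2 for p in cell for x, m in zip(p, c))


def split_cell(cell: list[Point]) -> tuple[list[Point], list[Point]]:
    """Cut a cell into two smaller cells.

    Assumption (not from the paper): the cutting plane is perpendicular
    to the axis of largest variance and passes through the mean.
    """
    columns = list(zip(*cell))
    axis = max(range(len(columns)), key=lambda j: pvariance(columns[j]))
    cut = fmean(columns[axis])
    left = [p for p in cell if p[axis] <= cut]
    right = [p for p in cell if p[axis] > cut]
    return left, right


def initial_centroids(points: list[Point], k: int) -> list[Point]:
    """Split cells until there are k of them; return their centroids."""
    cells = [list(points)]
    while len(cells) < k:
        # Assumption: always split the cell with the largest error.
        worst = max(cells, key=sse)
        left, right = split_cell(worst)
        if not left or not right:
            break  # the worst cell holds identical points; stop early
        cells.remove(worst)
        cells += [left, right]
    return [centroid(c) for c in cells]
```

The returned centroids could then be passed as the starting centers of a standard K-Means run in place of randomly selected ones.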
This work is licensed under a Creative Commons Attribution 3.0 License.