
Analysis of Various Clustering Techniques with Centroid Initialized K-Means Clustering

S. Nisha

Abstract


K-Means is one of the algorithms that address the well-known clustering problem. The algorithm assigns objects to a pre-defined number of clusters, k, which is given by the user. The idea is to choose initial cluster centers (centroids), one per cluster, conventionally at random. These centers should preferably be as far as possible from each other. The starting points affect both the clustering process and its results: centroid initialization plays an important role in determining the cluster assignments, and the convergence behavior of the clustering depends on the initial centroid values. This paper focuses on the selection of initial cluster centroids so as to improve the performance of the K-Means clustering algorithm. It uses the technique of Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance to assign the initial centroids. Experimental results suggest that the proposed approach yields better clustering results than the conventional technique.
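
The idea of seeding K-Means from a partition along the highest-variance axis can be illustrated with a minimal sketch. It is not the paper's exact procedure, which the abstract does not spell out. It assumes a simplified variant: the points are sorted along the axis with the largest variance, cut into k contiguous groups of roughly equal size, and each group's mean becomes an initial centroid. All function names (`highest_variance_axis`, `variance_partition_init`, `kmeans`) are hypothetical and chosen for illustration only.

```python
from statistics import pvariance


def highest_variance_axis(points):
    """Return the index of the coordinate axis with the largest variance."""
    dims = len(points[0])
    return max(range(dims), key=lambda d: pvariance([p[d] for p in points]))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def variance_partition_init(points, k):
    """Assumed simplification: sort along the highest-variance axis,
    split into k contiguous groups, and use each group's mean as a seed.
    Requires 1 <= k <= len(points)."""
    axis = highest_variance_axis(points)
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]
    return [mean_point(ordered[bounds[i]:bounds[i + 1]]) for i in range(k)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations, started from the variance-based seeds."""
    centroids = variance_partition_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each centroid; keep the old one if a
        # cluster ends up empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Because the seeds are computed from the data rather than drawn at random, repeated runs on the same input give the same result, which is one practical motivation for deterministic initialization. For example, calling `kmeans([(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)], 2)` separates the three points near the origin from the three points near x = 10.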

Keywords


K-Means Clustering, Centroid, Data Partitioning, Variance





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.