
The Centroid Initialization for K-Means Clustering Algorithm based on T-Score Ranking Method

V. Kathiresan, Dr. P. Sumathi

Abstract


In this paper, we propose an algorithm that computes initial cluster centers for K-means clustering based on T-Score ranking. This scoring technique is a statistical method for ranking numerical and nominal attributes based on a distance measure. The data are sorted by their score values, and the ranked data are divided into k subsets. The mean of each subset is then calculated, and the data point nearest to each mean is picked as an initial centroid. The experimental results suggest that the proposed algorithm is effective and converges to better clustering results than the random initialization method. The research also indicated that the proposed algorithm would greatly improve the likelihood of every cluster containing some data.
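The abstract outlines the initialization steps without giving the exact scoring formula, so the following Python sketch fills the gaps with explicit assumptions. It takes the ranking score of each data point to be the sum of standard T-scores (T = 50 + 10z) over numerical attributes only; the paper's distance-based score and its handling of nominal attributes are not reproduced. It also assumes the ranked data are split into k contiguous subsets of nearly equal size and that "nearest" means smallest Euclidean distance within each subset. The names `t_scores`, `tscore_init`, and `kmeans` are hypothetical, and the plain Lloyd's k-means loop is included only to show the centroids in use.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def t_scores(data: Sequence[Sequence[float]]) -> List[float]:
    """Ranking score per row: sum of per-attribute T-scores (T = 50 + 10*z).

    ASSUMPTION: the paper's exact T-Score ranking is not given in the
    abstract; this uses the standard T-score on numerical attributes only.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n) for j in range(d)]
    scores = []
    for row in data:
        total = 0.0
        for j in range(d):
            z = (row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
            total += 50.0 + 10.0 * z
        scores.append(total)
    return scores


def tscore_init(data: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Rank points by score, split into k subsets, and return, for each
    subset, the member closest to that subset's mean."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    scores = t_scores(data)
    ranked = [list(data[i]) for i in sorted(range(len(data)), key=lambda i: scores[i])]
    n, d = len(ranked), len(ranked[0])
    centroids = []
    for c in range(k):
        # ASSUMPTION: contiguous, nearly equal-sized slices of the ranked data.
        subset = ranked[c * n // k:(c + 1) * n // k]
        mean = [sum(p[j] for p in subset) / len(subset) for j in range(d)]
        # ASSUMPTION: "nearest" = smallest Euclidean distance within the subset.
        centroids.append(min(subset, key=lambda p: math.dist(p, mean)))
    return centroids


def kmeans(data: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means started from the given centroids."""
    centroids = [list(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        new_labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


data = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]]
init = tscore_init(data, k=2)
centroids, labels = kmeans(data, init)
print(init)    # [[1.2, 2.2], [8.0, 8.0]]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because each initial centroid is an actual data point taken from a different slice of the score range, no seed starts far from every observation, which is one plausible reason this style of initialization makes empty clusters less likely than purely random seeding. The toy data above are illustrative only and do not reproduce the paper's experiments.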

Keywords


Clustering Algorithm, K-means Clustering, Centroid Initialization, K Medoid Clustering

This work is licensed under a Creative Commons Attribution 3.0 License.