A Novel Clustering Data based on K-Means

Swapna Sunkara, K. Nageswara Rao, P. Upendar, Shaik. Nagasaidulu

Abstract


In this paper, a new algorithm for clustering symbolic data based on the K-Means algorithm is proposed. This new algorithm allows the data entries and the membership degrees to be intervals. In our approach, we propose dynamic document clustering based on the structured MARDL technique. In this method, each document is weighted by the term frequency and inverse document frequency (TF-IDF) method, document similarity is measured with cosine similarity, and the documents are first separated into clusters using the K-Means method. The largest cluster is then split into two sub-clusters, and this step is repeated until the clusters formed have high internal similarity. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitations by formal analysis and empirical studies.
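The pipeline outlined in the abstract (TF-IDF weighting, cosine similarity, an initial K-Means pass, then repeated splitting of the largest cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothed IDF formula, the deterministic farthest-first seeding, and the stopping rule (stop once the largest cluster's mean cosine similarity to its centroid reaches a hypothetical `min_cohesion` threshold) are all assumptions made for illustration. The interval-valued data and the cluster-ensemble step mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = max(len(tokens), 1)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors, members):
    """Mean vector of the documents whose indices are in `members`."""
    total = Counter()
    for i in members:
        for t, w in vectors[i].items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, members, k, max_iter=20):
    """Group the documents in `members` into at most k clusters by cosine similarity."""
    # Deterministic farthest-first seeding (an assumption, chosen for repeatability).
    seeds = [members[0]]
    while len(seeds) < min(k, len(members)):
        rest = [i for i in members if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s])
                                                 for s in seeds)))
    centers = [vectors[s] for s in seeds]
    assignment = None
    for _ in range(max_iter):
        new = [max(range(len(centers)), key=lambda c: cosine(vectors[i], centers[c]))
               for i in members]
        if new == assignment:
            break  # assignments are stable, so the clustering has converged
        assignment = new
        groups = [[i for i, a in zip(members, assignment) if a == c]
                  for c in range(len(centers))]
        centers = [centroid(vectors, g) if g else centers[c]
                   for c, g in enumerate(groups)]
    return [g for g in groups if g]


def cohesion(vectors, members):
    """Mean cosine similarity of a cluster's documents to its centroid."""
    center = centroid(vectors, members)
    return sum(cosine(vectors[i], center) for i in members) / len(members)


def cluster_documents(docs, k=2, min_cohesion=0.5, max_clusters=10):
    """Initial K-Means pass, then split the largest cluster until it is cohesive."""
    vectors = tfidf(docs)
    clusters = kmeans(vectors, list(range(len(docs))), k)
    while len(clusters) < max_clusters:
        largest = max(clusters, key=len)
        if len(largest) < 2 or cohesion(vectors, largest) >= min_cohesion:
            break
        parts = kmeans(vectors, largest, 2)
        if len(parts) < 2:
            break  # the cluster could not be divided any further
        clusters.remove(largest)
        clusters.extend(parts)
    return clusters
```

Calling `cluster_documents(docs)` on a list of strings returns a list of clusters, each a list of document indices. Only the largest cluster is tested against the threshold at each step, which mirrors the wording of the abstract; a variant that checks every cluster would be an equally plausible reading.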


Keywords


Clustering, K-Means, MARDAL

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.