
A Review for Data Clustering Techniques

Millan K. John, Markus Stumptner

Abstract


Clustering divides data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details but achieves simplification. A good clustering should exhibit two main properties: low inter-class similarity and high intra-class similarity. The goal of this survey is to provide a comprehensive review of clustering techniques in data mining, the process of extracting patterns from data. Modern businesses increasingly see data mining as an important tool for transforming data into an informational advantage, and it is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. This paper gives an overview of clustering algorithms used on large data sets. In addition, it describes the efficiency of the Self-Organizing Map (SOM) algorithm in enhancing mixed data clustering.
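
The two properties named above can be checked numerically on a toy example. The following is a minimal sketch and not a method taken from the paper: the similarity measure (1 / (1 + Euclidean distance)), the function names, and the sample points are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 for identical points, falling with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(a, b))


def mean_intra_similarity(clusters):
    """Average similarity over all pairs of points that share a cluster."""
    pairs = [(a, b) for cluster in clusters for a, b in combinations(cluster, 2)]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def mean_inter_similarity(clusters):
    """Average similarity over all pairs of points drawn from different clusters."""
    pairs = [
        (a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: two compact, well-separated groups of 2-D points.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

intra = mean_intra_similarity(clusters)
inter = mean_inter_similarity(clusters)
print(f"mean intra-class similarity: {intra:.3f}")
print(f"mean inter-class similarity: {inter:.3f}")
```

For a grouping that satisfies both properties, the intra-class value should come out clearly higher than the inter-class value; a grouping that mixes the two sets of points would narrow or reverse that gap.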

Keywords


Data Clustering, Data Mining, Mixed Data Clustering, Self-Organizing Map Algorithm.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.