An Enhanced Projected Clustering Algorithm for High Dimensional Space

B. Shanmugapriya, M. Punithavalli, G. Selvavinayagam

Abstract


Clustering is a data mining technique for identifying groups in a data set based on a similarity measure. Clustering high-dimensional data is a major challenge because the points are inherently sparse. Most existing clustering algorithms become substantially inefficient when the similarity measure is computed between data points in the full-dimensional space. Several projected clustering algorithms have been proposed to address this problem. This paper presents a robust partitional, distance-based projected clustering algorithm built on K-means, in which the distance computation is restricted to subsets of attributes where object values are dense. The algorithm can detect projected clusters of low dimensionality embedded in a high-dimensional space, and it avoids computing distances in the full-dimensional space. The algorithm is demonstrated on synthetic and real data sets.
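The idea of restricting the K-means distance to a per-cluster subset of attributes can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how dense attributes are detected, so the sketch assumes a simple stand-in (keep the attributes with the lowest within-cluster variance) and a fixed number of attributes per cluster. All names here (`projected_distance`, `select_dims`, `projected_kmeans`, `n_dims`) are hypothetical.

```python
from statistics import pvariance


def projected_distance(point, center, dims):
    """Mean squared difference over the selected attributes only.

    Dividing by len(dims) keeps distances comparable between clusters
    whose attribute subsets differ.
    """
    return sum((point[d] - center[d]) ** 2 for d in dims) / len(dims)


def select_dims(members, n_dims):
    """Assumed stand-in for dense-attribute detection: keep the n_dims
    attributes on which the cluster members have the lowest variance."""
    n_attrs = len(members[0])
    spread = [pvariance([p[d] for p in members]) for d in range(n_attrs)]
    tightest = sorted(range(n_attrs), key=lambda d: spread[d])[:n_dims]
    return sorted(tightest)


def projected_kmeans(points, initial_centers, n_dims, n_iter=10):
    """K-means-style loop in which each cluster has its own attribute subset."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    k = len(centers)
    n_attrs = len(points[0])
    dims = [list(range(n_attrs)) for _ in range(k)]  # start with all attributes
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center, measured in that cluster's subspace.
        labels = [
            min(range(k), key=lambda c: projected_distance(p, centers[c], dims[c]))
            for p in points
        ]
        # Update step: recompute each center and its attribute subset.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the old center if a cluster becomes empty
            centers[c] = [sum(p[d] for p in members) / len(members)
                          for d in range(n_attrs)]
            dims[c] = select_dims(members, n_dims)
    return labels, dims


# Toy data: the first four points are tight on attributes 0 and 1,
# the last four are tight on attributes 2 and 3.
points = [
    [1.0, 1.0, 0.0, 9.0], [1.1, 0.9, 5.0, 2.0],
    [0.9, 1.1, 9.0, 5.0], [1.0, 1.0, 3.0, 0.0],
    [0.0, 9.0, 20.0, 20.0], [5.0, 2.0, 20.1, 19.9],
    [9.0, 5.0, 19.9, 20.1], [3.0, 0.0, 20.0, 20.0],
]
labels, dims = projected_kmeans(points, [points[0], points[4]], n_dims=2)
print(labels, dims)
```

On this toy input each cluster settles on the two attributes where its members are tightly packed, so later assignment steps never compute a distance over all four attributes. A real implementation would also need a principled way to choose the number of attributes per cluster and to handle outliers, neither of which this sketch attempts.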

Keywords


Clustering, High Dimensional Data, Projected Cluster, K-Means Clustering, Subspace Clustering

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.