CAK-NN Algorithm: Cluster and Attribute Weightage-Based Algorithm for Effective Classification

Parvinder S. Sandhu; Dalvinder S. Dhaliwal; S.N. Panda

Abstract

The task of classification is to assign a new object to one of a given set of classes based on the object's attribute values. The k-Nearest Neighbor (k-NN) method is one of the simplest classification methods used in data mining and machine learning. Although k-NN can be applied broadly, it has a few inherent problems, which is why researchers have proposed various extensions of k-NN, as well as ensemble formulations of k-NN classifiers. In our proposed CAk-NN (cluster and attribute weighted k-NN) algorithm, a weight is assigned to every attribute of the training dataset so that distances can be matched more accurately. In addition, clustering the training dataset reduces the execution time needed for classification, and the resulting clusters are used to classify test instances. For this purpose, we propose an attribute weighted k-means clustering algorithm that is used to partition the training dataset. The centroids of the resulting clusters then form a sub-sample of the input database, which is used for classification. For a test instance, an attribute weighted distance measure is computed between the instance and the mean of each cluster of the training dataset. According to the computed distances, the k nearest clusters are identified, and the class label is assigned if all of these clusters belong to the same class. Otherwise, the relevant data records from the k nearest clusters are retrieved and the k nearest neighbor data records among them are identified. Finally, the performance of the proposed CAk-NN algorithm is compared with that of the k-NN algorithm in terms of computation time and classification accuracy on the Iris dataset.
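The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it relies on several assumptions the abstract leaves open: the attribute weights are supplied by the caller, the weighted distance is a weighted Euclidean distance, k-means is seeded with evenly spaced training records, each cluster takes the majority class of its members, and the fallback step uses a majority vote over the k nearest records. All function names are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed form of the attribute weighted measure)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def weighted_kmeans(points, weights, n_clusters, n_iter=20):
    """k-means with the weighted distance; returns (centroids, assignment)."""
    # Assumption: deterministic seeding with evenly spaced training records.
    step = max(1, len(points) // n_clusters)
    centroids = [list(points[i * step]) for i in range(n_clusters)]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign every record to its nearest centroid.
        assignment = [
            min(range(n_clusters),
                key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def fit(points, labels, weights, n_clusters):
    """Cluster the training data and attach a class label to every cluster."""
    centroids, assignment = weighted_kmeans(points, weights, n_clusters)
    clusters = []
    for c, centroid in enumerate(centroids):
        members = [(p, y) for p, y, a in zip(points, labels, assignment) if a == c]
        if members:
            clusters.append({
                "centroid": centroid,
                "members": members,
                # Assumption: a cluster's class is the majority class of its members.
                "label": majority([y for _, y in members]),
            })
    return clusters


def classify(x, clusters, weights, k):
    """Classify x from the k nearest clusters, falling back to their records."""
    nearest = sorted(
        clusters, key=lambda c: weighted_distance(x, c["centroid"], weights)
    )[:k]
    cluster_labels = {c["label"] for c in nearest}
    if len(cluster_labels) == 1:
        # All k nearest clusters agree, so no record-level search is needed.
        return cluster_labels.pop()
    # Otherwise search only the records held by the k nearest clusters.
    records = [m for c in nearest for m in c["members"]]
    records.sort(key=lambda m: weighted_distance(x, m[0], weights))
    # Assumption: majority vote over the k nearest records.
    return majority([y for _, y in records[:k]])
```

In this sketch the saving in computation time comes from the first branch of `classify`: when the nearest clusters agree, only the centroids are compared with the test instance, and even in the fallback branch only the records of k clusters are searched rather than the whole training set.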

Keywords

Classification, Clustering, K-Nearest Neighbor Algorithm, K-Means Clustering Algorithm, Distance Measure, CAK-NN (Cluster And Attribute Weighted K-NN Algorithm)

This work is licensed under a Creative Commons Attribution 3.0 License.
