
A New Methodology to Overcome High Dimensionality Problem in Data Mining

V. Arul Kumar, L. Arockiam

Abstract


Classification is one of the important techniques in data mining. Features play a vital role in the classification task, so selecting the relevant features is essential. Although many feature selection algorithms are available, research is still being carried out to improve classification accuracy. In this paper, a new methodology is proposed that uses three different feature selection algorithms to improve classification accuracy by selecting the relevant features.
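The general idea of a filter approach followed by classification can be illustrated with a minimal sketch. The abstract does not name the three feature selection algorithms, so the example below assumes information gain as the filter score and a k-NN classifier with Hamming distance on discrete features. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Filter score: drop in label entropy after splitting on one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_features(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    scores = [
        (information_gain([r[j] for r in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(j for _, j in top)


def knn_predict(train_rows, train_labels, query, features, k=3):
    """Majority vote among the k nearest rows, using Hamming distance
    restricted to the selected feature indices."""
    def dist(a, b):
        return sum(a[j] != b[j] for j in features)

    order = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed): feature 0 determines the class, feature 1 is noise,
# feature 2 is weakly related to the class.
rows = [
    (1, 0, 1), (1, 1, 1), (1, 0, 0), (1, 1, 1),
    (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0),
]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

selected = select_top_features(rows, labels, k=2)
prediction = knn_predict(rows, labels, query=(1, 1, 0), features=selected, k=3)
print(selected, prediction)
```

Because the filter step scores each feature independently of the classifier, the same selected subset could be passed to a Naïve Bayesian or decision tree classifier instead of k-NN.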


Keywords


Data Mining, Feature Selection, Filter Approach, k-NN Algorithm, Naïve Bayesian Algorithm, J48 Algorithm


