
Feature Selection: A New Perspective

S. Charles, Dr. L. Arockiam

Abstract


Feature selection is one of the tasks in the data mining process. It finds an optimal feature subset using machine learning and evaluation criteria. Several techniques are used to find the optimal features in artificial and text databases. In this paper, the machine learning based methods are classified as unsupervised, semi-supervised and supervised learning methods, which are used to find relevant features in text, web and gene databases. The evaluation criteria based methods are categorized into Filter, Wrapper and Hybrid approaches, which are employed to discover the optimal feature set in artificial datasets. These approaches are very useful in the data mining process for improving prediction performance, reducing cost and improving the understanding of the features. These issues are addressed by various techniques using dependent criteria (which evaluate a feature subset with a predetermined mining algorithm) and independent criteria (which evaluate it from intrinsic characteristics of the data). This survey explores the various feature selection processes and what distinguishes each of them in finding the optimal feature subset in terms of accuracy, robustness and efficiency.
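The difference between an independent criterion (filter), a dependent criterion (wrapper) and their combination (hybrid) can be illustrated with a minimal Python sketch. It is not the authors' method: the use of absolute Pearson correlation as the filter measure, leave-one-out 1-nearest-neighbour accuracy as the wrapper's learner, sequential forward selection as the subset generation strategy, and all function names (`filter_rank`, `loo_1nn_accuracy`, `wrapper_forward`, `hybrid`) are assumptions chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]  # rows are samples, columns are features


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def filter_rank(X: Matrix, y: Sequence[float]) -> List[int]:
    """Filter approach (independent criterion, assumed here to be |Pearson r|):
    score each feature on its own against the class label; no learner is used."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def loo_1nn_accuracy(X: Matrix, y: Sequence[float], subset: List[int]) -> float:
    """Dependent criterion (assumed learner): leave-one-out accuracy of a
    1-nearest-neighbour classifier that only sees the features in `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave sample i out
            d = sum((xi[j] - xk[j]) ** 2 for j in subset)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


Evaluator = Callable[[Matrix, Sequence[float], List[int]], float]


def wrapper_forward(X: Matrix, y: Sequence[float], candidates: Sequence[int],
                    evaluate: Evaluator = loo_1nn_accuracy) -> Tuple[List[int], float]:
    """Wrapper approach via sequential forward selection.
    Generation: try adding each remaining candidate to the current subset.
    Evaluation: score each enlarged subset with the learner-dependent `evaluate`.
    Stopping criterion: stop as soon as no addition improves the best score."""
    selected: List[int] = []
    best = 0.0
    remaining = list(candidates)
    while remaining:
        score, j = max((evaluate(X, y, selected + [c]), c) for c in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


def hybrid(X: Matrix, y: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Hybrid approach: the cheap filter shortlists the top-k features, and the
    costlier wrapper searches only within that shortlist."""
    return wrapper_forward(X, y, filter_rank(X, y)[:k])
```

For example, on six samples where feature 0 separates the two classes, feature 1 is noise and feature 2 is constant, `filter_rank` orders the features as [0, 1, 2], and both `wrapper_forward` over all three features and `hybrid` with k=2 return ([0], 1.0). The hybrid design pays for the expensive learner-based evaluation only on the filter's shortlist, which is why it scales better than a pure wrapper as the number of features grows.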


Keywords


Feature Subset Generation, Feature Subset Evaluation, Stopping Criteria, Feature Subset Validation, Dependent Criterion, Independent Criterion, Unsupervised Feature Selection, Semi-Supervised Feature Selection, Supervised Feature Selection, Filter Approach






This work is licensed under a Creative Commons Attribution 3.0 License.