
Fuzzy Possibilistic C-Mean Clustering Algorithm for Text Categorization

R. Karthika, S. Revathi

Abstract


The main aim of text categorization is to classify documents into a fixed number of predefined categories. In text categorization, the feature vector is usually high-dimensional, and various approaches have been proposed to reduce its dimensionality during automatic text categorization. Feature clustering is a powerful method for reducing the dimensionality of feature vectors for text classification. This work presents a Fuzzy Possibilistic C-Mean algorithm that reduces the dimensionality of the feature vector. The system then performs automatic categorization of text and hypertext documents using a Support Vector Machine (SVM) classifier. Experimental results show that the proposed method improves performance by reducing the number of dimensions required to obtain the cluster centers.
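The abstract does not state the update rules, so the following is only a hedged illustration of the idea. It implements, in plain Python, the possibilistic fuzzy c-means (PFCM) updates published by Pal, Pal, Keller and Bezdek (2005); the paper's own Fuzzy Possibilistic C-Mean formulation may differ in its details. The memberships are then used for feature clustering: each word is represented by a vector (assumed here to be its distribution over the predefined categories), the words are clustered, and each document's term weights are collapsed into one membership-weighted sum per cluster. The helper names (`pfcm`, `reduce_documents`, `farthest_first`), the defaults (m = eta = 2, a = b = 1), the farthest-first initialisation and the word representation are all assumptions of this sketch, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, c):
    """Deterministic initialisation (an assumption of this sketch): start from
    points[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(far))
    return centers


def memberships(points, centers, m, eps=1e-12):
    """Fuzzy membership u_ik: relative closeness of point k to center i (sums to 1 over i)."""
    U = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        d2 = [sq_dist(x, v) + eps for v in centers]
        for i in range(len(centers)):
            U[i][k] = 1.0 / sum((d2[i] / dj) ** (1.0 / (m - 1)) for dj in d2)
    return U


def typicalities(points, centers, gamma, b, eta):
    """Possibilistic typicality t_ik in (0, 1]; not constrained to sum to 1."""
    return [[1.0 / (1.0 + (b * sq_dist(x, v) / g) ** (1.0 / (eta - 1))) for x in points]
            for v, g in zip(centers, gamma)]


def weighted_centers(points, W):
    """Center i is the mean of all points weighted by row W[i]."""
    dim = len(points[0])
    return [[sum(w * p[d] for w, p in zip(row, points)) / sum(row) for d in range(dim)]
            for row in W]


def pfcm(points, c, m=2.0, eta=2.0, a=1.0, b=1.0, iters=50):
    """PFCM after Pal, Pal, Keller & Bezdek (2005). Returns (centers, U, T)."""
    centers = farthest_first(points, c)
    # Phase 1: plain fuzzy c-means, used to place centers and estimate gamma_i.
    for _ in range(iters):
        U = memberships(points, centers, m)
        centers = weighted_centers(points, [[u ** m for u in row] for row in U])
    U = memberships(points, centers, m)
    gamma = []
    for row, v in zip(U, centers):
        um = [u ** m for u in row]
        gamma.append(sum(w * sq_dist(x, v) for w, x in zip(um, points)) / sum(um) + 1e-12)
    # Phase 2: PFCM updates; each center is weighted by a*u^m + b*t^eta.
    for _ in range(iters):
        U = memberships(points, centers, m)
        T = typicalities(points, centers, gamma, b, eta)
        W = [[a * u ** m + b * t ** eta for u, t in zip(ur, tr)] for ur, tr in zip(U, T)]
        centers = weighted_centers(points, W)
    return centers, memberships(points, centers, m), typicalities(points, centers, gamma, b, eta)


def reduce_documents(doc_term, word_vectors, c, **kwargs):
    """Hypothetical feature-clustering step: cluster the word vectors, then
    replace each document's term weights by c membership-weighted sums."""
    _, U, _ = pfcm(word_vectors, c, **kwargs)
    return [[sum(U[i][w] * doc[w] for w in range(len(doc))) for i in range(c)]
            for doc in doc_term]
```

In this sketch a document with thousands of term weights becomes a vector of only `c` values, one per word cluster. Those reduced vectors would then be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.LinearSVC`); that step is not shown here.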


Keywords


Text Categorization, Support Vector Machine, Fuzzy C-Mean, Fuzzy Possibilistic C-Mean.


This work is licensed under a Creative Commons Attribution 3.0 License.