To Improve the Classifier Accuracy on the Text Categorization Using Soft Computing Technique

Pragya Tiwari, Illyas Khan

Abstract


Text categorization is a classic classification problem applied to the textual domain: assigning text content to predefined categories. Automatic classification schemes can greatly facilitate this process. Categorizing documents is challenging because the number of discriminating words can be very large. Traditional text categorization methods such as KNN have the drawback that computing similarities is very time-consuming. This paper proposes a neural network approach that combines a back-propagation network with the SOM algorithm. The objective is to reduce the time and effort a user must spend to find the information sought. Keywords and phrases increase the effectiveness and efficiency of the search process. In the proposed approach, latent semantic indexing with SOM can be used to enhance the association between terms. A brief review of existing document clustering techniques is given. The proposed method is expected to be efficient in terms of computational cost, accuracy and visualization, and can be easily adapted to large data sets.
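
As a rough illustration of how a SOM can group documents, the sketch below trains a small one-dimensional map on normalised term-count vectors and reports the best-matching unit for each document. It is not the authors' implementation: the toy documents, the 1-D map layout, the linear decay schedules, and every name and parameter (`bag_of_words`, `train_som`, `n_units`, `lr0`, `radius0`) are assumptions made for this example, and the latent semantic indexing and back-propagation stages mentioned in the abstract are left out.

```python
import math
import random


def bag_of_words(docs):
    """Turn documents into L2-normalised term-count vectors over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((wj - xj) ** 2 for wj, xj in zip(weights[j], x)),
    )


def train_som(vectors, n_units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM; learning rate and neighbourhood radius decay linearly (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Gaussian neighbourhood measured along the 1-D grid of units
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights


# Toy corpus (hypothetical): two finance-like and two sports-like documents.
docs = [
    "stock market shares trading",
    "market shares investors trading",
    "football match goal team",
    "team goal league match",
]
vocab, vectors = bag_of_words(docs)
weights = train_som(vectors)
units = [best_matching_unit(weights, v) for v in vectors]
print(units)  # one map-unit index per document
```

Once the map is trained, a new document only needs to be compared against the `n_units` weight vectors rather than against every training document, which is the kind of saving over KNN's similarity computation that the abstract alludes to. A supervised classifier could then be trained on the map-unit assignments or on the weight vectors.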

Keywords


KNN, ANN, Back Propagation, SOM
