Document Clustering Using Firefly Algorithm

Seerat Preet Kaur, Neena Madan

Abstract


Document clustering is an important technique that is widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them depends on the initial value chosen for k, the number of clusters. Such an approach may not be suitable when there is no prior knowledge of the document collection. Various swarm-based clustering techniques have been proposed to address this problem, and this paper explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocate mechanism that reassigns already-assigned documents when necessary. The proposed clustering algorithm, known as GFA_R, is tested on a benchmark dataset obtained from 20Newsgroups. Experimental results for GFA_R on external and relative quality metrics are compared against those obtained using the standard GFA. The results indicate that extending GFA to GFA_R yields better-quality clustering.
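
The abstract does not specify how the relocate mechanism decides which documents to move, so the following is only a minimal sketch of one plausible reading: after the initial assignment, each document is compared against every cluster centroid and moved if another centroid is strictly more similar. The function names `cosine` and `relocate`, the use of cosine similarity, and the toy vectors are assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relocate(docs, centroids, labels):
    """Hypothetical relocate step: move a document only if some other
    centroid is strictly more similar than its current one."""
    new_labels = list(labels)
    for i, doc in enumerate(docs):
        best = labels[i]
        best_sim = cosine(doc, centroids[best])
        for c, centroid in enumerate(centroids):
            sim = cosine(doc, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        new_labels[i] = best
    return new_labels

# Toy term-weight vectors (assumed); the second document starts in the wrong cluster.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1, 1]
new_labels = relocate(docs, centroids, labels)
print(new_labels)  # [0, 0, 1]
```

In this sketch the centroids stay fixed during the pass; a fuller version would presumably recompute them after documents move and repeat until no label changes.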


Keywords


Clustering Process, Data Mining, Document Clustering, Firefly Algorithm, Gravitational Firefly Algorithm
