
Comparative Analysis of Optimization Algorithms for Document Clustering

K. Karpagam, A. Saradha

Abstract


Document clustering, also called text clustering, is an unsupervised technique for grouping documents that share the same context. Document clustering algorithms are widely used in web search engines to produce results relevant to a query. The volume of information on websites is growing rapidly, which makes managing it and retrieving the required, up-to-date information a tedious task. It is also necessary to obtain the exact information the user requires from the documents. Recently, optimization algorithms have been introduced and applied to clustering algorithms. The Genetic Algorithm and Cuckoo Search are meta-heuristic optimization algorithms used to search for optimum solutions. In this paper, Genetic Algorithm based and Cuckoo Search based variants of the Domain-specific Keyword Similarity based Knowledgebase Creation algorithm are proposed to optimize document clustering for a question answering system. Experiments were conducted on benchmark datasets, and performance was analyzed in terms of Precision, Recall, F1, Miss rate, Fallout and Purity.
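The six evaluation measures named above can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the standard textbook definitions based on true/false positive and negative counts, plus majority-class purity for clusters. The function names `retrieval_metrics` and `purity` and the sample counts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def retrieval_metrics(tp, fp, fn, tn):
    """Standard retrieval measures from confusion counts (assumed definitions)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Miss rate is the fraction of relevant items that were not retrieved.
    miss_rate = fn / (tp + fn) if (tp + fn) else 0.0
    # Fallout is the fraction of non-relevant items that were retrieved.
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miss_rate": miss_rate,
        "fallout": fallout,
    }


def purity(cluster_labels, true_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    per_cluster = {}
    for cluster, label in zip(cluster_labels, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical counts and labels, for illustration only.
scores = retrieval_metrics(tp=8, fp=2, fn=2, tn=8)
cluster_purity = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"])
```

Under these definitions, miss rate is the complement of recall, so a system that reports one implicitly reports the other.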


Keywords


Cuckoo Search, Document Clustering, Genetic Algorithm, Information Processing Knowledge Base, Text Mining.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.