WordNet Based Concept Weight using Semantic Relation for Clustering Documents

Apeksha Charola; Sahista Machchhar

Abstract

This paper presents a novel technique that combines conventional clustering algorithms with information extracted from WordNet. Traditional clustering algorithms are applied to document clustering in two ways. The first approach treats each document as a bag of words and considers every term independent, ignoring semantic relationships between words. The second approach determines semantics using WordNet. The proposed technique follows the second approach and uses several semantic relations (identity, synonym, direct hypernym, and meronym), weighted in the order identity > synonym > direct hypernym > meronym. Concepts are weighted by generating chains of related concepts. WordNet is thereby used to create a low-dimensional vector space, which allows an efficient clustering technique to be built. The proposed technique can improve cluster quality and yields a lower-dimensional vector space than other techniques.
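The weighting idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the numeric relation weights, the toy lexicon standing in for WordNet, and the function names `relation`, `concept_weights`, and `top_concepts` are all assumptions made for illustration. Only the ordering identity > synonym > direct hypernym > meronym comes from the abstract.

```python
# Hypothetical numeric weights; only their ordering is taken from the abstract.
RELATION_WEIGHTS = {
    "identity": 1.0,
    "synonym": 0.8,
    "hypernym": 0.5,
    "meronym": 0.3,
}

# Toy stand-in for WordNet (assumption): term -> {relation: related terms}.
TOY_LEXICON = {
    "car": {"synonym": {"automobile"}, "hypernym": {"vehicle"}, "meronym": {"wheel"}},
    "automobile": {"synonym": {"car"}, "hypernym": {"vehicle"}, "meronym": {"wheel"}},
    "wheel": {},
    "bank": {},
}


def relation(a, b):
    """Return the strongest relation linking terms a and b, or None."""
    if a == b:
        return "identity"
    for rel in ("synonym", "hypernym", "meronym"):
        a_links = TOY_LEXICON.get(a, {}).get(rel, set())
        b_links = TOY_LEXICON.get(b, {}).get(rel, set())
        # Treat each relation as symmetric for chaining purposes.
        if b in a_links or a in b_links:
            return rel
    return None


def concept_weights(tokens):
    """Weight each distinct term by its chain of related terms in the document."""
    weights = {}
    for concept in set(tokens):
        total = 0.0
        for token in tokens:
            rel = relation(concept, token)
            if rel is not None:
                total += RELATION_WEIGHTS[rel]
        weights[concept] = total
    return weights


def top_concepts(tokens, k):
    """Keep only the k heaviest concepts, giving a lower-dimensional vector."""
    weights = concept_weights(tokens)
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


if __name__ == "__main__":
    doc = ["car", "automobile", "wheel", "bank"]
    print(concept_weights(doc))
    print(top_concepts(doc, 2))
```

In this sketch a term that is supported by many related terms (here `car` and `automobile`) receives a higher weight than an isolated term such as `bank`, and keeping only the heaviest concepts shrinks the vector that a clustering algorithm such as K-means would then operate on.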

Keywords

Document Clustering, K-means Algorithm, WordNet, Concept Weighting, Synonym, Hypernym, Meronym.




This work is licensed under a Creative Commons Attribution 3.0 License.
