
Semi-Automatic Domain Ontology Construction for Tamil Documents

M. S. Girija, T. Mala, T. V. Geetha

Abstract


An ontology is an explicit specification of a conceptualization: a description of the concepts and relationships that can exist for an agent or a community of agents. Ontology construction is a challenging task, and this paper presents a new technique for the semi-automatic construction of a domain ontology. It involves two modules: ontological word selection and semantic relationship extraction. Ontological nodes and semantically related words are selected from a Tamil text corpus. The input to the system is a set of Tamil text documents. Each document is word-segmented and then morphologically analyzed to determine parts of speech, because ontological words are assumed to be nouns. The noun list is then narrowed using the TF-IDF technique. Semantically related words are identified based on the notion of serial clustering of words in text, using such clustering as an indicator of whether a word bears content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content bearing within one collection but not in another. In this way, a domain ontology is constructed semi-automatically for Tamil text documents.
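The TF-IDF narrowing step can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or cutoff the authors use, so the weighting below (relative term frequency times the natural log of inverse document frequency), the `threshold` parameter, and the function names are all assumptions. The input is assumed to be one list of nouns per document, as a morphological analyzer might produce; the transliterated Tamil words are illustrative only.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {noun: tf-idf score} dict per document.

    docs: list of noun lists, one list per document (assumed to come
    from a prior segmentation and morphological analysis step).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each noun.
    df = Counter()
    for nouns in docs:
        df.update(set(nouns))

    scores = []
    for nouns in docs:
        tf = Counter(nouns)
        total = len(nouns)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def select_candidates(docs, threshold):
    """Keep nouns whose best score in any document exceeds threshold.

    The threshold is a hypothetical tuning parameter, not a value
    taken from the paper.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for word, score in doc_scores.items():
            best[word] = max(best.get(word, 0.0), score)
    return sorted(word for word, score in best.items() if score > threshold)


# Illustrative noun lists: "nel" (paddy) occurs in every document,
# so its inverse document frequency, and hence its score, is zero.
docs = [
    ["nel", "vayal", "nel", "ulavar"],
    ["nel", "kadal", "meen"],
    ["nel", "meen", "padagu"],
]
candidates = select_candidates(docs, threshold=0.2)
```

With these inputs, a noun that appears in every document is discarded, while nouns concentrated in a single document are kept as candidate ontology nodes. Because the scores depend on the collection, the same noun can be kept in one corpus and dropped in another.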


Keywords


Ontology, Semi-automatic Ontology, Semantic Relationship Extraction, Content Bearing Words, TF-IDF, Morphological Analysis, Clustering.


