
Implementation for Illustrative Sentences for English Multiword Expressions

Tarun Dhar Diwan, Anand Parihar

Abstract


Recognizing multiword expressions (MWEs) is a fundamental task in natural language processing (NLP). It can potentially be approached in a supervised way, i.e., by using a manually annotated training corpus to learn the characteristics (features) of multiword expressions, both in their structure and in their contextual environment. This knowledge is then used to locate multiword expressions that occur in another annotated text. Symbolic and statistical methods have both been used in NLP for some time. Multiword expressions are a key problem for the development of large-scale, linguistically sound NLP technology. We propose a method to search for illustrative sentences for English MWEs in a research paper database. We focus on syntactically flexible expressions such as “a lot of-work.” Traditionally, illustrative sentences containing such expressions have been found by limiting the maximum number of words allowed between the component words of the MWE. However, this method cannot collect enough illustrative sentences in which clauses are inserted between the component words. We therefore devised a measure that calculates the distance between the component words of an MWE in a parse tree, and we use it for flexible expression search. We conducted experiments with this measure.
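The contrast between the word-count limit and a parse-tree distance can be illustrated with a minimal sketch. The abstract does not specify the parse formalism or the exact measure, so the following rests on assumptions: the parse is a dependency tree given as a list of head indices (-1 for the root), and the distance is the number of edges on the path between two words. The sentence, the head indices, and the function names are hypothetical and hand-written for illustration; they are not taken from the paper or from a real parser.

```python
# Toy sentence with the MWE "take ... into account" split by a relative clause.
# The head indices below are hand-written assumptions, not real parser output.
tokens = ["took", "the", "advice", "that", "she", "gave", "into", "account"]
heads = [-1, 2, 0, 5, 5, 2, 7, 0]  # heads[i] is the index of token i's head


def path_to_root(heads, i):
    """Return the list of nodes from token i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def tree_distance(heads, i, j):
    """Count the edges on the tree path between tokens i and j."""
    depth_from_i = {node: d for d, node in enumerate(path_to_root(heads, i))}
    for depth_from_j, node in enumerate(path_to_root(heads, j)):
        if node in depth_from_i:  # first shared ancestor on j's path
            return depth_from_i[node] + depth_from_j
    raise ValueError("tokens are not in the same tree")


def linear_gap(i, j):
    """Count the words that lie strictly between positions i and j."""
    return abs(i - j) - 1


print(linear_gap(0, 6))            # 5 words between "took" and "into"
print(tree_distance(heads, 0, 6))  # 2 edges: into -> account -> took
```

In this toy case a word-count limit below 5 would reject the sentence, while the tree distance between the component words stays small despite the inserted clause.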

Keywords


Automated Testing, Illustrative Sentences, Component Words, Multiword Expressions, Contextual Environment.


