
Hybrid Approach for Handling OOV Words

Jasleen Kaur

Abstract


Transliteration is an important area of natural language processing. Accurate transliteration of named entities plays an important role in the performance of machine translation (MT), cross-language information retrieval (CLIR), question answering (QA), and bilingual lexicon construction. Handling out-of-vocabulary (OOV) words is crucial in CLIR and MT, especially when the source and target languages do not use the same script. This paper addresses transliteration from Roman script to Gurmukhi script. A statistical approach guided by rules is used to transliterate from English to Punjabi with MOSES, a statistical machine translation tool. The overall TAR after applying the observed rules is 74.18%.
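The abstract reports TAR without defining it. The sketch below assumes TAR stands for transliteration accuracy rate, computed as the percentage of test words whose system output exactly matches the reference transliteration. The function name and the sample word lists are hypothetical and are not taken from the paper.

```python
def transliteration_accuracy_rate(predicted, reference):
    """Return the percentage of words transliterated exactly as in the reference.

    Assumed definition: TAR = (exact matches / total words) * 100.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # Count positions where the system output equals the reference string.
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return 100.0 * correct / len(reference)


# Hypothetical system outputs and reference transliterations (illustrative only).
predicted = ["ਰਾਮ", "ਸੀਤਾ", "ਦਿੱਲੀ", "ਪੰਜਾਬ"]
reference = ["ਰਾਮ", "ਸੀਤਾ", "ਦਿਲੀ", "ਪੰਜਾਬ"]

tar = transliteration_accuracy_rate(predicted, reference)
print(f"TAR = {tar:.2f}%")
```

Exact string matching is the strictest choice; an evaluation that accepts several valid spellings per word would need a set of references for each entry instead of a single string.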

Keywords


Machine Transliteration, Statistical Approach, MOSES, N-Gram



