
Development of Bilingual Application Using Machine Transliteration: A Practical Case Study

M.L. Dhore, Dr. S. K. Dixit

Abstract


This paper focuses on a transliteration approach to customizable localization support in small-scale systems. Marathi, an Indian language written in the Devanagari script, is the language considered; localization support is provided through machine transliteration and a translation memory, using a phonemic, pure-consonant approach. Marathi is one of the most widely spoken languages in India, especially in the state of Maharashtra. This work lets users enter and retrieve data in Marathi on the fly, while the data is stored in the database in the default language, English. Users can interact with the system in Marathi as well as in English. When the user works in Marathi, the designed middleware performs the transliteration: it reads data from the database, transliterates it into Marathi in the Devanagari script, and displays it to the user. Transliteration from English to Devanagari and vice versa is carried out with the help of the translation memory. This method avoids both the need for extra space on the web server and added complexity in the web pages. The approach provides a safe and cost-effective method of localizing existing and new web pages stored on a web server.
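The middleware idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the translation-memory entries, the small phoneme tables, and the function names (`to_marathi`, `to_english`, `transliterate_rule_based`) are all assumptions made for illustration. The sketch assumes that a word is first looked up in the translation memory and, if absent, is transliterated by a simplified pure-consonant rule: every consonant is emitted with a halant, the halant is replaced by a vowel sign when a vowel follows, and a word-final halant is dropped.

```python
# Illustrative sketch only; tables and names are hypothetical, not from the paper.
HALANT = "\u094D"  # Devanagari virama, marks a "pure" consonant

# Tiny, deliberately incomplete phoneme tables (assumed for this example).
CONSONANTS = {
    "k": "\u0915", "g": "\u0917", "n": "\u0928", "p": "\u092A",
    "m": "\u092E", "r": "\u0930", "l": "\u0932", "s": "\u0938",
}
# Each vowel maps to (independent form, dependent vowel sign).
VOWELS = {
    "a": ("\u0905", ""), "aa": ("\u0906", "\u093E"),
    "i": ("\u0907", "\u093F"), "u": ("\u0909", "\u0941"),
    "e": ("\u090F", "\u0947"), "o": ("\u0913", "\u094B"),
}

# Hypothetical translation memory: stored English form -> Marathi form.
TRANSLATION_MEMORY = {
    "pune": "\u092A\u0941\u0923\u0947",
    "maharashtra": "\u092E\u0939\u093E\u0930\u093E\u0937\u094D\u091F\u094D\u0930",
}
REVERSE_MEMORY = {mr: en for en, mr in TRANSLATION_MEMORY.items()}


def transliterate_rule_based(word):
    """Fallback: simplified pure-consonant transliteration of one word."""
    word = word.lower()
    out = []
    pure = False  # True while the last emitted unit is consonant + halant
    i = 0
    while i < len(word):
        for length in (2, 1):  # prefer the longest matching unit
            chunk = word[i:i + length]
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                if pure:
                    out.pop()          # drop the halant
                    out.append(sign)   # attach the vowel sign ("" for inherent a)
                else:
                    out.append(independent)
                pure = False
                break
            if chunk in CONSONANTS:
                out.extend([CONSONANTS[chunk], HALANT])
                pure = True
                break
        else:
            chunk = word[i]            # unknown character: copy unchanged
            out.append(chunk)
            pure = False
        i += len(chunk)
    if pure:
        out.pop()                      # drop the word-final halant
    return "".join(out)


def to_marathi(text):
    """Middleware step for display: English from the database -> Devanagari."""
    return " ".join(
        TRANSLATION_MEMORY.get(w.lower()) or transliterate_rule_based(w)
        for w in text.split()
    )


def to_english(text):
    """Middleware step for input: Devanagari -> stored English form.
    Words missing from the memory are returned unchanged in this sketch."""
    return " ".join(REVERSE_MEMORY.get(w, w) for w in text.split())


if __name__ == "__main__":
    print(to_marathi("Pune nagar"))
    print(to_english(TRANSLATION_MEMORY["pune"]))
```

In this sketch the database never stores Devanagari text; conversion happens only at the boundary between the user and the stored English data, which is the property the abstract attributes to the middleware.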

 


Keywords


Localization, Machine Transliteration, Phonemics, Translation Memory


