Knowledge of Word Alignment Position in English-Hindi Sentences

Shweta Dubey, Tarun Dhar Diwan

Abstract


This paper describes a methodology for representing knowledge of word alignment positions in parallel English-Hindi sentences. The methodology is the basis for developing a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. The proposed system is used to align English and Hindi sentences, and the methodology can also be applied to other languages. Large parallel corpora for the English-Hindi language pair are not readily available. The development is based on two strategies to address this problem. The first is normalization of tagged English sentences and Hindi sentences. The second is mapping English-Hindi sentences using the parallel English-Hindi word dictionary. The proposed system is therefore intended to support the development of English-Hindi parallel sentences.


Keywords


Multi-Word Expressions, Mapping Score, Tagging, Local Word Grouping, Word Mapping, Normalization, Part-of-Speech (POS) Tagging, Word Dictionary
