
Development of English-Hindi Parallel Corpus using Sentence Alignments

Tarun Dhar Diwan, Shweta Dubey

Abstract


In this survey paper, we address the problem of “development of Hindi-Punjabi parallel corpus using existing English to Hindi machine translation system and using sentence alignment”. The alignment is based on length-based, location-based, and lexical techniques. We will use an English-Hindi machine translation system (i.e. h2p.learnpunjabi.org). These tasks are needed to build a Hindi-Punjabi parallel corpus. Sentence alignment is useful for developing an English-Hindi parallel corpus and an English-Hindi dictionary. The accuracy depends mainly on the complexity of the corpus: the higher the complexity, the lower the accuracy. Complexity here refers to how sentences are distributed in the target file, that is, whether alignments of the categories 1:1, 1:2, 2:1, 1:3, and 3:1 occur together in a paragraph. Our objective in this research paper is to develop an English-Hindi parallel corpus using the latest existing techniques and methods with high accuracy and time efficiency.
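To make the length-based technique and the alignment categories concrete, the following is a minimal sketch, not the authors' implementation. It assumes that sentence length is measured in characters, that the expected target length is the source length scaled by the overall length ratio of the two texts, and that each category carries a fixed penalty. The names `BEADS`, `PENALTY`, `bead_cost`, and `align` and the penalty values are hypothetical. It is a simplified dynamic-programming aligner in the spirit of length-based methods and does not use a probabilistic cost.

```python
from typing import List, Tuple

# Alignment categories named in the abstract: (source sentences, target sentences).
BEADS = [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1)]

# Hypothetical fixed penalties; 1:1 is assumed to be the most common category.
PENALTY = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 3): 2.0, (3, 1): 2.0}


def bead_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Relative mismatch between the target length and the expected length."""
    expected = src_len * ratio
    return abs(tgt_len - expected) / max(expected, tgt_len, 1)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Group source and target sentences into the cheapest sequence of beads."""
    # Overall character-length ratio of target text to source text.
    ratio = sum(map(len, tgt)) / max(sum(map(len, src)), 1)
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i source and j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for a, b in BEADS:
                if i + a > n or j + b > m:
                    continue
                s = sum(len(x) for x in src[i:i + a])
                t = sum(len(x) for x in tgt[j:j + b])
                c = cost[i][j] + bead_cost(s, t, ratio) + PENALTY[(a, b)]
                if c < cost[i + a][j + b]:
                    cost[i + a][j + b] = c
                    back[i + a][j + b] = (a, b)
    if cost[n][m] == inf:
        raise ValueError("no alignment with the allowed categories")
    # Walk the back-pointers from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        beads.append((src[i - a:i], tgt[j - b:j]))
        i, j = i - a, j - b
    return beads[::-1]
```

Calling `align(source_sentences, target_sentences)` on one paragraph returns a list of `(source_group, target_group)` pairs. A paragraph that mixes several categories offers more competing segmentations with similar costs, which is one way to read the claim that higher complexity lowers accuracy.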

Keywords


Parallel Corpus, English-Hindi, Sentence Alignment, Length Based, Location Based



