Segmentation of Printed Meitei/Meetei Script Documents

Y. Loijing Khomba Khuman; Dr. H. Mamata Devi; Ksh. Nareshkumar Singh; Dr. S. Poireiton Meitei; Dr. N. Ajith Singh

Abstract

There are three main processes in an Optical Character Recognition (OCR) system: pre-processing, segmentation and recognition. Character segmentation is one of the most crucial steps in developing an OCR system for any language, because the accuracy of the whole system depends on how precisely individual characters are segmented. Segmentation divides the document image into text lines, words and individual characters. The Meitei/Meetei script is not widely known in India, but the Meitei language it is used to write is a scheduled Indian language of Tibeto-Burman origin and is highly agglutinative. Character segmentation of the Meitei/Meetei script is difficult because adjacent characters overlap. In this paper we propose a methodology in which individual text lines and words are segmented using the projection profile technique, and individual characters are segmented using connected component analysis. The proposed method was tested and achieved a segmentation accuracy of 95.6%.
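The three segmentation levels named above can be illustrated with a minimal sketch. It is not the authors' code. It shows the generic techniques under stated assumptions:

* The page has already been binarised and deskewed during pre-processing.
* The page is stored as a list of 0/1 rows, where 1 means ink.
* The function names and the `min_gap` word-gap threshold are hypothetical.

Each level works as follows:

* **Text lines** are the row bands with ink in the horizontal projection profile (row sums).
* **Words** are the column bands of each line's vertical projection profile (column sums), once gaps narrower than `min_gap` are bridged.
* **Characters** are the 8-connected components inside each word.

```python
from collections import deque
from typing import List, Optional, Tuple

# Assumed input: binarised, deskewed page as a list of rows, 1 = ink, 0 = paper.
Image = List[List[int]]
Span = Tuple[int, int]  # (start, end), end exclusive


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Non-zero runs of a projection profile; runs separated by fewer than
    `min_gap` empty positions are merged into a single span."""
    raw: List[Span] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            raw.append((start, i))
            start = None
    if start is not None:
        raw.append((start, len(profile)))
    merged: List[Span] = []
    for s, e in raw:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too narrow: same word
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Text lines: row bands with ink in the horizontal projection profile."""
    return runs([sum(row) for row in img])


def segment_words(line: Image, min_gap: int = 2) -> List[Span]:
    """Words: column bands of the vertical projection profile of one line.
    `min_gap` is a hypothetical threshold separating word gaps from
    inter-character gaps."""
    profile = [sum(row[c] for row in line) for c in range(len(line[0]))]
    return runs(profile, min_gap)


def connected_components(img: Image) -> List[Tuple[int, int, int, int]]:
    """Characters: 8-connected components found by breadth-first flood fill,
    returned as inclusive (left, top, right, bottom) boxes sorted left to right."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Tuple[int, int, int, int]] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


if __name__ == "__main__":
    # Toy page: two text lines, the first containing two words.
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
        [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    ]
    for top, bottom in segment_lines(page):
        line = page[top:bottom]
        for left, right in segment_words(line):
            word = [row[left:right] for row in line]
            print((top, bottom), (left, right), connected_components(word))

    # Two glyphs whose column extents overlap (column 2 holds ink from both):
    # the vertical profile has no empty column, but the components are distinct.
    kerned = [
        [1, 1, 1, 0, 0],
        [1, 0, 0, 0, 1],
        [1, 0, 1, 1, 1],
    ]
    print(segment_words(kerned, min_gap=1), connected_components(kerned))
```

The `kerned` example shows why connected component analysis is used at the character level. When adjacent glyphs overlap horizontally, no empty column separates them. The vertical profile therefore returns a single span, while the flood fill still finds two components.

A practical system would need further rules that this sketch does not handle, for example:

* characters made up of several components;
* glyphs that actually touch.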

Keywords

Character Segmentation, Connected Component Analysis, Meitei/Meetei Script, OCR, Projection Profile.

This work is licensed under a Creative Commons Attribution 3.0 License.
