An Approach for Segmentation of Handwritten Documents Using ICA Algorithm

T. Dhanalakshmi; R. Malar

Abstract

Many governmental, social, business, and educational associations manage large numbers of handwritten documents. Over time, the content of a handwritten document becomes blurred and unreadable. Document Image Analysis (DIA) is used to make such text visible and accessible again. Previous work relies on a structured SVM that uses the pairwise similarities between word separators as well as unary properties of the document. That approach fails when the inter-word and intra-word gaps are the same. The proposed system instead uses the Independent Component Analysis (ICA) algorithm, which transforms multivariate data so that its essential structure becomes more visible and accessible, thus facilitating analysis of the data. Tamil handwritten documents are considered here, since Tamil is the local language.
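As a rough illustration of what ICA does to multivariate data, the sketch below unmixes two toy signals in plain Python. It is not the authors' segmentation pipeline: the abstract does not say how ICA is applied to document images, so the synthetic sources, the mixing weights, and the helper names (`whiten`, `ica_2d`, `make_demo`) are all assumptions made for this example. The method shown is a simple kurtosis-based rotation search on whitened two-channel data, chosen for brevity rather than any specific published ICA variant.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def rotate(u, v, angle):
    """Rotate the two-channel signal (u, v) by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * p + s * q for p, q in zip(u, v)],
            [-s * p + c * q for p, q in zip(u, v)])


def whiten(x1, x2):
    """Center, decorrelate, and scale both channels to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    c1 = [p - m1 for p in x1]
    c2 = [q - m2 for q in x2]
    a = mean([p * p for p in c1])
    b = mean([p * q for p, q in zip(c1, c2)])
    d = mean([q * q for q in c2])
    # Angle that diagonalizes the 2x2 covariance matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2 * b, a - d)
    p1, p2 = rotate(c1, c2, theta)
    s1 = math.sqrt(mean([p * p for p in p1]))
    s2 = math.sqrt(mean([q * q for q in p2]))
    return [p / s1 for p in p1], [q / s2 for q in p2]


def excess_kurtosis(y):
    # Valid for zero-mean, unit-variance input.
    return mean([p ** 4 for p in y]) - 3.0


def ica_2d(x1, x2, steps=180):
    """Search rotations of the whitened data for maximal non-Gaussianity."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        angle = (math.pi / 2) * k / steps
        y1, y2 = rotate(z1, z2, angle)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def make_demo(n=2000, seed=0):
    """Hypothetical toy data: two independent sources and two linear mixtures."""
    rng = random.Random(seed)
    s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    s2 = [math.sin(0.05 * t) for t in range(n)]
    x1 = [1.0 * p + 0.6 * q for p, q in zip(s1, s2)]
    x2 = [0.4 * p + 1.0 * q for p, q in zip(s1, s2)]
    return s1, s2, x1, x2
```

Calling `ica_2d(x1, x2)` on the mixtures from `make_demo()` returns two components that should each track one of the original sources, up to ordering, sign, and scale, which ICA cannot determine. A practical system would more likely rely on an established implementation such as FastICA than on this grid search.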

This work is licensed under a Creative Commons Attribution 3.0 License.
