
Optical Character Recognition using Hybrid Classifiers

Dr. V. S. Giridhar Akula, D. Sreenivasa Rao, S. Sravanthi

Abstract


Optical character recognition (OCR) is the process of converting printed documents into ASCII files for compact storage, editing, fast retrieval, and other computer-based file manipulation. The principal motivation for developing OCR systems is the need to cope with the enormous flood of paper generated by an expanding technological society, in the form of documents, bank cheques, commercial forms, government records, credit card imprints, and mail sorting. A method has been developed for clearly printed single-font documents. That system was designed primarily for Telugu; it used the Uniform Sampling Method as the basis for extracting low-level, structural, and stroke-type features, and a nearest neighbor classifier for classification. Its accuracy rate was 96%. The objective of the current project is to improve this accuracy using different types of hybrid classifiers. The algorithm uses a segmentation process to isolate words. Within this process, clipping is applied by deleting all zero rows and columns of the image matrix. The k-means clustering algorithm is used to determine the k clusters and their centroids.
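The clipping and clustering steps mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `clip`, `kmeans`, and `nearest` are hypothetical, images are assumed to be binary matrices stored as lists of lists, clipping is read literally as dropping every all-zero row and column, and pairing k-means centroids with a nearest-centroid lookup is only one possible way to combine the two classifiers named in the abstract.

```python
def clip(image):
    """Drop every all-zero row and column from a binary image matrix."""
    rows = [row for row in image if any(row)]
    if not rows:
        return []  # the image was entirely blank
    keep = [c for c in range(len(rows[0])) if any(row[c] for row in rows)]
    return [[row[c] for c in keep] for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group each point with its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids
```

In such a setup, each clipped character image would first be reduced to a feature vector, the training vectors would be clustered with `kmeans`, and an unknown character would be labelled by the cluster that `nearest` returns. If interior blank rows or columns must be preserved, `clip` could instead crop only to the bounding box of the nonzero pixels.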

Keywords


Nearest Neighbor, K-Means Algorithm, Centroids, Filters, MMSE, Skewing, Clipping, Training Patterns, Feature Extraction, Voronoi Diagram, Uniform Sampling.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.