Volume 18, No. 5, 2021

Optical Character Recognition Of Sanskrit Manuscripts Using Convolution Neural Networks


Bhavesh Kataria, Dr. Harikrishna B. Jethva

Abstract

Sanskrit is a 3,500-year-old Indian language and the liturgical language of Hinduism, Buddhism, and Jainism. It is written in the Devanagari script. Recognizing Sanskrit characters from images of text documents is among the more challenging OCR problems because distinct letters resemble one another, the script is complex, its representation is non-uniform, and the number of symbols is large. A variety of approaches exist for recognizing characters in a scanned image [1,2,3,4,5]. This research presents an optical character recognition (OCR) system that performs word recognition and converts various types of Sanskrit documents or images into text using deep learning architectures, including Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BLSTM) networks. Existing methods focus only on single touching characters. We also design a robust architecture for overlapping lines, characters touching in the middle and upper zones, and half characters, with the aim of increasing the accuracy of present OCR systems for Sanskrit literature. The proposed system yields recognition accuracy rates comparable to those of other character recognition systems.


Pages: 403-424

Keywords: OCR, LSTM, BLSTM, SVM, ANN, Hidden Markov Model
