ABSTRACT

Communicating and interacting with people who are unable to speak or hear has always been difficult. Human interpreters help bridge the communication gap between the deaf-mute community and those who do not know sign language, but they are limited in number and are not available everywhere at all times. Computer vision and machine learning techniques can address this problem by automatically detecting and classifying sign language gestures. This chapter proposes a system to detect and recognize dynamic Nepali Sign Language (NSL) in real time using deep learning and computer vision. The proposed approach takes video input from the user, extracts its frames, and classifies the resulting image sequence with a combined Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model. InceptionV3, a pretrained CNN applied through transfer learning, extracts spatial features from each frame, while an LSTM, a type of Recurrent Neural Network (RNN), captures the temporal features across the frame sequence. The dataset was collected manually by recording smartphone videos for five different classes.
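To make the described pipeline concrete, the following is a minimal Keras sketch of this kind of CNN + LSTM architecture; the frame count, feature dimensionality, and layer sizes are illustrative assumptions, not the exact configuration used in this chapter.

```python
# Minimal sketch (not the authors' exact implementation) of an InceptionV3 +
# LSTM video classifier. Frame count and layer sizes are assumptions; only
# the number of classes (5) comes from the abstract.
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras import layers, models

NUM_FRAMES = 30          # assumed number of frames sampled per gesture video
NUM_CLASSES = 5          # five NSL gesture classes

# 1) Spatial feature extractor: InceptionV3 pretrained on ImageNet,
#    classification head removed (transfer learning), used frozen.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

def extract_features(frames: np.ndarray) -> np.ndarray:
    """frames: (NUM_FRAMES, 299, 299, 3) -> (NUM_FRAMES, 2048) feature vectors."""
    return cnn.predict(preprocess_input(frames.astype("float32")), verbose=0)

# 2) Temporal classifier: an LSTM over the per-frame feature sequence.
temporal_model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, 2048)),
    layers.LSTM(256),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
temporal_model.compile(optimizer="adam",
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])
```

In this arrangement the CNN reduces each frame to a fixed-length feature vector, and only the lightweight LSTM head is trained on the gesture videos, which keeps the training cost of the combined model low.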