ABSTRACT

Sign language (SL) is of great importance to the hearing-impaired and deaf community as its primary means of communication. The large variation among the SLs used around the world makes automatic SL interpretation systems indispensable for reducing the communication barrier between deaf people and the general public. Despite numerous innovative studies in this domain, building an efficient, highly accurate system for real-world applications remains challenging, especially in the presence of complex backgrounds, low inter-class and large intra-class variations, and changing illumination conditions. To address these issues, a novel Convolutional Neural Network (CNN)-based static sign language recognition (SLR) system is proposed that draws maximum benefit from segmented hand images and handcrafted Histogram of Oriented Gradients (HOG) features. To this end, a U-Net architecture is trained on a small-scale annotated SL dataset for hand segmentation and is then successfully applied to the other, non-annotated datasets to mitigate the detrimental effects of complex backgrounds. The robustness of the system against environmental and user-dependent variations is further improved by exploiting HOG features extracted from the segmented images and rendered as 2D images. These generated images are fed into the proposed CNN model, whose number of layers and filters, kernel sizes, activation functions, optimization method, learning rate, and regularization techniques are carefully selected to maximize recognition accuracy. Extensive experiments on three American Sign Language (ASL) datasets with variations in background and lighting, i.e., MU HandImages ASL (Massey), NUS-II, and Static Hand Gesture ASL, yielding accuracies of 99.71%, 99.50%, and 100%, respectively, demonstrate the robustness and superiority of the proposed system over existing approaches.
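
For illustration only, the sketch below shows one way the described pipeline could be wired together: a segmented hand image is rendered as a HOG visualization image and classified by a small CNN. The U-Net segmentation step is omitted, and all HOG settings, layer counts, filter sizes, and hyperparameters here are assumptions for the sketch, not the architecture reported in this paper.

```python
# Minimal sketch (not the authors' code): HOG image generation + a toy CNN classifier.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
import tensorflow as tf

def hog_image(segmented_hand, out_size=(64, 64)):
    """Render HOG features of a single-channel segmented hand image as a 2D image."""
    gray = resize(segmented_hand, out_size, anti_aliasing=True)
    _, viz = hog(
        gray,
        orientations=9,           # assumed HOG settings, not taken from the paper
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,
    )
    return viz.astype(np.float32)[..., np.newaxis]

def build_cnn(num_classes=24, input_shape=(64, 64, 1)):
    """Toy CNN over HOG images; the paper's exact layer/filter choices differ."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),   # example regularization choice
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```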