Deep Learning based Speech and Gesture Recognition System for the Disabled

Sheena Christabel Pravin*, Saranya.J, M. Palanivelan, Priya L
Keywords: Speech and Gesture Recognition System; Deep Learning; Convolutional Neural Network; Speech and Hearing Impairment.
Speech and gesture recognition systems are an ideal aid for people with speech and hearing impairments. Approximately 466 million people in the world have a hearing impairment and around 16 million have a speech impairment. They require an external aid that recognizes their speech and gestures so that they can express their thoughts and ideas to the world. The proposed Speech and Gesture Recognition System (SGRS) aims to overcome the communication barriers faced by these subjects by recognizing both their speech and their gestures with promising accuracy using a convolutional neural network. The proposed SGRS model converts sign language into pictures and speech into text with high accuracy. Thus, SGRS can be a suitable aid for subjects with speech and hearing impairments. SGRS has been evaluated with standard evaluation scores such as validation accuracy, validation loss, recall, precision, and F1-score, and has proved to be proficient.
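As an illustration of the evaluation scores named above, the following minimal sketch computes per-class precision, recall, and F1-score from true and predicted class labels, and then macro-averages them. It is not the authors' evaluation code: the function names `per_class_scores` and `macro_average` and the example gesture labels are assumptions made only for this sketch.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Assumed toy labels (sign-language letters), not data from the paper.
    y_true = ["A", "A", "B", "B"]
    y_pred = ["A", "B", "B", "B"]
    scores = per_class_scores(y_true, y_pred)
    for label, (p, r, f1) in scores.items():
        print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print("macro:", macro_average(scores))
```

Macro-averaging weights every gesture or speech class equally, which is one reasonable choice when classes are roughly balanced; the abstract does not state which averaging scheme was used.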

Received : 02 September 2021
Accepted : 18 February 2022
Published : 27 February 2022
DOI: 10.30726/esij/v9.i1.2022.91002