Selfie sign language recognition with convolutional neural networks
Authors: P.V.V. Kishore, G. Anantha Rao, E. Kiran Kumar, M. Teja Kiran Kumar, D. Anil Kumar
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Article in issue: Vol. 10, No. 10, 2018.
Extracting complex head and hand movements, along with their constantly changing shapes, for sign language recognition is considered a difficult problem in computer vision. This paper proposes the recognition of Indian sign language gestures using a powerful artificial intelligence tool, convolutional neural networks (CNNs). The input is continuous sign language video captured in selfie mode, so that a hearing-impaired person can operate the sign language recognition (SLR) mobile application independently. Because no datasets of mobile selfie sign language were available, we created our own, with five different subjects performing 200 signs from 5 different viewing angles against various backgrounds. Each sign spans 60 frames (images) of video. CNN training is performed with 3 different sample sizes, each consisting of multiple sets of subjects and viewing angles; the remaining 2 samples are used to test the trained CNN. Different CNN architectures were designed and tested on our selfie sign language data to improve recognition accuracy. We achieved a recognition rate of 92.88% and compare this result with other classifier models reported on the same dataset.
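The dataset dimensions above fix the size of the corpus: 5 subjects × 5 viewing angles × 200 signs gives 5,000 sign clips, and at 60 frames per clip that is 300,000 frames. The sketch below checks this arithmetic and shows one possible subject-wise train/test split. The split is an assumption: the abstract says 3 samples are used for training and 2 for testing but does not define a "sample", so here a sample is taken to be one subject. The constant names and `subject_split` are hypothetical.

```python
from itertools import product

# Dataset dimensions stated in the abstract.
NUM_SUBJECTS = 5
NUM_ANGLES = 5
NUM_SIGNS = 200
FRAMES_PER_SIGN = 60


def enumerate_clips():
    """Yield one (subject, angle, sign) tuple per recorded sign clip."""
    return product(range(NUM_SUBJECTS), range(NUM_ANGLES), range(NUM_SIGNS))


def subject_split(train_subjects):
    """Hypothetical split: clips of train_subjects train, all others test."""
    train, test = [], []
    for clip in enumerate_clips():
        subject = clip[0]
        (train if subject in train_subjects else test).append(clip)
    return train, test


total_clips = NUM_SUBJECTS * NUM_ANGLES * NUM_SIGNS
total_frames = total_clips * FRAMES_PER_SIGN
train, test = subject_split({0, 1, 2})  # assumed: 3 subjects train, 2 test

print(total_clips, total_frames)     # 5000 300000
print(len(train), len(test))         # 3000 2000
print(len(train) * FRAMES_PER_SIGN)  # 180000 training frames
```

Splitting by subject rather than by frame keeps all 60 frames of a clip on the same side, so near-identical consecutive frames cannot leak from training into testing.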
Keywords: Selfie sign language, Convolutional Neural Networks (CNN), Stochastic pooling, Sign language recognition (SLR), Deep learning
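Stochastic pooling, listed among the keywords, replaces the maximum in each pooling window with a value sampled with probability proportional to the (non-negative) activations; at test time the probability-weighted average of the window is used instead, as proposed by Zeiler and Fergus. The abstract does not say how stochastic pooling is configured in this work, so the NumPy sketch below is a generic version of the technique, assuming a single-channel ReLU output, non-overlapping 2 × 2 windows, and a hypothetical function name `stochastic_pool2d`.

```python
import numpy as np


def stochastic_pool2d(x, k=2, training=True, rng=None):
    """Generic stochastic pooling over non-overlapping k x k windows.

    Assumptions (not taken from the paper): x is a single-channel,
    non-negative 2D map (e.g. a ReLU output) whose height and width are
    divisible by k.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = x.shape
    # Gather each k x k window into the last axis: shape (h//k, w//k, k*k).
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    sums = windows.sum(axis=-1, keepdims=True)
    # Normalise activations to probabilities; all-zero windows become uniform.
    safe_sums = np.where(sums > 0, sums, 1.0)
    probs = np.where(sums > 0, windows / safe_sums, 1.0 / (k * k))
    if not training:
        # Test time: probability-weighted average of each window.
        return (probs * windows).sum(axis=-1)
    # Training: draw one position per window with P(i) = probs[..., i].
    cdf = probs.cumsum(axis=-1)
    u = rng.random(cdf.shape[:-1] + (1,))
    # First index whose cdf reaches u; the clamp guards rounding near 1.0.
    idx = np.minimum((cdf < u).sum(axis=-1), k * k - 1)
    return np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]


x = np.array([[1.0, 3.0], [0.0, 0.0]])
print(stochastic_pool2d(x, training=False))                # [[2.5]]
print(stochastic_pool2d(x, rng=np.random.default_rng(0)))  # [[1.]] or [[3.]]
```

For the window [1, 3, 0, 0] the probabilities are [0.25, 0.75, 0, 0], so training returns 1 or 3 (never a zero activation) and evaluation returns 0.25·1 + 0.75·3 = 2.5. The sampling acts as a regularizer during training, while the deterministic average keeps inference repeatable.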
Short URL: https://sciup.org/15016536
IDR: 15016536 | DOI: 10.5815/ijisa.2018.10.07
References: Selfie sign language recognition with convolutional neural networks
- Parton, Becky Sue. "Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence." Journal of Deaf Studies and Deaf Education 11, no. 1, 2006, pp. 94-101. doi: 10.1093/deafed/enj003.
- Mitra, Sushmita, and Tinku Acharya. "Gesture recognition: A survey." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37, no. 3, 2007, pp. 311-324. doi: 10.1109/TSMCC.2007.893280.
- Raffa, Giuseppe, Lama Nachman, and Jinwon Lee. "Efficient gesture processing." U.S. Patent 9,535,506, issued January 3, 2017.
- Liu, Zhengzhe, Fuyang Huang, Gladys Wai Lan Tang, Felix Yim Binh Sze, Jing Qin, et al. "Real-time Sign Language Recognition with Guided Deep Convolutional Neural Networks." In Proceedings of the 2016 Symposium on Spatial User Interaction, p. 187. ACM, 2016. doi: 10.1145/2983310.2989187.
- Chen, Feng-Sheng, Chih-Ming Fu, and Chung-Lin Huang. "Hand gesture recognition using a real-time tracking method and hidden Markov models." Image and Vision Computing 21, no. 8, 2003, pp. 745-758. doi: 10.1016/S0262-8856(03)00070-2.
- Cavender, Anna, Rahul Vanam, Dane K. Barney, Richard E. Ladner, and Eve A. Riskin. "MobileASL: Intelligibility of sign language video over mobile phones." Disability and Rehabilitation: Assistive Technology 3, no. 1-2, 2008, pp. 93-105. doi: 10.1080/17483100701343475.
- Starner, Thad, Joshua Weaver, and Alex Pentland. "Real-time American sign language recognition using desk and wearable computer based video." IEEE Transactions on Pattern Analysis and Machine Intelligence 20, no. 12, 1998, pp. 1371-1375. doi: 10.1109/34.735811.
- Kushwah, Mukul Singh, Manish Sharma, Kunal Jain, and Anish Chopra. "Sign Language Interpretation Using Pseudo Glove." In Proceedings of International Conference on Intelligent Communication, Control and Devices, pp. 9-18. Springer Singapore, 2017.
- Kumar, Pradeep, Himaanshu Gauba, Partha Pratim Roy, and Debi Prosad Dogra. "Coupled HMM-based Multi-Sensor Data Fusion for Sign Language Recognition." Pattern Recognition Letters, vol. 86, pp. 1-8, 2017. doi: 10.1016/j.patrec.2016.12.004.
- Bhuyan, M. K., D. Ghosh, and P. K. Bora. "A framework for hand gesture recognition with applications to sign language." In 2006 Annual IEEE India Conference, pp. 1-6. IEEE, 2006. doi: 10.1109/INDCON.2006.302823.
- Yu Zhou and Xilin Chen, “Adaptive sign language recognition with Exemplar extraction and MAP/IVFS”, IEEE Signal Processing Letters, vol. 17, no. 3, March 2010, pp. 297-300. doi: 10.1109/LSP.2009.2038251.
- Och, F. J., Ney, H., “A systematic comparison of various statistical alignment models”. Computational Linguistics 29 (1), pp. 19-51, 2003. doi: 10.1162/089120103321337421.
- Koehn, Philipp. "Pharaoh: a beam search decoder for phrase-based statistical machine translation models." In Conference of the Association for Machine Translation in the Americas, pp. 115-124. Springer, Berlin, Heidelberg, 2004.
- Kishore PVV, Rajesh Kumar P. “A video based Indian Sign Language Recognition System (INSLR) using wavelet transform and fuzzy logic”. International Journal of Engineering and Technology 4(5), pp. 537-542, 2012. doi: 10.7763/IJET.2012.V4.427.
- Inthiyaz Syed, B. T. P. Madhav, and P. V. V. Kishore. "Flower segmentation with level sets evolution controlled by colour, texture and shape features." Cogent Engineering 4, no. 1 (2017): 1323572. doi: 10.1080/23311916.2017.1323572.
- Shimada, Mitsuaki, Satoshi Iwasaki, and Toshiyuki Asakura. "Finger spelling recognition using neural network with pattern recognition model." In SICE 2003 Annual Conference, vol. 3, pp. 2458-2463. IEEE, 2003.
- Rätsch, Gunnar, Takashi Onoda, and K-R. Müller. "Soft margins for AdaBoost." Machine Learning, vol. 42, no. 3, pp. 287-320, 2001. doi: 10.1023/A:1007618119488.
- Z. Dong, X. Tian, “Multi-level photo quality assessment with multi-view features”, Neurocomputing, vol. 168, pp. 308-319, 2015. doi: 10.1016/j.neucom.2015.05.095.
- Z. Dong, X. Shen, H. Li, X. Tian, “Photo quality assessment with DCNN that understands image well”, In Proceedings of the International Conference on MultiMedia Modeling (MMM), 2015, pp. 524-535.
- X. Lu, Z. Lin, H. Jin, J. Yang, J. Wang, “Rating pictorial aesthetics using deep learning”, In Proceedings of the ACM Conference on Multimedia, 2014, pp. 457-466.
- A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2012, pp. 1097-1105.
- Y. Sun, X. Wang, X. Tang, “Deep learning face representation from predicting 10,000 classes”, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1891-1898.
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun, “What is the best multi-stage architecture for object recognition?”, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009, pp. 2146-2153. doi: 10.1109/ICCV.2009.5459469.
- H. Lee, R. Grosse, R. Ranganath, A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations”, In Proceedings of the International Conference on Machine Learning (ICML), 2009, pp. 609-616. doi: 10.1145/1553374.1553453.
- Y. Bengio, “Learning deep architectures for AI”, Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009. doi: 10.1561/2200000006.
- Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition”, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. doi: 10.1109/5.726791.
- H. Lee, A. Battle, R. Raina and A. Y. Ng, “Efficient sparse coding algorithms”, In Advances in neural information processing systems, pp. 801-808, 2006.
- R. Salakhutdinov and G. E. Hinton, “Deep Boltzmann Machines”, In proceedings of the International Conference on Artificial Intelligence and Statistics, Clearwater Beach, Florida USA, pp. 448-455, 2009.
- Y. LeCun, Y. Bengio and G. Hinton, “Deep learning”, Nature, vol. 521, No. 7553, pp. 436-444, 2015. doi: 10.1038/nature14539.
- Karpathy, Andrej, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. "Large-scale video classification with convolutional neural networks." In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725-1732. 2014. doi: 10.1109/CVPR.2014.223.
- Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." In Advances in neural information processing systems, pp. 568-576. 2014.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, “ImageNet: a large-scale hierarchical image dataset”, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248-255. doi: 10.1109/CVPR.2009.5206848.
- A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, In Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, Nevada, USA, pp. 1097-1105, 2012.
- Rao, G. Anantha, and P. V. V. Kishore. "Sign language recognition system simulated for video captured with smart phone front camera." International Journal of Electrical and Computer Engineering 6.5 (2016): 2176. doi: 10.11591/ijece.v6i5.11384
- Rao, G. Anantha, P. V. V. Kishore, D. Anil Kumar, and A. S. C. S. Sastry. "Neural network classifier for continuous sign language recognition with selfie video." Far East Journal of Electronics and Communications 17.1 (2017): 49.
- Rao, G. Anantha, and P. V. V. Kishore. "Selfie video based continuous Indian sign language recognition system." Ain Shams Engineering Journal (2017). doi: 10.1016/j.asej.2016.10.013
- K. V. V. Kumar, P. V. V. Kishore, and D. Anil Kumar, “Indian Classical Dance Classification with Adaboost Multiclass Classifier on Multifeature Fusion,” Mathematical Problems in Engineering, vol. 2017, Article ID 6204742, 18 pages, 2017. doi: 10.1155/2017/6204742