Efficient Acoustic Front-End Processing for Tamil Speech Recognition using Modified GFCC Features

Authors: C. Vimala, V. Radha

Journal: International Journal of Image, Graphics and Signal Processing (IJIGSP)

Issue: Vol. 8, No. 7, 2016.

Providing suitable input signals and features is essential for achieving good accuracy in Automatic Speech Recognition (ASR). The type of signal and the feature vectors supplied as input matter greatly, because pattern-matching algorithms depend strongly on these two components. The primary goal of this paper is to propose suitable pre-processing and feature extraction techniques for speaker-independent speech recognition in Tamil. A five-pass pre-processing stage and three modified feature extraction techniques based on Gammatone Filtering and Cochleagram Coefficients (GFCC) are introduced to achieve better recognition performance. Three modifications are investigated: GFCC features computed from a multi-taper Yule-Walker autoregressive (AR) power spectrum; combined features that add Formant Frequencies (FF); and combined frequency warping and feature normalization using Linear Predictive Coding (LPC) and Cepstral Mean Normalization (CMN). The experimental results show that the proposed techniques achieve higher recognition accuracy than the conventional GFCC feature extraction technique.
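The abstract names a multi-taper Yule-Walker AR power spectrum and Cepstral Mean Normalization (CMN) without spelling them out. The following is a minimal pure-Python sketch of both, not the authors' implementation, under stated assumptions: sine tapers stand in for whatever taper design the paper uses; the tapered autocorrelation estimates are averaged before the Yule-Walker equations are solved by Levinson-Durbin recursion; and the AR order (12), taper count (4) and number of frequency points (257) are illustrative values, not the paper's settings. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def sine_tapers(n: int, k: int) -> List[List[float]]:
    """Return k orthonormal sine tapers of length n.

    Assumption: sine tapers are used here as a simple multi-taper family;
    the paper's own taper design may differ.
    """
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(k)
    ]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficients [1, a1, ..., ap] and the prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # error power shrinks at every stage
    return a, err


def ar_psd(a: Sequence[float], err: float, n_freqs: int) -> List[float]:
    """Evaluate err / |A(e^{jw})|^2 on n_freqs points spanning 0..pi (0..fs/2)."""
    psd = []
    for b in range(n_freqs):
        w = math.pi * b / (n_freqs - 1)
        resp = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(a))
        psd.append(err / abs(resp) ** 2)
    return psd


def multitaper_yule_walker_psd(
    frame: Sequence[float], order: int = 12, n_tapers: int = 4, n_freqs: int = 257
) -> List[float]:
    """AR power spectrum from autocorrelations averaged over several tapers.

    Assumption: averaging the tapered autocorrelations (rather than the
    resulting spectra) is one illustrative way to combine the tapers.
    """
    n = len(frame)
    r = [0.0] * (order + 1)
    for taper in sine_tapers(n, n_tapers):
        y = [w * x for w, x in zip(taper, frame)]
        for m in range(order + 1):
            r[m] += sum(y[t] * y[t + m] for t in range(n - m)) / n_tapers
    a, err = levinson_durbin(r, order)
    return ar_psd(a, err, n_freqs)


def cepstral_mean_normalization(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

In a GFCC-style front end, a spectrum like this would plausibly replace the FFT power spectrum ahead of the gammatone filterbank, and CMN would be applied to the final cepstral vectors of each utterance; that placement is also an assumption rather than a detail stated in the abstract.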

Keywords: Gammatone filter banks, multi-taper window, Yule-Walker AR, formant feature extraction, Cepstral Mean Normalization, Tamil speech recognition

Short URL: https://sciup.org/15013993

IDR: 15013993

Research article