Performance analysis of Gaussian mixture model speaker recognition systems with different speaker features

Akira Kurematsu; Mariko Nakano-Miyatake; Hector Perez-Meana; Eric Simancas Acevedo

Journal: Technical Acoustics (Техническая акустика)

Article in issue: vol. 5, 2005.

Free access

This paper analyzes how the characteristics of the speaker feature vector affect the performance of speaker recognition systems (SRS) based on the Gaussian Mixture Model (GMM). To this end, the performance of the SRS is analyzed using speaker features derived from: a) linear predictive cepstral coefficients (LPCepstral) extracted from the whole speech frame, b) LPCepstral derived from the voiced parts of the speech frame, c) LPCepstral extracted from voiced segments of the speech frame together with the pitch information, and d) LPCepstral extracted from voiced segments of each frame and normalized using Cepstral Mean Normalization (CMN). Evaluation results, using phrases of 2.5 to 3 seconds of telephone speech in Japanese, show that GMM-based SRS achieve fairly good performance with most speaker feature vectors in both closed and open tests, although the feature vector providing the best recognition performance depends closely on the particular speaker.
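
The abstract names two building blocks, Cepstral Mean Normalization and GMM-based scoring, without giving implementation details. The following is a minimal Python sketch of both, assuming diagonal-covariance mixtures and identification by highest average log-likelihood. The function names, the two-dimensional toy frames, and the model parameters are hypothetical and chosen only for illustration; this is not the authors' implementation, and no training step (such as EM) is shown.

```python
import math


def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over the utterance from every frame."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        # Log of each weighted component density for this frame.
        comp = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(log_p)
        # Log-sum-exp over components for numerical stability.
        m = max(comp)
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total / len(frames)


def identify_speaker(frames, models):
    """Return the name of the model with the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))


# Hypothetical toy models: (weights, component means, component variances).
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# Hypothetical feature frames standing in for cepstral vectors.
frames = [[0.2, 0.1], [0.9, 1.1], [0.4, 0.6]]

normalized = cepstral_mean_normalize(frames)
best = identify_speaker(frames, models)
```

In a pipeline that uses CMN, the speaker models would presumably be trained on normalized features as well, so that training and test vectors share the same zero-mean convention; the toy models above are scored on the raw frames only to keep the example short.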

Short address: https://sciup.org/14316017

IDR: 14316017

Analysis of the characteristics of speaker recognition systems based on a Gaussian mixture model

The paper analyzes how the characteristics of a speaker's speech affect the performance of a speaker recognition system based on the Gaussian mixture model. To this end, the recognition system was analyzed using speech features obtained (a) from linear predictive cepstral coefficients extracted from the whole speech fragment, (b) from linear predictive cepstral coefficients obtained from the voiced parts of the speech fragment, (c) from linear predictive cepstral coefficients obtained from voiced speech segments together with pitch information, and (d) from linear predictive cepstral coefficients obtained from voiced segments and normalized using cepstral mean normalization. Evaluation using phrases from telephone conversations in Japanese, 2.5 to 3 seconds long, showed that good performance of the Gaussian-model-based recognition system is achieved in most cases regardless of the characteristics of the speaker's voice, both for a system "trained" on the specific phrases and for an "untrained" one. The vector that characterizes the speech features and gives the best recognition depends to a large extent on the particular speaker.

