Automatic text-independent speaker verification using convolutional deep belief network

Authors: Rakhmanenko Ivan Andreevich, Shelupanov Alexander Alexandrovich, Kostyuchenko Evgeny Yurievich

Journal: Computer Optics (Компьютерная оптика)

Section: Image processing, pattern recognition

Published in: Vol. 44, No. 4, 2020.

Free access

This paper examines the convolutional deep belief network as a speech feature extractor for automatic text-independent speaker verification. It outlines the scope and problems of automatic speaker verification systems and reviews the types of modern speaker verification systems and the speech features they use. The structure and learning algorithm of convolutional deep belief networks are described. The paper proposes using speech features extracted from three layers of a trained convolutional deep belief network. The proposed features were evaluated experimentally on two speech corpora: the authors' own corpus, containing audio recordings of 50 speakers, and the TIMIT corpus, containing audio recordings of 630 speakers. Accuracy was assessed with several types of classifiers. Used directly, these features did not improve accuracy over traditional spectral speech features such as mel-frequency cepstral coefficients. However, using them in a classifier ensemble reduced the equal error rate to 0.21% on the 50-speaker corpus and to 0.23% on the TIMIT corpus.
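The equal error rate quoted above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be estimated from verification scores follows. It assumes that a higher score means a more likely target speaker and uses a simple threshold sweep; the function name `equal_error_rate` and the sample scores are illustrative and are not taken from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER and its threshold from two sets of scores.

    Assumption: a trial is accepted when its score >= threshold,
    so higher scores favour the target speaker.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([target, impostor]))

    # False rejection rate: share of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: share of impostor trials scored at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    eer, thr = equal_error_rate([0.1, 0.6, 0.8, 0.9], [0.2, 0.3, 0.4, 0.7])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With few trials the two error curves may not cross exactly, so the sketch averages them at the closest point; evaluation toolkits usually interpolate instead.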

Keywords: speaker recognition, speaker verification, Gaussian mixture models, GMM-UBM system, speech features, speech processing, deep learning, neural networks, pattern recognition

Short URL: https://sciup.org/140250028

IDR: 140250028 | DOI: 10.18287/2412-6179-CO-621