Speech/pause segmentation algorithm based on empirical mode decomposition and one-dimensional Mahalanobis distance

Authors: Alimuradov A.K., Tychkov A.Yu., Churakov P.P., Ageykin A.V., Kuleshov A.P., Chernov I.A.

Journal: Proceedings of Moscow Institute of Physics and Technology @trudy-mipt

Section: Computer science and control

Article in issue: No. 3 (51), vol. 13, 2021.

Free access

Speech/pause segmentation is the accurate detection of the start and end boundaries of informative speech sections (voiced speech, unvoiced speech, and pauses). Segmentation into informative sections is an important stage of speech preprocessing, and its accuracy affects the performance of almost all speech applications (speech recognition, voice control, speaker identification, speech-to-text conversion, etc.). The article presents a speech/pause segmentation algorithm that splits the speech signal into fragments, decomposes each fragment into empirical modes, and then analyzes the one-dimensional Mahalanobis distance over the discrete time samples of each mode. The algorithm is evaluated against the original algorithm based on one-dimensional Mahalanobis distance analysis and against known segmentation methods based on zero-crossing rate and short-term energy. Based on the results, we conclude that the developed algorithm provides the best detection of the start and end boundaries of informative speech sections among the compared methods, with type I and type II errors of 4.576% and 1.421%, respectively.
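The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea behind one-dimensional Mahalanobis distance thresholding, not the authors' method. It assumes that the first `noise_len` samples of the signal contain only a pause, and it omits the empirical mode decomposition step entirely. All function names, parameters, and threshold values are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def mahalanobis_1d(samples, mean, std):
    """One-dimensional Mahalanobis distance |x - mean| / std for each sample."""
    return [abs(x - mean) / std for x in samples]


def speech_mask(signal, noise_len=200, threshold=3.0):
    """Flag samples whose distance from the pause statistics exceeds threshold.

    Assumption (not from the article): the first `noise_len` samples contain
    only background noise, so they can serve as the pause reference.
    """
    reference = signal[:noise_len]
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    if std == 0.0:
        raise ValueError("pause reference has zero variance")
    return [d > threshold for d in mahalanobis_1d(signal, mean, std)]


def frame_decisions(mask, frame_len=80, min_fraction=0.2):
    """Turn the per-sample mask into per-frame speech (True) / pause (False) labels."""
    labels = []
    for start in range(0, len(mask) - frame_len + 1, frame_len):
        frame = mask[start:start + frame_len]
        labels.append(sum(frame) / frame_len >= min_fraction)
    return labels


def make_noise(rng, n, sigma=0.01):
    """Synthetic low-level Gaussian background noise (illustrative only)."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


# Synthetic demo: 400 samples of noise, 400 of a loud 200 Hz tone, 400 of noise,
# at a hypothetical 8 kHz sampling rate.
rng = random.Random(0)
tone = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) + rng.gauss(0.0, 0.01)
        for t in range(400)]
signal = make_noise(rng, 400) + tone + make_noise(rng, 400)
labels = frame_decisions(speech_mask(signal))
```

In this sketch the per-frame vote smooths out isolated samples that cross the threshold by chance; in the article's algorithm the distance analysis is applied to each empirical mode rather than to the raw signal.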

Keywords: speech signal processing, speech segmentation, voiced and unvoiced speech, empirical mode decomposition, one-dimensional Mahalanobis distance

Short URL: https://sciup.org/142231491

IDR: 142231491   |   DOI: 10.53815/20726759_2021_13_3_4

Research article