Determining the parameters of a training sample for training a neural network to assess the security of speech acoustic information using a "speech chorus"
Authors: Volkov N.A., Ivanov A.V.
Journal: Инфокоммуникационные технологии (Infocommunication Technologies) @ikt-psuti
Section: New information technologies
Article in issue: No. 1 (89), vol. 23, 2025.
Open access
Convolutional neural networks are proposed for assessing the security of speech acoustic information. This article considers how to select the most appropriate parameters of spectrograms and mel-frequency cepstral coefficients, which are generated from audio recordings of speech with superimposed speech-like interference of the "speech chorus" type and form the training sample for the convolutional neural network. Key parameters of the convolutional neural network architecture are defined, together with the requirements for the data set needed to train it. During the study, one parameter of the training sample was varied to identify its most appropriate values. The analysis showed that the best results on this task are achieved when the data are presented as spectrograms. In the future, the data set is planned to be expanded by increasing the number of speakers.
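The data preparation described in the abstract (speech with superimposed interference, converted to a spectrogram) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the helper names `mix_at_snr` and `log_spectrogram`, the 16 kHz sample rate, the 512-sample Hann frame with a 256-sample hop, the -5 dB signal-to-noise ratio, and the synthetic signals that stand in for real speech and "speech chorus" recordings.

```python
import numpy as np


def mix_at_snr(speech, interference, snr_db):
    """Add interference to speech, scaled so the power ratio equals snr_db.

    Hypothetical helper; assumes the interference is at least as long as the speech.
    """
    speech = np.asarray(speech, dtype=float)
    interference = np.asarray(interference, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_interf = np.mean(interference ** 2)
    # Gain that brings the interference power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_interf * 10.0 ** (snr_db / 10.0)))
    return speech + gain * interference


def log_spectrogram(signal, frame_len=512, hop=256):
    """Log-magnitude STFT spectrogram with shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) for silent bins.
    return 20.0 * np.log10(magnitude + 1e-10)


# Synthetic stand-ins (assumptions): a tone for speech, white noise for the chorus.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2 * np.pi * 220 * t)
chorus = rng.standard_normal(sample_rate)

mixed = mix_at_snr(speech, chorus, snr_db=-5.0)
spec = log_spectrogram(mixed)
```

In such a setup, frame length, hop size, and the signal-to-noise ratio are the kind of training-sample parameters that could be varied; which parameter the authors actually varied is not stated in the abstract.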
Keywords: deep neural networks, convolutional neural networks, signal-to-noise ratio, audio recording noise, speech recognition, spectrograms, mel-frequency cepstral coefficients, assessment of the security of speech acoustic information, speech chorus
Short URL: https://sciup.org/140312331
IDR: 140312331 | UDC: 004.056.53 | DOI: 10.18469/ikt.2025.23.1.09