Machine learning-based voice assistant: optimizing the efficiency of speech conversion for people with speech disorders

Authors: Antor M.H., Chudinovskikh N.V., Bachurin M.V., Shurpikov A.A., Khlebnikov N.A., Bredikhin B.A.

Journal: Компьютерная оптика (Computer Optics)

Section: Numerical methods and data analysis

Published in: Issue 1, Vol. 49, 2025.

Open access

Automatic speech recognition systems can improve the quality of life of people with disabilities by addressing speech impairments such as dysarthria and stuttering. In this paper, we introduce a voice assistant trained on speech affected by hyperkinetic dysarthria (HD). The work covers the data preprocessing steps and the development of a novel convolutional recurrent network (CRN) model that combines convolutional neural networks and recurrent neural networks. We applied data preprocessing methods, including filtering, down-sampling, and splitting, to prevent overfitting and to reduce computational cost and processing time. In addition, Mel Frequency Cepstral Coefficients (MFCC) were used to extract speech features. The proposed model is trained to recognize HD-affected speech using a dataset of 2000 Russian speech recordings. The experimental results show that the proposed method achieves a character error rate (CER) of 14.76 %, which indicates that approximately 85 % of the characters in the test dataset are recognized correctly. We have also created a Telegram bot that uses the trained model to help people with hyperkinetic dysarthria. The bot provides this assistance on its own, without requiring help from a third party.
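To make the reported metric concrete, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference transcript and the recognized text, divided by the reference length. The abstract does not specify the authors' evaluation code, so the function names `edit_distance` and `cer` and the sample strings are illustrative assumptions, not the paper's implementation.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character substitutions,
    insertions, and deletions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length.
    (Hypothetical helper; the paper's exact normalization is not given.)"""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the paper's dataset.
    print(cer("привет", "превет"))
```

Under this definition, a CER of 14.76 % means that about 15 edits are needed per 100 reference characters. Because insertions also count as errors, the share of correctly recognized characters is only approximately 1 − CER, which is consistent with the "approximately 85 %" stated above.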


Keywords: natural language processing, hyperkinetic dysarthria, speech recognition, feature extraction, optimization

Short URL: https://sciup.org/140310450

IDR: 140310450   |   DOI: 10.18287/2412-6179-CO-1482

Research article