Method for visual analysis of driver's face for automatic lip-reading in the wild

Authors: Axyonov Alexandr Alexandrovich, Ryumin Dmitry Alexandrovich, Kashevnik Alexey Mikhailovich, Ivanko Denis Viktorovich, Karpov Alexey Anatolievich

Journal: Компьютерная оптика (Computer Optics)

Section: Image processing, pattern recognition

Issue: Vol. 46, No. 6, 2022.

Free access

The paper proposes a method of visual analysis for automatic speech recognition of a vehicle driver. Speech recognition in acoustically noisy conditions is one of the major challenges of artificial intelligence. The problem of effective automatic lip-reading in the vehicle environment has not yet been solved due to various kinds of interference (frequent turns of the driver's head, vibration, varying lighting conditions, etc.). In addition, the problem is aggravated by the lack of available databases on this topic. MediaPipe Face Mesh is used to find and extract the region of interest (ROI). We have developed an end-to-end neural network architecture for the analysis of visual speech. Visual features are extracted from a single image using a convolutional neural network (CNN) in conjunction with a fully connected layer. The extracted features are fed into a long short-term memory (LSTM) neural network. Due to the small amount of training data, we propose applying a transfer learning method. Experiments on visual analysis and speech recognition demonstrate great potential for solving the problem of automatic lip-reading. The experiments were performed on the in-house multi-speaker audio-visual dataset RUSAVIC. The maximum recognition accuracy over 62 commands is 64.09 %. The results can be used in various automatic speech recognition systems on the road, especially in acoustically noisy conditions (high speed, open windows or a sunroof, background music, poor noise insulation, etc.).
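To illustrate the ROI extraction step, below is a minimal sketch using the public MediaPipe Face Mesh API. This is not the authors' published code: the crop margin, the 88x88 grayscale output size, and the per-frame detector instantiation are illustrative assumptions.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_face_mesh = mp.solutions.face_mesh
# All landmark indices belonging to the lip contours in the Face Mesh topology.
LIP_IDS = sorted({i for pair in mp_face_mesh.FACEMESH_LIPS for i in pair})

def extract_mouth_roi(frame_bgr, out_size=88, margin=0.15):
    """Return a square grayscale crop around the lips, or None if no face is found."""
    # For video, reuse one FaceMesh with static_image_mode=False instead of
    # creating it per frame; a fresh instance keeps this sketch self-contained.
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as fm:
        res = fm.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None
    h, w = frame_bgr.shape[:2]
    lm = res.multi_face_landmarks[0].landmark
    xs = np.array([lm[i].x for i in LIP_IDS]) * w  # landmarks are normalized
    ys = np.array([lm[i].y for i in LIP_IDS]) * h
    # Square box centered on the lips, enlarged by the margin.
    cx, cy = xs.mean(), ys.mean()
    half = (1.0 + margin) * max(xs.max() - xs.min(), ys.max() - ys.min()) / 2
    x0, y0 = int(max(cx - half, 0)), int(max(cy - half, 0))
    x1, y1 = int(min(cx + half, w)), int(min(cy + half, h))
    roi = cv2.cvtColor(frame_bgr[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)
    return cv2.resize(roi, (out_size, out_size))
```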
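The CNN + fully connected + LSTM pipeline described in the abstract could be sketched in PyTorch roughly as follows. All layer sizes, the grayscale input, and classifying from the last LSTM state are assumptions for illustration, not the published configuration:

```python
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    """Per-frame CNN features projected by a fully connected layer, an LSTM
    over the frame sequence, and a classifier over 62 command classes."""
    def __init__(self, num_classes=62, feat_dim=256, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # -> (N, 128, 1, 1)
        )
        self.fc = nn.Linear(128, feat_dim)        # per-frame visual feature
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                         # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1))             # fold time into the batch
        f = self.fc(f.flatten(1))                 # (b*t, feat_dim)
        out, _ = self.lstm(f.view(b, t, -1))      # sequence of frame features
        return self.head(out[:, -1])              # logits from last time step

# Usage: a batch of 2 clips, 30 frames each, 88x88 grayscale ROIs.
logits = LipReadingNet()(torch.randn(2, 30, 1, 88, 88))  # shape (2, 62)
```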


Vehicle, driver, visual speech recognition, automated lip-reading, machine learning, end-to-end, CNN, LSTM

Short address: https://sciup.org/140296244

IDR: 140296244   |   DOI: 10.18287/2412-6179-CO-1092

Scientific article