Development of modern audio and video content transcription systems

Authors: Baruzdin M.M., Raskatova M.V., Shchegolev P.

Journal: Vestnik of Russian New University. Series: Complex Systems: Models, Analysis, Management

Section: Informatics and Computer Engineering

Published in: Issue 4, 2024.

Free access

The article examines current problems in the transcription of audio and video content and reviews the technologies used in transcription systems. Modern open-source solutions are examined in detail, with attention to how well they address the described transcription challenges. The four most popular open-source platforms are described: Kaldi, Mozilla DeepSpeech, Whisper, and Wav2Vec 2.0. A comparison of the models' architectures and features indicates their capabilities and limitations. The article shows how the models cope with the problems that automatic speech recognition systems face. The choice of an automatic speech recognition model depends on the specific task and operating conditions.

Keywords: Kaldi, Mozilla DeepSpeech, Whisper, Wav2Vec 2.0

Short URL: https://sciup.org/148330268

IDR: 148330268 | UDC: 004.93 | DOI: 10.18137/RNU.V9187.24.04.P.71