Analysis of Approaches to Paralinguistic Feature Annotation Automation in Russian Speech
Authors: Evgenii N. Radchenko, Ekaterina V. Isaeva
Journal: Вестник Пермского университета. Математика. Механика. Информатика (Bulletin of Perm University. Mathematics. Mechanics. Computer Science)
Section: Computer Science and Informatics
Issue: 2 (69), 2025.
Speech synthesis systems whose output characteristics can be controlled through natural language are of practical interest, because such control offers an intuitive way to influence the generated speech. For Russian-language data, however, there is a shortage both of such systems and of the labeled datasets needed to build them. Manual labeling of large datasets is resource-intensive: it requires not only expert knowledge but also consistency between annotators. Automating the annotation of paralinguistic characteristics of Russian speech therefore becomes a relevant task, since it would make it possible both to unify the labeling already present in available datasets and to scale labeling to unlabeled ones.
This article considers the main approaches to annotating paralinguistic characteristics such as pauses, stress, and the pitch and timbre of the voice. Particular attention is paid to reviewing available software implementations of the described methods.
The key conclusion of the analysis is that a sufficient number of methods exist for annotating "basic" characteristics of Russian speech. Pauses and fundamental frequency can be extracted with methods that use no linguistic information, while for stress annotation there are neural-network-based methods that take the context of the utterance into account to resolve stress placement in homographs, reaching Accuracy scores as high as 98%. At the same time, automatic annotation of more complex characteristics, such as timbre and expressed emotions, remains poorly studied. These results indicate the need for further research on methods for automatic annotation of paralinguistic features in Russian speech corpora.
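The abstract notes that pauses and fundamental frequency (F0) can be extracted without any linguistic information. A minimal sketch of that idea, assuming mono float samples in [-1, 1]: an RMS energy threshold marks low-energy frames as pauses, and a plain autocorrelation peak-picker estimates F0 for the remaining frames. This is a generic textbook technique, not one of the specific tools reviewed in the article; the function names, the 0.01 silence threshold, the 200 ms minimum pause length and the 75 to 400 Hz pitch search range are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split the waveform into overlapping analysis frames (a trailing partial frame is dropped)."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_f0(frame: Sequence[float], sr: int,
                fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Return sr / lag for the lag with the largest autocorrelation in the [fmin, fmax] range.

    Raw (unnormalized) autocorrelation is used; it slightly favours shorter lags,
    which helps avoid picking a multiple of the true period (an octave error).
    Returns 0.0 if no lag has positive correlation.
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove DC offset before correlating
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), len(x) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else 0.0


def annotate(signal: Sequence[float], sr: int, frame_ms: float = 25.0,
             hop_ms: float = 10.0, silence_rms: float = 0.01,
             min_pause_s: float = 0.2) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Return (pauses, f0_track).

    pauses: (start_s, end_s) intervals where consecutive frames stay below silence_rms.
    f0_track: one F0 value in Hz per frame; 0.0 marks frames judged silent.
    Note: the only voicing decision is the energy threshold, so loud unvoiced
    frames (e.g. fricatives) still receive an F0 estimate.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(signal, frame_len, hop)
    silent = [rms(f) < silence_rms for f in frames]
    f0_track = [0.0 if s else autocorr_f0(f, sr) for f, s in zip(frames, silent)]

    pauses: List[Tuple[float, float]] = []
    run_start: Optional[int] = None
    for i, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing silent run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            start_s = run_start * hop / sr
            end_s = ((i - 1) * hop + frame_len) / sr  # end of the last silent frame
            if end_s - start_s >= min_pause_s:
                pauses.append((start_s, end_s))
            run_start = None
    return pauses, f0_track
```

For example, `annotate(samples, 16000)` on a 16 kHz recording returns the pause intervals in seconds together with a 10 ms F0 track. The design keeps the two annotations independent of text, which is the property the abstract highlights; established pitch trackers such as Praat's or pYIN add proper voicing decisions, normalization and octave-error correction that this sketch omits.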
Keywords: automatic annotation, audio annotation, text annotation, paralinguistic characteristics, speech generation
Short URL: https://sciup.org/147251032
IDR: 147251032 | DOI: 10.17072/1993-0550-2025-2-101-122