PaRuS: a syntactically annotated Russian corpus
Authors: Vlasova Natalia Aleksandrovna, Trofimov Igor Vladimirovich, Serdyuk Yuri Petrovich, Suleymanova Elena Anatoljevna, Vozdvizhenskiy Ilia Nikolayevich
Journal: Программные системы: теория и приложения (Program Systems: Theory and Applications)
Section: Artificial intelligence, intelligent systems, neural networks
Published in issue: No. 4 (43), vol. 10, 2019.
Free access
This article presents PaRuS (Parsed Russian Sentences), a new annotated corpus of Russian. The corpus contains over 2.5 billion tokens and is intended for computational linguistics tasks that rely on machine learning methods. PaRuS is a collection of annotated literary Russian sentences. The linguistic annotation comprises morphological features in the MULTEXT-East format and syntactic information in SynTagRus notation. We describe the methodology used to create the corpus and PaRuS_pipe, a hybrid linguistic pipeline developed for sentence annotation. We also discuss the quality of the linguistic annotation in PaRuS and evaluate the PaRuS_pipe morphological analyzer following the MorphoRuEval-2017 competition methodology.
Keywords: computational linguistics, corpus linguistics, Russian, language corpus, annotation, morphology, syntax
Short URL: https://sciup.org/143169807
IDR: 143169807 | UDC: 004.89:81'322.2 | DOI: 10.25209/2079-3316-2019-10-4-181-199
 
	