Corpus of the archival documents of the Don Cossack Army: problems of morphological analysis
Authors: Gorban Oksana A., Kosova Marina V., Sheptukhina Elena M., Svetlov Andrey V.
Journal: Vestnik of Volgograd State University. Series 2: Linguistics @jvolsu-linguistics
Section: Main topic of the issue
Article in issue: No. 6, Vol. 21, 2022.
Free access
The article presents the results of a collective project aimed at compiling a special annotated diachronic corpus of 18th-19th century documents from the “Mikhailovsky Stanitsa Ataman” Archive Fund (State Archive of Volgograd Region, Russia). In the course of the work, the linguistic, technical and software tasks related to meta-markup, morphological tagging and the representation of tagged texts in an electronic search environment were solved. The texts are written in 18th-century cursive script using old Cyrillic letters and show specific spelling features. To process them correctly, an add-on to the stemming tool MyStem by I. Segalovich was created. The add-on extends MyStem with support for old Cyrillic symbols, a convenient graphical interface, the option to resolve homonymy manually, and export of the tagged text to an external data storage and processing system. Morphological analysis of some texts revealed variants of nominal case forms that are noted neither in the “Russian Grammar” by M.V. Lomonosov nor in modern studies of 18th-century literary texts. These findings point to the effectiveness of automatic tagging that allows word form correction. The research results substantiated the adjustment of the text tagging software to extend the options for grammatical analysis of homonymous forms, with the aim of identifying homonymy and removing it manually. A quantitative analysis of these variants will allow the authors to evaluate their significance for the regional administrative language. The information obtained confirms the importance of creating the corpus for studying the history of the Russian language.
Keywords: history of the Russian language, regional business writing, linguistic corpus, morphological markup, variants of case forms, grammatical homonymy
Short URL: https://sciup.org/149141658
IDR: 149141658 | DOI: 10.15688/jvolsu2.2022.6.4
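
The abstract notes that old Cyrillic letters have to be handled before a modern morphological analyzer can process the texts. The sketch below illustrates one possible pre-processing step under stated assumptions; it is not the authors' add-on and does not call MyStem. The letter mapping, the rule for dropping a word-final hard sign, and the names `OLD_TO_MODERN`, `normalize_token` and `normalize_with_alignment` are all hypothetical and chosen for illustration only.

```python
import re

# Hypothetical mapping of pre-reform letters to modern equivalents.
# A real corpus would likely need a larger, manuscript-specific table.
OLD_TO_MODERN = {
    "ѣ": "е", "Ѣ": "Е",
    "і": "и", "І": "И",
    "ѳ": "ф", "Ѳ": "Ф",
    "ѵ": "и", "Ѵ": "И",
}
_TRANS = str.maketrans(OLD_TO_MODERN)

# Assumption: a hard sign at the end of a word, after a consonant, is dropped.
_FINAL_HARD_SIGN = re.compile(r"(?<=[бвгджзклмнпрстфхцчшщ])ъ\b", re.IGNORECASE)


def normalize_token(token: str) -> str:
    """Return a modernized spelling of one token."""
    modern = token.translate(_TRANS)
    return _FINAL_HARD_SIGN.sub("", modern)


def normalize_with_alignment(text: str) -> list[tuple[str, str]]:
    """Pair each whitespace-separated token with its normalized form,
    so the original spelling stays available next to the analyzable one."""
    return [(tok, normalize_token(tok)) for tok in text.split()]


if __name__ == "__main__":
    for original, modern in normalize_with_alignment("хлѣбъ атаманъ объявить"):
        print(original, "->", modern)
```

Keeping the original token alongside the normalized one means the historical spelling is not lost: the normalized form can be passed to an analyzer while the original is what gets stored and displayed in the corpus.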