Automation of morphological tagging of archival documents

Authors: Komendantov Anatoly S., Matveev Alexander G., Svetlov Andrey V.

Journal: Математическая физика и компьютерное моделирование (Mathematical Physics and Computer Simulation) @mpcm-jvolsu

Section: Modeling, informatics and management

Published in: issue 4, vol. 22, 2019.

Free access

The paper describes an add-on to the MyStem stemming tool by I. Segalovich. We designed the application to give MyStem a convenient graphical interface that is intuitive and easy to learn for users who do not specialize in information technology. It turned out that MyStem correctly processes outdated vocabulary if the text is passed to the program in modern Cyrillic. In addition to the interface, our program can therefore work with the outdated Cyrillic alphabet: for instance, the letters ksi and omega are replaced by “ks” and “o” respectively, the text is then passed to MyStem for analysis, and the original characters are restored in the processed document. Thus the add-on intercepts the output of MyStem, then reformats and analyzes it. The application also lets the user resolve homonymy manually when the automatic tagging of a word's morphological characteristics is incorrect. The main purpose of the application is to prepare the morphological tagging of documents from the archival fund “Mikhailovsky Stanichny Ataman” in order to create a linguistic corpus. During the work on the application, we solved the problem of correctly processing texts that contain outdated Cyrillic characters. The functional and user-friendly graphical interface is implemented on the JavaFX platform (OpenJFX).
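The replace, analyze, and restore step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `ArchaicCyrillic`, the two-entry replacement table, and the `analyzer` function that stands in for a call to MyStem are all assumptions made for the example. Each word is kept in its original spelling next to the analysis obtained for its modernized spelling, so no reverse substitution is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of MyStem or of the authors' application.
class ArchaicCyrillic {
    // Assumed, deliberately incomplete table: lowercase letters only.
    private static final Map<Character, String> MODERN = new HashMap<>();
    static {
        MODERN.put('\u046F', "\u043A\u0441"); // small ksi   -> "ks" in modern Cyrillic
        MODERN.put('\u0461', "\u043E");       // small omega -> "o" in modern Cyrillic
    }

    /** Rewrites a word in modern Cyrillic; characters not in the table pass through. */
    static String normalize(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            String replacement = MODERN.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Splits the text on whitespace and returns {original word, analysis} pairs.
     * The analyzer only ever sees the modernized spelling; the original spelling
     * is what ends up in the tagged output.
     */
    static List<String[]> tag(String text, Function<String, String> analyzer) {
        List<String[]> result = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            result.add(new String[] { word, analyzer.apply(normalize(word)) });
        }
        return result;
    }
}
```

In a real pipeline the `analyzer` argument would wrap the external MyStem process. Here it is left abstract so that the sketch makes no claims about MyStem's command-line options or output format.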


Keywords: mystem utility, automation of linguistic analysis, automation of morphological analysis, mystem tool, graphical interface, software shell, corpus-based linguistics

Short URL: https://sciup.org/149129872

IDR: 149129872   |   DOI: 10.15688/mpcm.jvolsu.2019.4.4

Research article