Development of a word frequency list and text corpus of the pre-reform Russian language

Free access

The paper describes the development of a corpus of Russian texts in pre-reform spelling and of a word frequency list built from this corpus, covering the Russian language of the 18th to early 20th centuries. Existing approaches to this problem are reviewed and analyzed, including several of the most widely used electronic national corpora: the Russian, British and Czech ones. A model of the internal organization of the electronic word frequency list is formulated, together with its functionality. The software implementation of the pre-reform Russian corpus and of the frequency list based on it is described; it uses the Python and JavaScript programming languages and the MongoDB database. The implementation of a web application that provides access to the resulting electronic dictionary is also discussed.
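As a rough illustration of what building a frequency list from a pre-reform corpus might involve, the following is a minimal Python sketch. It is not the authors' implementation: the function names `tokenize` and `build_frequency_list`, the record fields `word`, `count` and `ipm` (instances per million tokens), and the sample sentences are all assumptions made for this example. Storage in MongoDB is left out; the returned dictionaries are merely shaped so that they could be stored as documents.

```python
import re
from collections import Counter

# Matches runs of Unicode letters, which includes pre-reform Cyrillic
# letters such as ѣ, і, ѳ and ѵ. Digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split text into lowercase word tokens (hypothetical helper)."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def build_frequency_list(texts):
    """Count word forms across texts and return records sorted by frequency.

    Each record holds the word form, its absolute count, and its relative
    frequency in instances per million tokens (ipm). The field names are
    assumptions for this sketch, not taken from the paper.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return [
        {"word": word, "count": count, "ipm": count * 1_000_000 / total}
        for word, count in counts.most_common()
    ]


if __name__ == "__main__":
    # Invented sample sentences in pre-reform spelling.
    sample = ["Въ лѣсу было тихо.", "Въ домѣ было свѣтло."]
    for entry in build_frequency_list(sample):
        print(entry["word"], entry["count"], entry["ipm"])
```

The sketch counts surface word forms only. A dictionary of the kind the abstract describes would presumably also need lemmatization and handling of spelling variants, which are outside the scope of this example.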


Word frequency list, linguistic corpus, text recognition

Short URL: https://sciup.org/14122719

IDR: 14122719

Research article