Application of machine learning methods in the recognition of printed texts of the 19th century

The paper studies whether machine learning methods can be applied to the recognition of Russian printed documents of the 19th century. It presents an analysis of existing methods and tools for printed-text recognition, including proprietary ones, using selected Russian 19th-century documents as examples. The paper proposes a text recognition approach based on the Tesseract software package. Two versions of a software system that processes digitized images of text documents were developed on this basis. Test results for the developed system are presented and indicate that the proposed approach is promising. The work was carried out with the financial support of the Russian Foundation for Basic Research (grant No. 20-07-01053 A).

Keywords: optical character recognition, recurrent neural networks

Short URL: https://sciup.org/14122726

IDR: 14122726

Scientific article