Development of methods for automatic structuring and defragmentation of images of text documents
Authors: Gorshkov Danila A., Ershov Nikolay M.
Journal: online scientific publication "System Analysis in Science and Education" @journal-sanse
Issue: 2, 2021.
This work studies image segmentation methods and methods for automatically recognizing the formatting style of a given text block. Its aim is to develop methods for the automatic structuring and defragmentation of images of text documents: text fragments in an image must be segmented, and software must then classify those fragments automatically. The paper proposes an image segmentation algorithm based on threshold segmentation, which achieves fairly accurate segmentation. The developed style-recognition methods are reviewed, the relevance of the research is discussed, and a numerical study of the methods is carried out. The paper describes a Python implementation of the proposed algorithms and methods and demonstrates the program on images containing text blocks. The methods were tested on two samples of text images; the test images were generated with Python, the ImageMagick library, and the LaTeX typesetting system. Testing showed that the proposed approaches to structuring and classifying text blocks are promising.
Keywords: text block segmentation, formatting style recognition
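The abstract does not spell out the threshold segmentation algorithm, so the following is only an illustrative sketch of the general technique, not the authors' method. It assumes a global threshold chosen by Otsu's method (a standard, swapped-in choice) and then uses horizontal and vertical projection profiles of the binarized image to cut it into rectangular text regions. The names `otsu_threshold`, `runs`, `segment_blocks` and the `min_gap` parameter are hypothetical; the image is assumed to be a list of grayscale rows with values 0 to 255, dark text on a light background.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black, 255 = white (assumed format)
Box = Tuple[int, int, int, int]  # (top, bottom, left, right); bottom/right are exclusive


def otsu_threshold(img: Image) -> int:
    """Return the threshold t maximizing between-class variance (Otsu's method).

    Pixels with value <= t form the dark (ink) class.
    """
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue                       # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                          # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where profile > 0, merging ranges separated by < min_gap zeros."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile + [0]):  # trailing 0 closes a run that reaches the end
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            if spans and start - spans[-1][1] < min_gap:
                spans[-1] = (spans[-1][0], i)  # gap too narrow: extend the previous range
            else:
                spans.append((start, i))
            start = None
    return spans


def segment_blocks(img: Image, min_gap: int = 1) -> List[Box]:
    """Hypothetical sketch: binarize with Otsu, then cut rows, then columns within each row band."""
    if not img or not img[0]:
        return []
    t = otsu_threshold(img)
    ink = [[1 if p <= t else 0 for p in row] for row in img]
    width = len(img[0])
    boxes: List[Box] = []
    row_profile = [sum(r) for r in ink]    # ink pixels per row
    for top, bottom in runs(row_profile, min_gap):
        # ink pixels per column, counted only inside this horizontal band
        col_profile = [sum(ink[y][x] for y in range(top, bottom)) for x in range(width)]
        for left, right in runs(col_profile, min_gap):
            boxes.append((top, bottom, left, right))
    return boxes
```

With the default `min_gap=1` every gap splits a region, so a text line breaks into separate words; raising `min_gap` merges fragments separated by narrow gaps into larger blocks. A single pass of row and column cuts like this only handles simple rectangular layouts; nested multi-column pages would need the cuts applied recursively (as in the X-Y cut approach).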
Short URL: https://sciup.org/14123334
IDR: 14123334