International conference on machine vision. Section in the journal Компьютерная оптика (Computer Optics)
Article
The classical Otsu method is a common tool in document image binarization. Often the two classes, text and background, are imbalanced, which means that the assumption of the classical Otsu method is not met. In this work, we considered imbalanced pixel classes of background and text: the weights of the two classes differ, but the variances are the same. We experimentally demonstrated that employing a criterion that takes the imbalance of the class weights into account allows attaining higher binarization accuracy. We described a generalization of the criterion to a two-parametric model, for which an algorithm for the optimal linear separation search via fast linear clustering was proposed. We also demonstrated that the two-parametric model with the proposed separation increases binarization accuracy for document images with a complex background or spots.
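For illustration, below is a minimal sketch of one such imbalance-aware criterion: a minimum-error-style threshold search (in the spirit of Kittler-Illingworth) with a pooled within-class variance, matching the model of unequal weights and equal variances. The exact criterion used in the paper may differ.

```python
import numpy as np

def imbalance_aware_threshold(image, bins=256):
    """Threshold search under the 'unequal weights, equal variances' model.

    A sketch of a minimum-error-style criterion with a pooled within-class
    variance; shown as one plausible way to account for class imbalance,
    not necessarily the paper's exact criterion.
    """
    hist, edges = np.histogram(image.ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()          # normalized histogram
    levels = 0.5 * (edges[:-1] + edges[1:])      # bin centers

    best_t, best_j = None, np.inf
    for t in range(1, bins):
        w0, w1 = p[:t].sum(), p[t:].sum()        # class weights (may differ)
        if w0 <= 0 or w1 <= 0:
            continue
        m0 = (p[:t] * levels[:t]).sum() / w0
        m1 = (p[t:] * levels[t:]).sum() / w1
        # pooled variance: the model assumes both classes share one variance
        var = (p[:t] * (levels[:t] - m0) ** 2).sum() + \
              (p[t:] * (levels[t:] - m1) ** 2).sum()
        if var <= 0:
            continue
        # the weight terms reward likely class priors instead of assuming w0 == w1
        j = 0.5 * np.log(var) - w0 * np.log(w0) - w1 * np.log(w1)
        if j < best_j:
            best_t, best_j = edges[t], j
    return best_t
```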
Algorithm for choosing the best frame in a video stream in the task of identity document recognition
Article
During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it may be required by law. In this case, the recognition system needs to assess the quality of the frames in the video stream and choose the "best" frame. In this paper, we considered a solution to this problem, where the "best" frame means one in which all specified attributes are present in the document image in readable form. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
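As an illustration of the frame selection logic, here is a minimal sketch in which the "best" frame is one where every required attribute is recognized in readable form, ranked by the weakest attribute's confidence. The frame and field structures are hypothetical, not the paper's actual interfaces.

```python
def choose_best_frame(frames, required_fields):
    """Pick the frame where all required attributes are readable,
    maximizing the confidence of the weakest attribute (a sketch)."""
    best_frame, best_score = None, -1.0
    for frame in frames:
        results = frame.recognized_fields  # assumed: {name: (text, confidence)}
        if any(f not in results or not results[f][0] for f in required_fields):
            continue                       # a required attribute is missing
        score = min(results[f][1] for f in required_fields)
        if score > best_score:
            best_frame, best_score = frame, score
    return best_frame
```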
Article
An algorithm for post-processing grayscale 3D computed tomography (CT) images of porous structures with automatic selection of filtering parameters is proposed. The parameters are determined on a representative part of the image under analysis. A criterion for the search for optimal filtering parameters based on the count of "levitating stone" voxels is described. The CT image filtering and binarization stages are performed sequentially. Bilateral and anisotropic diffusion filtering are implemented; the Otsu method for imbalanced classes is chosen for binarization. The proposed algorithm was verified on model data. To create model porous structures, we used our image generator, which implements a function for generating anisotropic porous structures. Results of post-processing real CT images containing noise and reconstruction artifacts with the proposed method are discussed.
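A sketch of the "levitating stone" criterion as we read it: after binarization, solid components not connected to the main structure are physically implausible, so filtering parameters that minimize their voxel count are preferred. The connectivity choice and the parameter sweep below are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def levitating_stone_count(binary_volume):
    """Count solid voxels that belong to components other than the largest
    one, i.e. 'stones' floating in pore space (a sketch of the criterion)."""
    labels, n = ndimage.label(binary_volume, structure=np.ones((3, 3, 3)))
    if n <= 1:
        return 0
    sizes = ndimage.sum(binary_volume, labels, index=range(1, n + 1))
    return int(sizes.sum() - sizes.max())  # everything but the largest component

# hypothetical parameter sweep, e.g. over bilateral filter settings:
# best = min(param_grid,
#            key=lambda p: levitating_stone_count(binarize(filter_ct(vol, p))))
```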
Handwritten text generation and strikethrough characters augmentation
Research article
We introduce two data augmentation techniques which, used with a ResNet-BiLSTM-CTC network, significantly reduce the Word Error Rate and Character Error Rate beyond the best reported results on handwritten text recognition tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), both of which proved to be very effective in handwritten text recognition tasks. StackMix uses a weakly supervised framework to obtain character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to handwritten text recognition. Extensive experiments on ten handwritten text datasets show that the HandWritten Blots augmentation and StackMix significantly improve the quality of handwritten text recognition models.
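As an illustration of the strikethrough idea, below is a toy augmentation that draws a wavy dark stroke across a grayscale word image. The stroke model and parameter ranges are illustrative, not the paper's HandWritten Blots implementation.

```python
import numpy as np

def strikethrough_augment(img, rng=np.random.default_rng()):
    """Draw a random wavy stroke over a grayscale word image (uint8, HxW).
    A toy sketch in the spirit of strikethrough augmentation."""
    h, w = img.shape
    out = img.copy()
    y0 = rng.integers(h // 3, 2 * h // 3)       # stroke baseline
    amp = rng.integers(1, max(2, h // 8))       # wave amplitude
    thick = rng.integers(1, max(2, h // 10))    # stroke half-thickness
    phase = rng.uniform(0, 2 * np.pi)
    for x in range(w):
        y = int(y0 + amp * np.sin(2 * np.pi * 3 * x / w + phase))
        out[max(0, y - thick):min(h, y + thick + 1), x] = 0  # dark ink
    return out
```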
Neural network regularization in the problem of few-view computed tomography
Research article
Computed tomography (CT) allows the inner morphological structure of an object to be reconstructed without physical destruction. The accuracy of digital image reconstruction directly depends on the measurement conditions of the tomographic projections, in particular, on the number of recorded projections. In medicine, the number of measured projections is reduced to lower the patient's radiation dose. However, in few-view computed tomography, where only a small number of projections is available, standard reconstruction algorithms produce degraded images. The main feature of our approach to few-view tomography is that the algebraic reconstruction is refined by a neural network while the measured projection data are preserved, because the network's additive correction lies in the null space of the forward projection operator. The final reconstruction is the sum of the additive correction calculated by the neural network and the algebraic reconstruction. The former is an element of the null space of the forward projection operator; the latter is an element of the orthogonal complement of the null space, obtained by applying the algebraic reconstruction method to the few-view sinogram. The dependency model between the elements of the null space of the forward projection operator and the algebraic reconstruction is built with neural networks. We demonstrate that the suggested approach achieves better reconstruction accuracy and computation time than state-of-the-art approaches on test data from the Low Dose CT Challenge dataset, without increasing the reprojection error.
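The decomposition behind the approach can be stated compactly. Writing $P$ for the forward projection operator, $x_{\mathrm{ART}}$ for the algebraic reconstruction, and $f_{\theta}$ for the neural network (notation ours):

```latex
x^{*} = x_{\mathrm{ART}} + z, \qquad
z = f_{\theta}\!\left(x_{\mathrm{ART}}\right) \in \ker P, \qquad
x_{\mathrm{ART}} \in \left(\ker P\right)^{\perp}.
```

Since $Pz = 0$, the reprojection of the final image equals that of the algebraic reconstruction, $P x^{*} = P x_{\mathrm{ART}}$, so the measured data are kept and the reprojection error does not grow.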
Optimal affine image normalization approach for optical character recognition
Article
Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured at an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Usually, the camera's optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective normalization, which is important for OCR on mobile devices. In this work, we propose a fast approach to image normalization. It utilizes an affine normalization instead of a projective one when there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: the root mean square (RMS) of coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization problem is quadratic and can be reduced to the integration of fractional quadratic functions over the ROI. The latter was solved analytically for the OCR case, where the ROI consists of rectangles. The proposed approach is generalized to cases where special cases of the affine transform are used instead: scaling, translation, shearing, and their superpositions, allowing the image normalization procedure to be further accelerated.
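A discrete sketch of the criterion: sample points in the ROI, map them with the projective homography, and fit the affine transform minimizing the RMS discrepancy by least squares. The paper solves the corresponding continuous integral over rectangular ROIs analytically; the sampling here is only for illustration.

```python
import numpy as np

def best_affine_for_roi(H, roi_points):
    """Fit the affine map closest (in the RMS sense) to homography H over
    sampled ROI points. H: 3x3 homography; roi_points: (N, 2) array."""
    pts = np.asarray(roi_points, dtype=float)
    src = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    dst = src @ H.T
    dst = dst[:, :2] / dst[:, 2:3]                  # projective targets
    # least squares: each row [x, y, 1] maps to (x', y') via a 2x3 affine A
    A, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return A.T                                      # 2x3 affine matrix
```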
Tiny CNN for feature point description for document analysis: approach and dataset
Research article
In this paper, we study the problem of feature point description in the context of document analysis and template matching. Our study shows that task-specific training data is required, especially if we are to train a lightweight neural network that will be usable on devices with limited computational resources. We construct and provide a dataset of photographed and synthetically generated images and a method for generating training patches from it. We prove the effectiveness of this data by training a lightweight neural network and show how it performs in both general-purpose and document patch matching. The training was done on the provided dataset in comparison with the HPatches training dataset; for testing, we solve the HPatches testing framework tasks and the template matching task on two publicly available datasets with various documents pictured against complex backgrounds: MIDV-500 and MIDV-2019.
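To illustrate the scale of network meant by "tiny", below is a hypothetical lightweight patch descriptor in PyTorch; the architecture, patch size, and descriptor dimension are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class TinyDescriptor(nn.Module):
    """Hypothetical lightweight patch descriptor: a 1x32x32 grayscale patch
    in, an L2-normalized 64-d descriptor out (illustrative only)."""
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 4x4
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):
        d = self.proj(self.features(x).flatten(1))
        return nn.functional.normalize(d, dim=1)  # unit-length descriptors
```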
Towards a unified framework for identity documents analysis and recognition
Research article
Identity document recognition goes far beyond classical optical character recognition problems. Automated ID document recognition systems are tasked not only with the extraction of editable and transferable data, but also with performing identity validation and preventing fraud, with an increasingly high cost of error. A significant amount of research is directed at the creation of ID analysis systems focused on a specific subset of document types or a particular mode of image acquisition. However, one of the challenges of the modern world is the increasing demand for identity document recognition from a wide variety of image sources, such as scans, photos, or video frames, and in virtually uncontrolled capturing conditions. In this paper, we describe the scope and context of the identity document analysis and recognition problem and its challenges; analyze the existing work on implementing ID document recognition systems; and formulate the task of constructing a unified framework for identity document recognition that would be applicable to different types of image sources and capturing conditions, as well as scalable enough to support a large number of identity document types. The presented framework is intended to serve as a basis for developing new methods and algorithms for ID document recognition, as well as for the far heavier challenges of identity document forensics, fully automated personal authentication, and fraud prevention.
Weighted combination of per-frame recognition results for text recognition in a video stream
Article
The scope of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of the input images. Unlike specialized scanners, mobile cameras allow using a video stream as input, thus obtaining several images of the recognized object captured with various characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that the weighted combination can improve the quality of the text recognition result in the video stream, and that the per-character weighting method with an input image focus estimate as the base criterion achieves the best results on the analyzed datasets.
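A sketch of per-character weighted combination as we understand it: each frame yields, for every character position, a distribution over the alphabet, and frames are averaged with weights derived from, e.g., a focus estimate. The shapes and the weighting criterion are assumptions for illustration.

```python
import numpy as np

def combine_per_character(frame_results, weights):
    """Weighted per-character combination of per-frame recognition results.

    frame_results: list of (num_chars, alphabet_size) probability arrays,
                   one per frame, all of equal shape.
    weights: per-frame non-negative weights (e.g., focus scores).
    Returns the per-position label indices of the combined result.
    """
    stacked = np.stack(frame_results)                # (frames, chars, alphabet)
    w = np.asarray(weights, dtype=float)[:, None, None]
    combined = (stacked * w).sum(axis=0) / w.sum()   # weighted average
    return combined.argmax(axis=1)                   # per-position labels
```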