Journal articles - Computer Optics
All articles: 2590
Research article
In the context of modern digital document management, the automation of document processing, particularly in accounting, is a crucial factor in enhancing the efficiency of business processes. However, automated document processing encounters a range of specific challenges arising from both the linguistic and structural characteristics of the data. Traditional text processing methods that rely on classical optical character recognition (OCR) algorithms do not provide sufficient accuracy in extracting data from document images, which limits their use in automated accounting systems. These challenges are particularly evident when processing documents with complex structures, specific element placement, and text content. This paper proposes a solution to this problem by applying a model based on a transformer neural network architecture, specifically adapted for working with document images. Within the scope of this study, the transformer model is trained on a dataset of accounting document images with varying element placements and text with Cyrillic characters. The focus on Cyrillic text is particularly relevant, as research in this area has predominantly concentrated on documents in English or other Latin-based scripts. This article includes the results of training evaluated through specialized performance metrics. At the final stage of training, the confidence loss was 0.156, which indicates that the model effectively minimizes the prediction error. The precision of 0.868 shows that a relatively high share of the model's positive predictions are correct. The recall of 0.905 indicates that the model identifies most of the positive examples. The F1 score of 0.886 reflects a good balance between precision and recall. The accuracy of 0.96798 indicates that the model's predictions are highly accurate.
The use of the transformer model significantly improves the accuracy of extracting key information, such as date, number, and organization name, from accounting documents containing Cyrillic text. The findings of this study affirm the potential of this method for implementation in automated accounting systems, contributing to enhanced efficiency and precision in processing accounting documents.
Free
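The metrics quoted in the abstract above can be cross-checked against each other. The sketch below assumes that the value 0.868 is the model's precision and that F1 is the usual harmonic mean of precision and recall. The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (0.868 is assumed to be the precision).
precision = 0.868
recall = 0.905

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.886
```

Under these assumptions the computed value agrees with the reported F1 of 0.886.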
Efficiency of object identification for binary images
Research article
This paper presents a comparative analysis of the correlation-extreme method, the contour analysis method, and the stochastic gradient identification method for identifying objects in a binary image. The results are obtained for the case where possible deformations of the identified object relative to a pattern can be reduced to a similarity model, that is, the pattern and the object may differ in scale, orientation angle, and shift along the base axes, and the object may be corrupted by additive noise. Identification of an object is understood as recognition of its image together with estimation of the deformation parameters relative to the pattern.
Free
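The similarity model mentioned in the abstract above (scale, orientation angle, shift) can be illustrated with a short sketch. It assumes the standard 2D form x' = s R(θ) x + t. The function name and the sample values are hypothetical, and additive noise is not modeled here.

```python
import math


def similarity_transform(points, scale, angle_rad, shift):
    """Apply x' = scale * R(angle) * x + shift to a list of 2D points."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty = shift
    return [
        (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
        for x, y in points
    ]


# Hypothetical pattern: corners of a unit square.
pattern = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Deformed object: twice as large, rotated by 90 degrees, shifted by (1, 0).
deformed = similarity_transform(
    pattern, scale=2.0, angle_rad=math.pi / 2, shift=(1.0, 0.0)
)
print(deformed)
```

In this setting, identification amounts to recovering the four parameters (scale, angle, and two shift components) that best map the pattern onto the observed object.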
Research article
This paper presents the diffraction characteristics of electrically controlled multiplexed three-layer holographic diffraction structures formed in photopolymer materials with a high proportion of nematic liquid crystals. The results obtained demonstrate the possibility of using multilayer holographic diffraction structures as the main element for electrically controlled optical spectral filters for dense wavelength division multiplexing communication networks.
Free
Enhanced dynamic programming-based method for text line recognition in documents
Research article
On-premise text recognition is in demand. Customers want to use their smartphones to recognize bank cards for online payments, passports for filling in ticket information, and much more. Because artificial neural networks have been the main approach to text recognition over the last two decades, the resulting solutions tend to be resource-hungry and do not fit on mobile devices. In our paper, we introduce an enhanced method based on dynamic programming and a fully convolutional network for text line recognition that allows this classic model to demonstrate results competitive with much heavier architectures. The main idea is the addition of a special pin to the network alphabet, which makes it possible to apply dynamic programming to analyze the raw neural network output effectively. As our main focus is the recognition of identity documents, we employ the public dataset MIDV-500 and its extension MIDV-2019 as a test sample. We compare our resulting recognizer with several published models, including TrOCR, Paddle OCR, and Tesseract OCR 5, to demonstrate its superior trade-off between accuracy and performance. Our method is about 200 times faster than TrOCR and, in most cases, about 2 times faster than Paddle OCR. The accuracy of our recognizer is comparable with Paddle OCR on MIDV-500 and better on MIDV-2019, including being about 2 times more accurate on machine-readable zone images.
Free
Enhancing forest cover analysis through super-resolution of Sentinel-2 multispectral images
Research article
Machine learning (ML) algorithms, combined with satellite observations, offer significant advantages in environmental studies, particularly in vegetation cover analysis. The varying spectral resolution and number of spectral bands of remote sensing imagery allow different tasks to be addressed with different levels of detail and accuracy. A current limitation in advanced Geographic Information System (GIS) development is the availability and accessibility of data. High-resolution data with a wide spectral range are often expensive, while open-access data typically force researchers to choose between high spatial and temporal resolution and a large number of spectral bands. In this study, we investigate this issue through a case study of forest type classification. We trained a single-image super-resolution model based on the Residual Channel Attention Network (RCAN) to upscale Sentinel-2 multispectral images from 10 to 5 meters. We then compared image segmentation results from the original Sentinel-2 data, the upscaled data, and WorldView-3 images. In addition to experiments with spatial resolution, we explored the effect of the number of spectral bands on segmentation quality. The results confirm our hypothesis that artificially upscaled data provide more information than low-resolution data, for both narrow and wider spectral ranges, with the increase in spatial resolution proving more significant than the increase in the number of spectral bands.
Free
Erratum: dynamic analysis of optical cell trapping in the ray optics regime
Research article
In this additional part of the original paper [1], revised calculations and corrected equations are presented. Also some conclusions from the original paper are revised and discussed briefly.
Free
Article
This addendum to the original article [1] corrects an error made in calculating the reflection and transmission spectra of a curved waveguide Fabry–Perot resonator. The error arose from neglecting cladding modes in the straight waveguide sections before and after the resonator under study (Fig. 2a in the original article). Although these modes do not contribute directly to the calculated transmitted and reflected power, they must be taken into account to calculate the reflection and transmission spectra of the resonator correctly, which was discovered after the article was published. This addendum presents the corrected results, along with some adjustments to the conclusions of the original article.
Free
Research article
Change detection from synthetic aperture radar images has become a key technique for detecting areas changed by phenomena such as floods and deformation of the earth's surface. This paper proposes a change detection method for two synthetic aperture radar images based on transfer learning and the Residual Network with 18 layers (ResNet-18) architecture. Before the proposed technique is applied, batch denoising with a convolutional neural network is applied to the two input synthetic aperture radar images to reduce speckle noise. To validate the performance of the proposed method, three well-known synthetic aperture radar datasets (the Ottawa, Mexico, and Taiwan Shimen datasets) are used. These datasets are important because the ground truth is known, so their use can be regarded as analogous to numerical simulation. The change image detected by the proposed method is evaluated using two image metrics. The first is the image quality index, which measures the similarity between the obtained image and the ground truth image; the second is the edge preservation index, which measures how well the method preserves edges. Finally, the method is applied to determine the changed area using two Sentinel-1B synthetic aperture radar images of the Eddahbi dam in Morocco.
Free
Research article
This paper presents an experimental study of a prototype of an active-pulse television measuring system, aimed at assessing the accuracy of measuring the distance to objects using depth maps. The main technical characteristics and structure of the prototype are described, along with the multi-zone ranging method used in the experiment. The field tests were carried out using a system for constructing terrain orthophotomaps from an unmanned aerial vehicle and a geodetic measuring instrument, which serves as a reference for building a terrain plan and fixing distances between objects on the ground. The technique of carrying out the aerial work needed to obtain the required data array is described; a digital model and an orthophotomap of the area were subsequently built from these data. Conclusions are drawn about the accuracy of digital terrain models built from aerial photography taken by an unmanned aerial vehicle with a geodetic receiver on board and about the applicability of these data as reference data for testing a prototype of an active-pulse television measuring system.
Free
Experimental investigation of multimode dispersionless beams
Research article
Laser light modes are beams whose cross-section complex amplitude is described by eigenfunctions of the operator of light propagation in the waveguide medium. The fundamental properties of modes are their orthogonality and their ability to retain their structure during propagation, for example in a lens-like medium or in free space. The developed MODAN-type diffractive optical elements (DOEs) open up promising new possibilities for generating, transforming, and superposing different laser modes and their combinations. We present new results obtained by synthesizing and investigating beams that consist of more than one two-dimensional Gaussian laser mode with the same value of the propagation constant - multimode dispersionless beams.
Free
Face anti-spoofing with joint spoofing medium detection and eye blinking analysis
Research article
Modern biometric systems based on face recognition demonstrate high recognition quality, but they are vulnerable to face presentation attacks, such as photo or replay attacks. Existing face anti-spoofing methods are mostly based on texture analysis and, due to the lack of training data, use either hand-crafted features or fine-tuned pretrained deep models. In this paper we present a novel CNN-based approach to face anti-spoofing, based on joint analysis of the presence of a spoofing medium and eye blinking. To train our classifiers, we propose a synthetic data generation procedure that allows us to train powerful deep models from scratch. Experimental analysis on challenging datasets (CASIA-FASD, NUAA Imposter) shows that our method can obtain state-of-the-art results.
Free
Face photo retrieval based on sketches
Research article
The paper deals with the problem of automatically retrieving face photos using sketches drawn from a witness description. We propose new methods for generating a population of sketches from an initial sketch in order to improve the performance of sketch-based photo retrieval systems. A method based on computing an average sketch from the generated population is applied to increase the similarity index in sketch-photo pairs. It is shown that such sketches are more similar to the original photographic images and that their use leads to good results. Results of experiments on the CUHK Face Sketch and CUHK Face Sketch FERET databases and on open-access databases of photo-sketch pairs are discussed.
Free
Face recognition based on the proximity measure clustering
Research article
This paper considers problems of featureless face recognition. The recognition is based on clustering the proximity measures between the cardinality distributions of brightness clusters for segmented images. Three types of distances are used as proximity measures in this work: the Euclidean, cosine, and Kullback-Leibler distances. Image segmentation and proximity measure clustering are carried out by means of a software model of a recurrent neural network. Results of experimental studies of the proposed approach are presented.
Free
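The three proximity measures named in the abstract above can be written down in a few lines. This is a minimal sketch, not the authors' implementation. It assumes that each image is summarized by a normalized histogram (a discrete distribution), that the cosine distance is defined as one minus the cosine similarity, and that a small `eps` guards against division by zero. The sample histograms are made up.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """One minus the cosine similarity (assumed definition)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); terms with p_i = 0 contribute 0."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


# Hypothetical normalized distributions for two segmented images.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(euclidean_distance(p, q))
print(cosine_distance(p, q))
print(kl_divergence(p, q))
```

Note that the Kullback-Leibler divergence is not symmetric, so D(p || q) and D(q || p) generally differ; whether the paper symmetrizes it is not stated in the abstract.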
FaceDetectNet: face detection via fully-convolutional network
Research article
Face detection is one of the most popular computer vision tasks. Many face detection approaches have been proposed, including various CNN-based techniques, but the problem of optimally balancing detection quality and computational speed is still relevant. In this paper we propose a new CNN-based solution for face detection called FaceDetectNet. Our CNN architecture is based on ideas from the YOLO/DetectNet and GoogLeNet architectures, supported by some new tools and implementation details created especially for our face detection application. We propose an original iterative proposal clustering (IPC) algorithm for aggregating the output face proposals formed by the CNN, and a 2-level "weak pyramid" that provides better detection quality on testing sets containing both small and huge images. Our face detection approach is close to the previously proposed SSD-based face detection, but the principal difference is that we use the deep features of the top hidden CNN layer to form face proposals of any size...
Free
Fast localization and rectification of documents folded into thirds
Research article
The ubiquitous usage of smartphones makes camera-captured document images as widely used as scanned ones as the input of a modern document recognition system. A document captured by a smartphone camera may appear mechanically distorted in the image, creating the need for an image rectification step. The present paper considers a particular case of document image distortion. Specifically, if a business document is sent via postal service, it may need to be folded to fit the envelope. Once the document is taken out of the envelope and unfolded, its geometric shape is distorted in a very particular pattern. Since the most popular envelope formats in Europe and America require the document to be folded into thirds, this case is considered in this paper. We propose a novel content-independent model-based algorithm for the localization and geometrical rectification of documents folded into thirds. Our algorithm outperforms current SOTA rectification methods on the recently published dataset FDI by key rectification accuracy metrics (AD and CER) and is able to rectify documents held in hand. Moreover, it can be executed on a mobile CPU and has a reasonable execution time: it takes only about 17 ms to localize a document and about 110 ms to projectively rectify it. This makes it possible to embed the proposed algorithm into document recognition systems designed for on-device acquisition.
Free
Research article
We have studied the nanostructuring and colorizing of the copper surface by scanning with a femtosecond laser beam with a near-Gaussian beam profile. The experimental studies were conducted using a femtosecond laser comprising a Ti:Sapphire oscillator and a multi-pass amplifier with the maximum pulse energy of 0.7 mJ, pulse frequency of 1 kHz, and pulse duration <30 fs. It is shown that the use of a short-pulsed femtosecond laser leads to the formation of wavelength scale periodic surface structures and eventually increases the brightness of the color of the copper surface. It is revealed that via reciprocally scanning the copper surface by multiple ultrashort laser pulses with a weakly asymmetric spatial energy density distribution and an energy density below the material ablation threshold, it is possible to create a combined nanostructure composed of low-spatial-frequency laser-induced periodic surface structures coated with nanoscale roughness. It is shown that relatively minor changes in the nanostructures obtained by scanning the copper surface by multiple ultrashort laser pulses can lead to a significant change in the color during surface colorizing.
Free
Fine-tuning the hyperparameters of pre-trained models for solving multiclass classification problems
Research article
This study is devoted to the application of fine-tuning methods for transfer learning models to solve the multiclass image classification problem using medical X-ray images. To this end, the structural features of the pre-trained models VGG-19, ResNet-50, and InceptionV3 were studied. For these models, the following fine-tuning methods were used: unfreezing the last convolutional layer and updating its weights, and selecting the learning rate and optimizer. The dataset consisted of chest X-ray images provided by the Society for Imaging Informatics in Medicine (SIIM), the leading healthcare organization in its field, in partnership with the Foundation for the Promotion of Health and Biomedical Research of Valencia Region (FISABIO), the Valencian Region Medical ImageBank (BIMCV), and the Radiological Society of North America (RSNA). The experiments showed that pre-trained models with subsequent tuning are well suited to multiclass classification in medical image processing. The ResNet-50-based model showed the best result, with 82.74 % accuracy. Results obtained for all models are reported in the corresponding tables.
Free
Focal-plane field when lighting double-ring phase elements
Research article
The focal-plane field amplitude is calculated for double-ring phase elements illuminated by flat and Gaussian beams. Conditions are given under which a minimum or a maximum, including a flat-top maximum, emerges at the center. From the derived expressions for the field amplitude, we obtain equations that define the radius of the first zero-intensity ring. Because these equations have no analytical solutions, the root values are listed for several parameters of the optical elements and incident beams. Numerical simulation results are given for flat incident beams; they are fully consistent with the theoretical calculations.
Free
Focusing of light beams with the phase apodization of the optical system
Research article
We investigated the reduction of the size of the illuminated beam in the focal region produced by optical systems with NA=0.99. The intensity distributions of the polarized light field in the focal volume for a phase-apodized pupil are discussed. A circular pupil with different phase apodizations can be employed to control the field components in the resultant intensity distribution. We show that both axial and transverse resolution improvement in the focal distribution is possible by applying proper phase engineering in the annulus of the pupil function.
Free