Journal articles - Computer Optics
All articles: 2553
MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream
Research article
A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but more specialized datasets are required to facilitate a more comprehensive scientific and technical approach to identity document recognition. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which enables research on a wide range of document analysis problems. Because identity documents are sensitive and contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset. In addition, as a baseline, we describe its characteristics and present evaluation results on it for existing methods of face detection, text line recognition, and document field data extraction.
Free
MIDV-DM: A Document-Oriented Dataset for Image Manipulation Detection and Localization
Research article
As the scope of application of document recognition systems in business processes increases, so does the number of attacks on these systems. One form of such attack involves software for manipulating a digital image of a document. The development of methods for image manipulation detection and localization is complicated by the fact that available datasets either contain no images of documents or lack diversity in capture conditions and document types. Furthermore, these datasets do not cover the range of manipulations that occur under natural conditions. In this paper, we introduce MIDV-DM, a publicly available benchmark designed for the development and testing of methods aimed at detecting and localizing manipulations in identity document images. It contains images subjected to eight types of manipulations, which we have conceptually categorized based on our analysis of over 2000 real-world fraud attempts. In total, MIDV-DM contains 1000 original document images from the public MIDV-2020 dataset and 8000 automatically created manipulated images based on them, along with the ground truth masks and annotations. The paper also describes the process of obtaining baseline quality with the IML-ViT model. The authors believe that MIDV-DM will open new opportunities for researchers to advance technologies for document image authenticity analysis.
Free
MIMO communication system capacity in random visible light channel
Research article
Optical information transmission is a promising standard that expands the capabilities of communication systems under heavy frequency band load. The efficiency of an indoor optical communication system can be improved with multi-antenna systems. The aim of this paper is a theoretical study of the capacity of a MIMO Li-Fi communication system. The ergodic capacity of a MIMO optical communication system is calculated for various light propagation scenarios. The receiving and transmitting system is modeled as receivers and transmitters randomly placed in a room, with randomly oriented light-emitting diodes and photodiodes. The channel matrix is modeled using the corresponding probability density functions and additive Gaussian noise at the receiver inputs. The paper also considers how the various optical signal propagation scenarios influence the optical channel capacity. Various methods of distributing power between the eigenmodes of the MIMO optical communication system are compared, along with their influence on capacity. The optimal power distribution between the MIMO system eigenmodes is determined by the maximum capacity criterion.
Free
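The comparison of power distributions between eigenmodes mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classical water-filling rule as the capacity-maximizing allocation and takes the eigenmode gains as given numbers; the gain values, the function names, and the unit noise power are hypothetical and are not taken from the paper.

```python
import math


def water_filling(gains, total_power, noise_power=1.0):
    """Per-eigenmode powers that maximize sum(log2(1 + p_i * g_i / noise_power))."""
    # Each eigenmode has a "floor" noise_power / g_i; power fills up to a common level.
    floors = sorted((noise_power / g, i) for i, g in enumerate(gains) if g > 0)
    powers = [0.0] * len(gains)
    # Try the k strongest modes, largest k first, until every used mode gets positive power.
    for k in range(len(floors), 0, -1):
        level = (total_power + sum(f for f, _ in floors[:k])) / k
        if level > floors[k - 1][0]:
            for f, i in floors[:k]:
                powers[i] = level - f
            break
    return powers


def capacity(gains, powers, noise_power=1.0):
    """Sum capacity in bit/s/Hz over parallel eigenmodes."""
    return sum(math.log2(1.0 + p * g / noise_power) for g, p in zip(gains, powers))


# Hypothetical eigenmode power gains (squared singular values of a channel matrix).
example_gains = [4.0, 1.0, 0.1]
wf_powers = water_filling(example_gains, total_power=1.0)
equal_powers = [1.0 / len(example_gains)] * len(example_gains)

print("water-filling powers:", wf_powers)
print("capacity, water-filling:", capacity(example_gains, wf_powers))
print("capacity, equal power:  ", capacity(example_gains, equal_powers))
```

In this toy case the weakest eigenmode receives no power, and the water-filling capacity is at least as large as the equal-power capacity. An ergodic capacity would average such values over many random channel realizations.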
Research article
An automatic speech recognition system can improve the standard of living of persons with disabilities by addressing issues such as dysarthria, stuttering, and other speech defects. In this paper, we introduce a voice assistant for speech affected by hyperkinetic dysarthria (HD). The work covers the data preprocessing steps and the development of a novel convolutional recurrent network (CRN) model built on convolutional neural networks and recurrent neural networks. We implemented data preprocessing methods, including filtering, down-sampling, and splitting, to prevent overfitting and to reduce processing power and time. In addition, Mel Frequency Cepstral Coefficients (MFCC) are used to extract speech characteristics. The proposed model is trained to recognize HD speech disorders using a dataset of 2000 Russian speech samples. The experimental results demonstrate that the proposed method obtains a character error rate (CER) of 14.76 %, which indicates that approximately 85 % of characters are correctly recognized on the test dataset. We have created a Telegram bot that uses our trained model to help people with hyperkinetic dysarthria speech disorder. This bot can provide assistance independently, without the need for any third-party assistance.
Free
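The character error rate quoted in this abstract is conventionally the character-level edit distance between recognized and reference text, divided by the number of reference characters. The following is a minimal sketch of that conventional definition; the function names and the sample strings are illustrative, and the paper's exact evaluation procedure is not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum number of substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate over a corpus: total edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


# Illustrative strings only, not data from the paper.
refs = ["привет", "спасибо"]
hyps = ["привед", "спасибо"]
print(f"CER = {cer(refs, hyps):.2%}")
```

Because insertions also count as errors, 1 - CER only approximates the share of correctly recognized characters, which is consistent with the "approximately 85 %" wording above.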
Many heads but one brain: FusionBrain - a single multimodal multitask architecture and a competition
Research article
Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called FusionBrain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The FusionBrain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created datasets for each task to test the participants' submissions. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a multimodal and multitask architecture as a baseline solution, built around a frozen foundation model and trained in Fusion mode as well as in Single-task mode. The proposed Fusion approach proves to be competitive and more energy-efficient than the task-specific one.
Free
Many-parameter m-complementary Golay sequences and transforms
Research article
In this paper, we develop the family of Golay–Rudin–Shapiro (GRS) m-complementary many-parameter sequences and many-parameter Golay transforms. The approach is based on a new generalized iterative generating construction associated with n unitary many-parameter transforms and n arbitrary groups of given fixed order. We intend to use the many-parameter Golay transform in Intelligent-OFDM-TCS instead of the discrete Fourier transform in order to find the parameter values that optimize PAPR, BER, SER, and anti-eavesdropping and anti-jamming effects.
Free
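For orientation, the classical binary special case of the Golay–Rudin–Shapiro construction can be sketched as follows. This is only the textbook two-sequence recursion, not the m-complementary many-parameter generalization developed in the paper; the function names are illustrative.

```python
def golay_pair(iterations):
    """Classical binary Golay-Rudin-Shapiro pair of length 2**iterations."""
    a, b = [1], [1]
    for _ in range(iterations):
        # Concatenation rule: a' = a | b, b' = a | -b.
        a, b = a + b, a + [-x for x in b]
    return a, b


def autocorrelation(seq, lag):
    """Aperiodic autocorrelation of a real sequence at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))


a, b = golay_pair(3)
# Complementarity: the two autocorrelations cancel at every nonzero lag.
sums = [autocorrelation(a, lag) + autocorrelation(b, lag) for lag in range(len(a))]
print(a)
print(b)
print(sums)
```

The summed autocorrelation is nonzero only at lag 0, where it equals twice the sequence length. This defining property of a complementary pair is what makes such sequences attractive for limiting the peak-to-average power ratio in OFDM.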
Mapping and evaluating urban density patterns in Moscow, Russia
Research article
The notion of the ‘compact city’ as a strategy to reduce urban sprawl, support greater utilization of existing infrastructure and services in more compact areas, and improve the connectivity of employment hubs is actively discussed in urban research. Using urban residential density as a surrogate measure for urban compactness, this paper empirically examines a cadaster database that contains details of every property, with a view to capturing changes in urban residential density patterns across Moscow using geospatial techniques. The policy of densification in pursuit of a more compact city has produced mixed results. The findings of this study signal that urban densities across the buffer zones around Moscow city differ significantly. Landsat images from 1995, 2005 and 2016 are classified by the maximum likelihood method to produce land use/cover maps and identify the land cover. Then, the area coverage of all land use/cover types at different points in time is combined with the distance from the city center. After that, urbanization densities from the city center toward the outskirts are calculated for every 1-km interval from 1 to 60 km. The city density at distances of 1 to 35 km is found to be very high from 1995 to 2016. As is typical, population, traffic conditions, industrialization and government policy are the major factors that influenced the urban expansion.
Free
Research article
The relaxation of a three-level atom interacting with a photon heat bath and an external stochastic field is investigated. For the reduced density matrix, a master equation averaged over stochastic process realizations is derived. An exact solution is obtained and the radiation line shapes are calculated.
Free
Research article
The paper studies entangled states of two qubits interacting with each other and with an electromagnetic field. The state of the qubits is determined by a statistical density matrix. The degree of entanglement of the state is characterized by the Peres-Horodecki (PG) parameter. The statistical density matrix and its evolution are determined in the energy representation within the framework of the path integral formalism. The obtained equations determine the dependence of the PG parameter on the parameters of the qubit dipole-dipole interaction and the acting electromagnetic field. The results of numerical calculations of the PG parameter are presented in graphs. It is shown that parameters can be chosen that correspond to qubit states with a high degree of entanglement (0.99).
Free
Method for removing haze from images captured under a wide range of lighting conditions
Research article
The presence of haze in images degrades the quality of perception and automatic analysis of scenes. One of the most popular methods of haze removal is the dark channel prior method, which is based on the Koschmieder atmospheric scattering model. However, its underlying assumptions are not met at nighttime, since localized light sources make a significant, if not the main, contribution to lighting. We propose to use the degree to which an image element belongs to a localized light source, determined by a one-class classifier, as a measure of confidence in the corresponding element of the estimated transmission map during its rectification based on the gamma-normal model. This makes it possible to increase the accuracy of dehazing when processing images captured in low-light or nighttime conditions.
Free
Research article
An original approach to solving difficult, time-consuming problems of registration and analysis of random point images is described. The approach is based on the development and application of high-performance specialized computer algebra systems. Three software packages have been created specifically for carrying out equivalent analytical transformations on a computer. The first software system is designed to calculate formulas describing the volumes of convex polyhedra with parametrically specified boundaries in n-dimensional space. The second system is based on the calculation of multidimensional integral expressions by the method of cyclic differentiation of the integral with respect to the parameter. The third system is based on the accelerated implementation of complex combinatorial-recursive transformations on a computer. Another distinctive feature of the work is the extension of the classical Catalan numbers to the multidimensional case (they were required to solve a number of intermediate probabilistic-combinatorial problems). The implementation of these computer algebra software systems on a multi-core cluster of Novosibirsk State University, together with the direct use of the explicit form of generalized Catalan numbers, allowed the authors to obtain several previously unknown probabilistic formulas and dependencies required for solving problems in the analysis of random point images.
Free
Research article
ID document recognition systems are already deeply integrated into human activity, and the pace of integration is only increasing. The first and most fundamental problems of such systems are document image localization and classification. In this field, template matching-based approaches have become widely used. These methods offer industrial precision, require minimal training data, and provide real-time performance on mobile devices. However, they have a significant limitation in scalability: every document type is represented by a set of local features that must be stored and processed, which affects the required computing resources. Moreover, given the number of different document types supported by modern industrial recognition systems, these methods become impractical. To mitigate this drawback, we propose a method to select a subset of the most "stable" keypoints. To estimate keypoint stability, we synthesize a dataset of images containing various distortions relevant to photographing hand-held documents with a smartphone camera in uncontrolled lighting conditions. For the experiments we use the well-known MIDV datasets, which were designed to benchmark modern ID document recognition. The experiments show that the proposed method increases ID document detection performance given thousands of document types and limited computing resources.
Free
Research article
We propose a method for analyzing the spontaneous emission of a quantum emitter (an atom, a luminescence center, a quantum dot) inside or in the vicinity of a cylinder. At the focus of our method are analytical expressions for the scattering matrix of the cylindrical nanoobject. We propose an approach to electromagnetic field quantization based on the eigenvalues and eigenvectors of the scattering matrix. The method is applicable to the calculation and analysis of spontaneous emission rates and angular dependences of radiation for a set of different systems: semiconductor nanowires with quantum dots, plasmonic nanowires, and cylindrical hollows in dielectrics and metals. The relative simplicity of the method allows analytical and semi-analytical expressions to be obtained for radiation both into the external medium and into guided modes.
Free
Research article
Beam divergence is one of the instrument resolution parameters in neutron computed tomography. In pinhole geometry, due to the finite size of the source, geometric unsharpness affects the transmission images and therefore influences the reconstructed data. In this paper, we propose an approach for deterministic simulation of this effect for a voxelized 3D object. The idea behind the proposed approach is to use multiple point sources at a pinhole position and collect transmission images from each of them. The implementation was done using the ASTRA toolbox by calculating cone beam projections from each point source. This approach was applied to a porous phantom. Artifacts associated with beam divergence were identified in the reconstructed data. The influence of beam divergence on the segmentation of pores by binarization of the reconstructed data has been considered.
Free
Modeling the light diffraction by micro-optics elements using the finite element method
Article
Free
Research article
We report a design for multilayer dielectric optical filters based on alternating TiO2 and SiO2/MgF2 layers. We selected titanium dioxide (TiO2) as the high refractive index material (2.5), and silicon dioxide (SiO2) and magnesium fluoride (MgF2) as the low refractive index materials (1.45 and 1.37, respectively). Miniaturized visible spectrometers are useful for quick and mobile characterization of biological samples. Such devices can be fabricated using Fabry-Perot (FP) filters consisting of two highly reflecting mirrors with a central cavity in between. Distributed Bragg Reflectors (DBRs) consisting of alternating high and low refractive index material pairs are the most commonly used mirrors in FP filters, due to their high reflectivity. However, DBRs have high reflectivity only for a selected range of wavelengths, known as the stopband of the DBR. This range is usually much smaller than the sensitivity range of the spectrometer. Therefore, bandpass filters are required to block wavelengths outside the stopband of the FP DBRs. The proposed filter shows high quality, with an average transmission of 97 % within the passbands and a transmission of around 3 % outside the passband. Special attention has been given to keeping the thickness of the filters within economic limits. These filters appear exceptionally promising for fluorescence imaging and narrow-band imaging endoscopy.
Free
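The refractive indices quoted in this abstract can be put into numbers with a small sketch. It assumes quarter-wave layers, the textbook DBR design, which the abstract does not state explicitly, and a hypothetical design wavelength of 550 nm. The stopband estimate uses the standard quarter-wave-stack formula for the fractional bandwidth; none of the printed values come from the paper.

```python
import math


def quarter_wave_thickness(design_wavelength_nm, refractive_index):
    """Physical thickness of a quarter-wave layer: d = lambda0 / (4 n)."""
    return design_wavelength_nm / (4.0 * refractive_index)


def stopband_fraction(n_high, n_low):
    """Fractional stopband width of an ideal quarter-wave stack."""
    return (4.0 / math.pi) * math.asin((n_high - n_low) / (n_high + n_low))


# Refractive indices as quoted in the abstract.
INDICES = {"TiO2": 2.5, "SiO2": 1.45, "MgF2": 1.37}
DESIGN_WAVELENGTH_NM = 550.0  # hypothetical value chosen for illustration

for material, n in INDICES.items():
    d = quarter_wave_thickness(DESIGN_WAVELENGTH_NM, n)
    print(f"{material}: n = {n}, quarter-wave thickness = {d:.1f} nm")

for low in ("SiO2", "MgF2"):
    frac = stopband_fraction(INDICES["TiO2"], INDICES[low])
    print(f"TiO2/{low}: fractional stopband width = {frac:.3f}")
```

A larger index contrast widens the stopband. Under these assumptions, the TiO2/MgF2 pair gives a slightly wider stopband than TiO2/SiO2.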
Monitored reconstruction improved by post-processing neural network
Research article
Computed tomography (CT) is widely used for analyzing internal structures, but traditional reconstruction algorithms often require a large number of projections, which limits their effectiveness in time-critical tasks and when studying biological objects. Recently, the monitored reconstruction approach was proposed to reduce the dose load. In this paper, we investigate the advantages of using post-processing neural networks within the monitored reconstruction approach. Three algorithms, namely FBP, FBPConvNet, and LRFR, are evaluated by the mean number of projections required to achieve the target reconstruction accuracy. A novel training method specifically designed for neural network algorithms within the monitored reconstruction framework is proposed. It is shown that the LRFR approach achieves both a reduction in the number of measured projections and an improvement in reconstruction accuracy over a certain range of stopping rules. These findings highlight the significant potential of neural networks in the monitored reconstruction approach.
Free
Research article
In this paper, we demonstrate that combining a laser heating (LH) system with a tandem acousto-optical tunable filter (TAOTF) allows us to measure the temperature distribution (TD) across a laser-heated microscopic specimen. Spectral image processing is based on one-dimensional (1D) non-linear least squares fitting of the Planck radiation function, which is applied to determine the temperature T at each point (x, y) of the specimen surface. It is shown that spectral image processing using 1D non-linear least squares fitting measures the TD of the laser-heated microscopic specimen with higher precision and stability than conventional linear least-squares fitting of the Wien approximation of Planck’s law.
Free
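The two fitting strategies compared in this abstract can be contrasted on synthetic data for a single pixel. The sketch below rests on several assumptions that are not from the paper: a gray body with one unknown scale factor, noise-free data, a hypothetical 600-900 nm spectral range, and a fit made one-dimensional by eliminating the scale factor in closed form and searching over T alone. The authors' actual procedure may differ.

```python
import math

H, C, K = 6.62607015e-34, 299792458.0, 1.380649e-23
C1 = 2.0 * H * C * C   # first radiation constant for spectral radiance
C2 = H * C / K         # second radiation constant, m*K


def planck(wavelength_m, temperature_k):
    """Black-body spectral radiance (Planck's law)."""
    return C1 / wavelength_m**5 / math.expm1(C2 / (wavelength_m * temperature_k))


def wien_linear_fit(wavelengths, intensities):
    """Linear least squares on ln(I * lambda^5) versus 1/lambda; slope = -C2 / T."""
    xs = [1.0 / w for w in wavelengths]
    ys = [math.log(i * w**5) for w, i in zip(wavelengths, intensities)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -C2 / slope


def planck_residual(temperature_k, wavelengths, intensities):
    """Squared residual after solving for the best scale factor analytically."""
    model = [planck(w, temperature_k) for w in wavelengths]
    scale = sum(i * m for i, m in zip(intensities, model)) / sum(m * m for m in model)
    return sum((i - scale * m) ** 2 for i, m in zip(intensities, model))


def planck_1d_fit(wavelengths, intensities, t_lo=500.0, t_hi=5000.0, tol=1e-3):
    """Golden-section search over T only (assumes one minimum inside the bracket)."""
    def f(t):
        return planck_residual(t, wavelengths, intensities)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = t_lo, t_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Synthetic gray-body spectrum at a hypothetical 2000 K with emissivity 0.5.
TRUE_T = 2000.0
lams = [(600.0 + 10.0 * k) * 1e-9 for k in range(31)]
data = [0.5 * planck(w, TRUE_T) for w in lams]
t_wien = wien_linear_fit(lams, data)
t_planck = planck_1d_fit(lams, data)
print("Wien linear fit:", t_wien)
print("Planck 1D fit:  ", t_planck)
```

On noise-free data both estimates land close to the true temperature. The precision and stability advantage reported in the abstract concerns real spectral images, which this sketch does not model.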
Multigrammatical modelling of neural networks
Research article
This paper presents techniques for modelling artificial neural networks (NNs) using the multigrammatical framework. Multigrammatical representations of feed-forward and recurrent NNs are described. The application of multiset metagrammars to modelling deep learning of NNs of these classes is considered. Possible developments of the proposed approach are discussed.
Free