Journal articles - Computer Optics

All articles: 2572

Traffic extreme situations detection in video sequences based on integral optical flow

Chen Huafeng, Ye Shiping, Nedzvedz Alexander, Nedzvedz Olga, Lv Hexin, Ablameyko Sergey

Research article

Road traffic analysis is an important task in many applications, and it can be used in video surveillance systems to prevent many undesirable events. In this paper, we propose a new method based on integral optical flow to analyze car movement in video and detect extreme traffic flow situations in real-world videos. First, integral optical flow is calculated for video sequences from the ordinary optical flow, which eliminates random background motion; second, pixel-level motion maps that describe car movement from different perspectives are created from the integral optical flow; third, region-level indicators are defined and calculated; finally, threshold segmentation is used to identify different types of car movement. We also define and calculate several parameters of the moving car flow, including direction, speed, density, and intensity, without detecting and counting cars. Experimental results show that our method can effectively identify directional movement, divergence, and accumulation of cars.
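
The accumulation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that per-frame optical flow fields are already available, and it simplifies the idea by summing vectors at fixed pixel positions rather than following each pixel along its trajectory. The names `integral_flow` and `magnitude`, the toy data and the threshold of 4.0 are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Vector = Tuple[float, float]
FlowField = List[List[Vector]]  # flow[y][x] = (dx, dy) for one pair of frames


def integral_flow(flows: List[FlowField]) -> FlowField:
    """Sum per-frame displacement vectors pixel by pixel.

    Simplification (assumption): vectors are summed at fixed positions,
    without tracking a pixel along its path.
    """
    height, width = len(flows[0]), len(flows[0][0])
    total = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for flow in flows:
        for y in range(height):
            for x in range(width):
                tx, ty = total[y][x]
                dx, dy = flow[y][x]
                total[y][x] = (tx + dx, ty + dy)
    return total


def magnitude(v: Vector) -> float:
    """Euclidean length of a displacement vector."""
    return (v[0] ** 2 + v[1] ** 2) ** 0.5


# Toy 1x2 image over four frame pairs: pixel 0 jitters back and forth
# (background noise), pixel 1 moves steadily to the right (a car).
flows = [
    [[(1.0, 0.0), (2.0, 0.0)]],
    [[(-1.0, 0.0), (2.0, 0.0)]],
    [[(1.0, 0.0), (2.0, 0.0)]],
    [[(-1.0, 0.0), (2.0, 0.0)]],
]
accumulated = integral_flow(flows)

# Threshold segmentation on the accumulated magnitude (threshold is arbitrary).
moving_mask = [[magnitude(v) > 4.0 for v in row] for row in accumulated]
```

In this toy case the jittering pixel accumulates to zero displacement while the steadily moving pixel accumulates a large one, which is the intuition behind eliminating random background motion.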

Free

Transformer point net: cost-efficient classification of on-road objects captured by light ranging sensors on low-resolution conditions

Pamplona José Fernando, Madrigal Carlos Andrés, Herrera-Ramirez Jorge Alexis

Research article

Three-dimensional perception applications have been growing since Light Detection and Ranging (LiDAR) devices became more affordable. Among those applications, navigation and collision avoidance systems stand out for their importance in autonomous vehicles, which are drawing considerable attention these days. On-road object classification from three-dimensional information is a solid base for an autonomous vehicle perception system, but several factors make this task challenging: objects are captured from only one side, their shapes are highly variable, and occlusions are common. The greatest challenge, however, comes from low resolution, which leads to a significant drop in the performance of classification methods. While most classification architectures tend to get bigger to obtain deeper features, we explore the opposite direction, contributing to the implementation of low-cost mobile platforms that could use low-resolution detection and ranging devices. In this paper, we propose an approach for on-road object classification under extremely low-resolution conditions. It feeds three-dimensional point clouds directly, as sequences, into a transformer-convolutional architecture that could be useful on embedded devices. Our proposal reaches an accuracy of 89.74 % when tested on objects represented with only 16 points extracted from the Waymo, Lyft’s Level 5 and KITTI datasets. It runs in real time (22 Hz) on a single-core 2.3 GHz processor.

Free

Tree-serial parametric dynamic programming with flexible prior model for image denoising

Thang Pham Cong, Kopylov Andrei Valerievich

Research article

We consider image denoising procedures based on computationally efficient tree-serial parametric dynamic programming procedures, different representations of an image lattice by a set of acyclic graphs, and a new type of non-convex regularization that allows a priori preferences to be set flexibly. Experimental results in image denoising, as well as a comparison with related methods, are provided. The new extended version of multi-quadratic dynamic programming procedures for image denoising proposed here shows improved accuracy for images of different types.

Free

Tunable diffraction grating with transparent indium-tin oxide electrodes on a lithium niobate X-cut crystal

Paranin Vyacheslav Dmitrievich, Karpeev Sergei Vladimirovich, Tukmakov Konstantin Nickolaevich, Volodkin Boris Olegovich

Research article

A tunable diffraction grating based on an electrooptic X-cut lithium niobate crystal has been manufactured and experimentally analyzed. The period of the electrodes is 290 μm, the electrode width is 117.5 μm, and the electrode thickness is 150-160 nm. The electrodes are made of transparent conducting indium-tin oxide, which also serves as an antireflection coating to increase the optical transmission. To prevent crystal polarization switching and electrical breakdown, an optimized electrode topology with an end ellipticity of 1:1 and an increased interelectrode gap is used. The optical diagram of the tunable grating with alternating electrode potentials is analyzed for various gap voltages. The intensity of the zero diffraction order is shown to decrease by 40 % at a voltage of 800 V. At the same time, new diffraction orders appear at angles ±λ/(2d). Measurement of the forward-bias and reverse-bias regions of the modulation characteristic reveals no hysteresis, which confirms the correctness of the electrode topology design.
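
As a rough worked example of the angle λ/(2d) quoted in the abstract, the sketch below evaluates it numerically. It assumes that d is the 290 μm electrode period and that the light source is a He-Ne laser at 632.8 nm; the abstract does not state the wavelength, so that value is purely illustrative, as is the function name.

```python
import math

ELECTRODE_PERIOD_M = 290e-6  # electrode period from the abstract, in metres
WAVELENGTH_M = 632.8e-9      # ASSUMPTION: He-Ne wavelength, not given in the abstract


def new_order_angle(wavelength_m: float, period_m: float) -> float:
    """Small-angle position lambda / (2 d) of the new orders, in radians."""
    return wavelength_m / (2.0 * period_m)


angle_rad = new_order_angle(WAVELENGTH_M, ELECTRODE_PERIOD_M)
angle_deg = math.degrees(angle_rad)
print(f"{angle_rad * 1e3:.3f} mrad, {angle_deg:.4f} deg")
```

Under these assumptions the new orders sit roughly one milliradian from the zero order.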

Free

Two calibration models for compensation of the individual elements properties of self-emitting displays

Basova Olga Andreevna, Gladilin Sergey Alexandrovich, Grigoryev Anton Sergeevich, Nikolaev Dmitry Petrovich

Research article

In this paper, we examine the applicability limits of different methods for compensating the individual properties of self-emitting displays with significant non-uniformity of chromaticity and maximum brightness. The aim of the compensation is to minimize the perceived image non-uniformity. Compensation is based on minimizing the perceived distance between the target (ideally displayed) image and the simulated image displayed by the calibrated screen. The S-CIELAB model of the human visual system is used to estimate the perceived distance between two images. We compare the efficiency of the channel-wise and linear (with channel mixing) compensation models depending on the models of variation in the characteristics of display elements (subpixels). It was found that even for a display whose subpixels have uniform chromatic characteristics, the linear model with channel mixing is superior in terms of compensation accuracy.

Free

U-net-bin: hacking the document image binarization contest

Bezmaternykh Pavel Vladimirovich, Ilin Dmitrii Alexeevich, Nikolaev Dmitry Petrovich

Research article

Image binarization is still a challenging task in a variety of applications. In particular, the Document Image Binarization Contest (DIBCO) is organized regularly to track state-of-the-art techniques for historical document binarization. In this work, we present a binarization method that was ranked first in the DIBCO'17 contest. It is a convolutional neural network (CNN) based method that uses the U-Net architecture, originally designed for biomedical image segmentation. We describe our approach to training data preparation and contest ground truth examination and provide multiple insights into its construction (so-called hacking). It led to a more accurate problem statement for historical document binarization with respect to the challenges one could face in open-access datasets. A Docker container with the final network, along with all the supplementary data we used in the training process, has been published on GitHub.

Free

Uncertainty-based quantization method for stable training of binary neural networks

Trusov A.V., Putintsev D.N., Limonova E.E.

Research article

Binary neural networks (BNNs) have gained attention due to their computational efficiency. However, training BNNs has proven to be challenging. Existing algorithms either fail to produce stable and high-quality results or are overly complex for practical use. In this paper, we introduce a novel quantizer called UBQ (Uncertainty-based quantizer) for BNNs, which combines the advantages of existing methods, resulting in stable training and high-quality BNNs even with a low number of trainable parameters. We also propose a training method involving gradual network freezing and batch normalization replacement, facilitating a smooth transition from training mode to execution mode for BNNs. To evaluate UBQ, we conducted experiments on the MNIST and CIFAR-10 datasets and compared our method to existing algorithms. The results demonstrate that UBQ outperforms previous methods for smaller networks and achieves comparable results for larger networks.

Free

Uncovering unstable plaques: deep learning segmentation in optical coherence tomography

Laptev V.V., Danilov V.V., Ovcharenko E.A., Klyshnikov K.Y., Kolesnikov A.Y., Arnt A.A., Bessonov I.S., Litvinyuk N.V., Kochergin N.A.

Research article

One of the primary objectives in modern cardiology is to analyze the risk of acute coronary syndrome (ACS) in patients with ischemic heart disease to develop preventive measures and determine the optimal treatment strategy. This study aims to develop an automated approach for the timely detection of significant, rupture-prone coronary lesions (unstable plaques) to prevent ACS. We collected optical coherence tomography (OCT) volumes from 34 patients, with each OCT volume representing an RGB video of 704×704 pixels per frame, acquired over a certain depth. After filtering and manual annotation, 11,771 images were obtained to identify four types of objects: Lumen, Fibrous cap, Lipid core, and Vasa vasorum. To segment and quantitatively assess these features, we configured and evaluated the performance of nine deep learning models (U-Net, LinkNet, FPN, PSPNet, DeepLabV3, PAN, MA-Net, U-Net++, DeepLabV3++). The study presents two approaches for training the aforementioned models: 1) detecting all analyzed objects and 2) applying a cascade of neural network models to separately detect subsets of objects. The results demonstrate the superiority of the cascade approach for analyzing OCT images. The combined use of PAN and MA-Net models achieved the highest average Dice similarity coefficient (DSC) of 0.721.
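
The Dice similarity coefficient reported in the abstract is defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch for flattened binary masks follows; the function name, the toy masks and the convention for two empty masks are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / size_sum


predicted_mask = [1, 1, 0, 0, 1, 0]
ground_truth_mask = [1, 0, 0, 1, 1, 0]
score = dice_coefficient(predicted_mask, ground_truth_mask)
```

A multi-class average such as the one quoted would typically be obtained by computing this value per class and averaging, although the exact averaging scheme is not specified in the abstract.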

Free

Unfolder: fast localization and image rectification of a document with a crease from folding in half

Ershov A.M., Tropin D.V., Limonova E.E., Nikolaev D.P., Arlazarov V.V.

Research article

Presenting folded documents is not uncommon in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold it by the edges, potentially obscuring it in the captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of a crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document placed on a table or held in hand. The Unfolder algorithm achieved a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for Unfolder was only 0.25 s/image on an iPhone XR.

Free

Unsupervised color texture segmentation based on multi-scale region-level Markov random field models

Song Xu, Wu Liang, Liu Guoying

Research article

In the field of color texture segmentation, the region-level Markov random field (RMRF) model has become a focus of research because of its efficiency in modeling large-range spatial constraints. However, an RMRF defined on a single scale cannot describe the non-stationary nature of the image, which severely limits its robustness. Hence, by combining the wavelet transform and the RMRF model, we present a multi-scale RMRF (MsRMRF) model in the wavelet domain in this paper. In the Bayesian framework, the proposed model seamlessly integrates the multi-scale information stemming from both the original image and the region-level spatial constraints. Therefore, the new model can accurately describe the characteristics of different kinds of texture. Based on MsRMRF, an unsupervised segmentation algorithm is designed for segmenting color texture images. Both synthetic color texture images and remote sensing images are employed in the comparative experiments, and the experimental results show that the proposed method obtains more accurate segmentation results than the competing methods.

Free

Vanishing point detection with direct and transposed fast hough transform inside the neural network

Sheshkus Alexander Vladimirovich, Chirvonaya Anastasiya Nikolaevna, Matveev Daniil Mikhailovich, Nikolaev Dmitry Petrovich, Arlazarov Vladimir Lvovich

Research article

In this paper, we suggest a new neural network architecture for vanishing point detection in images. The key element is the use of the direct and transposed fast Hough transforms separated by convolutional layer blocks with standard activation functions. It allows us to get the answer in the coordinates of the input image at the output of the network and thus to calculate the coordinates of the vanishing point by simply selecting the maximum. Besides, it was proved that calculation of the transposed fast Hough transform can be performed using the direct one. The use of integral operators enables the neural network to rely on global rectilinear features in the image, and so it is ideal for detecting vanishing points. To demonstrate the effectiveness of the proposed architecture, we use a set of images from a DVR and show its superiority over existing methods. Note, in addition, that the proposed neural network architecture essentially repeats the process of direct and back projection used, for example, in computed tomography.

Free

Vehicle wheel weld detection based on improved YOLO V4 algorithm

Liang Tian Jiao, Pan Wei Guo, Bao Hong, Pan Feng

Research article

In recent years, vision-based object detection has made great progress across different fields. For instance, in automobile manufacturing, weld detection is a key step of weld inspection in wheel production. The automatic detection and positioning of welded parts on wheels can improve the efficiency of wheel hub production. At present, there are few deep learning based methods for detecting vehicle wheel welds. In this paper, a method based on the YOLO v4 algorithm is proposed to detect vehicle wheel welds. The main contributions of the proposed method are the use of k-means to optimize the anchor box size, a Distance-IoU loss to optimize the loss function of YOLO v4, and non-maximum suppression using Distance-IoU to eliminate redundant candidate bounding boxes. These steps improve detection accuracy. The experiments show that the improved method achieves high accuracy in vehicle wheel weld detection (4.92 percentage points higher than the baseline model with respect to AP75 and 2.75 percentage points higher with respect to AP50). We also evaluated the proposed method on the public KITTI dataset. The detection results show the improved method’s effectiveness.
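
The Distance-IoU measure named in the abstract is commonly defined as the IoU minus the squared distance between box centers divided by the squared diagonal of the smallest box enclosing both boxes; the corresponding loss is one minus that value. The sketch below is a plain-Python illustration of this general definition, not the authors' YOLO v4 code; the box format and function names are assumptions.

```python
from typing import Tuple

# Box format (assumption): (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU: IoU penalized by the normalized center distance."""
    dx = ((a[0] + a[2]) - (b[0] + b[2])) / 2.0
    dy = ((a[1] + a[3]) - (b[1] + b[3])) / 2.0
    center_dist_sq = dx * dx + dy * dy
    # Diagonal of the smallest box that encloses both boxes.
    enclose_w = max(a[2], b[2]) - min(a[0], b[0])
    enclose_h = max(a[3], b[3]) - min(a[1], b[1])
    diag_sq = enclose_w ** 2 + enclose_h ** 2
    return iou(a, b) - center_dist_sq / diag_sq


def diou_loss(a: Box, b: Box) -> float:
    """Loss form used for box regression: 1 - DIoU."""
    return 1.0 - diou(a, b)


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
value = diou(predicted, ground_truth)
loss = diou_loss(predicted, ground_truth)
```

Unlike plain IoU, the distance term still changes as the centers move when the boxes do not overlap, which is why the same measure is also convenient as a suppression criterion.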

Free

Veiling glare removal: synthetic dataset generation, metrics and neural network architecture

Shoshin Alexey Valeryevich, Shvets Evgeny Alexandrovich

Research article

In photography, the presence of a bright light source often reduces the quality and readability of the resulting image. Light rays reflect and bounce off camera elements, the sensor, or the diaphragm, causing unwanted artifacts. These artifacts are generally known as “lens flare” and may affect the photo in different ways: they can reduce the contrast of the image (veiling glare), add circular or near-circular effects (ghosting flare), appear as bright rays spreading from the light source (starburst pattern), or cause aberrations. All these effects are generally undesirable, as they reduce the legibility and aesthetics of the image. In this paper, we address the problem of removing or reducing the effect of veiling glare on the image. There are no available large-scale datasets for this problem and no established metrics, so we start by (i) proposing a simple and fast algorithm for generating the synthetic veiling glare images necessary for training and (ii) studying metrics used in related image enhancement tasks (dehazing and underwater image enhancement). We select three such no-reference metrics (UCIQE, UIQM and CCF) and show that their improvement indicates better veil removal. Finally, we experiment with neural network architectures and propose a two-branched architecture and a training procedure that uses the structural similarity measure.

Free

Verification of color characteristics of document images captured in uncontrolled conditions

Kunina I.A., Padas O.A., Kolomyttseva O.A.

Research article

This paper examines a presentation attack in which a color photo of a gray copy of a document is presented instead of the original color document during remote user identification. To detect such an attack, we propose an algorithm based on comparing the chromaticity histograms of the presented color image of the document and of the ideal template for this type of document. The chromaticity histograms of the original document and the template are expected to be nearly identical, while the histograms of the gray copy of the document and the template would differ. The algorithm was tested on images from the open dataset DLC-2021, which contains color images of synthesized identity documents and color images of their gray copies. The precision of the proposed method was 98.99 % and the recall was 84.7 %, which gave 8 times fewer errors than the baseline provided by the authors of DLC-2021.
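
The idea of comparing chromaticity histograms can be sketched as follows. The sketch assumes the common normalized chromaticity (r, g) = (R, G) / (R + G + B) and uses histogram intersection as the similarity measure; the abstract does not state which chromaticity space, binning or comparison measure the authors use, so all of these, along with the function names and toy pixel values, are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def chromaticity_histogram(pixels: Sequence[Pixel], bins: int = 8) -> List[float]:
    """Normalized 2-D histogram over (r, g) chromaticity, flattened to a list."""
    hist = [0.0] * (bins * bins)
    for red, green, blue in pixels:
        total = red + green + blue
        # Black pixels carry no chromaticity; map them to the neutral point.
        r, g = (red / total, green / total) if total else (1 / 3, 1 / 3)
        r_bin = min(int(r * bins), bins - 1)
        g_bin = min(int(g * bins), bins - 1)
        hist[r_bin * bins + g_bin] += 1.0 / len(pixels)
    return hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy data: a colorful template, a photo of the original, and a gray copy.
template = [(200, 50, 50), (50, 200, 50), (50, 50, 200), (128, 128, 128)]
original_photo = [(210, 60, 60), (60, 210, 60), (60, 60, 210), (120, 120, 120)]
gray_copy = [(90, 90, 90), (200, 200, 200), (30, 30, 30), (128, 128, 128)]

template_hist = chromaticity_histogram(template)
sim_original = histogram_intersection(template_hist, chromaticity_histogram(original_photo))
sim_gray = histogram_intersection(template_hist, chromaticity_histogram(gray_copy))
```

In this toy case every pixel of the gray copy falls into the single neutral bin, so its similarity to the colorful template is much lower than that of the original photo; a decision threshold between the two would have to be chosen on real data.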

Free

Video images compression and restoration methods based on optimal sampling

Drynkin Vladimir Nikolaevich, Nabokov Sergey Alexeyevich, Tsareva Tatiana Igorevna

Research article

The study proposes video image compression and restoration methods based on multidimensional sampling theory that provide fourfold video compression and subsequent real-time restoration with loss levels below the visually perceptible threshold. The proposed methods can be used separately or along with any other video compression technique, thus providing an additional fourfold compression.

Free

Vortex beams in turbulent media: review

Soifer Victor Alexandrovich, Korotkova Olga, Khonina Svetlana Nikolaevna, Shchepakina Elena Anatolevna

Research article

The review covers publications concerned with propagation of laser beams through turbulent media described by the Kolmogorov theory and generalizations thereof to describe signal transmission in optical communications and detection systems. In this case, the turbulent medium is interpreted as an optical channel with random parameters. Various optical signals considered include partially coherent beams, non-uniformly polarized vector beams, as well as specifically configured spatial laser beams. Special attention is given to vortex laser beams. The latter are shown to have a number of remarkable properties that give them an advantage over conventional Gaussian beams.

Free

Vortex-free laser beam with an orbital angular momentum

Kotlyar Victor Victorovich, Kovalev Alexey Andreevich

Research article

We show that if one cylindrical lens is placed in the Gaussian beam waist and another cylindrical lens is placed at some distance from the first one and rotated by some angle, then the laser beam after the second lens has an orbital angular momentum (OAM). An explicit analytical expression for the OAM of such a beam is obtained. Depending on the inter-lens distance, the OAM can be positive, negative, or zero. Such a laser beam has no isolated intensity nulls with a singular phase, so it is not an optical vortex, yet it has an OAM. By choosing the waist radius of the source Gaussian beam, the focal lengths of the lenses, and the distance between them, it is possible to generate a vortex-free laser beam equivalent to an optical vortex with a topological charge of several hundred.

Free

Vulnerability analysis on Hyderabad city, India

Boori Mukesh Singh, Choudhary Komal, Kupriyanov Alexander Victorovich

Research article

City vulnerability assessment sets priorities for interventions in a city. It is therefore imperative to determine the vulnerable regions of the city in order to identify priority areas that may require immediate intervention. Several methods used for national, international and local level vulnerability assessment are based on remote sensing and GIS technology. This paper aims to determine the vulnerability of Hyderabad city using a geospatial vulnerability index for the sustainable development of the city. We use an urbanization and vulnerability concept for the development of city policy measures. We assessed the city vulnerability using a conceptual diagram composed of exposure, sensitivity and adaptive capacity. For exposure, we considered the elevation (contour), watershed, waterway, road, railway and airport thematic layers. For sensitivity, the built-up area, industry, managed systems such as farmland, and a land use/cover map from GIS data were used. To examine the adaptive capacity, we addressed the natural vegetation layer, economic points and infrastructure. Results show that the center and northern part of the city are highly and extremely vulnerable due to industry and high socio-economic activity when compared with the southern part of the city. We divided the whole city into 5 vulnerability classes by share of the city area: resilient 2.24 %, at risk 13.20 %, vulnerable 46.15 %, highly vulnerable 7.26 % and extremely vulnerable 31.15 %. The vegetation area (50.51 %) has the largest vulnerable area, and the vulnerable class covers the largest share (46.15 %) of the city. All this information is indispensable and can be used to address management issues such as resource prioritization and optimization.

Free

Weed detection on embedded systems using computer vision algorithms

Shadrin D., Illarionova S., Kasatov R., Akimenkova M., Rudensky G., Erhan E.

Research article

Agriculture is a vital component of the sustainable development of many states. It supports economic growth and ensures food security. Therefore, great attention is paid to increasing production efficiency and yields. One of the problems in the agricultural sector is the spread of weeds, which can reduce the quality and quantity of yields. To achieve a better harvest, weed control measures should be carried out in time. Currently, computer vision techniques are applied in various areas of industry and, in particular, in agriculture. They make it possible to automate data analysis and to make decisions faster. However, the weed detection task in agriculture requires not only high recognition accuracy but also fast computation on portable devices with little available memory, which makes it possible to embed computer vision systems on unmanned aerial vehicles (UAVs). To address these challenges, we propose a neural network based approach for real-time weed recognition that combines state-of-the-art detection architectures and optimization techniques for faster inference. To conduct a comprehensive study using real field data, we collected and labelled two unique datasets in Volgograd Region. The experiments involved YOLO, SSD, and Faster R-CNN architectures with inference on NVIDIA Jetson Nano. The best results were achieved with the YOLOv5 architecture, with an mAP of 0.668 for the Carrot Dataset (two weed classes) and 0.882 for the Onion Dataset (one weed class), at inference speeds of 29 FPS and 31 FPS, respectively.

Free

Weighted combination of per-frame recognition results for text recognition in a video stream

O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

Article

The range of uses of automated document recognition has expanded and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow a video stream to be used as input, thus providing several images of the recognized object captured with various characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that the weighted combination can improve the quality of the text recognition result in the video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
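
A heavily simplified sketch of per-character weighted combination is given below. It assumes that the per-frame results are already aligned strings of equal length and that each frame has a single scalar weight, for example a focus estimate; the paper's actual model and weighting criteria are not specified in the abstract, so the function name, inputs and weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def combine_weighted(results: Sequence[str], weights: Sequence[float]) -> str:
    """Pick, at each position, the character with the largest total weight.

    Assumption: all per-frame strings are aligned and have equal length.
    """
    if not results or len(results) != len(weights):
        raise ValueError("need one weight per recognition result")
    length = len(results[0])
    if any(len(text) != length for text in results):
        raise ValueError("this sketch requires equal-length results")
    combined = []
    for pos in range(length):
        votes: Dict[str, float] = defaultdict(float)
        for text, weight in zip(results, weights):
            votes[text[pos]] += weight
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


# Hypothetical per-frame results with hypothetical focus scores as weights.
frames = ["AB1234", "A81234", "AB1Z34"]
focus_scores = [0.9, 0.2, 0.6]
combined = combine_weighted(frames, focus_scores)

# One sharp frame can outweigh two blurry ones that agree with each other.
sharp_wins = combine_weighted(["A8", "AB", "AB"], [1.0, 0.3, 0.3])
```

With equal weights this reduces to a plain majority vote, so the weights are what allows a well-focused frame to override several poor ones.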

Free
