Journal articles - Компьютерная оптика (Computer Optics)

All articles: 2553

Uncertainty-based quantization method for stable training of binary neural networks

Trusov A.V., Putintsev D.N., Limonova E.E.

Research article

Binary neural networks (BNNs) have gained attention due to their computational efficiency. However, training BNNs has proven to be challenging. Existing algorithms either fail to produce stable and high-quality results or are overly complex for practical use. In this paper, we introduce a novel quantizer called UBQ (Uncertainty-based quantizer) for BNNs, which combines the advantages of existing methods, resulting in stable training and high-quality BNNs even with a low number of trainable parameters. We also propose a training method involving gradual network freezing and batch normalization replacement, facilitating a smooth transition from training mode to execution mode for BNNs. To evaluate UBQ, we conducted experiments on the MNIST and CIFAR-10 datasets and compared our method to existing algorithms. The results demonstrate that UBQ outperforms previous methods for smaller networks and achieves comparable results for larger networks.

Free

Uncovering unstable plaques: deep learning segmentation in optical coherence tomography

Laptev V.V., Danilov V.V., Ovcharenko E.A., Klyshnikov K.Y., Kolesnikov A.Y., Arnt A.A., Bessonov I.S., Litvinyuk N.V., Kochergin N.A.

Research article

One of the primary objectives in modern cardiology is to analyze the risk of acute coronary syndrome (ACS) in patients with ischemic heart disease to develop preventive measures and determine the optimal treatment strategy. This study aims to develop an automated approach for the timely detection of significant, rupture-prone coronary lesions (unstable plaques) to prevent ACS. We collected optical coherence tomography (OCT) volumes from 34 patients, with each OCT volume representing an RGB video of 704×704 pixels per frame, acquired over a certain depth. After filtering and manual annotation, 11,771 images were obtained to identify four types of objects: Lumen, Fibrous cap, Lipid core, and Vasa vasorum. To segment and quantitatively assess these features, we configured and evaluated the performance of nine deep learning models (U-Net, LinkNet, FPN, PSPNet, DeepLabV3, PAN, MA-Net, U-Net++, DeepLabV3++). The study presents two approaches for training the aforementioned models: 1) detecting all analyzed objects and 2) applying a cascade of neural network models to separately detect subsets of objects. The results demonstrate the superiority of the cascade approach for analyzing OCT images. The combined use of PAN and MA-Net models achieved the highest average Dice similarity coefficient (DSC) of 0.721.
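For readers unfamiliar with the metric quoted above, the Dice similarity coefficient of a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The sketch below is only an illustration of that definition on flat binary masks; the function name `dice_coefficient`, the toy masks, and the convention for two empty masks are assumptions made here, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of foreground sizes of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical data): 2 shared foreground pixels, 3 + 3 in total.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
print(round(dice_coefficient(pred_mask, true_mask), 3))
```

An average DSC such as the one reported would typically be obtained by averaging this value over classes or images; how the authors averaged is not stated in the abstract.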

Free

Unfolder: fast localization and image rectification of a document with a crease from folding in half

Ershov A.M., Tropin D.V., Limonova E.E., Nikolaev D.P., Arlazarov V.V.

Research article

Presentation of folded documents is not an uncommon case in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold its edges, but this may obscure the document in the captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of a crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document either placed on a table or held in hand. The Unfolder algorithm allowed for a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for Unfolder was only 0.25 s/image on an iPhone XR.

Free

Unsupervised color texture segmentation based on multi-scale region-level Markov random field models

Song Xu, Wu Liang, Liu Guoying

Research article

In the field of color texture segmentation, the region-level Markov random field (RMRF) model has become a focus of research because of its efficiency in modeling long-range spatial constraints. However, an RMRF defined on a single scale cannot describe the non-stationary nature of the image, which severely limits its robustness. Hence, by combining the wavelet transform and the RMRF model, we present a multi-scale RMRF (MsRMRF) model in the wavelet domain in this paper. In the Bayesian framework, the proposed model seamlessly integrates the multi-scale information stemming from both the original image and the region-level spatial constraints. Therefore, the new model can accurately describe the characteristics of different kinds of texture. Based on MsRMRF, an unsupervised segmentation algorithm is designed for segmenting color texture images. Both synthetic color texture images and remote sensing images are employed in the comparative experiments, and the experimental results show that the proposed method can obtain more accurate segmentation results than the competitors.

Free

Vanishing point detection with direct and transposed fast hough transform inside the neural network

Sheshkus Alexander Vladimirovich, Chirvonaya Anastasiya Nikolaevna, Matveev Daniil Mikhailovich, Nikolaev Dmitry Petrovich, Arlazarov Vladimir Lvovich

Research article

In this paper, we suggest a new neural network architecture for vanishing point detection in images. The key element is the use of the direct and transposed fast Hough transforms separated by convolutional layer blocks with standard activation functions. It allows us to get the answer in the coordinates of the input image at the output of the network and thus to calculate the coordinates of the vanishing point by simply selecting the maximum. Besides, it was proved that calculation of the transposed fast Hough transform can be performed using the direct one. The use of integral operators enables the neural network to rely on global rectilinear features in the image, and so it is ideal for detecting vanishing points. To demonstrate the effectiveness of the proposed architecture, we use a set of images from a DVR and show its superiority over existing methods. Note, in addition, that the proposed neural network architecture essentially repeats the process of direct and back projection used, for example, in computed tomography.

Free

Vehicle wheel weld detection based on improved YOLO V4 algorithm

Liang Tian Jiao, Pan Wei Guo, Bao Hong, Pan Feng

Research article

In recent years, vision-based object detection has made great progress across different fields. For instance, in the field of automobile manufacturing, weld detection is a key step of weld inspection in wheel production. The automatic detection and positioning of welded parts on wheels can improve the efficiency of wheel hub production. At present, there are few deep learning based methods to detect vehicle wheel welds. In this paper, a method based on the YOLO v4 algorithm is proposed to detect vehicle wheel welds. The main contributions of the proposed method are the use of k-means to optimize anchor box size, a Distance-IoU loss to optimize the loss function of YOLO v4, and non-maximum suppression using Distance-IoU to eliminate redundant candidate bounding boxes. These steps improve detection accuracy. The experiments show that the improved method can achieve high accuracy in vehicle wheel weld detection (4.92 percentage points higher than the baseline model with respect to AP75 and 2.75 percentage points higher with respect to AP50). We also evaluated the proposed method on the public KITTI dataset. The detection results show the improved method's effectiveness.
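As a rough illustration of the Distance-IoU ideas mentioned in this abstract, the sketch below computes DIoU as IoU minus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box, then uses it both as a loss (1 - DIoU) and as the suppression criterion in a greedy NMS. The function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions made for this example; it is not the authors' implementation and omits the k-means anchor step.

```python
def diou(box_a, box_b):
    """Distance-IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_loss(pred_box, true_box):
    """Regression loss: zero for identical boxes, larger for distant ones."""
    return 1.0 - diou(pred_box, true_box)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box exceeds threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one separate box.
boxes = [(0, 0, 2, 2), (0, 0, 2, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))
```

Unlike plain IoU, the center-distance term keeps the loss informative even when the boxes do not overlap, which is the usual motivation for preferring DIoU in both places.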

Free

Veiling glare removal: synthetic dataset generation, metrics and neural network architecture

Shoshin Alexey Valeryevich, Shvets Evgeny Alexandrovich

Research article

In photography, the presence of a bright light source often reduces the quality and readability of the resulting image. Light rays reflect and bounce off camera elements, sensor or diaphragm causing unwanted artifacts. These artifacts are generally known as “lens flare” and may have different influences on the photo: reduce contrast of the image (veiling glare), add circular or circular-like effects (ghosting flare), appear as bright rays spreading from light source (starburst pattern), or cause aberrations. All these effects are generally undesirable, as they reduce legibility and aesthetics of the image. In this paper we address the problem of removing or reducing the effect of veiling glare on the image. There are no available large-scale datasets for this problem and no established metrics, so we start by (i) proposing a simple and fast algorithm of generating synthetic veiling glare images necessary for training and (ii) studying metrics used in related image enhancement tasks (dehazing and underwater image enhancement). We select three such no-reference metrics (UCIQE, UIQM and CCF) and show that their improvement indicates better veil removal. Finally, we experiment on neural network architectures and propose a two-branched architecture and a training procedure utilizing structural similarity measure.

Free

Verification of color characteristics of document images captured in uncontrolled conditions

Kunina I.A., Padas O.A., Kolomyttseva O.A.

Research article

This paper examines a presentation attack in which a color photo of a gray copy of a document is presented instead of the original color document during remote user identification. To detect such an attack, we propose an algorithm based on comparing the chromaticity histogram of the presented color image of the document with that of the ideal template of this type of document. The chromaticity histograms of the original document and the template are expected to be nearly identical, while the histograms of the gray copy of the document and the template would differ. The algorithm was tested on images from the open dataset DLC-2021, which contains color images of synthesized identity documents and color images of their gray copies. The precision of the proposed method was 98.99 % and the recall was 84.7 %, which gave 8 times fewer errors than the baseline provided by the authors of DLC-2021.
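The comparison of chromaticity histograms described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses rg chromaticity (r = R/(R+G+B), g = G/(R+G+B)), an 8×8 histogram, and histogram intersection as the similarity measure. The abstract does not specify the chromaticity space, the bin count, or the distance the authors use, and the pixel values are made up.

```python
def chromaticity_histogram(pixels, bins=8):
    """Normalized 2D histogram of rg chromaticity for an iterable of (R, G, B) pixels."""
    hist = [[0.0] * bins for _ in range(bins)]
    counted = 0
    for r, g, b in pixels:
        s = r + g + b
        if s == 0:
            continue  # chromaticity is undefined for pure black pixels
        i = min(int(r / s * bins), bins - 1)
        j = min(int(g / s * bins), bins - 1)
        hist[i][j] += 1.0
        counted += 1
    if counted:
        hist = [[v / counted for v in row] for row in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalized histograms coincide."""
    return sum(min(a, b) for row1, row2 in zip(h1, h2) for a, b in zip(row1, row2))


# Hypothetical pixel samples: template, darker photo of the original, photo of a gray copy.
template = [(200, 30, 30), (30, 200, 30), (30, 30, 200), (128, 128, 128)]
original = [(100, 15, 15), (15, 100, 15), (15, 15, 100), (64, 64, 64)]
gray_copy = [(100, 100, 100), (50, 50, 50), (200, 200, 200), (128, 128, 128)]

t_hist = chromaticity_histogram(template)
print(histogram_intersection(t_hist, chromaticity_histogram(original)))
print(histogram_intersection(t_hist, chromaticity_histogram(gray_copy)))
```

Because chromaticity discards overall brightness, the darker photo of the original still matches the template, while the gray copy collapses into the neutral bin and scores low. A threshold on the similarity would then separate the two cases.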

Free

Video images compression and restoration methods based on optimal sampling

Drynkin Vladimir Nikolaevich, Nabokov Sergey Alexeyevich, Tsareva Tatiana Igorevna

Research article

The study proposes video image compression and restoration methods based on multidimensional sampling theory that provide four-fold video compression and subsequent real-time restoration with loss levels below the visually perceptible threshold. The proposed methods can be used separately or along with any other video compression techniques, thus providing an additional four-fold compression.

Free

Vortex beams in turbulent media: review

Soifer Victor Alexandrovich, Korotkova Olga, Khonina Svetlana Nikolaevna, Shchepakina Elena Anatolevna

Research article

The review covers publications concerned with propagation of laser beams through turbulent media described by the Kolmogorov theory and generalizations thereof to describe signal transmission in optical communications and detection systems. In this case, the turbulent medium is interpreted as an optical channel with random parameters. Various optical signals considered include partially coherent beams, non-uniformly polarized vector beams, as well as specifically configured spatial laser beams. Special attention is given to vortex laser beams. The latter are shown to have a number of remarkable properties that give them an advantage over conventional Gaussian beams.

Free

Vortex-free laser beam with an orbital angular momentum

Kotlyar Victor Victorovich, Kovalev Alexey Andreevich

Research article

We show that if one cylindrical lens is placed in the Gaussian beam waist and another cylindrical lens is placed at some distance from the first one and rotated by some angle, then the laser beam after the second lens has an orbital angular momentum (OAM). An explicit analytical expression for the OAM of such a beam is obtained. Depending on the inter-lens distance, the OAM can be positive, negative, or zero. Such a laser beam has no isolated intensity nulls with a singular phase and is not an optical vortex, but it has an OAM. By choosing the radius of the beam waist of the source Gaussian beam, the focal lengths of the lenses and the distance between them, it is possible to generate a vortex-free laser beam equivalent to an optical vortex with a topological charge of several hundreds.

Free

Vulnerability analysis on Hyderabad city, India

Boori Mukesh Singh, Choudhary Komal, Kupriyanov Alexander Victorovich

Research article

City vulnerability is an assessment of priorities for implementation in a city. Thus, it is imperative to determine vulnerable regions in the city to identify priority areas that may require immediate intervention. Several methods used for national, international and local level vulnerability assessment are based on remote sensing and GIS technology. This paper aims to determine the vulnerability of Hyderabad city using a geospatial-based vulnerability index for sustainable development of the city. We use an urbanization and vulnerability concept for the development of city policy measures. We assessed the city vulnerability using a conceptual diagram composed of exposure, sensitivity and adaptive capacity. For exposure, we considered the elevation (contour), watershed, waterway, roads, railways and airport thematic layers. For sensitivity, the built-up area, industry, manages (?) system such as farmland and land use/cover map from GIS data were used. To examine the adaptive capacity, we addressed the natural vegetation layer, economic points and infrastructure. Results show that the center and northern part of the city are highly and extremely vulnerable due to industry and high socio-economic activities when compared with the southern part of the city. We divided the whole city into 5 types of vulnerability: resilient 2.24 %, at risk 13.20 %, vulnerable 46.15 %, highly vulnerable 7.26 % and extremely vulnerable 31.15 %, in terms of the city area percentage. The vegetation area (50.51 %) has the maximum vulnerable area and the vulnerable class covers the maximum area (46.15 %) of the city. All this information is indispensable and can be used to address management issues, such as resource prioritization and optimization.

Free

Weed detection on embedded systems using computer vision algorithms

Shadrin D., Illarionova S., Kasatov R., Akimenkova M., Rudensky G., Erhan E.

Research article

Agriculture is a vital component of the sustainable development of many states. It supports economic growth and ensures food security. Therefore, great attention is paid to increasing production efficiency and yields. One of the problems in the agricultural sector is the spread of weeds, which can reduce the quality and amount of yields. To achieve a better harvest, weed control measures should be conducted in time. Currently, computer vision techniques are implemented in various areas of industry, in particular, in agriculture. They allow one to automate the data analysis process and to make decisions faster. However, the weed detection task in agriculture requires not only high recognition accuracy, but also fast computations on portable devices with low memory availability, which makes it possible to embed computer vision systems on unmanned aerial vehicles (UAVs). To address these challenges, we propose a neural-based approach for real-time weed recognition that combines state-of-the-art detection architectures and optimization techniques for faster inference. To conduct a comprehensive study using real field data, we collected and labelled two unique datasets in Volgograd Region. The experiments involved YOLO, SSD, and Faster R-CNN architectures with inference on NVIDIA Jetson Nano. The best results were achieved with the YOLOv5 architecture, with an mAP of 0.668 for the Carrot Dataset (two weed classes) and 0.882 for the Onion Dataset (one weed class), while the inference speed was 29 FPS and 31 FPS, respectively.

Free

Weighted combination of per-frame recognition results for text recognition in a video stream

O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

Article

The scope of uses of automated document recognition has extended and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object, captured with various characteristics. In this case, a problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting conditions. The experimental results show that the weighted combination can improve the text recognition result quality in the video stream, and the per-character weighting method with input image focus estimation as a base criterion allows one to achieve the best results on the datasets analyzed.
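The general idea of a weighted per-character combination can be sketched as a weighted vote at each character position. The example below is a simplified illustration, not the authors' model: it assumes the per-frame strings are already aligned and of equal length, and the weights stand in for a per-frame quality score such as a focus estimate. The function name `combine_per_character` and all values are hypothetical.

```python
from collections import defaultdict


def combine_per_character(frame_results, weights):
    """Weighted vote over aligned per-frame strings, one decision per character position."""
    if len(frame_results) != len(weights):
        raise ValueError("one weight per frame result is required")
    if not frame_results:
        return ""
    length = len(frame_results[0])
    if any(len(text) != length for text in frame_results):
        raise ValueError("this sketch assumes aligned results of equal length")
    combined = []
    for pos in range(length):
        votes = defaultdict(float)
        for text, weight in zip(frame_results, weights):
            votes[text[pos]] += weight  # each frame votes with its quality weight
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


# Hypothetical per-frame results: two blurry frames misread 'O' as '0'.
frames = ["IVAN0V", "IVAN0V", "IVANOV"]
focus_weights = [0.1, 0.2, 0.9]
print(combine_per_character(frames, focus_weights))
```

In this toy case an unweighted majority vote would keep the misread digit, while the single sharp frame outweighs the two blurry ones. Real recognition output would also need string alignment and per-character confidence handling, which are left out here.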

Free

Wescore: quality assessment method of multichannel image visualization with regard to angular resolution

Sidorchuk Dmitry Sergeevich

Research article

This work considers the problem of quality assessment of multichannel image visualization methods. One approach to such an assessment, the Escore quality measure, is studied. This measure, initially proposed for decolorization methods evaluation, can be generalized for the assessment of hyperspectral image visualization methods. It is shown that Escore does not account for the loss of local contrast at the supra-pixel scale. The sensitivity to the latter in humans depends on the observation conditions, so we propose a modified wEscore measure which includes the parameters allowing for the adjustment of the local contrast scale based on the angular resolution of the images. We also describe the adjustment of wEscore parameters for the evaluation of known decolorization algorithms applied to the images from the COLOR250 and the Cadik datasets with given observational conditions. When ranking the results of these algorithms and comparing it to the ranking based on human perception, wEscore turned out to be more accurate than Escore.

Free

X-ray tomography: the way from layer-by-layer radiography to computed tomography

Arlazarov Vladimir Lvovich, Nikolaev Dmitry Petrovich, Arlazarov Vladimir Viktorovich, Chukalina Marina Valerievna

Research article

The methods of X-ray computed tomography allow us to study the internal morphological structure of objects in a non-destructive way. The evolution of these methods is similar in many respects to the evolution of photography, where complex optics were replaced by mobile phone cameras, and the computers built into the phone took over the functions of high-quality image generation. X-ray tomography originated as a method of hardware non-invasive imaging of a certain internal cross-section of the human body. Today, thanks to advanced reconstruction algorithms, the method makes it possible to reconstruct a digital 3D image of an object with submicron resolution. In this article, we analyze the tasks that the software part of the tomographic complex has to solve in addition to managing the process of data collection. The issues that are still considered open are also discussed. The relationship between the spatial resolution of the method, its sensitivity and the radiation load is reviewed. An innovative approach to the organization of tomographic imaging, called “reconstruction with monitoring”, is described. This approach makes it possible to reduce the radiation load on the object by at least 2-3 times. In this work, we show that when X-ray computed tomography moves towards increasing the spatial resolution and reducing the radiation load, the software part of the method becomes increasingly important.

Free

Yolo-barcode: towards universal real-time barcode detection on mobile devices

Ershova D.M., Gayer A.V., Bezmaternykh P.V., Arlazarov V.V.

Research article

Existing approaches to barcode detection have a number of disadvantages, including being tied to specific types of barcodes, computational complexity or low detection accuracy. In this paper, we propose YOLO-Barcode, a deep learning model inspired by the You Only Look Once approach that achieves high detection accuracy with real-time performance on mobile devices. The proposed model copes well with a large number of densely spaced barcodes, as well as highly elongated one-dimensional barcodes. YOLO-Barcode not only successfully detects a wide variety of barcode types, but also classifies them. Compared with the previous universal barcode detector DilatedModel, which is based on semantic segmentation, YOLO-Barcode is 4 times faster and achieves state-of-the-art accuracy on the ZVZ-real public dataset: an F1-score of 98.6% versus 88.9%. The analysis of existing publicly available datasets reveals the absence of many real-life scenarios of mobile barcode reading. To fill this gap, the new “SE-barcode” dataset is presented. The proposed model, used as a baseline, achieves an F1-score of 92.11% on this dataset.

Free

Second-order aberrations of a gradient medium: calculation methods

Ilyinsky R.E., Rovenskaya T.S.

Article

Free

Aberrations of synthesized diffractive lenses caused by fabrication errors

Greisukh G.I., Stepanov S.A.

Research article

The paper presents the results of a study of how errors in synthesizing the annular structure of diffractive lenses affect their aberrations for an on-axis point. The types of aberration distortions arising from the ellipticity of the zones of the diffractive structure and from systematic errors in their radii are determined. Technological tolerances on the lens structure parameters are obtained on the basis of the Maréchal criterion.

Free

Third-order aberrations of gradient optical systems with double symmetry

Ilyinsky Roman Evgenievich

Research article

Explicit expressions for the coefficients of third-order geometric aberrations are obtained for gradient optical systems with double symmetry.

Free

Journal