Image processing, pattern recognition. Section of the journal Computer Optics (Компьютерная оптика)

Publications in this section (330): Image processing, pattern recognition
Improvements of programing methods for finding reference lines on X-ray images

Al-Temimi Ammar Mudheher Sadeq, Pilidi Vladimir Stavrovich

Research article

The paper gives an overview of the algorithms developed to obtain reference lines and angles on X-ray images. These geometrical characteristics are used in the medical analysis of human joints. We propose modifications of the algorithms based on the analysis of numerous X-ray images. These modifications substantially increased calculation speed and improved the quality of the final results produced by the corresponding application. They also significantly reduce manual tuning of the program, which is now needed only in the rare cases when the properties of the given images differ significantly from the average.

Free access

Improving generalization in classification novel bacterial strains: a multi-headed resnet approach for microscopic image classification

Yachnaya V.O., Mikhalkova M.A., Malashin R.O., Lutsiv V.R., Kraeva L.A., Khamdulayeva G.N., Nazarov V.E., Chelibanov V.P.

Research article

The purpose of this work is to design a system for classifying microscopic images of bacteria that generalizes to new data. In the course of the work, a dataset containing 23 bacterial species was collected. We use a strain-wise method for dividing the dataset into training and test sets. Such splitting (in contrast to random division) allows evaluating the performance of classifiers on new strains in the case of intra-species visual variability of bacteria. We propose a “Multi-headed” ResNet (ResNet-MH) for the analysis of microscopic images of bacterial colonies. During training, this approach forces the neural network to analyze features of different resolutions, such as the shape of individual bacterial cells and the shape and number of bacterial clusters. Our network achieves 41.6% species-wise accuracy and 64.06% genus-wise accuracy. The proposed method of dataset splitting guarantees generalization to new unseen strains, whereas random splitting into training and test sets leads to overfitting of the system (accuracy is over 90%). For the 10 species that are visually stable across strains, the accuracy of the proposed system reaches 83.6% species-wise.
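
The strain-wise division described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the record layout `(image_id, species, strain)`, the function name `strain_wise_split`, and the `test_fraction` parameter are assumptions made for illustration. The only property it demonstrates is that no strain contributes images to both sets.

```python
import random
from collections import defaultdict


def strain_wise_split(samples, test_fraction=0.25, seed=0):
    """Split (image_id, species, strain) records so that no strain is in both sets."""
    strains_by_species = defaultdict(set)
    for _, species, strain in samples:
        strains_by_species[species].add(strain)

    rng = random.Random(seed)
    test_strains = set()
    for species, strains in strains_by_species.items():
        ordered = sorted(strains)
        rng.shuffle(ordered)
        if len(ordered) > 1:
            # Hold out at least one strain, but always keep one for training.
            n_test = max(1, round(len(ordered) * test_fraction))
            n_test = min(n_test, len(ordered) - 1)
        else:
            n_test = 0
        test_strains.update((species, s) for s in ordered[:n_test])

    train = [r for r in samples if (r[1], r[2]) not in test_strains]
    test = [r for r in samples if (r[1], r[2]) in test_strains]
    return train, test


# Hypothetical toy data: 2 species, 4 strains each, 3 images per strain.
samples = [
    (f"img_{species}_{strain}_{k}", species, strain)
    for species in ("species_a", "species_b")
    for strain in ("s1", "s2", "s3", "s4")
    for k in range(3)
]
train, test = strain_wise_split(samples)
```

A random per-image split would instead place images of the same strain on both sides, which is the situation the abstract associates with inflated accuracy.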

Free access

Improving the quality of building space depths maps using multi-area active-pulse television measuring systems in dynamic scenes

Zabuga S.A., Kapustin V.V., Musikhin I.D.

Research article

The purpose of this work is to implement temporal frame interpolation in software, to formulate selection criteria, and to choose a suitable neural network model based on the practical data obtained. A further goal is to evaluate how effectively the model eliminates the interframe shift effect of dynamic objects on the depth maps of multi-area active-pulse television measuring systems, in order to improve the accuracy of map building. As initial data for the experiments, static frames were recorded while moving the test rig along the X and Z axes. The static frames are images of the test rig, averaged 100 times, at a distance of 13 meters; the rig moved along an automated linear guide with a step of 1 mm. As a result of the work, the influence of the interframe shift effect on space depth maps of multi-area active-pulse television measuring systems containing dynamic objects was assessed. A temporal frame interpolation algorithm for suppressing the interframe shift effect of dynamic objects on depth maps was also implemented and tested. The algorithm was implemented in Python, using the PyCharm IDE and the SciPy, NumPy, OpenCV, PyTorch, threading, and other libraries. Numerical values of the RMSE, PSNR, and SSIM metrics were obtained before and after eliminating the effect of interframe shift of dynamic objects on depth maps. The use of the temporal frame interpolation algorithm allows more accurate measurement of the distance to a moving object in the field of view of multi-area active-pulse television measuring systems.
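
Two of the metrics named in this abstract, RMSE and PSNR, have simple closed forms, sketched below in plain Python for small depth maps stored as nested lists. The function names and the `max_value` default of 255 are assumptions for illustration; the abstract does not state the value range of the depth maps, and SSIM is omitted because it needs a windowed computation.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally sized 2D maps."""
    squared = [
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical maps."""
    err = rmse(a, b)
    if err == 0:
        return math.inf
    return 20.0 * math.log10(max_value / err)


# Hypothetical 2x2 depth maps, not data from the paper.
reference = [[10, 20], [30, 40]]
shifted = [[12, 20], [30, 44]]
print(rmse(reference, shifted), psnr(reference, shifted))
```

Lower RMSE and higher PSNR after interpolation would indicate that the corrected depth map is closer to the reference.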

Free access

Integrating landscape ecological risk with ecosystem services in the Republic of Tatarstan, Russia

Boori Mukesh Singh, Choudhary Komal, Kupriyanov Alexander

Research article

Linking landscape ecological risk (LER) and ecosystem services (ESs) is a novel approach to environmental management and sustainable development, since it enables real-time decision-making. This study used 12 natural factors relevant to LER and 11 ESs factors to analyze spatiotemporal changes and establish a relationship between them in Tatarstan, Russia, for the years 2010, 2015, and 2020. The statistical tests (Global Moran's I, Getis-Ord Gi*), analysis of habitat vulnerability, and ecological loss in the ArcGIS platform reveal a consistent variance in factor clustering and pattern as well as the impact of governmental policies in the studied area. According to the findings, 2015 had the best ecological conditions of the three years because 44.79 % of the research area had decreased landscape ecological risk, which increased ecosystem services. Additionally, the results show that both maps have significant spatial disparities and that LER and ESs are negatively impacted by high human socioeconomic activity. The integration of LER and ESs through the overlap of both maps provides a significant amount of spatial information for mapping, monitoring, management, and the protection of the fragile environment for sustainable landscape development and management.
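
For readers unfamiliar with Global Moran's I, the following sketch computes it for a small raster using binary rook-adjacency weights (cells sharing an edge are neighbours). The weighting scheme is an assumption for illustration; the abstract states only that the test was run in ArcGIS. The statistic is I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean and W is the sum of all weights.

```python
def morans_i(grid):
    """Global Moran's I for a 2D grid with binary rook-adjacency weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)  # zero for a constant grid

    num = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (num / denom)


# Hypothetical rasters: one spatially clustered, one perfectly alternating.
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered), morans_i(checkerboard))
```

Positive values indicate that similar values cluster together, and negative values indicate that neighbours tend to differ.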

Free access

Interpretable graph methods for determining nanoparticles ordering in electron microscopy images

Kurbakov M.Y., Sulimova V.V., Seredin O.S., Kopylov A.V.

Research article

An important step in determining the properties of carbon materials is the analysis of images from a scanning electron microscope (SEM). These images show the material surface after the application of metal nanoparticles. The ordering of these nanoparticles is a key characteristic that affects the material properties. We have previously proposed an approach to formalizing the ordering features based on the identification of lines formed by nanoparticles in the SEM image. This paper proposes a novel approach to line allocation that is based on the concept of constructing a minimum spanning forest. Additionally, it introduces a set of novel ordering functions that are derived from this approach. The experimental study demonstrates that the combination of these new and previously extracted features improves the recognition quality of SEM images with ordered and disordered nanoparticle arrangements. This approach allows us to gain a better understanding of the nanoparticle arrangement and its effect on the material properties.
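
The idea of a minimum spanning forest over particle centres can be sketched with Kruskal's algorithm and a cut-off on edge length. This is a generic construction, not the authors' line-allocation method: the `max_edge` threshold and the use of plain Euclidean distance are assumptions made for illustration.

```python
import math
from itertools import combinations


def minimum_spanning_forest(points, max_edge):
    """Kruskal's algorithm that ignores edges longer than max_edge.

    Returns a list of (i, j, length) edges; the points split into
    several trees wherever the gaps exceed max_edge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    forest = []
    for length, i, j in edges:
        if length > max_edge:
            break  # edges are sorted, so all remaining ones are too long
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            forest.append((i, j, length))
    return forest


# Hypothetical particle centres: two separate rows of particles.
centres = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
forest = minimum_spanning_forest(centres, max_edge=1.5)
```

Each resulting tree groups particles that are close to one another, which is one possible starting point for deriving line-like structures.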

Free access

Localization of mobile robot in prior 3D lidar maps using stereo image sequence

Belkin I.V., Abramenko A.A., Bezuglyi V.D., Yudin D.A.

Research article

The paper studies the real-time stereo image-based localization of a vehicle in a prior 3D LiDAR map. A novel localization approach for a mobile ground robot is proposed, which successfully combines conventional computer vision techniques, neural network based image analysis, and numerical optimization. It includes matching a noisy depth image and a visible point cloud based on the modified Nelder-Mead optimization method. A deep neural network for image semantic segmentation is used to eliminate dynamic obstacles. The visible point cloud is extracted using a 3D mesh map representation. The proposed approach is evaluated on the KITTI dataset and a custom dataset collected from a ClearPath Husky mobile robot. It shows a stable absolute translation error of about 0.11–0.13 m and a rotation error of 0.42–0.62 deg. The standard deviation of the obtained absolute metrics for our method is the smallest among the state-of-the-art approaches compared. Thus, our approach provides more stability in the estimated pose. This is achieved primarily through the use of multiple data frames during the optimization step and the elimination of dynamic obstacles in the depth image. The method’s performance is demonstrated on different hardware platforms, including the energy-efficient Nvidia Jetson Xavier AGX. With a parallel code implementation, we achieve an input stereo image processing speed of 14 frames per second on the Xavier AGX.

Free access

MIDV-2020: a comprehensive benchmark dataset for identity document analysis

Bulatov Konstantin Bulatovich, Emelianova Ekaterina Vladimirovna, Tropin Daniil Vyacheslavovich, Skoryukina Natalya Sergeevna, Chernyshova Yulia Sergeevna, Sheshkus Alexander Vladimirovich, Usilin Sergey Alexandrovich, Ming Zuheng, Burie Jean-Christophe, Luqman Muhammad Muzzamil, Arlazarov Vladimir Viktorovich

Research article

Identity document recognition is an important sub-field of document analysis. It deals with robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of a captured identity document. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, because the subject matter is protected by security requirements. The few available datasets of identity documents lack diversity of document types, capturing conditions, or variability of document field values. In this paper, we present the MIDV-2020 dataset, which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. The dataset contains 72409 annotated images in total, making it the largest publicly available identity document dataset as of the date of publication. We describe the structure of the dataset, its content and annotations, and present baseline experimental results to serve as a basis for future research. For the task of document location and identification, content-independent, feature-based, and semantic segmentation-based methods were evaluated. For the task of document text field recognition, the Tesseract system was evaluated on field and character levels with grouping by field alphabets and document types. For the task of face detection, the performance of a Multi Task Cascaded Convolutional Neural Networks-based method was evaluated separately for different types of image input modes. The baseline evaluations show that the existing methods of identity document analysis have a lot of room for improvement given modern challenges. We believe that the proposed dataset will prove invaluable for the advancement of the field of document analysis and recognition.
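
Evaluation "on field and character levels" can be pictured with the sketch below. The abstract does not name the exact metrics, so this is an assumption: field-level accuracy is taken as the share of exactly matching fields, and the character-level error rate as total Levenshtein distance divided by total ground-truth length. The function names and sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ch_a != ch_b) # substitution or match
            ))
        prev = cur
    return prev[-1]


def evaluate(pairs):
    """Return (field accuracy, character error rate) for (truth, predicted) pairs."""
    exact = sum(1 for truth, pred in pairs if truth == pred)
    edits = sum(levenshtein(truth, pred) for truth, pred in pairs)
    chars = sum(len(truth) for truth, _ in pairs)
    return exact / len(pairs), edits / chars


# Hypothetical recognition results: the second field confuses "0" with "O".
pairs = [("IVANOV", "IVANOV"), ("1990-01-02", "1990-01-O2")]
field_accuracy, char_error_rate = evaluate(pairs)
```

A single wrong character fails the whole field at field level but costs only one edit at character level, which is why the two views can diverge.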

Free access

MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream

Arlazarov Vladimir Viktorovich, Bulatov Konstantin Bulatovich, Chernov Timofey Sergeevich, Arlazarov Vladimir Lvovich

Research article

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but a more comprehensive scientific and technical approach to identity document recognition requires more specialized datasets. In this paper we present a Mobile Identity Document Video dataset (MIDV-500) consisting of 500 video clips for 50 different identity document types with ground truth, which makes it possible to conduct research on a wide range of document analysis problems. The paper presents characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document fields data extraction. Since identity documents are sensitive because they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present a dataset. However, in addition and as a baseline, we present evaluation results for existing methods for face detection, text line recognition, and document data extraction, using the presented dataset.

Free access

Method for removing haze from images, captured under a wide range of lighting conditions

Filin A.I., Kopylov A.V., Gracheva I.A.

Research article

The presence of haze in images degrades the quality of perception and automatic analysis of scenes. One of the most popular methods of haze removal is the dark channel prior method, which is based on the Koschmieder atmospheric scattering model. However, its underlying assumptions are not met at nighttime, since localized light sources make a significant, if not the main, contribution to lighting. We propose to use the degree to which an image element belongs to a localized light source, determined by a one-class classifier, as a value that characterizes the confidence of the corresponding element of the estimated transmission map during its rectification based on the gamma-normal model. This makes it possible to increase the accuracy of dehazing for images captured in low-light or nighttime conditions.
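
The Koschmieder atmospheric scattering model mentioned in this abstract expresses an observed pixel as I = J * t + A * (1 - t), where J is the haze-free colour, A is the atmospheric light, and t is the transmission. The sketch below applies and inverts the model for a single RGB pixel. It does not reproduce the authors' confidence-weighted rectification; the function names and the `t_min` floor are assumptions made for illustration.

```python
def add_haze(scene, airlight, transmission):
    """Forward model: I = J * t + A * (1 - t), per colour channel."""
    return tuple(
        j * transmission + a * (1.0 - transmission)
        for j, a in zip(scene, airlight)
    )


def remove_haze(observed, airlight, transmission, t_min=0.1):
    """Inverse model: J = (I - A) / max(t, t_min) + A.

    The floor t_min keeps the division stable where the estimated
    transmission is close to zero.
    """
    t = max(transmission, t_min)
    return tuple((i - a) / t + a for i, a in zip(observed, airlight))


# Hypothetical pixel values in the range [0, 1].
scene = (0.2, 0.4, 0.6)
airlight = (0.9, 0.9, 0.9)
hazy = add_haze(scene, airlight, transmission=0.5)
restored = remove_haze(hazy, airlight, transmission=0.5)
```

In practice both A and t must be estimated from the hazy image, and an error in t near a bright light source distorts the restored colour, which is the failure case the abstract addresses.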

Free access

Multispectral optoelectronic device for controlling an autonomous mobile platform

Titov Vitaliy Semenovich, Spevakov Alexander Gennadyevich, Primenko Dmitry Vladimirovich

Research article

The paper substantiates the use of multispectral optoelectronic sensors to improve the positioning accuracy of autonomous mobile platforms. A mathematical model of the operation of the developed device is suggested. Its distinctive feature is the cooperative processing of signals obtained from a lidar and from sensors operating in the ultraviolet, visible, and infrared ranges. By processing data on the diffuse reflectivity of materials, it reduces the computational complexity of detecting dynamic and stationary objects within the field of view of the device. The paper presents the functional organization of a multispectral optoelectronic device that makes it possible to detect and classify objects in the working scene in less time than comparable devices. In the course of experimental research, the validity of the mathematical model was evaluated and empirical data were obtained by means of the proposed hardware and software test stand. The estimated accuracy for a detected object at distances of up to and including 100 m is within 0.95. At distances of more than 100 m, it decreases. This is due to the operating range of the lidar. The error in determining spatial coordinates grows exponentially and also increases sharply at distances close to 100 m.

Free access

Mutual modality learning for video action classification

Komkov S.A., Dzabraev M.D., Petiushko A.A.

Research article

The construction of models for video action classification is progressing rapidly. However, the performance of those models can still be easily improved by ensembling them with the same models trained on different modalities (e.g., optical flow). Unfortunately, it is computationally expensive to use several modalities during inference. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model. Yet, there is still room for improvement. In this paper, we explore various methods to embed the ensemble power into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
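
The baseline that this abstract starts from, ensembling models trained on different modalities, is commonly done by averaging class probabilities. The sketch below shows that late-fusion step only; it is an assumption about the ensembling scheme, not the mutual modality learning method proposed in the paper, and the logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble(logits_per_modality):
    """Average the class probabilities predicted by each modality's model."""
    probs = [softmax(logits) for logits in logits_per_modality]
    return [sum(column) / len(probs) for column in zip(*probs)]


# Hypothetical two-class logits from an RGB model and an optical-flow model.
rgb_logits = [2.0, 0.0]
flow_logits = [0.0, 1.0]
fused = ensemble([rgb_logits, flow_logits])
predicted_class = fused.index(max(fused))
```

Every modality in such an ensemble needs its own forward pass at inference time, which is the cost that motivates folding the ensemble's benefit into a single model.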

Free access

Noise reduction and mammography image segmentation optimization with novel QIMFT-SSA method

Soewondo Widiastuti, Haji Salih Omer, Eftekharian Mohsen, Marhoon Haydar A., Dorofeev Aleksei Evgenievich, Jawad Mohammed Abed, Jabbar Abdullah Hasan, Jalil Abduladheem Turki

Research article

Breast cancer is one of the most dreaded diseases affecting women worldwide and has caused many deaths. Early detection of breast masses prolongs life expectancy, so an automated system for detecting breast masses helps radiologists reach an accurate diagnosis. Computer-aided techniques aim to determine the exact area of breast tumors with high speed and accuracy, so that a decision support system can assist physicians. This study proposes an approach to reducing noise in mammographic images, covering salt-and-pepper, Gaussian, Poisson, and impact noise, so that masses can be detected accurately after denoising. It offers a noise reduction method called Quantum Inverse MFT Filtering and a precise mass segmentation method for mammographic images called the Optimal Social Spider Algorithm (SSA). The hybrid approach, called QIMFT-SSA, is compared with previous methods in terms of peak signal-to-noise ratio (PSNR) and mean squared error (MSE) for noise reduction, and detection accuracy for mass area recognition. The proposed method outperforms state-of-the-art methods in both noise reduction and segmentation.

Free access

Novel approach of simplification detected contours on X-ray medical images

Al-Temimi Ammar Mudheher Sadeq, Pilidi Vladimir Stavrovich, Ibraheem Murooj Khalid Ibraheem

Research article

This paper describes a method for reducing the number of points that represent detected bone contours on digital X-ray images. This simplification makes it easier to correct the location of these points when the analyzed image has poor quality, and it reduces the time needed to analyze the image and obtain the reference lines and angles used to diagnose the area under investigation.
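
The abstract does not say which simplification algorithm the authors use. As a general illustration of reducing the number of contour points, the sketch below implements the classic Ramer-Douglas-Peucker algorithm, which is a swapped-in, well-known technique and not necessarily the paper's method. The `tolerance` parameter and the sample contour are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter clamped to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical contour: a slightly noisy horizontal run, then a vertical run.
contour = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3, 1), (3, 2)]
simplified = simplify(contour, tolerance=0.1)
```

With fewer points, an operator correcting a poorly detected contour has fewer handles to move, which matches the motivation given in the abstract.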

Free access

On the automation of gestalt perception in remotely sensed data

Michaelsen Eckart

Research article

Gestalt perception, the laws of seeing, and perceptual grouping are rarely addressed in the context of remotely sensed imagery. This paper reviews the state of the art in both machine vision and remote sensing, in particular concerning urban areas. Automatic methods can be separated into three types: 1) knowledge-based inference, which needs machine-readable knowledge, 2) automatic learning methods, which require labeled or unlabeled example images, and 3) perceptual grouping along the lines of the laws of seeing, which should be pre-coded and should work on any kind of imagery, but in particular on urban aerial or satellite data. Perceptual grouping of parts into aggregates is a combinatorial problem. Exhaustive enumeration of all combinations is intractable. This paper presents a constant-false-alarm-rate search rationale. An open problem is the choice of the extraction method for the primitive objects to start with. Here super-pixel segmentation is used.

Free access

One-shot learning with triplet loss for vegetation classification tasks

Uzhinskiy Alexander Vladimirovich, Ososkov Gennady Alexeevich, Goncharov Pavel Vladimirovich, Nechaevskiy Andrey Vasilevich, Smetanin Artem Alekseevich

Research article

The triplet loss function is one of the options that can significantly improve accuracy in one-shot learning tasks. Since 2015, many projects have used Siamese networks and this kind of loss for face recognition and object classification. In our research, we focused on two tasks related to vegetation. The first one is plant disease detection on 25 classes of five crops (grape, cotton, wheat, cucumbers, and corn). This task is motivated by the fact that harvest losses due to diseases are a serious problem for both large farming structures and rural families. The second task is the identification of moss species (5 classes). Mosses are natural bioaccumulators of pollutants; therefore, they are used in environmental monitoring programs. The identification of moss species is an important step in sample preprocessing. In both tasks, we used self-collected image databases. We tried several deep learning architectures and approaches. Our Siamese network architecture with a triplet loss function and MobileNetV2 as a base network showed the most impressive results in both tasks. The average accuracy amounted to over 97.8 % for plant disease detection and 97.6 % for moss species classification.
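
The triplet loss named in this abstract compares an anchor embedding with a positive example (same class) and a negative example (different class). A minimal sketch of the standard hinge form, max(d(a, p) - d(a, n) + margin, 0), follows. The margin value and the use of plain (not squared) Euclidean distance are assumptions; the abstract does not specify either, and some formulations use squared distances.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Hypothetical 2D embeddings.
anchor = (0.0, 0.0)
positive = (0.0, 1.0)
easy_negative = (3.0, 4.0)   # far from the anchor: no loss
hard_negative = (1.0, 0.0)   # as close as the positive: loss equals the margin

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Training on such triplets pulls images of the same class together in embedding space, so a new class can later be recognised from a single reference image by nearest-neighbour search.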

Free access

People tracking accuracy improvement in video by matching relevant trackers and YOLO family detectors

Quan H., Ma G., Weichen Y., Bohush R., Zuo F., Ablameyko S.

Research article

The tracking-by-detection paradigm is widely used for multi-object tracking of people. Many detectors, trackers, and evaluation benchmarks now exist, which calls for relatively uniform estimation methods and metrics and makes it necessary to choose the best combinations of detectors and trackers. To solve this task, we developed a comprehensive performance evaluation methodology for estimating the accuracy and real-time performance of people tracking with different detectors and trackers. We conducted experiments using the official pre-trained models of YOLOv5, YOLOv6, YOLOv7, and YOLOv8 with the representative BoTSORT, ByteTrack, DeepOCSORT, OCSORT, and StrongSORT trackers on two benchmarks, MOT17 and MOT20. Detailed accuracy and speed metrics, such as higher order tracking accuracy and frames per second, were analyzed for the combinations of detectors and trackers. It is concluded that the OCSORT+YOLOv6l model has the best comprehensive performance and the combination of OCSORT and YOLOv7 has the best average performance under MOT17 and MOT20.
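
In tracking-by-detection, each new frame's detections must be associated with existing tracks. The sketch below shows one simple way to do this, greedy matching by intersection over union (IoU). It is an illustrative assumption, not the association logic of any tracker named in the abstract, which typically also use motion prediction and more elaborate assignment. Boxes are `(x1, y1, x2, y2)` tuples and `min_iou` is a hypothetical threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, min_iou=0.3):
    """Pair tracks with detections, highest IoU first; returns (track, det) indices."""
    scored = sorted(
        (
            (iou(t, d), ti, di)
            for ti, t in enumerate(tracks)
            for di, d in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in scored:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if ti not in used_tracks and di not in used_dets:
            matches.append((ti, di))
            used_tracks.add(ti)
            used_dets.add(di)
    return matches


# Hypothetical boxes: two people who moved slightly, plus one new detection.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = greedy_match(tracks, detections)
```

Unmatched detections (here the third box) would start new tracks, and the quality of the detector directly limits how well any tracker can perform this step.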

Free access

Quality inspection of fertilizer granules using computer vision – a review

Ndukwe I.K., Yunovidov D., Bahrami M.R., Mazzara M., Olugbade T.O.

Research article

This research explores the fusion of computer vision and agricultural quality control. It investigates the efficacy of computer vision algorithms, particularly in image classification and object detection, for non-destructive assessment. These algorithms offer objective, rapid, and error-resistant analysis compared to human inspection. The study provides an extensive overview of using computer vision to evaluate grain and fertilizer granule quality, highlighting granule size’s significance. It assesses prevailing object detection methods, outlining their advantages and drawbacks. The paper identifies the prevailing trend of framing quality inspection as an image classification challenge and suggests future research directions. These involve exploring object detection, image segmentation, or hybrid models to enhance fertilizer granule quality assessment.

Free access

Research on foreign body detection in transmission lines based on a multi-UAV cooperative system and YOLOV7

Chang R., Mao Zh., Hu J., Bai H., Zhou Ch., Yang Ya., Gao Sh.

Research article

The unique plateau geographical features and variable weather of Yunnan, China make transmission lines in this region more susceptible to coverage and damage by various foreign bodies compared to flat areas. The mountainous terrain also presents great challenges for inspecting and removing such objects. In order to improve the efficiency and detection accuracy of foreign body inspection of transmission lines, we propose a multi-UAV collaborative system specifically designed for the geographical characteristics of Yunnan's transmission lines in this paper. Additionally, the image data of foreign bodies was augmented, and the YOLOv7 target detection model, which offers a more balanced trade-off between precision and speed, was adopted to improve the accuracy and speed of foreign body detection.

Free access

Rice growth vegetation index 2 for improving estimation of rice plant phenology in costal ecosystems

Choudhary Komal, Shi Wen-Zhong John, Dong Yanni

Research article

Crop growth is one of the most important parameters of a crop, and knowing it before harvest is essential to help farmers, scientists, governments, and agribusiness. This paper provides a novel demonstration of the use of freely available Sentinel-2 data to estimate rice crop growth in a single year. Sentinel-2 data provide frequent and consistent information that facilitates coastal monitoring at field scale. The aim of this study was to modify the rice growth vegetation index to improve the estimation of rice growth phenology in coastal areas. Rice growth vegetation index 2 performed best among the 11 vegetation indices compared against plant height and biomass. The results show that rice growth vegetation index 2 has the highest correlation with plant height, with a coefficient of 0.83. Rice growth vegetation index 2 is more appropriate for enhancing and obtaining rice phenology information. This study analyses the best spectral vegetation indices for estimating rice growth.
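
The abstract does not give the formula of rice growth vegetation index 2, so the sketch below uses the widely known NDVI, (NIR - Red) / (NIR + Red), purely as a stand-in to show how a spectral index is computed and then correlated with plant height. All reflectance and height values are made up for illustration and are not data from the paper.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical (NIR, red) reflectance per field plot and measured plant heights in cm.
reflectance = [(0.30, 0.20), (0.40, 0.15), (0.50, 0.10), (0.55, 0.08)]
plant_height_cm = [25.0, 48.0, 71.0, 90.0]

index_values = [ndvi(nir, red) for nir, red in reflectance]
correlation = pearson(index_values, plant_height_cm)
```

Repeating the correlation step for each candidate index and picking the highest coefficient mirrors the kind of comparison the abstract reports.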

Free access

Road images augmentation with synthetic traffic signs using neural networks

Konushin Anton Sergeevich, Faizov Boris Vladimirovich, Shakhuro Vladislav Igorevich

Research article

Traffic sign recognition is a well-researched problem in computer vision. However, state-of-the-art methods work only for frequent sign classes, which are well represented in training datasets. We consider the task of rare traffic sign detection and classification. We aim to solve that problem by using synthetic training data. Such training data is obtained by embedding synthetic images of signs in real photos. We propose three methods for making synthetic signs consistent with a scene in appearance. These methods are based on modern generative adversarial network (GAN) architectures. Our proposed methods allow realistic embedding of rare traffic sign classes that are absent from the training set. We adapt a variational autoencoder for sampling plausible locations of new traffic signs in images. We demonstrate that using a mixture of our synthetic data with real data improves the accuracy of both the classifier and the detector.

Free access
