International conference on machine vision. Section in the journal Компьютерная оптика (Computer Optics)

Publications in the section (26): International conference on machine vision
P-CVD-SWIN: a parameterized neural network for image daltonization

Volkov V.V., Maximov P.V., Alkzir N.B., Gladilin S.A., Nikolaev D.P., Nikolaev I.P.

Scientific article

Nowadays, about 8 % of men and 0.5 % of women worldwide suffer from color vision deficiency. People with color vision deficiency are mostly dichromats and closely related anomalous trichromats, and are subdivided into three types: protans, deutans, and tritans. Special image preprocessing methods referred to as daltonization techniques increase the distinguishability of chromatic contrasts for people with dichromacy. State-of-the-art neural network approaches require training a separate model for each type of dichromacy, which makes such models cumbersome and inconvenient. In this paper, we propose for the first time a parameterized neural network architecture that allows the same model to be trained for any type of dichromacy, with the dichromacy type specified as a parameter. We named this model P-CVD-SWIN, as it is a parameterized development of the recently proposed CVD-SWIN model. A generalization of the Vienot dichromacy simulation method was proposed for model training. Experiments have shown that the P-CVD-SWIN network parameterized by the type of dichromacy preserves chromatic naturalness during daltonization better than a combination of several CVD-SWIN models, each trained for its own type of dichromacy.
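
The abstract mentions the Vienot dichromacy simulation as the basis of the training procedure. The following minimal sketch reproduces the classical Vienot et al. (1999) simulation for protanopia and deuteranopia only; it is an illustration grounded in the published method, not the authors' generalized version, and it assumes linear (not gamma-encoded) RGB input in [0, 1]. The matrix values are the ones commonly quoted in open-source daltonization code.

```python
# Hedged sketch: classical Vienot et al. (1999) dichromacy simulation for
# protanopia and deuteranopia. Matrix values are those commonly quoted in
# open-source daltonization code; input is assumed to be linear RGB in [0, 1].
import numpy as np

RGB_TO_LMS = np.array([[17.8824,    43.5161,   4.11935],
                       [ 3.45565,   27.1554,   3.86714],
                       [ 0.0299566,  0.184309, 1.46709]])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Each simulation matrix replaces the response of the missing cone class
# with a linear combination of the two remaining ones.
SIM = {
    "protan": np.array([[0.0, 2.02344, -2.52581],
                        [0.0, 1.0,      0.0    ],
                        [0.0, 0.0,      1.0    ]]),
    "deutan": np.array([[1.0,      0.0, 0.0    ],
                        [0.494207, 0.0, 1.24827],
                        [0.0,      0.0, 1.0    ]]),
}

def simulate_dichromacy(img_linear_rgb: np.ndarray, cvd_type: str) -> np.ndarray:
    """Simulate dichromatic perception of an (H, W, 3) linear-RGB image."""
    lms = img_linear_rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM[cvd_type].T
    return np.clip(lms_sim @ LMS_TO_RGB.T, 0.0, 1.0)
```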

Pseudo-Boolean Polynomial Method for Interpretable Dimensionality Reduction: A Paradigm Shift from Abstract to Meaningful Feature Extraction

Chikake T.M., Goldengorin B.I., Pardalos P.M.

Scientific article

We present a general-purpose, training-free framework for dimensionality reduction and clustering based on per-sample pseudo-Boolean polynomials (PBP). The method constructs compact, interpretable features without model fitting and is evaluated under a standardized protocol that compares PBP to PCA, t-SNE, and UMAP using identical inputs and metrics: clustering alignment (V-measure, Adjusted Rand Index), cluster geometry (Silhouette coefficient, Calinski-Harabasz index, Davies-Bouldin index), and supervised probes (linear separability and boundary complexity via 1-NN error). Across 11 diverse datasets spanning tabular, signal, and ecological domains, PBP leads on linear separability in 5/11 datasets and achieves lower boundary complexity in 2/11 datasets, while remaining competitive on clustering metrics. We report the best-performing aggregation and sorting configurations per dataset and provide guidance on when PBP should be preferred for interpretable analysis and reproducible evaluation.
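
For readers who want to reproduce a comparable protocol, the sketch below shows how the metrics listed in the abstract can be computed with scikit-learn for any embedding (PBP, PCA, t-SNE, UMAP). The choice of KMeans as the clusterer, a linear SVM for the separability probe, and 5-fold cross-validation are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: the comparison metrics named in the abstract, computed with
# scikit-learn for an arbitrary embedding X_emb with ground-truth labels y.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (v_measure_score, adjusted_rand_score,
                             silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def evaluate_embedding(X_emb: np.ndarray, y: np.ndarray) -> dict:
    labels = KMeans(n_clusters=len(np.unique(y)), n_init=10,
                    random_state=0).fit_predict(X_emb)
    return {
        # clustering alignment with ground-truth classes
        "v_measure": v_measure_score(y, labels),
        "ari": adjusted_rand_score(y, labels),
        # cluster geometry of the embedding itself
        "silhouette": silhouette_score(X_emb, labels),
        "calinski_harabasz": calinski_harabasz_score(X_emb, labels),
        "davies_bouldin": davies_bouldin_score(X_emb, labels),
        # supervised probes: linear separability and boundary complexity
        "linear_acc": cross_val_score(LinearSVC(), X_emb, y, cv=5).mean(),
        "1nn_error": 1.0 - cross_val_score(
            KNeighborsClassifier(n_neighbors=1), X_emb, y, cv=5).mean(),
    }
```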

RANSAC-Scaled Depth: A Dual-Teacher Framework for Metric Depth Annotation in Data-Scarce Scenarios

Lazukov M.V., Shoshin A.V., Belyaev P.V., Shvets E.A.

Scientific article

This paper addresses the problem of training metric monocular depth estimation models for specialized domains in the absence of labeled real-world data. We propose a hybrid pseudo-labeling method that combines the predictions of two models: a metric "teacher," trained on synthetic data to obtain the correct scale, and a foundational relative "teacher" that provides structurally accurate scene geometry and depth. The relative depth map is calibrated via a linear transformation whose parameters are found with the outlier-robust RANSAC algorithm on a subset of "support" points. Experiments on the KITTI dataset show that the proposed approach improves the quality of the pseudo-labels, reducing the commonly used AbsRel error metric by 21.6 % compared to the baseline method. A compact "student" model trained on these labels outperforms the baseline model, achieving a 23.8 % reduction in AbsRel and a 13.8 % reduction in RMSE log. The results confirm that the proposed method significantly improves adaptation from the general-purpose domain to the specific one, allowing high-precision metric models to be created without the need to collect and annotate large volumes of real data.
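
A minimal sketch of the calibration step described above, assuming the two teachers' depth maps and a support-point mask are already available: the relative map is mapped to metric units via d_metric ≈ a·d_rel + b, with (a, b) estimated robustly. The function name and the use of scikit-learn's RANSACRegressor are illustrative choices, not the paper's implementation.

```python
# Hedged sketch: RANSAC-fitted linear calibration of a relative depth map to
# metric scale, using the metric teacher's values at "support" pixels.
import numpy as np
from sklearn.linear_model import RANSACRegressor

def calibrate_relative_depth(rel_depth: np.ndarray,
                             metric_depth: np.ndarray,
                             support_mask: np.ndarray) -> np.ndarray:
    """Return the relative depth map rescaled to metric units.

    rel_depth, metric_depth : (H, W) predictions of the two teachers
    support_mask            : (H, W) boolean mask of support points
    """
    x = rel_depth[support_mask].reshape(-1, 1)   # relative teacher at supports
    y = metric_depth[support_mask]               # metric teacher at supports
    ransac = RANSACRegressor(random_state=0)     # robust to outlier supports
    ransac.fit(x, y)
    a = ransac.estimator_.coef_[0]
    b = ransac.estimator_.intercept_
    return a * rel_depth + b                     # pseudo-label for the student
```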

Tiny CNN for feature point description for document analysis: approach and dataset

Sheshkus Alexander Vladimirovich, Chirvonaya Anastasiya Nikolaevna, Arlazarov Vladimir Lvovich

Scientific article

In this paper, we study the problem of feature point description in the context of document analysis and template matching. Our study shows that specific training data is required for this task, especially if we are to train a lightweight neural network that will be usable on devices with limited computational resources. We construct and provide a dataset of photographed and synthetically generated images together with a method for generating training patches from it. We demonstrate the effectiveness of this data by training a lightweight neural network and show how it performs in both general patch matching and document patch matching. The training was performed on the provided dataset and, for comparison, on the HPatches training dataset; for testing, we solve the HPatches benchmark tasks and a template matching task on two publicly available datasets with various documents pictured against complex backgrounds: MIDV-500 and MIDV-2019.
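
For illustration only, the sketch below shows what a lightweight patch-descriptor network of the kind discussed here might look like in PyTorch. The 32×32 grayscale input, layer sizes, and 64-dimensional L2-normalized output are assumptions made for the example, not the published architecture.

```python
# Hedged sketch: a tiny CNN that maps an image patch to a unit-length descriptor
# suitable for nearest-neighbor matching. All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDescriptor(nn.Module):
    def __init__(self, desc_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # 32 -> 16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 16 -> 8
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 8 -> 4
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, desc_dim)

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        # patch: (B, 1, 32, 32) grayscale; output: (B, desc_dim), unit length,
        # so patches can be matched by Euclidean or cosine distance.
        x = self.features(patch).flatten(1)
        return F.normalize(self.proj(x), dim=1)
```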

Towards a unified framework for identity documents analysis and recognition

Bulatov Konstantin Bulatovich, Bezmaternykh Pavel Vladimirovich, Nikolaev Dmitry Petrovich, Arlazarov Vladimir Viktorovich

Scientific article

Identity document recognition goes far beyond classical optical character recognition problems. Automated ID document recognition systems are tasked not only with extracting editable and transferable data but also with performing identity validation and preventing fraud, with an increasingly high cost of error. A significant amount of research is directed at ID analysis systems focused on a specific subset of document types or a particular mode of image acquisition. However, one of the challenges of the modern world is the increasing demand for identity document recognition from a wide variety of image sources, such as scans, photos, or video frames, as well as in virtually uncontrolled capturing conditions. In this paper, we describe the scope and context of the identity document analysis and recognition problem and its challenges; analyze the existing works on implementing ID document recognition systems; and set the task of constructing a unified framework for identity document recognition that would be applicable to different types of image sources and capturing conditions, and scalable enough to support a large number of identity document types. The aim of the presented framework is to serve as a basis for developing new methods and algorithms for ID document recognition, as well as for the far more challenging problems of identity document forensics, fully automated personal authentication, and fraud prevention.

Weighted combination of per-frame recognition results for text recognition in a video stream

O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

Article

The range of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of the input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus providing several images of the recognized object captured with various characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested on datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of text recognition in a video stream, and that the per-character weighting method with an input image focus estimate as the base criterion achieves the best results on the analyzed datasets.
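
A minimal sketch of the per-character weighted combination idea, assuming the per-frame recognition results are already aligned to a common length and represented as per-character probability matrices. The variance-of-Laplacian focus measure and the function names are illustrative stand-ins for the weighting criteria studied in the paper, not its exact model.

```python
# Hedged sketch: combining per-frame text recognition results with per-character
# weighted averaging, using a frame focus estimate as the weight.
import cv2
import numpy as np

def focus_weight(frame_gray: np.ndarray) -> float:
    """Higher for sharper frames; used as the combination weight."""
    return float(cv2.Laplacian(frame_gray, cv2.CV_64F).var())

def combine_per_character(per_frame_probs: list[np.ndarray],
                          weights: list[float],
                          alphabet: str) -> str:
    """per_frame_probs: list of (n_chars, len(alphabet)) probability matrices,
    one per frame, already aligned to the same field length."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    stacked = np.stack(per_frame_probs)            # (n_frames, n_chars, n_symbols)
    combined = np.tensordot(w, stacked, axes=1)    # weighted per-character average
    return "".join(alphabet[i] for i in combined.argmax(axis=1))
```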
