Научные статьи \ Общие вопросы науки и культуры \ Информационные технологии. Вычислительная техника. Обработка данных \ Искусственный интеллект

Tiny CNN for feature point description for document analysis: approach and dataset

Автор: Sheshkus Alexander Vladimirovich, Chirvonaya Anastasiya Nikolaevna, Arlazarov Vladimir Lvovich

Журнал: Компьютерная оптика @computer-optics

Рубрика: International conference on machine vision

Статья в выпуске: 3 т.46, 2022 года.

Бесплатный доступ

In this paper, we study the problem of feature points description in the context of document analysis and template matching. Our study shows that specific training data is required for the task especially if we are to train a lightweight neural network that will be usable on devices with limited computational resources. In this paper, we construct and provide a dataset of photo and synthetically generated images and a method of training patches generation from it. We prove the effectiveness of this data by training a lightweight neural network and show how it performs in both general and documents patches matching. The training was done on the provided dataset in comparison with HPatches training dataset and for the testing, we solve HPatches testing framework tasks and template matching task on two publicly available datasets with various documents pictured on complex backgrounds: MIDV-500 and MIDV-2019.

Еще

Feature points description, metrics learning, training dataset

Короткий адрес: https://sciup.org/140294996

IDR: 140294996 | DOI: 10.18287/2412-6179-CO-1016

Список литературы Tiny CNN for feature point description for document analysis: approach and dataset

Kougia V, Pavlopoulos J, Androutsopoulos I. Medical image tagging by deep learning and retrieval. In Book: Arampatzis A. et al, eds. Experimental IR meets multi-linguality, multimodality, and interaction. CLEF 2020. Cham: Springer; 2020: 154-166. DOI: 10.1007/978-3-030-58219-7_14.
Shin Y, Seo K, Ahn J, Im DH. Deep-learning-based image tagging for semantic image annotation. In Book: Park J, Park DS, Jeong YS, Pan Y, eds. Advances in computer science and ubiquitous computing. CSA-CUTE 2018. 2018. Singapore: Springer; 2019: 54-59. DOI: 10.1007/978-981-13-9341-9_10.
William I, Ignatius Moses Setiadi DR, Rachmawanto EH, Santoso HA, Sari CA. Face recognition using FaceNet (survey, performance test, and comparison). Fourth Int Conf on Informatics and Computing 2019 (ICIC) 2019; 1: 1-6. DOI: 10.1109/ICIC47613.2019.8985786.
Skoryukina N, Arlazarov V, Nikolaev D. Fast method of ID documents location and type identification for mobile and server application. Int Conf on Document Analysis and Recognition, 2019 (ICDAR) 2019; 1: 850-857. DOI: 10.1109/ICDAR.2019.00141.
Kumar M, Gupta S, Mohan N. A computational approach for printed document forensics using SURF and ORB features. Soft Comput 2020; 24(1): 13197-13208. DOI: 10.1007/s00500-020-04733-x.
Ilyuhin SA, Sheshkus AV, Arlazarov VL. Recognition of images of Korean characters using embedded networks. In Book: Wolfgang O, Nikolaev D, Zhou J, eds. Twelfth Int Conf on Machine Vision 2019 (ICMV) 2020; 11433: 1-7. DOI: 10.10007/1234567890.
Duan Y, Lu J, Wang Z, Feng J, Zhou J. Learning deep binary descriptor with multi-quantization. IEEE Conf on Computer Vision and Pattern Recognition 2017; 1: 11831192. DOI: 10.1109/CVPR.2017.516.
Zhang J, Ye S, Huang T, Rui Y. CDbin: Compact discriminative binary descriptor learned with efficient neural network. IEEE Trans Circuits Syst Video Technol 2020; 30(3): 862-874. DOI: 10.1109/TCSVT.2019.2896095.
Balntas V, Lenc K, Vedaldi A, Mikolajczyk K. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. IEEE Conf on Computer Vision and Pattern Recognition 2017: 5173-5182.
Hoffer E, Ailon N. Deep metric learning using triplet network. In Book: Feragen A, Pelillo M, Loog M, eds. Similarity-based pattern recognition 2015 (SIMBAD). Cham: Springer; 2015: 84-92. DOI: 10.1007/978-3-319-24261-3_7.
Mishra A, Liwicki M. Using deep object features for image descriptions. arXiv preprint. Source: (https://arxiv.org/abs/1902.09969).
Paulin M, Douze M, Harchaoui Z, Mairal J, PerroninF, Schmid C. Local convolutional features with unsupervised training for image retrieval. 2015 IEEE Int Conf on Computer Vision (ICCV) 2016; 1: 91-99. DOI: 10.1109/ICCV.2015.19.
Schultz M, Joachims T. Learning a distance metric from relative comparisons. Adv Neural Inf Process Syst 2004; 16(1): 41-48.
Cacheux YL, Borgne HL, Crucianu M. Modeling inter and intra-class relations in the triplet loss for zero-shot learning. Proc IEEE/CVF Int Conf on Computer Vision (ICCV) 2019; 1: 10333-10342.
Chen W, Chen X, Zhang J, Huang K.: Beyond triplet loss: a deep quadruplet network for person reidentification. Proc IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2017; 1: 403-412.
Chernyshova YS, Gayer AV, Sheshkus AV. Generation method of synthetic training data for mobile OCR system. Proc SPIE 2018; 10696: 106962G. DOI: 10.1117/12.2310119.
Nikolaev DP, Karpenko SM, Nikolaev IP, Nikolayev PP. Hough transform: underestimated tool in the computer vision field. Proc 22th European Conf on Modelling and Simulation 2008: 238-246. DOI: 10.7148/2008-0238.
Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006; 313(5786): 504-507. DOI: 10.1126/science.1127647.
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint. Source: (https://arxiv.org/abs/1602.07360).
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint. Source: (https://arxiv.org/abs/1704.04861).
Mishchuk A, Mishkin D, Radenovic F, Matas J. Working hard to know your neighbor's margins: Local descriptor learning loss. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems 2017: 4826-4837.
Zhao Y, Jin Z, Qi GJ, Lu H, Hua XS. An adversarial approach to hard triplet generation. In Book: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer vision - Proceedings of the European conference on computer vision 2018. Cham: Springer; 2018: 501-517. DOI: 10.1007/978-3-030-01240-3_31.
Sikaroudi M, Ghojogh B, Safarpoor A, Karray F, Crow-ley M, Tizhoosh HR. Offline versus online triplet mining based on extreme distances of histopathology patches. In Book: Bebis G. et al, eds. Advances in visual computing 2020. Cham: Springer; 2020: 333-345. DOI: 10.1007/978-3-030-64556-4_26.
Gayer AV, Chernyshova YS, Sheshkus AV. Effective real-time augmentation of training dataset for the neural networks learning. Proc SPIE 2018; 11041: 10411I. DOI: 10.1117/12.2522969.
Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. Proc Thirteenth Int Conf on Artificial Intelligence and Statistics (AISTAST) 2010; 9: 249-256.
Arlazarov VV, Bulatov KB, Chernov TS, Arlazarov VL. MIDV-500: A dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics 2019; 43(5): 818-824. DOI: 10.18287/2412-61792019-43-5-818-824.
Bulatov K, Matalov D, Arlazarov VV. MIDV-2019: challenges of the modern mobile-based document OCR. Proc SPIE 2019; 11433: 114332N. DOI: 10.1117/12.2558438.
Arandjelovic R, Zisserman A. Three things everyone should know to improve object retrieval. Proc 2012 IEEE Conf on Computer Vision and Pattern Recognition 2012: 2911-2918. DOI: 10.1109/CVPR.2012.6248018.
Calonder M, Lepetit V, Strecha C, Fua P. BRIEF: Binary robust independent elementary features. In Book: Danii-lidis K, Maragos P, Paragios N, eds. Proceedings of the 11th European conference on computer vision. Berlin, Heidelberg: Springer; 2010: 778-792. DOI: 10.1007/978-3-642-15561-1_56.
Lowe DG. Object recognition from local scale-invariant features. Proc Seventh IEEE Int Conf on Computer Vision 1999; 2: 1150-1157. DOI: 10.1109/ICCV.1999.790410.
Trzcinski T, Christoudias M, Lepetit V. Learning image descriptors with boosting. IEEE Trans Pattern Anal Mach Intell 2015; 37(3): 597-610. DOI: 10.1109/TPAMI.2014.2343961.
Zagoruyko S, Komodakis N. Learning to compare image patches via convolutional neural networks. Proc 2015 IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2015: 4353-4361. DOI: 10.1109/CVPR.2015.7299064.
Simo-Serra E, Trulls E, Ferraz L, Kokkinos I, Fua P, Moreno-Noguer F. Discriminative learning of deep con-volutional feature point descriptors. Proc 2015 IEEE Int Conf on Computer Vision (ICCV) 2015: 118-126. DOI: 10.1109/ICCV.2015.22.
Balntas V, Riba E, Ponsa D, Mikolajczyk K. Learning local feature descriptors with triplets and shallow convo-lutional neural networks. Proc British Machine Vision Conf 2016: 119.1-119.11. DOI: 10.5244/C.30.119.

Еще