
A joint study of deep learning-based methods for identity document image binarization and its influence on attribute recognition

Authors: Sánchez-Rivero R., Bezmaternykh P.V., Gayer A.V., Morales-González A., Silva-Mata F.J., Bulatov K.B.

Journal: Computer Optics

Section: Image processing, pattern recognition

Issue: vol. 47, no. 4, 2023

Free access

Text recognition has benefited considerably from deep learning research, as have the preprocessing methods included in its workflow. Identity documents are critical in the field of document analysis, and their processing within this workflow deserves thorough study. We examine the link between deep learning-based binarization and recognition algorithms for this type of document on the MIDV-500 and MIDV-2020 datasets. We present a series of experiments that illustrate how the quality of the captured images relates to the binarization results, and how the binarization output influences final recognition performance. We show that deep learning-based binarization solutions are affected by capture quality, which implies that they still need significant improvement. We also show that proper binarization can improve the performance of many recognition methods. Our retrained U-Net-bin outperformed all other binarization methods, and the best recognition result was obtained by PaddlePaddle OCR v2.
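The abstract relates two measurable quantities, binarization quality and recognition quality, without stating here how either is scored. As a hedged illustration only, the Python sketch below assumes a global Otsu threshold as a classical baseline binarizer (Otsu 1979, listed in the references), a pixel-level F-measure against a ground-truth text mask, and the normalized Levenshtein distance in the form given by Yujian and Bo (also listed in the references) with unit edit costs for comparing a recognized field with its ground truth. All function names are hypothetical; this is not the authors' evaluation protocol and not the U-Net-bin model.

```python
"""Hedged sketch: Otsu baseline binarization, pixel-level F-measure, and a
normalized Levenshtein distance (Yujian & Bo form, unit costs assumed).
All names are hypothetical; this is not the authors' evaluation code."""


def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.
    Pixels <= t form the dark class (assumed here to be text)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_bright = total - w_dark
        if w_bright == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance, up to the constant factor 1 / total**2.
        var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(image, t):
    """Mark dark pixels (assumed text) as 1 and all other pixels as 0."""
    return [[1 if p <= t else 0 for p in row] for row in image]


def f_measure(pred, gt):
    """Pixel-level F-measure of the predicted text mask against ground truth."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def levenshtein(a, b):
    """Classic edit distance with unit insertion, deletion, substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """2*d / (|a| + |b| + d), a value in [0, 1]; 0 means identical strings."""
    d = levenshtein(a, b)
    if d == 0:
        return 0.0
    return 2 * d / (len(a) + len(b) + d)


if __name__ == "__main__":
    image = [[20, 30, 220, 210], [25, 215, 205, 35]]  # toy grayscale crop
    gt = [[1, 1, 0, 0], [1, 0, 0, 1]]                 # toy ground-truth text mask
    t = otsu_threshold([p for row in image for p in row])
    print(t, f_measure(binarize(image, t), gt))        # 35 1.0
    print(normalized_levenshtein("MIDV-2020", "MIDV-2O20"))  # 2/19, about 0.105
```

Otsu's method is only a global baseline; a learned binarizer of the kind the abstract discusses would replace `otsu_threshold` and `binarize`, while its output could still be scored with the same `f_measure`, and the resulting images passed to an OCR engine whose output is compared with `normalized_levenshtein`.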


Keywords: document image binarization, identity document recognition, optical character recognition, deep learning, U-Net architecture


IDR: 140301837 | DOI: 10.18287/2412-6179-CO-1207

References

Doermann D, Tombre K. Handbook of document image processing and recognition. Springer Publishing Company Inc; 2014.
Arlazarov VV, Andreeva EI, Bulatov KB, Nikolaev DP, Petrova OO, Savelev BI, Slavin OA. Document image analysis and recognition: a survey. Computer Optics 2022; 46(4): 567-589. DOI: 10.18287/2412-6179-CO-1020.
Bulatov KB, Bezmaternykh PV, Nikolaev DP, Arlazarov VV. Towards a unified framework for identity documents analysis and recognition. Computer Optics 2022; 46(3): 436-454. DOI: 10.18287/2412-6179-CO-1024.
Arlazarov VL, Arlazarov VV, Bulatov KB, Chernov TS, Nikolaev DP, Polevoy DV, Sheshkus AV, Skoryukina NS, Slavin OA, Usilin SA. Mobile ID document recognition-coarse-to-fine approach. Pattern Recognit Image Anal 2022; 32(1): 89-108. DOI: 10.1134/S1054661822010023.
Arlazarov VV, Bulatov K, Chernov T, Arlazarov VL. MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics 2019; 43(5): 818-824. DOI: 10.18287/2412-6179-2019-43-5-818-824.
Bulatov K, Emelianova E, Tropin D, et al. MIDV-2020: A comprehensive benchmark dataset for identity document analysis. arXiv Preprint. 2021. Source: <https://arxiv.org/abs/2107.00396>.
Sánchez-Rivero R, Bezmaternykh P, Morales-González A, Silva-Mata FJ, Bulatov K. Assessing the relationship between binarization and OCR in the context of deep learning-based ID document analysis. In Book: Heredia YH, Núñez VM, Shulcloper JR, eds. Progress in artificial intelligence and pattern recognition. Cham: Springer International Publishing; 2021: 134-144.
Lins RD, Almeida MMD, Bernardino RB, Jesus D, Oliveira JM. Assessing binarization techniques for document images. DocEng 2017: Proc 2017 ACM Symposium on Document Engineering 2017: 183-192. DOI: 10.1145/3103010.3103021.
Mustafa WA, Kader MMMA. Binarization of document images: A comprehensive review. J Phys: Conf Ser 2018; 1019: 012023. DOI: 10.1088/1742-6596/1019/1/012023.
Tensmeyer C, Martinez T. Historical document image binarization: A review. SN Comput Sci 2020; 1(3): 173. DOI: 10.1007/s42979-020-00176-1.
Pratikakis I, Zagoris K, Barlas G, Gatos B. ICFHR2016 handwritten document image binarization contest (H-DIBCO 2016). 2016 15th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2016: 619-623.
Pratikakis I, Zagoris K, Karagiannis X, Tsochatzidis L, Mondal T, Marthot-Santaniello I. Document image binarization (DIBCO 2019). 2019 Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1547-1556. DOI: 10.1109/ICDAR.2019.00249.
Smith EHB. An analysis of binarization ground truthing. Proc 8th IAPR Int Workshop on Document Analysis Systems (DAS '10) 2010: 27-34. DOI: 10.1145/1815330.1815334.
Ntirogiannis K, Gatos B, Pratikakis I. Performance evaluation methodology for historical document image binarization. IEEE Trans Image Process 2013; 22(2): 595-609. DOI: 10.1109/TIP.2012.2219550.
Rani U, Kaur A, Josan G. A new binarization method for degraded document images. Int J Inf Technol 2019; 15(1): 1035-1053. DOI: 10.1007/s41870-019-00361-3.
Milyaev S, Barinova O, Novikova T, Kohli P, Lempitsky V. Image binarization for end-to-end text understanding in natural images. 2013 12th Int Conf on Document Analysis and Recognition 2013: 128-132. DOI: 10.1109/icdar.2013.33.
Chou C-H, Lin W-H, Chang F. A binarization method with learning-built rules for document images produced by cameras. Pattern Recogn 2010; 43(4): 1518-1530. DOI: 10.1016/j.patcog.2009.10.016.
Wen J, Li S, Sun J. A new binarization method for nonuniform illuminated document images. Pattern Recogn 2013; 46(6): 1670-1690. DOI: 10.1016/j.patcog.2012.11.027.
Tafti AP, Baghaie A, Assefi M, Arabnia HR, Yu Z, Peissig P. OCR as a service: An experimental evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. In Book: Bebis G, Boyle R, Parvin B, Koracin D, Porikli F, Skaff S, Entezari A, Min J, Iwai D, Sadagic A, Scheidegger C, Isenberg T, eds. Advances in visual computing. Cham, Switzerland: Springer International Publishing AG; 2016: 735-746. DOI: 10.1007/978-3-319-50835-1_66.
Li Z, Yang C, Shen Q, Wen S. A document image dataset for quality assessment. J Phys: Conf Ser 2021; 1828(1): 012033. DOI: 10.1088/1742-6596/1828/1/012033.
Ye P, Doermann D. Document image quality assessment: A brief survey. 2013 12th Int Conf on Document Analysis and Recognition 2013: 723-727. DOI: 10.1109/ICDAR.2013.148.
Polevoy DV, Bulatov KB, Skoryukina NS, Chernov TS, Arlazarov VV, Sheshkus AV. Key aspects of document recognition using small digital cameras. RFBR J 2016; 4: 97-108. DOI: 10.22204/2410-4639-2016-092-04-97-108.
Chernov T, Ilyuhin S, Arlazarov VV. Application of dynamic saliency maps to the video stream recognition systems with image quality assessment. Proc SPIE 2019; 11041: 110410T. DOI: 10.1117/12.2522768.
Shemiakina J, Limonova E, Skoryukina N, Arlazarov VV, Nikolaev DP. A method of image quality assessment for text recognition on camera-captured and projectively distorted documents. Mathematics 2021; 9(17): 2155. DOI: 10.3390/math9172155.
Bezmaternykh PV, Ilin DA, Nikolaev DP. U-Net-bin: hacking the document image binarization contest. Computer Optics 2019; 43(5): 825-832. DOI: 10.18287/2412-6179-2019-43-5-825-832.
Calvo-Zaragoza J, Gallego AJ. A selectional auto-encoder approach for document image binarization. Pattern Recogn 2019; 86: 37-47. DOI: 10.1016/j.patcog.2018.08.011.
Masyagin M. Robust document image binarization tool. 2021. Source: <https://github.com/masyagin1998/robin>.
Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 1979; 9(1): 62-66. DOI: 10.1109/TSMC.1979.4310076.
Lins RD, Simske SJ, Bernardino RB. DocEng'2020 time-quality competition on binarizing photographed documents. Proc ACM Symposium on Document Engineering 2020; 2020: 2. DOI: 10.1145/3395027.3419578.
Yu D, Li X, Zhang C, Liu T, Han J, Liu J, Ding E. Towards accurate scene text recognition with semantic reasoning networks. Computer Vision and Pattern Recognition (CVPR) 2020: 12113-12122.
Du Y, Li C, Guo R, Cui C, Liu W, Zhou J, Lu B, Yang Y, Liu Q. PP-OCRv2: Bag of tricks for ultra lightweight OCR system. arXiv Preprint. 2021. Source: <https://arxiv.org/abs/2109.03144>.
Lee J, Park S, Baek J, Oh SJ, Kim S, Lee H. On recognizing texts of arbitrary shapes with 2D self-attention. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops 2020: 546-547.
Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H. What is wrong with scene text recognition model comparisons? Dataset and model analysis. 2019 IEEE/CVF Int Conf on Computer Vision (ICCV) 2019: 4714-4722. DOI: 10.1109/ICCV.2019.00481.
Cai H, Sun J, Xiong Y. Revisiting classification perspective on scene text recognition. arXiv Preprint. 2021. Source: <https://arxiv.org/abs/2102.10884>.
Smith R. An overview of the Tesseract OCR engine. IEEE Int Conf on Document Analysis and Recognition (ICDAR'07) 2007; 2: 629-633. DOI: 10.1109/ICDAR.2007.4376991.
Michalak H, Okarma K. Robust combined binarization method of non-uniformly illuminated document images for alphanumerical character recognition. Sensors 2020; 20(10): 2914. DOI: 10.3390/s20102914.
Yujian L, Bo L. A normalized Levenshtein distance metric. IEEE Trans Pattern Anal Mach Intell 2007; 29(6): 1091-1095. DOI: 10.1109/TPAMI.2007.1078.
Schulz D, Maureira J, Tapia J, Busch C. Identity documents image quality assessment. 2022 30th European Signal Processing Conf (EUSIPCO) 2022: 1017-1021.
