Towards a unified framework for identity documents analysis and recognition

Bulatov Konstantin Bulatovich; Bezmaternykh Pavel Vladimirovich; Nikolaev Dmitry Petrovich; Arlazarov Vladimir Viktorovich

doi:10.18287/2412-6179-CO-1024

Journal: Computer Optics

Section: International conference on machine vision

Article in issue: Vol. 46, No. 3, 2022.

Identity document recognition goes far beyond classical optical character recognition. Automated ID document recognition systems are tasked not only with extracting editable and transferable data, but also with validating identity and preventing fraud, at an increasingly high cost of error. A significant amount of research is directed at creating ID analysis systems that focus on a specific subset of document types or a particular mode of image acquisition. However, one of the challenges of the modern world is the growing demand for identity document recognition from a wide variety of image sources, such as scans, photos, or video frames, captured under virtually uncontrolled conditions. In this paper, we describe the scope and context of the identity document analysis and recognition problem and its challenges; analyze existing work on implementing ID document recognition systems; and set the task of constructing a unified framework for identity document recognition that is applicable to different types of image sources and capturing conditions and scalable enough to support a large number of identity document types. The aim of the presented framework is to serve as a basis for developing new methods and algorithms for ID document recognition, as well as for the considerably harder challenges of identity document forensics, fully automated personal authentication, and fraud prevention.

Keywords: optical character recognition, document recognition, document analysis, identity documents, recognition system, mobile recognition, video stream recognition

Short address: https://sciup.org/140294997

IDR: 140294997 | DOI: 10.18287/2412-6179-CO-1024
