Enhanced dynamic programming-based method for text line recognition in documents
Authors: Chernyshova Y.S., Suloev K.K., Shehskus A.V., Arlazarov V.V.
Journal: Computer Optics @computer-optics
Section: International conference on machine vision
Published in: issue 6, vol. 49, 2025.
Free access
On-premise text recognition is in demand. Customers want to use their smartphones to recognize bank cards for online payments, passports for filling in ticket information, and much more. Because artificial neural networks have been the main approach to text recognition for the last two decades, the resulting solutions tend to be resource-hungry and do not fit on mobile devices. In this paper, we introduce an enhanced text line recognition method based on dynamic programming and a fully convolutional network, which allows this classic model to achieve results competitive with much heavier architectures. The main idea is to add a special pin to the network alphabet, which makes it possible to apply dynamic programming to analyze the raw neural network output effectively. As our main focus is the recognition of identity documents, we use the public dataset MIDV-500 and its extension MIDV-2019 as test samples. We compare our recognizer with several published models, including TrOCR, Paddle OCR, and Tesseract OCR 5, to demonstrate its superior trade-off between accuracy and performance. Our method is about 200 times faster than TrOCR and, in most cases, about 2 times faster than Paddle OCR. The accuracy of our recognizer is comparable to that of Paddle OCR on MIDV-500 and better on MIDV-2019, including being about 2 times more accurate on images of machine-readable zones.
Data synthesis, fully convolutional neural networks, ID document recognition, OCR, on-device recognition, text line recognition
Short URL: https://sciup.org/140313270
IDR: 140313270 | DOI: 10.18287/COJ1761