Using a convolutional neural network to recognize text elements in poor quality scanned images

Free access

The paper proposes a method for recognizing the content of poor-quality scanned images using a convolutional neural network (CNN). The method has three main stages. In the first stage, the image is preprocessed to identify the contours of its alphabetic and numeric elements and basic punctuation marks. In the second stage, the image fragments inside the identified contours are fed sequentially to the input of the CNN, which performs multiclass classification. In the third and final stage, the set of CNN responses is post-processed and a text document with the recognition results is generated. All stages were studied experimentally in Python using the Keras deep learning library and the OpenCV computer vision library. The experiments showed fairly good results for the main types of quality deterioration in a scanned image: geometric distortions, blurred borders, extra lines and spots introduced during scanning, etc.
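The three stages can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: plain connected-component labeling stands in for OpenCV contour detection, and the trained Keras CNN is replaced by a hypothetical `classify` callable supplied by the caller. The names `find_boxes`, `crop`, `recognize`, `min_area`, `demo_image` and `toy_classify` are assumptions made for illustration only.

```python
from collections import deque


def find_boxes(img):
    """Stage 1 stand-in: bounding boxes (x0, y0, x1, y1) of 8-connected ink regions.

    img is a list of rows where 1 marks ink and 0 marks background.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill over one region
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def crop(img, box):
    """Cut the fragment inside a bounding box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in img[y0:y1 + 1]]


def recognize(img, classify, min_area=2):
    """Run the three stages and return the recognized text."""
    # Stage 1: locate candidate glyphs and drop tiny specks (scanning noise).
    boxes = [b for b in find_boxes(img)
             if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]
    # Stage 2: classify each fragment (a trained CNN would be called here).
    labelled = [(b, classify(crop(img, b))) for b in boxes]
    # Stage 3: group boxes into text lines by vertical overlap,
    # then order the characters of each line from left to right.
    labelled.sort(key=lambda item: item[0][1])
    lines = []
    for box, ch in labelled:
        for line in lines:
            if box[1] <= line["bottom"]:
                line["items"].append((box[0], ch))
                line["bottom"] = max(line["bottom"], box[3])
                break
        else:
            lines.append({"bottom": box[3], "items": [(box[0], ch)]})
    return "\n".join("".join(ch for _, ch in sorted(line["items"]))
                     for line in lines)


# Toy input: two text lines plus a one-pixel speck in the top-right corner.
demo_image = [
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]


def toy_classify(glyph):
    """Hypothetical classifier: one-pixel-wide shapes are 'I', the rest 'O'."""
    return "I" if len(glyph[0]) == 1 else "O"
```

Calling `recognize(demo_image, toy_classify)` returns the two lines `IO` and `OI`; the speck is discarded by the `min_area` filter. In a real pipeline the fragments would also need resizing to the CNN input shape, which this sketch omits.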


Image processing, convolutional neural network, Python, Keras, OpenCV

Short URL: https://sciup.org/143179409

IDR: 143179409   |   DOI: 10.25209/2079-3316-2022-13-3-45-59

References for Using a convolutional neural network to recognize text elements in poor quality scanned images

  • A. Chauhan. Convolutional Neural Networks for multiclass image classification: A beginners guide to understand CNN, Published in The Startup, 2020. https://medium.com/swlh/convolutional-neural-networks-for-multiclass-image-classification-a-beginners-guide-to-6dbc09fabbd
  • J. Brownlee. How to develop a CNN for MNIST handwritten digit classification, Machine Learning Mastery, 2019. https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/
  • V. Mokin. MNIST models testing: typographic digits, Kaggle, 2021. https://www.kaggle.com/datasets/vbmokin/typographic-digits-first-10-fonts
  • K. Y. Chan. Font recognition using CNN approach, Final Year Project (Bachelor), Tunku Abdul Rahman University College, 2021. https://eprints.tarc.edu.my/id/eprint/19207
  • Y. Gao, Y. Chen, J. Wang, M. Tang, H. Lu. “Reading scene text with fully convolutional sequence modeling”, Neurocomputing, 339 (2019), pp. 161–170. https://doi.org/10.1016/j.neucom.2019.01.094
  • A. A. Chandio, Mehw. Leghari, Mehj. Leghari, A. H. Jalbani. “Multi-font and multi-size printed Sindhi character recognition using Convolutional Neural Networks”, Pak. J. Engg. Appl. Sci., 25 (2019), pp. 36–42. https://journal.uet.edu.pk/ojs_old/index.php/pjeas/article/view/1635/332
  • G. J. Ansari, J. H. Shah, M. A. Khan, M. Sharif, U. Tariq, T. Akram. “A non-blind deconvolution semi pipelined approach to understand text in blurry natural images for edge intelligence”, Information Processing & Management, 58:6 (2021), 102675. https://doi.org/10.1016/j.ipm.2021.102675
  • Zh. Zhong, L. Jin, Z. Feng. “Multi-font printed Chinese character recognition using multi-pooling convolutional neural network”, 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (23–26 August 2015, Tunis, Tunisia), 2015, pp. 96–100. https://doi.org/10.1109/ICDAR.2015.7333733
  • Ch. Tensmeyer, D. Saunders, T. Martinez. Convolutional Neural Networks for font classification, ICDAR 2017, 2017, 7 pp. arXiv: 1708.03669 [cs.CV]
  • J. Charles, P. Y. Simard, P. Viola, J. Rinker. “Text recognition of low-resolution document images”, Eighth International Conference on Document Analysis and Recognition. V. 2, ICDAR’05 (31 August 2005–01 September 2005, Seoul, South Korea), 2005, ISBN 0-7695-2420-6, pp. 695–699. https://doi.org/10.1109/ICDAR.2005.233
  • A. Myuller, S. Gvido. Introduction to Machine Learning with Python. Guide for data scientists, Vil’yams, M., 2017, ISBN 978-5-9908910-8-1 (In Russian), 394 pp.
  • A. Dzhulli, S. Pal. Keras library—deep learning tool, DMK Press, M., 2017, ISBN 978-5-97060-573-8 (In Russian), 294 pp.
  • S. Datta. Learning OpenCV 3 Application Development, Packt Publishing, 2016, ISBN 978-1784391454, 310 pp.
  • S. Gruppetta. Image processing with the Python Pillow Library, RealPython, 2022. https://realpython.com/image-processing-with-the-python-pillow-library/
  • S. Dey. Hands-On Image Processing with Python: Expert techniques for advanced image analysis and effective interpretation of image data, Packt Publishing, 2018, ISBN 978-1789343731, 492 pp.
  • V. K. Ayyadevara. Neural Networks with Keras Cookbook: Over 70 recipes leveraging deep learning techniques across image, text, audio, and game bots, Packt Publishing, 2019, ISBN 978-1789346640, 568 pp.
  • A. Rosebrock. Using Tesseract OCR with Python, PyImageSearch, 2021. https://pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/
  • U. Makkinni. Python and data analysis, DMK Press, M., 2019, ISBN 978-5-97060-590-5 (In Russian), 540 pp.
Research article