A review of algorithms for detecting text regions in images and videos

Authors: Yulia A. Bolotova, Vladimir G. Spitsyn, Polina M. Osina

Journal: Computer Optics

Section: Image processing, pattern recognition

Published in: Vol. 41, No. 3, 2017.

Free access

The article reviews methods for detecting and segmenting text regions in images and videos. It defines a generalized processing pipeline for text recognition systems and reviews methods for detection, document layout analysis, and segmentation of text documents as part of the broader task of recognizing text regions in images and videos. Methods proposed over 30 years of research are analyzed in terms of accuracy, speed, and versatility. The paper also addresses current open problems in detecting and recognizing text regions in images.
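To make the page segmentation task concrete, here is a minimal sketch of one classic top-down approach, the recursive XY-cut, written in Python. It is an illustration only, not the method of the reviewed systems or of the authors. It assumes the page is already binarized into a list of rows of 0/1 values; `min_gap` is a hypothetical parameter giving the minimum number of empty rows or columns treated as a separator between blocks.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open


def split_ranges(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of non-empty bins; gaps shorter than
    min_gap are bridged, gaps of min_gap or more separate two runs."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end, gap = 0, 0
    for i, value in enumerate(profile):
        if value:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                ranges.append((start, end))
                start, gap = None, 0
    if start is not None:
        ranges.append((start, end))
    return ranges


def xy_cut(img: List[List[int]], min_gap: int = 2) -> List[Box]:
    """Recursive XY-cut of a binary page image (1 = foreground pixel)."""
    blocks: List[Box] = []

    def recurse(top: int, left: int, bottom: int, right: int) -> None:
        # Horizontal cut: split the region at wide runs of empty rows.
        row_profile = [sum(img[r][left:right]) for r in range(top, bottom)]
        rows = split_ranges(row_profile, min_gap)
        if not rows:
            return  # region contains no foreground
        if len(rows) > 1:
            for r0, r1 in rows:
                recurse(top + r0, left, top + r1, right)
            return
        # Single row band: trim empty margins, then try a vertical cut.
        top, bottom = top + rows[0][0], top + rows[0][1]
        col_profile = [sum(img[r][c] for r in range(top, bottom))
                       for c in range(left, right)]
        cols = split_ranges(col_profile, min_gap)
        if len(cols) > 1:
            for c0, c1 in cols:
                recurse(top, left + c0, bottom, left + c1)
            return
        # No further cut is possible: emit the tight bounding box as a block.
        blocks.append((top, left + cols[0][0], bottom, left + cols[0][1]))

    if img and img[0]:
        recurse(0, 0, len(img), len(img[0]))
    return blocks
```

For a 10 × 12 image with two small blocks side by side above one wide block, `xy_cut(img)` returns the three bounding boxes as half-open `(top, left, bottom, right)` tuples. The choice of `min_gap` trades off merging neighbouring paragraphs or columns against splitting lines and words apart.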

Keywords: pattern recognition, document layout analysis, text image segmentation, text skew angle estimation

Short URL: https://sciup.org/140228627

IDR: 140228627   |   DOI: 10.18287/2412-6179-2017-41-3-441-452

A review of algorithms for text detection in images and videos

This article reviews the history and state of the art of optical character recognition (OCR) systems such as ABBYY FineReader, Tesseract, and CuneiForm, with particular attention to their internal algorithms, including page layout analysis, page segmentation, and document skew angle estimation. The review describes and compares methods proposed over the last 30 years in terms of speed and versatility. It also presents a critical analysis of the current state of the field and discusses open problems.
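Document skew angle estimation can be illustrated with the classic projection-profile technique: rotate the foreground pixels through a range of candidate angles and keep the angle at which the horizontal projection profile is sharpest, because text lines then collapse into a few dense rows. The Python sketch below is a hypothetical illustration under simplifying assumptions: foreground pixels are given as `(x, y)` coordinates, the search is brute force over ±`max_angle` degrees in steps of `step`, and the sum of squared bin counts serves as the sharpness criterion. It is not the algorithm used in ABBYY FineReader, Tesseract, or CuneiForm, and `synthetic_lines` is a made-up test-data helper.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) coordinates of a foreground pixel


def profile_sharpness(points: List[Point], angle_deg: float) -> int:
    """Sum of squared row-bin counts after rotating the points by -angle_deg.

    Text lines that become horizontal pile up in a few bins, which
    maximizes this sum.
    """
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    # Row coordinate of each point after rotating the page by -angle_deg.
    rows = Counter(round(-x * sin_a + y * cos_a) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(points: List[Point], max_angle: float = 10.0,
                  step: float = 0.5) -> float:
    """Brute-force search for the text-line angle in degrees, atan(dy/dx)."""
    n = int(round(max_angle / step))
    candidates = [k * step for k in range(-n, n + 1)]
    return max(candidates, key=lambda ang: profile_sharpness(points, ang))


def synthetic_lines(angle_deg: float, n_lines: int = 3, length: int = 200,
                    spacing: int = 20) -> List[Point]:
    """Hypothetical test data: straight one-pixel lines with a given slope."""
    t = math.tan(math.radians(angle_deg))
    return [(x, round(i * spacing + x * t))
            for i in range(n_lines) for x in range(length)]
```

On synthetic lines with a 3° slope, `estimate_skew(synthetic_lines(3.0))` recovers 3.0. The brute-force search costs O(N·K) for N points and K candidate angles; a common refinement is a coarse-to-fine search over angles, or computing the profile from fewer representative points, such as one point per connected component.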
