Handwritten text generation and strikethrough characters augmentation
Authors: Shonenkov Alex Vladimirovich, Karachev Denis Konstantinovich, Novopoltsev Maxim Yurievich, Potanin Mark Stanislavovich, Dimitrov Denis Valerievich, Chertok Andrey Victorovich
Journal: Computer Optics (Компьютерная оптика)
Section: International conference on machine vision
Published in: Issue 3, Vol. 46, 2022.
Free access
We introduce two data augmentation techniques that, used with a ResNet-BiLSTM-CTC network, significantly reduce Word Error Rate and Character Error Rate beyond the best reported results on handwritten text recognition tasks. The first is a novel augmentation that simulates strikethrough text (HandWritten Blots); the second is a method that generates handwritten text from printed text (StackMix), which uses a weakly supervised framework to obtain character boundaries. Because both techniques are independent of the network used, they could also be applied to improve other networks and approaches to handwritten text recognition. Extensive experiments on ten handwritten text datasets show that HandWritten Blots and StackMix significantly improve the quality of handwritten text recognition models.
Keywords: data augmentation, handwritten text recognition, strikethrough text, computer vision, StackMix, HandWritten Blots
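To make the idea of a strikethrough-style augmentation concrete, here is a minimal sketch that draws one wavy dark stroke across a grayscale text-line image. It is not the authors' HandWritten Blots implementation (their code is listed in the references at https://github.com/TheDenk/augmixations). The function name `add_strikethrough`, its parameters, and the sinusoidal stroke shape are all assumptions made for illustration, and the image is assumed to be dark ink on a light background.

```python
import numpy as np


def add_strikethrough(image, rng, thickness=3, amplitude=2.0, intensity=0):
    """Draw one wavy, roughly horizontal dark stroke over a grayscale image.

    Hypothetical helper for illustration only.
    image: 2-D uint8 array (H, W), assumed dark ink on a light background.
    rng:   numpy.random.Generator used for all random choices.
    Returns a new array; the input is left unmodified.
    """
    if image.ndim != 2 or image.size == 0:
        raise ValueError("expected a non-empty 2-D grayscale image")
    h, w = image.shape
    out = image.copy()

    # Random horizontal span: between roughly 20% and 60% of the width.
    span = int(rng.integers(max(1, w // 5), max(1, (3 * w) // 5) + 1))
    x0 = int(rng.integers(0, w - span + 1))
    xs = np.arange(x0, x0 + span)

    # Centre line near the vertical middle, with a gentle sinusoidal wobble.
    y_mid = h / 2 + rng.uniform(-h / 8, h / 8)
    phase = rng.uniform(0, 2 * np.pi)
    period = max(span / 2.0, 1.0)
    ys = y_mid + amplitude * np.sin(2 * np.pi * (xs - x0) / period + phase)

    # Stamp a short vertical segment at each column; only ever darken pixels.
    half = thickness // 2
    for x, y in zip(xs, np.rint(ys).astype(int)):
        top = max(0, y - half)
        bottom = min(h, y + half + 1)
        if top < bottom:
            out[top:bottom, x] = np.minimum(out[top:bottom, x], intensity)
    return out


# Example: strike through a blank 32x128 "line image".
rng = np.random.default_rng(0)
line = np.full((32, 128), 255, dtype=np.uint8)
augmented = add_strikethrough(line, rng)
```

In a training pipeline, a transform like this would typically be applied to each sample with some probability, leaving the target transcription unchanged so the recognizer learns to read through the stroke.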
Short URL: https://sciup.org/140294998
IDR: 140294998 | DOI: 10.18287/2412-6179-CO-1049
References for Handwritten text generation and strikethrough characters augmentation
- Potanin M, Dimitrov D, Shonenkov A, Bataev V, Karachev D, Novopoltsev M. Digital Peter: Dataset, competition and handwriting recognition methods. arXiv preprint, 2021. Source:
- Yun S, Han D, Chun S, Oh SJ, Yoo Y, Choe J. CutMix: Regularization strategy to train strong classifiers with localizable features. 2019 IEEE/CVF Int Conf on Computer Vision (ICCV) 2019: 6022-6031.
- Huang S, Wang X, Tao D. SnapMix: Semantically proportional mixing for augmenting fine-grained data. Proc AAAI Conf on Artificial Intelligence 2021; 35(2): 1628-1636.
- Zhang H, Cisse M, Dauphin YN, Lopez-Paz D. mixup: Beyond empirical risk minimization. Int Conf on Learning Representations 2018.
- Yu H, Wang H, Wu J. Mixup without hesitation. arXiv preprint, 2021. Source:
- Wigington C, Stewart S, Davis B, Barrett B, Price B, Cohen S. Data augmentation for recognition of handwritten words and lines using a CNN-LSTM network. 2017 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017; 1: 639-645.
- Poznanski A, Wolf L. CNN-N-Gram for handwriting word recognition. Proc IEEE Conf on Computer Vision and Pattern Recognition 2016: 2305-2314.
- Krishnan P, Jawahar C. Matching handwritten document images. Proc European Conf on Computer Vision 2016: 766-782.
- Shen X, Messina R. A method of synthesizing handwritten Chinese images for data augmentation. 2016 15th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2016: 114-119.
- Chammas E, Mokbel C, Likforman-Sulem L. Handwriting recognition of historical documents with few labeled data. 2018 13th IAPR Int Workshop on Document Analysis Systems (DAS) 2018: 43-48.
- Aradillas JC, Murillo-Fuentes JJ, Olmos PM. Boosting offline handwritten text recognition in historical documents with few labeled lines. IEEE Access 2020; 9: 76674-76688.
- Fogel S, Averbuch-Elor H, Cohen S, Mazor S, Litman R. ScrabbleGAN: Semi-supervised varying length handwritten text generation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition 2020: 4324-4333.
- Bengio Y, et al. Markovian models for sequential data. Neural Computing Surveys 1999; 2(199): 129-162.
- Bourlard HA, Morgan N. Connectionist speech recognition: A hybrid approach. Kluwer Academic Publishers; 1994.
- Almazán J, Gordo A, Fornés A, Valveny E. Word spotting and recognition with embedded attributes. IEEE Trans Pattern Anal Mach Intell 2014; 36(12): 2552-2566.
- Krishnan P, Dutta K, Jawahar C. Deep feature embedding for accurate recognition and retrieval of handwritten text. 15th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2016: 289-294.
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997; 9(8): 1735-1780.
- Voigtlaender P, Doetsch P, Ney H. Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. 15th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2016: 228-233.
- Marti U-V, Bunke H. The IAM-database: an English sentence database for offline handwriting recognition. Int J Doc Anal Recognit 2002; 5(1): 39-46.
- Coquenet D, Chatelain C, Paquet T. Recurrence-free unconstrained handwritten text recognition using gated fully convolutional network. 17th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2020: 19-24.
- Ingle RR, Fujii Y, Deselaers T, Baccash J, Popat AC. A scalable handwritten text recognition system. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 17-24.
- Michael J, Labahn R, Grüning T, Zöllner J. Evaluating sequence-to-sequence models for handwritten text recognition. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1286-1293.
- Yousef M, Bishop TE. OrigamiNet: Weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR) 2020: 14710-14719.
- Competition Digital Peter. 2020. Source: (https://github.com/sberbank-ai/digital_peter_aij2020).
- DeVries T, Taylor GW. Improved regularization of convolutional neural networks with cutout. arXiv preprint, 2017. Source: (https://arxiv.org/abs/1708.04552).
- Hermes D. Helper for Bezier curves, triangles, and higher order objects. J Open Source Softw 2017; 2(16): 267.
- Method implementation (our code). 2021. Source: (https://github.com/TheDenk/augmixations).
- Bird S, Loper E, Klein E. Natural language processing with python. O'Reilly Media Inc; 2009.
- Malouf R. Multi-word expression tokenizer. Source: (https://www.nltk.org/_modules/nltk/tokenize/mwe.html).
- The Conversation AI team. Jigsaw unintended bias in toxicity classification. 2018. Source: (https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification).
- Credits for the Latin library. Source: (https://www.thelatinlibrary.com/cred.html).
- Russian wikimedia downloads. 2021. Source: (https://dumps.wikimedia.org/ruwiki/).
- Transcribe Bentham. 2010. Source: (http://transcribe-bentham.ucl.ac.uk/td/TranscribeBentham).
- Gatos B, Louloudis G, Causer T, Grint K, Romero V, Sánchez J-A, Toselli A, Vidal E. Ground-truth production in the transcriptorium project. 11th IAPR Int Workshop on Document Analysis Systems 2014: 237-241.
- Theodore Bluche. 2002. Source: (http://www.tbluche.com/resources.html).
- IAM Handwriting Database. 2002. Source: (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database).
- Github repository with various IAM splits. 2021. Source: (https://github.com/shonenkov/IAM-Splitting).
- Nurseitov D, Bostanbekov K, Kurmankhojayev D, Alimova A, Abdallah A. HKR for Handwritten Kazakh and Russian database. arXiv preprint, 2020. Source: (https://arxiv.org/abs/2007.03579).
- Github with HKR dataset splitting. 2020. Source: (https://github.com/bosskairat/Dataset).
- Reza AM. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology 2004; 38(1): 35-44.
- Fischer A, Frinken V, Fornés A, Bunke H. Transcription alignment of Latin manuscripts using Hidden Markov Models. Proc 2011 Workshop on Historical Document Imaging and Processing (HIP'11) 2011: 29-36.
- de Sousa Neto AF, Bezerra BLD, Toselli AH, Lima EB. HTR-Flor: A deep learning system for offline handwritten text recognition. 33rd SIBGRAPI Conference on Graphics, Patterns and Images 2020: 54-61.
- HTR-Flor implementation. 2019. Source: (https://github.com/arthurflor23/handwritten-text-recognition).
- Strauss T, Leifert G, Labahn R, Hodel T, Mühlberger G. ICFHR2018 competition on automated text recognition on a READ dataset. 16th Int Conf on Frontiers in Handwriting Recognition (ICFHR) 2018: 477-482.
- Coquenet D, Chatelain C, Paquet T. End-to-end handwritten paragraph text recognition using a vertical attention network. arXiv preprint, 2020. Source: (https://arxiv.org/abs/2012.03868).
- Moysset B, Messina R. Are 2D-LSTM really dead for offline text recognition. Int J Doc Anal Recognit 2019; 22(3): 193-208.
- Wang T, Zhu Y, Jin L, Luo C, Chen X, Wu Y, Wang Q, Cai M. Decoupled attention network for text recognition. Proc AAAI Conf on Artificial Intelligence 2020; 34(07): 12216-12224.