An Integrated Pipeline with Internal Image Processing for Efficient Image to Text to Speech Conversion

Shreyas Reddy; Rashmi Ranjan Das; Anjali Mohapatra

doi:10.5815/ijem.2023.06.01

Journal: International Journal of Engineering and Manufacturing

Issue: Vol. 13, No. 6, 2023.

Optical Character Recognition (OCR) systems let computers read text from images of documents, so machines can extract the written content without a person having to read it out or transcribe it. OCR makes it easy to digitize historical documents, archival material, and medical records, which reduces the time needed to retrieve them. However, the accuracy of OCR systems depends heavily on the quality of the input images. To reduce this dependence, in this paper we propose an image pre-processing pipeline, integrated with the OCR system, that enhances the quality of input images for efficient image-to-text conversion. This method produces easily understandable text output with a lower Character Error Rate (CER) than current methods. In addition, we explore a technique for converting text from a document or image into machine-readable form and then into audio output using gTTS, a Python library that interfaces with Google Translate's text-to-speech API. We assess the effectiveness of this approach and show that it substantially improves OCR accuracy compared with other existing methods. The paper also gives an overview of the development stages and the main challenges, together with comparisons of the results achieved by various methods.

Keywords: Optical Character Recognition (OCR), Text-to-Speech (TTS), Image processing, Character Error Rate (CER)
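
The results are reported as Character Error Rate (CER), but this section does not give the formula. The sketch below assumes the usual definition: the character-level Levenshtein distance between the ground-truth text and the OCR output (substitutions S, deletions D, insertions I) divided by the number of reference characters N, that is CER = (S + D + I) / N. The function names are illustrative and are not taken from the authors' code.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (dynamic programming)."""
    # previous[j] = edit distance between the reference prefix seen so far
    # (one character shorter than the current row) and hypothesis[:j]
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning reference[:i] into "" takes i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed CER definition: (S + D + I) / N, N = len(reference)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Two substitutions ("O" -> "0", "l" -> "I") over 12 reference characters.
    print(character_error_rate("OCR pipeline", "0CR pipeIine"))
```

Lower is better. Because insertions also count as errors, CER can exceed 1.0 when the OCR output is much longer than the reference.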

An estimated 285 million people have visual impairments, and approximately 30% of them are blind. Character recognition techniques offer one potential way to improve their access to printed text. Advances in digital image processing, together with the growth in available computing power, have made it possible to implement OCR systems effectively.

Optical character recognition uses specialized software to recognize individual characters in printed or handwritten text and convert them into a machine-readable format. OCR has transformed the way we handle, store, and analyze text data: it makes it possible to digitize and preserve valuable records and improves accessibility for people with visual impairments. The technology has been widely adopted across sectors, particularly banking, healthcare, publishing, and education, where efficient and precise text recognition is essential. OCR involves two crucial steps: text detection, which locates the text in an image, followed by text recognition, which extracts the characters from the detected regions. Only a few of the OCR engines used in current research are free and open source. Depending on the amount of noise in the document images, their accuracy ranges from 70% to 98%.
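
Because accuracy depends on image noise, the proposed pre-processing aims to clean the image before recognition. This section does not list the exact operations, so the following is only a sketch of two common choices, assumed here for illustration: a 3x3 median filter to remove salt-and-pepper speckle, followed by global Otsu thresholding to binarize the page. It works on a grayscale image stored as a list of rows of 0 to 255 integers so that it runs with the standard library alone. All names are hypothetical.

```python
from statistics import median

GrayImage = list[list[int]]  # rows of pixels, 0 = black ink, 255 = white paper


def median_filter(image: GrayImage, radius: int = 1) -> GrayImage:
    """Replace each pixel by the median of its (2r+1) x (2r+1) neighbourhood,
    shrinking the window at the borders; removes isolated speckle."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


def otsu_threshold(image: GrayImage) -> int:
    """Grey level t that maximises the between-class variance of the two
    classes {pixels <= t} and {pixels > t} (Otsu's method)."""
    histogram = [0] * 256
    for row in image:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_score = 0, -1.0
    weight_low = sum_low = 0
    for t in range(256):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so this is not a valid split
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Proportional to the between-class variance; the constant factor
        # 1 / total**2 does not change which t wins.
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(image: GrayImage, threshold: int) -> GrayImage:
    """Pixels above the threshold become white (255), the rest black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in image]


def preprocess(image: GrayImage) -> GrayImage:
    """Denoise first, then binarize with a threshold chosen on the cleaned image."""
    denoised = median_filter(image)
    return binarize(denoised, otsu_threshold(denoised))
```

In practice the same two steps are usually done with OpenCV (`cv2.medianBlur`, then `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU`) on a NumPy array rather than with pure-Python loops. Note that a 3x3 median window can erase strokes only one pixel wide, so the radius has to be matched to the scan resolution.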

Text-to-speech (TTS) and optical character recognition (OCR) technologies exist to help address the challenges faced by people with visual impairments, but they are often not integrated into a single, user-friendly pipeline. This lack of integration makes converting images of text into speech cumbersome and time-consuming. There is therefore a need for an integrated package that combines OCR and TTS so that images of text can be converted to speech seamlessly and efficiently. Such a package would give individuals with visual impairments or reading difficulties greater access to written information, improving their independence, quality of life, and participation in society. The motivation behind this work is to develop a pipeline that lets individuals access and comprehend textual information easily, regardless of their visual or reading abilities.
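
The integrated pipeline motivated here can be sketched as three stages chained in one function: OCR, a small text clean-up, and speech synthesis. The authors' implementation is not shown in this section. In this sketch the OCR stage is assumed to be Tesseract through `pytesseract` (a stand-in, not necessarily the paper's engine), the TTS stage uses gTTS as the abstract describes, and the clean-up step and all function names are hypothetical. Both stages are passed in as parameters, which keeps the glue logic testable without network access.

```python
import re
from typing import Callable


def tesseract_ocr(image_path: str) -> str:
    """Assumed OCR stage: Tesseract via pytesseract (a stand-in engine)."""
    # Imported lazily so the module loads even where these packages are absent.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def gtts_speak(text: str, out_path: str) -> None:
    """TTS stage: gTTS calls Google Translate's TTS endpoint (needs network)."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(out_path)


def clean_ocr_text(raw: str) -> str:
    """Hypothetical clean-up: rejoin words split by a hyphen at a line break
    and collapse runs of whitespace so the voice reads continuous sentences.
    Caveat: a genuine hyphenated compound broken at a line end is merged too."""
    dehyphenated = re.sub(r"-\n\s*", "", raw)
    return " ".join(dehyphenated.split())


def image_to_speech(
    image_path: str,
    out_path: str,
    ocr: Callable[[str], str] = tesseract_ocr,
    speak: Callable[[str, str], None] = gtts_speak,
) -> str:
    """Run OCR, then clean-up, then TTS; return the text that was spoken."""
    text = clean_ocr_text(ocr(image_path))
    if not text:
        raise ValueError(f"no text recognised in {image_path!r}")
    speak(text, out_path)
    return text
```

For example, `image_to_speech("page.png", "page.mp3")` would write an MP3 of the recognized text, assuming Tesseract, `pytesseract`, Pillow and `gtts` are installed and the machine is online. In the full pipeline, image pre-processing would run on the page before the OCR stage.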
