An Efficient Approach for Text-to-Speech Conversion Using Machine Learning and Image Processing Technique

Smt. Swaroopa Shastri; Shashank Vishwakarma

doi:10.5815/ijem.2023.04.05


Authors: Smt. Swaroopa Shastri, Shashank Vishwakarma

Journal: International Journal of Engineering and Manufacturing (IJEM)

Article in issue: Vol. 13, No. 4, 2023

Free access

This study explores the conversion of English to Hindi, first to text and subsequently to speech. The first part of the implementation is text recognition from images, in which two approaches are used for character recognition: maximally stable extremal regions (MSER) and grayscale conversion. The second part deals with geometric filtering in combination with the stroke width transform (SWT). Subsequently, letters are grouped to detect text sequences, which are then segmented into words. Finally, a spell check with 96 percent accuracy is performed using naive Bayes and decision tree algorithms, followed by optical character recognition (OCR) to digitize the words. The recognized text is then passed to a text-to-speech (TTS) synthesizer, which converts it to Hindi using the text-to-speech model. The output is assessed on aspects such as speech speed, sound quality, pronunciation, and clarity.
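The geometric filtering and letter-grouping steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes candidate regions have already been extracted (for example by an MSER detector) and are available as bounding boxes with a foreground pixel count. The names `Box`, `is_text_like`, and `group_into_words`, and all threshold values, are hypothetical choices made for illustration only. The stroke width transform is not covered here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Bounding box of one candidate region (hypothetical representation)."""
    x: int     # left edge in pixels
    y: int     # top edge in pixels
    w: int     # width in pixels
    h: int     # height in pixels
    area: int  # number of foreground pixels inside the box


def is_text_like(b: Box,
                 min_aspect: float = 0.1, max_aspect: float = 3.0,
                 min_extent: float = 0.2, max_extent: float = 0.9) -> bool:
    """Keep regions whose aspect ratio and extent fall in assumed ranges."""
    if b.w <= 0 or b.h <= 0:
        return False
    aspect = b.w / b.h             # width-to-height ratio
    extent = b.area / (b.w * b.h)  # fraction of the box that is foreground
    return (min_aspect <= aspect <= max_aspect
            and min_extent <= extent <= max_extent)


def group_into_words(boxes: List[Box], gap_ratio: float = 0.5) -> List[List[Box]]:
    """Split one text line into words wherever the horizontal gap between
    neighbouring boxes exceeds gap_ratio times the mean box height."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b.x)
    mean_h = sum(b.h for b in ordered) / len(ordered)
    words = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.w)
        if gap > gap_ratio * mean_h:
            words.append([cur])    # large gap: start a new word
        else:
            words[-1].append(cur)  # small gap: same word
    return words


# Made-up candidates: five letter-sized boxes and one long, thin non-text bar.
candidates = [
    Box(0, 0, 10, 20, 100),
    Box(12, 0, 10, 20, 90),
    Box(24, 0, 10, 20, 110),
    Box(50, 0, 10, 20, 100),
    Box(62, 0, 10, 20, 95),
    Box(0, 40, 200, 5, 950),
]
letters = [b for b in candidates if is_text_like(b)]
words = group_into_words(letters)
print([len(word) for word in words])  # prints [3, 2]
```

In a full pipeline, each word group would be cropped from the image and passed to an OCR engine before translation and speech synthesis. The thresholds above would need tuning on real data.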

Keywords: Image processing, MSER, OCR, Geometrical properties, SWT, TTS synthesizer

Short address: https://sciup.org/15018705

IDR: 15018705 | DOI: 10.5815/ijem.2023.04.05
