Automatic Speech Recognition for the Tamil Language: A Complete Model

Authors: M. Chandrasekar, M. Ponnavaikko

Journal: Technical Acoustics @ejta

Article in issue: vol. 8, 2008.

Free access

The article presents a new method for building an automatic speech recognition system for the Tamil language. A segmentation algorithm is developed that extracts words and individual characters (letters) from the speech signal. The reverse of this algorithm is then used to train the system. The proposed method was tested experimentally and proved effective.

Speech recognition, Tamil language

Short URL: https://sciup.org/14316103

IDR: 14316103

Research article