Automatic Speech Recognition using Vector Quantization Concept

Anjali Diwan; Bhargav Ravat

Abstract

Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication. Digital signal processing (DSP) has assumed a central role in speech studies. This paper introduces the design and implementation of an automatic speech recognition (ASR) system for recognizing pathological and normal voices using the vector quantization (VQ) algorithm. The system works in two steps. First, feature vectors are extracted using Mel Frequency Cepstral Coefficients (MFCC); the acoustic parameters extracted from the voice signals are used as the input to this MFCC stage. Second, the feature vectors are classified using vector quantization. The main advantages of this method are its low computation time and the possibility of developing a real-time system.
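The two-step pipeline described in the abstract (MFCC feature extraction followed by VQ classification) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. The frame length and hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), the 0.97 pre-emphasis factor, 26 mel filters, 13 cepstral coefficients, the LBG codebook-splitting procedure and the codebook size are common textbook choices assumed here, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula: 2595 * log10(1 + f / 700).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Default sizes are assumptions (25 ms frames, 10 ms hop at 16 kHz).
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames (zero-pad so at least one frame exists).
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # 3. Apply a Hamming window to each frame.
    frames = emphasized[idx] * np.hamming(frame_len)
    # 4. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 5. Log mel filterbank energies (small floor avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 6. Unnormalized DCT-II; keep the first n_ceps coefficients per frame.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return energies @ dct.T  # shape: (n_frames, n_ceps)


def train_codebook(features, size=16, eps=0.01, iters=20):
    # LBG: start from the global centroid and repeatedly split every codeword,
    # refining with k-means after each split (size assumed a power of two).
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                members = features[nearest == j]
                if len(members):  # leave empty cells unchanged
                    codebook[j] = members.mean(axis=0)
    return codebook


def distortion(features, codebook):
    # Average squared distance from each frame to its nearest codeword.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()


def classify(features, codebooks):
    # Choose the label whose codebook quantizes the input with least distortion.
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))
```

In this sketch each class (for example, normal or pathological voice) gets its own codebook trained on that class's MFCC frames, e.g. `books = {"normal": train_codebook(mfcc(x_normal, sr)), "pathological": train_codebook(mfcc(x_path, sr))}` with hypothetical training signals `x_normal` and `x_path`. An unknown recording is then labeled with `classify(mfcc(x_unknown, sr), books)`. Matching frames against a small codebook needs only distance computations, which keeps the classification step cheap.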

Keywords

Mel Frequency Cepstral Coefficient (MFCC), Acoustic Parameters, Speech Processing, Vector Quantization, ASR, Hamming Window, Feature Extraction

This work is licensed under a Creative Commons Attribution 3.0 License.