Delta Mel Frequency Cepstral Coefficient based Feature Extraction Algorithm for Continuous Tamil Speech Recognition

M. Kalamani; M. Krishnamoorthi; R.S. Valarmathi

Abstract

Continuous speech recognition for human-machine interfaces remains a challenging problem. It requires more sophisticated preprocessing, such as feature extraction techniques, to overcome the difficulties a recognizer faces under different environmental conditions. In this paper, a Delta Mel Frequency Cepstral Coefficient (delta MFCC) based feature extraction method is proposed for continuous Tamil speech recognition to extract the predominant features. Performance measures are evaluated for different speech recognition models with various numbers of MFCCs per frame. The results show that the proposed delta MFCC (26 MFCCs per frame) provides a significant improvement for all models under various speech signal environments.
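
As a rough illustration of what a delta MFCC feature vector looks like, the sketch below appends first-order delta coefficients to a matrix of static MFCCs. It rests on assumptions not stated in the abstract: that the 26 features per frame are 13 static MFCCs plus their 13 deltas, and that the deltas are computed with the commonly used regression formula over a window of two frames on each side. The function names `delta` and `append_delta` are hypothetical, and the static MFCCs are taken as given rather than computed from audio.

```python
import numpy as np


def delta(features, N=2):
    """First-order delta coefficients via the common regression formula.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)

    features: array of shape (num_frames, num_coeffs).
    N: half-width of the regression window (assumed value, not from the paper).
    Edge frames are handled by repeating the first and last frame.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the boundary frames so every frame has N neighbours on each side.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]    # c_{t+n}
        behind = padded[N - n : N - n + num_frames]   # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def append_delta(static, N=2):
    """Stack static coefficients and their deltas side by side."""
    static = np.asarray(static, dtype=float)
    return np.hstack([static, delta(static, N)])


if __name__ == "__main__":
    # Placeholder input: 10 frames of 13 static coefficients (random values,
    # standing in for MFCCs produced by a real front end).
    rng = np.random.default_rng(0)
    static_mfcc = rng.standard_normal((10, 13))
    feats = append_delta(static_mfcc)
    print(feats.shape)  # (10, 26)
```

With 13 static coefficients per frame, the stacked output has 26 columns, which matches the per-frame feature count quoted in the abstract under the assumption above; the paper's actual configuration may differ.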

Keywords

Delta Features, Energy, Feature Extraction, Speech Recognition.

This work is licensed under a Creative Commons Attribution 3.0 License.
