

Text-Independent Speaker Identification using Residual Feature Extraction Technique
Abstract
Mel Frequency Cepstral Coefficient (MFCC) parameters are derived mainly to represent the spectral envelope, or formant structure, of the vocal tract system. In this paper, a new feature extraction technique, Wavelet Octave Coefficients Of Residues (WOCOR), is proposed to capture the spectro-temporal source excitation characteristics embedded in the linear predictive (LP) residual signal. WOCOR carries the pitch frequency and phase information contained in the residual signal. WOCOR features are called vocal source features because they depend on the source of the speech, namely the pitch generated by the vocal folds. WOCOR is generated by applying a pitch-synchronous wavelet transform to the residual signal, which captures the spectro-temporal characteristics of the excitation signal. Experimental evaluation is carried out on the TIMIT database with 630 speakers using a Gaussian Mixture Model (GMM) and a Naive Bayesian classifier. The results show that speaker identification based on GMM modeling outperforms speaker identification based on the Naive Bayesian classifier: with WOCOR feature extraction, GMM modeling achieves an increase of 6.69% in speaker identification efficiency.
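The feature pipeline described in the abstract (LP inverse filtering, then a pitch-synchronous wavelet analysis of the residual) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (lp_residual, haar_octaves, wocor_like), the LP order of 12, the Haar wavelet, the four octaves, the two sub-groups per octave, and the use of known, evenly spaced pitch marks on a synthetic signal are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def lp_residual(frame, order=12):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    x = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with R a symmetric Toeplitz matrix
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    a = np.linalg.solve(R, r[1:])
    # Inverse filter: e[n] = s[n] - sum_k a_k s[n-k]
    inv_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inv_filter)[:len(frame)]

def haar_octaves(segment, levels=4):
    """Detail coefficients of a Haar DWT, one array per octave (assumed wavelet)."""
    approx = np.asarray(segment, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:  # zero-pad to an even length
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

def wocor_like(residual, pitch_marks, period, levels=4, subgroups=2):
    """One feature vector per pitch mark: norms of wavelet sub-groups per octave."""
    feats = []
    for m in pitch_marks:
        lo, hi = m - period, m + period  # two pitch periods around the mark
        if lo < 0 or hi > len(residual):
            continue
        seg = residual[lo:hi] * np.hamming(2 * period)
        seg = seg / (np.linalg.norm(seg) + 1e-12)  # remove gain dependence
        vec = [np.linalg.norm(part)
               for d in haar_octaves(seg, levels)
               for part in np.array_split(d, subgroups)]
        feats.append(vec)
    return np.array(feats)

# Synthetic "voiced speech": impulse train through a stable two-pole resonator.
period, n = 80, 1600  # 100 Hz pitch at an assumed 8 kHz sampling rate
excitation = np.zeros(n)
excitation[::period] = 1.0
speech = np.zeros(n)
for i in range(n):
    prev1 = speech[i - 1] if i >= 1 else 0.0
    prev2 = speech[i - 2] if i >= 2 else 0.0
    speech[i] = excitation[i] + 1.3 * prev1 - 0.8 * prev2

residual = lp_residual(speech)
pitch_marks = np.arange(period, n, period)  # assumed known; real data needs pitch tracking
features = wocor_like(residual, pitch_marks, period)
print(features.shape)
```

In practice the LP analysis would run frame by frame on real speech and the pitch marks would come from a pitch tracker; the resulting vectors would then be passed to the GMM or Naive Bayesian classifier.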

This work is licensed under a Creative Commons Attribution 3.0 License.