Robust Features for Automatic Text-Independent Speaker Recognition using Ergodic Hidden Markov Models (HMMs)

R. Rajeswara Rao; A. Prasad; Ch. Kedari Rao

Abstract

This paper explores robust features for text-independent speaker recognition. Experimental studies demonstrate that these features capture speaker-specific information effectively when modeled with ergodic hidden Markov models (HMMs). A study of the effect of feature vector size shows that a size in the range of 18-22 captures speaker-specific information effectively for a speech signal sampled at 16 kHz. The proposed speaker recognition system using robust features also requires significantly less data for both training and testing. Finally, speaker recognition performance using robust features is examined for different numbers of mixture components and for different training and test durations. The studies are carried out on the TIMIT database.
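
As a rough illustration of how an ergodic HMM can score an utterance against enrolled speakers, the sketch below runs the forward algorithm in the log domain and picks the speaker whose model gives the highest likelihood. It is a minimal sketch under stated assumptions, not the system described in the paper: the names `log_likelihood` and `identify`, the dictionary layout of a model, the single diagonal Gaussian per state (the paper varies the number of mixture components), and the 2-dimensional toy features (the paper reports 18-22 dimensions) are all hypothetical choices made for brevity. Training with EM is not shown.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(frames, hmm):
    """Forward algorithm in the log domain.

    Assumes an ergodic HMM: every start and transition probability is
    non-zero, so taking their logarithm is safe.
    """
    n = len(hmm["start"])
    alpha = [
        math.log(hmm["start"][j])
        + log_gaussian(frames[0], hmm["means"][j], hmm["vars"][j])
        for j in range(n)
    ]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(hmm["trans"][i][j]) for i in range(n)])
            + log_gaussian(x, hmm["means"][j], hmm["vars"][j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def identify(frames, models):
    """Return the name of the speaker model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

# Toy, hand-set models (hypothetical values, not trained on any data).
models = {
    "speaker_a": {
        "start": [0.5, 0.5],
        "trans": [[0.6, 0.4], [0.3, 0.7]],  # fully connected (ergodic)
        "means": [[0.0, 0.0], [1.0, 1.0]],
        "vars": [[1.0, 1.0], [1.0, 1.0]],
    },
    "speaker_b": {
        "start": [0.5, 0.5],
        "trans": [[0.5, 0.5], [0.5, 0.5]],
        "means": [[5.0, 5.0], [6.0, 6.0]],
        "vars": [[1.0, 1.0], [1.0, 1.0]],
    },
}

# Three toy feature frames lying close to speaker_a's state means.
frames = [[0.1, -0.2], [0.9, 1.1], [0.2, 0.0]]
best = identify(frames, models)
```

Because the test utterance is scored frame by frame without reference to any transcript, the same procedure applies to text-independent recognition; longer test durations simply add more frames to the sum.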

Keywords

Hidden Markov Models (HMMs), MFCC, Robust Features and Speaker.

This work is licensed under a Creative Commons Attribution 3.0 License.