Effect of Speech Rate on the Speech Recognition Accuracy: A Review

N. Usha Rani, P.N. Girija

Abstract


Humans can understand different speaking styles and speaking rates (normal, fast and slow), but recognizing speech accurately remains a difficult task for computers. Many sources of variation, such as co-articulation, speaker physiology, and differences in formant frequencies between speakers' vocal tracts, affect the accuracy of speech recognition. This review focuses on the variations within the acoustic signal that make the speech recognition task difficult. It summarizes research on the effect of speech rate on speech recognition accuracy, and discusses the features that mainly affect recognition accuracy and the different techniques proposed for improving it. The role of the pronunciation dictionary in achieving better recognition accuracy is also discussed. The review can be used as a source of reference material.

Keywords


Pronunciation Dictionary, Pronunciation Variation, Speech Rate, Speech Recognition, Speech Rate Normalization.
