
An Automatic Tool for High Quality Arabic Speech Synthesis

Abdelkader CHABCHOUB, Adnen CHERIF

Abstract


Text-to-speech (TTS) synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems render speech by concatenating pre-recorded speech units. This work describes an Arabic TTS synthesis system that uses an automatic tool based on the concatenation of Arabic diphones with the MBROLA synthesizer. The quality of the synthesized speech is improved by a detailed analysis of the spectral features of the voice source across various F0 ranges and timbres. The system generates speech based on the estimation and optimization of Arabic prosody, classifying the voice source into different types. The developed model improves the naturalness and intelligibility of the synthesized speech in various speaking environments.
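As a minimal sketch (not the authors' tool) of how prosody reaches a diphone synthesizer such as MBROLA: MBROLA reads a `.pho` file in which each line gives a phoneme symbol, its duration in milliseconds, and optional pitch targets written as pairs of (position within the segment in %, F0 in Hz). The phoneme symbols, durations, and the simple linear F0 declination below are hypothetical placeholders standing in for the paper's prosody estimation; in practice the symbols must match those defined by the Arabic voice database actually used (for example MBROLA's `ar1` or `ar2`).

```python
"""Hypothetical sketch: build an MBROLA .pho file with a simple F0 contour."""
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    symbol: str       # phoneme symbol; assumed to exist in the chosen MBROLA voice
    duration_ms: int  # segment duration in milliseconds


def declining_f0(t: float, f0_start: float, f0_end: float) -> float:
    """Linear F0 declination over the utterance; t is normalized time in [0, 1].

    This stands in for a real prosody model and is an assumption of the sketch.
    """
    return f0_start + (f0_end - f0_start) * t


def to_pho(phones: List[Phone], f0_start: float = 140.0, f0_end: float = 100.0) -> str:
    """Return .pho text: 'symbol duration [pos% F0Hz]...' per line."""
    total = sum(p.duration_ms for p in phones)
    lines = ["; hypothetical example, generated for MBROLA"]  # ';' starts a comment
    elapsed = 0
    for p in phones:
        if p.symbol == "_":
            # '_' is MBROLA's silence symbol; no pitch targets are given for it
            lines.append(f"{p.symbol} {p.duration_ms}")
        else:
            # pitch targets at the start, middle, and end of the segment
            targets = []
            for pct in (0, 50, 100):
                t = (elapsed + p.duration_ms * pct / 100) / total
                targets.append(f"{pct} {round(declining_f0(t, f0_start, f0_end))}")
            lines.append(f"{p.symbol} {p.duration_ms} " + " ".join(targets))
        elapsed += p.duration_ms
    return "\n".join(lines) + "\n"


# Hypothetical phoneme sequence for "salaam"; symbols and durations are placeholders.
example = [
    Phone("_", 100), Phone("s", 90), Phone("a", 120), Phone("l", 70),
    Phone("a:", 180), Phone("m", 90), Phone("_", 100),
]

if __name__ == "__main__":
    with open("salam.pho", "w", encoding="utf-8") as f:
        f.write(to_pho(example))
    print(to_pho(example))
```

Assuming MBROLA and an Arabic voice are installed, the saved file could then be rendered with a command of the form `mbrola <path-to-ar1-database> salam.pho salam.wav`; the database path depends on the installation. Keeping prosody in the `.pho` file separate from the diphone database is what lets a prosody model be tuned without re-recording any speech units.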


Keywords


Analysis, Synthesis, Diphone, Prosody, Formant, Pitch, Timbre, MBROLA, Arabic Speech





This work is licensed under a Creative Commons Attribution 3.0 License.