
Speech Emotion Recognition with Low Level and Prosodic Features: A Review

S. M. Jagdale

Abstract


Emotion recognition from speech is an interesting area of human-computer interaction. This paper surveys speech emotion classification using low-level and prosodic features, addressing important aspects of the design of a speech emotion recognition system: the selection of suitable features for speech representation, the choice of an appropriate classification scheme, and the proper preparation of an emotional speech database for evaluating system performance. The issue of robust speech emotion recognition is also addressed. Speech emotion recognition has potentially wide applications, for example in banking, call centers, in-car systems, and computer games.
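To make the distinction concrete, the abstract's "low-level" features are computed per short frame of the signal, while "prosodic" features summarize contours (energy, pitch) over the whole utterance. The following is a minimal NumPy sketch of this idea, not taken from the paper itself; frame sizes, the choice of energy and zero-crossing rate as low-level descriptors, and the summary statistics are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms hop at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def low_level_features(frames):
    """Per-frame energy and zero-crossing rate: two typical low-level descriptors."""
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def prosodic_summary(energy):
    """Utterance-level prosodic statistics derived from the frame energy contour."""
    return {"energy_mean": float(np.mean(energy)),
            "energy_std": float(np.std(energy)),
            "energy_range": float(np.max(energy) - np.min(energy))}

if __name__ == "__main__":
    # Synthetic 1-second "utterance": a 220 Hz tone with slow amplitude modulation,
    # standing in for real speech just to exercise the feature pipeline.
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 2 * t))
    frames = frame_signal(x)
    energy, zcr = low_level_features(frames)
    print(prosodic_summary(energy))
```

In a full recognizer, vectors like these (typically extended with MFCCs and pitch statistics) would be fed to the classification schemes compared in the surveyed work, such as GMMs, SVMs, or neural networks.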


Keywords


Classification Scheme, Emotional Speech, Low-Level Features, Prosodic Features, Robustness


References


Moataz El Ayadi, Mohamed S. Kamel, Fakhri Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Elsevier Pattern Recognition, 2011, pp. 572-587.

Tomi Kinnunen, Haizhou Li, "An overview of text-independent speaker recognition: From features to supervectors," Elsevier Speech Communication, 2010, pp. 12-40.

Chung-Hsien Wu, Wei-Bin Liang, "Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels," IEEE Transactions on Affective Computing, Vol. 2, No. 1, 2011, pp. 10-21.

Elizabeth Shriberg, "Higher-Level Features in Speaker Recognition," Springer, Speaker Classification, 2007, pp. 241-259.

Hanwu Sun, Bin Ma, Haizhou Li, "An Efficient Feature Selection Method for Speaker Recognition," IEEE conference, 2008, pp. 1-4.

Alfredo Maesa, Fabio Garzia, Michele Scarpiniti, Roberto C., "Text Independent Automatic Speaker Recognition System Using Mel-Frequency Cepstrum Coefficient and Gaussian Mixture Models," Journal of Information Security, Vol. 3, 2012, pp. 335-340.

Joseph P. Campbell, D. A. Reynolds, R. B. Dunn, "Fusing High- and Low-Level Features for Speaker Recognition," Eurospeech, 2003, pp. 2665-2668.

S. Chakroborty, A. Roy, G. Saha, "Fusion of a Complementary Feature Set with MFCC for Improved Closed-Set Text-Independent Speaker Identification," IEEE International Conference on Computing and Processing, 2006, pp. 387-389.

Tim Polzehl, A. Schmitt, F. Metze, "Anger recognition in speech using acoustic and linguistic cues," Elsevier Speech Communication, 2011, pp. 1198-1209.

W. M. Campbell, "Compensating for Mismatch in High-Level Speaker Recognition," IEEE Odyssey: The Speaker and Language Recognition Workshop, 2006.

Tomi Kinnunen, Evgeny Karpov, Pasi Franti, "Real-Time Speaker Identification and Verification," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 1, 2006, pp. 277-288.

Jeong-Sik Park, Ji-Hwan Kim, Yung-Hwan Oh, "Feature Vector Classification based Speech Emotion Recognition for Service Robots," IEEE Transactions on Consumer Electronics, Vol. 55, No. 3, 2009, pp. 1590-1596.

D. G. Romero, J. F. Aguilar, J. Gonzalez, J. O. Garcia, "Support Vector Machine Fusion of Idiolectal and Acoustic Speaker Information in Spanish Conversational Speech," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003, pp. II-229-232.

D. Reynolds, W. Andrews, J. Campbell, J. Navratil, B. Peskin, Andre Adami, Qin Jin, D. Klusacek, "The SuperSID Project: Exploring High-Level Information for High-Accuracy Speaker Recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003, pp. IV-784-787.

W. M. Campbell, J. P. Campbell, T. P. Gleason, D. A. Reynolds, "Speaker Verification Using Support Vector Machines and High-Level Features," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, 2007, pp. 2085-2094.

Pawan Kumar, Mahesh Chandra, "Hybrid of Wavelet and MFCC Features for Speaker Verification," IEEE Conference on Information and Communication Technologies, 2011, pp. 1150-1154.

Anurag Jain, Nupur Prakash, S. S. Agrawal, "Evaluation of MFCC for Emotion Identification in Hindi Speech," IEEE conference, 2011, pp. 189-193.

Siqing Wu, Tiago H. Falk, Wai-Yip Chan, "Automatic speech emotion recognition using modulation spectral features," Elsevier Speech Communication, 2011, pp. 768-785.

Carlos Busso, Soroosh Mariooryad, Angeliki Metallinou, Shrikanth Narayanan, "Iterative Feature Normalization Scheme for Automatic Emotion Detection from Speech," IEEE Transactions on Affective Computing, 2012, pp. 1-14.

Tauhidur Rahman, Carlos Busso, "A Personalized Emotion Recognition System Using an Unsupervised Feature Adaptation Scheme," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 5117-5120.

Soroosh Mariooryad, Carlos Busso, "Exploring Cross-Modality Affective Reactions for Audiovisual Emotion Recognition," IEEE Transactions on Affective Computing, 2013, pp. 1-15.

Mohammed Abdelwahab, Carlos Busso, "Supervised Domain Adaptation for Emotion Recognition from Speech," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

M. Murugappan, Nurul Qasturi Idayu Baharuddin, S. Jerritta, "DWT and MFCC Based Human Emotional Speech Classification Using LDA," IEEE International Conference on Biomedical Engineering, 2011.

Ravi P. Ramachandran, Kevin R. Farrell, Roopashri Ramachandran, Richard J. Mammone, "Speaker recognition: general classifier approaches and data fusion methods," Elsevier Pattern Recognition, 2002, pp. 2801-2821.

Mehmet S. Unluturk, Kaya Oguz, Coskun Atay, "Emotion Recognition Using Neural Networks," 10th WSEAS International Conference on Neural Networks, pp. 82-85.

Joy Nicholson, Kazuhiko Takahashi, Ryohei Nakatsu, "Emotion Recognition in Speech Using Neural Networks," IEEE Conference, 1999, pp. 495-501.

Marcel Kockmann, Lukas Burget, Jan Honza Cernocky, "Application of speaker- and language identification state-of-the-art techniques for emotion recognition," Elsevier Speech Communication, 2011, pp. 1172-1185.

William Yang Wang, Fadi Biadsy, Andrew Rosenberg, Julia Hirschberg, "Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification," Elsevier Computer Speech and Language, 2013, pp. 168-189.




This work is licensed under a Creative Commons Attribution 3.0 License.