
Improving Performance of Multiclass Audio Classification Using SVM

Shrinivas P. Mahajan, Jyotsana Sahu, Mukul S. Sutaone, V. K. Kokate

Abstract


Audio classification has found widespread use in many emerging applications. It involves extracting salient temporal, spectral and statistical features and using them to build an efficient classifier. Most audio classification work to date has addressed binary classification; in this work we identify the features best suited to discriminating between different audio classes. We present an algorithm that segments and classifies an audio stream into male speech, female speech, music, noise and silence, and further segments the speech clips into voiced and unvoiced frames. A number of timbre features that distinguish the different audio classes are discussed. For pre-classification, a threshold-based method using the Probability Density Function (PDF) is applied to each audio clip; for further classification, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are proposed. Experiments were performed to determine the best features for each binary class, and using these features in multiclass classification yielded an accuracy of 96.34% in audio discrimination.
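As a rough, illustrative sketch of the pipeline described above (not the authors' implementation), the Python snippet below pools a few per-frame timbre features (zero-crossing rate, short-time energy and spectral centroid) for each clip and trains a multiclass RBF-kernel SVM with scikit-learn. The frame length, feature set, class labels, SVM parameters and the random placeholder clips are all assumptions made for this example.

# Minimal sketch, assuming frame-level ZCR, short-time energy and spectral
# centroid pooled per clip, followed by an RBF-kernel SVM (scikit-learn SVC).
# Labels, frame length and SVM parameters are illustrative, not the paper's.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

FRAME = 512  # samples per analysis frame (assumed)
CLASSES = ["speech_male", "speech_female", "music", "noise", "silence"]

def clip_features(x, sr=16000):
    """Mean and std of per-frame ZCR, short-time energy and spectral centroid."""
    frames = [x[i:i + FRAME] for i in range(0, len(x) - FRAME, FRAME)]
    feats = []
    for f in frames:
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2.0          # zero-crossing rate
        energy = np.mean(f ** 2)                                   # short-time energy
        spec = np.abs(np.fft.rfft(f))
        freqs = np.fft.rfftfreq(FRAME, 1.0 / sr)
        centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)   # spectral centroid
        feats.append([zcr, energy, centroid])
    feats = np.asarray(feats)
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Placeholder training data: random clips standing in for labelled audio.
rng = np.random.default_rng(0)
X = np.array([clip_features(rng.standard_normal(16000)) for _ in range(50)])
y = rng.integers(0, len(CLASSES), size=50)

# RBF-kernel SVM; SVC handles the five-class problem via one-vs-one decisions.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X, y)
print(CLASSES[clf.predict([clip_features(rng.standard_normal(16000))])[0]])

SVC's one-vs-one decomposition is one way to build the multiclass decision from binary comparisons, which is in the spirit of the per-pair feature selection described in the abstract.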


Keywords


Audio Feature Extraction, Bayesian Classification, K-Nearest Neighbor, Support Vector Machine


References


J. Saunders, "Real-time discrimination of broadcast speech/music," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP-96, Atlanta, GA, pp. 993-996, 1996.

E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP-97, Munich, Germany, pp. 1-28, 1997.

M. J. Carey, E. S. Parris, and H. Lloyd-Thomas, "A comparison of features for speech, music discrimination," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP-99, Phoenix, AZ, pp. 149-152, 1999.

T. Zhang and C. C. J. Kuo, "Hierarchical classification of audio data for archiving and retrieving," in IEEE Int. Conf. on Audio Speech and Signal Processing, ICASSP-99, Phoenix, AZ, USA, pp. 3301-3304, 1999.

K. El-Maleh, M. Klein, G. Petrucci, and P. Kabal, "Speech/music discrimination for multimedia applications," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP-2000, Istanbul, Turkey, pp. 2445-2448, 2000.

L. Lu, H. J. Zhang, and H. Jiang, "Content analysis for audio classification and segmentation," IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 504-516, 2002.

W. Q. Wang, W. Gao, and D. Ying, "A fast and robust speech/music discrimination approach," in Proc. 4th Pacific Rim Conference on Multimedia, vol. 3, IEEE, Piscataway, NJ, pp. 1325-1329, 2003.

J. G. A. Barbedo and A. Lopes, “A robust and computationally efficient speech/music discriminator,” Journal of the Audio Engineering Society, vol. 54, no. 7-8, pp. 571–588, 2006.

E. Alexandre-Cortizo, M. Rosa-Zurera, and F. Lopez-Ferreras, "Application of Fisher linear discriminant analysis to speech/music classification," in Proceedings of the 120th Audio Engineering Society Convention (AES '06), Paris, France, pp. 1666-1669, May 2006.

W. Pan, Y. Yao, Z. Liu, and W. Huang, "Audio classification in a weighted SVM," in Proc. International Symposium on Communications and Information Technologies (ISCIT '07), pp. 468-472, 2007.

J. Wang, Q. Wu, H. Deng, and Q. Yan, "Real-time speech/music classification with a hierarchical oblique decision tree," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2033-2036, March 2008.

N. Ruiz-Reyes, P. Vera-Candeas, J. E. Muñoz, S. García-Galán, and F. J. Cañadas, "New speech/music discrimination approach based on fundamental frequency estimation," Multimedia Tools and Applications, vol. 41, no. 2, pp. 253-286, January 2009.

Y. Lavner and D. Ruinskiy, "A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation," EURASIP Journal on Audio, Speech, and Music Processing, pp. 1-14, Feb. 2009.

Chiu Ying Lay and Ng Hian James, "Gender Classification from Speech," CS5240: Theoretical Foundations of Multimedia, pp. 1-6, Sept. 2000.

W. H. Abdulla and N. K. Kasabov, "Improving speech recognition performance through gender separation", Artificial Neural Networks and Expert Systems International Conference (ANNES), Dunedin, New Zealand, pp. 218-222, 2001.

L. Siegel, "A procedure for using pattern classification techniques to obtain a voiced/unvoiced classifier," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 83-88, Feb. 1979.

C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, pp. 1-43, 1998.

V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988-999, 1999.

S. S. Keerthi and C.-J. Lin, "Asymptotic behaviors of support vector machines with Gaussian kernel," Neural Computation, vol. 15, no. 7, pp. 1667-1689, 2003.

C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A Practical Guide to Support Vector Classification," Department of Computer Science, National Taiwan University, Taipei, Taiwan, pp. 1-15, 2009.

Dan Ellis Database: http://www.ee.columbia.edu/~dpwe/sounds/musp/music-speech-20100223.tgz.

Database for White Noise: http://www.jetcityorange.com/SoundFiles/WhiteNoise.mp3




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.