Speech Recognition System for Isolated Words Using Ensemble-based Classification

Salaheddin H. Kanoun; Nabil M. Drawil; Ahmed M. Mehdawi; Musbah G. Saad

Abstract

Automatic Speech Recognition (ASR) can be defined as the independent, computer-driven transcription of spoken language into readable text in real time. In this paper, an ASR system was developed to recognize the ten English digits spoken as isolated words. The workflow of the developed system starts with the speech signal being acquired from a microphone and brought into the Matlab development environment for analysis. Next, a word-detection algorithm was used to separate each uttered digit from ambient noise and silence. Then, 13 Mel Frequency Cepstral Coefficients (MFCCs) were computed to represent the features of each frame of the speech signal. Finally, an ensemble of four classifiers was used to improve the overall classification performance. The ensemble consisted of 1-Nearest Neighbor (1NN), 5-Nearest Neighbor (5NN), Dynamic Time Warping (DTW), and Minimum Euclidean Distance (MED) classifiers. The outputs of these classifiers were combined by majority voting to produce the final predicted class label. In the case of a tie, i.e. no majority, a sensitivity measure (Se%) was calculated for each classifier, and the unknown uttered digit was assigned to the class predicted by the classifier with the highest sensitivity value. The developed ASR system was trained and tested on datasets of 250 samples each. The ensemble-based classification proved more accurate than any of the standalone classifiers.
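
The voting rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Matlab implementation. The function names, the dictionary layout, and the sensitivity values are hypothetical. Two points the abstract leaves open are also assumed here: a label wins outright when it has strictly more votes than every other label, and in a tie each classifier is scored by its sensitivity for the class it predicted, taken from a confusion matrix whose rows are true classes.

```python
from collections import Counter


def sensitivity_per_class(confusion):
    """Per-class sensitivity TP / (TP + FN) from a square confusion matrix.

    Assumption: confusion[i][j] counts samples of true class i predicted as j.
    """
    result = []
    for i, row in enumerate(confusion):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result


def ensemble_vote(predictions, sensitivity):
    """Combine classifier outputs by voting, breaking ties by sensitivity.

    predictions: dict mapping classifier name -> predicted digit label.
    sensitivity: dict mapping classifier name -> sequence of per-class
                 sensitivities (hypothetical layout, indexed by digit).
    """
    counts = Counter(predictions.values()).most_common()
    top_votes = counts[0][1]
    leaders = [label for label, n in counts if n == top_votes]
    if len(leaders) == 1:
        # A single label has the most votes: accept it.
        return leaders[0]
    # Tie: trust the classifier that is most sensitive for its own prediction.
    best = max(predictions, key=lambda name: sensitivity[name][predictions[name]])
    return predictions[best]


if __name__ == "__main__":
    # Hypothetical per-class sensitivities for the four classifiers.
    sens = {
        "1NN": [0.80] * 10,
        "5NN": [0.85] * 10,
        "DTW": [0.95] * 10,
        "MED": [0.70] * 10,
    }
    clear = {"1NN": 3, "5NN": 3, "DTW": 8, "MED": 3}
    tied = {"1NN": 3, "5NN": 3, "DTW": 8, "MED": 8}
    print(ensemble_vote(clear, sens))  # three of four classifiers agree
    print(ensemble_vote(tied, sens))   # 2-2 split, resolved by sensitivity
```

With four classifiers, the tie-break path is reached only on a 2-2 split or when all four classifiers disagree, so the sensitivity table matters only for those cases.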

Keywords

Speech Recognition, Word Detection, Mel Frequency, Classifier Ensemble, KNN Classifier, MED Classifier, DTW Classifier, Pre-Emphasis, Confusion Matrix.
