Research Journal of Applied Sciences, Engineering and Technology


An Extraction Method of Acoustic Features for Speech Recognition

Ibrahim Missaoui¹ and Zied Lachiri¹,²
¹Signal, Image and Pattern Recognition Laboratory, National Engineering School of Tunis (ENIT), University of Tunis El Manar, BP 37 Belvédère, 1002 Tunis, Tunisia
²Physics and Instrumentation Department, National Institute of Applied Science and Technology, University of Carthage, BP 676, 1080 Tunis, Tunisia
Research Journal of Applied Sciences, Engineering and Technology, 2016, 12(9): 964-967
http://dx.doi.org/10.19026/rjaset.12.2814  |  © The Author(s) 2016
Received: November 30, 2015  |  Accepted: February 10, 2016  |  Published: May 05, 2016

Abstract

This study presents a novel method for extracting acoustic features for the recognition of isolated spoken words. The extraction method uses a bank of 41 Gabor filters, which select specific modulation frequencies and limit information redundancy at the feature level. The robustness and performance of the proposed features, named Gabor Mel Spectrum features (GMS features), are evaluated on isolated words in both clean and noisy environments and compared with those of two classic feature types, PLP and MFCC. Recognition results obtained with hidden Markov models (HMMs) show that our extraction method is more robust and achieves better recognition rates than these two methods.
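The abstract does not give the filter parameters, so the following is only a minimal sketch, assuming a GMS-style front end convolves a log-Mel spectrogram (Mel channels x frames) with 2-D spectro-temporal Gabor filters and stacks the real-valued outputs into one vector per frame. The function names `gabor_kernel`, `gabor_filter` and `gabor_features`, the Hann envelope (used here in place of a Gaussian one), the zero padding at the borders and the four-filter toy bank are all hypothetical illustrations, not the authors' 41-filter bank. The sketch also applies no redundancy reduction. It uses only the Python standard library to stay self-contained; a practical version would use array libraries.

```python
import cmath
import math


def hann(length):
    """Hann window without zero end-points, so every tap contributes."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1)) for i in range(length)]


def gabor_kernel(omega_k, omega_n, size_k, size_n):
    """Hypothetical complex 2-D Gabor kernel over (Mel channel k, frame n).

    omega_k: spectral modulation frequency in radians per channel.
    omega_n: temporal modulation frequency in radians per frame.
    Sizes must be odd so the kernel has a centre tap.
    """
    assert size_k % 2 == 1 and size_n % 2 == 1, "kernel sizes must be odd"
    wk, wn = hann(size_k), hann(size_n)
    ck, cn = size_k // 2, size_n // 2
    env = [[wk[k] * wn[n] for n in range(size_n)] for k in range(size_k)]
    # Envelope times a complex carrier centred on the middle tap.
    kern = [[env[k][n] * cmath.exp(1j * (omega_k * (k - ck) + omega_n * (n - cn)))
             for n in range(size_n)] for k in range(size_k)]
    if omega_k != 0 or omega_n != 0:
        # Subtract a scaled envelope so the filter ignores the constant (DC) level.
        dc = sum(map(sum, kern)) / sum(map(sum, env))
        kern = [[kern[k][n] - dc * env[k][n] for n in range(size_n)] for k in range(size_k)]
    return kern


def gabor_filter(log_mel, kern):
    """2-D convolution of a (channels x frames) spectrogram with one kernel.

    Borders are zero-padded (an assumption); the real part is kept.
    """
    n_ch, n_fr = len(log_mel), len(log_mel[0])
    size_k, size_n = len(kern), len(kern[0])
    ck, cn = size_k // 2, size_n // 2
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for k in range(n_ch):
        for n in range(n_fr):
            acc = 0j
            for i in range(size_k):
                for j in range(size_n):
                    kk, nn = k + ck - i, n + cn - j
                    if 0 <= kk < n_ch and 0 <= nn < n_fr:
                        acc += kern[i][j] * log_mel[kk][nn]
            out[k][n] = acc.real
    return out


def gabor_features(log_mel, bank):
    """Stack every filter's output over all channels into one vector per frame."""
    outputs = [gabor_filter(log_mel, gabor_kernel(*params)) for params in bank]
    n_fr = len(log_mel[0])
    return [[row[n] for out in outputs for row in out] for n in range(n_fr)]


if __name__ == "__main__":
    # Hypothetical toy bank of (omega_k, omega_n, size_k, size_n); NOT the paper's 41 filters.
    bank = [(0.0, 0.0, 3, 3), (math.pi / 2, 0.0, 5, 3),
            (0.0, math.pi / 4, 3, 9), (math.pi / 2, math.pi / 4, 5, 9)]
    # Toy 8-channel x 20-frame "log-Mel spectrogram" with a moving spectral ridge.
    log_mel = [[1.0 if k == n % 8 else 0.0 for n in range(20)] for k in range(8)]
    feats = gabor_features(log_mel, bank)
    print(len(feats), len(feats[0]))  # prints "20 32": 20 frames, 4 filters x 8 channels
```

The per-frame vector length grows as filters x channels (32 in this toy case), which illustrates why the abstract stresses limiting redundancy at the feature level before the features reach an HMM recognizer; how the authors achieve that is not described here.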

Keywords:

2-D Gabor filters, acoustic features, clean and noisy environment, speech recognition

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved