Research Article | OPEN ACCESS
Speech Intelligibility Prediction Intended for State-of-the-Art Noise Estimation Algorithms
1Nasir Saleem, 2Sher Ali, 1Ehtasham Mustafa and 3Usman Khan
1Institute of Engineering and Technology, GU, D.I. Khan, KPK, Pakistan
2City University of Science and Technology, Peshawar, KPK, Pakistan
3University of Engineering and Technology, Kohat Campus, Kohat, KPK, Pakistan
Research Journal of Applied Sciences, Engineering and Technology 2014 2:296-302
Received: April 05, 2013 | Accepted: April 29, 2013 | Published: January 10, 2014
Abstract
Noise estimation is a critical component of any speech enhancement system. In the presence of additive non-stationary background noise, speech is difficult to understand for normal-hearing listeners and even more so for hearing-impaired listeners. Background noise reduces both the intelligibility and the perceptual quality of speech. Speech enhancement, supported by various noise estimation techniques, attempts to minimize the interfering components and improve the intelligibility and perceptual quality of the degraded speech. This study addresses the selection of a suitable noise estimation algorithm in a speech enhancement system for intelligent hearing. An airport noise environment is considered. Clean speech is corrupted by airport noise at noise levels ranging from 0 to 15 dB. Six noise estimation algorithms are used to estimate the noise: Minima Controlled Recursive Averaging (MCRA), MCRA-2, improved MCRA (IMCRA), Martin's minimum tracking, continuous spectral minimum tracking, and weighted spectral averaging. A spectral subtraction algorithm is used to enhance the noisy speech. The intelligibility of the enhanced speech is assessed with the fractional Articulation Index (fAI) and SNRLOSS.
Keywords:
fAI, IMCRA, MCRA, MCRA-2, noise estimate, SNRLOSS, spectral subtraction
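The experimental setup described in the abstract (corrupting clean speech at a set noise level, then enhancing it by spectral subtraction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the 0 to 15 dB levels are signal-to-noise ratios taken in 5 dB steps, it stands in a sine tone and Gaussian noise for the speech and airport recordings, and the function names (`mix_at_snr`, `subtract_power`) and the spectral floor value are hypothetical choices made for illustration.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it."""
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def subtract_power(noisy_psd, noise_psd, floor=0.01):
    """Power spectral subtraction for one frame.

    Each bin becomes max(noisy - noise_estimate, floor * noisy); the floor
    (an assumed value) keeps the result from going negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_psd, noise_psd)]


# Stand-in signals (assumptions): a 440 Hz tone at 8 kHz and white Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

# Assumed 5 dB steps across the 0 to 15 dB range.
for snr_db in (0, 5, 10, 15):
    noisy = mix_at_snr(clean, noise, snr_db)
    added = [y - c for y, c in zip(noisy, clean)]
    measured = 10 * math.log10(power(clean) / power(added))
    print(f"target {snr_db:2d} dB, measured {measured:.2f} dB")
```

In a full system, the `noise_psd` argument would come from one of the six noise estimators under comparison, which is where the choice of estimator affects the enhanced speech.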
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459