Research Article | OPEN ACCESS
Improve the Quality of Synthetic Speech Trained with Found Data using Silence Cutter
Lau Chee Yong, Tan Tian Swee and Mohd Nizam Mazenan
Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering (FBME), Universiti Teknologi Malaysia, Malaysia
Research Journal of Applied Sciences, Engineering and Technology 2014 14:1691-1694
Received: July 14, 2014 | Accepted: September 20, 2014 | Published: October 10, 2014
Abstract
Using found data as training data in statistical parametric speech synthesis can alleviate the burden of tedious database construction. However, the extra silences present in found data degrade the quality of the synthetic speech, because they can be incorrectly assigned to the training script and result in unnatural output. In this study, a silence cutter was therefore created to eliminate the extra silences in the training data. A Malay speech synthesis system was constructed using found data from the internet, and the silence cutter was used to cut out the extra silences. Synthetic speech trained on found data with and without the silence cutter was evaluated and compared to determine the effect of the silence cutter. The results showed that the silence cutter helped to improve the naturalness of the synthetic speech and to reduce the Word Error Rate (WER) in the intelligibility test. In short, found data can alleviate the problem of preparing high-quality training data, and a silence cutter can be used to refine the found data to generate synthetic speech of better quality.
Keywords:
Found data, hidden Markov model, statistical parametric speech synthesis
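The abstract does not say how the silence cutter detects silence, so the following is only a minimal sketch of one common approach, assuming a simple frame-energy threshold. The function names (`frame_rms`, `cut_silence`) and all parameter values (`frame_len`, `threshold`, `max_silent_frames`) are hypothetical choices for illustration, not the authors' implementation. It shortens every long run of low-energy frames to a fixed maximum and leaves speech frames and short pauses untouched.

```python
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame."""
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append((sum(x * x for x in frame) / len(frame)) ** 0.5)
    return rms


def cut_silence(
    samples: Sequence[float],
    frame_len: int = 160,          # assumed: 10 ms at 16 kHz
    threshold: float = 0.01,       # assumed RMS level below which a frame is silent
    max_silent_frames: int = 20,   # assumed longest pause that is kept
) -> List[float]:
    """Shorten every run of silent frames to at most max_silent_frames."""
    energies = frame_rms(samples, frame_len)
    kept: List[float] = []
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy < threshold:
            silent_run += 1
            if silent_run > max_silent_frames:
                continue  # extra silence: drop this frame
        else:
            silent_run = 0  # speech frame ends the current silent run
        kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept
```

For example, a recording with two speech frames, fifty silent frames, and two more speech frames would come out with only twenty silent frames between the speech segments. A real system would likely need a threshold tuned to the noise floor of each found recording.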
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459