Improve the Quality of Synthetic Speech Trained with Found Data using Silence Cutter

Lau Chee Yong; Tan Tian Swee; Mohd Nizam Mazenan

doi:10.19026/rjaset.8.1151

Research Journal of Applied Sciences, Engineering and Technology

Research Article | OPEN ACCESS

Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering (FBME), Universiti Teknologi Malaysia, Malaysia

Research Journal of Applied Sciences, Engineering and Technology 2014 14:1691-1694

http://dx.doi.org/10.19026/rjaset.8.1151 | © The Author(s) 2014

Received: July 14, 2014 | Accepted: September 20, 2014 | Published: October 10, 2014

Abstract

Using found data as training data in statistical parametric speech synthesis can alleviate many of the problems of tedious database construction. However, the extra silences present in found data degrade the quality of the synthetic speech. The motivation for this study is that these extra silences would otherwise be incorrectly assigned to the training script, resulting in unnatural synthetic speech. A Malay speech synthesis system was therefore constructed using found data from the internet, and a silence cutter was created to eliminate the extra silences in the training data. Synthetic speech trained on found data with and without the silence cutter was evaluated and compared to determine the effect of the silence cutter. The results showed that the silence cutter helped to improve the naturalness of the synthetic speech and reduced the Word Error Rate (WER) in an intelligibility test. In short, using found data can alleviate the problem of preparing high-quality training data, and a silence cutter can be used to refine the found data to generate better-quality synthetic speech.
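The abstract does not state how the silence cutter decides which silences are "extra". One common way to build such a tool is short-time energy thresholding, and the Python sketch below assumes that approach. The function names and the default values of `frame_ms`, `threshold_db` and `max_silence_s` are hypothetical illustrations, not the authors' settings. Pauses up to a maximum length are kept and the remainder is dropped, so ordinary phrase breaks survive while long gaps in found recordings are shortened.

```python
import math
from typing import List, Sequence

# Assumption: silence is detected with a simple short-time energy threshold.
# The paper's actual silence-cutter criteria are not given in the abstract.


def frame_energies_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean power of each non-overlapping frame in dB (floored to avoid log(0))."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / len(frame)
        energies.append(10.0 * math.log10(max(power, 1e-12)))
    return energies


def cut_extra_silences(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 10.0,       # hypothetical frame length
    threshold_db: float = -40.0,  # hypothetical: silent if 40 dB below the loudest frame
    max_silence_s: float = 0.3,   # hypothetical: longest pause that is kept intact
) -> List[float]:
    """Shorten every silent stretch longer than max_silence_s to max_silence_s."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    peak = max(energies)
    silent = [e < peak + threshold_db for e in energies]
    max_frames = max(1, int(round(max_silence_s * 1000 / frame_ms)))

    out: List[float] = []
    run = 0  # number of consecutive silent frames seen so far
    for i, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if is_silent and run > max_frames:
            continue  # drop silent frames beyond the allowed pause length
        out.extend(samples[i * frame_len:(i + 1) * frame_len])
    return out
```

For example, at 16 kHz a recording made of 0.5 s of sound, a 2 s gap and another 0.5 s of sound comes out 1.3 s long, because the gap is cut to the assumed 0.3 s ceiling. Trimming of this kind would be applied to the found data before training, so that the removed silence can no longer be assigned to the training script.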

Keywords:

Found data, hidden Markov model, statistical parametric speech synthesis

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online): 2040-7467
ISSN (Print): 2040-7459
