Investigation of Effects of Different Synthesis Unit to the Quality of Malay Synthetic Speech

Lau Chee Yong; Tan Tian Swee; Mohd Nizam Mazenan

doi:10.19026/rjaset.7.737

Research Journal of Applied Sciences, Engineering and Technology

Research Article | OPEN ACCESS

Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering (FBME), Universiti Teknologi Malaysia, Malaysia

Research Journal of Applied Sciences, Engineering and Technology 2014 18:3803-3808

http://dx.doi.org/10.19026/rjaset.7.737 | © The Author(s) 2014

Received: October 31, 2013 | Accepted: November 08, 2013 | Published: May 10, 2014

Abstract

The synthesis unit of a speech synthesizer directly affects its computational load and the quality of its output speech. The phoneme is generally the best choice for synthesizing high-quality speech, but segmenting words precisely into phonemes requires linguistic knowledge, and an accurate phoneme dictionary is expensive to compile. This study introduces another type of synthesis unit, the letter. In Malay, a letter is a smaller unit than a phoneme, and using letters as the synthesis unit saves considerable effort because the context labels can be created fully automatically, without knowledge of the language. Four systems were built to investigate how the synthesis unit affects the quality of synthetic speech. Forty-eight listeners individually rated the output speech, and the results showed no obvious difference between speech synthesized with the different synthesis units. The listening test gave satisfactory results in terms of similarity, naturalness and intelligibility. Synthetic speech with polyphonic labels was more intelligible than synthetic speech without them. Using the letter as the synthesis unit is recommended because it removes the dependency on linguists and extends the idea of language-independent front-end text processing.
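To illustrate why letter-based context labels need no linguistic knowledge, the following is a minimal sketch, not the authors' implementation. The function name `letter_context_labels`, the simplified `left-current+right` label layout and the `sil` boundary symbol are assumptions made for this example; the label format actually used in the study is not given here.

```python
def letter_context_labels(text, boundary="sil"):
    """Return one 'left-current+right' label per letter of `text`.

    Hypothetical helper: the label layout is a simplified assumption,
    not the format used in the paper.
    """
    # Keep alphabetic characters only; spaces and punctuation are
    # dropped in this sketch.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    # Pad both ends so the first and last letters also have a neighbour.
    padded = [boundary] + letters + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Example with the Malay word "makan" (to eat).
print(letter_context_labels("makan"))
# ['sil-m+a', 'm-a+k', 'a-k+a', 'k-a+n', 'a-n+sil']
```

No pronunciation dictionary is consulted at any point: the labels follow from the spelling alone, which is the property that makes the front end independent of the language.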

Keywords:

Hidden Markov model, letter, phoneme, statistical parametric speech synthesis

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online): 2040-7467
ISSN (Print): 2040-7459
