Research Article | OPEN ACCESS
Investigation of Effects of Different Synthesis Unit to the Quality of Malay Synthetic Speech
Lau Chee Yong, Tan Tian Swee and Mohd Nizam Mazenan
Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering (FBME), Universiti Teknologi Malaysia, Malaysia
Research Journal of Applied Sciences, Engineering and Technology 2014 18:3803-3808
Received: October 31, 2013 | Accepted: November 08, 2013 | Published: May 10, 2014
Abstract
The synthesis unit of a speech synthesizer directly affects both the computational load and the quality of the output speech. The phoneme is generally the best choice for synthesizing high-quality speech, but segmenting words precisely into phonemes requires knowledge of the language, and an accurate phoneme dictionary is expensive to compile. This study introduces another type of synthesis unit, the letter. In Malay, a letter is a smaller unit than a phoneme, and using it as the synthesis unit saves considerable effort because the context labels can be created fully automatically, without knowledge of the language. Four systems were built to investigate how the synthesis unit affects the quality of synthetic speech. Forty-eight listeners were recruited to rate the output speech individually, and the results showed no obvious difference between speech synthesized with the different synthesis units. The listening test gave satisfactory results in terms of similarity, naturalness and intelligibility. Synthetic speech with polyphonic labels was more intelligible than synthetic speech without them. Using the letter as the synthesis unit is recommended because it removes the dependency on linguists and advances the idea of language-independent front-end text processing.
Keywords:
Hidden Markov model, letter, phoneme, statistical parametric speech synthesis
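To illustrate the abstract's claim that letter-based context labels can be produced automatically, the following is a minimal sketch and not the authors' implementation. The function names, the `prev-cur+next` label layout, and the `sil`, `sp` and `x` markers are assumptions chosen for illustration; the label format actually used in the study is not given here.

```python
def letter_units(sentence):
    """Split a sentence into letter units.

    Assumed conventions (hypothetical, not from the paper):
    'sil' marks the utterance boundaries and 'sp' marks a word boundary.
    """
    units = ["sil"]
    for i, word in enumerate(sentence.lower().split()):
        if i > 0:
            units.append("sp")
        # Every alphabetic character becomes one unit; no dictionary is needed.
        units.extend(ch for ch in word if ch.isalpha())
    units.append("sil")
    return units


def context_labels(units):
    """Attach the left and right neighbour to each unit as 'prev-cur+next'.

    'x' is an assumed placeholder for a missing neighbour at either end.
    """
    labels = []
    for i, cur in enumerate(units):
        prev = units[i - 1] if i > 0 else "x"
        nxt = units[i + 1] if i + 1 < len(units) else "x"
        labels.append(f"{prev}-{cur}+{nxt}")
    return labels


if __name__ == "__main__":
    for label in context_labels(letter_units("saya makan")):
        print(label)
```

Because the split is purely orthographic, a digraph such as "ng" yields two units in this sketch, whereas a phoneme-based front end would need a pronunciation dictionary to treat it as one.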
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459