     Research Journal of Applied Sciences, Engineering and Technology


Investigation of the Effects of Different Synthesis Units on the Quality of Malay Synthetic Speech

Lau Chee Yong, Tan Tian Swee and Mohd Nizam Mazenan
Medical Implant Technology Group (MediTEG), Cardiovascular Engineering Center, Material Manufacturing Research Alliance (MMRA), Faculty of Biosciences and Medical Engineering (FBME), Universiti Teknologi Malaysia, Malaysia
Research Journal of Applied Sciences, Engineering and Technology, 2014, 7(18): 3803-3808
http://dx.doi.org/10.19026/rjaset.7.737  |  © The Author(s) 2014
Received: October 31, 2013  |  Accepted: November 08, 2013  |  Published: May 10, 2014

Abstract

The synthesis unit of a speech synthesizer directly affects both the computational load and the quality of the output speech. The phoneme is generally the best choice for synthesizing high-quality speech, but segmenting words into phonemes precisely requires knowledge of the language, and compiling an accurate phoneme dictionary is expensive. In this study, another type of synthesis unit, the letter, is introduced. In the Malay language, the letter is a smaller unit than the phoneme, and using it as the synthesis unit saves considerable effort because the context labels can be created fully automatically, without knowledge of the language. Four systems were built and an investigation was carried out to determine how the synthesis unit affects the quality of the synthetic speech. Forty-eight listeners were recruited to rate the output speech individually, and the results showed no obvious difference between speech synthesized using the different synthesis units. The listening test gave satisfactory results in terms of similarity, naturalness and intelligibility. Synthetic speech generated with polyphonic labels showed improved intelligibility compared to synthetic speech generated without them. Using the letter as the synthesis unit is recommended because it removes the dependency on linguists and extends the idea of language-independent front-end text processing.
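The practical advantage claimed for letter units is that context labels can be derived directly from the orthography, with no pronunciation dictionary or linguistic expertise. The sketch below is a rough illustration of that idea only, not the authors' implementation: it builds simple letter-level context descriptors from raw Malay text. The label fields shown (previous/current/next letter plus positional features) are hypothetical simplifications of typical HTS-style contextual features.

def letter_context_labels(sentence: str) -> list:
    """Return one context label per letter, derived purely from the text.

    Illustrative sketch only: the label format is a simplified,
    hypothetical stand-in for a full HTS-style context label.
    """
    words = sentence.lower().split()
    labels = []
    for w_idx, word in enumerate(words):
        letters = list(word)
        for l_idx, letter in enumerate(letters):
            # Neighbouring letters, with "sil" marking a word boundary.
            prev_l = letters[l_idx - 1] if l_idx > 0 else "sil"
            next_l = letters[l_idx + 1] if l_idx < len(letters) - 1 else "sil"
            labels.append(
                f"{prev_l}-{letter}+{next_l}"
                f"/pos_in_word:{l_idx + 1}_{len(letters)}"
                f"/word_in_sent:{w_idx + 1}_{len(words)}"
            )
    return labels

if __name__ == "__main__":
    # Example: labels are produced fully automatically from the text alone.
    for lab in letter_context_labels("saya suka makan nasi"):
        print(lab)

A phoneme-based front end would need an additional grapheme-to-phoneme step (dictionary lookup or rules) before labels of this kind could be produced, which is precisely the dependency the letter-based approach avoids.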

Keywords:

Hidden Markov model, letter, phoneme, statistical parametric speech synthesis



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2014.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459