An HMM Based Syllable Pitch-contour Generation Method for Mandarin Speech Synthesis
Hung-Yan Gu  and  Chung-Chieh Yang
e-mail: guhy@mail.ntust.edu.tw


ABSTRACT
This paper proposes a method for generating pitch-contours for Mandarin speech synthesis. An HMM (hidden Markov model) is used to model the implicit prosodic states, and each syllable's pitch-contour is treated as an observation generated from a prosodic state. Such an HMM is called a syllable pitch-contour HMM (SPC-HMM). To train the SPC-HMM, we developed a practical method for normalizing a pitch-contour's height. After normalization, each training syllable's pitch-contour is vector quantized and represented by a VQ (vector quantization) code. The VQ code is then combined with the lexical tones of the adjacent syllables to define an observation symbol for training the SPC-HMM. In the synthesis phase, a dynamic programming algorithm proposed here searches the SPC-HMM for the most probable observation symbol sequence over the whole sentence. The observation symbol found for each syllable is then decoded to obtain its pitch-contour VQ code. We conducted experiments to determine the size of the pitch-contour codebook and the number of states of the SPC-HMM. The results indicate that the better choice is a codebook size of eight and six states. We also conducted perception tests to compare the naturalness of synthetic speech files. The results show that the two SPC-HMM operating modes studied here cannot be distinguished in naturalness.
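The synthesis-phase search described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page; it is one plausible reading under stated assumptions. Assumed here: five tone classes (`NUM_TONES`), a symbol formed by packing the previous tone, the VQ code, and the next tone into a single integer, and log-probability tables `log_pi`, `log_a`, and `log_b` supplied by the caller. The function names `encode_symbol`, `decode_vq_code`, and `search_symbols` are hypothetical. Handling of sentence-boundary syllables, which lack one neighbor, is left to the caller.

```python
import math

NUM_TONES = 5       # assumption: 4 lexical tones + neutral tone, indexed 0..4
CODEBOOK_SIZE = 8   # codebook size reported as the better choice in the abstract


def encode_symbol(prev_tone, vq_code, next_tone):
    """Pack (previous tone, VQ code, next tone) into one integer (assumed layout)."""
    return (prev_tone * CODEBOOK_SIZE + vq_code) * NUM_TONES + next_tone


def decode_vq_code(symbol):
    """Recover the pitch-contour VQ code from a packed observation symbol."""
    return (symbol // NUM_TONES) % CODEBOOK_SIZE


def search_symbols(contexts, log_pi, log_a, log_b):
    """Jointly pick the most probable state and symbol sequence.

    contexts: one (prev_tone, next_tone) pair per syllable.
    log_pi[s], log_a[q][s], log_b[s][symbol]: log-probabilities of the HMM.
    Returns (symbols, states).
    """
    n_states = len(log_pi)

    # For every syllable and state, keep the best symbol allowed by the tone context.
    best = []
    for prev_tone, next_tone in contexts:
        row = []
        for s in range(n_states):
            candidates = [encode_symbol(prev_tone, c, next_tone)
                          for c in range(CODEBOOK_SIZE)]
            sym = max(candidates, key=lambda o: log_b[s][o])
            row.append((log_b[s][sym], sym))
        best.append(row)

    # Viterbi-style dynamic programming over the state sequence.
    score = [log_pi[s] + best[0][s][0] for s in range(n_states)]
    back = []
    for t in range(1, len(contexts)):
        new_score, ptr = [], []
        for s in range(n_states):
            p = max(range(n_states), key=lambda q: score[q] + log_a[q][s])
            new_score.append(score[p] + log_a[p][s] + best[t][s][0])
            ptr.append(p)
        score = new_score
        back.append(ptr)

    # Trace back the best state path, then read off the symbol chosen per state.
    state = max(range(n_states), key=lambda s: score[s])
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    states.reverse()
    symbols = [best[t][s][1] for t, s in enumerate(states)]
    return symbols, states
```

Because the symbol at each syllable may be chosen freely among those matching its tone context, the best symbol per state can be fixed first and an ordinary Viterbi pass then finds the state path. Calling `decode_vq_code` on each returned symbol yields the pitch-contour VQ code for that syllable.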

Script of the recorded sentences for SPC-HMM experiments


Short Text
當然不能 因為寂寞而找個女伴相陪,誰能保證先親密而後能有愛?

與女生相約出遊也可以是朋友關係,不可讓她認定是你約她就是把她當女友。


Synthetic Speech Files

Speech File | Pitch-contour Generation Method | SPC-HMM state sequences (char., state)
XA | SPC-HMM (3 states), Mode-A generation mode
當,0 然,0 不,1 能,1 因,1 為,1 寂,1 寞,1 而,1 找,1 個,2 女,2 伴,2 相,2 陪,2
誰,0 能,0 保,0 證,1 先,1 親,1 密,1 而,1 後,2 能,2 有,2 愛,2
與,0 女,0 生,1 相,1 約,1 出,1 遊,1 也,1 可,2 以,2 是,2 朋,2 友,2 關,2 係,2
不,0 可,1 讓,1 她,2 認,2 定,2 是,2 你,2 約,2 她,2 就,2 是,2 把,2 她,2 當,2 女,2 友,2

XB | SPC-HMM (6 states), Mode-A generation mode
當,0 然,1 不,1 能,1 因,1 為,1 寂,1 寞,2 而,2 找,3 個,3 女,4 伴,5 相,5 陪,5
誰,0 能,1 保,2 證,3 先,4 親,5 密,5 而,5 後,5 能,5 有,5 愛,5
與,0 女,1 生,2 相,3 約,3 出,4 遊,5 也,5 可,5 以,5 是,5 朋,5 友,5 關,5 係,5
不,0 可,1 讓,2 她,2 認,2 定,2 是,2 你,2 約,2 她,2 就,3 是,4 把,5 她,5 當,5 女,5 友,5

XC | SPC-HMM (3 states), Mode-B generation mode
當,0 然,0 不,1 能,1 因,2 為,2 寂,2 寞,2 而,1 找,1 個,1 女,2 伴,2 相,2 陪,2
誰,0 能,0 保,1 證,1 先,2 親,2 密,2 而,1 後,1 能,2 有,2 愛,2
與,0 女,1 生,1 相,1 約,1 出,2 遊,2 也,2 可,1 以,1 是,1 朋,2 友,2 關,1 係,2
不,0 可,0 讓,1 她,1 認,2 定,2 是,2 你,1 約,1 她,1 就,2 是,2 把,2 她,1 當,1 女,2 友,2

XD | ANN based







Longer Text:  A White-flower Tree

(a)因為不知道你的名子,  (b)就讓我叫你白花樹,  (c)春天, (d)當你的花朵盛開時,  (e)就像點亮了滿樹白蠟燭.

(f)春天因你而閃閃發光,  (g)笑臉因你而更加明媚,  (h)微風因你而飄送芬芳, (i)日子就像緩緩的溪水.

(j)白花樹變成了一幅畫,  (k)引來了那麼多賞花人.  (l)白花樹從此有了一個家,  (m)他的根連著無數人的心.  (n)名子也許並不那麼重要,  (o)讓人懷念的名子最美好.


Synthetic Speech Files

Speech File | Pitch-contour Generation Method | SPC-HMM state sequences
SA | SPC-HMM (3 states), Mode-A generation mode
(a) 0 1 2 2 2 2 2 2 2;  (b) 0 0 1 1 1 1 2 2;  (c) 0 2;  (d) 0 1 2 2 2 2 2 2;  (e) 0 1 2 2 2 2 2 2 2 2;  (f) 0 1 1 1 2 2 2 2 2;  (g) 0 1 1 1 1 2 2 2 2;  (h) 0 0 1 1 1 2 2 2 2;  (i) 0 1 1 1 1 1 2 2 2;  (j) 0 1 2 2 2 2 2 2 2;  (k) 0 1 2 2 2 2 2 2 2;  (l) 0 1 1 1 1 1 2 2 2 2;  (m) 0 1 2 2 2 2 2 2 2 2;  (n) 0 1 2 2 2 2 2 2 2 2;  (o) 0 1 2 2 2 2 2 2 2 2;
SB | SPC-HMM (6 states), Mode-A generation mode
(a) 0 1 1 2 3 3 4 4 5;  (b) 0 1 2 2 2 3 4 5;  (c) 0 5;  (d) 0 1 1 2 2 3 4 5;  (e) 0 1 2 3 3 4 5 5 5 5;  (f) 0 1 2 3 4 5 5 5 5;  (g) 0 1 2 3 3 4 5 5 5;  (h) 0 1 2 3 4 5 5 5 5;  (i) 0 1 2 2 2 3 4 5 5;  (j) 0 1 2 3 4 5 5 5 5;  (k) 0 1 2 3 4 5 5 5 5;  (l) 0 1 1 2 3 3 4 4 5 5;  (m) 0 1 2 3 4 5 5 5 5 5;  (n) 0 1 2 2 2 3 4 5 5 5;  (o) 0 1 2 3 4 4 5 5 5 5;
SC | SPC-HMM (3 states), Mode-B generation mode
(a) 0 0 1 1 1 2 2 2 2;  (b) 0 0 1 1 1 2 2 2;  (c) 0 1;  (d) 0 1 1 1 1 2 2 2;  (e) 0 0 0 1 1 1 2 2 2 2;  (f) 0 0 0 1 1 2 2 2 2;  (g) 0 0 0 1 1 2 2 2 2;  (h) 0 0 0 1 1 1 2 2 2;  (i) 0 0 0 1 1 1 2 2 2;  (j) 0 0 0 1 1 1 2 2 2;  (k) 0 0 0 1 1 1 2 2 2;  (l) 0 0 0 1 1 2 2 2 2 2;  (m) 0 0 0 1 1 2 2 2 2 2;  (n) 0 0 1 1 1 1 2 2 1 2;  (o) 0 0 1 1 1 2 2 2 1 2;
SD | ANN based
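The state sequences listed for the longer text can be inspected programmatically. The sketch below parses one line of the form `(a) 0 1 2; (b) 0 0 1;` and reports the positions where the state index moves to a lower value. In the sequences listed on this page, the Mode-A entries never move to a lower state, while some Mode-B entries do; what that implies about the two modes is not stated here, so this is an observation on the data only. The helper names `parse_state_line` and `backward_moves` are hypothetical.

```python
def parse_state_line(line):
    """Parse '(a) 0 1 2;  (b) 0 0 1;' into {'a': [0, 1, 2], 'b': [0, 0, 1]}."""
    result = {}
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the trailing semicolon
        label, _, rest = chunk.partition(")")
        result[label.lstrip("(")] = [int(x) for x in rest.split()]
    return result


def backward_moves(states):
    """Return the positions at which the state index decreases."""
    return [i for i in range(1, len(states)) if states[i] < states[i - 1]]


# Excerpts copied from the SA and SC rows above.
sa = parse_state_line("(c) 0 2;  (n) 0 1 2 2 2 2 2 2 2 2;")
sc = parse_state_line("(c) 0 1;  (n) 0 0 1 1 1 1 2 2 1 2;")

for name, table in (("SA", sa), ("SC", sc)):
    for label, states in table.items():
        print(name, label, backward_moves(states))
```

For clause (n), the SC sequence drops from state 2 back to state 1 at the ninth syllable, whereas the SA sequence for the same clause never decreases.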