Integrating Speaker-nonspecific Timbre Transformation to an HNM Based Speech Synthesis Scheme
(HNM: harmonic plus noise model)
Hung-Yan Gu
e-mail: guhy@mail.ntust.edu.tw


ABSTRACT
In this paper, the harmonic-plus-noise model (HNM) based speech synthesis scheme that we studied previously is extended to provide speaker-nonspecific timbre transformation. To transform the timbre of synthetic speech, we have developed a formant-based frequency mapping method called piece-wise linear frequency mapping (PLFM). We also consider a commonly adopted alternative, frequency axis scaling (FAS). Both methods have been integrated into our HNM speech synthesis scheme, and a real-time synthesis system has been implemented according to this scheme. The perception test results show that the proposed scheme can indeed transform the source timbre of a female adult into the timbre of a male adult, a boy, or a girl. In addition, PLFM is shown to be better than FAS at producing a more masculine timbre.



(a) For each Mandarin syllable, its HNM parameters are analyzed from just one saved utterance.
(b) The HNM parameters of a source syllable are used to synthesize that syllable's speech signals with diverse prosodic characteristics.
(c) FAS: frequency axis scaling; VTLR: vocal-tract-length ratio.
(d) PLFM: piece-wise linear frequency mapping (or formant-based mapping).
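As a rough illustration of note (c), the following Python sketch applies frequency axis scaling to a sampled spectral envelope. It assumes that the scaling factor multiplies the position of every spectral feature, so that a factor below 1 lowers the formants. The envelope shape, the 100 Hz grid, and the names interp and fas_envelope are hypothetical and are not taken from the synthesis system described here.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Linearly interpolate the samples (xs, ys) at x, clamping at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def fas_envelope(freqs, env, scale):
    """Frequency axis scaling (assumed form): the value the source envelope
    has at frequency f appears at scale * f in the warped envelope."""
    return [interp(f / scale, freqs, env) for f in freqs]

# Hypothetical envelope: one triangular peak at 1000 Hz, sampled every 100 Hz.
freqs = [100.0 * k for k in range(41)]
env = [max(0.0, 1.0 - abs(f - 1000.0) / 500.0) for f in freqs]

# Scaling factor 0.8, the value listed on this page for the male-adult timbre.
warped = fas_envelope(freqs, env, 0.8)
peak_hz = freqs[warped.index(max(warped))]
print(peak_hz)
```

Under this assumption the peak that sat at 1000 Hz in the source envelope moves down to 800 Hz, which is the direction expected when a female timbre is moved toward a male one.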





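Synthetic Speech
Avg. syllable duration: 300 ms

Source Timbre (avg. pitch: 240 Hz):

Text T:
因為不知道你的名子, 就讓我叫你白花樹, 春天, 當你的花朵盛開時, 就像點亮了滿樹白蠟燭.
春天因你而閃閃發光, 笑臉因你而更加明媚, 微風因你而飄送芬芳, 日子就像緩緩的溪水.
白花樹變成了一幅畫, 引來了那麼多賞花人. 白花樹從此有了一個家, 他的根連著無數人的心. 名子也許並不那麼重要, 讓人懷念的名子最美好.
(Because I do not know your name, let me call you the white-flower tree. In spring, when your blossoms are in full bloom, it is as if a treeful of white candles has been lit. Spring sparkles because of you, smiling faces are brighter because of you, the breeze carries fragrance because of you, and the days flow like a gentle stream. The white-flower tree became a painting and drew so many flower viewers. From then on the white-flower tree had a home, its roots joined to the hearts of countless people. A name may not matter so much; the name that people remember fondly is the most beautiful.)
    Female adult (F):          link to synthetic syllable waveform
    Another female adult (G):  link to synthetic syllable waveform

Text U:
家人的手, 爸爸的手掌很大, 摸起來粗粗壯壯的, 他是個裝潢師, 專門為人做室內裝潢, 做出來的家具, 大家都讚不絕口. 媽媽的手, 細細柔柔的, 炒出來的菜, 色香味俱全, 每天放學回家, 我都迫不及待的想吃媽媽炒的菜.
(My family's hands. Dad's palms are large and feel rough and sturdy. He is an interior decorator who does interior decoration for others, and everyone praises the furniture he makes. Mom's hands are slender and soft. The dishes she stir-fries look, smell, and taste wonderful. Every day when I come home from school, I can hardly wait to eat Mom's cooking.)
    Female adult (F):          link to synthetic syllable waveform
    Another female adult (G):  link to synthetic syllable waveform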







Transformed timbres:
(Each [link] cell is a link to the synthetic syllable waveform.)

                          Male adult           Boy                  Girl
Avg. pitch                120 Hz               140 Hz               280 Hz
FAS scaling factor        0.8 (VTLR=100/80)    1.2 (VTLR=100/120)   1.2 (VTLR=100/120)
Synthetic speech T (F)    [link]               [link]               [link]
Synthetic speech U (F)    [link]               [link]               [link]
Synthetic speech T (G)    [link]               [link]               [link]
Synthetic speech U (G)    [link]               [link]               [link]
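The FAS scaling factors and VTLR values listed above are numerically consistent with the scaling factor being the reciprocal of the VTLR. That relation is inferred from the listed numbers rather than stated on this page, so the small Python check below should be read as a sketch under that assumption. The function name fas_scale_from_vtlr and the 1000 Hz example formant are hypothetical.

```python
from fractions import Fraction

def fas_scale_from_vtlr(vtlr):
    """Assumed relation: FAS scaling factor = 1 / VTLR."""
    return 1 / vtlr

# VTLR values as listed for the three transformed timbres.
vtlr_by_timbre = {
    "male adult": Fraction(100, 80),
    "boy": Fraction(100, 120),
    "girl": Fraction(100, 120),
}

scales = {name: fas_scale_from_vtlr(v) for name, v in vtlr_by_timbre.items()}

for name, scale in scales.items():
    # Where a hypothetical 1000 Hz source formant would land after scaling.
    shifted_hz = float(scale * 1000)
    print(f"{name}: scale = {float(scale)}, 1000 Hz -> {shifted_hz} Hz")
```

Exact fractions are used so that the computed factors can be compared with the listed values 0.8 and 1.2 without rounding noise.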








More male-adult timbres:
(Each [link] cell is a link to the synthetic syllable waveform.)

Male adult, avg. pitch: 120 Hz   AA             AB             AC             AD             AX       AY
Text file: text_s5               FAS (90/100)   FAS (80/100)   FAS (70/100)   FAS (60/100)   PLFM     PLFM and FAS (90/100)
Synthetic speech T (F)           [link]         [link]         [link]         [link]         [link]   [link]
Synthetic speech U (F)           [link]         [link]         [link]         [link]         [link]   [link]
Synthetic speech T (G)           [link]         [link]         [link]         [link]         [link]   [link]
Synthetic speech U (G)           [link]         [link]         [link]         [link]         [link]   [link]
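To make the PLFM conditions above more concrete, here is a minimal Python sketch of a piece-wise linear frequency mapping anchored at formant frequencies. The formant values, the 11025 Hz upper band edge, and the function name plfm are hypothetical placeholders, not measurements or code from this work. The order in which PLFM and FAS are combined in the last condition is also an assumption (mapping first, then scaling).

```python
from bisect import bisect_right

def plfm(f, src_pts, tgt_pts):
    """Map frequency f along the piece-wise linear curve that joins the
    anchor pairs (src_pts[i], tgt_pts[i]); both lists must be increasing."""
    if f <= src_pts[0]:
        return tgt_pts[0]
    if f >= src_pts[-1]:
        return tgt_pts[-1]
    i = bisect_right(src_pts, f) - 1
    t = (f - src_pts[i]) / (src_pts[i + 1] - src_pts[i])
    return tgt_pts[i] + t * (tgt_pts[i + 1] - tgt_pts[i])

# Hypothetical anchors: 0 Hz, three formants, and the upper band edge.
female_pts = [0.0, 800.0, 1800.0, 2900.0, 11025.0]
male_pts = [0.0, 650.0, 1400.0, 2500.0, 11025.0]

# A source formant maps exactly onto the corresponding target formant.
f1_mapped = plfm(800.0, female_pts, male_pts)

# A frequency halfway between the first two source formants lands halfway
# between the first two target formants.
mid_mapped = plfm(1300.0, female_pts, male_pts)

# Assumed composition for "PLFM and FAS (90/100)": map, then scale by 0.9.
f1_combined = 0.9 * plfm(800.0, female_pts, male_pts)

print(f1_mapped, mid_mapped, f1_combined)
```

Unlike a single FAS factor, the anchored mapping lets each formant region be compressed or stretched by a different amount.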






Example utterances
The female and male speakers whose formant frequencies are used for the mapping:

link to synthetic syllable waveform link to synthetic syllable waveform 請把這籃兔子送走.
(Please send this basket of rabbits away.)
link to synthetic syllable waveform link to synthetic syllable waveform 關心幼兒智力潛能
(Care about young children's intellectual potential.)