梁弘學 (Hung-Hsueh Liang)


Basic Information

 

  

Taipei County

 

   

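Taipei Municipal Cheng Yuan High School

Chung Hua University (private), Department of Computer Science and Information Engineering

National Taiwan University of Science and Technology, Graduate Institute of Computer Science and Information Engineering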

 

   

Singing, novels, watching movies, video games

 

 


Thesis Research

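·         Chinese abstract: This thesis studies the synthesis of English singing voices. A corpus of about 10,000 syllables was first collected, but it yielded only 1,389 distinct syllables. We therefore propose a syllable-unit construction method, which attaches a consonant before or after an existing syllable or joins two semi-syllables, to address the shortage of syllable units. The signal model is the harmonic-plus-noise model (HNM); that is, syllable units are joined at the level of HNM parameters. In addition, an ANN model built for Mandarin is applied to generate vibrato parameters for English syllables, and programs for dynamic duration setting and volume adjustment were written, so that more natural singing signals can be synthesized. A preliminary English singing-voice synthesis system has been completed. Subjective listening tests show that songs synthesized by semi-syllable concatenation and by preferring whole syllable units are almost indistinguishable; in addition, in a comparison with a commercially sold software package, the two received very similar scores.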

·         English abstract: This thesis studies the synthesis of English singing voices. First, 10,000 syllables were segmented from real English sentences, but only 1,389 distinct syllables were obtained. We therefore propose a syllable-unit construction method to address this severe shortage of synthesis units. The method tries appending a consonant to the front or end of an existing syllable, or concatenating two semi-syllable units. The signal model is the harmonic-plus-noise model (HNM), so a syllable unit is constructed by forming a sequence of frames of HNM parameters. To synthesize a more natural singing voice, we apply an ANN model trained on Mandarin songs to generate vibrato parameters for English syllables. We have also implemented dynamic syllable-duration adjustment and volume control. A preliminary English singing-voice synthesis system has now been built, and its synthetic songs were used in perception tests. The results show that songs synthesized by selecting whole syllable units first and songs synthesized by forced concatenation of semi-syllable units are nearly indistinguishable. In addition, when compared with a commercial singing-voice synthesis package, our system's score is very close to that of the commercial package.
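The syllable-unit construction idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the phoneme symbols, the `VOWELS` subset, the function name `build_unit`, the fallback order, and the rule of splitting at the first vowel are all assumptions made for illustration. It also works on symbolic labels only, whereas the thesis joins units at the level of HNM parameter frames.

```python
from typing import Optional

# Illustrative vowel subset (ARPAbet-like symbols); an assumption, not the thesis inventory.
VOWELS = {"AA", "AE", "AH", "EH", "IH", "IY", "OW", "UW"}

Syllable = tuple[str, ...]          # a syllable as a sequence of phoneme labels
Plan = tuple[str, list[Syllable]]   # (strategy name, units to join in order)


def build_unit(
    target: Syllable,
    syllables: set[Syllable],       # whole syllables found in the recorded corpus
    consonants: set[str],           # consonants available as separate units
    semi_units: set[Syllable],      # semi-syllable units (onset+vowel, vowel+coda)
) -> Optional[Plan]:
    """Return a hypothetical construction plan for `target`, or None."""
    # 1. Prefer a whole recorded syllable when one exists.
    if target in syllables:
        return ("direct", [target])

    if len(target) > 1:
        # 2. Existing syllable + consonant appended at the end.
        body, last = target[:-1], target[-1]
        if body in syllables and last in consonants:
            return ("append_consonant", [body, (last,)])

        # 3. Consonant prepended to an existing syllable.
        first, rest = target[0], target[1:]
        if rest in syllables and first in consonants:
            return ("prepend_consonant", [(first,), rest])

    # 4. Join two semi-syllables that share the first vowel (assumed split point).
    for i, phoneme in enumerate(target):
        if phoneme in VOWELS:
            head, tail = target[: i + 1], target[i:]
            if head in semi_units and tail in semi_units:
                return ("semi_syllables", [head, tail])
            break

    # No construction found with the available units.
    return None
```

With a toy inventory containing the syllable `("K", "AE", "T")` and the consonant `"S"`, the call `build_unit(("K", "AE", "T", "S"), ...)` falls through to step 2 and plans to join the recorded syllable with the extra consonant. Trying whole syllables before semi-syllables mirrors the "select syllable unit first" condition mentioned in the listening tests; forcing step 4 would correspond to the other condition.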