何嘉康 (JIA-KANG, HE)

Basic Information


Birthplace: Hsinchu City

Education:
Interests:

Thesis Research


Advisor:
Dr. 古鴻炎

Chinese title:
使用半音節單元挑選及HNM信號模型之國語歌聲合成

English title:
Mandarin Singing-voice Synthesis Using Demi-syllable Unit Selection and HNM Signal Model

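Chinese abstract (translated):
This thesis adopts the demi-syllable as the singing-voice unit in order to reduce the number of context combinations, and proposes a dynamic programming algorithm for demi-syllable unit selection that, subject to the pitch and duration of the notes in the score and to context and spectral continuity, selects the most suitable sequence of singing-voice units. In the training stage, each demi-syllable unit is analyzed into DCC spectral coefficients for a sequence of frames, so that pitch and duration can be adjusted in the synthesis stage; the singing-voice waveform itself is synthesized with the HNM signal model. Concatenating demi-syllable units from different sources causes amplitude discontinuities, so we studied an amplitude-smoothing method. Using the synthesized singing-voice audio files, we conducted listening tests of fluency and naturalness. In the fluency test, the comparison was with songs synthesized with the unit selection algorithm set to its worst case; the resulting average score was 1.04, indicating that our method clearly improves fluency. In the naturalness test, the comparison was with songs synthesized by an HMM model studied by a senior member of our laboratory; the resulting average score was 1.07, indicating that our singing-voice synthesis method clearly improves naturalness.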

English abstract:
In this thesis, the demi-syllable is adopted as the singing-voice unit in order to reduce the number of context combinations. A dynamic-programming-based algorithm is proposed for demi-syllable unit selection; it takes into account the costs of pitch and duration transformation as well as the costs of context and spectral continuity, so that the most suitable sequence of singing-voice units can be found. In the training phase, each demi-syllable unit is analyzed into a sequence of DCC (discrete cepstral coefficient) vectors, so that the pitch and duration of a syllable can be adjusted in the synthesis phase. The singing-voice signal is then synthesized with the HNM (harmonic-plus-noise model). Because the demi-syllable units to be concatenated may be selected from different songs, amplitude discontinuities can arise where they join, and we therefore study an amplitude-smoothing method. Using the synthesized singing voices, we conduct two types of listening tests. In the first, a fluency test, songs synthesized with cost-minimized unit selection are compared with songs synthesized with cost-maximized unit selection. The average score obtained is 1.04, which indicates that our method, cost-minimized unit selection, does improve fluency. In the second, a naturalness test, songs synthesized by our method are compared with songs synthesized using an HMM model (provided by another researcher). The average score obtained is 1.07, which indicates that our method does improve the naturalness of the synthesized singing voices.
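
The unit selection step described in the abstract can be illustrated with a small dynamic programming sketch. This is not the thesis's implementation: the `Unit` fields, the cost formulas, and the weights below are assumptions made for illustration, and the spectral-continuity cost mentioned in the abstract is left out for brevity. The sketch only shows the general shape of the search: each slot in the score has several candidate demi-syllable units, each candidate pays a target cost (how far its pitch and duration must be changed) plus a join cost (whether it continues the previous unit's original context), and the cheapest path through all slots is recovered by backtracking.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    """Hypothetical demi-syllable candidate cut from a recorded song."""
    name: str        # label used only for display
    pitch: float     # average pitch in semitones
    duration: float  # duration in seconds
    song_id: int     # recording the unit came from
    position: int    # index of the unit inside that recording


def target_cost(unit: Unit, note_pitch: float, note_duration: float) -> float:
    """Assumed cost of transforming the unit to fit the note (weights are arbitrary)."""
    return abs(unit.pitch - note_pitch) + 2.0 * abs(unit.duration - note_duration)


def join_cost(prev: Unit, cur: Unit) -> float:
    """Assumed context cost: free if the units were adjacent in the same recording."""
    if prev.song_id == cur.song_id and cur.position == prev.position + 1:
        return 0.0
    return 1.0


def select_units(
    candidates: Sequence[Sequence[Unit]],
    targets: Sequence[Tuple[float, float]],
) -> Tuple[float, List[Unit]]:
    """Return the minimum total cost and the unit sequence that achieves it."""
    if not candidates or len(candidates) != len(targets):
        raise ValueError("need one non-empty candidate list per target note")
    if any(len(slot) == 0 for slot in candidates):
        raise ValueError("every slot needs at least one candidate")

    # best[j]: cheapest cost of any path ending at candidate j of the current slot
    best = [target_cost(u, *targets[0]) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of candidate j in slot i

    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for cur in candidates[i]:
            costs = [best[k] + join_cost(prev, cur)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(cur, *targets[i]))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[i][j] for i, j in enumerate(path)]
```

Replacing `min` with `max` in both places would give the cost-maximized selection that the fluency test uses as its baseline, under the same assumed costs.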


Listen to the synthesized audio samples