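o       Graduate student: 林祐靖

o       Place of birth: New Taipei City

o       Education

o         National Hsinchuang Senior High School

o         National Taiwan Normal University, Department of Industrial Technology Education

o         National Taiwan University of Science and Technology, Graduate Institute of Computer Science and Information Engineering

o       Advisor: Prof. 古鴻炎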

o       Chinese title: 結合HMM頻譜模型與ANN抖音模型之國語歌聲合成

o       English title: Mandarin Singing Voice Synthesis Combining HMM Spectrum Models and ANN Vibrato Models

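o       Chinese abstract (English translation)

o                 This thesis attempts to combine HTS spectral HMM models with ANN vibrato models to build a Mandarin singing voice synthesis system with markedly improved naturalness. In the training stage, STRAIGHT is used to analyze the fundamental frequency of each frame; question sets are then designed so that the HTS software can train the pitch HMMs, the spectral HMMs, and the decision trees. In the synthesis stage, the HTS engine first synthesizes an initial singing voice; the ANN vibrato models then use the duration information to generate, for each syllable, a pitch contour with vibrato characteristics, which replaces the pitch contour originally generated by HTS. In this way HTS can synthesize a singing voice that has both correct pitch and vibrato. In addition, we take the note fullness setting into account and adjust the amplitude of the synthesized singing voice to eliminate harsh sounds. For the synthesized singing voice, we measured the error of the MGC spectral coefficients and conducted subjective listening tests. The results show that the synthesized singing voice with modified pitch is considerably more natural than the voice originally synthesized by HTS.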

o       Abstract

In this thesis, a Mandarin singing voice synthesis system is constructed by combining HMM spectrum models trained with HTS (HMM-based speech synthesis system) and ANN (artificial neural network) vibrato models. The system is intended to improve the naturalness of the synthesized singing voice. In the training stage, STRAIGHT is used to analyze the fundamental frequency of each signal frame. We then design question sets so that the HTS software can train the HMM fundamental frequency models, the HMM spectrum models, and the decision trees. In the synthesis stage, we first have the HTS engine synthesize an initial singing voice. The HMM state durations are then sent to the ANN vibrato models to generate a natural pitch contour for each lyric syllable. Next, the pitch contours generated by HTS are replaced with these contours, so that HTS synthesizes a new singing voice that has both the correct melody and vibrato characteristics. In addition, we take into account the occupying rate of each note's duration and adjust the amplitude of the synthetic singing voice to reduce harsh noise. To assess the quality of the synthetic singing voice, the average spectral error in terms of MGC (mel-generalized cepstrum) coefficients is measured, and listening tests are conducted. The results show that the synthetic singing voice with pitch replacement is more natural than the singing voice originally synthesized by HTS.
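
The pitch-replacement step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: a fixed sinusoidal vibrato stands in for the ANN vibrato model, and the function names, the 5 ms frame shift, the 5.5 Hz vibrato rate, the 50-cent extent, and the convention that unvoiced frames carry an F0 of 0 are all assumptions made for illustration.

```python
import math


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def vibrato_contour(note, n_frames, frame_sec=0.005, rate_hz=5.5, extent_cents=50.0):
    """Hypothetical stand-in for the ANN vibrato model.

    Returns one F0 value per frame: the note pitch modulated by a sinusoid
    whose depth is given in cents. All default values are assumptions.
    """
    base = midi_to_hz(note)
    contour = []
    for i in range(n_frames):
        cents = extent_cents * math.sin(2.0 * math.pi * rate_hz * i * frame_sec)
        contour.append(base * 2.0 ** (cents / 1200.0))
    return contour


def replace_pitch(hts_f0, note):
    """Replace the voiced frames of an initial F0 track with the vibrato contour.

    Frames with F0 == 0.0 are assumed to be unvoiced and are left unvoiced.
    """
    new_f0 = vibrato_contour(note, len(hts_f0))
    return [n if f > 0.0 else 0.0 for f, n in zip(hts_f0, new_f0)]
```

For example, `replace_pitch([0.0, 200.0, 210.0, 0.0], 69)` keeps the first and last frames unvoiced and moves the two voiced frames onto a contour centered on 440 Hz. In the system described above, the replacement contour would come from the trained ANN models and the HMM state durations rather than from a fixed sinusoid.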

Research results: http://speech9.csie.ntust.edu.tw/lab/hmm_ann_singing/