Basic Information


  
Taipei City

   
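Taipei Municipal Yang-Ming Senior High School
Takming University of Science and Technology (private)
National Taiwan University of Science and Technology, Graduate Institute of Computer Science and Information Engineering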

   
Music, singing, swimming, movies



Thesis Research

    

Advisor: Prof. Hung-Yan Gu (古鴻炎)

Chinese title: 結合HTS頻譜模型與ANN韻律模型之國語語音合成系統

English title: A Mandarin Speech Synthesis System Combining HTS Spectrum Models and ANN Prosody Models

Chinese Abstract

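This thesis studies a system architecture for Mandarin speech synthesis that combines HTS spectrum models with ANN prosody models. In the training stage, for spectral coefficient analysis, we use STRAIGHT to obtain more accurate spectral envelopes and then convert the spectral envelope of each frame into DCC coefficients; HTS is then used to train the spectrum HMM models and decision trees. In the synthesis stage, we developed our own programs that use the HTS-trained spectrum HMM models and decision trees to generate the DCC coefficients of each frame, and we use ANN prosody models to generate duration and pitch-contour parameters. The parameters produced by both are then sent to an HNM signal synthesis module, which synthesizes the speech signal. In addition, we apply a formant enhancement method that alleviates the over-smoothing of the spectral envelopes, and we adopt separate interpolation methods for the intensity and pitch contours, which further improves naturalness. After completing the system, we evaluated both the original HTS system and our system with one objective parameter measurement and two listening tests. The results show that the system built in this thesis performs clearly better in both the naturalness and the clarity of the synthesized speech.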
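The abstract mentions interpolating the intensity and pitch contours around syllable boundaries but does not say which interpolation is used. As a minimal sketch, assuming plain linear interpolation over a fixed number of frames on each side of the boundary (the function name `smooth_boundary` and the `half_width` parameter are hypothetical, not taken from the thesis), a per-frame contour could be smoothed like this:

```python
def smooth_boundary(contour, boundary, half_width):
    """Re-interpolate a per-frame contour linearly across a syllable boundary.

    Hypothetical helper; the thesis does not specify its interpolation method.
    contour:    per-frame values, e.g. pitch (Hz) or intensity (dB); assumed
                continuous (no unvoiced zeros) in the region being smoothed
    boundary:   index of the first frame of the following syllable
    half_width: number of frames replaced on each side of the boundary
    """
    left = boundary - half_width - 1  # last untouched frame before the region
    right = boundary + half_width     # first untouched frame after the region
    if left < 0 or right >= len(contour):
        raise ValueError("smoothing region must lie strictly inside the contour")
    out = list(contour)
    span = right - left
    for i in range(left + 1, right):
        t = (i - left) / span  # 0 at `left`, 1 at `right`
        out[i] = (1.0 - t) * contour[left] + t * contour[right]
    return out


# Example: a 40 Hz pitch jump between two syllables at frame 4.
pitch = [100, 100, 100, 100, 140, 140, 140, 140]
print(smooth_boundary(pitch, boundary=4, half_width=1))
```

Linear interpolation is only the simplest choice; a real system might use a different curve for intensity than for pitch, which is consistent with the abstract's note that the two contours are handled by different methods.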

English Abstract
In this thesis, we study a system framework for Mandarin speech synthesis that combines spectrum models trained with HTS (HMM-based speech synthesis system) and ANN (artificial neural network) prosody models. In the training stage, STRAIGHT is used to estimate more accurate spectral envelopes from the training speech frames, and the spectral envelope of each frame is converted to DCC (discrete cepstrum coefficients). HMM (hidden Markov model) spectrum models and decision trees are then trained with HTS. In the synthesis stage, we developed programs that use the HTS-trained HMM spectrum models and decision trees to generate the DCC of each frame, while ANN prosody models generate the syllable-duration and pitch-contour parameters. The prosodic parameters and DCC are then sent to an HNM (harmonic plus noise model) signal synthesis module to synthesize the speech signal. In addition, we adopt a formant enhancement method to alleviate spectral over-smoothing, and by applying interpolation methods to adjust the intensities and pitch heights of the frames around each syllable boundary, we significantly improve naturalness. After building the system, we compared it with the original HTS system through a measurement of the generated DCC and two listening tests. The results show that our system outperforms the original HTS system in both naturalness and speech clarity.
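Discrete cepstrum coefficients describe a log-amplitude envelope as a cosine series, log A(f) = c0 + 2 Σ c_k cos(2πkf/fs), fitted by least squares to amplitudes measured at a discrete set of frequencies (for example, harmonic frequencies sampled from a STRAIGHT envelope). The sketch below is illustrative only: the function names, the choice of sampling the envelope at harmonics, and the small k²-weighted smoothness penalty `lam` are assumptions, not the thesis's own code. It uses only the Python standard library.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def eval_envelope(cep, f, fs):
    """Log amplitude at frequency f (Hz): c0 + 2 * sum_k c_k * cos(2*pi*k*f/fs)."""
    return cep[0] + 2.0 * sum(
        c * math.cos(2.0 * math.pi * k * f / fs) for k, c in enumerate(cep[1:], start=1)
    )


def fit_discrete_cepstrum(freqs, log_amps, order, fs, lam=1e-4):
    """Regularized least-squares fit of order+1 discrete cepstrum coefficients.

    Minimizes sum_i (log_amps[i] - envelope(freqs[i]))^2 + lam * sum_k k^2 * c_k^2;
    c0 (the overall log level) is not penalized.
    """
    n = order + 1
    rows = [
        [1.0] + [2.0 * math.cos(2.0 * math.pi * k * f / fs) for k in range(1, n)]
        for f in freqs
    ]
    # Normal equations: (M^T M + lam * diag(0, 1, 4, ..., order^2)) c = M^T a
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * a for r, a in zip(rows, log_amps)) for i in range(n)]
    for k in range(1, n):
        ata[k][k] += lam * k * k
    return solve_linear(ata, atb)


if __name__ == "__main__":
    fs = 16000.0
    harmonics = [200.0 * h for h in range(1, 40)]  # 200 Hz ... 7800 Hz
    target = [1.0, 0.5, -0.2, 0.1]
    samples = [eval_envelope(target, f, fs) for f in harmonics]
    print(fit_discrete_cepstrum(harmonics, samples, order=3, fs=fs, lam=0.0))
```

With `lam=0` and more sample frequencies than coefficients, the fit recovers an envelope that was itself generated from the cosine series; a small positive `lam` trades exactness for a smoother envelope when the sample frequencies are sparse or unevenly spaced.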

Synthesized Speech Demos