許瓊之 (Basic Information)

Place of birth: Tainan City
Education: Longxing Elementary School (龍星國小), Taoyuan County; Chih Ping Junior High School (治平國中), Taoyuan; Chih Ping Senior High School (治平高中), Taoyuan; Department of Computer Science and Engineering, Tatung University; Graduate Institute of Computer Science and Information Engineering, National Taiwan University of Science and Technology

Thesis Research

Advisor: Dr. Hung-Yan Gu (古鴻炎)
Chinese title: 整合聲學指引規則至HMM最佳路徑搜尋之歌聲分段方法
English title: Singing Voice Signal Segmentation Methods Integrating Acoustic-guiding Rules into HMM Based Best-Path Searching
Chinese/English abstracts:
  1. For the problem of automatically locating the time boundaries of syllable initials and finals in singing voice signals, this thesis proposes a method that integrates acoustic-guiding rules into HMM Viterbi decoding for best-path searching, which substantially improves the accuracy of the basic HMM-based segmentation method. Our automatic initial/final segmentation program was implemented in three versions, each using a different Viterbi decoding algorithm, so that their performances could be compared. In the HMM training stage, we use the HTK toolkit to train initial and final HMM models on sentences selected from the TCC-300 corpus; then, by forced alignment of our own singing voice recordings, we analyze the dwell-duration parameters of each state of each initial/final HMM, and these parameters are plugged into a gamma probability distribution to compute explicit state-duration probabilities. In the testing stage, the experimental results show that, within a tolerance of 10 ms, the modified Viterbi decoding using explicit state-duration probabilities improves the accuracy by 7.55% over the basic Viterbi decoding. Furthermore, based on the fundamental frequency and energy values detected in each frame, we design initial/final constraint rules according to acoustic knowledge and integrate them into the Viterbi decoding steps, which raises the accuracy from 31.73% to 61.33% compared with the basic Viterbi decoding. When constraint rules on the dwell durations of the initial/final HMMs are also integrated, the accuracy rises further to 66.86%; finally, when a silence-related post-processing step is added to correct the initial/final boundaries, the accuracy improves to 68.45%.

  2. In this thesis, we propose singing voice signal segmentation methods that integrate acoustic-guiding rules into HMM (hidden Markov model) based best-path searching to greatly improve the accuracy of HMM-based segmentation of syllable initials and finals. In practice, we have programmed three versions of the Viterbi decoding algorithm for automatic segmentation of initials and finals, and then compared their performances. In the training stage, the software package HTK is used to train syllable initial and final HMM models with selected sentences from the TCC-300 corpus. Next, we estimate the state-duration parameters of each HMM state by forced alignment of our recorded singing voice signals. These state-duration parameters are then used to compute a gamma-distribution-based explicit state-duration probability. In the testing stage, the experimental results show that, under a tolerance of 10 ms, the Viterbi decoding algorithm using explicit state-duration probabilities achieves a segmentation accuracy 7.55% higher than the Viterbi decoding algorithm using implicit state-transition probabilities. Furthermore, based on the fundamental frequency and energy detected in each frame, we design constraint rules derived from acoustic knowledge and integrate these acoustic-guiding rules into the improved Viterbi decoding algorithm. With this algorithm, the segmentation accuracy is raised from 31.73% to 61.33% compared with the basic Viterbi decoding algorithm. In addition, when more rules constraining the durations of initials and finals are integrated, the segmentation accuracy rises to 66.86%. Finally, we add a post-processing step that adjusts the boundaries of initials and finals according to the detected silence frames, which further raises the segmentation accuracy to 68.45%.
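
The sketches below illustrate, under stated assumptions, the main components mentioned in the abstract; they are minimal illustrations, not the thesis's actual implementation, and the function names, thresholds, and estimators shown are hypothetical. First, the explicit state-duration probability: dwell durations collected per HMM state from forced alignment can be fitted with a gamma distribution, for example by the method of moments (the abstract does not say which estimator is used).

import numpy as np
from scipy.stats import gamma

def fit_gamma_duration(durations):
    # Method-of-moments fit of a gamma distribution to observed state dwell
    # durations (in frames): shape k = mean^2 / variance, scale = variance / mean.
    d = np.asarray(durations, dtype=float)
    mean, var = d.mean(), d.var()
    return mean ** 2 / var, var / mean          # (shape k, scale theta)

def log_duration_prob(d, k, theta):
    # Log-probability of dwelling d frames in a state under Gamma(k, theta).
    return gamma.logpdf(d, a=k, scale=theta)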
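
The modified Viterbi decoding with an explicit state-duration probability can then be sketched as a left-to-right forced alignment over the known sequence of initial/final states, where each candidate dwell duration contributes the gamma term above. This is a simplified hidden semi-Markov style dynamic program, assuming the state sequence is known from the lyrics; the decoder in the thesis has further details (and three program versions) not shown here.

import numpy as np

def duration_viterbi(log_emit, dur_logp, max_dur):
    # log_emit : (T, N) per-frame log emission probabilities, with the N states
    #            in the known left-to-right order of the sung utterance.
    # dur_logp : function (state_index, d) -> log-probability of dwelling d frames,
    #            e.g. log_duration_prob with that state's fitted (k, theta).
    # max_dur  : longest dwell duration considered for any state.
    # Returns one (start_frame, end_frame) pair per state.
    T, N = log_emit.shape
    # Prefix sums so any segment's emission score is a difference of two entries.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_emit, axis=0)])

    delta = np.full((N, T), -np.inf)        # best score with state s ending at frame t
    best_d = np.zeros((N, T), dtype=int)    # dwell duration chosen for that best score

    for s in range(N):
        for t in range(T):
            for d in range(1, min(max_dur, t + 1) + 1):
                seg = cum[t + 1, s] - cum[t + 1 - d, s]   # emissions of frames t-d+1 .. t
                if s == 0:
                    if d != t + 1:            # the first state must start at frame 0
                        continue
                    prev = 0.0
                else:
                    if d > t:                 # earlier states need at least one frame
                        continue
                    prev = delta[s - 1, t - d]
                score = prev + dur_logp(s, d) + seg
                if score > delta[s, t]:
                    delta[s, t] = score
                    best_d[s, t] = d

    # Backtrack from the last state ending at the last frame.
    bounds, t = [], T - 1
    for s in range(N - 1, -1, -1):
        d = best_d[s, t]
        bounds.append((t - d + 1, t))
        t -= d
    return list(reversed(bounds))

The returned frame ranges give the initial/final boundaries directly; the basic Viterbi decoding differs in relying on the HMM's implicit self-transition duration model instead of dur_logp.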
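
The acoustic-guiding rules can be integrated into the same decoding by adding a rule score to each frame's emission term, so that path extensions contradicting the acoustic knowledge are effectively pruned. The specific rule and threshold values below are made-up placeholders that only show the mechanism; the thesis designs its own rules from the detected fundamental frequency and energy of each frame.

RULE_PENALTY = -1.0e9    # large finite penalty, so the prefix-sum trick above still works

def initial_rule_score(frame_f0, frame_energy, state_is_initial,
                       voiced_f0=50.0, energy_floor=0.1):
    # Hypothetical rule: a frame with a detected F0 (voiced) and non-low energy is
    # unlikely to belong to a syllable initial, so assigning such a frame to an
    # initial state is heavily penalized.  Threshold values are placeholders.
    if state_is_initial and frame_f0 > voiced_f0 and frame_energy > energy_floor:
        return RULE_PENALTY      # rule violated: this assignment is effectively pruned
    return 0.0                   # rule satisfied: the Viterbi score is unchanged

Adding this score to log_emit[t, s] for every frame and state before running duration_viterbi realizes rule-constrained decoding; duration constraint rules can be handled analogously by penalizing dur_logp outside an allowed range.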
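
Finally, the silence-related post-processing step corrects initial/final boundaries using the frames detected as silence. As a purely illustrative guess at such a correction (the abstract does not spell out the rule), a boundary that lands inside a silent region could be shifted to the first following non-silent frame:

def shift_boundary_out_of_silence(boundary, silence_mask):
    # Hypothetical correction: if a predicted boundary frame falls inside a region
    # detected as silence, move it forward to the first non-silent frame so the
    # boundary lines up with where the sound actually resumes.
    t = boundary
    while t + 1 < len(silence_mask) and silence_mask[t]:
        t += 1
    return t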

Research Results
In the figure at the right, the horizontal axis represents the tolerance range (10 ms to 50 ms) and the vertical axis represents the accuracy rate. The four accuracy curves are, respectively, for the method with the post-processing step, the extended Viterbi decoding integrating the acoustic rules, the Viterbi decoding with explicit state-duration probabilities, and the basic Viterbi decoding.
Figure: the four automatic segmentation methods for singing-voice initials and finals in the thesis