Basic Information

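Birthplace: Miaoli County
Education:
Miaoli County Jianshan Elementary School (苗栗縣立尖山國小)
Miaoli County Hsing Hua Senior High School, junior high division (苗栗縣立興華高級中學 國中部)
National Chu-Nan Senior High School (國立竹南高級中學)
Tatung University, Department of Computer Science and Engineering (私立大同大學 資訊工程學系)
National Taiwan University of Science and Technology, Graduate Institute of Computer Science and Information Engineering (國立台灣科技大學 資工所)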

Thesis Research

Advisor: Dr. Hung-Yan Gu (古鴻炎)
Chinese title: 使用聲學特徵組合之歌唱聲與樂器聲識別
English title: Recognition of Singing Voice and Instrument Sound Using Combinations of Acoustic Features
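Chinese abstract (translated): The goal of this thesis is to determine whether an input sound clip is singing sound (containing a singing voice) or instrument sound (containing no singing voice). The research focuses on combining different kinds of feature coefficients to find the feature vector with the best recognition performance. The feature coefficients used are mel-frequency cepstral coefficients (MFCC), pitch-detection feature coefficients, and Chroma extended features, together with the delta values of these coefficients. Recognition is based on Gaussian mixture models (GMM): we train GMMs with 8, 16, 32, and 64 mixtures and then use them in recognition experiments on external sounds. In the frame-level experiments, we tried 6 combinations of feature vectors. MFCC plus pitch-detection coefficients gives a significantly higher recognition rate than MFCC alone, and adding delta values and a voting mechanism raises the best frame recognition rate to 71.3%. In the clip-level experiments, we tried 8 combinations of feature vectors. For pure instrument clips, the 40-dimensional feature vector gives the highest recognition rate, 97.1%. For mixed-sound clips, the 17-dimensional feature vector (MFCC+P) gives the highest recognition rate, 94.7%. In terms of average recognition rate, the 40-dimensional feature vector is the highest at 93.8%. Overall, the 40-dimensional feature vector combining MFCC, pitch-detection coefficients, Chroma extended features, and their delta values yields the highest clip recognition rate.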
English abstract: This thesis aims to recognize the class to which an input sound clip belongs. The two sound classes considered are singing sound (with vocal singing) and instrument sound (without vocal singing). The research focuses on testing different combinations of acoustic features in order to find the most effective feature vector for sound class recognition. The acoustic coefficients considered include mel-frequency cepstral coefficients (MFCC), pitch-detection coefficients (PDC), Chroma extended features, and their delta coefficients. The recognition method is based on the Gaussian mixture model (GMM). GMMs are trained with different numbers of mixtures (8, 16, 32, and 64) and are then used in experiments on recognizing external sound clips. In the frame-level recognition experiments, we tried 6 feature vectors, i.e. 6 different combinations of acoustic features. Among them, MFCC plus PDC achieves a significantly higher recognition rate than MFCC alone. When the feature vector is augmented with delta values and a voting mechanism is added, the best frame-level recognition rate is 71.3%. In the clip-level recognition experiments, we tried 8 feature vectors, i.e. 8 different combinations of acoustic features. For pure-instrument sound clips, the 40-coefficient feature vector is the best, with a recognition rate of 97.1%. For mixed-sound clips, the 17-coefficient feature vector (MFCC+PDC) is the best, with a recognition rate of 94.7%. In terms of average recognition rate, the 40-coefficient feature vector is the best, at 93.8%. Therefore, the feature vector that obtains the highest recognition rate has 40 dimensions and consists of MFCC, PDC, Chroma extended features, and their delta values.
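The abstract describes choosing between two GMMs (singing vs. instrument) per frame and then applying a voting mechanism. The following is only a minimal sketch of that idea, not the thesis implementation: it assumes diagonal-covariance GMMs, a sliding-window majority vote, and a clip decision based on summed frame log-likelihoods. The function names, the window size, and the toy one-dimensional models are all hypothetical.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logs))

def classify_frames(frames, gmm_sing, gmm_inst):
    """Label each frame 1 (singing) or 0 (instrument) by the higher likelihood."""
    return [1 if diag_gmm_loglik(f, *gmm_sing) > diag_gmm_loglik(f, *gmm_inst) else 0
            for f in frames]

def vote_smooth(labels, half_width=1):
    """Replace each label by the majority label in a sliding window (assumed scheme)."""
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_width): i + half_width + 1]
        out.append(1 if 2 * sum(window) > len(window) else 0)
    return out

def classify_clip(frames, gmm_sing, gmm_inst):
    """Clip-level decision: compare summed frame log-likelihoods (assumed rule)."""
    total_sing = sum(diag_gmm_loglik(f, *gmm_sing) for f in frames)
    total_inst = sum(diag_gmm_loglik(f, *gmm_inst) for f in frames)
    return "singing" if total_sing > total_inst else "instrument"

# Toy 1-D models with 2 mixtures each: (weights, means, variances). Hypothetical values.
GMM_SING = ([0.5, 0.5], [[1.0], [2.0]], [[0.25], [0.25]])
GMM_INST = ([0.5, 0.5], [[-1.0], [-2.0]], [[0.25], [0.25]])
FRAMES = [[1.2], [1.8], [-0.1], [1.5], [2.1]]

labels = classify_frames(FRAMES, GMM_SING, GMM_INST)
smoothed = vote_smooth(labels)
decision = classify_clip(FRAMES, GMM_SING, GMM_INST)
```

In this toy run the third frame is labeled as instrument on its own, and the majority vote over its neighbors flips it back to singing, which is the kind of isolated frame error a voting step is meant to remove.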
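Research results: For each of the 8 feature vectors, we identified from the experimental results the number of mixtures that gave the highest average recognition rate. The clip-level recognition accuracy for pure singing, pure instrument, and mixed-sound clips at that number of mixtures is listed in the table below:
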
Feature vector        Pure singing  Pure instrument  Mixed sound  Average
12-dim MFCC           99%           72.08%           68.85%       79.98%
17-dim (MFCC+P)       100%          75%              84.67%       89.89%
34-dim (3 frames)     100%          92.92%           86.48%       93.13%
34-dim (11 frames)    100%          96.25%           84.43%       93.56%
6-dim Type-A Chroma   99.5%         95%              80.33%       91.61%
6-dim Type-B Chroma   98.5%         96.25%           80.33%       91.69%
20-dim (17MP+3C)      100%          84.17%           92.62%       92.26%
40-dim (34MP+6C)      100%          94.28%           84.43%       93.83%
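
The dimensions in the table are consistent with appending one delta value per base coefficient: 17 becomes 34, and 20 (17 + 3 Chroma) becomes 40 (34 + 6). As a rough sketch of how such a vector could be assembled, the hypothetical helper below appends a simple symmetric-difference delta to each frame. The delta formula, the `span` parameter, and the edge handling are assumptions, not details taken from the thesis.

```python
def add_deltas(frames, span=1):
    """Append delta coefficients to each frame.

    Assumed formula: d[t] = (c[t + span] - c[t - span]) / (2 * span),
    with frame indices clamped at the start and end of the sequence.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(0, t - span)]
        nxt = frames[min(n - 1, t + span)]
        delta = [(b - a) / (2 * span) for a, b in zip(prev, nxt)]
        out.append(list(frames[t]) + delta)
    return out

# Five dummy frames of 17 base coefficients (e.g. MFCC plus pitch coefficients).
base = [[float(t)] * 17 for t in range(5)]
with_deltas = add_deltas(base)
```

Each output frame here has 34 values: the 17 base coefficients followed by their 17 deltas.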