姜育民

Yu-Min Jiang

Thesis Research


Advisor: Prof. 古鴻炎

Title (Chinese): 基於中值濾波及數項改進之語音分離系統

Title (English): A Voice Separation System Based on Median Filtering and a Few Improvements

Abstract: This thesis studies several problems in voice separation performed by subtracting an estimated music spectrum from the spectrum of the mixed signal. The music (background) spectrogram is extracted from the mixed spectrogram by searching for the nearest-neighbor (most similar) frames of each frame and taking their median. Besides proposing methods that improve separation performance, we have implemented an online voice separation system. First, we ran calibration experiments on the number of nearest-neighbor frames to keep and on the mask parameter; with the best values, the average SDR (source-to-distortion ratio) increases by 0.94 dB. Next, when selecting the nearest-neighbor frames, we changed the spectral magnitude from a linear to a logarithmic scale for computing the spectral distance between two frames, and we also tried equalizing each spectrum's energy by its average magnitude. The experiments show that the logarithmic-magnitude distance improves the average SDR considerably, by 0.97 dB. In addition, a spectral-flatness measure is used to detect separated-voice frames that contain drum sound, and the spectrum bins of these frames are reassigned to the music spectrogram. The separated voice is thus free of audible drum interference, although the average SDR rises by only 0.02 dB. For the drum frames whose bins were removed, filling or not filling the emptied spectrum makes little difference. Moreover, removing the low-frequency spectral bins reduces interference from low-frequency music and raises the average SDR by a further 1.01 dB. Overall, with the tuned parameters, the logarithmic-magnitude spectral distance, drum removal, and low-frequency removal considerably improve the quality of the separated voice, raising the average SDR from 2.48 dB to 5.42 dB.
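The core idea described in the abstract (estimate the music spectrum of each frame as the per-bin median of its most similar frames, then mask the mixture) can be sketched in plain Python as below. This is a minimal illustration, not the thesis implementation: `spec` is assumed to be a list of magnitude frames (one list of bin magnitudes per STFT frame, with the STFT itself left out), and the function names, the Euclidean distance, and the soft-mask exponent `p` (a hypothetical stand-in for the unspecified mask parameter) are all assumptions.

```python
import math
from statistics import median

EPS = 1e-10  # guards against log(0) and division by zero


def frame_distance(a, b, log_scale=True):
    """Euclidean distance between two magnitude frames (log or linear scale)."""
    if log_scale:
        a = [math.log(x + EPS) for x in a]
        b = [math.log(x + EPS) for x in b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_frames(spec, t, k, log_scale=True):
    """Indices of the k frames closest to frame t; frame t itself ranks first."""
    return sorted(
        range(len(spec)),
        key=lambda j: (frame_distance(spec[t], spec[j], log_scale), j != t),
    )[:k]


def estimate_music(spec, k=5, log_scale=True):
    """Per-bin median of each frame's k nearest neighbours = repeating music."""
    music = []
    for t, frame in enumerate(spec):
        idx = nearest_frames(spec, t, k, log_scale)
        med = [median(spec[j][f] for j in idx) for f in range(len(frame))]
        # The music estimate may not exceed the mixture magnitude.
        music.append([min(m, x) for m, x in zip(med, frame)])
    return music


def separate_voice(spec, k=5, p=2.0, log_scale=True):
    """Soft-mask the mixture; p is a hypothetical stand-in for the mask parameter."""
    music = estimate_music(spec, k, log_scale)
    voice = []
    for frame, m_row in zip(spec, music):
        row = []
        for x, m in zip(frame, m_row):
            v = x - m  # residual left for the voice (>= 0 after clamping)
            mask = v ** p / (v ** p + m ** p + EPS)
            row.append(mask * x)
        voice.append(row)
    return voice, music
```

Passing `log_scale=False` ranks neighbours by linear-magnitude distance instead, which is the comparison the abstract reports. One plausible reason the log scale helps is that it compresses the dynamic range, so a few loud bins do not dominate the distance. The neighbour search here is quadratic in the number of frames, which is acceptable for a sketch but not for long recordings.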

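The two post-processing steps in the abstract, moving drum-like voice frames into the music estimate and dropping low-frequency bins, might look like the sketch below. Spectral flatness is taken here as the ratio of the geometric mean to the arithmetic mean of the power spectrum, so noise-like (drum-like) frames score near 1 and peaky (harmonic) frames near 0. The threshold, the cutoff frequency, the rule that low-frequency bins are simply zeroed, and all function names are hypothetical, since the abstract gives none of them.

```python
import math

EPS = 1e-10  # keeps log() finite for empty bins


def spectral_flatness(frame):
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1]."""
    power = [x * x + EPS for x in frame]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def remove_drum_frames(voice, music, threshold=0.5):
    """Move every bin of a flat (drum-like) voice frame into the music estimate.

    Silent frames are also flat, but moving zeros into the music is harmless.
    """
    voice = [row[:] for row in voice]
    music = [row[:] for row in music]
    for t, row in enumerate(voice):
        if spectral_flatness(row) > threshold:
            music[t] = [m + v for m, v in zip(music[t], row)]
            voice[t] = [0.0] * len(row)
    return voice, music


def remove_low_bins(voice, cutoff_hz, sample_rate, n_fft):
    """Zero voice bins whose centre frequency lies below cutoff_hz."""
    bin_hz = sample_rate / n_fft
    return [[0.0 if f * bin_hz < cutoff_hz else v for f, v in enumerate(row)]
            for row in voice]
```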
Personal info

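Name: 姜育民 (Yu-Min Jiang)
Education: National Hsinchuang Senior High School (國立新莊高級中學)
National Taiwan Normal University, Department of Industrial Technology Education (國立台灣師範大學工業科技教育學系)
National Taiwan University of Science and Technology, Graduate Institute of Computer Science and Information Engineering (國立台灣科技大學資訊工程所)
Interests: tennis, board games
E-mail: cool60p@gmail.com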