The Influence of Batch Length in Frames on Segment Selection
The batch length, T, was set according to our experimental results. Take the partial sentence /zie-3 zyue-2 fang-1 an-4/ as an example. When T is set to 20, 30, 40, and 50 frames, the automatically selected speech segments are as shown in Fig. D, Fig. E, Fig. F, and Fig. G, respectively. The numbers of segments selected for the partial sentence are (3, 3, 4, 2) in Fig. D; (2, 3, 2, 2) in Fig. E; and (2, 2, 2, 2) in both Fig. F and Fig. G. The best setting would be T = 40. Nevertheless, we decided to use T = 30 (equivalent to 150 ms) to better handle the situation in which the speaking rate of the source utterance occasionally becomes slightly faster.

Fig. D    

Fig. E    

Fig. F    

Fig. G
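
The arithmetic behind these settings can be sketched in a few lines of Python. This is only an illustration. The 5 ms frame shift is not stated directly but is inferred from the equivalence T = 30 frames = 150 ms given above. The names FRAME_SHIFT_MS, batch_duration_ms, and total_segments are hypothetical, and the summed segment count is merely a convenient summary of the four reported numbers, not a criterion taken from our experiments.

```python
# Frame shift in milliseconds. Assumption: inferred from the statement
# that T = 30 frames is equivalent to 150 ms (150 / 30 = 5).
FRAME_SHIFT_MS = 5.0

# Segment counts reported for /zie-3 zyue-2 fang-1 an-4/ at each batch
# length T (Fig. D to Fig. G).
SEGMENT_COUNTS = {
    20: (3, 3, 4, 2),  # Fig. D
    30: (2, 3, 2, 2),  # Fig. E
    40: (2, 2, 2, 2),  # Fig. F
    50: (2, 2, 2, 2),  # Fig. G
}


def batch_duration_ms(t_frames, frame_shift_ms=FRAME_SHIFT_MS):
    """Convert a batch length in frames to a duration in milliseconds."""
    return t_frames * frame_shift_ms


def total_segments(t_frames):
    """Sum the reported segment counts for a given batch length."""
    return sum(SEGMENT_COUNTS[t_frames])


if __name__ == "__main__":
    for t in sorted(SEGMENT_COUNTS):
        print(f"T={t}: {batch_duration_ms(t):.0f} ms, "
              f"counts={SEGMENT_COUNTS[t]}, total={total_segments(t)}")
```

Under the assumed frame shift, the four candidate settings span 100 ms to 250 ms, and the summed counts make it easy to see that T = 40 and T = 50 give the same selection result for this example, while T = 30 differs from them in only one count.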