to the batch length, T, its value was determined from our experimental
results. Take the partial sentence /zie-3 zyue-2 fang-1 an-4/ as an
example. When T is set to 20, 30, 40, and 50, the automatically selected
speech segments are those shown in Figs. D, E, F, and G, respectively.
The numbers of segments selected for the partial sentence are
(3, 3, 4, 2) in Fig. D, (2, 3, 2, 2) in Fig. E, and (2, 2, 2, 2) in both
Fig. F and Fig. G. Judging from these results, T = 40 would be the best setting.
Nevertheless, we chose T = 30 (equivalent to 150 ms) to better handle
cases in which the speaking rate of the source utterance is occasionally
slightly faster.
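Since T = 30 corresponds to 150 ms, each unit of T appears to be one 5 ms analysis frame. The short Python sketch below is only an illustration. It assumes that inferred 5 ms frame shift, which the text does not state explicitly, and uses a hypothetical helper, `batch_length_ms`, to convert the batch lengths tried in Figs. D to G into durations.

```python
# Hypothetical sketch: the 5 ms frame shift is inferred from the statement
# "T=30 (equivalent to 150 ms)"; it is an assumption, not a stated parameter.
FRAME_SHIFT_MS = 150 / 30  # 5.0 ms per frame (inferred)


def batch_length_ms(t_frames: int, frame_shift_ms: float = FRAME_SHIFT_MS) -> float:
    """Convert a batch length T, counted in frames, to milliseconds."""
    return t_frames * frame_shift_ms


if __name__ == "__main__":
    # Batch lengths examined in the experiment (Figs. D to G).
    for t in (20, 30, 40, 50):
        print(f"T = {t}: {batch_length_ms(t):.0f} ms")
```

Under this assumption, T = 20, 30, 40, and 50 span 100, 150, 200, and 250 ms, respectively. The shorter 150 ms window is what leaves room for faster-than-usual source speech.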