End-to-End Speech Keyword Spotting Training Method Based on Sample Class Uncertainty
End-to-end deep learning is the dominant technology for speech keyword spotting. Research has focused on exploring better network structures, modeling units, and search strategies, and has made considerable progress. However, less attention has been paid to training efficiency. In this paper, a novel class uncertainty sampling (CUS) strategy is proposed to select effective samples for each training epoch. Since only a subset of the training data is used, much training time is saved. The core idea of CUS is to measure the class uncertainty of each sample from the forward output of the output layer during the middle and late training stages, and to select each sample with a probability determined by its class uncertainty. As a result, more attention is paid to samples near the decision boundary, which are prone to missed detections or false alarms. Furthermore, the proposed method can shield training from the interference of mislabeled samples. Experimental results on the AISHELL-1 Mandarin dataset showed faster convergence and better training performance. Compared with the conventional training strategy, the average training time and the average convergence time were shortened by 60% and 47.5%, respectively, in relative terms. At a false accept rate (FAR) of 0.5 FP/h, the false reject rate (FRR) was reduced from 4.75% to 3.65%, a relative reduction of 30.1%, and the maximum term weighted value (MTWV) increased from 0.8374 to 0.8531. Moreover, it was experimentally verified that the method shields most of the mislabeled samples. These conclusions were confirmed by experiments on the large-scale AISHELL-2 Mandarin dataset.
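The sampling idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact uncertainty measure, so the sketch assumes normalized entropy of the output-layer posteriors as a stand-in. The names `class_uncertainty`, `select_subset`, and `warmup_epochs` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random


def class_uncertainty(probs):
    """Normalized entropy of a posterior vector, in [0, 1].

    ASSUMPTION: the source does not define its uncertainty score;
    normalized entropy is used here only as a plausible stand-in.
    """
    n = len(probs)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(n)


def select_subset(posteriors, epoch, rng, warmup_epochs=0):
    """Return the indices of the samples kept for this epoch.

    During the first `warmup_epochs` epochs (hypothetical parameter) every
    sample is kept, since the abstract applies the sampling only in the
    middle and late training stages. Afterwards, sample i is kept with
    probability equal to its class uncertainty.
    """
    if epoch < warmup_epochs:
        return list(range(len(posteriors)))
    selected = []
    for idx, probs in enumerate(posteriors):
        if rng.random() < class_uncertainty(probs):
            selected.append(idx)
    return selected


# Toy posteriors from a forward pass: confident, maximally uncertain, in between.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
rng = random.Random(0)
early = select_subset(posteriors, epoch=0, rng=rng, warmup_epochs=2)
late = select_subset(posteriors, epoch=5, rng=rng, warmup_epochs=2)
```

In this sketch a confidently classified sample (uncertainty 0) is never drawn after the warm-up, while a sample near the decision boundary is almost always drawn. Note that the score uses only the model's posteriors and never looks at the label.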