Research on Lightweight End-to-End Speech Command Recognition Models
In response to the demand for small model size and low latency in small-vocabulary speech command recognition applications for smart homes, this paper designs two lightweight end-to-end Chinese command recognition models based on neural networks and connectionist temporal classification (CTC). The models are made lightweight by simplifying their network layers and structure. The CTC algorithm is introduced for end-to-end training and decoding with Chinese characters as the modeling units, which removes the need to pre-align the training data. Finally, comparative evaluations on the Aishell-I dataset and custom corpora show that the CNN-CTC model, with a model size of 350 KB, a runtime of 5 ms, a word error rate of 5.02%, and an intent recognition accuracy of 92.0%, is better suited to small-vocabulary speech command recognition applications.
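To illustrate why CTC removes the need for pre-aligned data, the following is a minimal, framework-free sketch and not the paper's implementation. It assumes the network emits a per-frame softmax over a character vocabulary whose index 0 is the CTC blank; the function names and toy probabilities are hypothetical. The forward algorithm sums the probability of a character sequence over every possible frame alignment, so training needs only the transcript. Greedy (best-path) decoding then turns per-frame predictions into characters by merging repeats and dropping blanks.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path decoding: merge consecutive repeated symbols, then drop blanks."""
    out: List[int] = []
    prev = None
    for k in frame_ids:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_forward_prob(probs: Sequence[Sequence[float]],
                     label: Sequence[int],
                     blank: int = BLANK) -> float:
    """Probability of `label` summed over all frame alignments (CTC forward algorithm).

    probs[t][k] is the softmax output for symbol k at frame t.
    """
    # Extended label: blanks interleaved around every character, e.g. [_, a, _, b, _].
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all alignment prefixes ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]         # skip a blank between distinct characters
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last character or the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

With two frames and a uniform distribution over {blank, one character}, the three alignments (char, char), (char, blank) and (blank, char) all collapse to that single character, giving a total probability of 0.75. A production system would instead compute this in log space with a framework's built-in CTC loss.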