Research on Lightweight End-to-End Speech Command Recognition Models
黄晁¹, 赵忆¹, 张从连¹, 袁敏杰¹, 陈春燕¹
Author information
- 1. 宁波中科信息技术应用研究院 (宁波人工智能产业研究院), Ningbo, Zhejiang 315040
Abstract
To meet the small-model-size and low-latency requirements of small-vocabulary speech command recognition in smart homes, this paper designs two lightweight Chinese end-to-end speech command recognition models based on neural networks and connectionist temporal classification (CTC). The models are made lightweight by reducing the number of network layers and simplifying the network structure. CTC enables end-to-end training and decoding with Chinese characters as the modeling units, which removes the need to pre-align the data. The two models are compared on the public Aishell-I dataset and a self-built corpus. With a model size of 350 kB, a runtime of 5 ms, a character error rate of 5.02%, and an intent hit rate of 92.0%, the CNN-CTC model is judged in the overall evaluation to be the better fit for small-vocabulary speech command recognition.
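As an illustration of character-level CTC decoding and of the character error rate quoted above, the following is a minimal sketch and not the authors' implementation. It assumes best-path (greedy) decoding, a blank symbol at index 0, and a hypothetical four-entry vocabulary. The abstract does not state which decoding strategy the paper uses, and the function names `ctc_greedy_decode`, `edit_distance`, and `cer` are chosen here for illustration only.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str],
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    chars: List[str] = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not blank.
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            row.append(min(prev_row[j] + 1,              # deletion
                           row[j - 1] + 1,               # insertion
                           prev_row[j - 1] + (r != h)))  # substitution
        prev_row = row
    return prev_row[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical vocabulary; index 0 is the blank symbol.
    vocab = ["<blank>", "打", "开", "灯"]
    # Per-frame argmax labels as a network might output them.
    frames = [0, 1, 1, 0, 2, 2, 2, 0, 3]
    hyp = ctc_greedy_decode(frames, vocab)
    print(hyp, cer("打开灯", hyp))
```

Because each output label is a whole Chinese character, the decoded string can be compared with the reference transcript directly, with no frame-level alignment required in the training data.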
Key words
speech command recognition / end-to-end / lightweight / connectionist temporal classification (CTC)
Publication year
2024