End-to-end continuous speech recognition with dual-channel decoding

In end-to-end continuous speech recognition systems, the Transformer model, which is based entirely on the self-attention mechanism, improves accuracy over traditional hybrid models. The Conformer model adds to the Transformer a convolution module that is good at extracting local features, and it is used here as the encoder of the recognition system; the decoder uses an attention mechanism. Because the attention model is only suitable for recognizing short sentences and makes network training unstable when the data set contains noise, the sequence-alignment property of the CTC model is added to assist training and help the model converge faster. To address the room for improvement in the recognition accuracy of single-channel decoding, a dual-channel decoding model combining CTC and Attention is proposed. Dual-channel decoding is compared with CTC-only decoding and Attention-only decoding, and the results show that it improves recognition performance by 1%. To address the degradation of recognition in noisy environments, a method of adding a language model to the end-to-end network is proposed. An N-gram language model is added to the network for verification, and the results show that in a high-noise environment with a signal-to-noise ratio of 10 dB, the language model reduces the character error rate by 3.5%, improving the robustness of the speech recognition system.
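The abstract does not state how the CTC and Attention channels are combined, or how the N-gram language model score enters decoding. One common formulation, shown here only as an illustrative sketch and not as the authors' method, is a log-linear interpolation of per-hypothesis log-probabilities. The function names, the weights `ctc_weight` and `lm_weight`, and the toy scores below are all hypothetical.

```python
import math


def joint_score(log_p_ctc, log_p_att, log_p_lm=0.0, ctc_weight=0.3, lm_weight=0.0):
    """Log-linear combination of CTC, attention and (optional) LM log-probabilities.

    ctc_weight and lm_weight are assumed tuning knobs, not values from the paper.
    """
    return (ctc_weight * log_p_ctc
            + (1.0 - ctc_weight) * log_p_att
            + lm_weight * log_p_lm)


def rescore(hypotheses, ctc_weight=0.3, lm_weight=0.0):
    """Return the text of the hypothesis with the highest joint score."""
    best = max(
        hypotheses,
        key=lambda h: joint_score(
            h["ctc"], h["att"], h.get("lm", 0.0), ctc_weight, lm_weight
        ),
    )
    return best["text"]


# Toy hypotheses with made-up probabilities, for illustration only.
hyps = [
    {"text": "A", "ctc": math.log(0.50), "att": math.log(0.20), "lm": math.log(0.10)},
    {"text": "B", "ctc": math.log(0.30), "att": math.log(0.40), "lm": math.log(0.60)},
]

ctc_only = rescore(hyps, ctc_weight=1.0)                     # single-channel CTC
att_only = rescore(hyps, ctc_weight=0.0)                     # single-channel attention
dual = rescore(hyps, ctc_weight=0.3)                         # both channels
ctc_with_lm = rescore(hyps, ctc_weight=1.0, lm_weight=1.0)   # CTC plus LM score
```

With these toy numbers the two single channels disagree, and adding the language model score changes the CTC-only choice, which is the kind of effect the comparison in the abstract measures on real data.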

Keywords: speech recognition; encoder; decoder; end-to-end; dual-channel; language model

Zhu Yang (朱洋), Zeng Qingning (曾庆宁), Zhao Xuejun (赵学军)

School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

2024

Journal of Guilin University of Electronic Technology
Guilin University of Electronic Technology

Impact factor: 0.247
ISSN: 1673-808X
Year, volume (issue): 2024, 44(2)