Research on Minnan Dialect Lip-Reading Recognition Based on Transformer-LSTM
Zeng Wei 1, Luo Xianxian 1, Wang Hongwei 1
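Author information
- 1. School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000; Fujian Provincial Key Laboratory of New Technologies for Big Data Management and Knowledge Engineering, Quanzhou, Fujian 362000; Key Laboratory of Intelligent Computing and Information Processing of Fujian Provincial Universities, Quanzhou, Fujian 362000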
Abstract
An encoder-decoder model based on Transformer and long short-term memory (LSTM) is proposed for end-to-end, sentence-level lip reading of the Minnan dialect. The encoder uses a spatiotemporal convolutional neural network followed by a Transformer encoder to extract spatiotemporal features from lip-image sequences. The decoder uses an LSTM network combined with a cross-attention mechanism to predict the text sequence. Experiments on a self-built Minnan dialect lip-reading dataset show that the model effectively improves lip-reading recognition accuracy.
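The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which a single decoder (LSTM) hidden state queries the sequence of encoder output frames, with no learned projection matrices; the abstract does not specify the exact attention form the authors use, so the function names and toy values below are hypothetical and for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, encoder_outputs):
    """Scaled dot-product attention of one decoder state over encoder frames.

    Assumption: encoder outputs serve as both keys and values, and no
    learned projections are applied (a simplification for illustration).
    """
    d = len(query)
    # One similarity score per encoder frame, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, frame)) / math.sqrt(d)
        for frame in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder frames.
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_outputs))
        for i in range(d)
    ]
    return context, weights


# Hypothetical toy data: three encoder frames of dimension 2,
# and one decoder hidden state of the same dimension.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

context, weights = cross_attention(decoder_state, encoder_outputs)
```

In a full decoder, the context vector would typically be combined with the LSTM state at each step before predicting the next output token; that combination is left out here because the abstract does not describe it.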
Keywords
lip reading/Minnan dialect/Transformer/long short-term memory (LSTM)/spatio-temporal convolutional neural network/attention mechanism/end-to-end model
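Funding
Education and Research Project for Young and Middle-aged Teachers of the Fujian Provincial Department of Education (JAT200542)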
Publication year
2024