Minnan Dialect Lip-Reading Recognition Based on Transformer-LSTM

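To address end-to-end, sentence-level lip-reading recognition for the Minnan dialect, an encoder-decoder model based on the Transformer and long short-term memory (LSTM) networks is proposed. The encoder uses a spatiotemporal convolutional neural network and a Transformer encoder to extract spatiotemporal features from lip-reading sequences; the decoder uses an LSTM combined with a cross-attention mechanism to predict the text sequence. Experiments on a self-built Minnan dialect lip-reading dataset show that the model effectively improves lip-reading recognition accuracy.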
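The abstract does not give the spatiotemporal CNN's hyperparameters, so the following is only a minimal sketch of the shape arithmetic such a front end involves. The clip size (75 frames of 88x88 mouth crops), the kernel/stride/padding values, and the helper names `conv_out` and `layer_out` are all hypothetical. The point it illustrates is that a stride of 1 on the time axis keeps one feature map per input frame, which gives the Transformer encoder one token per frame.

```python
def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def layer_out(shape, kernel, stride, padding):
    """Apply conv_out independently to the (T, H, W) axes of a 3D layer."""
    return tuple(
        conv_out(n, k, s, p) for n, k, s, p in zip(shape, kernel, stride, padding)
    )


# Hypothetical front end (assumed, not taken from the paper): a 3D convolution
# followed by 3D max pooling, both with stride 1 on the time axis.
frames = (75, 88, 88)  # assumed clip: 75 frames of 88x88 mouth-region pixels
after_conv = layer_out(frames, (5, 7, 7), (1, 2, 2), (2, 3, 3))
after_pool = layer_out(after_conv, (1, 3, 3), (1, 2, 2), (0, 1, 1))
seq_len = after_pool[0]  # one token per frame for the Transformer encoder

print(after_conv, after_pool)  # → (75, 44, 44) (75, 22, 22)
```

Only the spatial axes shrink here, so each of the 75 frames would still be flattened or pooled into a single feature vector before the Transformer encoder sees the sequence. How the paper actually does this is not stated in the abstract.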
Research on Minnan Dialect Lip-Language Recognition Based on Transformer-LSTM
An encoder-decoder model based on the Transformer and long short-term memory (LSTM) was proposed for end-to-end, sentence-level Minnan dialect lip recognition. The encoder used a spatiotemporal convolutional neural network and a Transformer encoder to extract spatiotemporal features of lip-reading sequences. The decoder used a long short-term memory network combined with a cross-attention mechanism for text sequence prediction. Finally, experiments were conducted on a self-built Minnan dialect lip-language dataset, and the results showed that the model can effectively improve the accuracy of lip-language recognition.
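A minimal pure-Python sketch of one decoder step as the abstract describes it: an LSTM cell updates the hidden state, which then attends over the encoder outputs (cross-attention) before a token is predicted. The abstract gives no dimensions, scoring function, or output layer, so scaled dot-product scoring, concatenating `[h; context]` before the output projection, greedy argmax decoding, the helper names (`lstm_cell`, `cross_attention`, `decode_step`), and the random stand-in for the Transformer encoder output are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_cell(x, h, c, params):
    """One standard LSTM step; every gate reads the concatenation [x; h]."""
    z = x + h  # list concatenation
    gate = lambda g, act: [act(a + b) for a, b in zip(matvec(params["W" + g], z), params["b" + g])]
    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def cross_attention(query, memory):
    """Scaled dot-product attention of one decoder query over encoder outputs."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memory]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * mem[k] for w, mem in zip(weights, memory)) for k in range(d)]
    return context, weights


def init_params(d_in, d_hid, rng):
    mat = lambda r, k: [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(r)]
    params = {}
    for g in "ifog":  # input, forget, output gates and candidate cell
        params["W" + g] = mat(d_hid, d_in + d_hid)
        params["b" + g] = [0.0] * d_hid
    return params


def decode_step(prev_token, h, c, memory, params, embed, W_out):
    """LSTM update, then cross-attention, then greedy choice of the next token."""
    h, c = lstm_cell(embed[prev_token], h, c, params)
    context, weights = cross_attention(h, memory)
    logits = matvec(W_out, h + context)
    next_token = max(range(len(logits)), key=logits.__getitem__)
    return next_token, h, c, weights


rng = random.Random(0)
D, VOCAB, T = 8, 5, 6  # hypothetical sizes
memory = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]  # stand-in encoder output
embed = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(VOCAB)]
params = init_params(D, D, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * D)] for _ in range(VOCAB)]

h, c = [0.0] * D, [0.0] * D
token = 0  # hypothetical start-of-sequence id
for step in range(3):
    token, h, c, weights = decode_step(token, h, c, memory, params, embed, W_out)
    print(step, token, [round(w, 3) for w in weights])
```

The attention weights are recomputed at every step from the current LSTM state, which is what lets the decoder focus on different lip frames as it emits successive characters. A trained model would also need learned weights and a stop token; neither is shown here.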

lip-language; Minnan language; Transformer; long short-term memory (LSTM); spatiotemporal convolutional neural networks; attention mechanism; end-to-end model

Zeng Wei (曾蔚), Luo Xianxian (罗仙仙), Wang Hongwei (王鸿伟)

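School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000, China

Key Laboratory of New Big Data Management Technology and Knowledge Engineering of Fujian Province, Quanzhou, Fujian 362000, China

Key Laboratory of Intelligent Computing and Information Processing of Fujian Province Higher Education Institutions, Quanzhou, Fujian 362000, China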

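Education and Scientific Research Project for Young and Middle-aged Teachers, Fujian Provincial Department of Education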

JAT200542

2024

Journal of Quanzhou Normal University
Quanzhou Normal University

Impact factor: 0.285
ISSN: 1009-8224
Year, volume (issue): 2024, 42(2)