Journal of Quanzhou Normal University, 2024, Vol. 42, Issue 2: 10-17.

A Research on Minnan Dialect Lip-Language Recognition Based on Transformer-LSTM

Zeng Wei¹, Luo Xianxian¹, Wang Hongwei¹

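Author information

  • 1. School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000; Fujian Provincial Key Laboratory of New Big Data Management Technologies and Knowledge Engineering, Quanzhou, Fujian 362000; Fujian Provincial University Key Laboratory of Intelligent Computing and Information Processing, Quanzhou, Fujian 362000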

Abstract

To address end-to-end, sentence-level lip reading for the Minnan dialect, an encoder-decoder model based on the Transformer and long short-term memory (LSTM) networks is proposed. The encoder uses a spatiotemporal convolutional neural network followed by a Transformer encoder to extract spatiotemporal features from lip-reading sequences. The decoder uses an LSTM network combined with a cross-attention mechanism to predict the text sequence. Experiments on a self-built Minnan dialect lip-reading dataset show that the model effectively improves lip-reading recognition accuracy.
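
The cross-attention step that links the decoder to the encoder can be illustrated with a minimal sketch. The abstract does not specify the attention form, so this assumes plain scaled dot-product attention with no learned projections, in which one decoder state attends over the encoder's per-frame features. The names `cross_attention`, `query`, and `encoder_outputs` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, encoder_outputs):
    """Scaled dot-product attention of one decoder state over encoder frames.

    query: vector of length d (assumed here to be a decoder LSTM hidden state).
    encoder_outputs: list of T vectors of length d (assumed encoder features,
        used as both keys and values; the paper may use learned projections).
    Returns (context, weights): the weighted sum of frames and the T weights.
    """
    d = len(query)
    # One similarity score per encoder frame, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, frame)) / math.sqrt(d)
        for frame in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder frames.
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_outputs))
        for i in range(d)
    ]
    return context, weights


# Toy input: three encoder frames of dimension 2 and one decoder state.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = cross_attention(query, encoder_outputs)
```

In a decoder of this kind, the context vector would typically be combined with the LSTM state before predicting the next output token; how the paper combines them is not stated in the abstract.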

Key words

lip reading/Minnan dialect/Transformer/long short-term memory (LSTM)/spatiotemporal convolutional neural network/attention mechanism/end-to-end model


Funding

Education and Research Project for Young and Middle-aged Teachers, Fujian Provincial Department of Education (JAT200542)

Publication year

2024
Journal of Quanzhou Normal University
Quanzhou Normal University

Impact factor: 0.285
ISSN: 1009-8224
Number of references: 18