A Research on Minnan Dialect Lip-Language Recognition Based on Transformer-LSTM


Abstract: An encoder-decoder model based on the Transformer and long short-term memory (LSTM) is proposed for end-to-end, sentence-level lip-language recognition of the Minnan dialect. The encoder uses a spatiotemporal convolutional neural network and a Transformer encoder to extract spatiotemporal features from lip-reading sequences. The decoder uses an LSTM network combined with a cross-attention mechanism to predict the output text sequence. Experiments on a self-built Minnan dialect lip-language dataset show that the model effectively improves lip-language recognition accuracy.
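The abstract names the decoder components but not their exact form, so the following is only a minimal, dependency-free sketch of one common way to combine an LSTM decoder with cross-attention over encoder features. Everything in it is an assumption rather than the authors' method: single-head scaled dot-product attention without learned projections, an LSTM cell without bias terms, predicting each token from the concatenation of hidden state and context vector, greedy decoding, and random weights. The names `cross_attention`, `lstm_cell`, and `greedy_decode` are hypothetical, and `enc_out` is a random stand-in for the spatiotemporal CNN + Transformer encoder output.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one decoder query over T encoder frames.

    Simplification (assumption): single head, no learned Q/K/V projections.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights


def lstm_cell(x, h, c, W):
    """One LSTM step (no bias); W maps [x; h] to stacked input/forget/candidate/output gates."""
    n = len(h)
    z = matvec(W, x + h)
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def greedy_decode(enc_out, embed, W_lstm, W_out, bos, eos, max_len=10):
    """Hypothetical decoder loop: LSTM step, attend over encoder features, take argmax token."""
    d = len(enc_out[0])
    h, c = [0.0] * d, [0.0] * d
    token, output = bos, []
    for _ in range(max_len):
        h, c = lstm_cell(embed[token], h, c, W_lstm)
        context, _ = cross_attention(h, enc_out, enc_out)
        logits = matvec(W_out, h + context)  # predict from [h; context]
        token = max(range(len(logits)), key=logits.__getitem__)
        if token == eos:
            break
        output.append(token)
    return output


# Usage with random, untrained stand-in weights (illustrative only).
rng = random.Random(0)
d, vocab, frames = 4, 5, 6


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


enc_out = rand_matrix(frames, d)  # stand-in for STCNN + Transformer encoder output
embed = rand_matrix(vocab, d)     # token embeddings, same size as the hidden state
W_lstm = rand_matrix(4 * d, 2 * d)
W_out = rand_matrix(vocab, 2 * d)
print(greedy_decode(enc_out, embed, W_lstm, W_out, bos=0, eos=1))
```

Because the weights are untrained, the printed token IDs are meaningless; the point is the data flow. At each step the LSTM hidden state serves as the attention query over all encoder frames, so the decoder can look back at whichever lip frames are relevant to the next character.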

Keywords:

lip-language; Minnan language; Transformer; long short-term memory (LSTM); spatiotemporal convolutional neural network; attention mechanism; end-to-end model

Authors:

曾蔚 (Zeng Wei), 罗仙仙 (Luo Xianxian), 王鸿伟 (Wang Hongwei)


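Affiliations:

School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000, China

Fujian Provincial Key Laboratory of New Technologies for Big Data Management and Knowledge Engineering, Quanzhou, Fujian 362000, China

Key Laboratory of Intelligent Computing and Information Processing, Fujian Provincial Universities, Quanzhou, Fujian 362000, China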


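Funding:

Education and Scientific Research Project for Young and Middle-aged Teachers, Fujian Provincial Department of Education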

Project No.:

JAT200542

Publication year:

2024

Journal: Journal of Quanzhou Normal University

Publisher: Quanzhou Normal University


Impact factor: 0.285

ISSN：1009-8224

Year, volume (issue): 2024, 42(2)

Number of references: 18