电子与信息学报 (Journal of Electronics & Information Technology) 2024, Vol. 46, Issue 7: 2932-2941. DOI: 10.11999/JEIT230801

Non-Autoregressive Sign Language Translation Technology Based on Transformer and Multimodal Alignment

邵舒羽 1, 杜垚 2, 范晓丽 3

Author Information

  • 1. School of Logistics, Beijing Wuzi University, Beijing 101149
  • 2. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191
  • 3. Air Force Medical Center, Beijing 100142

Abstract

To address the difficulty of aligning multimodal data and the slow speed of sign language translation, this paper proposes a non-autoregressive sign language translation model (Trans-SLT-NA) based on the Transformer self-attention architecture, and introduces a contrastive learning loss function to align the multimodal data. By learning the contextual and interaction information between the input sequence (sign language video) and the target sequence (text), the model translates sign language into natural language in a single step. The proposed model is evaluated on the public datasets PHOENIX-2014T (German), CSL (Chinese) and How2Sign (English). Results show that, compared with autoregressive models, the proposed method improves translation speed by a factor of 11.6 to 17.6, while remaining close to autoregressive models on the BiLingual Evaluation Understudy (BLEU-4) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics.
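The abstract does not detail the contrastive alignment objective. A minimal InfoNCE-style sketch of how paired sign-video and text embeddings could be pulled together within a batch is shown below; the function name, array shapes, and temperature value are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.1):
    """InfoNCE-style contrastive loss: pull each video embedding toward its
    paired text embedding and push it away from the other texts in the batch.

    video_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so the dot product becomes cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature                # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    # Row-wise log-softmax; diagonal entries are the matching (positive) pairs
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal
```

The loss approaches zero when each video embedding is most similar to its own text embedding, which is the alignment property the model needs before the non-autoregressive decoder can emit the whole sentence in one step.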

Keywords

Sign language translation / Self-attention mechanism / Non-autoregressive translation / Deep learning / Multimodal data alignment


Funding

National Natural Science Foundation of China (8210072143)

Beijing Municipal Education Commission Science and Technology Program Young Scholars Project (KM202210037001)

Publication Year

2024

Journal: 电子与信息学报 (Journal of Electronics & Information Technology)
Sponsors: Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China
Indexing: CSTPCD; Peking University Core Journals
Impact factor: 1.302
ISSN: 1009-5896