Neural Machine Translation Method for Low-Resource Scenarios (面向低资源场景的神经机器翻译方法)

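Abstract (translated from Chinese): Neural machine translation (NMT) builds translation models with deep learning and requires large-scale bilingual parallel corpora. In low-resource scenarios, parallel sentence pairs are scarce, so the trained NMT models perform poorly. Unsupervised NMT uses only monolingual data in the two languages, which removes the dependence on large-scale bilingual parallel data. However, unsupervised NMT has two problems: first, its ability to model syntax is weak; second, the small amount of bilingual data that does exist in low-resource scenarios cannot be used for training, so that data is wasted. To address these problems, this paper proposes a method that incorporates syntactic knowledge into unsupervised NMT so that the model can fully learn the syntactic information of sentences. It also introduces a small amount of bilingual parallel data to assist unsupervised NMT training, so that the model directly learns the mapping between source-language and target-language words. Compared with the baseline model, BLEU improves by 1.65 and 1.79 on the English-French and German-English monolingual news datasets, respectively.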

English title: Neural Machine Translation Method for Low-resource Scenarios

English abstract: Neural machine translation requires a large-scale bilingual parallel corpus to build a translation model using deep learning methods. In low-resource scenarios, unsupervised neural machine translation is usually applied because large-scale bilingual parallel data is lacking. This paper proposes a method of fusing syntactic knowledge into unsupervised neural machine translation, so that the model can fully learn the syntactic information of sentences. At the same time, a small amount of bilingual parallel data is introduced to assist unsupervised neural machine translation training, so that the model can directly learn the mapping between source-language and target-language words. Compared with the baseline system, the proposed method improves the BLEU score by 1.65 and 1.79 on the English-French and German-English tasks, respectively.

English keywords:

unsupervised neural machine translation; syntactic knowledge; denoising auto-encoder
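
The keywords name a denoising auto-encoder, a standard component of unsupervised NMT, but this record does not describe the noise model the authors used. As a rough illustration only, the sketch below shows one commonly used corruption scheme (random word dropout followed by a bounded local shuffle). The function name `add_noise`, its parameters, and their default values are assumptions made for this example and are not taken from the paper.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shuffle_distance=3, rng=None):
    """Corrupt a token list for denoising auto-encoder training (illustrative only)."""
    rng = rng or random.Random()

    # Word dropout: drop each token independently with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        # Avoid returning an empty sentence: keep one randomly chosen token.
        kept = [rng.choice(tokens)]

    # Local shuffle: add a random offset in [0, max_shuffle_distance] to each
    # position and sort by the result, so tokens only move a short distance.
    keys = [i + rng.uniform(0, max_shuffle_distance) for i in range(len(kept))]
    ordered = sorted(zip(keys, kept), key=lambda pair: pair[0])
    return [token for _, token in ordered]


if __name__ == "__main__":
    sentence = "the model learns to rebuild the clean sentence".split()
    print(add_noise(sentence, rng=random.Random(0)))
```

In denoising training the model receives the corrupted sentence as input and is trained to reconstruct the original one, which is what forces it to learn word order and context from monolingual data alone.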

Authors:

胡朝东, 叶娜, 张桂平, 蔡东风

Affiliation:

Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136

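Chinese keywords:

unsupervised neural machine translation (无监督神经机器翻译); syntactic knowledge (句法知识); denoising auto-encoder (去噪自动编码器)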

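Funding:

National Natural Science Foundation of China; Liaoning Province Key Research and Development Program; Shenyang Science and Technology Program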

Project numbers:

U1908216; 2019JH2/10100020; 20-202-1-28

Publication year:

2024

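Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences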

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(6)