Journal of Chinese Information Processing, 2024, Vol. 38, Issue 6: 77-85.

Domain-adversarial Transfer Learning for Low-resource Neural Machine Translation

常鑫 侯宏旭 乌尼尔 贾晓宁 李浩然

Author information

  • 1. College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021

Abstract

When the out-of-domain and in-domain data are in different languages, the differences between the languages make it difficult to adapt out-of-domain knowledge to the in-domain setting. This paper therefore proposes a domain-adversarial transfer learning method to improve the neural machine translation model. Under an adversarial learning framework, a domain discriminator predicts whether semantic features come from the out-of-domain or the in-domain data, and the encoder is optimized by minimizing the discriminator's prediction values for the out-of-domain and in-domain semantic features. When the predicted values for the semantic features of the two domains are close, the model has learned a mapping function that maps in-domain data into the out-of-domain space. Experiments on Mongolian-Chinese and Uyghur-Chinese translation tasks, among others, show that the method has a certain generalization ability.
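The adversarial objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear discriminator, the toy two-dimensional features, and every function name below are assumptions made for illustration. It shows only that a discriminator separates the two domains easily when their features differ, and that its mean predictions coincide once the features are aligned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminate(feature, weights, bias):
    """Hypothetical linear domain discriminator: P(feature is out-of-domain)."""
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return sigmoid(score)


def discriminator_loss(out_feats, in_feats, weights, bias):
    """Binary cross-entropy with label 1 = out-of-domain, 0 = in-domain."""
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(discriminate(f, weights, bias) + eps) for f in out_feats]
    losses += [-math.log(1.0 - discriminate(f, weights, bias) + eps) for f in in_feats]
    return sum(losses) / len(losses)


def prediction_gap(out_feats, in_feats, weights, bias):
    """Absolute gap between the mean predictions for the two domains."""
    mean_out = sum(discriminate(f, weights, bias) for f in out_feats) / len(out_feats)
    mean_in = sum(discriminate(f, weights, bias) for f in in_feats) / len(in_feats)
    return abs(mean_out - mean_in)


# Toy encoder outputs (assumed values, not taken from the paper).
weights, bias = [1.0, -1.0], 0.0
out_feats = [[2.0, 0.0], [3.0, 0.0]]
in_feats_separated = [[0.0, 2.0], [0.0, 3.0]]  # encoder has not aligned the domains
in_feats_aligned = [[2.0, 0.0], [3.0, 0.0]]    # encoder maps in-domain onto out-of-domain

gap_separated = prediction_gap(out_feats, in_feats_separated, weights, bias)
gap_aligned = prediction_gap(out_feats, in_feats_aligned, weights, bias)
loss_separated = discriminator_loss(out_feats, in_feats_separated, weights, bias)
loss_aligned = discriminator_loss(out_feats, in_feats_aligned, weights, bias)

print(f"gap: separated={gap_separated:.3f}, aligned={gap_aligned:.3f}")
print(f"discriminator loss: separated={loss_separated:.3f}, aligned={loss_aligned:.3f}")
```

In a full system the encoder and discriminator would be neural networks trained in alternation. The sketch only makes concrete the idea that similar predictions for the two domains mean the discriminator can no longer tell them apart.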

Key words

domain adaptation / machine translation / multi-language / adversarial learning


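Funding

Inner Mongolia Autonomous Region Science and Technology Achievement Transformation Special Project (2019CG028)

Natural Science Foundation of Inner Mongolia (2018MS06005)

Natural Science Foundation of Inner Mongolia (14020202-0114)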

Publication year

2024
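Journal of Chinese Information Processing
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences

Indexed in: CSTPCD, CSCD, CHSSCD, PKU Core Journals
Impact factor: 0.8
ISSN: 1003-0077
Number of references: 18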