Research on neural machine translation based on contrastive learning

Neural machine translation helps break down language barriers and strengthen cultural exchange. To address the poor translation quality caused by the scarcity of parallel corpora for low-resource languages, this study builds on the unsupervised SimCSE contrastive learning framework and generates positive samples by combining three data augmentation methods: random character perturbation, word embedding replacement, and sentence order transformation. Sentence embeddings trained this way capture richer semantic information. The contrastive learning method is then used to pre-train sentence embeddings on mixed monolingual corpora, and finally the model is fine-tuned on a small amount of parallel data. Experiments show that the BLEU score in neural machine translation improved by 2.69.
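The pipeline in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes simple stand-ins for the three augmentations (adjacent-character swaps for character perturbation, a hypothetical precomputed `neighbours` dictionary for word embedding replacement, and clause shuffling for sentence order transformation) and pairs them with the in-batch InfoNCE loss that SimCSE uses. All function names, rates, and the separator are illustrative assumptions; the temperature of 0.05 is the SimCSE default.

```python
import math
import random


def perturb_chars(sentence, rate, rng):
    """Swap adjacent characters with probability `rate` (assumed perturbation)."""
    chars = list(sentence)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a character moves at most once
        else:
            i += 1
    return "".join(chars)


def replace_words(sentence, neighbours, rate, rng):
    """Replace words by an embedding-space neighbour.

    `neighbours` is a hypothetical dict (word -> list of similar words) that
    would be precomputed from word embeddings; it is not part of the source.
    """
    out = []
    for word in sentence.split():
        if word in neighbours and rng.random() < rate:
            out.append(rng.choice(neighbours[word]))
        else:
            out.append(word)
    return " ".join(out)


def shuffle_segments(text, rng, sep=", "):
    """Reorder the segments of `text` (one reading of 'order transformation')."""
    parts = text.split(sep)
    rng.shuffle(parts)
    return sep.join(parts)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean in-batch InfoNCE loss.

    positives[i] is the augmented view of anchors[i]; every other row in the
    batch serves as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)
```

In a real system the embeddings passed to `info_nce` would come from the sentence encoder applied to each original sentence and its augmented view; the loss is small when each anchor is closest to its own positive and large otherwise.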

contrastive learning; data augmentation; neural machine translation

Li Zeyu (李泽宇), Yin Feng (殷锋), Chen Saifeiyang (陈赛飞扬), Wang Xiaoxue (王小雪)

School of Computer Science and Engineering, Southwest Minzu University, Chengdu, Sichuan 610041

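Sichuan Provincial Education Information Technology Research Project; Chengdu Philosophy and Social Sciences Planning Project; Fundamental Research Funds for the Central Universities, Southwest Minzu University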

DSJ2022036; 2022BS027; 2022SZL20

2024

Journal of Southwest Minzu University (Natural Science Edition)
Southwest Minzu University

CSTPCD
Impact factor: 0.441
ISSN: 2095-4271
Year, volume (issue): 2024, 50(4)