
Neural machine translation method integrating BERT's pre-trained language knowledge

[Objective] To address the problem that fine-tuning alone cannot fully exploit pre-trained language knowledge in neural machine translation tasks. [Methods] A neural machine translation method with two-stage interactive fusion of a pre-trained model is proposed. First, the multi-layer representations of the BERT pre-trained model are extracted and used to construct a mask knowledge matrix, through which the pre-trained knowledge contained in BERT is applied to the word embedding layer on the encoder side of the neural machine translation model. Second, an adaptive fusion module extracts the beneficial knowledge in BERT's multi-layer representations and fuses it interactively with the neural machine translation model. [Results] Experimental results show that, compared with the Transformer baseline model, the proposed method improves the BLEU score by 1.41 to 4.20 on multiple neural machine translation tasks, and it also clearly outperforms other neural machine translation methods that integrate pre-trained knowledge. [Conclusion] The proposed two-stage interactive fusion method alleviates catastrophic forgetting, narrows the gap between the pre-trained model and the neural machine translation model caused by their different training objectives, and can effectively exploit pre-trained language knowledge to improve the performance of neural machine translation models.
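The abstract does not give implementation details for the mask knowledge matrix or the adaptive fusion module, so the following is only a minimal sketch of the second stage under assumed design choices common in BERT-fused NMT: learned softmax weights select useful BERT layers, and a sigmoid gate mixes the projected BERT knowledge into the NMT encoder states. All names here (AdaptiveLayerFusion, layer_weights, gate) and the random tensors standing in for real BERT hidden states are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveLayerFusion(nn.Module):
    """Hypothetical adaptive fusion: a learned softmax over BERT layers selects
    useful representations, and a sigmoid gate mixes the projected BERT
    knowledge into the NMT encoder states."""

    def __init__(self, num_bert_layers: int, bert_dim: int, nmt_dim: int):
        super().__init__()
        # One scalar weight per BERT layer, normalized with softmax at run time.
        self.layer_weights = nn.Parameter(torch.zeros(num_bert_layers))
        self.proj = nn.Linear(bert_dim, nmt_dim)     # map BERT space -> NMT space
        self.gate = nn.Linear(2 * nmt_dim, nmt_dim)  # element-wise fusion gate

    def forward(self, bert_layers: torch.Tensor, nmt_states: torch.Tensor) -> torch.Tensor:
        # bert_layers: (num_layers, batch, src_len, bert_dim)
        # nmt_states:  (batch, src_len, nmt_dim)
        w = F.softmax(self.layer_weights, dim=0)                # adaptive layer weights
        fused = (w.view(-1, 1, 1, 1) * bert_layers).sum(dim=0)  # weighted sum over layers
        fused = self.proj(fused)                                # (batch, src_len, nmt_dim)
        g = torch.sigmoid(self.gate(torch.cat([nmt_states, fused], dim=-1)))
        return g * nmt_states + (1.0 - g) * fused               # gated interactive fusion


# Toy usage: random tensors stand in for real BERT hidden states and encoder output.
num_layers, batch, src_len, bert_dim, nmt_dim = 12, 2, 7, 768, 512
bert_layers = torch.randn(num_layers, batch, src_len, bert_dim)
nmt_states = torch.randn(batch, src_len, nmt_dim)
fused_states = AdaptiveLayerFusion(num_layers, bert_dim, nmt_dim)(bert_layers, nmt_states)
print(fused_states.shape)  # torch.Size([2, 7, 512])
```

In a real system, bert_layers would come from a frozen BERT run over the source sentence (for example, the hidden_states returned by Hugging Face's BertModel with output_hidden_states=True), and the fused states would feed the remaining Transformer encoder layers; the first-stage mask knowledge matrix applied at the embedding layer is not modeled here because the abstract does not specify how it is constructed.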

machine translation; pre-trained language model; attention mechanism; Transformer network model

谷雪鹏、郭军军、余正涛


Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China


2024

厦门大学学报(自然科学版) [Journal of Xiamen University (Natural Science)]
厦门大学 (Xiamen University)


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.449
ISSN: 0438-0479
Year, Volume (Issue): 2024, 63(6)