Chinese-Urdu neural machine translation incorporating Urdu part-of-speech sequence prediction

Chen Huanhuan¹, Wang Jian¹, Muhammad Naeem Ul Hassan¹

Author information

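1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China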

Abstract

Many research teams have carried out in-depth work on machine translation for the less widely used languages of South and Southeast Asia. For Urdu, the official language of Pakistan, however, data resources are scarce and the language differs greatly from Chinese, so very little research has targeted Chinese-Urdu machine translation specifically. To address this, the paper proposes a Transformer-based Chinese-Urdu neural machine translation model that incorporates Urdu part-of-speech (POS) sequences. A Transformer is first used to predict the POS sequence of the target language, Urdu. The predictions of the translation model are then combined with those of the POS sequence model to jointly predict the final translation, thereby integrating linguistic knowledge into the translation model. Experiments on an existing small-scale Chinese-Urdu dataset show that the proposed method improves the BLEU score by 0.13 over the baseline model, a fairly noticeable improvement.
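The abstract does not say how the two predictions are combined, so the following is only a minimal sketch of one plausible form of joint prediction: a log-linear interpolation of the translation model's token probabilities with the POS model's tag probabilities at a single decoding step. The function name `joint_next_token`, the `word_to_pos` lookup, the `weight` hyperparameter, and the romanized toy vocabulary are all assumptions made for illustration, not details taken from the paper.

```python
import math


def joint_next_token(mt_probs, pos_probs, word_to_pos, weight=0.5):
    """Choose the next target token from two combined distributions.

    mt_probs:    {token: probability from the translation model}
    pos_probs:   {pos_tag: probability from the POS sequence model}
    word_to_pos: {token: pos_tag}, a hypothetical lexicon lookup
    weight:      share given to the POS model (assumed hyperparameter)
    """
    eps = 1e-12  # keeps log() finite when a probability is zero or missing
    scores = {}
    for token, p_mt in mt_probs.items():
        tag = word_to_pos.get(token)
        p_pos = pos_probs.get(tag, 0.0)
        # Log-linear interpolation of the two model scores.
        scores[token] = ((1.0 - weight) * math.log(p_mt + eps)
                         + weight * math.log(p_pos + eps))
    best = max(scores, key=scores.get)
    return best, scores


# Toy decoding step with romanized Urdu candidates (illustrative values only).
mt_probs = {"kitab": 0.40, "parhna": 0.45, "acha": 0.15}
pos_probs = {"NOUN": 0.70, "VERB": 0.20, "ADJ": 0.10}
word_to_pos = {"kitab": "NOUN", "parhna": "VERB", "acha": "ADJ"}

baseline_choice, _ = joint_next_token(mt_probs, pos_probs, word_to_pos, weight=0.0)
joint_choice, _ = joint_next_token(mt_probs, pos_probs, word_to_pos, weight=0.5)
print(baseline_choice, joint_choice)
```

In this toy step the translation model alone slightly prefers the verb, while the POS model expects a noun, so the combined score switches the choice to the noun candidate. With `weight=0.0` the function reduces to the plain translation model, which makes the POS contribution easy to ablate.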

Key words

Transformer/neural machine translation/Urdu/part-of-speech sequence

Funding

National Natural Science Foundation of China (62166022)

National Natural Science Foundation of China (62266028)

Publication year

2024

Computer Engineering & Science

College of Computer Science, National University of Defense Technology

CSTPCD / Peking University Core Journals

Impact factor: 0.787

ISSN: 1007-130X

Number of references: 18