Research on Low-Resource Language Machine Translation for the "Belt and Road"
With the development of the "Belt and Road" Initiative, the demand for cross-language communication among the countries and regions along its routes has grown, and Machine Translation (MT) technology has gradually become an important means of in-depth exchange between countries. However, owing to the abundance of low-resource languages and the scarcity of language materials in these countries, progress in machine translation research has been relatively slow. This paper proposes a training method for low-resource language machine translation based on the NLLB model. An improved training strategy built on a multilingual pre-trained model optimizes the loss function on top of data augmentation, thereby effectively improving the translation performance of low-resource languages in machine translation tasks. The experimental results show that, compared with the NLLB-600M baseline model, the proposed model achieves average improvements of 1.33 in BiLingual Evaluation Understudy (BLEU) score and 0.82 in chrF++ score on Chinese translation tasks for four low-resource languages, fully demonstrating the effectiveness of the proposed method in low-resource language machine translation. In a further experiment, the ChatGPT and ChatGLM models are used for preliminary studies of Laotian-Chinese and Vietnamese-Chinese translation, respectively. Large Language Models (LLMs) are already capable of translating low-resource languages: in Vietnamese-Chinese translation tasks, the ChatGPT model significantly outperforms the traditional Neural Machine Translation (NMT) model, with improvements of 9.28 in BLEU score and 3.12 in chrF++ score, whereas the translation of Laotian still requires further improvement.
low-resource languages; Machine Translation (MT); data augmentation; multilingual pre-training models; Large Language Model (LLM)
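The abstract reports translation gains in terms of BLEU score. As a reference for how corpus-level BLEU is computed, the sketch below implements the standard formulation (clipped modified n-gram precisions up to 4-grams, geometric mean, brevity penalty) in pure Python. This is an illustrative minimal version only, not the paper's evaluation code; reported scores would normally come from a standard toolkit such as sacrebleu, which also provides chrF++.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Minimal corpus-level BLEU with uniform n-gram weights (0-100 scale).

    hypotheses, references: parallel lists of token lists (one reference
    per hypothesis, for simplicity). Illustrative sketch only.
    """
    clipped = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            totals[n - 1] += sum(h.values())
            # clip each hypothesis n-gram count by its count in the reference
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
    # geometric mean of modified precisions; zero if any order has no match
    if min(clipped) == 0:
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # brevity penalty for hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)
```

A hypothesis identical to its reference scores 100, and a hypothesis sharing no 4-grams with the reference scores 0 under this smoothing-free formulation; production toolkits add smoothing and standardized tokenization so that scores are comparable across papers.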