A neural machine translation method based on language model distillation
The lack of large parallel corpora is one of the key issues in low-resource neural machine translation. This paper proposes a neural machine translation method based on language model distillation, which regularizes neural machine translation training with a monolingual language model. The method introduces the prior knowledge contained in the language model to improve translation quality. Specifically, drawing on the idea of knowledge distillation, a target-side language model (teacher model) trained on rich monolingual data is used to construct a regularization term for the low-resource neural machine translation model (student model), allowing the translation model to learn highly generalized prior knowledge from the language model. Unlike traditional approaches in which a monolingual language model participates in decoding, the language model in this method is used only during training and is not involved in the inference stage, so it effectively improves decoding speed. Experimental results on two low-resource translation datasets, Uyghur-Chinese and Tibetan-Chinese, from the 17th China Conference on Machine Translation (CCMT 2021) show that, compared with the current state-of-the-art language model fusion baseline system, BLEU is improved by 1.42 points (Tibetan-Chinese) to 2.11 points (Chinese-Uyghur).
Keywords: language model; knowledge distillation; regularization; low-resource neural machine translation
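To make the regularization idea concrete, the following is a minimal sketch of a distillation-regularized training loss, assuming a PyTorch setup in which the student NMT model and a frozen target-side language model both produce per-position logits over the same target vocabulary. The function name, the mixing weight alpha, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def distillation_regularized_loss(nmt_logits, lm_logits, target_ids,
                                  pad_id, alpha=0.5, temperature=1.0):
    """Hypothetical sketch: translation cross-entropy plus a KL regularizer
    that pulls the student NMT distribution toward the frozen teacher LM.

    nmt_logits: (batch, tgt_len, vocab) student translation-model outputs
    lm_logits:  (batch, tgt_len, vocab) teacher LM outputs on the target prefix
    target_ids: (batch, tgt_len)        gold target tokens
    """
    vocab = nmt_logits.size(-1)

    # Standard cross-entropy on the parallel data (padding ignored).
    ce = F.cross_entropy(
        nmt_logits.view(-1, vocab), target_ids.view(-1),
        ignore_index=pad_id, reduction="mean",
    )

    # Distillation regularizer: per-position KL divergence between the
    # teacher LM distribution and the student distribution. The teacher
    # logits are detached so gradients update only the student.
    student_logp = F.log_softmax(nmt_logits / temperature, dim=-1)
    teacher_p = F.softmax(lm_logits.detach() / temperature, dim=-1)
    kl = F.kl_div(student_logp, teacher_p, reduction="none").sum(-1)

    # Mask padding positions before averaging the regularizer.
    mask = target_ids.ne(pad_id).float()
    kl = (kl * mask).sum() / mask.sum()

    return (1.0 - alpha) * ce + alpha * (temperature ** 2) * kl
```

Because the teacher language model appears only inside this training loss, it can be discarded after training; inference uses the student NMT model alone, which is consistent with the decoding-speed advantage described in the abstract.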