Lao-Chinese neural machine translation based on word embedding transfer enhanced by encoding and transcription
[Objective] Transfer learning is an effective way to improve low-resource neural machine translation. However, existing transfer learning methods perform poorly when transferring from Thai to Lao. The main problem is that Thai and Lao use different writing systems, which makes it difficult to establish accurate vocabulary mappings for transfer. [Methods] We propose a Lao-Chinese neural machine translation method that uses encoding and transcription to enhance word embedding transfer. The method exploits the similarity between the two languages to establish accurate vocabulary mappings and achieve high-quality model transfer. First, we examine the phonetic similarities between Thai and Lao and use them to develop a unified romanization transcription rule. We then construct a Lao-Chinese neural machine translation framework in which encoding and transcription enhance the word embeddings and support transfer between Thai and Lao, thereby establishing accurate vocabulary mappings and improving word embedding transfer from Thai to Lao. [Results] Experimental results show that, compared with the baseline model, the proposed method improves BLEU scores by 2.45 and 2.74 in the Lao-Chinese and Lao-English translation directions, respectively. [Conclusion] The proposed romanization-enhanced word embedding transfer method is effective for transfer learning between low-resource languages. We hope it offers a practical solution for Lao-Chinese neural machine translation to the related language community.
transfer learning; Thai; Lao; romanization; machine translation
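
The vocabulary-mapping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character tables `THAI_TO_ROMAN` and `LAO_TO_ROMAN`, the helper names `romanize` and `transfer_embeddings`, and the toy vocabularies are all assumptions made for illustration, and the paper's actual romanization rule is not reproduced here. The sketch only shows the general pattern: romanize both vocabularies with a shared scheme, match entries whose romanized forms coincide, and initialize the matched Lao embeddings from the corresponding Thai embeddings.

```python
import numpy as np

# Toy character-level romanization tables. These are illustrative
# assumptions only, not the unified transcription rule from the paper.
THAI_TO_ROMAN = {"ก": "k", "ม": "m", "น": "n", "า": "a"}
LAO_TO_ROMAN = {"ກ": "k", "ມ": "m", "ນ": "n", "າ": "a"}


def romanize(word, table):
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in word)


def transfer_embeddings(thai_vocab, thai_emb, lao_vocab, rng):
    """Initialize Lao embeddings, copying Thai rows where romanizations match.

    Hypothetical helper: unmatched Lao entries keep a small random init.
    """
    dim = thai_emb.shape[1]

    # Index Thai entries by romanized form (first occurrence wins).
    roman_to_thai = {}
    for idx, word in enumerate(thai_vocab):
        roman_to_thai.setdefault(romanize(word, THAI_TO_ROMAN), idx)

    lao_emb = rng.normal(0.0, 0.02, size=(len(lao_vocab), dim))
    matched = []
    for i, word in enumerate(lao_vocab):
        j = roman_to_thai.get(romanize(word, LAO_TO_ROMAN))
        if j is not None:
            lao_emb[i] = thai_emb[j]  # transfer the parent-language embedding
            matched.append(word)
    return lao_emb, matched


# Toy usage: three Thai entries with made-up 4-dimensional embeddings.
thai_vocab = ["มา", "กา", "นา"]
thai_emb = np.arange(12, dtype=float).reshape(3, 4)
lao_vocab = ["ມາ", "ນາ", "ກມ"]  # the last entry has no Thai counterpart here

lao_emb, matched = transfer_embeddings(
    thai_vocab, thai_emb, lao_vocab, np.random.default_rng(0)
)
print(matched)
```

Exact string matching on the romanized form is the simplest possible choice; a real system would operate on subword units and would need to handle near matches and many-to-one collisions, which this sketch ignores.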