A two-phase domain adaptation method for neural machine translation
[Objective] To improve the transfer learning performance of neural machine translation (NMT) models, a data-centric domain adaptation method is explored. [Methods] Guided by quantitative analysis of two domain adaptation metrics, Kullback-Leibler divergence and maximum mean discrepancy, a two-phase decremental learning framework is proposed for scenarios with large-scale parallel sentences and small-scale domain texts. In the first phase, domain filtering, the domain texts are used to filter the parallel sentences, yielding domain parallel sentences; these are then used to train a domain NMT model. In the second phase, quality filtering, the domain parallel sentences obtained in the first phase are translated by the trained domain NMT model, the machine translations are compared with the manual translations, and low-quality parallel sentences are removed to obtain high-quality domain parallel sentences. Finally, an optimized domain NMT model is trained on these high-quality domain parallel sentences. [Results] Experiments on English-Chinese NMT adapted to the legal domain show that the proposed two-phase algorithm requires only about a quarter of the original training steps yet improves translation quality by more than two BLEU points. [Conclusion] These results demonstrate that the decremental learning framework achieves state-of-the-art performance at greatly reduced training time and space costs, enabling fast domain transfer of NMT models.
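The two-phase pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the smoothed unigram KL filter, the `keep_ratio` parameter, and the unigram-overlap quality proxy are all simplifying stand-ins (the paper also analyzes maximum mean discrepancy, and trains full NMT models between and after the two phases).

```python
from collections import Counter
import math

def unigram_dist(tokens, vocab, alpha=1.0):
    # Add-alpha smoothed unigram distribution over a fixed vocabulary.
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # KL(p || q); both distributions share the same (smoothed) support.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def domain_filter(parallel_pairs, domain_tokens, keep_ratio=0.5):
    """Phase 1 (domain filtering, illustrative): keep the parallel pairs
    whose source-side unigram distribution is closest (lowest KL) to the
    small-scale domain texts. `keep_ratio` is an assumed hyperparameter."""
    vocab = set(domain_tokens)
    for src, _ in parallel_pairs:
        vocab.update(src)
    q = unigram_dist(domain_tokens, vocab)
    scored = []
    for src, tgt in parallel_pairs:
        scored.append((kl_divergence(unigram_dist(src, vocab), q), src, tgt))
    scored.sort(key=lambda t: t[0])
    keep = max(1, int(len(scored) * keep_ratio))
    return [(src, tgt) for _, src, tgt in scored[:keep]]

def quality_filter(pairs, translate, min_overlap=0.3):
    """Phase 2 (quality filtering, illustrative): translate each kept
    source sentence with the domain NMT model (`translate` is a stand-in
    for that model) and drop pairs whose manual translation overlaps
    poorly with the machine output, a crude proxy for the paper's
    translation-quality comparison."""
    kept = []
    for src, ref in pairs:
        hyp = translate(src)
        overlap = len(set(hyp) & set(ref)) / max(len(set(ref)), 1)
        if overlap >= min_overlap:
            kept.append((src, ref))
    return kept
```

In this sketch, a pair such as `(["law", "court"], ...)` survives phase 1 against legal-domain texts while an out-of-domain pair like `(["dog", "cat"], ...)` is dropped; phase 2 then prunes the survivors whose references disagree with the domain model's own translations, so the final model is trained on a smaller, cleaner corpus, which is the source of the reduced training cost reported in the abstract.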