Neural machine translation (NMT) models are usually trained on bilingual data, yet building large-scale bilingual datasets is a major challenge. In contrast, large-scale monolingual datasets are easier to construct for most languages. Pre-trained models (PTMs) proposed in recent years can be trained on massive monolingual data, and the generic knowledge representations learned through pre-training help achieve significant performance gains on downstream tasks. Pre-trained neural machine translation (PTNMT) has been extensively validated on resource-constrained datasets, but how to efficiently utilize PTMs for high-resource NMT remains an open question. This paper reviews and analyzes the current state of PTNMT and its related problems, classifying PTNMT methods according to the PTM's pre-training methods, integration strategies, or specific tasks. We summarize the problems solved by existing PTNMT methods and conclude with an outlook on future PTNMT research.
Key words
natural language processing / pre-trained model / neural machine translation