Research on an Improved Transformer Method Based on Differentiated Learning
The neural machine translation model represented by the Transformer is a current research hotspot in the field of machine translation. The multi-head attention mechanism is an important component of the Transformer; its function is to enhance the model's ability to extract different kinds of information and to improve the model's generalization. However, some of the self-attention heads in the multi-head attention mechanism become ineffective. To address this problem, this paper proposes a Transformer improvement method based on differentiated learning, which improves the effectiveness of the self-attention heads by applying a novel differentiated learning method during Transformer training. Experimental results on several machine translation tasks show that, compared with the original Transformer, the improved Transformer based on the differentiated learning method achieves higher BLEU scores.
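To make the role of the multi-head attention mechanism referenced above concrete, the following is a minimal sketch of multi-head self-attention in plain NumPy. It is an illustration only, not the paper's method: the dimensions, random projection matrices, and function names are assumptions chosen for clarity, and a real Transformer would learn the per-head projections during training.

```python
# Minimal multi-head self-attention sketch (illustrative, not the paper's method).
# Each head projects the input into its own subspace, so different heads can
# attend to different information; the head outputs are concatenated at the end.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head gets its own query/key/value projections (random here;
        # learned parameters in an actual model).
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = softmax(Q @ K.T / np.sqrt(d_head))  # scaled dot-product attention
        heads.append(scores @ V)
    return np.concatenate(heads, axis=-1)            # concatenate head outputs

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 64))                     # 5 tokens, d_model = 64 (assumed)
out = multi_head_self_attention(x, num_heads=8, rng=rng)
print(out.shape)                                     # (5, 64)
```

In this sketch, a head whose projections collapse to near-useless outputs would contribute little to the concatenated result; the differentiated learning method proposed in the paper aims to keep the heads effective during training.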