Context Gate Based Multimodal Information Fusion for Image Description Translation
Image description translation translates image descriptions using image-modality information in an end-to-end system. Traditional image description translation uses salient features of the image to assist the translation of the source language. To capture the source language context, which affects the adequacy of the translation, together with the target language context, which affects its fluency, this paper proposes a multimodal information fusion decoding method based on a gating mechanism for image description translation. Our model uses context gates to dynamically adjust the contributions of the source and target language contexts to the translation results, improving both adequacy and fluency. Experiments show that the method improves the performance of image description translation by 1.3%, 1.0%, 1.5% and 1.4%, respectively, on the four En-De and En-Fr tasks of Multi30k-16 and Multi30k-17.
image description translation; multimodal machine translation; context gates; adequacy and fluency
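
The gating idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the gate's exact formulation, so everything below is an assumption for illustration: the function name `context_gate`, the parameters `W_s`, `W_t` and `b`, the vector size, and the choice of an element-wise sigmoid gate that interpolates between a source context vector (which could stand for fused text and image features) and a target context vector. It is not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(source_ctx, target_ctx, W_s, W_t, b):
    """Blend source and target context vectors with an element-wise gate.

    Hypothetical formulation (not taken from the paper):
        z     = sigmoid(W_s @ source_ctx + W_t @ target_ctx + b)
        fused = z * source_ctx + (1 - z) * target_ctx
    A gate value near 1 favors the source context (adequacy);
    a value near 0 favors the target context (fluency).
    """
    z = sigmoid(W_s @ source_ctx + W_t @ target_ctx + b)
    fused = z * source_ctx + (1.0 - z) * target_ctx
    return fused, z

# Toy example with random vectors; the dimension d is arbitrary.
rng = np.random.default_rng(0)
d = 4
source_ctx = rng.normal(size=d)   # assumed: source-side (text + image) context
target_ctx = rng.normal(size=d)   # assumed: target-side decoder context
W_s = rng.normal(scale=0.1, size=(d, d))
W_t = rng.normal(scale=0.1, size=(d, d))
b = np.zeros(d)

fused, z = context_gate(source_ctx, target_ctx, W_s, W_t, b)
```

Because the gate is computed per dimension and per decoding step, each element of the fused vector lies between the corresponding source and target values, which is one way to read "dynamically adjust the contributions" of the two contexts.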