Deep attention-based image caption model with fusion of global and semantic features
Existing image caption generation models face two limitations: global features are constrained by a fixed receptive field size, and object region-based image features lack background information. To address these problems, an image caption model (DFGS) is proposed. A multi-feature fusion module is designed to fuse global and semantic features, allowing the model to focus on both key objects and background information in the image. A deep attention-based decoding module is designed to align visual and textual features, enabling the generation of higher-quality image captions. Experimental results on the MSCOCO dataset show that the proposed model produces more accurate captions and is competitive with other advanced models.
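The fusion-then-attention pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the projection matrices, feature dimensions, and function names (`fuse_features`, `attend`) are all assumptions. It shows the general idea of projecting a global feature and a set of region-level semantic features into a common space, stacking them, and letting a decoder query attend over the fused set.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(global_feat, region_feats, W_g, W_r):
    """Project the global feature and the region (semantic) features
    into a common dimension and stack them, so the decoder can see
    both background context and object-level detail."""
    g = global_feat @ W_g              # (d_g,) -> (d,)
    r = region_feats @ W_r             # (k, d_r) -> (k, d)
    return np.vstack([g[None, :], r])  # (k+1, d) fused visual features

def attend(query, feats, W_q, W_k, W_v):
    """Single-head scaled dot-product attention: a decoder query
    (e.g. the current hidden state) aligns with the fused features."""
    q = query @ W_q
    K = feats @ W_k
    V = feats @ W_v
    scores = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V                  # attended visual context, shape (d,)

# Toy dimensions: one global vector, five region vectors.
rng = np.random.default_rng(0)
d_g, d_r, d, k = 16, 12, 8, 5
fused = fuse_features(rng.normal(size=d_g),
                      rng.normal(size=(k, d_r)),
                      rng.normal(size=(d_g, d)),
                      rng.normal(size=(d_r, d)))
ctx = attend(rng.normal(size=d), fused,
             rng.normal(size=(d, d)),
             rng.normal(size=(d, d)),
             rng.normal(size=(d, d)))
print(fused.shape, ctx.shape)  # (6, 8) (8,)
```

In a full decoder this attention step would run once per generated word, with the attended context concatenated to the word embedding before predicting the next token.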