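Image description is the task of generating a natural-language description of a given image. Most current work builds on the encoder-decoder structure, and an attention mechanism can be introduced to improve description accuracy. Building on the encoder-decoder architecture, the model used here introduces a new, improved attention mechanism based on AoA (Attention on Attention). It makes the attention mechanism lightweight while determining the relevance between the attention result and the query, which strengthens the correlation between the image and the words before the natural-language output is produced. Comparative experiments on the public datasets MSCOCO and Flickr30k show that, relative to the evaluation results of conventional attention models, the improved attention model used for image captioning speeds up convergence of the overall model, raises the relevant evaluation metrics, and improves model performance, showing a clear advantage.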
An Improved Algorithm for Image Description Generation Based on Convolutional Neural Networks
Image description is the generation of a natural-language description of a specified image. Most current research is based on the encoder-decoder structure, and an attention mechanism can also be introduced to improve the accuracy of the description. On the basis of the encoder-decoder architecture, this paper introduces a new, improved attention mechanism based on AoA (Attention on Attention), which makes the attention mechanism lightweight and determines the correlation between the attention result and the query, so as to enhance the correlation between the image and the words before the natural-language output is produced. On the public datasets MSCOCO and Flickr30k, compared with the evaluation results of conventional attention models, the improved attention model used for image description significantly improves the convergence rate of the overall model, raises the relevant evaluation metrics, and enhances model performance.
Keywords: Image description; Natural language processing; Attention mechanism; Convolutional neural network; Long short-term memory network
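The abstract describes AoA only at a high level: an attention result is computed, and its relevance to the query is then determined. As a rough illustration, the sketch below follows the commonly cited AoA formulation, in which the query and the attention result are concatenated, mapped to an "information" vector and a sigmoid "gate", and multiplied elementwise. It is a minimal sketch under assumptions, not the paper's lightweight variant, whose details are not given here. The function names, the parameter dictionary (`W_info`, `b_info`, `W_gate`, `b_gate`), and the random initialisation are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def sigmoid(x):
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attend(query, keys, values):
    """Scaled dot-product attention for one query over image-region features."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attention_on_attention(query, keys, values, params):
    """Gate the attention result by its relevance to the query (assumed AoA form)."""
    attended = attend(query, keys, values)
    qv = query + attended  # list concatenation: [query ; attention result]
    info = [a + b for a, b in zip(matvec(params["W_info"], qv), params["b_info"])]
    gate = [sigmoid(a + b)
            for a, b in zip(matvec(params["W_gate"], qv), params["b_gate"])]
    return [i * g for i, g in zip(info, gate)]


def init_params(d, seed=0):
    """Hypothetical small random parameters; a real model would learn these."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(d)]

    return {"W_info": mat(), "b_info": [0.0] * d,
            "W_gate": mat(), "b_gate": [0.0] * d}


if __name__ == "__main__":
    d = 4
    query = [0.1, -0.2, 0.3, 0.4]                      # e.g. a decoder state
    regions = [[1.0, 0.0, 0.0, 0.0],                   # e.g. encoder features
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]]
    print(attention_on_attention(query, regions, regions, init_params(d)))
```

When the gate saturates near 1, the output reduces to the information vector; when it is near 0, the attention result is suppressed. This is one way to read the abstract's "determining the correlation between the attention result and the query".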