To address inaccurate image caption generation caused by the complex attribute information, high inter-class similarity, and low correlation between semantic attributes and visual information in minority clothing images, a local attribute attention network for minority clothing image caption generation is proposed. First, a minority clothing image caption dataset is constructed, containing 55 categories, 30,000 images, and about 3,600 MB of data; in addition, 208 kinds of local key attribute vocabulary and 30,089 text annotations for minority clothing are defined. Visual features are extracted through a local attribute learning module together with text information embedding, and multi-instance learning is used to obtain local attributes. Then, an attention-aware module comprising semantic, visual, and gated attention is defined on top of a double-layer long short-term memory network, and the caption generation results for minority clothing are optimized by combining the local attributes, attribute-based visual features, and text encoding information. Experimental results on the constructed minority clothing image caption dataset show that the proposed method generates captions covering key attributes such as minority category and clothing style, and improves the accuracy metric BLEU by 1.4% and the semantic richness metric CIDEr by 2.2% compared with existing methods.
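The abstract states that local attributes are obtained from region-level visual features via multi-instance learning. As one common MIL formulation (a minimal NumPy sketch under assumed names, not the paper's actual implementation), per-region attribute scores can be aggregated into image-level attribute probabilities by max pooling, so an attribute is deemed present if at least one local region strongly exhibits it:

```python
import numpy as np

def mil_attribute_scores(region_scores: np.ndarray) -> np.ndarray:
    """Aggregate per-region attribute scores into image-level
    probabilities via multi-instance max pooling: an attribute is
    present if at least one region strongly exhibits it.

    region_scores: shape (num_regions, num_attributes), values in [0, 1].
    Returns: shape (num_attributes,).
    """
    return region_scores.max(axis=0)

# Hypothetical example: 3 image regions scored against 4 local attributes
scores = np.array([
    [0.1, 0.9, 0.2, 0.00],  # region 1: strong evidence for attribute 2
    [0.8, 0.1, 0.3, 0.10],  # region 2: strong evidence for attribute 1
    [0.2, 0.2, 0.1, 0.05],  # region 3: weak evidence everywhere
])
print(mil_attribute_scores(scores))  # → [0.8 0.9 0.3 0.1]
```

Other MIL aggregators (noisy-OR, mean pooling, attention-weighted pooling) are drop-in replacements for the `max` here; which one the proposed network uses is not specified in the abstract.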
Key words
minority clothing image/image caption generation/text information embedding/local attribute learning/attention-aware