To address inaccurate image caption generation caused by the complex attribute information, high inter-class similarity, and low correlation between semantic attributes and visual information in minority clothing images, a local attribute attention network for minority clothing image caption generation is proposed. First, a minority clothing image caption dataset is constructed, containing 55 categories, 30,000 images, and about 3,600 MB of data; in addition, 208 kinds of local key attribute vocabulary and 30,089 text annotations for minority clothing are defined. Visual features are extracted through a local attribute learning module together with text information embedding, and multi-instance learning is used to obtain local attributes. Then, an attention-aware module comprising semantic, visual, and gated attention is defined on top of a double-layer long short-term memory network, and the caption generation results for minority clothing are optimized by combining the local attributes, attribute-based visual features, and text encoding information. Experimental results on the constructed minority clothing image caption dataset show that the proposed method generates captions covering key attributes such as minority category and clothing style, and improves the accuracy metric BLEU by 1.4% and the semantic richness metric CIDEr by 2.2% compared with existing methods.
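The abstract states that local attributes are obtained from region-level visual features via multi-instance learning. As one common MIL formulation (a minimal NumPy sketch under assumed names, not the paper's actual implementation), per-region attribute scores can be aggregated into image-level attribute probabilities by max pooling, so an attribute is deemed present if at least one local region strongly exhibits it:

```python
import numpy as np

def mil_attribute_scores(region_scores: np.ndarray) -> np.ndarray:
    """Aggregate per-region attribute scores into image-level
    probabilities via multi-instance max pooling: an attribute is
    present if at least one region strongly exhibits it.

    region_scores: shape (num_regions, num_attributes), values in [0, 1].
    Returns: shape (num_attributes,).
    """
    return region_scores.max(axis=0)

# Hypothetical example: 3 image regions scored against 4 local attributes
scores = np.array([
    [0.1, 0.9, 0.2, 0.00],  # region 1: strong evidence for attribute 2
    [0.8, 0.1, 0.3, 0.10],  # region 2: strong evidence for attribute 1
    [0.2, 0.2, 0.1, 0.05],  # region 3: weak evidence everywhere
])
print(mil_attribute_scores(scores))  # → [0.8 0.9 0.3 0.1]
```

Other MIL aggregators (noisy-OR, mean pooling, attention-weighted pooling) are drop-in replacements for the `max` here; which one the proposed network uses is not specified in the abstract.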
Key words
minority clothing image/image caption generation/text information embedding/local attribute learning/attention-aware