Text-Information-Guided Attention Mechanism for Fine-Grained Image Classification

Scene text in natural images carries explicit semantic information and can provide important cues for solving the corresponding computer vision problems. This paper focuses on using multimodal content, in the form of visual and textual cues, to solve fine-grained image classification and retrieval tasks. Specifically, it employs a graph convolutional network (GCN) to perform multimodal reasoning and obtains relation-enhanced features by learning a common semantic space between the salient objects and the text found in an image. With the resulting set of enhanced visual and textual features, the proposed model outperforms existing methods by a large margin on two different tasks: fine-grained classification and image retrieval based on contextual text.
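The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: visual and textual features are projected into a common space, treated as graph nodes, and passed through one graph-convolution step to produce relation-enhanced features. Everything here is an assumption made for illustration, not the paper's method: the function names, the feature dimensions, the random weights, and in particular the choice of a softmax over cosine similarities as the adjacency matrix.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def l2_normalize(rows):
    """Scale every row to unit length (all-zero rows are left unchanged)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r)) or 1.0
        out.append([v / norm for v in r])
    return out


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(n_rows, n_cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(n_cols)] for _ in range(n_rows)]


def gcn_layer(nodes, weight):
    """One graph-convolution step: ReLU(A @ H @ W).

    Assumption: A is a row-wise softmax over pairwise cosine similarities,
    so every node attends to every other node (and to itself).
    """
    normed = l2_normalize(nodes)
    similarity = matmul(normed, transpose(normed))
    adjacency = [softmax(row) for row in similarity]
    aggregated = matmul(adjacency, nodes)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected], adjacency


def relation_enhanced_features(visual, textual, w_vis, w_txt, w_gcn):
    """Project both modalities into a common space, then run one GCN step."""
    nodes = matmul(visual, w_vis) + matmul(textual, w_txt)  # stack node lists
    return gcn_layer(nodes, w_gcn)


# Toy input: 3 salient-object features (dim 8) and 2 scene-text features (dim 5).
rng = random.Random(0)
visual = random_matrix(3, 8, rng, scale=1.0)
textual = random_matrix(2, 5, rng, scale=1.0)
w_vis = random_matrix(8, 4, rng)   # hypothetical visual projection
w_txt = random_matrix(5, 4, rng)   # hypothetical textual projection
w_gcn = random_matrix(4, 4, rng)   # hypothetical GCN weight

enhanced, adj = relation_enhanced_features(visual, textual, w_vis, w_txt, w_gcn)
print(len(enhanced), len(enhanced[0]))
```

In this sketch each of the five nodes ends up with a feature vector that mixes information from both modalities, weighted by the adjacency matrix. A real system would learn the projection and GCN weights end to end and would typically use a tensor library rather than plain lists.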

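Keywords: fine-grained image analysis; multimodal reasoning; graph convolutional network (GCN)

Pan Heng (潘恒)

School of Energy and Power Engineering, Jiangsu University of Science and Technology, Zhenjiang 212114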

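Computer & Digital Engineering (计算机与数字工程)
No. 709 Research Institute, China Shipbuilding Industry Corporation (中国船舶重工集团公司第七0九研究所)

CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(8)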