To address the low retrieval accuracy caused by the inability of traditional unsupervised cross-modal retrieval algorithms to fully extract the correlated semantics between samples, an unsupervised cross-modal hash retrieval algorithm based on CLIP and an attention fusion mechanism, CAFM_Net, was proposed. The multimodal pre-trained model CLIP was applied in the sample feature extraction stage to mine similar information from different dimensions of the data. An attention fusion mechanism was used to process the extracted features and to increase the weight of salient regions. The idea of adversarial learning was introduced to design a modal classifier, generating cross-modal hash codes that tend toward semantic consistency. Compared with existing representative hashing methods, CAFM_Net improves accuracy by at least 11% and 9% on multimodal retrieval tasks.
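The following is a minimal sketch of how the described pipeline could be assembled: CLIP features pass through an attention fusion module, a hash head produces relaxed binary codes, and an adversarial modal classifier pushes image and text codes toward semantic consistency. All module designs, layer sizes, and names (AttentionFusion, HashHead, ModalClassifier) are illustrative assumptions; the abstract does not specify the actual architecture of CAFM_Net.

```python
# A hedged sketch of the CAFM_Net pipeline, assuming PyTorch and 512-d CLIP features.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Re-weights salient feature dimensions (assumed gating-style design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.attn(x)  # element-wise attention weights enhance salient regions

class HashHead(nn.Module):
    """Maps fused features to K-bit codes via a tanh relaxation of sign()."""
    def __init__(self, dim: int, bits: int):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(x))  # binarize with torch.sign() at retrieval time

class ModalClassifier(nn.Module):
    """Adversarial discriminator predicting the modality of a hash code;
    training the encoders to fool it encourages modality-invariant codes."""
    def __init__(self, bits: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(bits, bits // 2), nn.ReLU(),
                                nn.Linear(bits // 2, 2))  # 2 classes: image vs. text

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc(h)

# Usage with stand-ins for CLIP embeddings (e.g. 512-d from a ViT-B/32 backbone):
dim, bits = 512, 64
fusion, head, disc = AttentionFusion(dim), HashHead(dim, bits), ModalClassifier(bits)
img_feat = torch.randn(8, dim)  # placeholder for CLIP image features
txt_feat = torch.randn(8, dim)  # placeholder for CLIP text features
h_img, h_txt = head(fusion(img_feat)), head(fusion(txt_feat))
modal_logits = disc(torch.cat([h_img, h_txt]))  # adversarial modality prediction
```

In this reading, the hash encoders minimize a similarity-preserving loss while maximizing the modal classifier's error, so that codes from both modalities become indistinguishable and semantically aligned; the exact losses used by CAFM_Net are not given in the abstract.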