Computer Engineering and Design, 2024, Vol. 45, Issue 3: 852-858. DOI: 10.16208/j.issn1000-7024.2024.03.029

Cross-modal hash retrieval algorithm based on CLIP and attention mechanism


党张敏¹ 喻崇仁¹ 殷双飞¹ 张宏娟² 陕振¹ 马连志¹

Author information

  • 1. Institute 706, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854
  • 2. Military Representative Office, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854

Abstract

To address the low retrieval accuracy of traditional unsupervised cross-modal retrieval algorithms, which fail to fully extract the correlated semantics within and between samples, an unsupervised cross-modal hash retrieval algorithm based on CLIP and an attention fusion mechanism, CAFM_Net, was proposed. The multimodal pre-trained model CLIP was applied at the sample feature-extraction stage to mine similarity information from different dimensions of the data. An attention fusion mechanism was used to process the extracted features and strengthen the weights of salient regions. The idea of adversarial learning was introduced to design a modality classifier, generating cross-modal hash codes that tend toward semantic consistency. Compared with existing representative hashing methods, CAFM_Net improves retrieval accuracy by at least 11% and 9% on multimodal retrieval tasks.
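The abstract describes a pipeline of CLIP feature extraction, attention-based re-weighting of salient components, and binary hash codes compared for retrieval. As a rough illustration of the hashing-and-retrieval idea only — not the paper's actual CAFM_Net architecture, whose attention fusion and adversarial modality classifier are learned end to end — here is a minimal NumPy sketch. The toy attention weighting, the random projection `W`, and the simulated CLIP-like paired embeddings are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 512, 64          # feature dim (512 matches CLIP ViT-B/32), hash length


def attention_weight(feats):
    """Toy attention-style re-weighting: softmax over per-dimension
    magnitudes, emphasizing high-magnitude ("salient") components."""
    scores = np.abs(feats)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return feats * w * feats.shape[-1]   # rescale to keep magnitudes comparable


W = rng.standard_normal((D, K)) / np.sqrt(D)   # shared random hash projection


def hash_code(feats):
    """Project attention-weighted features to K bits via sign()."""
    return np.sign(attention_weight(feats) @ W).astype(np.int8)


def hamming(a, b):
    """Hamming distance between two {-1, +1} hash codes."""
    return int(np.sum(a != b))


# Simulated paired image/text embeddings: text = image + small noise,
# standing in for the cross-modal alignment that CLIP provides.
imgs = rng.standard_normal((100, D))
txts = imgs + 0.1 * rng.standard_normal((100, D))

img_codes = np.array([hash_code(v) for v in imgs])
txt_codes = np.array([hash_code(v) for v in txts])

# Text-to-image retrieval: nearest image code in Hamming distance.
query = txt_codes[0]
dists = [hamming(query, c) for c in img_codes]
best = int(np.argmin(dists))
```

Because the paired embeddings differ only by small noise, matched image-text pairs land on nearby hash codes, while unrelated pairs sit near the expected random distance of K/2 bits; retrieval then reduces to fast Hamming-distance comparisons, which is the practical appeal of hashing-based cross-modal search.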

Key words

unsupervised hashing / cross-modal retrieval / CLIP / attention fusion / adversarial learning / deep learning / Transformer


Publication year: 2024
Journal: Computer Engineering and Design (计算机工程与设计)
Sponsor: Institute 706, Second Academy of China Aerospace Science and Industry Corporation
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.617
ISSN: 1000-7024
References: 16