VisFEM: A Dual-View Visual Feature Extraction Model Based on Cross Attention
When attention-based models are applied to computer vision tasks, the global feature extraction ability of the attention mechanism is weak. To address this, a cross-attention-based dual-view visual feature extraction model, VisFEM, is proposed. The model adopts an encoder-decoder architecture, extracts coarse-grained and fine-grained features from two views through a cross attention mechanism, and fuses the output features of the different encoders to improve the model's global feature extraction ability. On the ImageNet high-definition classification dataset, the model reaches 84.3% accuracy, and on the retrieval task its recall reaches 0.39.
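The core idea described above can be sketched with plain NumPy: queries from one view attend over keys/values from the other view, and the two cross-attended outputs are fused. This is a minimal illustration under assumed shapes and a simple mean-pooling fusion, not the paper's actual implementation; all function names and the fusion step are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Scaled dot-product cross attention: queries come from one view,
    keys and values from the other view."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)           # rows sum to 1
    return weights @ kv_feats                    # (Nq, d) attended output

# Toy dual-view token features: coarse-grained (e.g. large patches)
# and fine-grained (e.g. small patches); shapes are illustrative.
rng = np.random.default_rng(0)
coarse = rng.standard_normal((4, 8))    # 4 coarse tokens, dim 8
fine = rng.standard_normal((16, 8))     # 16 fine tokens, dim 8

# Each view attends to the other; the outputs are then fused by
# averaging pooled representations (a simple stand-in for the
# encoder-output fusion the abstract describes).
coarse_out = cross_attention(coarse, fine)   # (4, 8)
fine_out = cross_attention(fine, coarse)     # (16, 8)
fused = 0.5 * (coarse_out.mean(axis=0) + fine_out.mean(axis=0))  # (8,)
```

Pooling and averaging is only one possible fusion; concatenation followed by a learned projection is an equally common choice.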
Keywords: deep learning; computer vision; encoder-decoder; cross attention mechanism