Fine-grained image classification with an attention mechanism and multi-scale feature fusion (注意力机制和多尺度特征融合的细粒度图像分类)

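Abstract (translated from the Chinese): Fine-grained image classification is easily disturbed by background clutter, localizes key regions inaccurately, and tends to rely on models with many parameters. To address these problems, the authors propose AM-Net (networks of combine attention mechanisms and multi-scale features), a classification network that combines an attention mechanism with multi-scale feature fusion. First, starting from the YOLOv7 network, a lightweight backbone is rebuilt from Ghost BottleNeck modules, and the Conv layers in the neck network are replaced with GhostConv, which makes the model lightweight. Second, the parameter-free SimAM attention mechanism is introduced. It infers three-dimensional attention weights for the feature map by considering correlations across the spatial and channel dimensions, which highlights locally salient features, suppresses useless features, and makes the target-region information more effective. Finally, a pyramid pooling module with feature selection (fast spatial pyramid pooling with feature selection and convolutions, SPPFC) is built to help the network capture and process the multi-scale features of the target and to improve its perceptual ability. In experiments on the Stanford Dogs dataset, AM-Net reaches an accuracy of 88.9%, a precision of 83.6%, a recall of 85.7%, and an F1 score of 84.6%, with a model parameter size of 26.53 MB and a speed of 89.3 frames per second. On the Stanford Cars dataset, its accuracy, precision, and recall reach 95.2%, 93.7%, and 94.9%, respectively. The results show that AM-Net improves fine-grained classification accuracy while keeping the network lightweight, and that it clearly outperforms the other network models compared.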
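The parameter-free attention step described in the abstract can be illustrated with a minimal pure-Python sketch. It follows the commonly published SimAM formulation (per-channel mean and variance over the spatial positions, then a sigmoid gate), not the AM-Net source code, which this record does not include. The function names `simam_channel` and `simam`, the nested-list feature-map layout, and the default `e_lambda=1e-4` are assumptions made for illustration.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Weight one H x W channel (list of rows) with SimAM-style attention.

    Assumption: follows the commonly published SimAM formulation; this is
    not taken from the AM-Net implementation.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    if n < 1:
        raise ValueError("channel needs at least two spatial positions")
    mu = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: positions that stand out from the mean score higher.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate
            out_row.append(v * weight)
        out.append(out_row)
    return out


def simam(feature_map, e_lambda=1e-4):
    """Apply the weighting independently to each channel of a C x H x W map."""
    return [simam_channel(channel, e_lambda) for channel in feature_map]


if __name__ == "__main__":
    fmap = [[[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]]
    weighted = simam(fmap)
    print("centre weight:", weighted[0][1][1] / 5.0)
    print("corner weight:", weighted[0][0][0] / 1.0)
```

No learnable parameters appear anywhere in the sketch: every weight is computed from the statistics of the feature map itself, which is why this kind of attention adds nothing to the model size.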

English title: Attentional mechanisms and multiscale feature fusion for fine-grained image classification

English abstract: Fine-grained image classification is susceptible to background interference, inaccurate localization of key regions, and large model parameter counts. To address these problems, we propose AM-Net, a classification network that combines an attention mechanism with multi-scale feature fusion. First, taking YOLOv7 as the base network, the backbone is rebuilt as a lightweight network from Ghost BottleNeck modules, and the Conv layers in the neck network are replaced with GhostConv to make the model lightweight. Second, a parameter-free SimAM attention mechanism is introduced. It infers 3D attention weights for the feature map by considering the correlation between the spatial and channel dimensions, which characterizes locally salient features, suppresses useless features, and improves the effectiveness of the target-region information. Finally, a feature-selectable pyramid pooling module (SPPFC) is built to help the network capture and process the multi-scale features of the target and improve its perceptual ability. On the Stanford Dogs dataset, AM-Net reaches 88.9% accuracy, 83.6% precision, 85.7% recall, and an F1 score of 84.6%, with a model parameter size of 26.53 MB and a speed of 89.3 frames per second. On the Stanford Cars dataset, it reaches 95.2% accuracy, 93.7% precision, and 94.9% recall. The experimental results show that AM-Net improves the classification accuracy of fine-grained images while reducing the weight of the network, markedly outperforming the other network models compared.
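As a quick consistency check on the Stanford Dogs figures, the reported F1 score can be recomputed from the reported precision and recall. The sketch below assumes F1 is the harmonic mean of the two reported values. For multi-class tasks, papers often average per-class F1 scores instead, in which case the numbers need not agree exactly; the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Stanford Dogs, in percent.
reported = {"precision": 83.6, "recall": 85.7, "f1": 84.6}

f1 = f1_from_precision_recall(reported["precision"], reported["recall"])
print(f"F1 recomputed: {f1:.1f}% (reported: {reported['f1']}%)")
# prints "F1 recomputed: 84.6% (reported: 84.6%)"
```

Under that assumption the recomputed value matches the reported 84.6% to one decimal place.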

English keywords:

artificial intelligence; fine-grained classification; feature extraction; attention mechanisms; multi-scale feature fusion

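Authors:

Li Yunhong (李云红), Guo Yue (郭越), Xie Rongrong (谢蓉蓉), Zhang Leitao (张蕾涛), Su Xueping (苏雪平), Li Limin (李丽敏), Chen Jinni (陈锦妮)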

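Affiliations:

School of Electronics and Information, Xi'an Polytechnic University (西安工程大学电子信息学院), Xi'an 710048

School of Life Science, Shanxi University (山西大学生命科学学院), Taiyuan 030031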

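Keywords (Chinese):

人工智能 (artificial intelligence); 细粒度分类 (fine-grained classification); 特征提取 (feature extraction); 注意力机制 (attention mechanism); 多尺度特征融合 (multi-scale feature fusion)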

Year of publication:

2024

DOI:

10.3969/j.issn.1674-8425(z).2024.12.019

Journal of Chongqing University of Technology (重庆理工大学学报)

Chongqing University of Technology (重庆理工大学)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.567

ISSN: 1674-8425

Year, volume (issue): 2024, 38(23)