Fine-Grained Image Classification Based on Feature Fusion and Ensemble Learning

Abstract: Fine-grained image classification aims to accurately recognize subcategories within a given superclass. It faces three challenges: large intra-class differences, small inter-class differences, and limited training samples. Most current methods build on the Vision Transformer to improve classification performance, but three problems remain: ignoring the complementary information carried by classification tokens at different layers leads to incomplete global feature extraction; inconsistent behavior across the heads of the multi-head self-attention mechanism leads to inaccurate part localization; and limited training samples make the model prone to overfitting. To address these problems, this study proposes a fine-grained image classification network based on feature fusion and ensemble learning. The network consists of three modules: a multi-level feature fusion module that fuses complementary information to obtain more complete global features; a multi-expert part voting module that locates part tokens by ensemble-learning-based voting to strengthen part feature representation; and an attention-guided mixup augmentation module that alleviates overfitting and improves classification accuracy. The proposed network achieves classification accuracies of 91.92%, 93.10%, 90.98%, and 76.21% on the CUB-200-2011, Stanford Dogs, NABirds, and IP102 datasets, respectively, which are 1.42, 1.50, 1.08, and 2.81 percentage points higher than the original Vision Transformer model, and it outperforms the other fine-grained image classification models it is compared with.
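The abstract reports the accuracy of the proposed network and its gain over the original Vision Transformer, but not the baseline accuracies themselves. As a small worked example, the sketch below recovers the implied baselines by subtraction. The dictionary layout and the helper name `implied_baseline` are illustrative choices, not anything from the paper, and the result is only as precise as the rounded figures quoted in the abstract.

```python
from decimal import Decimal

# Reported accuracy (%) and gain over the original Vision Transformer
# (percentage points), copied from the abstract.
reported = {
    "CUB-200-2011": ("91.92", "1.42"),
    "Stanford Dogs": ("93.10", "1.50"),
    "NABirds": ("90.98", "1.08"),
    "IP102": ("76.21", "2.81"),
}


def implied_baseline(accuracy: str, gain: str) -> Decimal:
    """Return accuracy minus gain using exact decimal arithmetic."""
    return Decimal(accuracy) - Decimal(gain)


# Hypothetical helper output: implied Vision Transformer accuracy per dataset.
baselines = {
    name: implied_baseline(accuracy, gain)
    for name, (accuracy, gain) in reported.items()
}

for name, value in baselines.items():
    print(f"{name}: implied baseline {value}%")
```

`Decimal` is used instead of `float` so that the two-decimal figures subtract exactly, without binary rounding noise.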

Keywords:

fine-grained image classification; Vision Transformer; feature fusion; ensemble learning; mixup augmentation

Authors:

Zhang Wenli (张文丽), Song Wei (宋威)

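Author affiliations:

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu

Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence (Jiangnan University), Wuxi 214122, Jiangsu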

Year of publication:

2024

DOI:

10.3788/LOP240759

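Laser & Optoelectronics Progress

Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences

CSTPCD; Peking University Core Journals

Impact factor: 1.153

ISSN: 1006-4125

Year, volume (issue): 2024, 61(22)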