Global Cross-layer Interaction Networks for Learning Fine-grained Image Feature Representations

张高义¹, 徐杨², 曹斌², 石进¹

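Author information

1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, Guizhou
2. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, Guizhou; Guiyang Aluminum Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, Guizhou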
Abstract

The key task in fine-grained image classification is to extract highly discriminative features. Previous models often rely on bilinear pooling and its variants to address this problem. However, most bilinear pooling methods and their variants ignore intra-layer or inter-layer feature interactions, and such insufficient interaction easily causes discriminative information to be lost or to carry too much redundant information. To address these problems, a new method for learning fine-grained image features and feature representations, the Global Cross-layer Interaction (GCI) network, is designed. The proposed hierarchical bicubic pooling method balances the extraction of discriminative information against the filtering of redundant information, and models intra-layer and inter-layer feature interactions simultaneously. Further analysis of the inter-layer interaction computation structure shows that it can readily be combined with existing channel attention mechanisms to form an interactive attention mechanism, which improves the backbone network's ability to extract key features. Finally, the feature extraction network built from the interactive attention mechanism is fused with the bicubic pooling method to obtain GCI, which extracts robust fine-grained image feature representations. Experiments on three fine-grained benchmark datasets show that hierarchical bicubic pooling achieves the best results within the hierarchical interaction pooling framework, reaching classification accuracies of 87.4%, 93.2%, and 92.1% on CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, respectively; after the interactive attention mechanism is integrated, the accuracies rise further to 88.5%, 95.1%, and 93.9%.
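For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of standard bilinear pooling, applied once within a single layer and once across two layers. It is not the paper's hierarchical bicubic pooling or its interactive attention mechanism, whose details the abstract does not give. The function name `bilinear_pool`, the feature-map shapes, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np


def bilinear_pool(x, y):
    """Bilinear pooling of two feature maps with the same spatial size.

    x: array of shape (C1, H, W); y: array of shape (C2, H, W).
    Returns an L2-normalized vector of length C1 * C2.
    """
    c1, h, w = x.shape
    c2 = y.shape[0]
    xf = x.reshape(c1, h * w)
    yf = y.reshape(c2, h * w)
    # Average, over all spatial locations, of the outer product of the
    # two channel vectors at that location.
    z = (xf @ yf.T / (h * w)).reshape(-1)
    # Signed square root followed by L2 normalization, a common
    # post-processing step for bilinear features.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)


rng = np.random.default_rng(0)
layer_a = rng.standard_normal((4, 7, 7))  # hypothetical features from one layer
layer_b = rng.standard_normal((6, 7, 7))  # hypothetical features from another layer

intra = bilinear_pool(layer_a, layer_a)   # intra-layer interaction, length 4 * 4
inter = bilinear_pool(layer_a, layer_b)   # inter-layer interaction, length 4 * 6
descriptor = np.concatenate([intra, inter])
```

Concatenating the two vectors is only one simple way to keep both kinds of interaction; how GCI combines and filters them is described in the full paper rather than in this abstract.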

Key words

fine-grained image recognition/global cross-layer interaction network/hierarchical bicubic pooling/intra- and inter-layer feature interactions/interactive attention mechanism

Funding

Guizhou Provincial Science and Technology Program (黔合科支撑[2021]一般176)

Publication year

2024

计算机与现代化 (Computer and Modernization)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475

Number of references: 28