
Cross-modal dual-stream alternating interactive network for infrared-visible image classification
When multiple feature modalities are fused, noise from the individual modalities is superimposed, and the cascaded structures used to reduce inter-modal differences do not fully exploit the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch manner; by learning the mapping relationships between modalities and applying bidirectional feedback adjustment along InfRared-VISible-InfRared (IR-VIS-IR) and VISible-InfRared-VISible (VIS-IR-VIS) paths, cross suppression of inter-modal noise was achieved. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, in which a residual structure was introduced to fuse low-level and high-level features within and between the infrared and visible modalities, thereby reducing inter-modal differences and making full use of inter-modal feature information. Finally, experiments were conducted on a self-constructed infrared-visible multi-modal typhoon dataset and the public RGB-NIR multi-modal scene dataset to verify the effectiveness of the DAE and CMFI modules. Experimental results show that, compared with the simple cascading fusion method, the proposed DAINet-based feature fusion method improves the overall classification accuracy on the self-constructed typhoon dataset by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and increases the G-mean values by 6.24 and 2.48 percentage points, respectively, demonstrating the generalizability of the proposed method for class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method improves the overall classification accuracy by 13.47 and 13.90 percentage points for the two test modalities, respectively. Moreover, comparison experiments with IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse on the two datasets show that the proposed method improves the overall classification accuracy for the two test modalities on the self-constructed typhoon dataset by 9.82 and 6.02, and by 17.38 and 1.68 percentage points, respectively.
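
The DAE and CMFI modules are only characterized at a high level in the abstract. The following is a minimal PyTorch-style sketch of how such blocks could be organized, assuming convolutional feature maps for the two modalities; all layer choices, channel sizes, and the exact feedback and fusion formulas are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of DAE- and CMFI-style blocks as described in the abstract.
# Layer choices, channel sizes and fusion formulas are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualStreamAlternatingEnhancement(nn.Module):
    """DAE-style block: each modality is mapped into the other and back
    (IR->VIS->IR, VIS->IR->VIS), and the round-trip features are used as
    feedback to suppress modality-specific noise."""

    def __init__(self, channels: int):
        super().__init__()
        # assumed cross-modal mapping functions
        self.ir_to_vis = nn.Conv2d(channels, channels, 3, padding=1)
        self.vis_to_ir = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse_ir = nn.Conv2d(2 * channels, channels, 1)
        self.fuse_vis = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_ir: torch.Tensor, f_vis: torch.Tensor):
        # IR -> VIS -> IR feedback path
        ir_back = self.vis_to_ir(self.ir_to_vis(f_ir))
        # VIS -> IR -> VIS feedback path
        vis_back = self.ir_to_vis(self.vis_to_ir(f_vis))
        # use the round-trip features as a correction signal (assumed residual fusion)
        f_ir_out = f_ir + self.fuse_ir(torch.cat([f_ir, ir_back], dim=1))
        f_vis_out = f_vis + self.fuse_vis(torch.cat([f_vis, vis_back], dim=1))
        return f_ir_out, f_vis_out


class CrossModalFeatureInteraction(nn.Module):
    """CMFI-style block: low-level and high-level features of both modalities
    are fused through a residual connection."""

    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * low_channels + 2 * high_channels, high_channels, 1)

    def forward(self, low_ir, low_vis, high_ir, high_vis):
        # align low-level maps to the high-level spatial resolution
        size = high_ir.shape[-2:]
        low_ir = F.adaptive_avg_pool2d(low_ir, size)
        low_vis = F.adaptive_avg_pool2d(low_vis, size)
        fused = self.proj(torch.cat([low_ir, low_vis, high_ir, high_vis], dim=1))
        # residual fusion: keep the high-level infrared and visible content
        return fused + high_ir + high_vis
```

In this reading, the IR-VIS-IR and VIS-IR-VIS round trips act as a consistency signal: feature components that do not survive the cross-modal round trip are treated as modality-specific noise and corrected through the residual fusion, while CMFI combines shallow and deep cues from both modalities before classification.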

cross-modal; deep learning; image classification; feature learning; dual-stream network

郑宗生、杜嘉、成雨荷、赵泽骋、张月维、王绪龙


College of Information, Shanghai Ocean University, Shanghai 201306

Guangzhou Meteorological Satellite Ground Station, Guangzhou 510650

Shandong Provincial Institute of Territorial Spatial Data and Remote Sensing Technology (Shandong Provincial Sea Area Dynamic Surveillance and Monitoring Center), Jinan 250014


2025

Journal of Computer Applications
Chengdu Institute of Computer Applications, Chinese Academy of Sciences


Peking University core journal (北大核心)
Impact factor: 0.892
ISSN: 1001-9081
Year, Volume (Issue): 2025, 45(1)