Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment (基于多模态特征与增强对齐的细粒度图像分类)

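Abstract (translated from Chinese): Existing models extract features inadequately and let modalities interact insufficiently when processing multimodal information. To address these problems, a fine-grained image classification model based on multimodal feature enhancement and alignment is proposed. First, a hierarchical feature adaptive fusion module is proposed to fuse multimodal features adaptively at multiple levels, making full use of feature information from intermediate convolutional layers and strengthening the model's perception of local image details. Second, to raise the interaction dimension between multimodal features, an enhanced aligned feature fusion module is proposed to fully mine the mapping relationships between modalities. Experimental results show that the proposed model achieves good recognition performance on multiple datasets and outperforms previous multimodal feature fusion models. Ablation results also show that each of the two modules, used alone, outperforms the original model, which further verifies the effectiveness of the proposed model.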

English title: Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment

English abstract: To address the limitations of existing models in multimodal information processing, such as inadequate feature extraction and insufficient information interaction, a fine-grained image classification model incorporating multimodal features and enhanced alignment is proposed. A hierarchical feature adaptive fusion module is proposed to achieve multi-level adaptive fusion of multimodal features, fully utilizing the feature information of intermediate convolutional layers and enhancing the model's ability to perceive local image details. Additionally, an enhanced aligned feature fusion module is proposed to improve the interaction dimension between multimodal features and to make full use of the mapping relationships between modalities. Experimental results show that the proposed model achieves excellent recognition performance on several public datasets, outperforming previous multimodal feature fusion models. Furthermore, ablation experiments show that each module used on its own outperforms the original model, which supports the effectiveness of the proposed model.

English keywords:

deep learning; fine-grained image classification; multimodal; adaptive feature fusion; attention mechanism

Authors:

Han Jing (韩晶), Zhang Tianpeng (张天鹏), Lü Xueqiang (吕学强)

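Affiliation:

Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China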

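Keywords (translated from Chinese):

deep learning; fine-grained image classification; multimodal; adaptive feature fusion; attention mechanism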

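Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation; General Science and Technology Project of the Beijing Municipal Education Commission Research Program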

Grant numbers:

62171043; 4232025; KM202311232003

Publication year:

2024

DOI:

10.13190/j.jbupt.2023-140

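Journal: Journal of Beijing University of Posts and Telecommunications (北京邮电大学学报)

Publisher: Beijing University of Posts and Telecommunications (北京邮电大学)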

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.592

ISSN: 1007-5321

Year, volume (issue): 2024, 47(4)