Parameter-Efficient Fine-Tuning of Vision Transformers for Remote Sensing Scene Understanding
With the rapid development of deep learning and computer vision, fine-tuning pre-trained models for remote sensing tasks often demands substantial computational resources. To reduce memory requirements and training costs, this paper proposes a fine-tuning method for remote sensing models called the Multi-Fusion Adapter (MuFA). MuFA introduces a fusion module that combines bottleneck modules with different downsampling rates and connects them in parallel with the original Vision Transformer. During training, the parameters of the original Vision Transformer are frozen; only the MuFA modules and the classification head are fine-tuned. Experimental results demonstrate that MuFA achieves superior performance on the UCM and NWPU-RESISC45 remote sensing scene classification datasets, surpassing other parameter-efficient fine-tuning methods. MuFA thus maintains model performance while reducing resource overhead, making it highly promising for a wide range of remote sensing applications.
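As a minimal illustrative sketch (not the authors' implementation), the parallel-bottleneck design described above could be realized in PyTorch as follows. The reduction rates (4, 8, 16), the softmax-weighted fusion, the zero initialization of the up-projections, and the timm ViT backbone are all assumptions for illustration; the abstract does not specify these details.

```python
import timm
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """One bottleneck branch: down-project, nonlinearity, up-project."""
    def __init__(self, dim: int, reduction: int):
        super().__init__()
        hidden = max(1, dim // reduction)  # larger reduction = stronger downsampling
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        nn.init.zeros_(self.up.weight)  # zero-init so each branch starts as a no-op
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))

class MuFAAdapter(nn.Module):
    """Fuses parallel bottlenecks with different downsampling rates.
    The softmax-weighted fusion is an assumed mechanism, not from the paper."""
    def __init__(self, dim: int, reductions=(4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList(Bottleneck(dim, r) for r in reductions)
        self.fusion = nn.Parameter(torch.zeros(len(reductions)))  # learnable fusion weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.fusion, dim=0)
        outs = torch.stack([branch(x) for branch in self.branches])  # (K, B, N, D)
        return (w.view(-1, 1, 1, 1) * outs).sum(dim=0)

class AdaptedBlock(nn.Module):
    """Runs a frozen Transformer block with a MuFA adapter in parallel."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        self.adapter = MuFAAdapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.adapter(x)

# Freeze the pre-trained ViT; only the adapters and classification head train.
# (45 classes matches NWPU-RESISC45; the backbone choice is illustrative.)
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=45)
for p in vit.parameters():
    p.requires_grad = False
vit.blocks = nn.Sequential(*[AdaptedBlock(b, vit.embed_dim) for b in vit.blocks])
vit.head = nn.Linear(vit.embed_dim, 45)  # freshly created head is trainable by default
```

Zero-initializing the up-projections makes the adapted model start out identical to the frozen backbone, so fine-tuning departs smoothly from the pre-trained behavior while only the adapter and head parameters receive gradients.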