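To address the complex backgrounds, numerous small targets, and difficult feature extraction that are common in remote sensing images, this paper proposes an improved Transformer-style module, the Contextual and Simple Squeeze-and-Excitation Transformer (CoST). The design makes full use of the semantic information between adjacent keys to guide the dynamic learning of the attention matrix. It also introduces a simplified channel attention, Simple Squeeze-and-Excitation (SSE), which promotes the feature-map channels relevant to the current task and suppresses those of little relevance, thereby strengthening visual representation. Experiments were carried out on the YOLOv5s object detection framework and evaluated on the PASCAL VOC and DIOR datasets. The results show that the improved YOLOv5s raises average precision by 2.4% on VOC and by 1.4% on the remote sensing dataset DIOR, verifying the effectiveness of the CoST module.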
Remote sensing image detection of YOLOv5 with local self-attention
Remote sensing images commonly suffer from complex backgrounds, many small targets, and difficult feature extraction. In response, this paper proposes an improved Transformer-style module, the Contextual and Simple Squeeze-and-Excitation Transformer (CoST). The design exploits the semantic information between adjacent keys to guide the dynamic learning of the attention matrix. It also introduces a simplified channel attention, Simple Squeeze-and-Excitation (SSE), which promotes the feature-map channels related to the current task and suppresses the channels that have little to do with it, thereby enhancing visual representation. Experiments on the YOLOv5s object detection framework, evaluated on the PASCAL VOC and DIOR datasets, show that the improved YOLOv5s increases average precision by 2.4% on VOC and by 1.4% on the remote sensing dataset DIOR, which verifies the effectiveness of the CoST module.
Keywords: deep learning; object detection; YOLOv5s; self-attention mechanism
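The abstract does not specify how SSE simplifies channel attention, so the following is only a minimal sketch of the standard squeeze-and-excitation gating that SSE is described as simplifying, not the authors' module. The function names (`squeeze`, `dense`, `se_gate`), the bias-free layers, and the weight matrices are illustrative assumptions. A feature map is represented as a nested list indexed [channel][row][column].

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel of a [C][H][W] nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vector, weights, activation):
    """Bias-free fully connected layer; weights is laid out as [out][in]."""
    return [
        activation(sum(w * v for w, v in zip(row, vector)))
        for row in weights
    ]


def se_gate(feature_map, w_reduce, w_expand):
    """Standard squeeze-and-excitation gating (assumed form, not the paper's SSE).

    Returns the rescaled feature map and the per-channel gates in (0, 1).
    """
    pooled = squeeze(feature_map)              # squeeze: [C]
    hidden = dense(pooled, w_reduce, relu)     # bottleneck: [C'] with C' <= C
    gates = dense(hidden, w_expand, sigmoid)   # excitation: [C]
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Two 2x2 channels; hypothetical weights chosen so channel 0 is promoted
# and channel 1 is suppressed.
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_gate(features, w_reduce=[[1.0, 0.0]], w_expand=[[2.0], [-2.0]])
```

Channels whose gate is close to 1 pass through almost unchanged, while channels whose gate is close to 0 are suppressed, which corresponds to the promotion and suppression of task-relevant channels described in the abstract.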