Lightweight YOLOv5 Pedestrian Detection Algorithm Based on Pixel Difference Attention

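Chinese abstract (translated): Real-time pedestrian detection scenes contain pedestrians that are occluded or that vary in shape and pose, and the YOLOv5 model noticeably misses such targets. To address this, a pixel difference attention (PDA) mechanism is proposed. Traditional channel attention mechanisms summarize an entire feature map with global average pooling (GAP) and global max pooling (GMP); global pooling compresses the spatial dimensions into a single value that represents the whole channel, which loses spatial information. PDA instead compresses spatial information along the height and the width separately and links each with the channel information for attention weighting. A new channel description index is also proposed to represent channel information, strengthening the interaction between spatial and channel information so that the model more readily attends to the important feature-map information that combines the spatial and channel dimensions. Inserting PDA at the end of the backbone network raises mean average precision (mAP) 0.5 by 2.4 percentage points and mAP0.5:0.95 by 4.4 percentage points. Because deployment and detection speed in real-time scenarios require a model with few parameters and little computation, a new lightweight feature extraction module, AC3, is proposed to replace the C3 module in the original YOLOv5 model. With this module, the PDA-augmented model loses only 0.2 percentage points of precision while the number of parameters (Param.) falls by about 20% and the giga floating-point operations (GFLOPs) fall by about 30%. Experimental results show that, compared with the original YOLOv5s model, the final improved model raises mAP0.5 by 2.2 percentage points and mAP0.5:0.95 by 3.1 percentage points on the VOC pedestrian dataset, with about 20% fewer parameters and about 30% fewer floating-point operations, and its detection speed (frames per second, FPS) on a GTX1050 increases by 4.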

English title: Lightweight YOLOv5 Pedestrian Detection Algorithm Based on Pixel Difference Attention

English abstract: In real-time pedestrian detection scenes, pedestrians may be occluded or vary in shape and pose, and the YOLOv5 model noticeably misses such targets. A pixel difference attention (PDA) mechanism is proposed to address this. Traditional channel attention mechanisms use global average pooling (GAP) and global max pooling (GMP) to summarize the information of the entire feature map; global pooling reduces the spatial dimensions to a single value that represents the entire channel. PDA instead compresses the spatial information along the height and the width separately and connects each with the channel information to perform attention weighting. A new channel description index is also proposed to represent channel information, which enhances the interaction between spatial and channel information and makes it easier for the model to focus on the important feature-map information that integrates the spatial and channel dimensions. After PDA is inserted at the end of the backbone network, mean average precision (mAP) 0.5 and mAP0.5:0.95 increase by 2.4 and 4.4 percentage points, respectively. Because deployment and detection speed in real-time detection scenarios require a model with fewer parameters and less computation, a new lightweight feature extraction module, AC3, is proposed to replace the C3 module in the original YOLOv5 model. With this module, the PDA-augmented model loses only 0.2 percentage points of precision, while the number of parameters (Param.) is reduced by about 20% and the giga floating-point operations (GFLOPs) are reduced by about 30%. The experimental results show that, compared with the original YOLOv5s model, the final improved model raises mAP0.5 by 2.2 percentage points and mAP0.5:0.95 by 3.1 percentage points on the VOC pedestrian dataset, with about 20% fewer parameters and about 30% fewer floating-point operations. Frames per second (FPS) on the GTX1050 increase by 4.
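
The contrast the abstract draws between global pooling and compression along height and width can be illustrated with a small sketch. This is not the paper's PDA module: the channel description index and the attention weighting are not specified in the abstract, so the sketch stops at the pooling step. The function names and the toy feature map are hypothetical, and average pooling is assumed for the directional compression.

```python
# Illustrative sketch only: global pooling vs. directional (height/width) pooling.
# Assumption: directional compression is done by averaging; the abstract does not
# say which operator PDA uses, nor how its channel description index is computed.

def global_avg_pool(fmap):
    """One value per channel; fmap is a list of H x W nested lists."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def global_max_pool(fmap):
    """One value per channel: the largest activation."""
    return [max(max(row) for row in ch) for ch in fmap]

def pool_along_width(fmap):
    """Average over the width axis: keeps one value per row (C x H)."""
    return [[sum(row) / len(row) for row in ch] for ch in fmap]

def pool_along_height(fmap):
    """Average over the height axis: keeps one value per column (C x W)."""
    return [[sum(col) / len(col) for col in zip(*ch)] for ch in fmap]

# Toy feature map with 2 channels of size 2 x 4 (hypothetical values).
feature_map = [
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 8.0, 0.0]],   # channel 0: one strong, localized response
    [[1.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]],   # channel 1: flat response
]

gap = global_avg_pool(feature_map)       # both channels collapse to the same mean
gmp = global_max_pool(feature_map)       # peak value is kept, its position is lost
h_desc = pool_along_width(feature_map)   # per-row descriptor for each channel
w_desc = pool_along_height(feature_map)  # per-column descriptor for each channel

print("GAP:", gap)
print("GMP:", gmp)
print("height descriptor, channel 0:", h_desc[0])
print("width descriptor, channel 0:", w_desc[0])
```

In this toy case GAP gives the two channels the same value, and GMP keeps the peak but not where it occurred, whereas the height and width descriptors still show the row and column of the strong response in channel 0.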

English keywords:

YOLOv5; pedestrian detection; attention mechanism; lightweight model; channel description index

Authors:

陈高宇 (Chen Gaoyu), 王晓军 (Wang Xiaojun), 李晓航 (Li Xiaohang)

Affiliation:

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Keywords (translated from Chinese):

YOLOv5; pedestrian detection; attention mechanism; lightweight model; channel description index

Publication year:

2025

DOI:

10.3778/j.issn.1002-8331.2309-0227

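Journal: Computer Engineering and Applications (计算机工程与应用)

Sponsor: North China Institute of Computing Technology (华北计算技术研究所)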

Indexing: Peking University Core Journals (北大核心)

Impact factor: 0.683

ISSN: 1002-8331

Year, volume (issue): 2025, 61(1)