Pedestrian Detection Algorithm Fusing Contextual Information and Attention Mechanisms

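Abstract (translated from the Chinese): To address incomplete extraction of pedestrian features and low detection accuracy in complex traffic scenes, the paper proposes YOLOv5-STRDC, a pedestrian detection algorithm that improves the YOLOv5 network by fusing contextual information with attention mechanisms. A Swin Transformer is placed in the backbone network to enrich contextual information while acquiring global information efficiently. A Spatial Pyramid Pooling (SPP) module is proposed that fuses five parallel dilated convolutions with an improved Convolutional Block Attention Module (CBAM); it outputs information covering a larger image range while strengthening feature fusion along both the channel and spatial dimensions. A Coordinate Attention (CA) mechanism is integrated to highlight key local regions and obtain more accurate feature information. On the public WiderPerson and INRIA datasets, YOLOv5-STRDC reaches a mean Average Precision (mAP) of 71.60% and 92.01%, respectively, improvements of 1.80% and 1.34% over the YOLOv5 model, giving good pedestrian detection performance. Its detection speed reaches 137.34 and 114.71 frames per second on the two datasets, respectively, which meets the requirement for real-time detection.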
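The abstract says the modified SPP module uses five parallel dilated convolutions to cover a larger image range. The sketch below illustrates why dilation widens coverage, using the standard effective-kernel-size relation k + (k - 1)(d - 1). The 3x3 kernel and the dilation rates (1 to 5) are assumptions chosen for illustration only; the abstract does not state the values the paper uses, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a dilated kernel: k + (k - 1) * (d - 1)."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that preserves spatial size at stride 1 for an odd kernel."""
    if kernel_size % 2 == 0:
        raise ValueError("this formula assumes an odd kernel size")
    return dilation * (kernel_size - 1) // 2


# ASSUMPTION: illustrative dilation rates, not taken from the paper.
ASSUMED_DILATIONS = (1, 2, 3, 4, 5)


def branch_summary(kernel_size: int = 3, dilations=ASSUMED_DILATIONS):
    """Return (dilation, effective span, padding) for each parallel branch."""
    return [
        (d, effective_kernel_size(kernel_size, d), same_padding(kernel_size, d))
        for d in dilations
    ]


if __name__ == "__main__":
    for dilation, span, padding in branch_summary():
        print(f"dilation={dilation}: covers {span}x{span}, padding={padding}")
```

Each branch keeps the same nine weights of a 3x3 kernel while its span grows with the dilation rate, so parallel branches with different rates see context at several scales without extra parameters per branch.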

English title: Pedestrian Detection Algorithms Incorporating Contextual Information and Attention Mechanisms

English abstract: To address incomplete feature extraction and low detection accuracy in complex traffic scenarios, a pedestrian detection algorithm, YOLOv5-STRDC, is proposed that improves the YOLOv5 network by fusing contextual information and attention mechanisms. First, a Swin Transformer is placed in the backbone to enrich contextual information while efficiently acquiring global information. Second, a Spatial Pyramid Pooling (SPP) module is proposed that fuses five parallel dilated convolutions with an improved Convolutional Block Attention Module (CBAM); it outputs information over a larger image range while enhancing feature fusion in both the channel and spatial dimensions. Finally, a Coordinate Attention (CA) module is integrated to highlight important local regions and extract more accurate feature information. YOLOv5-STRDC achieves a mean Average Precision (mAP) of 71.60% and 92.01% on the publicly available WiderPerson and INRIA datasets, respectively, an improvement of 1.80% and 1.34% over the YOLOv5 model. The detection frame rate reaches 137.34 and 114.71 frames/s, respectively, which meets the requirement of real-time detection.
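To put the reported numbers in more familiar terms, the following sketch converts the frame rates into per-frame processing time and back-calculates the YOLOv5 baseline mAP. It assumes the 1.80% and 1.34% gains are absolute percentage-point differences, which the abstract does not state explicitly; the function and dictionary names are hypothetical.

```python
def frame_time_ms(frames_per_second: float) -> float:
    """Average processing time per frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


def implied_baseline_map(improved_map: float, gain_points: float) -> float:
    """Baseline mAP, ASSUMING the gain is in absolute percentage points."""
    return round(improved_map - gain_points, 2)


# Figures as reported in the abstract.
REPORTED = {
    "WiderPerson": {"map": 71.60, "gain": 1.80, "fps": 137.34},
    "INRIA": {"map": 92.01, "gain": 1.34, "fps": 114.71},
}


def summarize(reported=REPORTED):
    """Derive per-frame time and the implied baseline for each dataset."""
    return {
        name: {
            "frame_time_ms": frame_time_ms(row["fps"]),
            "implied_baseline_map": implied_baseline_map(row["map"], row["gain"]),
        }
        for name, row in reported.items()
    }


if __name__ == "__main__":
    for name, derived in summarize().items():
        print(
            f"{name}: {derived['frame_time_ms']:.2f} ms/frame, "
            f"implied YOLOv5 baseline mAP {derived['implied_baseline_map']:.2f}%"
        )
```

Under that assumption the frame rates correspond to roughly 7.28 ms and 8.72 ms per frame, and the implied YOLOv5 baselines are 69.80% and 90.67% mAP.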

English keywords:

pedestrian detection; contextual information; dilated convolution; feature pyramids; attention mechanisms

Authors:

RONG Xing (荣幸), ZHANG Zhihua (张志华), FENG Dongdong (冯东东), YUAN Hao (袁昊)

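Author affiliations:

School of Mathematics and Physics, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China

National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou, Gansu 730070, China

Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou, Gansu 730070, China

Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China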

Keywords:

pedestrian detection; contextual information; dilated convolution; feature pyramid; attention mechanism

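Funding:

National Key Research and Development Program of China; Natural Science Foundation of Gansu Province

Project numbers:

2022YFB3903604; 23JRRA870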

Publication year:

2024

DOI:

10.3969/j.issn.1003-3106.2024.09.012

Radio Engineering (无线电工程)

The 54th Research Institute of China Electronics Technology Group Corporation

Impact factor: 0.667

ISSN: 1003-3106

Year, volume (issue): 2024, 54(9)