Radio Engineering, 2024, Vol. 54, Issue 9: 2152-2161. DOI: 10.3969/j.issn.1003-3106.2024.09.012

Pedestrian Detection Algorithms Incorporating Contextual Information and Attention Mechanisms

Rong Xing 1, Zhang Zhihua 2, Feng Dongdong 2, Yuan Hao 2

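Author information

  • 1. School of Mathematics and Physics, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, Gansu, China; Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, Gansu, China
  • 2. National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, Gansu, China; Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, Gansu, China; Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China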

Abstract

To address incomplete extraction of pedestrian features and low detection accuracy in complex traffic scenes, this paper proposes YOLOv5-STRDC, a pedestrian detection algorithm that improves the YOLOv5 network by fusing contextual information and attention mechanisms. First, a Swin Transformer is placed in the backbone network to enrich contextual information while efficiently capturing global information. Second, a Spatial Pyramid Pooling (SPP) module is proposed that fuses five parallel dilated convolutions with an improved Convolutional Block Attention Module (CBAM); it outputs information over a larger image range while strengthening feature fusion along the channel and spatial dimensions. Finally, a Coordinate Attention (CA) mechanism is integrated to highlight important local regions and obtain more accurate feature information. On the public WiderPerson and INRIA datasets, YOLOv5-STRDC reaches a mean Average Precision (mAP) of 71.60% and 92.01%, respectively, an improvement of 1.80% and 1.34% over the YOLOv5 model. Its detection speed reaches 137.34 and 114.71 frames/s, respectively, which meets the requirement of real-time detection.
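The abstract attributes the larger image range covered by the SPP module to its five parallel dilated (atrous) convolutions. The following is a minimal sketch of that idea in one dimension, not the authors' implementation. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input samples, so branches with different rates see different ranges at the same parameter count. The function names and the dilation rates (1 to 5) are assumptions chosen for illustration; the abstract states only that there are five branches.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as used in deep learning)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


def parallel_dilated_branches(signal, kernel, dilations):
    """Run one branch per dilation rate and return the outputs keyed by rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


# Hypothetical dilation rates: the abstract gives the number of branches, not the rates.
DILATIONS = (1, 2, 3, 4, 5)
```

With a 3-tap kernel, calling `parallel_dilated_branches(list(range(12)), [1, 1, 1], DILATIONS)` gives five outputs whose kernels span 3, 5, 7, 9, and 11 input samples. A real module would apply the same principle with 2-D convolutions and then fuse the branch outputs, which this sketch leaves out.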

Key words

pedestrian detection/contextual information/dilated convolution/feature pyramid/attention mechanism

Funding

National Key Research and Development Program of China (2022YFB3903604)

Natural Science Foundation of Gansu Province (23JRRA870)

Publication year

2024
Radio Engineering
The 54th Research Institute of China Electronics Technology Group Corporation

Impact factor: 0.667
ISSN: 1003-3106