Abstract
To address the scale variation, uneven distribution, and background occlusion of dense crowds in complex scenes, this paper proposes MFFBSNet, a crowd counting algorithm based on multi-scale feature fusion and background suppression. The first 13 layers of the Visual Geometry Group network VGG-16 serve as the front end of the network. An atrous spatial pyramid pooling (ASPP) module and a lightweight pyramid split attention (PSA) mechanism are combined into a multi-scale feature fusion module, which addresses the scale variation of dense crowds. In the middle of the network, spatial and channel attention mechanisms calibrate the feature maps and highlight head regions in the image. The back end of the network uses atrous convolution, which enlarges the receptive field without reducing image resolution, to generate a background segmentation attention map; this map suppresses background noise in the image and improves the quality of the crowd density map. Experimental results on three public datasets, ShanghaiTech, UCF_CC_50, and NWPU-Crowd, show that the proposed MFFBSNet-based crowd counting algorithm achieves higher counting accuracy than methods such as MCNN, SwitchCNN, and CSRNet.
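The data flow described above (multi-rate atrous branches, attention-based recalibration, background masking, and a count obtained by summing the density map) can be illustrated with a toy, single-channel NumPy sketch. This is not the MFFBSNet implementation. It assumes parameter-free sigmoid gates in place of the learned channel and spatial attention modules, omits the VGG-16 front end and the PSA module entirely, and assumes the background segmentation attention map is applied to the density map by element-wise multiplication. All function names, the dilation rates (1, 2, 3), and the kernels are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """Single-channel atrous convolution (cross-correlation) with zero 'same' padding.

    A k x k kernel with dilation `rate` spans k + (k - 1) * (rate - 1) pixels
    per side, yet padding by rate * (k - 1) // 2 keeps the output at the input
    resolution. Assumes an odd kernel size k.
    """
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding on every side
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input shifted by a multiple of `rate`.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def multi_rate_branches(x: np.ndarray, kernel: np.ndarray, rates=(1, 2, 3)) -> np.ndarray:
    """ASPP-style parallel branches: one atrous conv per rate, stacked as channels."""
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])  # (C, H, W)


def channel_attention(f: np.ndarray) -> np.ndarray:
    """Hypothetical parameter-free stand-in: gate each channel by sigmoid(global mean)."""
    return f * sigmoid(f.mean(axis=(1, 2)))[:, None, None]


def spatial_attention(f: np.ndarray) -> np.ndarray:
    """Hypothetical parameter-free stand-in: gate each pixel by sigmoid(cross-channel mean)."""
    return f * sigmoid(f.mean(axis=0))[None, :, :]


def estimate_count(image: np.ndarray, kernel: np.ndarray, bg_kernel: np.ndarray):
    """Toy pipeline: multi-scale branches -> attention -> density -> background mask."""
    feats = spatial_attention(channel_attention(multi_rate_branches(image, kernel)))
    density = np.maximum(feats.mean(axis=0), 0.0)  # fuse channels, keep non-negative
    # Background attention map in (0, 1) from a rate-2 atrous conv (assumed design).
    bg_mask = sigmoid(dilated_conv2d(image, bg_kernel, rate=2))
    suppressed = density * bg_mask  # down-weights pixels the mask treats as background
    return density, suppressed, float(suppressed.sum())
```

For example, `estimate_count(image, np.ones((3, 3)) / 9, np.full((3, 3), -1.0))` returns the unmasked density map, the masked density map, and the count summed from the masked map. Summing a density map to obtain the count is standard practice in density-map crowd counting; the masking step shows how a background attention map can lower the contribution of non-crowd pixels without changing the map's resolution.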