Small-Scale Object Detection Algorithm Integrating Attention and Feature Pyramids (融合注意力与特征金字塔的小尺度目标检测算法)

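Abstract (translated from the Chinese): To address the weak detection performance of the Faster R-CNN algorithm on small objects and on occluded or truncated objects, an improved Faster R-CNN algorithm is proposed that combines the CBAM attention mechanism with a feature pyramid structure. To focus on the informative local regions of the feature map, CBAM is integrated into the feature extraction network, which reduces interference from irrelevant targets and improves detection of occluded or truncated objects. A feature pyramid network structure is introduced to connect high-level and low-level features, yielding high-resolution, strongly semantic features and thereby improving small-object detection. To alleviate vanishing gradients and reduce the scale of hyperparameters, the VGG16 network is replaced with VS-ResNet, an inverted-residual network with stronger representational capacity. VS-ResNet modifies part of the layer structure of the original ResNet 50, adds an auxiliary classifier, and uses inverted residuals and group convolution so that activation-function information is fully preserved in the high-dimensional space, which improves detection accuracy. A method that resets candidate-box scores is used to compensate for the tendency of non-maximum suppression to wrongly eliminate overlapping detection boxes. Experimental results show that, compared with VGG16, VS-ResNet improves accuracy on the CIFAR-10 dataset by 2.97 percentage points, and that the proposed algorithm reaches an mAP of 76.2% on the Pascal VOC 2012 dataset, 13.9 percentage points higher than the original Faster R-CNN algorithm.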
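The abstract says candidate-box scores are reset so that non-maximum suppression no longer deletes overlapping boxes outright, but it does not give the formula. As a minimal sketch only, the following uses the linear decay from Soft-NMS as an assumed stand-in; the function names `iou` and `rescore_nms` and both thresholds are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescore_nms(
    boxes: List[Box],
    scores: List[float],
    iou_thresh: float = 0.5,      # assumed value, not from the paper
    score_thresh: float = 0.001,  # assumed value, not from the paper
) -> List[Tuple[int, float]]:
    """Linear Soft-NMS (assumed stand-in for the paper's score reset).

    Boxes that overlap the current best box by more than iou_thresh have
    their score multiplied by (1 - IoU) instead of being deleted.
    Returns (original_index, final_score) pairs in selection order.
    """
    remaining = list(enumerate(scores))
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Select the highest-scoring box still under consideration.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        updated = []
        for i, s in remaining:
            overlap = iou(boxes[best_idx], boxes[i])
            if overlap > iou_thresh:
                s *= 1.0 - overlap  # decay the score rather than discard the box
            if s >= score_thresh:
                updated.append((i, s))
        remaining = updated
    return kept
```

With two heavily overlapping boxes and one distant box, hard NMS at the same threshold would drop the second overlapping box, whereas this sketch keeps it with a reduced score, which is the behavior the abstract describes for overlapping detections.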

English title: Small-Scale Object Detection Algorithm Integrating Attention and Feature Pyramids

English abstract: A modified Faster R-CNN algorithm that combines the CBAM mechanism with a feature pyramid structure is proposed to address the poor detection ability of Faster R-CNN for small-scale objects and for occluded or truncated objects. To focus on the informative local regions of the feature image, the CBAM mechanism is integrated into the feature extraction network, which reduces the interference of invalid targets and improves the detection of occluded or truncated objects. A Feature Pyramid Network (FPN) structure is introduced to connect high- and low-level feature data, obtaining high-resolution, strongly semantic data and thereby enhancing the detection of small objects. To alleviate gradient vanishing and reduce the scale of hyperparameters, the commonly used VGG16 network is replaced with the inverted-residual VS-ResNet network, which has stronger expressive ability. VS-ResNet modifies some of the layer structure of the original ResNet 50, adds auxiliary classifiers, and uses inverted residuals and group convolution so that the activation function information is fully preserved in high-dimensional space, which improves detection accuracy. A method that resets candidate box scores is used to compensate for the defect of the Non-Maximum Suppression (NMS) algorithm of mistakenly eliminating overlapping detection boxes. The experimental results demonstrate that, compared with VGG16, VS-ResNet improves accuracy on the CIFAR-10 dataset by 2.97 percentage points. The mAP of the proposed algorithm on the Pascal VOC 2012 dataset is 76.2%, which is 13.9 percentage points higher than that of the original Faster R-CNN algorithm.
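The feature pyramid step in the abstract, which joins high-level and low-level features, can be illustrated with the usual FPN top-down pathway. This is a minimal NumPy sketch under stated assumptions, not the authors' network: the helper names (`upsample2x`, `lateral_1x1`, `fpn_top_down`), the nearest-neighbour upsampling, and the factor-of-two spacing between levels are all assumptions, and the 3x3 smoothing convolution that FPN normally applies after merging is omitted.

```python
from typing import List

import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def lateral_1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: mixes channels at every pixel.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def fpn_top_down(
    features: List[np.ndarray], weights: List[np.ndarray]
) -> List[np.ndarray]:
    """Merge backbone features ordered from fine (high-res) to coarse.

    Each level is projected to a common channel count, then the coarser
    merged map is upsampled and added to the next finer lateral map.
    Returns the merged maps in the same fine-to-coarse order.
    """
    laterals = [lateral_1x1(f, w) for f, w in zip(features, weights)]
    merged = [laterals[-1]]  # the coarsest level has nothing above it
    for lat in reversed(laterals[:-1]):
        merged.append(lat + upsample2x(merged[-1]))
    return merged[::-1]
```

The finest output map keeps the spatial resolution of the shallow layer while carrying information that originated in the deepest layer, which is the "high-resolution, strongly semantic" combination the abstract credits for better small-object detection.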

English keywords:

deep learning; attention mechanism; feature pyramid; small object detection; truncated object detection

Authors:

Sheng Wenshun (圣文顺), Yu Xiongfeng (余熊峰), Lin Jiayan (林佳燕), Chen Xin (陈欣)

Affiliation:

Nanjing Tech University Pujiang Institute (南京工业大学浦江学院), Nanjing, Jiangsu 211200, China

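Keywords (Chinese original):

深度学习 (deep learning); 注意力机制 (attention mechanism); 特征金字塔 (feature pyramid); 小目标检测 (small object detection); 截断物体检测 (truncated object detection)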

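Funding:

Qing Lan Project of Jiangsu Province; National Natural Science Foundation of China; General Program of the Natural Science Foundation of the Jiangsu Higher Education Institutions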

Project numbers:

苏教师函[2021]11号; 61571222; 19KJD520005

Publication year:

2024

DOI:

10.19678/j.issn.1000-3428.0066724

Journal: Computer Engineering (计算机工程)

East China Institute of Computing Technology (华东计算技术研究所); Shanghai Computer Society (上海市计算机学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(1)

Number of references: 8