
Feature fusion and inter-layer transmission: an improved object detection method based on Anchor DETR

Object detection is a crucial task in computer vision, aiming to accurately identify and locate objects of interest in images or videos. An improved object detection algorithm was proposed, combining feature fusion, an optimized inter-layer transmission scheme for the encoder, and a random jump retention method to address the limitations of general Transformer models in object detection. To counteract the insufficient perception of object information caused by the computational constraints that limit Transformer vision models to a single feature level, a convolutional attention mechanism was used to fuse multi-scale features effectively, enhancing object recognition and localization. By optimizing the transmission between encoder layers, each layer passed on and learned more information, reducing the information lost between layers. Additionally, to address the problem that predictions from intermediate decoder stages outperformed those of the final stage, a random jump retention method was designed to improve the model's prediction accuracy and stability. Experimental results demonstrated that the improved method significantly enhanced object detection performance: on the COCO 2017 dataset the model reached an AP of 42.3%, with the AP for small objects improved by 2.2%; on the PASCAL VOC 2007 dataset, AP improved by 1.4% and the AP for small objects by 2.4%.
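The abstract names a convolutional attention mechanism for multi-scale feature fusion but does not describe its structure. Below is a minimal sketch of one plausible reading: a CBAM-style convolutional attention block applied to projected and upsampled backbone feature maps before a DETR-style encoder. The module names (ConvAttention, MultiScaleFusion), the channel widths, and d_model = 256 are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of multi-scale fusion with convolutional attention.
# This is an assumed CBAM-style design, not the architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvAttention(nn.Module):
    """Channel + spatial attention (CBAM-style) applied to a fused feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, then re-weight channels.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial attention: re-weight locations from pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel weights from average- and max-pooled descriptors.
        avg = self.channel_mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.channel_mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)
        # Spatial weights from per-location mean/max over channels.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(stats))


class MultiScaleFusion(nn.Module):
    """Project C3/C4/C5 backbone maps to a common width, upsample, sum, attend."""

    def __init__(self, in_channels=(512, 1024, 2048), d_model: int = 256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, d_model, kernel_size=1)
                                  for c in in_channels)
        self.attn = ConvAttention(d_model)

    def forward(self, feats):
        # feats: backbone maps ordered from finest (C3) to coarsest (C5).
        target = feats[0].shape[-2:]  # fuse at the finest input resolution
        fused = sum(F.interpolate(p(f), size=target, mode="bilinear",
                                  align_corners=False)
                    for p, f in zip(self.proj, feats))
        return self.attn(fused)  # single fused map fed to the DETR-style encoder


if __name__ == "__main__":
    c3 = torch.randn(1, 512, 100, 134)
    c4 = torch.randn(1, 1024, 50, 67)
    c5 = torch.randn(1, 2048, 25, 34)
    out = MultiScaleFusion()([c3, c4, c5])
    print(out.shape)  # torch.Size([1, 256, 100, 134])
```

Under these assumptions, fusing at the finest of the three input scales keeps small-object detail available to a single-scale Transformer encoder, which is consistent with the reported small-object gains; the actual fusion resolution and attention design in the paper may differ.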

object detection; feature fusion; Transformer; attention mechanism; image processing

章东平、魏杨悦、何数技、徐云超、胡海苗、黄文君


College of Information Engineering, China Jiliang University, Hangzhou, Zhejiang 310018, China

Hangzhou Innovation Institute, Beihang University, Hangzhou, Zhejiang 310051, China

Zhejiang SUPCON Technology Co., Ltd., Hangzhou, Zhejiang 310053, China


Funding: Zhejiang Provincial Key Research and Development Program (2024C01028, 2024C01108, 2022C01082, 2023C01032)

Journal of Graphics (图学学报), China Graphics Society
Indexed in: CSTPCD; Peking University Chinese Core Journals (北大核心)
Impact factor: 0.73
ISSN: 2095-302X
Year, Volume (Issue): 2024, 45(5)