An occlusion object detection method based on self-supervised mask image modeling
As a fundamental task in computer vision, object detection addresses the challenge of categorizing objects and accurately localizing them. However, real-world scenes frequently contain objects that are partially or entirely occluded, which poses substantial difficulties for detection models. To improve the versatility and detection performance of object detection networks across a wide range of occlusion scenarios, this paper introduces a self-supervised masked image modeling approach. The approach is structured into two stages: pre-training and fine-tuning. In the pre-training stage, a surrogate task deliberately masks local regions of unlabeled images and then reconstructs them; this proxy task provides the model with pre-training experience that helps it adapt to varied occlusion patterns and degrees. In the fine-tuning stage, the challenge of detecting objects of varying scales and sizes within occluded environments is addressed by a pyramid structure built on the Vision Transformer (ViT), a state-of-the-art architecture in computer vision. The resulting ViT-FPN (Vision Transformer Feature Pyramid Network) substantially improves the detector's ability to handle diverse occlusion scenarios. The method is rigorously evaluated on benchmark datasets, including CrowdHuman and CityPersons. Experimental results demonstrate that the self-supervised masked image modeling approach presented in this study outperforms other methods in detecting occluded objects.
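The masking-and-reconstruction proxy task described above hinges on hiding random local patches of an unlabeled image so the model learns to cope with missing regions. A minimal sketch of such patch masking is given below; the function name, patch size, and mask ratio are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def random_patch_mask(image, patch_size=16, mask_ratio=0.6, seed=None):
    """Zero out a random subset of non-overlapping patches in an image.

    image: (H, W, C) array with H and W divisible by patch_size.
    Returns the masked image and a boolean (H//ps, W//ps) grid where
    True marks a masked patch (the reconstruction target).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[0] // patch_size, image.shape[1] // patch_size
    n_patches = h * w
    n_masked = int(round(mask_ratio * n_patches))
    # Choose which patches to hide, without replacement.
    idx = rng.choice(n_patches, size=n_masked, replace=False)
    grid = np.zeros(n_patches, dtype=bool)
    grid[idx] = True
    grid = grid.reshape(h, w)
    # Zero out the pixels of every masked patch.
    masked = image.copy()
    for i in range(h):
        for j in range(w):
            if grid[i, j]:
                masked[i * patch_size:(i + 1) * patch_size,
                       j * patch_size:(j + 1) * patch_size] = 0
    return masked, grid

# Example: mask 60% of the 16x16 patches of a 224x224 RGB image.
img = np.ones((224, 224, 3), dtype=np.float32)
masked_img, mask_grid = random_patch_mask(img, mask_ratio=0.6, seed=0)
```

During pre-training, the reconstruction network would be trained to recover the original pixel values inside the masked patches, exposing the model to occlusion-like corruption at varying ratios.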