
Research on multimodal generative image data augmentation for 3D object detection

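Traditional generative image data augmentation algorithms lose 3D attribute information and therefore cannot be applied to 3D object detection in autonomous driving. To address this, a multimodal image generation algorithm based on the stable diffusion model is proposed, and a data augmentation method for 3D object detection is designed on top of it. The algorithm further constrains image generation by adding multimodal inputs. A multimodal feature online generation module extracts scene descriptions, semantic distributions, and depth features online; in addition, an enhanced gated self-attention module is designed for the multimodal feature fusion network to accurately capture depth information in the latent feature space, thereby preserving the image's 3D attribute information while enabling targeted modification of 2D features such as texture, color, and illumination. Building on the algorithm's depth-preserving behavior, new images are combined with 3D pseudo-labels to form new image samples, achieving data augmentation of the image samples. 3D detection results on the public nuScenes dataset show that the algorithm preserves 3D attributes well for large categories such as buses and trucks, with AP increasing by 17.2% and 14.1%, respectively; mAP increases by 6.8% and NDS by 3.4%.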
A multimodal generative image data enhancement for 3D object detection
Traditional generative image data augmentation algorithms usually lose 3D attribute information, which makes them unsuitable for 3D object detection in autonomous driving. To address this problem, we propose a multimodal image generation algorithm based on the stable diffusion model and, building on it, a data augmentation method designed for 3D object detection. The algorithm constrains the image generation process by introducing additional modal inputs. A multimodal feature online generation module extracts scene descriptions, semantic distributions, and depth features online. For the multimodal feature fusion network, an enhanced gated self-attention module is designed to capture depth information accurately in the latent feature space. This preserves the 3D attribute information of the image while allowing targeted modification of 2D features such as texture, color, and illumination. Because the algorithm preserves depth, the generated images are combined with 3D pseudo-labels to form new image samples, thereby augmenting the image samples. 3D detection results on the public nuScenes dataset show that the algorithm preserves 3D attributes well for large categories such as buses and trucks, whose AP improves by 17.2% and 14.1%, respectively. In addition, mAP improves by 6.8% and NDS by 3.4%.
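The pairing step in the abstract (a generated image combined with 3D pseudo-labels to form a new sample) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Sample` layout, the 7-value box format, and the `generate` callable are all hypothetical, and the sketch assumes the generator leaves scene geometry unchanged so that the source annotations can be reused as pseudo-labels.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# (x, y, z, length, width, height, yaw): an assumed 3D box layout for illustration.
Box3D = Tuple[float, float, float, float, float, float, float]


@dataclass(frozen=True)
class Sample:
    """One training sample: an image plus its 3D annotations (hypothetical layout)."""
    image_path: str
    boxes_3d: Tuple[Box3D, ...]
    labels: Tuple[str, ...]
    is_generated: bool = False


def augment(samples: List[Sample], generate: Callable[[str], str]) -> List[Sample]:
    """Return the originals plus one generated variant per sample.

    `generate` is a hypothetical stand-in for a depth-preserving image
    generator: it takes a source image path and returns the path of a new
    image whose 2D appearance differs but whose geometry is assumed unchanged.
    """
    augmented = list(samples)
    for sample in samples:
        new_path = generate(sample.image_path)
        # Reuse the source annotations as 3D pseudo-labels for the new image.
        augmented.append(replace(sample, image_path=new_path, is_generated=True))
    return augmented


def fake_generate(path: str) -> str:
    # Placeholder only: a real generator would synthesize and save a new image.
    return path.replace(".jpg", "_gen.jpg")


originals = [
    Sample(
        image_path="cam_front_0001.jpg",
        boxes_3d=((10.0, 2.0, 0.5, 11.0, 2.9, 3.4, 0.0),),
        labels=("bus",),
    ),
]
dataset = augment(originals, fake_generate)
```

Here `dataset` holds the original sample followed by its generated counterpart, which carries the same boxes and labels. The reuse is only valid to the extent that the generator actually preserves depth and object placement, which is the property the abstract attributes to the proposed algorithm.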

data enhancement; stable diffusion; image generation; object detection; feature fusion

张光钱、周广利、黄飞、刘文兵、向阳开


School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074

China Road and Bridge Corporation, Beijing 100010

data augmentation; stable diffusion; image generation; object detection; feature fusion

2024

Journal of Chongqing University of Technology
Chongqing University of Technology


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.567
ISSN: 1674-8425
Year, volume (issue): 2024, 38(19)