Monocular 3D Detection Algorithm Based on Multi-scale Fusion and High-order Interaction

3D object detection is a fundamental and challenging task in 3D scene understanding, and monocular methods offer an economical alternative to stereo-based or radar-based methods. This paper proposes an improved monocular 3D detection algorithm based on MonoDLE that reduces the accuracy loss caused by errors in object size and shape and in 3D position. First, a general multi-scale pooling attention module is proposed to aggregate finer multi-scale features and to relate context information efficiently. Second, to strengthen the model's high-order spatial interaction ability, a recursive gated convolution block, composed of recursive gated convolution and group normalization, is proposed to replace the convolution layers in the up-sampling module of the baseline architecture, which improves the representation ability of that module. Experiments on KITTI, the standard dataset for monocular 3D detection, show the following. With the multi-scale pooling attention module improving the network's feature aggregation, the average precision metric AP40 for 3D detection at an intersection over union (IoU) greater than 0.7 rises from 13.66 to 15.10. With the recursive gated convolution block strengthening high-order spatial interaction, AP40 under the same 3D setting rises further from 15.10 to 15.53. With both modules working together, AP40 in the bird's-eye view (BEV) at an IoU greater than 0.7 rises from 19.33 to 21.95.
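The AP40 figures quoted in the abstract can be made concrete with a minimal sketch of 40-point interpolated average precision. This is an illustration only, not the official KITTI evaluation code and not the authors' implementation: the function name `ap_r40`, its arguments, and the dictionary labels are assumptions introduced here, and the true-positive flags are assumed to come from a separate matching step at IoU greater than 0.7 that is not shown.

```python
from typing import List, Sequence, Tuple


def ap_r40(scores: Sequence[float],
           is_true_positive: Sequence[bool],
           num_gt: int) -> float:
    """Simplified 40-point interpolated average precision, in percent.

    scores            confidence of each detection (hypothetical input)
    is_true_positive  whether each detection matched a ground-truth box
                      (assumed to be decided elsewhere, e.g. at IoU > 0.7)
    num_gt            number of ground-truth objects
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")

    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Precision/recall after each ranked detection.
    curve: List[Tuple[float, float]] = []
    tp = 0
    for rank, idx in enumerate(order, start=1):
        if is_true_positive[idx]:
            tp += 1
        curve.append((tp / num_gt, tp / rank))

    # Average the interpolated precision at recall 1/40, 2/40, ..., 1.
    total = 0.0
    for k in range(1, 41):
        r = k / 40
        # Interpolated precision: best precision at any recall >= r.
        total += max((p for rec, p in curve if rec >= r - 1e-12), default=0.0)
    return 100.0 * total / 40


# AP40 values reported in the abstract as (before, after); the keys are
# shorthand labels chosen for this sketch.
reported = {
    "3D, attention module": (13.66, 15.10),
    "3D, gated convolution block": (15.10, 15.53),
    "BEV, both modules": (19.33, 21.95),
}
gains = {name: round(after - before, 2)
         for name, (before, after) in reported.items()}
```

Under these assumptions, `gains` holds the absolute AP40 improvement in percentage points for each of the three comparisons reported in the abstract.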

Keywords: monocular 3D detection; feature pyramid pooling; attention mechanism; recursive gated convolution; group normalization

Sun Yankang (孙延康), Wang Xuanzhi (王璇之), Feng Ao (封澳), Xie Yuyang (谢玉阳), Xiao Jian (肖建)

School of Integrated Circuit Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu

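College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu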

Funding: National Natural Science Foundation of China (61974073)

2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society (陕西省计算机学会)

CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(10)