Single-stage 3D object detection network with multiscale point cloud encoding

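Chinese abstract: Objective When automatic guided vehicles (AGVs) carry goods in a factory, they follow prescribed routes, but when they approach an obstacle they simply stop automatically and cannot perceive the obstacle's exact position or size. To enable AGVs to detect various obstacles in complex industrial scenes, a multi-scale encoding for single-stage 3D object detector from point clouds (MSE-SSD) is proposed. Method First, the network downsamples the raw point cloud with a learnable foreground point downsampling module so that the foreground points are segmented accurately. Second, these foreground points are fed into a multi-abstract-scale feature extraction module, which separates feature maps at different abstract scales and fuses them adaptively to reduce the loss of feature information. Then, center points are predicted from the feature map, and a multi-distance-scale feature aggregation module aggregates and encodes the foreground points around each center point at different distance scales to obtain semantic feature vectors. Finally, the center points and the semantic feature vectors jointly predict the bounding boxes. Result MSE-SSD was evaluated on a custom dataset and achieved the best average precision (AP) on multiple targets. For the empty AGV class at the hard level and the loaded AGV class at the easy level, it exceeds the second-ranked IA-SSD (learning highly efficient point-based detectors for 3D LiDAR point clouds) by 1.27% and 0.08%, respectively; for the worker class at the easy level, it exceeds the second-ranked SA-SSD (structure aware single-stage 3D object detection from point cloud) by 0.71%. Running on a single RTX 2080Ti GPU, the network reaches a detection speed of 77 frames/s, ranking second among all mainstream networks. When the trained network is deployed on the TXR development board carried by the AGV, the detection speed reaches 8.6 frames/s. Conclusion MSE-SSD achieves high accuracy and real-time performance for AGV obstacle-avoidance detection.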
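
The learnable foreground downsampling step can be illustrated with a minimal sketch. It assumes every point already carries a semantic feature vector and replaces the learned multilayer perceptron with a single hypothetical linear layer plus sigmoid (the weights and bias are illustrative, not trained); only the Top-K selection by foreground score follows the abstract directly. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def foreground_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map a point's semantic features to a foreground score in (0, 1).

    Hypothetical stand-in for the learned MLP head: one linear layer + sigmoid.
    """
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    # Numerically stable sigmoid
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def topk_foreground_sample(
    point_features: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    k: int,
) -> List[int]:
    """Return the indices of the k points with the highest foreground scores."""
    scores = [foreground_score(f, weights, bias) for f in point_features]
    # heapq.nlargest selects the top k indices without fully sorting all scores
    top = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(top)  # restore the original point order


if __name__ == "__main__":
    feats = [[0.0], [2.0], [-1.0], [3.0]]
    print(topk_foreground_sample(feats, weights=[1.0], bias=0.0, k=2))  # [1, 3]
```

The returned indices are then used to keep only the selected points, so later modules operate on a much smaller, target-rich subset of the cloud.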

English title: Multiscale encoding for single-stage 3D object detector from point clouds

English abstract: Objective In today's industrial environment, large-scale automatic production lines are gradually replacing the traditional manual production mode, and the concept of the intelligent factory has received increasing attention from enterprises. In many modern factories, automatic guided vehicles (AGVs) have replaced manual handling of goods. The factory places a QR code every two meters along the AGV's path, and the factory's central control system continuously assigns a meaning to each QR code. When an AGV drives over one of these QR codes, the scanning system on its underside reads the code to determine whether the next step is to turn, accelerate, lift, or unload a heavy object. When hundreds of AGVs run simultaneously in the workshop, the central control system plans the most efficient paths and transmits the control information through the QR codes to the AGVs, which act as physical terminals, thereby realizing intelligent transportation of goods in the factory. When an object is in front of an AGV, the common solution is to send the AGV a stop signal as soon as its front sensor detects the object, regardless of whether the object would actually hinder its normal operation. In areas of the factory with many people or goods, frequent stops substantially reduce the AGV's working efficiency. Providing the AGV with specific information about the obstacles ahead is therefore necessary for effective subsequent obstacle avoidance. To this end, a multiscale encoding for single-stage 3D object detector from point clouds (MSE-SSD) is introduced to help AGVs detect various obstacles in complex industrial scenes. Method First, a learnable foreground point downsampling module samples the point cloud and obtains the foreground points accurately and efficiently. This module gradually extracts the semantic features of the input point cloud through multilayer perceptron operations and quantifies each point's semantic features into a foreground score. The Top-K method then selects the K points with the highest foreground scores as foreground points, retaining the points that carry rich target information. Second, the point cloud space containing only the foreground points is sent to the multi-abstract-scale feature extraction module. In this module, the point cloud space is voxelized and compressed into a bird's-eye view (BEV). During BEV feature extraction, feature maps at three abstract scales are taken from the convolution layers and fused adaptively with attention to generate the final feature map, reducing the loss of feature information caused by the two-dimensional BEV. Although the plant environment is complex, the target information is relatively simple and clear, so the three abstract-scale feature maps provide almost all of the targets' semantic information. The final feature map is used to predict a heatmap, which is sent to the next module. The multi-distance-scale feature aggregation module then obtains the center point of each target from the heatmap and aggregates the foreground points near each center point in voxel space. The module quickly gathers the foreground points through a voxel query and groups them according to their distance from the center point: foreground points close to a center point are likely to belong to that target, whereas distant points are less likely to. Therefore, networks with different weights encode the different groups of foreground points to obtain distance-sensitive multiscale semantic features. Finally, the semantic features and the center point jointly predict the bounding box: the center point gives the center coordinate of the bounding box, and the semantic features predict its confidence, size, and heading angle. Result The public KITTI and Waymo datasets are used to evaluate the performance of the model, and a custom dataset is then used to evaluate its effectiveness in the target application. On the KITTI test set, MSE-SSD is compared with nine of the most popular current methods; it ranks third in detection speed at 34 frames/s, and its average precision (AP) is almost the same as that of the current most advanced single-stage detector. On the Waymo validation set, compared with other single-stage detectors, MSE-SSD ranks first in AP on multiple metrics for the relatively complex targets (pedestrians and cyclists). On the custom dataset, three targets are detected: empty AGV, loaded AGV, and pedestrian. At the easy level, the AP of MSE-SSD on the loaded AGV and pedestrian targets is 0.08% and 0.71% higher, respectively, than that of the second-ranked method. At the hard level, the AP of MSE-SSD on the empty AGV target is 1.27% higher than that of the second-ranked method. Meanwhile, the detection speed of MSE-SSD ranks second at 65 frames/s. When the trained network is deployed on the TXR development board carried by the AGV, the detection speed reaches 7.3 frames/s. Conclusion Considering the transportation problem in industrial scenes, an obstacle-avoidance detection method for AGVs based on two point cloud scales is introduced. The method achieves high detection accuracy and speed and provides a detection guarantee for AGVs running on mobile devices.
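
The adaptive attention fusion of the three abstract-scale BEV feature maps can be sketched as below. The abstract does not specify the attention design, so this sketch assumes the maps have already been resized to a common grid and reduced to a single channel, and that attention takes the form of per-cell softmax weights over the scales computed from hypothetical logits. The function names are illustrative.

```python
import math
from typing import List

Grid = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Softmax with max subtraction for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feature_maps: List[Grid], attn_logits: List[Grid]) -> Grid:
    """Fuse same-sized single-channel BEV maps with per-cell attention weights.

    feature_maps[s][y][x] is scale s's feature at cell (y, x); attn_logits has
    the same shape and holds unnormalized attention scores (hypothetical here,
    normally produced by a small learned layer).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused: Grid = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Normalize the scores across scales so the weights at each cell sum to 1
            weights = softmax([logits[y][x] for logits in attn_logits])
            fused[y][x] = sum(w * fmap[y][x] for w, fmap in zip(weights, feature_maps))
    return fused


if __name__ == "__main__":
    maps = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
    equal = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]]
    print(fuse_scales(maps, equal))  # equal logits average the three scales
```

Because the weights are normalized per cell, each location of the fused map can lean on whichever abstract scale is most informative there, rather than using one fixed mixing ratio for the whole map.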
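
The multi-distance-scale aggregation around one predicted center point can be sketched as follows. Assumptions: a brute-force Euclidean distance check replaces the voxel query, the distance groups are disjoint rings with hypothetical radii, and each ring is encoded by its own single linear layer with ReLU and max pooling, standing in for the "networks with different weights" mentioned in the abstract. All names and values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def group_by_distance(center: Vec3, points: Sequence[Vec3], radii: Sequence[float]) -> List[List[Vec3]]:
    """Split points into disjoint rings [0, r0), [r0, r1), ... around center.

    Each point is stored as its offset from the center; points at or beyond
    the last radius are dropped. Brute force stands in for a voxel query.
    """
    groups: List[List[Vec3]] = [[] for _ in radii]
    for p in points:
        offset = (p[0] - center[0], p[1] - center[1], p[2] - center[2])
        dist = math.sqrt(sum(c * c for c in offset))
        for g, r in enumerate(radii):
            if dist < r:
                groups[g].append(offset)
                break
    return groups


def encode_group(group: Sequence[Vec3], weights: Sequence[Vec3]) -> List[float]:
    """Linear layer + ReLU on each point's offset, then max-pool over the group."""
    if not group:
        return [0.0] * len(weights)  # an empty ring contributes zeros
    return [
        max(max(0.0, sum(w_i * o_i for w_i, o_i in zip(w, off))) for off in group)
        for w in weights
    ]


def multi_distance_feature(
    center: Vec3,
    points: Sequence[Vec3],
    radii: Sequence[float],
    weights_per_ring: Sequence[Sequence[Vec3]],
) -> List[float]:
    """Concatenate per-ring encodings; each ring uses its own weight set."""
    feature: List[float] = []
    for group, weights in zip(group_by_distance(center, points, radii), weights_per_ring):
        feature.extend(encode_group(group, weights))
    return feature


if __name__ == "__main__":
    pts = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0), (10.0, 0.0, 0.0)]
    ring_weights = [[(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0.0, 1.0, 0.0)]]
    print(multi_distance_feature((0.0, 0.0, 0.0), pts, [1.0, 2.0, 4.0], ring_weights))
    # [0.5, 3.0, 0.0, 3.0]
```

Keeping a separate weight set per ring lets near and far points be encoded differently, which mirrors the abstract's reasoning that closer points are more likely to belong to the target.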

English keywords:

3D object detection; single-stage detector; point cloud down-sampling; point cloud feature extraction; point cloud feature aggregation

Authors:

Han Junbo, Hu Haiyang, Li Zhongjin, Pan Kailai, Wang Lihong

Affiliations:

School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018

Zhejiang Key Laboratory of Brain-Machine Collaborative Intelligence, Hangzhou 310018

Keywords:

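3D object detection; single-stage detection network; point cloud downsampling; point cloud feature extraction; point cloud feature aggregation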

Publication year:

2024

DOI:

10.11834/jig.230105

Journal of Image and Graphics

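Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing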

CSTPCD; Peking University Core Journal

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(11)