The potential of sparse convolution for single-object tracking in LiDAR (Light Detection and Ranging) point clouds has not been fully explored. At present, the vast majority of point cloud tracking algorithms rely on ball-neighborhood-based backbone networks, which consume a large amount of GPU memory and computation and model target-aware relationships insufficiently. To address this problem, this paper proposes a LiDAR point cloud tracking algorithm built on a sparse convolutional architecture and introduces a dual-channel (spatial point and voxel) relationship modeling module to efficiently embed target-discriminative information into the sparse framework. First, a 3D sparse convolutional residual network extracts the features of the template and the search area separately, and deconvolution recovers point-wise features to preserve the spatial localization required by tracking. Second, the relationship modeling module computes a semantic similarity query table between the template and search-area features. To capture fine-grained correlations between the template and the search area, the module, in the spatial point channel, finds the template neighbors of each search-area point with a nearest-neighbor algorithm and retrieves the corresponding features from the query table; in the voxel channel, it builds local multi-scale voxels centered at each search-area point and accumulates the query-table values indexed by the template points falling into each voxel cell. Finally, the fused dual-channel features are fed into a bird's-eye-view candidate bounding box generation module to regress the target bounding box. To verify the superiority of the proposed method, experiments are conducted on the KITTI and NuScenes datasets; compared with other algorithms that use sparse convolution, the proposed method improves the mean success and precision rates by 11.0% and 12.0%, respectively. The method inherits the efficiency of sparse convolution while also improving tracking accuracy.
LiDAR Point Cloud Tracking Method Using Point-Voxel Relationship Modeling under a 3D Sparse Convolutional Framework
The potential of sparse convolution in the field of single target tracking from LiDAR (Light Detection and Ranging) point clouds has not been fully explored. The vast majority of point cloud tracking algorithms use point-based backbone networks, which incur higher computation costs and model target-aware relationships insufficiently. To address this problem, this paper proposes a 3D target tracking algorithm based on a sparse convolutional framework and incorporates a point-voxel dual-channel relationship modeling module to facilitate the embedding of target discrimination information in such a sparse framework. Firstly, this work uses a 3D sparse convolutional residual network to extract the features of the template and the search area separately, then uses deconvolution to obtain point-wise features that preserve the spatial localization required by tracking tasks. Secondly, the relationship modeling module computes a semantic similarity query table from these template and search-area features. To capture fine-grained correlations, the module, on the one hand, uses a nearest-neighbor algorithm in the spatial point channel to find the template neighbor points of each search-area point and extracts the corresponding features based on the query table; on the other hand, it constructs local multi-scale voxels centered at each search-area point in the voxel channel and uses the accumulated similarity of the template points falling into each voxel unit as cues to extract features. Finally, the fused dual-channel features are fed into a bird's-eye-view candidate bounding box generation module to estimate the target bounding box. To verify the superiority of the proposed method, we evaluated it on the KITTI and NuScenes datasets; compared with the baseline algorithm adopting sparse convolution, the mean success and precision rates improve by a considerable 11.0% and 12.0%, respectively. The proposed method not only inherits the efficient characteristics of sparse convolution but also improves tracking accuracy.
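As an illustration of the spatial point channel described above, the following minimal sketch (in PyTorch) builds the semantic similarity query table between template and search-area features, finds the nearest template points of each search-area point, and retrieves the corresponding similarity scores and features. The tensor shapes, the neighbor count k, the dot-product similarity, and the function name point_channel_relation are illustrative assumptions for exposition, not the paper's exact implementation.

import torch

def point_channel_relation(tmpl_xyz, tmpl_feat, srch_xyz, srch_feat, k=4):
    """
    tmpl_xyz:  (M, 3) template point coordinates
    tmpl_feat: (M, C) point-wise template features from the sparse backbone
    srch_xyz:  (N, 3) search-area point coordinates
    srch_feat: (N, C) point-wise search-area features
    Returns (N, k, C + 1): for each search-area point, the features and
    similarity scores of its k nearest template points.
    """
    # Semantic similarity query table between search and template points: (N, M)
    sim_table = srch_feat @ tmpl_feat.t()

    # k nearest template neighbors of every search point in 3D space: (N, k)
    dist = torch.cdist(srch_xyz, tmpl_xyz)          # (N, M) pairwise distances
    knn_idx = dist.topk(k, dim=1, largest=False).indices

    # Look up the similarity scores of those neighbors in the query table
    knn_sim = torch.gather(sim_table, 1, knn_idx)   # (N, k)

    # Gather the corresponding template features: (N, k, C)
    knn_feat = tmpl_feat[knn_idx]

    # Concatenate neighbor features with their similarity scores as target cues
    return torch.cat([knn_feat, knn_sim.unsqueeze(-1)], dim=-1)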
point cloud understanding; object tracking; machine vision; sparse convolution; feature fusion
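The voxel channel can be sketched in the same spirit: a local multi-scale voxel grid is built around every search-area point, and the query-table similarities of the template points falling into each cell are accumulated. The dense (N, M) formulation, the grid resolution, the voxel sizes, and the function name voxel_channel_relation below are assumptions chosen for clarity rather than efficiency.

import torch

def voxel_channel_relation(tmpl_xyz, srch_xyz, sim_table,
                           voxel_sizes=(0.3, 0.6), grid=3):
    """
    tmpl_xyz:  (M, 3) template point coordinates
    srch_xyz:  (N, 3) search-area point coordinates
    sim_table: (N, M) semantic similarity query table
    Returns (N, len(voxel_sizes) * grid**3): accumulated similarity per cell.
    """
    N, M = sim_table.shape
    # Offsets of every template point w.r.t. every search point: (N, M, 3)
    offset = tmpl_xyz.unsqueeze(0) - srch_xyz.unsqueeze(1)
    outputs = []
    for v in voxel_sizes:                             # one local grid per scale
        # Quantize offsets to cell indices of a grid x grid x grid local volume
        cell = torch.floor(offset / v + grid / 2).long()
        inside = ((cell >= 0) & (cell < grid)).all(dim=-1)         # (N, M)
        flat = (cell[..., 0] * grid + cell[..., 1]) * grid + cell[..., 2]
        flat = flat.clamp(0, grid ** 3 - 1)                        # (N, M)
        # Accumulate similarity of the template points inside each voxel cell
        acc = torch.zeros(N, grid ** 3, device=sim_table.device)
        acc.scatter_add_(1, flat, sim_table * inside.float())
        outputs.append(acc)
    return torch.cat(outputs, dim=-1)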