Research Progress on Environmental Object Detection Fusing Point Clouds and Images

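Chinese abstract (translated): In applications of digital simulation technology, and especially in the development of autonomous driving, object detection is a crucial step. It involves perceiving objects in the surrounding environment and provides key information for the decision-making and planning of intelligent equipment. In recent years, with advances in sensor technology, images and point clouds have become the two main sources of perception data, and each has distinct advantages in research on deep-learning-based object detection. To study existing point-cloud-based and image-based object detection methods more comprehensively, this paper systematically reviews and summarizes three classes of object detection algorithms: those based on images, those based on point clouds, and those that combine the two. It aims to explore how the two data sources can be fused to improve the accuracy, stability, and robustness of object detection, and it gives an outlook on future directions for environmental object detection that fuses point clouds and images.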

English title: Survey on the fusion of point clouds and images for environmental object detection

English abstract: In the field of digital simulation technology applications, especially in the development of autonomous driving, object detection is a crucial component. It involves the perception of objects in the surrounding environment, which provides essential information for the decision-making and planning of intelligent systems. Traditional object detection methods typically involve steps such as feature extraction, object classification, and position regression on images. However, these methods are limited by manually designed features and by classifier performance, which restricts their effectiveness in complex scenes and for objects with significant variations. The advent of deep learning has led to the widespread adoption of object detection methods based on deep neural networks. Notably, the convolutional neural network (CNN) has emerged as one of the most prominent approaches in this field. By stacking multiple layers of convolution and pooling operations, CNNs can automatically extract meaningful feature representations from image data. In addition to image data, light detection and ranging (LiDAR) data play a crucial role in object detection tasks, particularly 3D object detection. LiDAR data represent objects as a set of unordered, discrete points on their surfaces. Accurately detecting the point cloud clusters that represent objects, and estimating their poses from these unordered points, is a challenging task. LiDAR data offer high-precision obstacle detection and distance measurement, which contributes to the perception of surrounding roadways, vehicles, and pedestrians. In real-world autonomous driving and related environmental perception scenarios, using a single modality often presents numerous challenges. For instance, while image data can provide a wide variety of high-resolution visual information such as color, texture, and shape, they are susceptible to lighting conditions. In addition, because of the inherent limitations of camera perspectives, models may struggle to handle occlusions caused by objects obstructing the view. LiDAR, by contrast, performs well in challenging lighting conditions and localizes objects accurately in space across diverse and harsh weather scenarios. However, it has its own limitations. Specifically, the low resolution of LiDAR input data results in sparse point clouds when detecting distant targets. Extracting semantic information from LiDAR data is also more challenging than extracting it from image data. Thus, an increasing number of researchers are emphasizing multimodal environmental object detection. A robust multimodal perception algorithm can offer richer feature information, better adaptability to diverse environments, and improved detection accuracy. Such capabilities enable the perception system to deliver reliable results across various environmental conditions. Multimodal object detection algorithms nevertheless face limitations and pressing challenges of their own. One challenge is the difficulty of data annotation. Annotating point cloud and image data is relatively complex and time-consuming, particularly for large-scale datasets. Moreover, accurately labeling point cloud data is challenging because of their sparsity and the presence of noisy points. Addressing these issues is crucial for further advances in multimodal object detection. In addition, point clouds and images are two distinct perception modalities whose data structures and feature representations differ significantly. Current research focuses on integrating the information from the two modalities effectively and extracting accurate, comprehensive features from them. Processing large-scale point cloud data is equally challenging. Point cloud data typically contain a large number of 3D coordinates, which places greater demands on computing resources and algorithmic efficiency than pure image data. This study aims to summarize and refine existing approaches to help researchers gain a deeper and more efficient understanding of object detection algorithms that integrate images and point clouds. It classifies object detection algorithms into three groups: those based on images, those based on point clouds, and those that combine both. Furthermore, we analyze the strengths and weaknesses of the various methods and discuss potential solutions. We also provide a comprehensive review of the development of object detection algorithms that fuse point clouds and images, considering aspects such as data collection, representation, and model design. Finally, we offer a perspective on future directions for environmental object detection, with the goal of enhancing the overall capabilities of autonomous systems.

English keywords:

point cloud; autonomous driving; multimodal; object detection; fusion

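Authors:

Jia Mingda (贾明达), Yang Jinming (杨金明), Meng Weiliang (孟维亮), Guo Jianwei (郭建伟), Zhang Jiguang (张吉光), Zhang Xiaopeng (张晓鹏)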

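Author affiliations:

State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049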

Chinese keywords:

点云 (point cloud); 自动驾驶 (autonomous driving); 多模态 (multimodal); 目标检测 (object detection); 融合 (fusion)

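Funding:

Beijing Natural Science Foundation-Fengtai Rail Transit Frontier Research Joint Fund; National Natural Science Foundation of China (six grants); Open Project of the State Key Laboratory of Virtual Reality, Beihang University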

Project numbers:

L231013; U21A20515; 62376271; 62172416; 52175493; U22B2034; 62365014; VRLAB2023B01

Publication year:

2024

DOI:

10.11834/jig.240030

Journal of Image and Graphics (中国图象图形学报)

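Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics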

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(6)

Number of references: 4